[Bioperl-l] reading and writing GFF3

Robert Buels rmb32 at cornell.edu
Sat Jun 17 18:36:28 UTC 2006


I'd love to help more with this, since with the new tomato genome coming 
in my job is going to be working more and more with annotations, but I'm 
not a core person and I can't go to the meeting in NC.  In the interests 
of getting my job done right now, I've implemented a -feature argument 
to Bio::SeqFeature::Annotated's constructor, which uses a from_feature() 
method I added.  If you guys want it, it's attached to bug 2026.

From the perspective of a casual bioperl user, anything you guys can do 
to make the handling of features and annotations less fragmented and 
more robust would be wonderful.  I'd be happy to help with 
implementation if one of you grizzled veterans would give me marching 
orders. :-)

Rob

Hilmar Lapp wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> You don't need a new method for this. Instead, support a -feature 
> argument.
>
>     my $bsfa = Bio::SeqFeature::Annotated->new(-feature => $feature);
>
> This should work for any instance of Bio::SeqFeatureI. If it is a 
> B::SF::Annotated already it is obviously just a deep copy (if copy is 
> desired - could be another parameter). Otherwise more will be involved.
>
> Alternatively, and possibly better, is to write a specialized 
> SeqFeatureI factory (that would implement 
> Bio::Factory::ObjectFactoryI) and then delegate this job to it:
>
>     my $feat_factory = Bio::SeqFeature::TypedFeatureFactory->new(
>         -type_ontology => $sequence_ontology,
>         -source_ontology => $feature_source_ontology,
>         -unflatten => 1);
>     my $bsfa = $feat_factory->create_object({-feature => $feature});
>
> This is preferable because it separates business logic that isn't 
> necessarily related into defined units. I.e., the logic necessary to 
> convert an ordinary feature into a strongly typed one is different 
> from how to represent a strongly typed feature. IMHO anyway ...
>
> Also, don't dismiss the Bio::SeqFeature::TypedSeqFeatureI that Ewan 
> started as the result of a discussion thread earlier this (or last?) 
> year. Bio::SeqFeature::Annotated as such may as well be obsoleted, 
> though not in concept.
>
> Maybe we need to get together again and thrash out a strategy; or a 
> BOF at the GMOD meeting? I feel this needs a core group of people 
> who care to hash out a strategy that also solves the backwards 
> compatibility problem with the current Bio::SeqFeatureI 
> state-of-limbo, and that allows us to implement the decisions with a 
> few people in a concentrated effort. This would then also remove the 
> only real large stumbling block towards a 1.6 release.
>
> Maybe we should think about a little pre-GMOD hackathon to clear up 
> this mess? Scott, you'll be there a day early? I'll be already back 
> and Jason I believe will still be in town, although he may have other 
> commitments already. Nonetheless, it shouldn't really take that much, 
> just dedicated time, a whiteboard, and a few people who care 
> thrashing this out and then doing it.
>
> Thoughts?
>
>     -hilmar
>
> On Jun 16, 2006, at 11:56 PM, Scott Cain wrote:
>
>> Rob,
>>
>> I came to the same conclusion as well; I wrote my response as I was
>> heading out the door and while I was running errands, I realized the
>> right thing to do is to write a Bio::SeqFeature::Annotated method called
>> new_from_object, whose usage would be:
>>
>>   my $my_BSFA = Bio::SeqFeature::Annotated->new_from_object($my_BSFI, %args);
>>
>> where you would give it a Bio::SeqFeatureI compliant object and try to
>> create a BSFA like you suggested below.  You could allow passing in args
>> to control how different things are handled, like mapping non-SO types
>> to SO types.  I'll think about this over the weekend and let you know if
>> brilliance strikes me.
>>
>> Scott
>>
>>
>> On Fri, 2006-06-16 at 13:31 -0700, Robert Buels wrote:
>>> Rather than cobble together some ad-hoc solution, I would be interested
>>> in working on a good solution to this problem, because it seems like
>>> it's just going to get more common as more people start wanting to write
>>> GFF3.  What about some code in whatever customarily makes these objects
>>> (probably BSF::Annotated's new() method?) that could take another type
>>> of Feature object and attempt to shoehorn its data into a new
>>> BSF::Annotated?  If it failed (because the type isn't in SO or
>>> whatever), it could throw() some informative error message.
>>>
>>> Then, people could write straightforward code something like:
>>>
>>> while(my $oldstylefeature = $features_in->next_feature) {
>>>     $oldstylefeature->primary_tag('something_that_is_in_so');
>>>     $oldstylefeature->something_else('some other something that needs to be changed for compliance');
>>>     my $newfeature = Bio::SeqFeature::Annotated->new($oldstylefeature);
>>>     $gff3_out->write_feature($newfeature);
>>> }
>>>
>>> Does that sound like a good idea?  I'd be more than willing to implement
>>> this, since I'm going to need to do this sort of thing with many more
>>> things than just RepeatMasker.
>>>
>>> Rob
>>>
>>> Scott Cain wrote:
>>>> Um, yeah, good question.  The reason I didn't answer you when you wrote
>>>> before is that I was hoping for divine inspiration for an answer (or for
>>>> somebody else to answer, which would have been really great :-)
>>>>
>>>> The short answer (and easy one for me to type) is that you will probably
>>>> need an ad hoc method to do it, which is the same thing I do when I need
>>>> to convert gff2 to gff3, to make sure the things I need mapped get
>>>> mapped the 'right' way (that is, the way I want them to go).  I don't
>>>> have any sample code that does this, but if you want to start working up
>>>> an ad hoc method, I will certainly try to help you as much as I can.
>>>>
>>>> Scott
>>>>
>>>>
>>>> On Fri, 2006-06-16 at 12:34 -0700, Robert Buels wrote:
>>>>
>>>>> So about that converting ye olde feature objects into
>>>>> Bio::SeqFeature::Annotated objects.  How do I do it?
>>>>>
>>>>>
>>>>> Scott Cain wrote:
>>>>>
>>>>>> That's OK--You added a few items that should be escaped that weren't, so
>>>>>> I added those too.
>>>>>>
>>>>>> Thanks,
>>>>>> Scott
>>>>>>
>>>>>>
>>>>>> On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote:
>>>>>>
>>>>>>
>>>>>>> Whoops, I should have said something about that.  I submitted it before
>>>>>>> I saw that Scott had already done the escaping in CVS.
>>>>>>>
>>>>>>> Chris Fields wrote:
>>>>>>>
>>>>>>>
>>>>>>>> Scott,
>>>>>>>>
>>>>>>>> Looks like Robert also submitted a bug report related to this as well
>>>>>>>> ------------------------------------------------------------------------ 
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Bioperl-l mailing list
>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>> -------------------------------------------------------------------------- 
>>
>> Scott Cain, Ph. D.                                         cain at cshl.edu
>> GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
>> Cold Spring Harbor Laboratory
>
> - --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (Darwin)
>
> iD8DBQFElCvAuV6N2JxL7qsRAhw1AJ9SaMR4tMFZCTrzimnEnDdjKqbPGgCgk38V
> ImoAXD/jrbF0gXzSr2CY4tQ=
> =XfDq
> -----END PGP SIGNATURE-----

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu



