[Bioperl-l] reading and writing GFF3

Hilmar Lapp hlapp at gmx.net
Tue Jun 20 18:24:45 UTC 2006


Yes, this is the sore problem area. AnnotatableI used to have only a  
single method (annotation()), the *_tag_* methods are new since 1.5  
(and truly a developer release feature - don't rely on them staying).

Likewise, the tag2text is an utterly ugly artifact (after all, this  
is an interface) rooted in the above addition. If we can't manage to  
remove it I'll remove my name from that module ;)
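
To make the alternative concrete, here is a minimal sketch of the polymorphic
approach Rob proposes in point 2 of his message quoted below. It is plain Perl
and does not use BioPerl: the My::Annotation::* packages, as_tag(), and
tag_values_for() are hypothetical names for illustration only, not existing
API, and the fields chosen for the target (id, start, end, optional strand)
are likewise assumptions. The idea is that each annotation class knows how to
flatten itself into scalar tag values, so the caller needs no central lookup
table like %tag2text.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for Bio::Annotation::* classes (not BioPerl API).
package My::Annotation::SimpleValue;
sub new    { my ($class, %args) = @_; return bless { value => $args{-value} }, $class }
# A simple value flattens to exactly one scalar.
sub as_tag { my ($self) = @_; return ($self->{value}) }

package My::Annotation::Target;
sub new {
    my ($class, %args) = @_;
    return bless {
        target_id => $args{-target_id},
        start     => $args{-start},
        end       => $args{-end},
        strand    => $args{-strand},    # optional (assumed field)
    }, $class;
}
# A target flattens to an ordered list of 3 or 4 scalars.
sub as_tag {
    my ($self) = @_;
    my @values = @{$self}{qw(target_id start end)};
    push @values, $self->{strand} if defined $self->{strand};
    return @values;
}

package main;

# The caller dispatches on the object, with no per-class lookup table.
sub tag_values_for {
    my (@annotations) = @_;
    return map { $_->as_tag } @annotations;
}

my @values = tag_values_for(
    My::Annotation::SimpleValue->new(-value => 'RepeatMasker'),
    My::Annotation::Target->new(-target_id => 'EST1', -start => 10, -end => 250),
);
print join(' ', @values), "\n";
```

Supporting a new annotation type then means adding one as_tag() method to that
class rather than editing a shared hash.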

	-hilmar

On Jun 20, 2006, at 2:09 PM, Robert Buels wrote:

> Getting to know this code a little better, I notice a couple of little
> things:
>
> 1.) my patch attached to bug 2026 draws unnecessary distinctions between
> feature types that use tags and those that use annotations, since all
> features are now Bio::AnnotatableI's and the *_tag_* methods are now
> implemented in AnnotatableI in terms of annotation objects.  You
> guys should probably just ignore it, since from the sound of it you're
> going to be changing all of this around anyway.  Wish I could be there
> to help and learn more.
>
> 2.) the %tag2text hash in Bio::AnnotatableI stores a list of scalar
> accessors to use when translating Bio::Annotation::* objects to and from
> scalar tags.  It seems to me this would be much better accomplished by
> using polymorphism of some sort, probably by adding a multipurpose as_tag()
> accessor to Bio::AnnotationI and the objects that implement it, then
> using that in Bio::AnnotatableI instead of %tag2text.  Does this make
> sense, or am I misinterpreting something here?  The reason I've noticed this
> is that I've been wrestling with how to translate
> Bio::Annotation::Target objects to and from scalar tag values, since a
> Target is represented as an ordered list of 3 or 4 scalar tags in
> older code that was designed to interoperate with gff2, and I can't
> figure out a nice way to do it using the rather inflexible %tag2text
> mechanism.
>
> Sorry to be a pain, just wanted to get that in there before you guys
> start your jam session in Durham.
>
> Rob
>
> Scott Cain wrote:
>> Hi Hilmar,
>>
>> Of course you are right--I was under the influence of a Perl module that
>> I work with that does something similar, but both of your solutions are
>> better.
>>
>> I wasn't familiar with Bio::SeqFeature::TypedSeqFeatureI; I'll take a
>> look this week.
>>
>> As for next week, I plan on spending the day at NESCent on Wednesday
>> (though I haven't told Todd or Jeff that I am arriving early yet) just
>> to make sure all the details are in place.  I imagine I'll have a fair
>> amount of free time to hash this stuff out.  Anyone else who is in town
>> (that is, in Durham, NC, USA) is welcome to come draw on a white board
>> too. :-)
>>
>> Scott
>>
>>
>> On Sat, 2006-06-17 at 12:20 -0400, Hilmar Lapp wrote:
>>
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> You don't need a new method for this. Instead, support a -feature
>>> argument.
>>>
>>> 	my $bsfa = Bio::SeqFeature::Annotated->new(-feature => $feature);
>>>
>>> This should work for any instance of Bio::SeqFeatureI. If it is a
>>> B::SF::Annotated already, it is obviously just a deep copy (if a copy is
>>> desired - that could be another parameter). Otherwise more will be involved.
>>>
>>> Alternatively, and possibly better, is to write a specialized
>>> SeqFeatureI factory (that would implement
>>> Bio::Factory::ObjectFactoryI) and then delegate this job to it:
>>>
>>> 	my $feat_factory = Bio::SeqFeature::TypedFeatureFactory->new(
>>> 		-type_ontology => $sequence_ontology,
>>> 		-source_ontology => $feature_source_ontology,
>>> 		-unflatten => 1);
>>> 	my $bsfa = $feat_factory->create_object({-feature => $feature});
>>>
>>> This is preferable because it separates business logic that isn't
>>> necessarily related into defined units. I.e., the logic necessary to
>>> convert an ordinary feature into a strongly typed one is different
>>> from how to represent a strongly typed feature. IMHO anyway ...
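
A minimal sketch of the separation described above, in plain Perl without
BioPerl: one class only represents a typed feature, while a separate factory
holds the conversion logic and fails loudly when a type cannot be mapped. All
names here (My::TypedFeature, My::TypedFeatureFactory, the -type_map argument,
the example type mapping) are hypothetical and chosen for illustration; this is
not the API of Bio::Factory::ObjectFactoryI or of any existing BioPerl class.

```perl
use strict;
use warnings;

# Representation only: holds an already-validated type and a location.
package My::TypedFeature;
sub new {
    my ($class, %args) = @_;
    return bless { type => $args{-type}, start => $args{-start}, end => $args{-end} }, $class;
}
sub type  { $_[0]{type} }
sub start { $_[0]{start} }
sub end   { $_[0]{end} }

# Conversion only: knows how to map loosely typed input to allowed types.
package My::TypedFeatureFactory;
sub new {
    my ($class, %args) = @_;
    return bless { type_map => $args{-type_map} || {} }, $class;
}
sub create_object {
    my ($self, $args) = @_;
    my $old  = $args->{-feature};    # a plain hash standing in for an untyped feature
    my $type = $self->{type_map}{ $old->{primary_tag} };
    die "cannot map feature type '$old->{primary_tag}' to a known term\n"
        unless defined $type;
    return My::TypedFeature->new(-type => $type, -start => $old->{start}, -end => $old->{end});
}

package main;

my $factory = My::TypedFeatureFactory->new(
    -type_map => { repeat => 'repeat_region', CDS => 'CDS' },    # assumed mapping
);
my $typed = $factory->create_object(
    { -feature => { primary_tag => 'repeat', start => 100, end => 180 } },
);
print $typed->type, "\t", $typed->start, "\t", $typed->end, "\n";

# An unmappable type raises an error instead of producing a bad feature.
my $ok = eval {
    $factory->create_object({ -feature => { primary_tag => 'mystery', start => 1, end => 2 } });
    1;
};
print "conversion failed: $@" unless $ok;
```

Keeping the mapping in the factory means the feature class never needs to know
where its input came from or how permissive the conversion was.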
>>>
>>> Also, don't dismiss the Bio::SeqFeature::TypedSeqFeatureI that Ewan
>>> started as the result of a discussion thread earlier this (or last?)
>>> year. Bio::SeqFeature::Annotated as such may as well be obsoleted,
>>> though not in concept.
>>>
>>> Maybe we need to get together again and thrash out a strategy; or a
>>> BOF at the GMOD meeting? I feel this does need a core group of people
>>> who care, who hash out a strategy that will also solve the backwards
>>> compatibility problem with the current Bio::SeqFeatureI state of
>>> limbo, and who then implement the decisions with a few people in a
>>> concentrated effort. This will then also remove the only real large
>>> stumbling block towards a 1.6 release.
>>>
>>> Maybe we should think about a little pre-GMOD hackathon to clear up
>>> this mess? Scott, you'll be there a day early? I'll be already back
>>> and Jason I believe will still be in town, although he may have other
>>> commitments already. Nonetheless, it shouldn't really take that much,
>>> just dedicated time, a whiteboard, and a few people who care
>>> thrashing this out and then doing it.
>>>
>>> Thoughts?
>>>
>>> 	-hilmar
>>>
>>> On Jun 16, 2006, at 11:56 PM, Scott Cain wrote:
>>>
>>>
>>>> Rob,
>>>>
>>>> I came to the same conclusion as well; I wrote my response as I was
>>>> heading out the door, and while I was running errands I realized the
>>>> right thing to do is to write a Bio::SeqFeature::Annotated method called
>>>> new_from_object, whose usage would be:
>>>>
>>>>   my $my_BSFA = Bio::SeqFeature::Annotated->new_from_object($my_BSFI, %args);
>>>>
>>>> where you would give it a Bio::SeqFeatureI compliant object and it would
>>>> try to create a BSFA like you suggested below.  You could allow passing in
>>>> args to control how different things are handled, like mapping non-SO types
>>>> to SO types.  I'll think about this over the weekend and let you know if
>>>> brilliance strikes me.
>>>>
>>>> Scott
>>>>
>>>>
>>>> On Fri, 2006-06-16 at 13:31 -0700, Robert Buels wrote:
>>>>
>>>>> Rather than cobble together some ad-hoc solution, I would be interested
>>>>> in working on a good solution to this problem, because it seems like
>>>>> it's just going to get more common as more people start wanting to write
>>>>> GFF3.  What about some code in whatever customarily makes these objects
>>>>> (probably BSF::Annotated's new() method?) that could take another type
>>>>> of Feature object and attempt to shoehorn its data into a new
>>>>> BSF::Annotated?  If it failed (because the type isn't in SO or
>>>>> whatever), it could throw() some informative error message.
>>>>>
>>>>> Then, people could write straightforward code something like:
>>>>>
>>>>> while (my $oldstylefeature = $features_in->next_feature) {
>>>>>     $oldstylefeature->primary_tag('something_that_is_in_so');
>>>>>     $oldstylefeature->something_else(
>>>>>         'some other something that needs to be changed for compliance');
>>>>>     my $newfeature = Bio::SeqFeature::Annotated->new($oldstylefeature);
>>>>>     $gff3_out->write_feature($newfeature);
>>>>> }
>>>>>
>>>>> Does that sound like a good idea?  I'd be more than willing to implement
>>>>> this, since I'm going to need to do this sort of thing with many more
>>>>> things than just RepeatMasker.
>>>>>
>>>>> Rob
>>>>>
>>>>> Scott Cain wrote:
>>>>>
>>>>>> Um, yeah, good question.  The reason I didn't answer you when you wrote
>>>>>> before is that I was hoping for divine inspiration for an answer (or for
>>>>>> somebody else to answer, which would have been really great :-)
>>>>>>
>>>>>> The short answer (and easy one for me to type) is that you will probably
>>>>>> need an ad hoc method to do it, which is the same thing I do when I need
>>>>>> to convert gff2 to gff3, to make sure the things I need mapped get
>>>>>> mapped the 'right' way (that is, the way I want them to go).  I don't
>>>>>> have any sample code that does this, but if you want to start working up
>>>>>> an ad hoc method, I will certainly try to help you as much as I can.
>>>>>>
>>>>>> Scott
>>>>>>
>>>>>>
>>>>>> On Fri, 2006-06-16 at 12:34 -0700, Robert Buels wrote:
>>>>>>
>>>>>>
>>>>>>> So about that converting ye olde feature objects into
>>>>>>> Bio::SeqFeature::Annotated objects.  How do I do it?
>>>>>>>
>>>>>>>
>>>>>>> Scott Cain wrote:
>>>>>>>
>>>>>>>
>>>>>>>> That's OK--you added a few items that should be escaped that weren't, so
>>>>>>>> I added those too.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Scott
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Whoops, I should have said something about that.  I submitted it before
>>>>>>>>> I saw that Scott had already done the escaping in CVS.
>>>>>>>>>
>>>>>>>>> Chris Fields wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Scott,
>>>>>>>>>>
>>>>>>>>>> Looks like Robert also submitted a bug report related to this as well
>>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Bioperl-l mailing list
>>>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>>>
>>>> -- 
>>>> ------------------------------------------------------------------------
>>>> Scott Cain, Ph. D.
>>>> cain at cshl.edu
>>>> GMOD Coordinator (http://www.gmod.org/)
>>>> 216-392-3087
>>>> Cold Spring Harbor Laboratory
>>>>
>>> - --
>>> ===========================================================
>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>>> ===========================================================
>>>
>>>
>>>
>>>
>>>
>>> -----BEGIN PGP SIGNATURE-----
>>> Version: GnuPG v1.4.2.2 (Darwin)
>>>
>>> iD8DBQFElCvAuV6N2JxL7qsRAhw1AJ9SaMR4tMFZCTrzimnEnDdjKqbPGgCgk38V
>>> ImoAXD/jrbF0gXzSr2CY4tQ=
>>> =XfDq
>>> -----END PGP SIGNATURE-----
>>>
>
> -- 
> Robert Buels
> SGN Bioinformatics Analyst
> 252A Emerson Hall, Cornell University
> Ithaca, NY  14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu
>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================







