[Bioperl-l] Re: [SO-devel] GFF3 - Bioperl - SO

Allen Day allenday at ucla.edu
Thu Oct 14 13:05:09 EDT 2004


On Thu, 14 Oct 2004, Steffen Grossmann wrote:

> Dear Allen,
> 
> I meanwhile understood that Bio::Tools::GFF in connection with 
> Bio::SeqFeature::Tools::IDHandler is doing a lot of the stuff I'd like 
> to have. Somehow, Bio::FeatureIO::gff seems to be a parallel development 
> to the first alternative. I don't know to what extent there are plans to 

I don't see from a quick look at the source how Bio::Tools::GFF is related
to Bio::SeqFeature::Tools::IDHandler.  There's nothing preventing use of
the IDHandler in Bio::FeatureIO::gff; in fact, it sounds like what you've
proposed to add, so part of your work is already done.

> bring the two approaches together, but at the moment that seems to be 
> more complicated than just bringing one approach to an acceptable state. 
> Since the Bio::Tools::GFF approach seems to be ahead, I will focus on it 
> at the moment.

I would advise against adding more features into Bio::Tools::GFF.  I can't
speak for all others, but my future development will not use it, and I'm
in the process of converting code which does use it to depend on
Bio::FeatureIO::gff.

-Allen

> 
> Nevertheless, I attach a small patch that stops features from being 
> made from the lines after the ##FASTA directive.
> 
> --- patch for Bio::FeatureIO::gff.pm starts here
> 251a252,255
>  >     while( my $gff_string = $self->_readline() ) {
>  >       # we just consume the rest of the file...
>  >     }
>  >
> --- patch ends here
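For readers without the module source at hand, here is a minimal standalone sketch of the same idea in plain Perl. It is not the Bio::FeatureIO::gff code: the helper name feature_lines is hypothetical, and it stops reading at ##FASTA instead of consuming the rest of the file as the patch above does.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: return the feature lines of a
# GFF3 stream, ignoring everything from the ##FASTA directive onward.
sub feature_lines {
    my ($fh) = @_;
    my @lines;
    while ( my $line = <$fh> ) {
        chomp $line;
        last if $line =~ /^##FASTA/;    # sequence data follows; stop here
        next if $line =~ /^#/;          # skip other directives and comments
        next if $line !~ /\S/;          # skip blank lines
        push @lines, $line;
    }
    return @lines;
}

# Small in-memory example file (made-up data).
my $gff = join "\n",
    '##gff-version 3',
    "chr1\tsrc\tgene\t1\t100\t.\t+\t.\tID=gene1",
    '##FASTA',
    '>chr1',
    'ACGTACGT',
    '';
open my $fh, '<', \$gff or die "cannot open in-memory file: $!";
my @features = feature_lines($fh);
close $fh;
print scalar(@features), " feature line(s) kept\n";
```

Either variant gives the same result to the caller: nothing after the directive is ever turned into a feature.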
> 
> Maybe you can add it, because I am not a bioperl developer yet...
> 
> Steffen
> 
> Allen Day wrote:
> 
> >We should keep this onlist, others may be interested as well.
> >
> >On Tue, 12 Oct 2004, Steffen Grossmann wrote:
> >
> >>Dear Allen,
> >>
> >>I just had a look at your module and I think it's a good start. I
> >>immediately have a bunch of ideas for how to extend it toward where I
> >>think it should go. So, I will accept your offer to work on the module
> >>and apply for a bioperl developer's account.
> >>
> >>So here are the first proposals:
> >>1) Very easy: The 'official' GFF3 specification (you know where, don't 
> >>you?) states that after the ##FASTA directive there are no more 
> >
> >Yes, I've written bits of it.
> >
> >>annotations to follow. So, although the ##FASTA directive is not yet 
> >>implemented, you should make sure that the rest of the file is not 
> >>parsed. At the moment you get back a nonsense-feature for every line 
> >>after the ##FASTA line.
> >
> >Good.  Please add.
> >
> >>2) Actually, it would be nice to be able to retrieve hierarchically 
> >>nested collections of features from a GFF-file, where the hierarchy 
> >>comes from the 'Parent' tag. Parsing a GFF file line by line is not 
> >>really compatible with this, because it naturally can only produce 
> >>flat arrays of SeqFeatures. Possible workarounds are to 
> >>provide some 'unflattening'-mechanism (but where should it naturally 
> >>go?), or methods which directly retrieve an array holding the nested 
> >>SeqFeatures (which would be an extension to the standard 'next_feature' 
> >>approach). I strongly prefer the last option.
> >
> >You might want to take advantage of the ### directive here.  Parse
> >everything up to it and cache, then start returning hierarchical features
> >from the cache.  When the cache empties and the filehandle is still
> >returning lines, fill the cache again.  Rinse, repeat.
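A rough sketch of the caching idea quoted above, in plain Perl with no BioPerl dependency. Everything here is an assumption made for illustration: the function name next_feature_group, the hash-based feature records, and the simplification that each feature has at most one Parent (GFF3 allows several, comma-separated).

```perl
use strict;
use warnings;

# Hypothetical sketch, not BioPerl code: read GFF3 lines up to the next
# '###' directive (or end of file), then return that block as top-level
# features with their children nested under them via the Parent attribute.
sub next_feature_group {
    my ($fh) = @_;
    my ( %by_id, @cache );
    while ( my $line = <$fh> ) {
        chomp $line;
        last if $line =~ /^###\s*$/;    # block complete, references resolved
        next if $line =~ /^#/ or $line !~ /\S/;
        my @col = split /\t/, $line;
        my %attr;
        for my $pair ( split /;/, $col[8] ) {
            my ( $key, $value ) = split /=/, $pair, 2;
            $attr{$key} = $value;
        }
        my $feat = {
            type     => $col[2],
            id       => $attr{ID},
            parent   => $attr{Parent},    # assumes a single parent
            children => [],
        };
        $by_id{ $feat->{id} } = $feat if defined $feat->{id};
        push @cache, $feat;
    }
    # Link children only after the whole block is cached, so a child may
    # appear before its parent within the block.
    my @top;
    for my $feat (@cache) {
        my $parent = defined $feat->{parent} ? $by_id{ $feat->{parent} } : undef;
        if ($parent) { push @{ $parent->{children} }, $feat }
        else         { push @top, $feat }
    }
    return @top;
}

# Made-up example data: two blocks separated by '###'.
my $gff = join "\n",
    "chr1\tsrc\tgene\t1\t100\t.\t+\t.\tID=gene1",
    "chr1\tsrc\tmRNA\t1\t100\t.\t+\t.\tID=mrna1;Parent=gene1",
    '###',
    "chr1\tsrc\tgene\t200\t300\t.\t+\t.\tID=gene2",
    '';
open my $fh, '<', \$gff or die "cannot open in-memory file: $!";
my @groups;
until ( eof($fh) ) {
    push @groups, [ next_feature_group($fh) ];    # refill the cache per block
}
close $fh;
printf "%s has %d child feature(s)\n", $_->{id}, scalar @{ $_->{children} }
    for map { @$_ } @groups;
```

The point of working block by block is that memory use is bounded by the largest block between two ### lines, not by the size of the file.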
> >
> >>3) Instead of requiring exact compatibility with SOFA, one could also 
> >>simply complain about non SOFA-compatible terms. Additionally, if one 
> >
> >No, this violates the spec.  If you want to do this you can give a
> >##Ontology directive to describe where the new terms came from.  Feature
> >type terms need to be SOFA extensions.
> >
> >>would have a mechanism to map non-SOFA terms to SOFA terms, the module 
> >>could be used to create SOFA compatible versions of existing GFF files 
> >>(which would be a great tool, I think!).
> >
> >You can do this via a callback mechanism to allow custom type mappings.  
> >Good idea.
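One possible shape for such a callback, sketched in plain Perl. This is not the Bio::FeatureIO::gff API: resolve_type, the %sofa_term stand-in table, and the legacy term names are all hypothetical, and a real implementation would look terms up in the ontology rather than in a hard-coded hash.

```perl
use strict;
use warnings;

# Tiny stand-in for a SOFA lookup (assumption: the real check would go
# through an ontology store, not a literal hash).
my %sofa_term = ( gene => 1, mRNA => 1, exon => 1, CDS => 1 );

# Hypothetical helper: accept known terms as-is, otherwise ask the
# caller-supplied callback for a replacement and validate the result.
sub resolve_type {
    my ( $type, $type_mapper ) = @_;
    return $type if $sofa_term{$type};
    my $mapped = $type_mapper ? $type_mapper->($type) : undef;
    return $mapped if defined $mapped and $sofa_term{$mapped};
    die "feature type '$type' is not a known term and could not be mapped\n";
}

# A user-defined mapping for made-up legacy type names.
my %legacy = ( messenger_rna => 'mRNA', cds_region => 'CDS' );
my $mapper = sub { return $legacy{ $_[0] } };

my @resolved = map { resolve_type( $_, $mapper ) }
    qw(gene messenger_rna cds_region);
print join( ', ', @resolved ), "\n";
```

Because the mapped term is validated as well, a sloppy callback cannot smuggle a non-conforming type into the output.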
> >
> >-Allen
> >
> >>These are some thoughts I have. I am not sure whether a discussion about 
> >>the future development of the module should be conducted on the 
> >>Bioperl-l list, or whether we should do it privately and only post 
> >>our proposals once we agree...
> >>
> >>Greetings!
> >>
> >>Steffen
> >>
> >>Allen Day wrote:
> >>
> >>>Look at Bio::FeatureIO::gff in bioperl-live.  It currently supports
> >>>lookup/validation of ontology terms via Bio::Ontology::OntologyStore, but
> >>>doesn't do any cardinality or type/relation enforcement, which you seem to
> >>>be alluding to below.
> >>>
> >>>I'd be very pleased if you want to work on this too.  Or anyone else on
> >>>these lists, for that matter :-).
> >>>
> >>>-Allen
> >>>
> >>>
> >>>On Mon, 11 Oct 2004, Steffen Grossmann wrote:
> >>>
> >>>>Dear all,
> >>>>
> >>>>I very much like the approach taken by SO(FA) 
> >>>>(http://song.sourceforge.net/) to standardize the vocabulary used for 
> >>>>sequence annotation. Also, the GFF3 format is a nice way to represent 
> >>>>SO-compatible annotations and it would be a great thing to have this all 
> >>>>working seamlessly with bioperl.
> >>>>
> >>>>A first step towards such a seamless integration into bioperl would be a 
> >>>>parser which is able to read/write hierarchically nested feature 
> >>>>collections from/to GFF3 files. Such a parser should make use of the 
> >>>>GFF3 specific 'ID' and 'Parent' tags.
> >>>>
> >>>>Of course, I know about the 'Bio::Tools::GFF' and 
> >>>>'Bio::SeqFeature::Tools' modules, where some related stuff can be found. 
> >>>>The problem is that the 'Bio::Tools::GFF' module doesn't respect the 
> >>>>'Parent' and 'ID' tag structure, and the grouping in the 'Unflattener' 
> >>>>approach is also conceptually different.
> >>>>
> >>>>Does anybody know whether someone is already working on such a 
> >>>>project? Or, if there is no such project, is anyone interested in 
> >>>>joining to start it?
> >>>>
> >>>>Thanks in advance for any response!
> >>>>
> >>>>Steffen
> >>>>
> >>>_______________________________________________
> >>>Bioperl-l mailing list
> >>>Bioperl-l at portal.open-bio.org
> >>>http://portal.open-bio.org/mailman/listinfo/bioperl-l

