[Bioperl-l] Re: [SO-devel] GFF3 - Bioperl - SO

Steffen Grossmann grossman at molgen.mpg.de
Thu Oct 14 11:50:41 EDT 2004


Dear Allen,

In the meantime I have realized that Bio::Tools::GFF, in combination with
Bio::SeqFeature::Tools::IDHandler, already does a lot of what I'd like
to have. Bio::FeatureIO::gff seems to be a parallel development to
this first alternative. I don't know to what extent there are plans to
bring the two approaches together, but at the moment that looks more
complicated than bringing just one of them to an acceptable state.
Since the Bio::Tools::GFF approach seems to be further along, I will
focus on it for now.

Nevertheless, I am attaching a small patch which fixes the problem of
features being made from the lines after the ##FASTA directive.

--- patch for Bio::FeatureIO::gff.pm starts here
251a252,255
 >     while( my $gff_string = $self->_readline() ) {
 >       # we just consume the rest of the file...
 >     }
 >
--- patch ends here
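For readers without the module source at hand, the behaviour the patch aims for can be sketched stand-alone in Perl. This is a minimal illustration under stated assumptions, not the Bio::FeatureIO::gff code: the sub names `read_gff3_features` and `parse_attributes` are hypothetical, features come back as plain hashes rather than Bio::SeqFeature objects, and percent-encoded characters in column 9 are not decoded. Once a `##FASTA` line is seen, the remaining lines are read and discarded, much like the patch's trailing `_readline()` loop.

```perl
use strict;
use warnings;

# Hypothetical helper: split GFF3 column 9 ("ID=a;Parent=b,c") into a hash
# of arrayrefs, since GFF3 allows comma-separated multiple values.
# Percent-encoded characters are NOT decoded in this sketch.
sub parse_attributes {
    my ($column9) = @_;
    my %attr;
    return \%attr if !defined $column9 || $column9 eq '.';
    for my $pair ( split /;/, $column9 ) {
        my ( $tag, $value ) = split /=/, $pair, 2;
        next unless defined $value;    # skip a malformed tag without "="
        $attr{$tag} = [ split /,/, $value ];
    }
    return \%attr;
}

# Hypothetical reader: returns features as plain hashes and stops making
# features at the ##FASTA directive.
sub read_gff3_features {
    my ($fh) = @_;
    my @features;
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^##FASTA\s*$/ ) {
            # Everything after ##FASTA is sequence data: consume and
            # discard it instead of turning each line into a feature.
            while ( defined( my $ignored = <$fh> ) ) { }
            last;
        }
        next if $line =~ /^#/ || $line !~ /\S/;    # directives, comments, blanks
        my @col = split /\t/, $line;
        next unless @col == 9;                     # not a feature line
        push @features, {
            seq_id => $col[0], source => $col[1], type  => $col[2],
            start  => $col[3], end    => $col[4], score => $col[5],
            strand => $col[6], phase  => $col[7],
            attributes => parse_attributes( $col[8] ),
        };
    }
    return @features;
}
```

An in-memory filehandle (`open my $fh, '<', \$text`) is a convenient way to try this on a short GFF3 string containing a `##FASTA` section.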

Maybe you can add it, because I am not a bioperl developer yet...

Steffen

Allen Day wrote:

>We should keep this onlist, others may be interested as well.
>
>On Tue, 12 Oct 2004, Steffen Grossmann wrote:
>
>>Dear Allen,
>>
>>I just had a look at your module and I think it's a good start. I
>>already have a bunch of ideas on how to extend it to get where I think
>>it should go. So, I will accept your offer to work on the module
>>and apply for a bioperl developer's account.
>>
>>So here are the first proposals:
>>1) Very easy: The 'official' GFF3 specification (you know where, don't
>>you?) states that after the ##FASTA directive there are no more
>
>Yes, I've written bits of it.
>
>>annotations to follow. So, although the ##FASTA directive is not yet
>>implemented, you should make sure that the rest of the file is not
>>parsed. At the moment you get back a nonsense feature for every line
>>after the ##FASTA line.
>
>Good.  Please add.
>
>>2) Actually, it would be nice to be able to retrieve hierarchically
>>nested collections of features from a GFF file, where the hierarchy
>>comes from the 'Parent' tag. Parsing a GFF file line by line is not
>>really compatible with this, because it can naturally only produce flat
>>arrays of SeqFeatures. Possible workarounds are to provide some
>>'unflattening' mechanism (but where should it naturally go?), or methods
>>which directly retrieve an array holding the nested SeqFeatures (an
>>extension of the standard 'next_feature' approach). I strongly prefer
>>the latter.
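To make proposal 2 concrete, here is a minimal sketch of the 'unflattening' step in Perl, assuming the features have already been parsed into plain hashes with an `id` and an optional list of `parents` taken from the Parent tag. The key names and the sub `nest_by_parent` are hypothetical and not BioPerl API. Each feature is attached as a child of every feature its Parent tag names; features without a Parent become top-level.

```perl
use strict;
use warnings;

# Hypothetical: build a feature hierarchy from GFF3 ID/Parent tags.
# Input: hashrefs { id => ..., type => ..., parents => [...] } (parents optional).
# Output: the top-level hashrefs; every input hash gets a 'children' arrayref.
sub nest_by_parent {
    my @features = @_;
    my %by_id;

    # First pass: index every feature by ID, so a Parent may refer to a
    # feature that appears later in the list (a forward reference).
    for my $f (@features) {
        $f->{children} = [];
        $by_id{ $f->{id} } = $f if defined $f->{id};
    }

    # Second pass: hang each feature under all of its parents.
    my @top;
    for my $f (@features) {
        my @parents = @{ $f->{parents} || [] };
        if ( !@parents ) {
            push @top, $f;
            next;
        }
        for my $pid (@parents) {
            my $parent = $by_id{$pid}
              or die "Parent '$pid' is not defined by any ID in this set\n";
            push @{ $parent->{children} }, $f;
        }
    }
    return @top;
}
```

A feature listing two parents (for example an exon shared by two mRNAs) ends up under both, and the input hashes are modified in place. Because every ID must be indexed before parents are resolved, the whole set has to be held in memory, which is the cost the `###` directive can bound.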
>
>You might want to take advantage of the ### directive here.  Parse
>everything up to it and cache, then start returning hierarchical features
>from the cache.  When the cache empties and the filehandle is still
>returning lines, fill the cache again.  Rinse, repeat.
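Allen's suggestion can be sketched as an iterator that reads lines into a buffer until it meets a `###` line (the GFF3 directive stating that all forward references seen so far have been resolved), returns that batch, and refills on the next call. This is a minimal sketch under assumptions: the sub name `make_batch_reader` is hypothetical, each batch is returned as split feature lines rather than nested SeqFeatures, and `##FASTA` ends the stream (the sequence lines are simply left unread).

```perl
use strict;
use warnings;

# Hypothetical iterator: each call returns one batch (arrayref of
# tab-split feature lines), or undef when the annotation section is over.
# A batch ends at a '###' directive or at end of input; '##FASTA' ends
# the annotation section altogether.
sub make_batch_reader {
    my ($fh) = @_;
    my $done = 0;
    return sub {
        return if $done;
        my @batch;
        while ( my $line = <$fh> ) {
            chomp $line;
            if ( $line =~ /^###\s*$/ ) {
                return \@batch if @batch;    # close the current batch
                next;                        # ignore a '###' with nothing before it
            }
            if ( $line =~ /^##FASTA/ ) {
                $done = 1;
                last;
            }
            next if $line =~ /^#/ || $line !~ /\S/;    # other directives, blanks
            push @batch, [ split /\t/, $line ];
        }
        $done = 1;    # EOF or ##FASTA reached
        return @batch ? \@batch : undef;
    };
}
```

Since each batch is closed with respect to ID/Parent references, it could be passed to a Parent-based nesting step and its top-level features then handed out one at a time in the usual next_feature style.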
>
>>3) Instead of requiring exact compatibility with SOFA, one could also
>>simply warn about non-SOFA-compatible terms. Additionally, if one
>
>No, this violates the spec.  If you want to do this you can give a
>##Ontology directive to describe where the new terms came from.  Feature
>type terms need to be SOFA extensions.
>
>>would have a mechanism to map non-SOFA terms to SOFA terms, the module
>>could be used to create SOFA-compatible versions of existing GFF files
>>(which would be a great tool, I think!).
>
>You can do this via a callback mechanism to allow custom type mappings.
>Good idea.
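A minimal sketch of the callback idea in Perl, assuming a hypothetical `map_types` helper that is not part of Bio::FeatureIO::gff: the caller supplies a code reference that translates a non-SOFA type into a SOFA term, and the helper rewrites each feature's type, dying if the result is still not an accepted term. The four-term SOFA subset and the legacy type names in the usage are illustrative only; a real tool would validate against the full SOFA ontology.

```perl
use strict;
use warnings;

# Illustrative subset only; a real tool would consult the full SOFA ontology.
my %SOFA = map { $_ => 1 } qw(gene mRNA exon CDS);

# Hypothetical helper: rewrite feature types through a caller-supplied
# callback. Types already in the SOFA set are kept; others are passed to
# the callback, which returns a SOFA term or undef if it has no mapping.
sub map_types {
    my ( $mapper, @features ) = @_;
    for my $f (@features) {
        my $type = $SOFA{ $f->{type} } ? $f->{type} : $mapper->( $f->{type} );
        die "Type '$f->{type}' has no SOFA mapping\n"
          unless defined $type && $SOFA{$type};
        $f->{type} = $type;
    }
    return @features;
}
```

Usage, with a hypothetical table of legacy type names:

```perl
my %legacy = ( Gene => 'gene', Exon => 'exon', cds => 'CDS' );
my @out = map_types( sub { $legacy{ $_[0] } }, { type => 'Gene' }, { type => 'cds' } );
```

Rejecting unmapped types, rather than passing them through, keeps the output within SOFA as Allen requires; a source with its own terms would declare them with ##Ontology instead.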
>
>-Allen
>
>>These are some thoughts I have. I am not sure whether a discussion about
>>the future development of the module should be conducted on the
>>Bioperl-l list, or whether we should do it privately and then only post
>>our proposals once we agree...
>>
>>Greetings!
>>
>>Steffen
>>
>>Allen Day wrote:
>>
>>>Look at Bio::FeatureIO::gff in bioperl-live.  It currently supports
>>>lookup/validation of ontology terms via Bio::Ontology::OntologyStore, but
>>>doesn't do any cardinality or type/relation enforcement, which you seem
>>>to be alluding to below.
>>>
>>>I'd be very pleased if you want to work on this too.  Or anyone else on
>>>these lists, for that matter :-).
>>>
>>>-Allen
>>>
>>>
>>>On Mon, 11 Oct 2004, Steffen Grossmann wrote:
>>>
>>>>Dear all,
>>>>
>>>>I like very much the approach taken by SO(FA)
>>>>(http://song.sourceforge.net/) to standardize the vocabulary used for
>>>>sequence annotation. The GFF3 format is also a nice way to represent
>>>>SO-compatible annotations, and it would be great to have all of this
>>>>working seamlessly with bioperl.
>>>>
>>>>A first step towards such a seamless integration into bioperl would be a 
>>>>parser which is able to read/write hierarchically nested feature 
>>>>collections from/to GFF3 files. Such a parser should make use of the 
>>>>GFF3 specific 'ID' and 'Parent' tags.
>>>>
>>>>Of course, I know about the 'Bio::Tools::GFF' and
>>>>'Bio::SeqFeature::Tools' modules, where some related functionality can
>>>>be found. The problem is that the 'Bio::Tools::GFF' module doesn't
>>>>respect the 'Parent' and 'ID' tag structure, and the grouping in the
>>>>'Unflattener' approach is conceptually different.
>>>>
>>>>Does anybody know whether someone is already working on such a
>>>>project? Or, if there is no such project, is anyone interested in
>>>>joining to start it?
>>>>
>>>>Thanks in advance for any response!
>>>>
>>>>Steffen
>>>>
>>>_______________________________________________
>>>Bioperl-l mailing list
>>>Bioperl-l at portal.open-bio.org
>>>http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>


-- 
%---------------------------------------------%
%            Steffen Grossmann                %
%                                             %
% Max Planck Institute for Molecular Genetics %
%      Computational Molecular Biology        %
%---------------------------------------------%
%              Ihnestrasse 73                 %
%               14195 Berlin                  %
%                 Germany                     %
%---------------------------------------------%
%         Tel: (++49 +30) 8413-1167           %
%         Fax: (++49 +30) 8413-1152           %
%---------------------------------------------%



