[Bioperl-l] GFF3

Allen Day allenday at ucla.edu
Tue Jan 18 00:27:21 EST 2005


Hi Rob,

I looked at FeatureIO::gff and merged in your changes with some
modifications.

I also added a next_seq() method to FeatureIO::gff that is activated when
a /^##FASTA/ or /^>/ line is encountered.  It delegates the parsing to
Bio::SeqIO's fasta parser.  I think this obviates the need for
Bio::SeqIO::gff.
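
For anyone who wants to see the idea without reading the module, here is a
rough sketch of the switch from feature lines to sequence data.  It is in
Python rather than Perl, it is not the FeatureIO::gff code, and the function
name read_gff3 is made up for illustration.  It only assumes that features
are tab-delimited lines and that a ##FASTA directive or a bare ">" header
starts the sequence section.

```python
import io

def read_gff3(handle):
    """Split a GFF3 stream into feature rows and trailing FASTA records.

    Hypothetical helper for illustration only; no validation is done.
    """
    features = []        # one list of tab-separated columns per feature line
    seqs = {}            # sequence id -> sequence string
    current_id = None
    in_fasta = False
    for raw in handle:
        line = raw.rstrip("\n")
        if not in_fasta:
            if line.startswith("##FASTA"):
                in_fasta = True          # directive itself carries no data
                continue
            if line.startswith(">"):
                in_fasta = True          # bare header also starts FASTA mode
            elif not line.strip() or line.startswith("#"):
                continue                 # skip blanks, comments, directives
            else:
                features.append(line.split("\t"))
                continue
        # From here on we are in the sequence section.
        if line.startswith(">"):
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            seqs[current_id] = ""
        elif current_id is not None:
            seqs[current_id] += line.strip()
    return features, seqs

sample = (
    "##gff-version 3\n"
    "chr1\t.\tgene\t1\t9\t.\t+\t.\tID=g1\n"
    "##FASTA\n"
    ">chr1 example\n"
    "ACGT\n"
    "ACGTA\n"
)
features, seqs = read_gff3(io.StringIO(sample))
```

The real method hands the sequence section to an existing fasta parser
instead of reimplementing it, which is the point of the delegation.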

Please update your repository and have a look at t/FeatureIO.t (unit test
for FeatureIO, also added).

-Allen


On Sat, 15 Jan 2005, Rob Edwards wrote:

> Because I need it for some things that I am doing, I have worked quite 
> a bit on the GFF3 parser Bio::FeatureIO::gff. Several people have 
> written this module; I have just made some cosmetic changes:
> 
> I have improved the validation processes that are applied as a gff3 
> file is parsed, and the module should now validate essentially 
> everything in the file except alignments. Validation is optional and is 
> based on the specification described at:
> http://song.sourceforge.net/gff3.shtml
> 
> For clarification and edification I have created a couple of tables 
> describing the module and the validation that is applied to GFF3 files, 
> which you can see online: http://www.salmonella.org/bioperl/gff3.html
> 
> I also wrote a Bio::SeqIO::gff module. Since gff3 files can hold 
> sequences, it seems that you'd want to be able to call the next_seq 
> methods, and therefore SeqIO is more appropriate than FeatureIO for 
> those aspects. Currently the SeqIO module uses the FeatureIO module for 
> parsing the file; it just reorganizes the features into Bio::Seq objects.
> 
> This provides two different interfaces for getting objects out of GFF3 
> files:
> 	Bio::FeatureIO::gff will return Bio::SeqFeature::Annotated objects 
> representing the annotations.
> 	Bio::SeqIO::gff will return Bio::Seq objects representing the 
> sequences with all the annotations attached.
> 
> The other difference between the two is that the former passes out the 
> objects as they are read, but the latter has to read the whole file to 
> get the annotations and the sequences.
> 
> At the moment I have focussed only on reading GFF3 files.
> 
> I have not committed these to cvs yet, pending comments from others. I 
> have some specific questions:
> 	Should I wait until after 1.5 is out?
> 	Are two separate modules really the right way to go about this?
> 	What about other GFF modules (like Bio::Tools::GFF)?
> 	Could someone give the modules a workout and let me know about bugs? I 
> am sure there are many.
> 
> I have posted these modules online via anonymous ftp at 
> ftp://ftp.salmonella.org/rob/bioperl/GFF_modules.tgz
> Take a look and let me know what you do and don't like!
> 
> Rob
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> 

