[Bioperl-l] GFF3

Rob Edwards rob at salmonella.org
Sat Jan 15 18:22:23 EST 2005


Because I need it for some things that I am doing, I have worked quite 
a bit on the GFF3 parser Bio::FeatureIO::gff. Several people have 
written this module; I have just made some cosmetic changes:

I have improved the validation processes that are applied as a gff3 
file is parsed, and the module should now validate essentially 
everything in the file except alignments. Validation is optional and is 
based on the specification described at:
http://song.sourceforge.net/gff3.shtml
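
To give a flavour of the per-line checks this kind of validation involves, here is a minimal Python sketch. It is not the module's Perl code: the function name validate_gff3_line and the choice of checks are assumptions for illustration, based only on the nine-column layout in the specification (seqid, source, type, start, end, score, strand, phase, attributes).

```python
def validate_gff3_line(line):
    """Return a list of problems found in one GFF3 feature line (empty if none).

    Hypothetical helper for illustration; not part of BioPerl.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return [f"expected 9 tab-separated columns, found {len(cols)}"]
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    problems = []

    # Coordinates are 1-based integers with start <= end.
    if not (start.isdecimal() and end.isdecimal()):
        problems.append("start and end must be positive integers")
    elif int(start) < 1 or int(start) > int(end):
        problems.append("start must be >= 1 and <= end")

    # Score is a number, or '.' when absent.
    if score != ".":
        try:
            float(score)
        except ValueError:
            problems.append("score must be a number or '.'")

    if strand not in ("+", "-", ".", "?"):
        problems.append("strand must be one of + - . ?")

    # Phase is 0, 1 or 2, and is required for CDS features.
    if phase not in (".", "0", "1", "2"):
        problems.append("phase must be 0, 1, 2 or '.'")
    elif ftype == "CDS" and phase == ".":
        problems.append("CDS features require a phase")

    # Attributes are semicolon-separated tag=value pairs.
    if attrs != ".":
        for pair in filter(None, attrs.split(";")):
            if "=" not in pair:
                problems.append(f"attribute {pair!r} is not tag=value")
    return problems
```

An empty list means the line passed every check in this sketch. A real validator would also check things such as the type against the Sequence Ontology and Parent/ID consistency, which are left out here.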

For clarification and edification I have created a couple of tables 
describing the module and the validation that is applied to GFF3 files, 
which you can see online: http://www.salmonella.org/bioperl/gff3.html

I also wrote a Bio::SeqIO::gff module. Since gff3 files can hold 
sequences, it seems that you'd want to be able to call the next_seq 
methods, and therefore SeqIO is more appropriate than FeatureIO for 
those aspects. Currently the SeqIO module uses the FeatureIO module to 
parse the file and just reorganizes the results.

This provides two different interfaces for getting objects out of GFF3 
files:
	Bio::FeatureIO::gff will return Bio::SeqFeature::Annotated objects 
representing the annotations.
	Bio::SeqIO::gff will return Bio::Seq objects representing the 
sequences with all the annotations attached.

The other difference between the two is that the former passes out the 
objects as they are read, but the latter has to read the whole file to 
get the annotations and the sequences.
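
The difference between the two access patterns can be sketched in a few lines of Python. This is only an illustration of streaming versus whole-file reading, not the BioPerl interface: the names iter_features and read_sequences and the dictionary layout are assumptions, and the sketch assumes the sequences follow a ##FASTA directive at the end of the file.

```python
import io
from collections import defaultdict


def iter_features(handle):
    """Yield one feature dict per line as soon as it is read (streaming)."""
    for line in handle:
        if line.startswith("##FASTA"):
            return  # the sequence section starts here; no more features
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        yield {"seqid": cols[0], "type": cols[2],
               "start": int(cols[3]), "end": int(cols[4])}


def read_sequences(handle):
    """Read the whole file, then return {seqid: {"seq": ..., "features": [...]}}."""
    features = defaultdict(list)
    for feat in iter_features(handle):
        features[feat["seqid"]].append(feat)

    # The handle is now positioned just after the ##FASTA line (or at EOF).
    seqs, current = {}, None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            current = (line[1:].split() or [""])[0]
            seqs[current] = []
        elif line and current is not None:
            seqs[current].append(line)

    # Only now can each sequence be returned with its annotations attached.
    return {sid: {"seq": "".join(parts), "features": features.get(sid, [])}
            for sid, parts in seqs.items()}


data = (
    "##gff-version 3\n"
    "ctg1\t.\tgene\t1\t8\t.\t+\t.\tID=g1\n"
    "ctg1\t.\tCDS\t2\t7\t.\t+\t0\tID=c1\n"
    "##FASTA\n"
    ">ctg1\n"
    "ACGT\n"
    "ACGT\n"
)
streamed = list(iter_features(io.StringIO(data)))
whole = read_sequences(io.StringIO(data))
```

The generator can hand back the first feature before the rest of the file has been read, whereas read_sequences cannot return anything until it has seen both the feature lines and the FASTA section.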

So far I have focussed on reading GFF3 files.

I have not committed these to cvs yet, pending comments from others. I 
have some specific questions:
	Should I wait until after 1.5 is out?
	Are two separate modules really the right way to go about this?
	What about other GFF modules (like Bio::Tools::GFF)?
	Could someone give the modules a workout and let me know about bugs? I 
am sure there are many.

I have posted these modules online via anonymous ftp at 
ftp://ftp.salmonella.org/rob/bioperl/GFF_modules.tgz
Take a look and let me know what you do and don't like!

Rob


