[Bioperl-l] Re: GFF3

Scott Cain cain at cshl.edu
Wed Jan 19 16:57:54 EST 2005


Allen,

Sorry about the ID problem/question--FeatureIO is fine in that respect.  I
was misremembering a problem with a chado loader as a bioperl problem.

Thanks,
Scott

----------------------------------------------------------------------
Scott Cain, Ph. D.				 	 cain at cshl.org
GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
----------------------------------------------------------------------


On Mon, 17 Jan 2005, Allen Day wrote:

> Hi,
> 
> On Mon, 17 Jan 2005, Scott Cain wrote:
> 
> > Hi Rob,
> > 
> > Thanks for your work on this--I've put several comments in your
> > original message below.
> > 
> > Scott
> > 
> > ---------Original Message--------
> > Date: Sat, 15 Jan 2005 15:22:23 -0800
> > From: Rob Edwards <rob at salmonella.org>
> > Subject: [Bioperl-l] GFF3
> > To: Bioperl list <bioperl-l at portal.open-bio.org>
> > 
> > Because I need it for some things that I am doing, I have worked quite 
> > a bit on the GFF3 parser Bio::FeatureIO::gff. Several people have 
> > written this module; I have just made some cosmetic changes:
> > 
> > I have improved the validation processes that are applied as a gff3 
> > file is parsed, and the module should now validate essentially 
> > everything in the file except alignments. Validation is optional and is 
> > based on the specification described at: 
> > http://song.sourceforge.net/gff3.shtml
> > 
> > SC> Excellent--Did you happen to relax the requirement that ID be unique
> > SC> for each line of the GFF?  Allen and I put that in due to a misreading
> > SC> of the spec.  The ID has to be unique for a *feature*, which can be
> > SC> spread across several lines.
> 
> I'm not sure if this is taken care of in the code... actually, I'm a bit 
> foggy on exactly what the problem is.
> 
> > For clarification and edification I have created a couple of tables
> > describing the module and the validation that is applied to GFF3 files,
> > which you can see online: http://www.salmonella.org/bioperl/gff3.html
> > 
> > SC> Very nice and well done--do you happen to have a pod-ified version
> > SC> of this page?  It would be nice to include in the pod for
> > SC> Bio::FeatureIO::gff.
> 
> That's nice, I'd like to see it folded into the gff.pm perldoc as well.
> 
> > I also wrote a Bio::SeqIO::gff module. Since gff3 files can hold 
> > sequences, it seems that you'd want to be able to call the next_seq 
> > methods, and therefore SeqIO is more appropriate than FeatureIO for 
> > those aspects. Currently the SeqIO module uses the FeatureIO module for 
> > parsing the file; it just reorganizes things.
> > 
> > This provides two different interfaces for getting objects out of GFF3 
> > files:
> > 	Bio::FeatureIO::gff will return Bio::SeqFeature::Annotated objects 
> > representing the annotations.
> > 	Bio::SeqIO::gff will return Bio::Seq objects representing the 
> > sequences with all the annotations attached.
> > 
> > The other difference between the two is that the former passes out the 
> > objects as they are read, but the latter has to read the whole file to 
> > get the annotations and the sequences.
> > 
> > SC> I thought about doing something similar with SeqIO, but I am worried 
> > SC> about the case where somebody tries to use SeqIO on a well 
> > SC> annotated human Chr1 GFF3 file (if one were ever to exist :-) ,
> > SC> but I suppose the same machine-killing thing could be done if
> > SC> someone tried to use SeqIO on a genbank file of Chr1.
> 
> See my previous email, I don't think we need the SeqIO module.
> 
> > At the moment I have focused only on reading GFF3 files.
> > 
> > I have not committed these to cvs yet, pending comments from others. I 
> > have some specific questions:
> > 	Should I wait until after 1.5 is out?
> > 
> > SC> I don't have the definitive answer, but I would say it doesn't
> > SC> matter much, as long as it passes tests.  Bio::FeatureIO::gff is
> > SC> hardly a fully functional module as it is, so if we could 
> > SC> squeeze a little more functionality into it before we
> > SC> release it, that would be fine with me.
> 
> Well, it's in now, and it passes tests.  There weren't any before, but I 
> wrote some.  Look in t/FeatureIO.t
> 
> > 	Is two separate modules really the right way to go about this?
> > 
> > SC> As long as it works for this case, I don't mind:  calling
> > SC> 'next_feature' on a FeatureIO object until I run out of features
> > SC> and then calling 'next_sequence' (and get a Bio::PrimarySeq) on
> > SC> the same FeatureIO object until I run out of sequences.
> > 
> > 	What about other GFF modules (like Bio::Tools::GFF)?
> > 
> > SC> I am willing to let Bio::Tools::GFF die a terrible death.  While
> > SC> it will have to be kept around for apps that depend on it, I don't
> > SC> see adding any major functionality as time well spent.
> > 
> > 	Could someone give the modules a workout and let me know about bugs? I 
> > am sure there are many.
> > 
> > SC> I will try to soon, but it won't be until next week at 
> > SC> the earliest.
> > 
> > I have posted these modules online via anonymous ftp at 
> > ftp://ftp.salmonella.org/rob/bioperl/GFF_modules.tgz
> > Take a look and let me know what you do and don't like!
> > 
> > Rob
> > 
> 
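On the ID question in the thread: the point is that a GFF3 ID has to be unique per feature, and one feature may be spread across several lines. Here is a minimal Python sketch of that check. It is not the Bio::FeatureIO::gff code; `parse_attributes` and `check_ids` are hypothetical names, and treating lines that share an ID as the same feature only when seqid, type, and strand agree is a simplifying assumption.

```python
def parse_attributes(column9):
    """Split a GFF3 attributes column into a dict (no URL-decoding here)."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def check_ids(lines):
    """Return error strings for IDs reused by what looks like a different feature.

    Several lines may share one ID (a multi-line feature), so a repeated ID
    is only an error when seqid, type, or strand differ (assumed rule).
    """
    seen = {}    # ID -> (seqid, type, strand) of the first line carrying it
    errors = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            errors.append(f"line {lineno}: expected 9 columns, got {len(cols)}")
            continue
        feature_id = parse_attributes(cols[8]).get("ID")
        if feature_id is None:
            continue  # ID is optional
        key = (cols[0], cols[2], cols[6])
        if feature_id in seen and seen[feature_id] != key:
            errors.append(f"line {lineno}: ID '{feature_id}' reused by a different feature")
        else:
            seen.setdefault(feature_id, key)
    return errors
```

A per-line uniqueness check would reject a CDS split over two lines with the same ID; the check above accepts it and only complains when the ID turns up on an unrelated feature.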

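The difference between the two interfaces discussed in the thread (features handed out as they are read versus sequences with all annotations attached, which needs the whole file) can be sketched in Python as well. This is only an illustration of the trade-off, not how either BioPerl module is written; `iter_features` and `read_sequences` are hypothetical names, and the sketch assumes sequences follow a `##FASTA` directive at the end of the file.

```python
def iter_features(lines):
    """Yield one feature dict per line as it is read (streaming)."""
    for line in lines:
        if line.startswith("##FASTA"):
            return  # the sequence section starts here; no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        yield {
            "seqid": cols[0],
            "type": cols[2],
            "start": int(cols[3]),
            "end": int(cols[4]),
        }


def read_sequences(lines):
    """Read everything, then return {seqid: {"seq": str, "features": list}}.

    Nothing can be returned until the whole input has been consumed,
    because the sequence for a seqid comes after all of the features.
    """
    lines = list(lines)  # the whole file is held in memory
    records = {}
    for feat in iter_features(lines):
        rec = records.setdefault(feat["seqid"], {"seq": "", "features": []})
        rec["features"].append(feat)
    in_fasta = False
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith("##FASTA"):
            in_fasta = True
        elif in_fasta and line.startswith(">"):
            current = line[1:].split()[0]
            records.setdefault(current, {"seq": "", "features": []})
        elif in_fasta and current and line:
            records[current]["seq"] += line
    return records
```

The generator can stop after the first feature without touching the rest of the input, whereas `read_sequences` has to hold every feature in memory, which is the concern raised about a heavily annotated chromosome.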

