[Bioperl-l] What's the best way to produce gff files from genebank/embl formats?

aaron.j.mackey at gsk.com aaron.j.mackey at gsk.com
Mon Nov 19 16:50:53 UTC 2007


While Lucia's subject line asked for genbank2gff, her message actually 
asked the reverse (gff + fasta -> genbank).

e.g. pretend you had to prepare a genome annotation for submission to 
GenBank ...

and no, I don't know of any generalized gff2genbank script out there ...

Lucia, the SeqIO::genbank module will write GenBank format, but you have 
to get all the bits and bobs together in the right way, i.e. construct the 
various AnnotationCollections and SeqFeatures (with SplitLocations for 
exons, CDS, etc.) that a GenBank record expects.  One way to do this is to 
start with a template GenBank file that you'd like to mimic, strip it down 
to only two gene models, use SeqIO::genbank to read it into memory, and 
then step through the object with the Perl debugger to see how it is 
composed.
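
For a sense of what "SplitLocations for exons, CDS" amounts to on the GenBank side, here is a rough sketch. It is plain Python using only the standard library, not BioPerl, and every function name in it (parse_gff_line, genbank_location, cds_locations) is made up for illustration. It assumes GFF3-style attributes where each CDS row carries a Parent= tag, groups the CDS segments by that parent, and prints the join(...) / complement(...) location string that a GenBank feature table would carry for each gene model.

```python
from collections import defaultdict


def parse_gff_line(line):
    """Split one tab-delimited GFF line into its nine standard columns."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-delimited columns, got %d" % len(cols))
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    # Assumption: GFF3-style "key=value;key=value" attributes.
    attributes = {}
    for pair in attrs.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key.strip()] = value.strip()
    return {
        "seqid": seqid,
        "type": ftype,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }


def genbank_location(segments, strand):
    """Format (start, end) pairs as a GenBank feature location string."""
    # GFF and GenBank both use 1-based inclusive coordinates,
    # so the numbers carry over unchanged.
    parts = ["%d..%d" % (s, e) for s, e in sorted(segments)]
    loc = parts[0] if len(parts) == 1 else "join(" + ",".join(parts) + ")"
    return "complement(" + loc + ")" if strand == "-" else loc


def cds_locations(gff_lines):
    """Group CDS rows by their Parent attribute and build one location each."""
    grouped = defaultdict(list)
    strands = {}
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/pragma lines
        feat = parse_gff_line(line)
        if feat["type"] != "CDS":
            continue
        parent = feat["attributes"].get("Parent", "unknown")
        grouped[parent].append((feat["start"], feat["end"]))
        strands[parent] = feat["strand"]
    return {p: genbank_location(segs, strands[p]) for p, segs in grouped.items()}


# Made-up example rows, not taken from any real annotation.
EXAMPLE_GFF = [
    "chr1\tdemo\tCDS\t100\t200\t.\t+\t0\tID=cds1;Parent=mRNA1",
    "chr1\tdemo\tCDS\t300\t450\t.\t+\t1\tID=cds1;Parent=mRNA1",
    "chr1\tdemo\tCDS\t900\t1000\t.\t-\t0\tID=cds2;Parent=mRNA2",
]

if __name__ == "__main__":
    for parent, location in cds_locations(EXAMPLE_GFF).items():
        print(parent, location)
```

A real converter would still need the rest of the record (source feature, gene and mRNA features, qualifiers such as /product and /translation), which is the "bits and bobs" part above.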

-Aaron

bioperl-l-bounces at lists.open-bio.org wrote on 11/15/2007 02:19:41 PM:

> Chris,
> 
> There's also a genbank2gff3.PLS script in the GMOD package
> (http://gmod.cvs.sourceforge.net/gmod/schema/chado/load/bin/genbank2gff3.PLS?revision=1.5&view=markup).
> However, it has not been modified for a couple of years, so it may not
> be the "preferred" script.
> 
> See http://gmod.org/wiki/index.php/Load_GenBank_into_Chado and
> http://gmod.org/wiki/index.php/Load_RefSeq_Into_Chado for more
> information on using Bioperl's bp_genbank2gff3 script.
> 
> Brian O.
> 
> 
> On 11/15/07 1:43 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:
> 
> > There are currently many ways to get what you want, but not all are
> > consistent (particularly re: GFF3).  We are aiming for more
> > consistent, compliant GFF/GTF output in the next developer series
> > (1.7) of Bioperl.
> > 
> > You can try using bp_genbank2gff or bp_genbank2gff3 (both in the
> > scripts directory); these are probably the most common approach when
> > working directly from a seq record.  Bio::Tools::GFF is the most
> > commonly used class, though I'm unsure of its status for GFF3
> > output.  From within a Bio::SeqI you can call write_gff() (currently
> > not very flexible) or from the SeqFeature itself gff_string().
> > Bio::Graphics::Feature has the additional method gff3_string().
> > Bio::FeatureIO is also an option, though I would consider it very
> > experimental (it will likely undergo significant revision in the next
> > bioperl dev series).
> > 
> > Any others anyone can think of, maybe non-BioPerl related as well?
> > 
> > chris
> > 
> > On Nov 15, 2007, at 9:44 AM, Lucia Peixoto wrote:
> > 
> >> Hi
> >> I was asked this question recently, and it occurred to me that I must
> >> be doing things inefficiently. To produce a gff file I was using SeqIO
> >> to parse the required fields, then just printing out whatever was
> >> required, tab delimited, according to the conventions, which is easy.
> >> 
> >> But if I wanted to generate a genbank file, extracting features from a
> >> gff file and a plain fasta file, it was more complicated.
> >> Is there support for gff in bioperl now?
> >> Can anyone contribute a smart way to go from/to gff, genbank and embl?
> >> 
> >> thanks very much
> >> 
> >> Lucia Peixoto
> >> Department of Biology,SAS
> >> University of Pennsylvania
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l