[Bioperl-pipeline] decent genbank parser

Jason Stajich jason@cgt.mc.duke.edu
Mon, 23 Sep 2002 17:42:22 -0400 (EDT)


The parser debate has been about efficiency only.

We're not going to guess gene structure from a genbank file which
already has the features flattened and has no explicit required linking of
features into genes, only the implicit linking through the 'gene' field.

If you want the exons, just do this for a seq returned from SeqIO:

my @exon = grep { $_->primary_tag eq 'exon' } $seq->all_SeqFeatures();

If you want gene structure, you can try to write something that strings
together all the features which have a common /gene field and doesn't try
to merge the rest into anything.  These can be built into
Bio::SeqFeature::Gene objects, which need a good going through.  I'm curious
what use cases we need to address besides the basic one: I've got an array
of exons and I want the coding sequence.
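
A minimal sketch of that first approach, grouping on the /gene tag.  The
sub name group_by_gene is made up for illustration, and it assumes each
feature responds to has_tag() and get_tag_values() the way
Bio::SeqFeatureI objects do (older releases spell the second one
each_tag_value).  Features with no /gene tag are left alone rather than
merged into anything.

```perl
use strict;
use warnings;

# Hypothetical helper: bucket features by the value of their /gene tag.
# Assumes duck-typed feature objects offering has_tag() and
# get_tag_values(); anything without a /gene tag is skipped.
sub group_by_gene {
    my @features = @_;
    my %by_gene;
    for my $feat (@features) {
        next unless $feat->has_tag('gene');
        # A tag can carry more than one value, so file the feature
        # under each gene name it mentions.
        for my $name ($feat->get_tag_values('gene')) {
            push @{ $by_gene{$name} }, $feat;
        }
    }
    return \%by_gene;
}
```

Called as group_by_gene($seq->all_SeqFeatures), it hands back a hash
reference keyed by gene name; grep each list on primary_tag to pull out
the exons or the CDS for that gene.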

Or else try to gather all the features that fall within the range defined
by a given 'gene' field on the same strand and call them part of the same
gene.  But exons listed in a genbank file don't really get magically put
into gene structures without some more creative data collection schemes.
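
The range-based idea could be sketched roughly like this.  It reads "the
range defined by a 'gene' field" as a feature whose primary tag is
'gene', which is an assumption, and the sub name features_within_genes
is made up.  It only needs primary_tag(), start(), end() and strand() on
each feature.  Nested or overlapping genes on the same strand will both
claim the same exons, so treat it as a starting point and nothing more.

```perl
use strict;
use warnings;

# Hypothetical helper: for every 'gene' feature, collect the other
# features that sit entirely inside its span on the same strand.
sub features_within_genes {
    my @features = @_;
    my @genes  = grep { $_->primary_tag eq 'gene' } @features;
    my @others = grep { $_->primary_tag ne 'gene' } @features;

    my @result;
    for my $gene (@genes) {
        my $gene_strand = $gene->strand || 0;   # treat undef as unstranded
        my @members = grep {
            ($_->strand || 0) == $gene_strand
                && $_->start >= $gene->start
                && $_->end   <= $gene->end
        } @others;
        push @result, { gene => $gene, members => \@members };
    }
    return @result;
}
```

Each element of the returned list is a hash reference with the 'gene'
feature under gene and an array reference of the contained features
under members.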

We've batted around this idea for quite a while on the bioperl list but
none of the main developers has had the time to do a reasonable first pass
and no one else has volunteered and produced any code.  We'd love for
someone to do it - it WILL get onto someone's hot list at some point.

-jason

On Mon, 23 Sep 2002, Andy Nunberg wrote:

> Hi,
> I know there has been a big debate on bioperl list about genbank parsers. I
> admit I wasn't following it very closely.
> I'd rather ask you first:
> Is there a parser that retains the organization of the features ?  That is,
> keeps track of the exons for a given gene, utr's etc...?
>
> I tried parsing a BAC genbank file with bioperl, but it flattens everything
> out (using Bio::SeqIO).
>
> Thanks
> Andy
> *******************************************************************
> Andy Nunberg, Ph.D
> Computational Biologist
> Orion Genomics, LLC
> (314) 615-6989
> http://www.oriongenomics.com

-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu