[BioPython] Parsing Genbank file examples

Scott T. Kelley kelleys@ucsu.colorado.edu
Thu, 26 Jul 2001 14:50:38 -0700


Well, I managed to get Biopython working on my computer (thanks, Andrew), and
I'm very impressed with the potential power of the code. The documentation
is excellent, BTW. I say "potential" because I haven't quite gotten it to
work completely. (I think that is mostly down to my not understanding how
the parsers really work.)

So I'd really like some pointers on parsing GenBank files and extracting
data from them. What I am most interested in at the moment is accessing
multiple mRNA features from a single GenBank record. For instance, let's
say my record has the following features:

     mRNA            join(<92276..92689,92748..>92924)
                     /gene="CG15229"
     gene            <92276..>92924
                     /gene="CG15229"
     CDS             join(92276..92689,92748..92924)
                     /gene="CG15229"
                     /translation="MPAPSFYSISSESGSGSGERGKEAR"
     mRNA            join(<94303..94455,94762..95364,95815..96041,96382..96416,
                     96755..>96768)
                     /gene="CG15230"

What I want to do is read in the GenBank file, create a sequence record, and
pull out the exon/intron coordinates from all the mRNA features (ignoring the
other stuff for the moment). Unfortunately, I don't quite understand how to
make a GenBank sequence record or access these features, so if anyone out
there has any tips/pointers on how I might accomplish this feat, or even a
script, I would be very appreciative.
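
To make the goal concrete, here is a rough sketch of the output being asked
for, written in plain Python rather than with Biopython's parsers. The
function names exon_ranges and intron_ranges are made up for illustration,
and the sketch assumes forward-strand locations built only from simple
start..end ranges (no complement(), no single-base positions). It just pulls
the numbers out of a location string like the ones above and derives the
introns as the gaps between consecutive exons.

```python
import re


def exon_ranges(location):
    """Return (start, end) pairs, 1-based inclusive, from a location string.

    Assumption: the location contains only simple start..end ranges on the
    forward strand, optionally wrapped in join(...).
    """
    # Drop whitespace left over from line wrapping, plus the '<' and '>'
    # markers that flag partial (incomplete) ends.
    cleaned = re.sub(r"[\s<>]", "", location)
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", cleaned)]


def intron_ranges(exons):
    """Return the gaps between consecutive exons as (start, end) pairs."""
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
    ]


# Location copied from the first mRNA feature (gene CG15229) above.
exons = exon_ranges("join(<92276..92689,92748..>92924)")
introns = intron_ranges(exons)
print(exons)    # [(92276, 92689), (92748, 92924)]
print(introns)  # [(92690, 92747)]
```

A real solution would presumably get the location from the parsed feature
instead of a hand-copied string, which is the part I am asking about.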

Thanks! -Scott

P.S. Is this list the right place to point out bugs and errors?

-------------------
Scott T. Kelley, Ph.D.
Campus Box 347
MCD Biology
University of Colorado
Boulder, CO 80309-0347
Phone: (303) 735-1808
Fax: (303) 492-7744
E-mail: Scott.Kelley@Colorado.edu