[BioPython] GenBank Format & Parsing (was: Why would this GenBank file choke the GB parser?)

Jan T. Kim jtk at cmp.uea.ac.uk
Fri Aug 26 05:59:46 EDT 2005


On Thu, Aug 25, 2005 at 06:03:24PM -0400, Chris Lasher wrote:
> Hello,
> 
>   I have a GenBank file, accession AY499671.gb, and 21 like it that I
> would like to process through BioPython (I am using BioPython 1.40b
> with Windows), but I am encountering trouble. It seems that the
> GenBank parser is choking on something in the files themselves, but I
> could really use help determining what this would be, and in
> determining how to fix it. The error seems to be raised by the Martel
> Parser, but exactly what is causing it to raise the error is beyond my
> lack of knowledge and inexperience.

I ran into similar problems a while ago; the parser is rather picky
about certain things.

In your case, AY499671 gives "ENV" as the division in the LOCUS line
(first line of the file), and it turns out that BioPython doesn't know
about this division. Specifically, the list of accepted divisions is in
Bio/expressions/genbank.py:

    valid_divisions = ["PRI", "ROD", "MAM", "VRT", "INV", "PLN", "BCT", "RNA",
                       "VRL", "PHG", "SYN", "UNA", "EST", "PAT", "STS", "GSS",
                       "HTG", "HTC", "CON"]

Chances are very good that adding "ENV" to that list will fix your
problem. As a check, I changed ENV to BCT in the GenBank file, and the
parser then accepted it.
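
If you would rather not touch the installed BioPython, the ENV -> BCT
substitution can be applied to the files before parsing. The following is
only a minimal sketch of that workaround: replace_division is a helper
name I made up, it uses no BioPython calls at all, and it assumes the
division code appears as a whitespace-delimited token on the LOCUS line.

```python
import re


def replace_division(genbank_text, old="ENV", new="BCT"):
    """Swap the division code on LOCUS lines only (hypothetical helper).

    Assumes the division is a whitespace-delimited token on the LOCUS
    line; every other line is passed through unchanged.
    """
    # Match the old division only as a whole token surrounded by whitespace.
    pattern = re.compile(r"(?<=\s)%s(?=\s)" % re.escape(old))
    out = []
    for line in genbank_text.splitlines(True):  # keep line endings
        if line.startswith("LOCUS"):
            # Replace at most one occurrence so column layout is preserved
            # (old and new are both three characters by default).
            line = pattern.sub(new, line, count=1)
        out.append(line)
    return "".join(out)
```

Note that this relabels the record's division, so it is a stopgap for
getting the files through the parser rather than a proper fix.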

While we're at this: Yeast chromosome GenBank files which I downloaded
recently have

    ACCESSION   NC_001133 REGION: 1..230208

which the GenBank parser doesn't like either. I've patched my
Bio/expressions/genbank.py to accept this, but I haven't been able to
find any documentation of the REGION suffix on the ACCESSION line -- I
just checked the GenBank release notes
(ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt) again. Can anyone comment
on this?
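
For anyone who hits the same ACCESSION line and doesn't want to patch the
parser, one alternative is to strip the suffix before parsing. This is
not the patch I applied to genbank.py, just a rough sketch in plain
Python; strip_accession_region is a made-up name, and the pattern assumes
the suffix always has the form "REGION: <something>" at the end of the
ACCESSION line, which I have not seen documented.

```python
import re

# ACCESSION keyword, one accession token, then a trailing "REGION: ..." part.
# Assumption: the suffix is a single whitespace-free token such as 1..230208.
REGION_SUFFIX = re.compile(r"^(ACCESSION\s+\S+)[ \t]+REGION:[ \t]*\S+")


def strip_accession_region(genbank_text):
    """Drop a trailing 'REGION: ...' from ACCESSION lines (hypothetical helper)."""
    out = []
    for line in genbank_text.splitlines(True):  # keep line endings
        # Lines that do not match the pattern are returned unchanged.
        out.append(REGION_SUFFIX.sub(r"\1", line))
    return "".join(out)
```

The obvious drawback is that the region information is thrown away, so
this only makes sense if you don't need it downstream.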

Personally, I can't help but wonder whether it would not be possible for
the GenBank format to converge to stability after so many years...

Best regards, Jan
-- 
 +- Jan T. Kim -------------------------------------------------------+
 |    *NEW*    email: jtk at cmp.uea.ac.uk                               |
 |    *NEW*    WWW:   http://www.cmp.uea.ac.uk/people/jtk             |
 *-----=<  hierarchical systems are for files, not for humans  >=-----*

