[BioPython] GenBank parsing errors

Fri Nov 19 06:39:01 EST 2004

I have been trying to use the GenBank parser and have had some trouble.

I notice from the archives that Michael Maibaum has also had difficulties:

http://portal.open-bio.org/pipermail/biopython/2004-November/002457.html

Michael wrote:

> I'm trying to use biopython to parse genbank files and it is working 
> happily on some genbank files,  but not many others. So far the 
> pattern appears to be
> 
> Prokaryotic complete genome => OK
> Eukaryotic complete genome => failure

I have not tried any eukaryotes, but I have tried several prokaryotes
without any success.

While I do recall having seen Martel parser errors (probably like the
ones Michael had), I generally have a different problem.

For example, this small sample of code fails using E. coli K12, file
NC_000913.gbk (about 10MB) available from here:

ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Escherichia_coli_K12/

from Bio import GenBank
gb_handle = open('NC_000913.gbk', 'r')
feature_parser = GenBank.FeatureParser()
gb_iterator = GenBank.Iterator(gb_handle, feature_parser)
print 'So far so good'
cur_record = gb_iterator.next()
print 'Done'

I see CPU usage at almost 100%, and memory usage for Python goes
steadily up.  At about 200 or 300MB the CPU usage drops, and my system
becomes very sluggish.  I normally kill the process at this point.
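
To check whether the file itself reads cleanly, independent of the
parser, something like the following could be used. It is only a rough
sketch: it relies on GenBank flat-file records ending with a line
containing just "//", it is written for a current Python 3 rather than
the Python 2.3 listed below, and the helper names iter_genbank_records
and summarize are made up for illustration (they are not part of
BioPython). It streams the file one line at a time, so memory use
should stay proportional to a single record.

```python
import io


def iter_genbank_records(handle):
    """Yield each record as a list of lines, splitting on the '//' terminator."""
    record = []
    for line in handle:
        record.append(line)
        if line.rstrip() == "//":
            yield record
            record = []
    # Yield a trailing record that lacks a terminator, ignoring blank lines.
    if any(l.strip() for l in record):
        yield record


def summarize(handle):
    """Return a list of (LOCUS line or None, number of lines) per record."""
    summary = []
    for record in iter_genbank_records(handle):
        locus = next((l.rstrip() for l in record if l.startswith("LOCUS")), None)
        summary.append((locus, len(record)))
    return summary


if __name__ == "__main__":
    # Tiny made-up sample standing in for a real .gbk file.
    sample = io.StringIO(
        "LOCUS       TEST1   12 bp    DNA\n"
        "ORIGIN\n"
        "        1 acgtacgtac gt\n"
        "//\n"
        "LOCUS       TEST2   4 bp    DNA\n"
        "ORIGIN\n"
        "        1 acgt\n"
        "//\n"
    )
    for locus, n_lines in summarize(sample):
        print(locus, n_lines)
```

Replacing the sample with open('NC_000913.gbk') would show how many
records the file holds and how long each one is. If that finishes
quickly, the slowdown is more likely in the parsing step than in
reading the file.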

Windows XP
BioPython 1.30
Python 2.3

Has anyone got the GenBank parser to work on a bacterial genome?

Thank you

Peter