[BioPython] GenBank parsing errors
Peter
biopython at maubp.freeserve.co.uk
Fri Nov 19 06:39:01 EST 2004
I have been trying to use the GenBank parser and have had some trouble.
I notice from the archives that Michael Maibaum has also had difficulties:
http://portal.open-bio.org/pipermail/biopython/2004-November/002457.html
Michael wrote:
> I'm trying to use biopython to parse genbank files and it is working
> happily on some genbank files, but not many others. So far the
> pattern appears to be
>
> Prokaryotic complete genome => OK
> Eukaryotic complete genome =>failure
I have not tried any eukaryotes, but I have tried several prokaryotes
without any success.
While I do recall having seen Martel parser errors (probably like the ones
Michael had), I generally have a different problem.
For example, this small sample of code never finishes when run on the
E. coli K12 file NC_000913.gbk (about 10MB), available from here:
ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Escherichia_coli_K12/
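from Bio import GenBank
gb_handle = open('NC_000913.gbk', 'r')
# FeatureParser turns each record into an object with parsed features
feature_parser = GenBank.FeatureParser()
gb_iterator = GenBank.Iterator(gb_handle, feature_parser)
print 'So far so good'
# Parsing the first (and only) record is the step that never returns
cur_record = gb_iterator.next()
print 'Done'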
I see CPU usage at almost 100%, and memory usage for Python goes
steadily up. At about 200 or 300MB the CPU usage drops, and my system
becomes very sluggish. I normally kill the process at this point.
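As a sanity check that does not involve the parser at all, here is a
minimal sketch that reads the file line by line and counts records and
feature entries, so memory use stays flat. It assumes the usual GenBank
flat-file layout (each record starts with LOCUS and ends with //, and
feature keys start in column 6 of the FEATURES table). The function name
summarize_genbank is my own, not part of BioPython, and the sketch is
written for a current Python rather than the 2.3 I am using above.

```python
def summarize_genbank(lines):
    """Count records and feature-table entries in GenBank flat-file lines.

    `lines` is any iterable of text lines, e.g. an open file handle.
    Hypothetical helper for illustration; not a BioPython function.
    """
    records = 0
    features = 0
    in_features = False
    for line in lines:
        if line.startswith("LOCUS"):
            # A LOCUS line opens a new record.
            records += 1
            in_features = False
        elif line.startswith("FEATURES"):
            # Header of the feature table.
            in_features = True
        elif line.startswith("ORIGIN") or line.startswith("//"):
            # Sequence data or end of record: the feature table is over.
            in_features = False
        elif (in_features and len(line) > 5
              and line[:5] == "     " and line[5] != " "):
            # Feature keys (gene, CDS, ...) sit in column 6;
            # qualifier lines are indented further and are skipped.
            features += 1
    return records, features


# Tiny made-up record, only to show the expected input shape.
sample = [
    "LOCUS       TEST 10 bp\n",
    "FEATURES             Location/Qualifiers\n",
    "     source          1..10\n",
    "                     /organism=\"x\"\n",
    "     gene            1..5\n",
    "ORIGIN\n",
    "        1 acgtacgtac\n",
    "//\n",
]
record_count, feature_count = summarize_genbank(sample)
```

For the real file, the same call would be
summarize_genbank(open('NC_000913.gbk')). If that returns quickly with a
single record and a large feature count, the file itself is readable and
the slowdown is on the parser side.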
Windows XP
BioPython 1.30
Python 2.3
Has anyone got the GenBank parser to work on a bacterial genome?
Thank you
Peter