[BioPython] GenBank Parser and large files?
Peter
biopython at maubp.freeserve.co.uk
Tue Mar 15 16:27:08 EST 2005
Stephan Herschel wrote:
> Hi,
> I'm parsing some GenBank files. When parsing large files it
> appears the parser eats up all of my memory until nothing is
> left, i.e. it's not possible to parse large files (>25 MB).
You are not the only one to find this...
> That's the way I do it:
> >>> from Bio import GenBank
> >>> fh=open(fname,'r')
> >>> feature_parser=GenBank.FeatureParser()
> >>> gb_iterator = GenBank.Iterator(fh, feature_parser)
> >>> cur_record = gb_iterator.next()
The code looks fine.
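For what it's worth, a complete loop over every record with that
iterator would look something like the sketch below (just a sketch,
not tested on your data; fname is your filename, and I'm assuming the
iterator returns None once the records run out):

from Bio import GenBank

handle = open(fname, 'r')
feature_parser = GenBank.FeatureParser()
gb_iterator = GenBank.Iterator(handle, feature_parser)
while True:
    cur_record = gb_iterator.next()
    if cur_record is None:
        # No more records left in the file
        break
    # Do something with the SeqRecord, e.g. its id and feature count
    print cur_record.id, len(cur_record.features)
handle.close()

That doesn't change the memory problem, of course - each record is
still fully parsed before you get it back.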
> Is there a way to circumvent this problem?
> Thanks,
> Stephan
Yes - please try out my patch available on bug 1747,
http://bugzilla.open-bio.org/show_bug.cgi?id=1747
If you don't know how to use the diff file, tell me which version of
BioPython you are using (1.30 or 1.40b, I would assume) and I can
email you a replacement Python file instead:
Bio/GenBank/__init__.py
As a short-term measure, depending on which bits of the GenBank file
you care about, you could edit the file by hand before parsing to
remove most of the features (leave at least one) or most of the
sequence.
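Something along these lines might do that trimming for you (a rough
sketch only, not part of BioPython; it keeps the header and the first
feature, drops the remaining features and all of the sequence lines,
and writes the result to a new file):

def slim_genbank(in_name, out_name):
    # Copy a GenBank file, keeping the header and the first feature,
    # dropping the remaining features and the sequence lines.
    out = open(out_name, 'w')
    in_features = False
    in_sequence = False
    feature_count = 0
    for line in open(in_name):
        if line.startswith("FEATURES"):
            in_features = True
            out.write(line)
        elif line.startswith("ORIGIN"):
            in_features = False
            in_sequence = True
            out.write(line)
        elif line.startswith("//"):
            out.write(line)
        elif in_features:
            # A new feature key starts in column six, after five spaces
            if len(line) > 5 and line[:5] == "     " \
               and not line[5].isspace():
                feature_count += 1
            if feature_count <= 1:
                out.write(line)
        elif in_sequence:
            pass  # drop the sequence data entirely
        else:
            out.write(line)  # header lines before FEATURES
    out.close()

Depending on how fussy the parser is about the length given on the
LOCUS line, you may prefer to keep some or all of the sequence
instead of the features.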
Peter