[BioPython] GenBank Parser and large files?
Peter
biopython at maubp.freeserve.co.uk
Tue Mar 15 16:27:08 EST 2005
Stephan Herschel wrote:
> Hi,
> I'm parsing some GenBank files. When parsing large files it
> appears the parser eats up all of my memory until nothing is
> left, i.e. it's not possible to parse large files (>25 MB).
You are not the only one to find this...
> That's the way I do it:
> >>> from Bio import GenBank
> >>> fh=open(fname,'r')
> >>> feature_parser=GenBank.FeatureParser()
> >>> gb_iterator = GenBank.Iterator(fh, feature_parser)
> >>> cur_record = gb_iterator.next()
The code looks fine.
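For what it's worth, a complete loop over every record with that
iterator would look something like the sketch below (just a sketch,
not tested on your data; fname is your filename, and I'm assuming the
iterator returns None once the records run out):

from Bio import GenBank

handle = open(fname, 'r')
feature_parser = GenBank.FeatureParser()
gb_iterator = GenBank.Iterator(handle, feature_parser)
while True:
    cur_record = gb_iterator.next()
    if cur_record is None:
        # No more records left in the file
        break
    # Do something with the SeqRecord, e.g. its id and feature count
    print cur_record.id, len(cur_record.features)
handle.close()

That doesn't change the memory problem, of course - each record is
still fully parsed before you get it back.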
> Is there a way to circumvent this problem?
> Thanks,
> Stephan
Yes - please try out my patch available on bug 1747,
http://bugzilla.open-bio.org/show_bug.cgi?id=1747
If you don't know how to use the diff file, tell me which version of
BioPython you are using (1.30 or 1.40b, I would assume) and I can
email you a replacement Python file instead:
Bio/GenBank/__init__.py
As a short-term measure, depending on which bits of the GenBank file
you care about, you could edit the file by hand before parsing to
remove most of the features (leave at least one) or most of the
sequence.
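Something along these lines might do that trimming for you (a rough
sketch only, not part of BioPython; it keeps the header and the first
feature, drops the remaining features and all of the sequence lines,
and writes the result to a new file):

def slim_genbank(in_name, out_name):
    # Copy a GenBank file, keeping the header and the first feature,
    # dropping the remaining features and the sequence lines.
    out = open(out_name, 'w')
    in_features = False
    in_sequence = False
    feature_count = 0
    for line in open(in_name):
        if line.startswith("FEATURES"):
            in_features = True
            out.write(line)
        elif line.startswith("ORIGIN"):
            in_features = False
            in_sequence = True
            out.write(line)
        elif line.startswith("//"):
            out.write(line)
        elif in_features:
            # A new feature key starts in column six, after five spaces
            if len(line) > 5 and line[:5] == "     " \
               and not line[5].isspace():
                feature_count += 1
            if feature_count <= 1:
                out.write(line)
        elif in_sequence:
            pass  # drop the sequence data entirely
        else:
            out.write(line)  # header lines before FEATURES
    out.close()

Depending on how fussy the parser is about the length given on the
LOCUS line, you may prefer to keep some or all of the sequence
instead of the features.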
Peter