[Biopython-dev] [Bug 1747] GenBank parser is very slow and memory
hungry for large input files
bugzilla-daemon at portal.open-bio.org
Tue Mar 8 16:59:34 EST 2005
http://bugzilla.open-bio.org/show_bug.cgi?id=1747
------- Additional Comments From biopython-bugzilla at maubp.freeserve.co.uk 2005-03-08 16:59 -------
See also Andrew Dalke's comment on this bug:
http://www.biopython.org/pipermail/biopython-dev/2005-February/001910.html
I wrote:
> I filed bug 1747 as "major" and feel it renders the GenBank parser
> effectively useless for large genomes.
Andrew replied:
I saw that bug report when it came in a couple of weeks ago but I was busy
at a client site.
One of the fundamental problems with this implementation of Martel
is that it parses each record entirely in memory and uses about 4x as
much memory as the record itself. The slowness for large records comes
from hitting
swap. It can't be fixed without some non-trivial changes to Martel;
basically a rewrite. If anyone wants to tackle rewriting a regex
engine I have some comments about what needs to be done. As for me
I haven't touched the code in years because I haven't needed that
capability and other tasks (including paying work) keep me busy.
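
For contrast with the parse-the-whole-record-in-memory approach described
above, here is a minimal sketch of a line-at-a-time scan. It is not Martel
or Biopython code: the function name scan_genbank_lengths is hypothetical,
and it assumes the usual GenBank flat-file layout (a LOCUS line, an ORIGIN
line before the sequence, and "//" ending each record). It only reports
each record's name and sequence length, but memory use stays roughly
constant however large the record is.

```python
import io


def scan_genbank_lengths(handle):
    """Yield (locus_name, sequence_length) for each record in a GenBank file.

    Only the current line is held in memory, so a multi-megabase record
    costs no more memory than a small one.
    """
    locus = None
    in_origin = False
    length = 0
    for line in handle:
        if line.startswith("LOCUS"):
            parts = line.split()
            locus = parts[1] if len(parts) > 1 else None
            in_origin = False
            length = 0
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            # End of record: report it and reset the state.
            yield locus, length
            locus, in_origin, length = None, False, 0
        elif in_origin:
            # Sequence lines look like "        1 gatcctccat atacaacggt";
            # skip the leading position number and count the bases.
            length += sum(len(chunk) for chunk in line.split()[1:])


# Tiny made-up record, for illustration only.
sample = (
    "LOCUS       TEST01  20 bp DNA\n"
    "ORIGIN\n"
    "        1 gatcctccat atacaacggt\n"
    "//\n"
)
for name, n_bases in scan_genbank_lengths(io.StringIO(sample)):
    print(name, n_bases)
```

A real replacement parser would also have to handle the feature table and
annotation lines; the point here is only that a line-oriented scan avoids
the roughly 4x in-memory overhead that pushes large records into swap.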
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.