[Biopython-dev] Getting ready for a release, II

Andrew Dalke dalke at dalkescientific.com
Mon Feb 14 09:09:06 EST 2005


Hi all,

Peter:
> I filed bug 1747 as "major" and feel it renders the GenBank parser
> effectively useless for large genomes.

I saw that bug report when it came in a couple of weeks ago, but I was
busy at a client site.

One of the fundamental problems with this implementation of Martel
is that it parses each record entirely in memory, using about 4x as
much memory as the record itself.  The slowdown on large records comes
from running out of RAM and hitting swap.  It can't be fixed without
some non-trivial changes to Martel; basically a rewrite.  If anyone
wants to tackle rewriting a regex engine, I have some comments about
what needs to be done.  As for me, I haven't touched the code in years
because I haven't needed that capability, and other tasks (including
paying work) keep me busy.
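
For contrast, a reader that works line by line only holds the current
line, so its peak memory does not grow with the size of the record.
The Python sketch below is hypothetical: the name `stream_base_counts`
is invented for illustration and is not part of Martel's or
Biopython's API.  It assumes a simplified GenBank layout (a LOCUS line,
an ORIGIN block of numbered sequence lines, and a `//` terminator) and
only counts bases instead of building a full record.

```python
import io
from typing import Iterator, Optional, TextIO, Tuple


def stream_base_counts(handle: TextIO) -> Iterator[Tuple[Optional[str], int]]:
    """Hypothetical streaming reader (not Martel/Biopython API).

    Yields (locus_name, base_count) per record while holding only one
    line in memory at a time, assuming a simplified GenBank layout.
    """
    name: Optional[str] = None
    count = 0
    in_origin = False
    for line in handle:
        if line.startswith("LOCUS"):
            # Start of a new record: the second field is the locus name.
            fields = line.split()
            name = fields[1] if len(fields) > 1 else None
            count = 0
            in_origin = False
        elif line.startswith("ORIGIN"):
            # Everything after ORIGIN (until //) is numbered sequence data.
            in_origin = True
        elif line.startswith("//"):
            # End of record: report the running total, then reset state.
            yield name, count
            in_origin = False
        elif in_origin:
            # Sequence line: a position number followed by blocks of bases.
            count += sum(len(chunk) for chunk in line.split()[1:])


if __name__ == "__main__":
    sample = io.StringIO(
        "LOCUS       SEQA  12 bp    DNA\n"
        "ORIGIN\n"
        "        1 acgtacgtac gg\n"
        "//\n"
    )
    for locus, bases in stream_base_counts(sample):
        print(locus, bases)
```

The cost of this design is that any context has to be carried in
explicit state (here `in_origin`) instead of being matched by one
regular expression over the whole record.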

					Andrew
					dalke at dalkescientific.com

