[Biopython-dev] [Bug 1747] GenBank parser is very slow and memory hungry for large input files
bugzilla-daemon at portal.open-bio.org
Wed Mar 9 05:32:39 EST 2005
http://bugzilla.open-bio.org/show_bug.cgi?id=1747
------- Additional Comments From biopython-bugzilla at maubp.freeserve.co.uk 2005-03-09 05:32 -------
The following times and memory usage figures were measured on
Windows 2000 with Python 2.3.3, using the GenBank Iterator (script
run from IDLE). The computer was a 2.26 GHz Intel Pentium 4, with
735 MB RAM:-
Biopython 1.30 using Martel:-
NC_003065.gbk 480 kb, 4 seconds, 28 MB RAM
NC_003064.gbk 1,217 kb, 11 seconds, 56 MB RAM
NC_000854.gbk 3,391 kb, 45 seconds, 165 MB RAM
NC_003063.gbk 4,725 kb, 55 seconds, 195 MB RAM
NC_003062.gbk 6,574 kb, 88 seconds, 268 MB RAM
NC_005966.gbk 8,858 kb, 139 seconds, 372 MB RAM
NC_000913.gbk 10,267 kb, 171 seconds, 409 MB RAM
NC_000962.gbk 11,010 kb, 200 seconds, 486 MB RAM
NC_003997.gbk 12,026 kb, 228 seconds, 496 MB RAM
NC_002678.gbk 15,120 kb, 306 seconds, 586 MB RAM
NC_005027.gbk 18,211 kb, not enough RAM
NC_004463.gbk 19,500 kb, not enough RAM
NC_003888.gbk 24,390 kb, not enough RAM
NC_004354.gbk 33,139 kb, not enough RAM
NC_003074.gbk 42,281 kb, not enough RAM
NC_003070.gbk 55,149 kb, not enough RAM
Biopython 1.30 with this patch:-
NC_003065.gbk 480 kb, 1 second, 13 MB RAM
NC_003064.gbk 1,217 kb, 4 seconds, 16 MB RAM
NC_000854.gbk 3,391 kb, 16 seconds, 25 MB RAM
NC_003063.gbk 4,725 kb, 17 seconds, 26 MB RAM
NC_003062.gbk 6,574 kb, 27 seconds, 33 MB RAM
NC_005966.gbk 8,858 kb, 33 seconds, 40 MB RAM
NC_000913.gbk 10,267 kb, 43 seconds, 45 MB RAM
NC_000962.gbk 11,010 kb, 41 seconds, 45 MB RAM
NC_003997.gbk 12,026 kb, 55 seconds, 52 MB RAM
NC_002678.gbk 15,120 kb, 71 seconds, 61 MB RAM
NC_005027.gbk 18,211 kb, 88 seconds, 68 MB RAM
NC_004463.gbk 19,500 kb, 95 seconds, 74 MB RAM
NC_003888.gbk 24,390 kb, 146 seconds, 95 MB RAM
NC_004354.gbk 33,139 kb, 156 seconds, 121 MB RAM
NC_003074.gbk 42,281 kb, 302 seconds, 193 MB RAM
NC_003070.gbk 55,149 kb, 436 seconds, 250 MB RAM
The last three (really big) files are from Drosophila and
Arabidopsis, the rest are bacteria.
Times recorded by the test script, memory usage recorded by hand
using Task Manager.
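For anyone wanting to repeat this kind of measurement, here is a minimal
sketch of a timing harness. It is not the test script used for the figures
above: it assumes a modern Python 3 (the figures were taken on Python 2.3.3),
the names time_parse and fake_records are made up for illustration, and
tracemalloc only counts Python-level allocations, so its peak will not match
the process size shown in Task Manager.

```python
import time
import tracemalloc


def time_parse(records):
    """Consume an iterator of records; return (count, seconds, peak_mb).

    Each record is discarded as soon as it is read, so the reported peak
    reflects the parser's working memory rather than a list of all records.
    """
    tracemalloc.start()
    start = time.perf_counter()
    count = 0
    for _record in records:
        count += 1
    elapsed = time.perf_counter() - start
    _current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return count, elapsed, peak / (1024.0 * 1024.0)


def fake_records(n):
    """Hypothetical stand-in for a GenBank record iterator."""
    for i in range(n):
        yield "record %d" % i


count, seconds, peak_mb = time_parse(fake_records(3))
print("%d records, %.3f seconds, %.2f MB peak" % (count, seconds, peak_mb))
```

With a current Biopython installation, the stand-in could be replaced by
something like SeqIO.parse("NC_000913.gbk", "genbank"); that is today's
interface, not the Iterator call used for the measurements in this comment.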
In summary, with the patch parsing is about four times faster
overall, and for the larger files it uses about a tenth of the
memory - quite an improvement.
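The ratios behind that summary can be checked directly from the two
listings. The sketch below copies the seconds and MB figures for the ten
files that both versions managed to parse; the variable and function names
are only illustrative.

```python
# (file, Martel seconds, Martel MB, patched seconds, patched MB),
# copied from the two listings above; files that ran out of RAM
# under Martel are left out because there is nothing to compare.
RESULTS = [
    ("NC_003065.gbk", 4, 28, 1, 13),
    ("NC_003064.gbk", 11, 56, 4, 16),
    ("NC_000854.gbk", 45, 165, 16, 25),
    ("NC_003063.gbk", 55, 195, 17, 26),
    ("NC_003062.gbk", 88, 268, 27, 33),
    ("NC_005966.gbk", 139, 372, 33, 40),
    ("NC_000913.gbk", 171, 409, 43, 45),
    ("NC_000962.gbk", 200, 486, 41, 45),
    ("NC_003997.gbk", 228, 496, 55, 52),
    ("NC_002678.gbk", 306, 586, 71, 61),
]


def ratios(results):
    """Return (file, speed-up, memory reduction) for each row."""
    return [
        (name, float(t_old) / t_new, float(m_old) / m_new)
        for name, t_old, m_old, t_new, m_new in results
    ]


per_file = ratios(RESULTS)
for name, speedup, mem_ratio in per_file:
    print("%s  %.1fx faster, %.1fx less memory" % (name, speedup, mem_ratio))

# Overall speed-up: total Martel time divided by total patched time.
total_old = sum(row[1] for row in RESULTS)
total_new = sum(row[3] for row in RESULTS)
overall_speedup = float(total_old) / total_new
print("overall: %.2fx faster" % overall_speedup)
```

The per-file memory ratio grows with file size, which is why the
"tenth of the memory" figure applies mainly to the larger genomes.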
The implementation details of this approach could be improved;
I have had some thoughts about this overnight.
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.