[Biopython-dev] [Bug 2738] Speed up GenBank parsing, in particular location parsing
bugzilla-daemon at portal.open-bio.org
Fri Jan 30 11:29:07 UTC 2009
http://bugzilla.open-bio.org/show_bug.cgi?id=2738
------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-30 06:29 EST -------
I've run my test script (attachment 1209) on a Linux machine with Python 2.5,
using the following files:
5.5K Jan 30 10:29 CY029873.gbk
67M Jan 22 17:53 dr_ref_chr16.gbk
42M Jan 22 17:53 NC_003075.gbk
14M Jan 22 18:43 NC_003272.gbk
25M Jan 22 17:52 NC_003279.gbk
4.8M Jan 22 18:44 NC_004350.gbk
20M Jan 22 18:42 NC_008095.gbk
14M Jan 22 18:44 NC_009925.gbk
18M Jan 22 18:43 NC_010628.gbk
296M Jan 22 17:52 ptr_ref_chr1.gbk
86M Jan 30 10:55 wgs.AAAB.1.gnp.gbk
297M Jan 30 10:55 wgs.AABR.10.gbff.gbk
The last two files are WGS data for protein and nucleotide sequences,
downloaded from ftp://ftp.ncbi.nih.gov/genbank/wgs/ then unzipped and a gbk
extension added so my script parses them.
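
For anyone wanting to prepare similar input, here is a rough sketch of the
"unzip and add a gbk extension" step. It is not part of attachment 1209; the
function name gunzip_as_gbk and the out_dir argument are made up for
illustration, and it assumes the downloads are ordinary gzip files ending in
.gz and a current Python rather than 2.5.

```python
import gzip
import os
import shutil


def gunzip_as_gbk(gz_path, out_dir="."):
    """Decompress gz_path into out_dir, replacing the .gz suffix with .gbk.

    Hypothetical helper: the naming rule (drop ".gz", append ".gbk") is an
    assumption based on the file names listed above.
    """
    base = os.path.basename(gz_path)
    if base.endswith(".gz"):
        base = base[:-3]
    out_path = os.path.join(out_dir, base + ".gbk")
    # Stream the data so large WGS files are never held in memory at once.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path
```

Called on a file named like wgs.AAAB.1.gnp.gz, this would write
wgs.AAAB.1.gnp.gbk into out_dir.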
The test script gives identical output with and without the patch, which
appears to confirm that the location parsing is not functionally altered. The
timings were just over 2 min with the patch and just over 8 min without it (a
roughly fourfold speed-up on this dataset).
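
The comparison can be reproduced along these lines. This is only a sketch of
the idea, not the attached script: the names parse_with_biopython and
digest_and_time are hypothetical, it assumes Biopython is installed, and it is
written for a current Python rather than 2.5. It reduces every parsed feature
location to a line of text, hashes the lines, and times the whole run.

```python
import hashlib
import time


def parse_with_biopython(path):
    """Yield one text line per feature location in a GenBank file."""
    # Imported lazily so the harness can be exercised with another parser.
    from Bio import SeqIO

    with open(path) as handle:
        for record in SeqIO.parse(handle, "genbank"):
            for feature in record.features:
                yield "%s\t%s\t%s" % (record.id, feature.type, feature.location)


def digest_and_time(paths, parse=parse_with_biopython):
    """Return (SHA-1 hex digest of all parsed lines, elapsed seconds)."""
    sha = hashlib.sha1()
    start = time.time()
    for path in paths:
        for line in parse(path):
            sha.update(line.encode("utf-8"))
            sha.update(b"\n")
    return sha.hexdigest(), time.time() - start
```

Running digest_and_time over the same list of .gbk files once against a
patched checkout and once against an unpatched one gives two digests and two
timings; matching digests suggest the parsed locations are unchanged.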