[Biopython-dev] Fasta.SequenceParser slower on python 2.4 than 2.3

Wed Sep 13 18:23:56 UTC 2006

I've been looking at sequence parsing again, and was a little puzzled to 
notice that the stock Fasta.SequenceParser (which uses Martel 
internally) is about four to five times slower on Python 2.4 than on 
Python 2.3 (on my Windows XP laptop).

Has anyone else noticed this?

For comparison, SeqIO.FASTA.FastaReader runs at about the same speed on 
both versions (maybe even a fraction faster on 2.4).

I've been using rat.protein.faa as a test case: a 22 MB file with 
approximately 36,000 entries, with the sequences split into 
80-character lines. It is available here:

ftp://ftp.ncbi.nlm.nih.gov/refseq/R_norvegicus/mRNA_Prot/rat.protein.faa.gz
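If you want to confirm you have the same test data before timing 
anything, something like the following will count the entries and report 
the widest sequence line. This is a minimal sketch in modern Python 3 
(not the attached script); the function names are made up for 
illustration, and it reads the .gz file directly via gzip.open:

```python
import gzip


def summarise_handle(handle):
    """Count FASTA records and find the longest sequence line.

    Works on any iterable of text lines (an open file, a list, ...).
    """
    n_records = 0
    max_width = 0
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # Each '>' title line starts a new record.
            n_records += 1
        elif line:
            max_width = max(max_width, len(line))
    return n_records, max_width


def summarise_fasta(path):
    """Open a plain or gzipped FASTA file and summarise it."""
    opener = gzip.open if path.endswith(".gz") else open
    # "rt" makes gzip.open decompress to text on the fly.
    with opener(path, "rt") as handle:
        return summarise_handle(handle)
```

For example, summarise_fasta("rat.protein.faa.gz") returns a tuple of 
(number of entries, longest line length).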

On Python 2.3.3 the attached script takes about 12s to parse this file; 
on Python 2.4.3 it takes about 56s. Explicitly caching the file in 
memory using cStringIO makes no real difference. Using 
SeqIO.FASTA.FastaReader instead takes about 10-11s, regardless of the 
Python version.
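The attached script itself was scrubbed from the archive, so here is a 
rough sketch of the kind of timing comparison described above, written 
for Python 3. The iter_fasta function is a hypothetical stand-in parser, 
not Fasta.SequenceParser or SeqIO.FASTA.FastaReader; substitute whichever 
parser you want to measure. io.StringIO plays the role cStringIO played 
on Python 2.3/2.4, so the "cached" run takes disk I/O out of the picture. 
To compare interpreters, run the same script under each one.

```python
import io
import time


def iter_fasta(handle):
    """Hypothetical minimal FASTA parser yielding (title, sequence).

    A cheap stand-in to time; NOT the Biopython parsers discussed here.
    """
    title, chunks = None, []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        elif title is not None:
            # Join wrapped (e.g. 80-character) sequence lines.
            chunks.append(line)
    if title is not None:
        yield title, "".join(chunks)


def time_parse(parse, handle):
    """Return (number of records, elapsed seconds) for one full pass."""
    start = time.perf_counter()
    count = sum(1 for _ in parse(handle))
    return count, time.perf_counter() - start


def compare_cached(parse, path):
    """Time parsing straight from disk versus from an in-memory copy."""
    with open(path) as handle:
        from_disk = time_parse(parse, handle)
    with open(path) as handle:
        # Read everything up front, like wrapping the data in cStringIO.
        cached = io.StringIO(handle.read())
    from_memory = time_parse(parse, cached)
    return from_disk, from_memory
```

If the two timings returned by compare_cached are close, as reported 
above, file I/O is not the bottleneck and the difference lies in the 
parsing itself.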

It is possible that this slowdown is Windows-only: I understand the 
Windows builds of Python 2.4 switched from MSVC 6 to MSVC 7.1, which 
may be to blame.

Peter
-------------- next part --------------
Name: simple_no_cache.py
URL: <http://lists.open-bio.org/pipermail/biopython-dev/attachments/20060913/a69fedfd/attachment.ksh>