[BioPython] blast parser slows down under python2.3

Jeffrey Chang jchang at jeffchang.com
Fri Aug 29 15:07:48 EDT 2003


Hey, thanks very much for the note, and the patch (mailed separately).

Python 2.3 also seems to have broken some of the regression tests.  The 
boolean type gets printed out as "True" and "False" rather than 1 or 0 
as before.
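
A minimal sketch of that change, written for a current Python. The helper name as_legacy_flag is made up for illustration and is not part of Biopython. Comparisons now return the bool type, which prints as True/False but still compares equal to 1/0, so an explicit int() conversion is one way to keep printed test output stable:

```python
def as_legacy_flag(value):
    """Hypothetical helper: render a truth value as 1 or 0, the way Python 2.2 printed it."""
    return int(bool(value))

is_match = (3 > 2)
print(is_match)                  # the bool type prints as True/False since Python 2.3
print(as_legacy_flag(is_match))  # explicit conversion gives the old 1/0 form
```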

I'll take a look at these over the weekend.

Jeff




On Friday, August 29, 2003, at 10:24  AM, Peter Slickers wrote:

> When executed with python2.3, the biopython blast parser runs at
> only half of the speed seen with python2.2.
>
>
> This effect is easiest to observe with a very large blast output file.
> My setup for measuring the performance is quite simple.
> I used a small python script which just parses a blast
> file and stores the content in memory. I started this
> script with the time command, and the python interpreter
> was explicitly specified as either python2.2 or python2.3.
> Each run was repeated four times.
>
> --------------------------------------------------------------
> command                                    CPU time in sec
> --------------------------------------------------------------
> time python2.2 parser.py blastout.txt      5.11,3.58,3.98,4.15
> time python2.3 parser.py blastout.txt      8.85,7.97,7.30,7.12
> --------------------------------------------------------------
> (with biopython 1.21)
>
> I stumbled on this when running the python profiler
> on the blast parser. It turns out that more
> than half of the CPU time was spent in the warnings module,
> which is part of the python standard library
> (/usr/local/lib/python2.3/warnings.py).
>
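
A sketch of how such a hotspot can be located with the standard-library profiler. It uses cProfile and pstats from a current Python (the 2003 setup would have used the profile module). The functions slow_helper and parse_lines are invented stand-ins, not Biopython code:

```python
import cProfile
import io
import pstats

def slow_helper(line):
    # Stand-in for an unexpectedly expensive call made once per line.
    return sum(ord(ch) for ch in line)

def parse_lines(lines):
    # Stand-in for a parser loop that calls the helper for every line.
    return [slow_helper(line) for line in lines]

profiler = cProfile.Profile()
profiler.enable()
parse_lines(["Query= example"] * 10000)
profiler.disable()

# Sort by cumulative time and print the top entries to see where the time goes.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

Each report line names a file and function, which is how a module such as warnings.py can show up as the dominant cost.
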
> Further digging revealed that the function warn() is called
> each time the readline() method from class UndoHandle is
> executed (file site-packages/Bio/File.py).
>
> Within the readline() method the python built-in function
> apply() is heavily used. But as of python2.3 the use of
> apply() is deprecated, and therefore the warn() function is called
> by the interpreter each time the apply() function is used.
>
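
The mechanism can be imitated on a current Python, where apply() no longer exists. The sketch below is an assumption-laden illustration: legacy_apply is a hypothetical stand-in, and the choice of PendingDeprecationWarning (a category ignored by default) is the sketch's own. It shows that warn() is entered on every single call, whether or not anything is ever displayed:

```python
import warnings

def legacy_apply(func, args=(), keywds=None):
    """Hypothetical stand-in for the old built-in apply()."""
    # The warning is issued on every call, even if a filter later discards it.
    warnings.warn("use func(*args, **keywds)", PendingDeprecationWarning, stacklevel=2)
    return func(*args, **(keywds or {}))

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning instead of ignoring it
    results = [legacy_apply(max, (i, 5)) for i in range(10)]

print(len(caught), "warnings for", len(results), "calls")
```
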
>
> According to the python2.3 manual, the apply() function should be
> substituted by the "extended call syntax" (which was introduced
> in python2.0).
>
> To test my hypothesis that the performance loss is caused by
> the apply() function, I took the standard genetic approach
> of knock-out and complementation: I created a modified version
> of Bio/File.py where all occurrences of apply() were replaced
> by "extended call syntax". After that, I ran the benchmark again:
>
> --------------------------------------------------------------
> command                                    CPU time in sec
> --------------------------------------------------------------
> time python2.2 parser.py blastout.txt      4.11,3.53,4.07,4.03
> time python2.3 parser.py blastout.txt      4.94,4.96,4.54,5.24
> --------------------------------------------------------------
> (with modified Bio/File.py)
>
>
> The numbers clearly show that my patch restores most of
> the speed of the blast parser under python2.3.
>
>
>
> Conclusion:  the "newer, better, faster" dogma does not always hold for python.
>
>
> Here is an example of what the patch looks like:
>
>   old:     line = apply(self._handle.readline, args, keywds)
>   new:     line = self._handle.readline(*args,**keywds)
>
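
To show the patched call in context, here is a simplified sketch of a push-back file wrapper. It is loosely modeled on the idea of UndoHandle but is not the actual Bio/File.py code: the class name UndoHandleSketch and its internals are assumptions made for illustration.

```python
import io

class UndoHandleSketch:
    """Simplified, hypothetical wrapper that forwards readline() to a file-like object."""

    def __init__(self, handle):
        self._handle = handle
        self._saved = []  # lines pushed back by saveline()

    def saveline(self, line):
        # Push a line back so the next readline() returns it again.
        if line:
            self._saved.insert(0, line)

    def readline(self, *args, **keywds):
        if self._saved:
            return self._saved.pop(0)
        # Extended call syntax replaces apply(self._handle.readline, args, keywds).
        return self._handle.readline(*args, **keywds)

handle = UndoHandleSketch(io.StringIO("BLASTP 2.2.6\nQuery= example\n"))
first = handle.readline()
handle.saveline(first)             # undo the read
print(handle.readline() == first)  # the same line comes back
```

The *args/**keywds form passes positional and keyword arguments through unchanged, which is all the original apply() call did.
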
>
> -- 
>
>
> Peter
> -------------------------------------------------------------------
> Peter Slickers                             piet at clondiag.com
> Clondiag Chip Technologies                 http://www.clondiag.com/
> Löbstedter Str. 105
> 07749 Jena
> Germany
>
> Fon:  03641/5947-65                        Fax:  03641/5947-20
> -------------------------------------------------------------------
>



