[BioPython] blast parser slows down under python2.3
Jeffrey Chang
jefftc at stanford.edu
Sun Aug 31 18:44:18 EDT 2003
I have applied the patch. Thanks very much!
The regression tests now work again. For the tests that print out
booleans, I am now explicitly printing out 0 or 1, for backwards
compatibility.
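A minimal sketch of that kind of change, assuming a test that prints the result of a comparison. The helper name as_flag is hypothetical and is not taken from the Biopython test suite; the point is only that converting with int() gives the same printed output regardless of how the interpreter displays booleans.

```python
def as_flag(value):
    """Return 1 or 0 so printed output does not depend on how bool is displayed."""
    return int(bool(value))

# A comparison yields a bool, which Python 2.3 and later display as True/False.
result = (3 > 2)
print(as_flag(result))       # prints 1
print(as_flag(not result))   # prints 0
```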
I have also gone through and changed some more instances of apply to
the new call syntax. Please let me know if there appear to be any
problems.
Jeff
On Friday, August 29, 2003, at 12:07 PM, Jeffrey Chang wrote:
> Hey, thanks very much for the note, and the patch (mailed separately).
>
> Python 2.3 also seems to have broken some of the regression tests.
> The boolean type gets printed out as "True" and "False" rather than 1
> or 0 as before.
>
> I'll take a look at these over the weekend.
>
> Jeff
>
>
>
>
> On Friday, August 29, 2003, at 10:24 AM, Peter Slickers wrote:
>
>> The biopython blast parser runs at only half of the speed
>> seen with python2.2 when executed with python2.3.
>>
>>
>> This effect is monitored best with a huge blast output file.
>> My setup for measuring the performance is quite simple.
>> I have used a small python script which just parses a blast
>> file and stores the content in memory. I have started this
>> script with the time command, and the python interpreter
>> was explicitly specified either as python2.2 or python2.3.
>> Each run was repeated four times.
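The measurement described above used the shell's time command. As a rough alternative, CPU time can also be sampled inside the interpreter with time.process_time(). The sketch below makes that assumption; cpu_seconds and workload are hypothetical names, and workload is only a placeholder for "parse the blast file and keep the records in memory".

```python
import time

def cpu_seconds(func, repeats=4):
    """Run func() `repeats` times and return the CPU seconds used by each run."""
    timings = []
    for _ in range(repeats):
        start = time.process_time()
        func()
        timings.append(time.process_time() - start)
    return timings

def workload():
    # Placeholder for the real parsing step; replace with the actual parser call.
    return [line.split() for line in ["a b c"] * 10000]

print(["%.2f" % t for t in cpu_seconds(workload)])
```

Repeating the run several times, as in the original benchmark, helps to separate a real slowdown from run-to-run noise.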
>>
>> --------------------------------------------------------------
>> command                                 CPU time in sec
>> --------------------------------------------------------------
>> time python2.2 parser.py blastout.txt   5.11, 3.58, 3.98, 4.15
>> time python2.3 parser.py blastout.txt   8.85, 7.97, 7.30, 7.12
>> --------------------------------------------------------------
>> (with biopython 1.21)
>>
>> I stumbled on this while running the python profiler
>> on the blast parser. It turned out that more
>> than half of the CPU time was spent in the warnings module,
>> which is part of the python standard installation
>> (/usr/local/lib/python2.3/warnings.py).
>>
>> Further digging revealed that the function warn() is called
>> each time the readline() method from class UndoHandle is
>> executed (file site-packages/Bio/File.py).
>>
>> Within the readline() method the python built-in function
>> apply() is heavily used. But as of python2.3 the use of
>> apply() is deprecated, and therefore the warn() function is called
>> by the interpreter each time the apply() function is used.
>>
>>
>> According to the python2.3 manual, the apply() function should be
>> substituted by the "extended call syntax" (which was introduced
>> in python2.0).
>>
>> To test my hypothesis that the performance loss is caused by
>> the apply() function, I took the standard genetic approach
>> of knock-out and complementation: I created a modified version
>> of Bio/File.py where all occurrences of apply() were replaced
>> by "extended call syntax". After that, I ran the benchmark again:
>>
>> --------------------------------------------------------------
>> command                                 CPU time in sec
>> --------------------------------------------------------------
>> time python2.2 parser.py blastout.txt   4.11, 3.53, 4.07, 4.03
>> time python2.3 parser.py blastout.txt   4.94, 4.96, 4.54, 5.24
>> --------------------------------------------------------------
>> (with modified Bio/File.py)
>>
>>
>> The numbers show that my patch largely restores
>> the speed of the blast parser under python2.3.
>>
>>
>>
>> Conclusion: the "newer, better, faster" dogma is not true with python.
>>
>>
>> Here is an example of what the patch looks like:
>>
>> old: line = apply(self._handle.readline, args, keywds)
>> new: line = self._handle.readline(*args,**keywds)
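For context, a self-contained sketch of the same pattern in a wrapper class. ForwardingHandle is a hypothetical name used only for illustration and is not the actual UndoHandle code from Bio/File.py; it simply forwards whatever arguments it receives to the underlying handle using the extended call syntax.

```python
import io

class ForwardingHandle:
    """Hypothetical wrapper that forwards readline() to an underlying handle."""

    def __init__(self, handle):
        self._handle = handle

    def readline(self, *args, **keywds):
        # Extended call syntax: unpack positional and keyword arguments
        # directly instead of going through apply().
        return self._handle.readline(*args, **keywds)

handle = ForwardingHandle(io.StringIO("first line\nsecond line\n"))
print(repr(handle.readline()))    # 'first line\n'
print(repr(handle.readline(3)))   # 'sec'
```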
>>
>>
>> --
>>
>>
>> Peter
>> -------------------------------------------------------------------
>> Peter Slickers piet at clondiag.com
>> Clondiag Chip Technologies http://www.clondiag.com/
>> Löbstedter Str. 105
>> 07749 Jena
>> Germany
>>
>> Fon: 03641/5947-65 Fax: 03641/5947-20
>> -------------------------------------------------------------------
>>
>> _______________________________________________
>> BioPython mailing list - BioPython at biopython.org
>> http://biopython.org/mailman/listinfo/biopython
>