[BioPython] blast parser slows down under python2.3

Andrew Nunberg anunberg at oriongenomics.com
Tue Sep 2 10:30:11 EDT 2003


I take it that you are applying these patches in CVS?
I have only downloaded the tarball for BioPython. Would you suggest I 
check it out from CVS, and if so, which tag should I use?

Andy

On Sunday, August 31, 2003, at 05:44 PM, Jeffrey Chang wrote:

> I have applied the patch.  Thanks very much!
>
> The regression tests now work again.  For the tests that print out 
> booleans, I am now explicitly printing out 0 or 1, for backwards 
> compatibility.
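The backwards-compatible printing described above might look like the following minimal sketch. The helper name `format_flag` is hypothetical and not taken from the Biopython test suite; it relies only on the fact that `int(True)` is 1 and `int(False)` is 0.

```python
def format_flag(value):
    """Render a truth value as '1' or '0', regardless of Python version."""
    # bool() normalises any truthy/falsy object; int() maps True/False to 1/0.
    return str(int(bool(value)))


if __name__ == "__main__":
    print(format_flag(3 > 2))   # a comparison result
    print(format_flag(""))      # an empty string is falsy
```

Printing through a helper like this keeps the expected test output identical whether or not the interpreter has a separate boolean type.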
>
> I have also gone through and changed some more instances of apply to 
> the new call syntax.  Please let me know if there appear to be any 
> problems.
>
> Jeff
>
>
>
> On Friday, August 29, 2003, at 12:07  PM, Jeffrey Chang wrote:
>
>> Hey, thanks very much for the note, and the patch (mailed separately).
>>
>> Python 2.3 also seems to have broken some of the regression tests.  
>> The boolean type gets printed out as "True" and "False" rather than 1 
>> or 0 as before.
>>
>> I'll take a look at these over the weekend.
>>
>> Jeff
>>
>>
>>
>>
>> On Friday, August 29, 2003, at 10:24  AM, Peter Slickers wrote:
>>
>>> The biopython blast parser runs at only half of the speed
>>> seen with python2.2 when executed with python2.3.
>>>
>>>
>>> This effect is best observed with a huge blast output file.
>>> My setup for measuring the performance is quite simple.
>>> I used a small python script which just parses a blast
>>> file and stores the content in memory. I started this
>>> script with the time command, and the python interpreter
>>> was explicitly specified as either python2.2 or python2.3.
>>> Each run was repeated four times.
>>>
>>> --------------------------------------------------------------
>>> command                                    CPU time in sec
>>> --------------------------------------------------------------
>>> time python2.2 parser.py blastout.txt      5.11,3.58,3.98,4.15
>>> time python2.3 parser.py blastout.txt      8.85,7.97,7.30,7.12
>>> --------------------------------------------------------------
>>> (with biopython 1.21)
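A measurement like this can also be taken from inside Python. The sketch below rests on several assumptions: `cpu_seconds` and `dummy_parse` are hypothetical names, `time.process_time()` needs Python 3.3 or later (it did not exist in python2.2 or python2.3), and the dummy workload only stands in for the real parser script.

```python
import time


def cpu_seconds(func, repeats=4):
    """Return the CPU time in seconds used by each of `repeats` calls to func."""
    timings = []
    for _ in range(repeats):
        start = time.process_time()   # user + system CPU time of this process
        func()
        timings.append(time.process_time() - start)
    return timings


def dummy_parse():
    """Placeholder workload standing in for parsing a large BLAST report."""
    return sum(len(str(i)) for i in range(100000))


if __name__ == "__main__":
    print(", ".join("%.2f" % t for t in cpu_seconds(dummy_parse)))
```

Reporting every run rather than a single figure, as in the table above, makes the run-to-run spread visible.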
>>>
>>> I stumbled on this while running the python profiler
>>> on the blast parser. It turns out that more
>>> than half of the CPU time was spent in the warnings module,
>>> which is part of the python standard installation
>>> (/usr/local/lib/python2.3/warnings.py).
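A profiling run of this kind could be sketched as follows in current Python. This is an assumption-based illustration: `cProfile` was added after python2.3 (which shipped the older `profile` module), and `workload` and `profile_top` are hypothetical names standing in for the real parser script.

```python
import cProfile
import io
import pstats


def workload():
    """Placeholder for the real parsing work; any CPU-bound function will do."""
    return sorted(str(i) for i in range(20000))


def profile_top(func, limit=5):
    """Run func under cProfile and return a report of the top entries."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time so that expensive call chains rise to the top.
    stats.sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_top(workload))
```

In a report sorted this way, a standard-library module that unexpectedly dominates the listing is the kind of signal described above.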
>>>
>>> Further digging revealed that the function warn() is called
>>> each time the readline() method from class UndoHandle is
>>> executed (file site-packages/Bio/File.py).
>>>
>>> Within the readline() method the python built-in function
>>> apply() is heavily used. But as of python2.3 the use of
>>> apply() is deprecated, and therefore the warn() function is called
>>> by the interpreter each time the apply() function is used.
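The mechanism can be imitated in current Python, where apply() no longer exists. In this sketch `legacy_call` is a hypothetical stand-in that issues a DeprecationWarning through `warnings.warn()` on every call, so each call passes through the warnings machinery. It shows the pattern only, not the interpreter's actual implementation.

```python
import warnings


def legacy_call(func, args=(), kwargs=None):
    """Hypothetical stand-in for a deprecated built-in such as apply()."""
    # Every call passes through warnings.warn(), which is extra work
    # on each invocation even when the warning is never displayed.
    warnings.warn("legacy_call() is deprecated", DeprecationWarning, stacklevel=2)
    return func(*args, **(kwargs or {}))


def count_warnings(n_calls):
    """Call legacy_call n_calls times and count the warnings that were issued."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")   # record every warning, not just the first
        for _ in range(n_calls):
            legacy_call(len, ("abc",))
    return len(caught)


if __name__ == "__main__":
    print(count_warnings(1000))
```

A method such as readline() that is called once per input line pays this overhead once per line, which is why a large file makes the effect easy to see.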
>>>
>>>
>>> According to the python2.3 manual, the apply() function should be
>>> substituted by the "extended call syntax" (which was introduced
>>> in python2.0).
>>>
>>> To test my hypothesis that the performance loss is caused by
>>> the apply() function, I took the standard genetic approach
>>> of knock-out and complementation: I created a modified version
>>> of Bio/File.py where all occurrences of apply() were replaced
>>> by "extended call syntax". After that, I ran the benchmark again:
>>>
>>> --------------------------------------------------------------
>>> command                                    CPU time in sec
>>> --------------------------------------------------------------
>>> time python2.2 parser.py blastout.txt      4.11,3.53,4.07,4.03
>>> time python2.3 parser.py blastout.txt      4.94,4.96,4.54,5.24
>>> --------------------------------------------------------------
>>> (with modified Bio/File.py)
>>>
>>>
>>> The numbers show that my patch largely restores
>>> the speed of the blast parser under python2.3.
>>>
>>>
>>>
>>> Conclusion:  the "newer, better, faster" dogma does not always hold for python.
>>>
>>>
>>> Here is an example of what the patch looks like:
>>>
>>>   old:     line = apply(self._handle.readline, args, keywds)
>>>   new:     line = self._handle.readline(*args,**keywds)
>>>
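The equivalence behind the patch can be checked with a small sketch. Because apply() was removed in Python 3, `apply_compat` below is a hypothetical re-implementation used only for comparison, and `MiniHandle` is a simplified illustration rather than the actual UndoHandle class from Bio/File.py.

```python
import io


def apply_compat(func, args=(), keywds=None):
    """Hypothetical stand-in for the old built-in apply(func, args, keywds)."""
    return func(*args, **(keywds or {}))


class MiniHandle:
    """Simplified wrapper that forwards readline() to an underlying handle."""

    def __init__(self, handle):
        self._handle = handle

    def readline_old(self, *args, **keywds):
        # Old style: indirect call through apply().
        return apply_compat(self._handle.readline, args, keywds)

    def readline_new(self, *args, **keywds):
        # New style: extended call syntax unpacks args and keywds directly.
        return self._handle.readline(*args, **keywds)


if __name__ == "__main__":
    text = "line one\nline two\n"
    old = MiniHandle(io.StringIO(text)).readline_old()
    new = MiniHandle(io.StringIO(text)).readline_new()
    print(old == new)
```

Both methods forward the same positional and keyword arguments, so the change affects only how the call is made, not what it returns.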
>>>
>>> -- 
>>>
>>>
>>> Peter
>>> -------------------------------------------------------------------
>>> Peter Slickers                             piet at clondiag.com
>>> Clondiag Chip Technologies                 http://www.clondiag.com/
>>> Löbstedter Str. 105
>>> 07749 Jena
>>> Germany
>>>
>>> Fon:  03641/5947-65                        Fax:  03641/5947-20
>>> -------------------------------------------------------------------
>>>
>>> _______________________________________________
>>> BioPython mailing list  -  BioPython at biopython.org
>>> http://biopython.org/mailman/listinfo/biopython
>>
>
>
---------------------------------------------------
Andrew Nunberg Ph.D
Bioinfomagician
Orion Genomics
4041 Forest Park
St Louis, MO
314-615-6989
anunberg at oriongenomics.com
www.oriongenomics.com



