[Biopython-dev] Performance of Bio.File.UndoHandle
Michael Hoffman
hoffman at ebi.ac.uk
Fri Oct 10 10:11:19 EDT 2003
I have long wondered about how much the use of Bio.File.UndoHandle
slows things down (it has additional checks for every read
operation). Here are some results:
I wrote two scripts, filetest.py and filetest-undo.py. They both read
in every line of Homo sapiens chromosome 1 and do nothing with
it. This file is 4086733 lines and 249290633 bytes.
**** filetest.py
# Read every line of the file and discard it.
input = open("/scratch/hoffman/1.fa")
line = 1
while line:
    line = input.readline()
**** filetest-undo.py
import Bio.File

# Same loop, but every read goes through UndoHandle.
input = Bio.File.UndoHandle(open("/scratch/hoffman/1.fa"))
line = 1
while line:
    line = input.readline()
****
Timing the run of these files gives the following results (real):
filetest.py: 0m12.703s 0m12.215s 0m12.331s
filetest-undo.py: 0m30.135s 0m29.676s 0m30.165s
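For reference, here is a small sketch of the arithmetic behind comparing the two rows. The numbers are the "real" timings listed above; the variable and function names are just mine for illustration.

```python
# Wall-clock ("real") timings in seconds, copied from the results above.
plain = [12.703, 12.215, 12.331]   # filetest.py
undo = [30.135, 29.676, 30.165]    # filetest-undo.py


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / float(len(values))


plain_mean = mean(plain)
undo_mean = mean(undo)

# How many times longer the UndoHandle run takes, and the relative increase.
ratio = undo_mean / plain_mean
increase_pct = (ratio - 1.0) * 100.0

print("plain: %.3f s  undo: %.3f s" % (plain_mean, undo_mean))
print("ratio: %.2fx  increase: %.0f%%" % (ratio, increase_pct))
```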
Averaged over the three runs, input using readline() takes about 2.4
times as long with UndoHandle, an increase of roughly 140%. The
overhead of loading Bio.File is minimal:
$ time python -c "import Bio.File"
real 0m0.418s
user 0m0.090s
sys 0m0.080s
$ time python -c "None"
real 0m0.070s
user 0m0.010s
sys 0m0.030s
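The readline() comparison can also be tried without the chromosome file. Below is a minimal sketch, assuming an in-memory io.StringIO in place of 1.fa and a hypothetical MinimalUndoHandle class standing in for Bio.File.UndoHandle. It only imitates the idea of checking a saved-line buffer on every read and is not the Biopython code, so its numbers will not match the timings above.

```python
import io
import timeit


class MinimalUndoHandle(object):
    """Hypothetical stand-in for an undo-capable handle (not Bio.File's code)."""

    def __init__(self, handle):
        self._handle = handle
        self._saved = []

    def saveline(self, line):
        # Push a line back so the next readline() returns it again.
        if line:
            self._saved.insert(0, line)

    def readline(self):
        # This check is the extra cost paid on every single read.
        if self._saved:
            return self._saved.pop(0)
        return self._handle.readline()


def read_all(handle):
    """Read and discard every line; return how many lines were read."""
    count = 0
    line = handle.readline()
    while line:
        count += 1
        line = handle.readline()
    return count


# Synthetic input: 20000 lines of 60 bases each (an assumption, not real data).
N_LINES = 20000
TEXT = (u"ACGT" * 15 + u"\n") * N_LINES


def time_plain():
    return read_all(io.StringIO(TEXT))


def time_wrapped():
    return read_all(MinimalUndoHandle(io.StringIO(TEXT)))


if __name__ == "__main__":
    for name, func in [("plain", time_plain), ("wrapped", time_wrapped)]:
        # Best of three runs, one full pass over the data per run.
        best = min(timeit.repeat(func, number=1, repeat=3))
        print("%-8s %.3f s" % (name, best))
```

Swapping the real Bio.File.UndoHandle in for the stand-in should give a fairer picture of the per-line cost than whole-script timing, since the import overhead is excluded.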
This kind of increase on basic I/O means much when one is doing big
jobs, in my opinion. I'm not volunteering to rewrite anything to not
use UndoHandle, but people might consider it when writing future
stuff. And I might try rewriting some stuff anyway. Any thoughts?
--
Michael Hoffman <hoffman at ebi.ac.uk>
European Bioinformatics Institute