[Biopython-dev] Performance of Bio.File.UndoHandle

Fri Oct 10 10:11:19 EDT 2003

I have long wondered how much using Bio.File.UndoHandle slows things
down, since it performs additional checks on every read operation.
Here are some results:

I wrote two scripts, filetest.py and filetest-undo.py. They both read
in every line of Homo sapiens chromosome 1 and do nothing with
them. This file is 4,086,733 lines and 249,290,633 bytes.

**** filetest.py

input = open("/scratch/hoffman/1.fa")

line = 1
while line:
    line = input.readline()

**** filetest-undo.py

import Bio.File

input = Bio.File.UndoHandle(open("/scratch/hoffman/1.fa"))

line = 1
while line:
    line = input.readline()

****
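For anyone who has not looked inside UndoHandle, here is a simplified, hypothetical stand-in showing where the per-read cost in filetest-undo.py comes from. The class name `SketchUndoHandle` is invented for illustration. The `saveline()`/`peekline()` names follow UndoHandle's, but the bodies are a sketch, not Biopython's code. Every `readline()` becomes an extra Python-level method call that checks a pushback buffer before delegating to the real file.

```python
import io


class SketchUndoHandle:
    """Hypothetical, simplified undo-capable handle (not Biopython's code).

    Lines pushed back with saveline() are returned before any new data is
    read from the wrapped handle.
    """

    def __init__(self, handle):
        self._handle = handle
        self._saved = []  # pushed-back lines; the most recent is returned first

    def readline(self):
        # The extra work on every read: consult the pushback buffer first,
        # and only then fall through to the underlying file.
        if self._saved:
            return self._saved.pop()
        return self._handle.readline()

    def saveline(self, line):
        # Push a line back so the next readline() returns it again.
        if line:
            self._saved.append(line)

    def peekline(self):
        # Look at the next line without consuming it.
        line = self.readline()
        self.saveline(line)
        return line


if __name__ == "__main__":
    h = SketchUndoHandle(io.StringIO(">chr1\nACGT\n"))
    print(repr(h.peekline()))  # '>chr1\n'
    print(repr(h.readline()))  # '>chr1\n'
    print(repr(h.readline()))  # 'ACGT\n'
    print(repr(h.readline()))  # ''
```

The buffer makes lookahead cheap to express in a parser, but the check and the extra call are paid on every line, including the vast majority of lines where nothing was ever pushed back.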

Timing three runs of each script gives the following wall-clock ("real") times:

     filetest.py: 0m12.703s 0m12.215s 0m12.331s
filetest-undo.py: 0m30.135s 0m29.676s 0m30.165s
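To repeat the comparison without the chromosome file, something like the following sketch times the filetest.py loop on a generated file. It reads the file once directly and once through a hypothetical `PassThroughHandle` that adds nothing but one extra method call per line. It is an assumption that this captures most of the wrapper cost, since UndoHandle's buffer check adds a little more on top. The file size is arbitrary, and absolute numbers will depend on the machine and Python version.

```python
import os
import tempfile
import time


class PassThroughHandle:
    """Hypothetical wrapper: one extra Python-level call per readline()."""

    def __init__(self, handle):
        self._handle = handle

    def readline(self):
        return self._handle.readline()


def count_lines(handle):
    # Essentially the loop from filetest.py, plus a line counter.
    n = 0
    line = handle.readline()
    while line:
        n += 1
        line = handle.readline()
    return n


def time_reads(path, wrap=None):
    """Return (line_count, seconds) for reading path, optionally wrapped."""
    with open(path) as raw:
        handle = wrap(raw) if wrap else raw
        start = time.perf_counter()
        n = count_lines(handle)
        return n, time.perf_counter() - start


if __name__ == "__main__":
    # Synthetic FASTA-like file: a header plus 200000 lines of 60 bases.
    fd, path = tempfile.mkstemp(suffix=".fa")
    with os.fdopen(fd, "w") as out:
        out.write(">synthetic\n")
        out.write(("ACGT" * 15 + "\n") * 200000)
    try:
        print("plain:   %d lines in %.3f s" % time_reads(path))
        print("wrapped: %d lines in %.3f s" % time_reads(path, PassThroughHandle))
    finally:
        os.remove(path)
```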

Reading with readline() through UndoHandle takes roughly 140% longer,
about 2.4 times as long as reading the file directly. The one-time
cost of importing Bio.File is negligible by comparison:

$ time python -c "import Bio.File"

real    0m0.418s
user    0m0.090s
sys     0m0.080s

$ time python -c "None"

real    0m0.070s
user    0m0.010s
sys     0m0.030s
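The figures above reduce to a little arithmetic. This uses the means of the three runs and the line count given for chromosome 1:

```python
from statistics import mean

# Wall-clock ("real") seconds from the three runs of each script.
plain = [12.703, 12.215, 12.331]
undo = [30.135, 29.676, 30.165]
lines = 4086733  # lines in chromosome 1

plain_avg = mean(plain)  # about 12.42 s
undo_avg = mean(undo)    # about 29.99 s
ratio = undo_avg / plain_avg
increase_pct = (ratio - 1) * 100
# Extra time per line read that is attributable to the wrapper.
per_line_us = (undo_avg - plain_avg) / lines * 1e6
# Interpreter startup with vs. without importing Bio.File.
import_overhead = 0.418 - 0.070

print("slowdown: %.2fx (+%.0f%%)" % (ratio, increase_pct))  # 2.42x (+142%)
print("extra per line: %.1f us" % per_line_us)               # 4.3 us
print("import overhead: %.2f s" % import_overhead)           # 0.35 s
```

So the wrapper adds roughly 4 microseconds per line. That is trivial for a single line, but it amounts to about 17.6 s over the whole chromosome, which dwarfs the 0.35 s import cost.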

In my opinion, this kind of slowdown in basic I/O matters a lot when
one is doing big jobs. I'm not volunteering to rewrite anything to
avoid UndoHandle, but people might consider this when writing new
code. And I might try rewriting some things anyway. Any thoughts?
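One way to write future parsers without UndoHandle is to iterate over the file directly and keep the one line of lookahead in a local variable instead of pushing it back onto the handle. Below is a minimal sketch for FASTA-style input. `read_records` is a hypothetical name, not a Biopython function. Whether it actually beats UndoHandle would need the same kind of timing as above.

```python
import io


def read_records(handle):
    """Hypothetical parser: yield (header, sequence) pairs from FASTA text.

    The header line that ends one record is held in a local variable, so
    nothing ever needs to be pushed back onto the handle.
    """
    header, chunks = None, []
    for line in handle:  # plain file iteration, no wrapper method per line
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].rstrip(), []
        else:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    text = ">a desc\nAC\nGT\n>b\nTT\n"
    for name, seq in read_records(io.StringIO(text)):
        print(name, seq)
```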
-- 
Michael Hoffman <hoffman at ebi.ac.uk>
European Bioinformatics Institute