[BioPython] QIO

Tim Peters tim_one@email.msn.com
Mon, 4 Oct 1999 21:43:32 -0400


[Andrew Dalke, about the QIO extension at
 http://members.xoom.com/_XOOM/meowing/python/index.html
]

[quotes Tim from one msg]
> Under Win95 it was between 2-3 times as fast as 1.5.2 Python doing
> a native readline() loop over an 11Mb text file.

[and then from an earlier one]
> Line-at-a-time input on my platform is about 3 times faster in
> Perl than Python;

[then reaches a surprising conclusion]
> then Perl and Python should be neck and neck for reading large
> data files when the run-time is read bound.

3 > 2-3, even if you don't read the latter as subtraction <wink>.  That is,
Perl remains quicker.

None of this is a mystery, BTW:  Perl cheats.  That is, it uses
platform-specific code to subvert C's stdio, and that's the entire source of
its speed advantage (Perl is faster than C!) here.  The QIO extension is on
the right track, but its inner reading loop isn't as efficient as Perl's
inner reading loop, and its author indulged in feature bloat.

Without changing anything in Python, you can get a major speed boost by
"chunking" the input, as in:

BUFSIZE = 100000
while 1:
    lines = f.readlines(BUFSIZE)
    if not lines:
        break
    for line in lines:
        xxx

instead of:

while 1:
    line = f.readline()
    if not line:
        break
    xxx

I'm surprised people don't do this more often -- I suppose because it's 5
lines of boilerplate instead of 4 <wink>.
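
For anyone who wants to hide that boilerplate, here is a minimal sketch of the chunked loop wrapped in a helper. It assumes a current Python 3 interpreter (generators did not exist in 1.5.2), and the name chunked_lines is hypothetical, not part of Python or of QIO. It relies only on the standard readlines(sizehint) call used above.

```python
import io

BUFSIZE = 100000  # size hint passed to readlines(), same value as in the loop above


def chunked_lines(f, bufsize=BUFSIZE):
    """Yield lines from file object f, fetching them in batches.

    Hypothetical helper: each readlines(bufsize) call returns a list of
    complete lines whose total size is roughly bufsize, so the per-line
    call overhead of readline() is paid once per batch instead.
    """
    while True:
        lines = f.readlines(bufsize)
        if not lines:  # empty list means end of file
            break
        for line in lines:
            yield line


# Usage: an in-memory file stands in for a large data file.
sample = io.StringIO("first\nsecond\nthird\n")
count = 0
for line in chunked_lines(sample, bufsize=8):
    count += 1  # replace with the real per-line work
```

The caller's loop body stays the same as in the readline() version; only the source of the lines changes. Whether this helps on a given platform is something to measure, not assume.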

not-opposed-to-fast-input-ly y'rs  - tim