[BioPython] QIO

Andrew Dalke dalke@bioreason.com
Tue, 05 Oct 1999 23:13:06 -0600


Tim Peters <tim_one@email.msn.com> said:
| 3 > 2-3, even if you don't read the latter as subtraction <wink>.  

Weeelllll, some necks are longer than others.  I do believe a
camel's neck is longer than that of a python (and just *where* is
the neck on a python?)

>Without changing anything in Python, you can get a major speed
> boost by "chunking" the input, as in:
>
>BUFSIZE = 100000
>while 1:
>     lines = f.readlines(BUFSIZE)
>     if not lines:
>         break
>     for line in lines:
>         xxx
>

Ewan Birney (of bioperl) wants to do something along the lines of
a cookbook for bioinformatics.  An equivalent for Python should
include this snippet.
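
For a cookbook entry, the quoted loop could be wrapped so that callers
still write an ordinary "for line in ..." loop. This is only a sketch:
the name iter_lines_chunked and the in-memory sample data are made up
for illustration, and it assumes a Python with generators.

```python
import io

def iter_lines_chunked(f, bufsize=100000):
    """Yield lines from f, fetching them in batches via readlines(bufsize)."""
    while True:
        # readlines() with a size hint returns whole lines totalling
        # roughly bufsize characters, or an empty list at end of file.
        lines = f.readlines(bufsize)
        if not lines:
            break
        for line in lines:
            yield line

# Hypothetical sample input standing in for a real data file.
sample = io.StringIO("ID   spam\nSQ   eggs\n//\n")
collected = list(iter_lines_chunked(sample, bufsize=8))
```

The small bufsize in the example is only there to force several
batches; a real file would use the large default.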

>I'm surprised people don't do this more often -- I suppose because it's 5
>lines of boilerplate instead of 4 <wink>.

First off, I haven't gotten into the habit of using that idiom.
There are quite a few places where I could/should have done things
that way.

However, there are certain types of parsers I've written which
work like:

infile = open("spam.txt")
header = parse_header(infile)      # reads lines from infile itself
content = []
while 1:
    data = parse_content(infile)   # also reads directly from infile
    if not data:
        break
    content.append(data)

that is, the input file object is passed around to different
routines which each expect to consume a line at a time.
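
One way the two styles might be reconciled is a thin wrapper that
fills its buffer with readlines(BUFSIZE) but still hands out one line
per readline() call, so the parsers need not change. A rough sketch
follows; ChunkedLineReader is a hypothetical name, and the two parse_*
functions are trivial stand-ins, not real parsers.

```python
import io

class ChunkedLineReader:
    """Offer readline() on top of batched readlines(bufsize) calls."""

    def __init__(self, f, bufsize=100000):
        self._f = f
        self._bufsize = bufsize
        self._lines = []
        self._pos = 0

    def readline(self):
        if self._pos >= len(self._lines):
            # Buffer exhausted: fetch the next batch of whole lines.
            self._lines = self._f.readlines(self._bufsize)
            self._pos = 0
            if not self._lines:
                return ""          # same end-of-file signal as a file
        line = self._lines[self._pos]
        self._pos += 1
        return line

# Stand-in parsers (assumptions): each consumes exactly one line.
def parse_header(infile):
    return infile.readline().rstrip("\n")

def parse_content(infile):
    line = infile.readline()
    return line.rstrip("\n") if line else None

infile = ChunkedLineReader(io.StringIO("HEADER\nrec1\nrec2\n"), bufsize=4)
header = parse_header(infile)
content = []
while True:
    data = parse_content(infile)
    if data is None:
        break
    content.append(data)
```

Whether the extra method call per line eats up the gain from chunking
would have to be measured.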

Actually, when I did this sort of work in Perl I ended up passing
long @lists of lines around, and popping off the list to consume.
Made it easy when I needed to push back for look-aheads.
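
The same pop/push-back approach carries over to Python lists. A
minimal sketch, assuming FASTA-like records where a line starting with
">" begins a new record; LineStack and read_record are names invented
for this example.

```python
class LineStack:
    """Hold lines so they can be popped in order and pushed back."""

    def __init__(self, lines):
        # Store reversed so that pop() from the end of the list
        # returns the lines in their original order.
        self._lines = list(lines)
        self._lines.reverse()

    def pop(self):
        return self._lines.pop() if self._lines else ""

    def push_back(self, line):
        self._lines.append(line)

def read_record(lines):
    """Read one record; return (header, sequence) or None at the end."""
    header = lines.pop()
    if not header:
        return None
    body = []
    while True:
        line = lines.pop()
        if not line:
            break
        if line.startswith(">"):
            # Looked one line too far: hand it back for the next call.
            lines.push_back(line)
            break
        body.append(line.strip())
    return header.strip(), "".join(body)

stack = LineStack([">a\n", "AC\n", "GT\n", ">b\n", "TT\n"])
first = read_record(stack)
second = read_record(stack)
third = read_record(stack)
```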


>not-opposed-to-fast-input-ly y'rs  - tim

Of course, I also want to take a look at Andrew Kuchling's
memory-mapped IO to see how well that works.  And I'll have time to start
looking at this in only a few more weeks....
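
For reference, reading lines through a memory map might look like the
sketch below. It uses the mmap module from today's Python standard
library; whether its interface matches the module mentioned above is
an assumption, and count_lines_mmap and the temporary file are made up
for the example.

```python
import mmap
import os
import tempfile

def count_lines_mmap(path):
    """Count the lines in a file by reading it through a memory map."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file. Mapping an empty file raises
        # ValueError, so callers should handle that case separately.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            count = 0
            while True:
                line = mm.readline()   # returns bytes, b"" at the end
                if not line:
                    break
                count += 1
            return count

# Write a small throwaway file to map.
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(b"one\ntwo\nthree\n")
tmp.close()
n_lines = count_lines_mmap(tmp.name)
os.unlink(tmp.name)
```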

						Andrew
						dalke@acm.org