[Biopython-dev] Performance of Bio.File.UndoHandle
Michael Hoffman
hoffman at ebi.ac.uk
Wed Oct 15 10:51:21 EDT 2003
On Mon, 13 Oct 2003, Jeffrey Chang wrote:
> The UndoHandle creates overhead on readline due to its extra if checks
> and function calls.
>
> [...]
>
> The best way to speed this up might be to recode the class in C as a
> type. This would help because the if statement would be evaluated in
> C, and also you can cache the self._handle.readline for a faster
> function lookup.
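For readers unfamiliar with the pattern under discussion, here is a minimal pure-Python sketch of a pushback ("undo") handle. The class name PushbackHandle is hypothetical, and this is not the actual code of Bio.File.UndoHandle. It only shows where the extra if check falls on every readline() call, and what caching the wrapped handle's bound readline method looks like.

```python
import io


class PushbackHandle:
    """Hypothetical minimal undo handle (not Bio.File.UndoHandle itself)."""

    def __init__(self, handle):
        self._saved = []  # lines pushed back by saveline(), most recent last
        # Cache the bound method once, so each readline() call skips the
        # attribute lookup on the wrapped handle.
        self._readline = handle.readline

    def readline(self):
        # The per-call overhead under discussion: check the pushback
        # stack before delegating to the wrapped handle.
        if self._saved:
            return self._saved.pop()
        return self._readline()

    def saveline(self, line):
        # Push a line back so the next readline() returns it again.
        # An empty string (EOF) is ignored.
        if line:
            self._saved.append(line)


h = PushbackHandle(io.StringIO(">a\nACGT\n"))
first = h.readline()
h.saveline(first)
```

After the saveline() call, the next readline() returns ">a\n" again, and subsequent calls continue with the wrapped handle.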
Actually, I was thinking along the lines of recoding the class that
calls UndoHandle instead (see below). This new implementation does not
seem to be significantly faster than Bio.Fasta.Iterator when the
latter is used without a parser. However, you get the parsing done for
free with this implementation! It seems to be about twice as fast as
using Bio.Fasta.Iterator with Bio.Fasta.RecordParser, and it provides
the same functionality in a more lightweight package: a tuple of
(defline, data) instead of a Bio.Record object. What do you think?
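class LightIterator(object):
    """Iterate over a FASTA handle, yielding (defline, data) tuples."""

    def __init__(self, handle):
        self._handle = handle
        # Lookahead: the defline of the next record, read while
        # finishing the previous one.
        self._defline = None

    def __iter__(self):
        return self

    def next(self):
        lines = []
        defline_old = self._defline
        while True:
            line = self._handle.readline()
            if not line:
                # End of input: stop if nothing is pending, otherwise
                # return the last record.
                if defline_old is None and not lines:
                    raise StopIteration
                self._defline = None
                break
            elif line[0] == '>':
                self._defline = line[1:].rstrip()
                if defline_old is not None or lines:
                    # This header starts the next record.
                    break
                else:
                    # First header in the file.
                    defline_old = self._defline
            else:
                lines.append(line.rstrip())
        return defline_old, ''.join(lines)

    __next__ = next  # Python 3 iterator protocol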
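The same lookahead idea can also be written as a generator function, which keeps the pending defline in a local variable instead of on self. This is a hypothetical sketch (light_records is not part of Biopython), assuming the handle is a text-mode file-like object that supports line iteration.

```python
import io


def light_records(handle):
    """Yield (defline, data) tuples from a FASTA handle (hypothetical sketch)."""
    defline = None  # defline of the record currently being collected
    lines = []      # sequence lines seen since that defline
    for line in handle:
        if line.startswith('>'):
            # A new header ends the previous record, if there was one.
            if defline is not None or lines:
                yield defline, ''.join(lines)
            defline = line[1:].rstrip()
            lines = []
        else:
            lines.append(line.rstrip())
    # Flush the final record at end of input.
    if defline is not None or lines:
        yield defline, ''.join(lines)


records = list(light_records(io.StringIO(">seq1 first\nACGT\nTTGA\n>seq2\nGG\n")))
```

Here records is [("seq1 first", "ACGTTTGA"), ("seq2", "GG")]. Iterating with `for line in handle` uses the file object's own line iteration rather than repeated readline() calls. Whether that is measurably faster on a given Python version would need to be timed.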
--
Michael Hoffman <hoffman at ebi.ac.uk>
European Bioinformatics Institute