[Biopython-dev] Performance of Bio.File.UndoHandle

Jeffrey Chang jchang at jeffchang.com
Wed Oct 15 23:10:39 EDT 2003


On Wednesday, October 15, 2003, at 10:51 AM, Michael Hoffman wrote:

> On Mon, 13 Oct 2003, Jeffrey Chang wrote:
>
>> The UndoHandle creates overhead on readline due to its extra if checks
>> and function calls.
>>
>> [...]
>>
>> The best way to speed this up might be to recode the class in C as a
>> type.  This would help because the if statement would be evaluated in
>> C, and also you can cache the self._handle.readline for a faster
>> function lookup.
>
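
To make the caching idea quoted above concrete, here is a rough pure-Python sketch. It is not the Bio.File.UndoHandle source and not the C type suggested above; the class name CachedUndoHandle and its internals are made up for illustration. It only shows the bound readline being looked up once instead of on every call.

```python
import io


class CachedUndoHandle:
    """Hypothetical stand-in for an undo-capable handle (illustration only)."""

    def __init__(self, handle):
        self._handle = handle
        self._saved = []  # lines pushed back by saveline()
        # Look up the bound method once, so readline() does not repeat
        # the attribute lookup on self._handle for every line.
        self._readline = handle.readline

    def readline(self):
        # The "extra if check": pushed-back lines are returned first.
        if self._saved:
            return self._saved.pop(0)
        return self._readline()

    def saveline(self, line):
        # Push a line back so that the next readline() returns it again.
        if line:
            self._saved.insert(0, line)

    def peekline(self):
        # Return the next line without consuming it.
        if self._saved:
            return self._saved[0]
        line = self._readline()
        self.saveline(line)
        return line


handle = CachedUndoHandle(io.StringIO(">seq1\nACGT\n"))
first = handle.peekline()   # looks ahead without consuming
again = handle.readline()   # returns the same line
```

The if check and the extra Python-level call remain, which is why moving the class to C is the larger win; the cached lookup only trims part of the per-line cost.
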
> Actually, I was thinking along the lines of recoding the class that
> calls UndoHandle instead (see below). This new implementation does not
> seem to be significantly faster than Bio.Fasta.Iterator when the
> latter is used without a parser. However you get the parsing done for
> free with this implementation! It seems to be about twice as fast as
> using Bio.Fasta.Iterator with Bio.Fasta.RecordParser, and provides the
> same functionality in a more lightweight package--a tuple of
> (defline, data) instead of a Bio.Record object. What do you think?

[cut code]

That is a nice implementation.  However, Biopython already has at least 
3 Fasta parsers!
   Bio/Fasta
   Bio/SeqIO/FASTA
   Bio/expressions/fasta

Bio/Fasta, the one you compared against, is easily the slowest one.
Bio/SeqIO/FASTA is very similar to your implementation and not likely
to be significantly faster or slower.  Bio/expressions/fasta uses
Martel, and I don't know how well it will perform.  The parsing part
should be blazingly fast (since it is mostly in C), but building the
object will be slow, so overall it might be a wash.
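
For readers without the cut code, the general shape of a lightweight parser that yields (defline, data) tuples might look like the sketch below. This is an illustration only, not Michael's implementation and not the Bio/SeqIO/FASTA code; the function name iter_fasta is made up. It keeps the pending defline in a local variable, so no UndoHandle is needed to push lines back.

```python
import io


def iter_fasta(handle):
    """Yield (defline, data) tuples from a FASTA-formatted handle.

    Hypothetical helper for illustration; not part of Biopython.
    """
    defline = None
    chunks = []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if defline is not None:
                yield defline, "".join(chunks)
            defline = line[1:]
            chunks = []
        elif defline is not None:
            # Sequence line belonging to the current record.
            chunks.append(line.strip())
    # Emit the final record once the handle is exhausted.
    if defline is not None:
        yield defline, "".join(chunks)


example = io.StringIO(">seq1 first\nACGT\nTTGA\n>seq2\nGG\n")
records = list(iter_fasta(example))
```

Lines before the first ">" are skipped in this sketch; a real parser would have to decide whether to treat them as an error.
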

Jeff



