[Biopython-dev] Performance of Bio.File.UndoHandle
Jeffrey Chang
jchang at jeffchang.com
Wed Oct 15 23:10:39 EDT 2003
On Wednesday, October 15, 2003, at 10:51 AM, Michael Hoffman wrote:
> On Mon, 13 Oct 2003, Jeffrey Chang wrote:
>
>> The UndoHandle creates overhead on readline due to its extra if checks
>> and function calls.
>>
>> [...]
>>
>> The best way to speed this up might be to recode the class in C as a
>> type. This would help because the if statement would be evaluated in
>> C, and also you can cache the self._handle.readline for a faster
>> function lookup.
>
> Actually, I was thinking along the lines of recoding the class that
> calls UndoHandle instead (see below). This new implementation does not
> seem to be significantly faster than Bio.Fasta.Iterator when the
> latter is used without a parser. However you get the parsing done for
> free with this implementation! It seems to be about twice as fast as
> using Bio.Fasta.Iterator with Bio.Fasta.RecordParser, and provides the
> same functionality in a more lightweight package--a tuple of
> (defline, data) instead of a Bio.Record object. What do you think?
[cut code]
That is a nice implementation. However, Biopython already has at least
3 Fasta parsers!
Bio/Fasta
Bio/SeqIO/FASTA
Bio/expressions/fasta
Bio/Fasta, the one you compared against, is easily the slowest one.
Bio/SeqIO/FASTA is very similar to your implementation and not likely
to be significantly faster or slower. Bio/expressions/fasta uses
Martel. I don't know how well that will perform. The parsing part
should be blazingly fast (since it is mostly in C), but building the
record objects will be slow, so overall it might be a wash.
Jeff
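Because the code was cut from this archive, here is a minimal sketch of the kind of lightweight parser the thread describes. It yields plain (defline, data) tuples and avoids the per-line UndoHandle overhead mentioned above. It keeps the one line of lookahead in a local variable and looks up handle.readline only once, not on every call. This is not Michael's cut code and not the API of Bio/Fasta, Bio/SeqIO/FASTA or Bio/expressions/fasta. The function name `iter_fasta_tuples` is hypothetical, and the sketch assumes a text handle whose records start with '>'.

```python
import io


def iter_fasta_tuples(handle):
    """Yield (defline, data) tuples from a FASTA-formatted text handle.

    Hypothetical illustration only, not a Biopython API. The lookahead line
    lives in a local variable instead of an UndoHandle-style wrapper, and
    the bound method handle.readline is looked up once.
    """
    readline = handle.readline  # cache the method lookup outside the loop
    line = readline()
    # Skip any text that appears before the first '>' record header.
    while line and not line.startswith(">"):
        line = readline()
    while line:
        defline = line[1:].rstrip("\r\n")  # drop '>' and the line ending
        chunks = []
        line = readline()
        # Collect sequence lines until the next header or end of file.
        while line and not line.startswith(">"):
            chunks.append(line.strip())
            line = readline()
        yield defline, "".join(chunks)


example = io.StringIO(">seq1 first\nACGT\nGGCC\n>seq2\nTTTT\n")
print(list(iter_fasta_tuples(example)))
# [('seq1 first', 'ACGTGGCC'), ('seq2', 'TTTT')]
```

A plain tuple is cheaper to build than a full record object, which matches the trade-off discussed in the thread. The cost is that callers must interpret the defline themselves.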