[Biopython-dev] Performance of Bio.File.UndoHandle

Michael Hoffman hoffman at ebi.ac.uk
Thu Oct 16 05:45:07 EDT 2003


On Wed, 15 Oct 2003, Jeffrey Chang wrote:

> That is a nice implementation.  However, Biopython already has at least 
> 3 Fasta parsers!
>    Bio/Fasta
>    Bio/SeqIO/FASTA
>    Bio/expressions/fasta

There sure are. We should probably be cutting them rather than adding
them, I suppose. :-) Have you thought of deprecating Bio.Fasta, since it
is the slowest?

I know that the official path is to move people towards FormatIO, but
Bio.expressions.fasta is more than 12x slower than either my
implementation or Bio.SeqIO.FASTA (those two are comparable, as you
predicted)! For one test:

FormatIO: 3.085s/3.094s/3.154s
LightIterator: 0.246s/0.243s/0.245s

Unless, of course, I am using Bio.expressions.fasta incorrectly. It is
a bit hard to figure out what to do, as there are no docstrings, unit
tests, or other documentation that I can see. Here is the code,
anyway. Please let me know if I did this in an inefficient way (this
version is a slight speedup over using SeqRecord.io).

=====
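from Bio import FormatIO

# Read FASTA records as SeqRecord objects through FormatIO.
reader = FormatIO.FormatIO("SeqRecord", default_input_format="fasta")
iterator = reader.readFile(open("/scratch/test.fna"))

# Iterate without doing any work, so only parsing time is measured.
for record in iterator:
    pass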
=====
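
For anyone who wants to reproduce this kind of comparison, here is a
minimal sketch of a timing harness, assuming the slash-separated
figures above are repeated wall-clock runs. The names iter_fasta and
time_runs are hypothetical, and iter_fasta is only a stand-in written
against the standard library: it is not the LightIterator measured
above, nor any Biopython class. It is written for a current Python and
uses a small in-memory sample rather than /scratch/test.fna.

```python
import io
import time


def iter_fasta(handle):
    """Yield (title, sequence) pairs from a FASTA-formatted text handle.

    Hypothetical stand-in for a lightweight parser; not the
    LightIterator or any Biopython class mentioned in this thread.
    """
    title = None
    chunks = []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if title is not None:
                yield title, "".join(chunks)
            title = line[1:]
            chunks = []
        elif title is not None:
            chunks.append(line.strip())
    # Emit the final record once the input is exhausted.
    if title is not None:
        yield title, "".join(chunks)


def time_runs(make_iterator, repeats=3):
    """Return wall-clock seconds for each of `repeats` full passes."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        for _record in make_iterator():
            pass  # do no work, so only parsing is measured
        timings.append(time.perf_counter() - start)
    return timings


# Small in-memory sample so the sketch runs without a file on disk.
SAMPLE = ">seq1 first\nACGT\nACGT\n>seq2 second\nTTGA\n"

records = list(iter_fasta(io.StringIO(SAMPLE)))
timings = time_runs(lambda: iter_fasta(io.StringIO(SAMPLE)))
print("/".join("%.3fs" % t for t in timings))
```

To time a different parser, pass time_runs a zero-argument callable
that opens the file and returns a fresh iterator, so each pass starts
from the beginning of the input.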
-- 
Michael Hoffman <hoffman at ebi.ac.uk>
European Bioinformatics Institute



