[Biopython-dev] Performance of Bio.File.UndoHandle
Michael Hoffman
hoffman at ebi.ac.uk
Thu Oct 16 05:45:07 EDT 2003
On Wed, 15 Oct 2003, Jeffrey Chang wrote:
> That is a nice implementation. However, Biopython already has at least
> 3 Fasta parsers!
> Bio/Fasta
> Bio/SeqIO/FASTA
> Bio/expressions/fasta
There sure are. We should probably be cutting them rather than adding
them, I suppose. :-) Have you thought of deprecating Bio.Fasta, since
it is the slowest?
I know that the official path is to get people towards FormatIO, but
Bio.expressions.fasta is more than 12x slower than my implementation,
which is comparable to Bio.SeqIO.FASTA, as you predicted! For one
test:
FormatIO: 3.085s/3.094s/3.154s
LightIterator: 0.246s/0.243s/0.245s
Unless, of course, I am using Bio.expressions.fasta incorrectly. It is
a bit hard to figure out what to do, as there are no docstrings, unit
tests, or other documentation that I can see. Here is the code,
anyway. Please let me know if I did this in an inefficient way (this
is a slight speedup over using SeqRecord.io).
=====
from Bio import FormatIO
iterator = FormatIO.FormatIO("SeqRecord", default_input_format = "fasta").readFile(file("/scratch/test.fna"))
for record in iterator:
    pass
=====
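As a rough illustration of the kind of comparison above, here is a
minimal sketch of a lightweight FASTA iterator together with a timing
loop that makes several full passes over the input. It is not the
LightIterator implementation discussed in this thread: iter_fasta and
time_runs are hypothetical names, the sample data is made up, and the
code assumes a current Python 3 rather than the Python of this post.

```python
import io
import time


def iter_fasta(handle):
    """Yield (title, sequence) tuples from a FASTA-formatted handle.

    Hypothetical stand-in for a lightweight parser; lines that appear
    before the first '>' header are ignored.
    """
    title = None
    chunks = []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if title is not None:
                yield title, "".join(chunks)
            title = line[1:]
            chunks = []
        elif title is not None:
            chunks.append(line.strip())
    # Emit the final record once the input is exhausted.
    if title is not None:
        yield title, "".join(chunks)


def time_runs(make_iterator, repeats=3):
    """Return the wall-clock seconds taken by each of `repeats` passes."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        for _record in make_iterator():
            pass
        timings.append(time.perf_counter() - start)
    return timings


# Made-up sample input; a real test would read a large file instead.
sample = ">seq1 first\nACGT\nACGT\n>seq2\nTTGA\n"
records = list(iter_fasta(io.StringIO(sample)))
timings = time_runs(lambda: iter_fasta(io.StringIO(sample)))
print("/".join("%.3fs" % t for t in timings))
```

Passing a factory (here a lambda) rather than an iterator lets each
run start from a fresh handle, so the same harness could wrap any
parser that returns an iterable of records.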
--
Michael Hoffman <hoffman at ebi.ac.uk>
European Bioinformatics Institute