[Biopython-dev] Reading sequences: FormatIO, SeqIO, etc
Peter (BioPython Dev)
biopython-dev at maubp.freeserve.co.uk
Mon Jul 31 17:41:49 UTC 2006
Peter wrote:
>>In the short term maybe we should just replace the internals of the
>>current Bio.Fasta module with a pure python implementation like
>>that in Bio.SeqIO.FASTA - good idea? Bad idea?
Marc wrote:
> I would keep them separate but change the documentation on the how-to
> site to point to using the Bio.SeqIO.FASTA since that is where I
> think we want people to start going. The code change to Bio.Fasta
> should be to add a deprecation warning.
Certainly long term we could do that. There may be advantages to the
current very flexible Bio.Fasta code that the SeqIO replacement may not
offer (e.g. if we focus on just parsing into SeqRecords).
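For reference, the deprecation warning Marc suggests could be as small as the sketch below. This is only an illustration using the standard library warnings module, not code from Biopython; the helper name warn_deprecated is hypothetical, and the module names are passed in as plain strings.

```python
import warnings


def warn_deprecated(old_name, new_name):
    """Emit a DeprecationWarning pointing users from old_name to new_name.

    Hypothetical helper for illustration; not part of Biopython.
    """
    warnings.warn(
        "%s is deprecated; please use %s instead." % (old_name, new_name),
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this helper
    )


# Demonstration: capture the warning so we can inspect it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # DeprecationWarning is often hidden by default
    warn_deprecated("Bio.Fasta", "Bio.SeqIO.FASTA")
```

In a real module the call would sit at import time or inside the deprecated entry points, so existing scripts keep working but users see the notice.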
Short Term
----------
Right now I guess most people dealing with Fasta files will be using
Bio.Fasta, and it is very slow, hence bug 2058:
http://bugzilla.open-bio.org/show_bug.cgi?id=2058
My patch makes Bio.Fasta almost as fast as Bio.SeqIO.FASTA in my tests
(on modest-sized files).
It would be great if any of you could try this patch on your machines,
on the off chance that it causes problems for any existing code. It does
pass test_Fasta.py and test_Fasta2.py on Windows at least.
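To give a feel for what a "pure python implementation" means here, the following is a minimal sketch of a line-based FASTA reader. It is not the patch attached to bug 2058 and not the code in Bio.SeqIO.FASTA; the function name iter_fasta is hypothetical, and it yields plain (title, sequence) tuples rather than SeqRecord objects.

```python
import io


def iter_fasta(handle):
    """Yield (title, sequence) tuples from a FASTA-formatted text handle.

    Hypothetical illustration only: lines before the first '>' are ignored,
    and sequence lines are joined with surrounding whitespace stripped.
    """
    title = None
    chunks = []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if title is not None:
                yield title, "".join(chunks)
            title = line[1:].strip()
            chunks = []
        elif title is not None:
            chunks.append(line.strip())
    # Emit the final record once the handle is exhausted.
    if title is not None:
        yield title, "".join(chunks)


# Small in-memory example standing in for a real file.
example = io.StringIO(">seq1 first test\nACGT\nTTGA\n>seq2\nGGCC\n")
records = list(iter_fasta(example))
```

Because it only does a single pass with simple string operations, something along these lines is a reasonable baseline to compare parser speeds against.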
Medium/Long Term
----------------
We need to sort out what to do with Bio.SeqIO, as the existing code in
Bio/SeqIO/generic.py and Bio/SeqIO/FASTA.py currently uses different
interfaces. But I do agree that something like that should be OK.
I have been working on a possible replacement (but it doesn't seem to
have made it to the mailing list yet - must check my recent email).
Peter