[Biopython-dev] Fasta parser

Sat Jul 1 22:52:43 UTC 2006

Michiel,

There is actually a simple-minded Fasta reader/writer that does not use Martel: Bio.SeqIO.FASTA.

./I

--
Iddo Friedberg, PhD
Burnham Institute for Medical Research
10901 N. Torrey Pines Rd.
La Jolla, CA 92037 USA
T: +1 858 646 3100 x3516
http://iddo-friedberg.org
http://BioFunctionPrediction.org

-----Original Message-----
From: biopython-dev-bounces at lists.open-bio.org on behalf of Michiel de Hoon
Sent: Sat 7/1/2006 2:47 PM
To: biopython-dev at biopython.org
Subject: [Biopython-dev] Fasta parser

Hi everybody,

The Biopython documentation shows the following approach to parsing a Fasta file:

 >>> from Bio import Fasta
 >>> parser = Fasta.RecordParser()
 >>> file = open("ls_orchid.fasta")
 >>> iterator = Fasta.Iterator(file, parser)
 >>> cur_record = iterator.next()

But for large Fasta files, this is very slow compared to file.read(),
which may be due to going through Martel (I believe the same was true
for large GenBank files).

So I'm thinking about writing a simple-minded Fasta parser for better 
performance with large files. What I'm wondering about:
1) Is there some advantage that I overlooked of using Martel for parsing 
Fasta files?
2) Why is it necessary to create a parser first and pass it to
Fasta.Iterator? Are there any cases where Fasta.Iterator uses something
other than a Fasta.RecordParser?
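
To make the idea concrete, a simple-minded parser of this kind might look roughly like the sketch below. It is only an illustration and not existing Biopython code: the function name iter_fasta and the choice to return plain (title, sequence) tuples are assumptions made for the example, and it does no validation of the sequence characters.

```python
import io


def iter_fasta(handle):
    """Yield (title, sequence) tuples from a Fasta-formatted file handle.

    Hypothetical helper for illustration; not part of Biopython.
    """
    title = None
    chunks = []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new header line closes the previous record, if any.
            if title is not None:
                yield title, "".join(chunks)
            title = line[1:].strip()
            chunks = []
        elif title is not None:
            # Sequence line; blank lines contribute nothing.
            chunks.append(line.strip())
        # Lines before the first ">" header are ignored.
    if title is not None:
        yield title, "".join(chunks)


# Small in-memory example instead of a real file such as ls_orchid.fasta.
example = io.StringIO(">seq1 first record\nACGT\nTTGA\n\n>seq2\nGGCC\n")
records = list(iter_fasta(example))
```

Because it reads the handle line by line and yields one record at a time, a reader like this never holds the whole file in memory, which is the main point for large files.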

--Michiel.