[Biopython-dev] Fasta parser
Colosimo, Marc E.
mcolosimo at mitre.org
Sun Jul 2 18:36:22 UTC 2006
On 7/1/06 5:47 PM, "Michiel de Hoon" <mdehoon at c2b2.columbia.edu> wrote:
> Hi everybody,
>
> The Biopython documentation shows the following approach to parsing a Fasta file:
>
> >>> from Bio import Fasta
> >>> parser = Fasta.RecordParser()
> >>> file = open("ls_orchid.fasta")
> >>> iterator = Fasta.Iterator(file, parser)
> >>> cur_record = iterator.next()
>
> But for large Fasta files, it's very slow, compared to file.read(),
> which may be due to going through Martel (I believe the same was true
> for large GenBank files).
>
> So I'm thinking about writing a simple-minded Fasta parser for better
> performance with large files. What I'm wondering about:
> 1) Is there some advantage that I overlooked of using Martel for parsing
> Fasta files?
> 2) Why is it necessary to create a parser first and pass it to
> Fasta.Iterator? Are there any cases where Fasta.Iterator uses something
> other than a Fasta.RecordParser?
Yes! I use Fasta.SequenceParser, which gives me a SeqRecord object
(Bio.SeqRecord) rather than some odd Fasta.Record object that I would then
have to remap into a SeqRecord.
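For what it's worth, a simple-minded parser along the lines Michiel describes
could look roughly like the sketch below. It is plain Python with no Martel and
no Biopython classes. The function name iter_fasta and the (title, sequence)
tuple output are made up for illustration and are not part of Bio.Fasta;
turning each tuple into a SeqRecord is deliberately left out.

```python
import io


def iter_fasta(handle):
    """Yield (title, sequence) tuples from an open FASTA file handle.

    Hypothetical helper for illustration only; not a Biopython API.
    """
    title = None
    chunks = []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header line closes the previous record, if there was one.
            if title is not None:
                yield title, "".join(chunks)
            title = line[1:].strip()
            chunks = []
        elif title is not None and line:
            # Sequence lines are collected and joined once per record.
            chunks.append(line)
    # Emit the last record in the file.
    if title is not None:
        yield title, "".join(chunks)


# Small in-memory example so the sketch runs without a file on disk.
example = io.StringIO(">seq1 first record\nACGT\nTTGA\n\n>seq2\nGGCC\n")
records = list(iter_fasta(example))
```

Because it only reads line by line and never builds a parse tree, something
like this should stay cheap on large files, but the caller still decides what
kind of object each record becomes, which is the flexibility I would not want
to lose.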
Also, could someone re-run epydoc? My changes to the code have not made it
into the online API docs.
Marc