[Biopython-dev] [Bug 2382] Generic FASTA parser

bugzilla-daemon at portal.open-bio.org
Wed Oct 17 02:17:28 UTC 2007


http://bugzilla.open-bio.org/show_bug.cgi?id=2382





------- Comment #9 from mdehoon at ims.u-tokyo.ac.jp  2007-10-16 22:17 EST -------
If all these special FASTA files are coming from Roche Diagnostics, I'd suggest
creating a separate module rather than trying to put this in Bio.SeqIO.
Bio.SeqIO is one of the few modules in Biopython that is used by most users, so
I'd like to keep it as clean as possible. To avoid confusion for users who just
want to parse regular FASTA files, I think the module should not be called
Bio.Fasta. In addition, I doubt we'd get much code reuse from a generic
Bio.Fasta module beyond what is needed for the Roche files, since the only
thing these formats have in common is that they use ">" to separate records.

With a separate module to handle the Roche files, my preferred usage would be
something like this:

from Bio import SeqIO, GSFlex # Or whatever you'd like to call it

seqrecords = SeqIO.parse(open("mysequences.fa"), "fasta")
qualities = GSFlex.parse(open("myqualities.qual"), "quality")

# zip pairs the two streams by position, so both files must list
# the reads in the same order
for seqrecord, quality in zip(seqrecords, qualities):
    seqrecord.quality = quality

Note that "quality" is currently not an attribute of the SeqRecord class, but
since SeqRecord is an ordinary Python class, we can just add attributes on the
fly.
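
To make the idea concrete, here is a rough sketch of what such a parser and
the on-the-fly attribute could look like. Everything in it is an assumption
for illustration: parse_qual is a hypothetical function (not an existing
Biopython API), Record is a stand-in for SeqRecord so the example runs without
Biopython, and the quality file is assumed to hold ">"-prefixed headers
followed by whitespace-separated integer scores.

```python
import io


class Record:
    """Stand-in for SeqRecord: an ordinary class without __slots__."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq


def parse_qual(handle):
    """Yield (record_id, scores) pairs from a '>'-delimited quality file.

    Hypothetical helper; assumes scores are whitespace-separated integers
    that may span several lines per record.
    """
    record_id, scores = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if record_id is not None:
                yield record_id, scores
            parts = line[1:].split()
            record_id = parts[0] if parts else ""
            scores = []
        elif record_id is not None:
            scores.extend(int(token) for token in line.split())
    if record_id is not None:
        yield record_id, scores


qual_text = ">read1 length=4\n40 38 37\n35\n>read2\n20 21\n"
records = [Record("read1", "ACGT"), Record("read2", "GA")]

for record, (qual_id, scores) in zip(records, parse_qual(io.StringIO(qual_text))):
    # Pairing is by position, so check that the identifiers agree.
    if record.id != qual_id:
        raise ValueError("ID mismatch: %s vs %s" % (record.id, qual_id))
    record.quality = scores  # new attribute added on the fly
```

Since zip stops at the shorter input without complaint, the explicit ID check
is a cheap way to catch files that are out of step with each other.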

