[Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL files
bugzilla-daemon at portal.open-bio.org
bugzilla-daemon at portal.open-bio.org
Thu Feb 19 18:09:13 UTC 2009
http://bugzilla.open-bio.org/show_bug.cgi?id=2767
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-19 13:09 EST -------
(In reply to comment #0)
> In order to use a paired "fasta" and "qual" file you might do this:
>
> def merge_fasta_qual(fasta_record, qual_record) :
> """Modifies the fasta_record in place, and also returns it."""
> assert fasta_record.id == qual_record.id
> assert len(f_rec) == len(q_rec.annotations["phred_quality"])
> f_rec.annotations["phred_quality"] = q_rec.annotations["phred_quality"]
> return f_rec
>
> from Bio import SeqIO
> records = [merge_fasta_qual(f_rec, q_rec) for (f_rec, q_rec) in \
> zip(SeqIO.parse(open("example.fasta"), "fasta"),
> SeqIO.parse(open("example.qual"), "qual"))]
>
> I think it would probably make sense to offer this kind of functionality in
> the Bio.SeqIO.QualityIO module itself, as this code above has several draw
> backs (e.g. the zip makes a list in memory, rather than a generator).
Alternatively, if you have enough RAM to hold all the records in memory at
once,
then a simple dictionary approach using just Bio.SeqIO methods would also work.
This was inspired by Jared's related example at the end of Bug 2382 comment 0.
>>> from Bio import SeqIO
>>> reads = SeqIO.to_dict(SeqIO.parse(open("Quality/example.fasta"), "fasta"))
>>> for record in SeqIO.parse(open("Quality/example.qual"), "qual") :
... reads[record.id].annotations["phred_quality"] =
record.annotations["phred_quality"]
You can then access any record by its key, and get both the sequence and the
quality scores.
>>> print reads["EAS54_6_R1_2_1_540_792"].format("fastq")
@EAS54_6_R1_2_1_540_792
TTGGCAGGCCAAGGCCGATGGATCA
+
;;;;;;;;;;;7;;;;;-;;;3;83
<BLANKLINE>
This is neat, but given QUAL files are often very very large, wanting to use an
iterator may be more typical.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
More information about the Biopython-dev
mailing list