[Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL files

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Thu Feb 19 13:09:13 EST 2009


http://bugzilla.open-bio.org/show_bug.cgi?id=2767





------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2009-02-19 13:09 EST -------
(In reply to comment #0)
> In order to use a paired "fasta" and "qual" file you might do this:
> 
> def merge_fasta_qual(fasta_record, qual_record) :
>     """Modifies the fasta_record in place, and also returns it."""
>     assert fasta_record.id == qual_record.id
>     assert len(f_rec) == len(q_rec.annotations["phred_quality"])
>     f_rec.annotations["phred_quality"] = q_rec.annotations["phred_quality"]
>     return f_rec
> 
> from Bio import SeqIO
> records = [merge_fasta_qual(f_rec, q_rec) for (f_rec, q_rec) in \
>            zip(SeqIO.parse(open("example.fasta"), "fasta"),
>                SeqIO.parse(open("example.qual"), "qual"))]
> 
> I think it would probably make sense to offer this kind of functionality in
> the Bio.SeqIO.QualityIO module itself, as this code above has several draw
> backs (e.g. the zip makes a list in memory, rather than a generator).

Alternatively, if you have enough RAM to hold all the records in memory at
once,
then a simple dictionary approach using just Bio.SeqIO methods would also work.
 This was inspired by Jared's related example at the end of Bug 2382 comment 0.

>>> from Bio import SeqIO
>>> reads = SeqIO.to_dict(SeqIO.parse(open("Quality/example.fasta"), "fasta"))
>>> for record in SeqIO.parse(open("Quality/example.qual"), "qual") :
...     reads[record.id].annotations["phred_quality"] =
record.annotations["phred_quality"]

You can then access any record by its key, and get both the sequence and the
quality scores.

>>> print reads["EAS54_6_R1_2_1_540_792"].format("fastq")
@EAS54_6_R1_2_1_540_792
TTGGCAGGCCAAGGCCGATGGATCA
+
;;;;;;;;;;;7;;;;;-;;;3;83
<BLANKLINE>

This is neat, but given QUAL files are often very very large, wanting to use an
iterator may be more typical.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


More information about the Biopython-dev mailing list