[Biopython] FASTQ to qual+fasta

Peter Cock p.j.a.cock at googlemail.com
Sun Jan 16 19:25:43 UTC 2011


On Sun, Jan 16, 2011 at 6:48 PM, Iddo Friedberg <idoerg at gmail.com> wrote:
> question regarding the use of SeqIO.convert: how do I convert a FASTQ file
> to qual and fasta files? Currently it seems that I have to run SeqIO.convert
> twice e.g.:
>
>  SeqIO.convert(open("infile.fastq"),"fastq",open("outfile.qual","w"),"qual")
>  SeqIO.convert(open("infile.fastq"),"fastq",open("outfile.fasta","w"),"fasta")
>
> Or am I missing something?
>
> Thanks,
>
> ./I

Hi Iddo,

That is almost the simplest solution, yes. You can pass filenames directly:

SeqIO.convert("infile.fastq", "fastq", "outfile.qual", "qual")
SeqIO.convert("infile.fastq", "fastq", "outfile.fasta", "fasta")

Is it a bit slow for you?

Using SeqIO.convert(...) in this case does use optimised code for FASTQ
to FASTA, but currently we don't have a similar fast FASTQ to QUAL
function. See Bio/SeqIO/_convert.py if you want to know how this is
implemented. I can see several tricks for FASTQ to QUAL which should
work... do you fancy trying this yourself?

Alternatively, you could make a single call to SeqIO.parse(...) to iterate
over the records as SeqRecord objects, use itertools.tee to split that
iterator in two, and give one copy to each of two SeqIO.write(...) calls to
write FASTA and QUAL. I don't know how well that would work in terms of
memory consumption, but it would make only a single pass through the FASTQ
file.
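
A rough sketch of the itertools.tee idea, untested on large files. The
function name split_fastq_with_tee is made up for illustration; SeqIO.parse,
SeqIO.write and itertools.tee are the real calls. Note that the first
SeqIO.write(...) call drains its copy of the iterator completely, so tee
has to buffer every SeqRecord until the second call consumes them. That is
likely where the memory cost would come from.

```python
from itertools import tee

from Bio import SeqIO


def split_fastq_with_tee(fastq_path, fasta_path, qual_path):
    """Parse the FASTQ file once and write FASTA and QUAL from the same records.

    Hypothetical helper name; returns the record counts reported by the
    two SeqIO.write calls.
    """
    records = SeqIO.parse(fastq_path, "fastq")
    # tee gives two independent iterators over the same parsed records.
    for_fasta, for_qual = tee(records)
    # This call exhausts for_fasta, so tee holds all records in memory
    # until for_qual has been consumed below.
    n_fasta = SeqIO.write(for_fasta, fasta_path, "fasta")
    n_qual = SeqIO.write(for_qual, qual_path, "qual")
    return n_fasta, n_qual
```

Something like split_fastq_with_tee("infile.fastq", "outfile.fasta",
"outfile.qual") would then replace the two SeqIO.convert(...) calls.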

If speed really matters here, first we should add FASTQ to QUAL
to Bio/SeqIO/_convert.py and if that isn't enough, do a special case for
FASTQ to FASTA and QUAL (to live in Bio.SeqIO.QualityIO I guess).
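
To give an idea of what such a special case might look like, here is a
plain Python sketch that writes both files in one pass without building
SeqRecord objects. This is not what Bio/SeqIO/_convert.py does, and the
function name is made up. It assumes simple four-line FASTQ records (no
line wrapping) and Sanger style quality encoding (ASCII offset 33), and it
does not wrap the output lines the way SeqIO does.

```python
def fastq_to_fasta_and_qual_single_pass(fastq_path, fasta_path, qual_path,
                                        offset=33):
    """Write FASTA and QUAL from a FASTQ file in one pass; return record count.

    Assumptions: four lines per record, PHRED scores stored with the given
    ASCII offset (33 for Sanger FASTQ). Output lines are not wrapped.
    """
    count = 0
    with open(fastq_path) as fastq, \
            open(fasta_path, "w") as fasta, \
            open(qual_path, "w") as qual:
        while True:
            title = fastq.readline().rstrip()
            if not title:
                break  # end of file, or a trailing blank line
            seq = fastq.readline().rstrip()
            plus = fastq.readline().rstrip()
            quality = fastq.readline().rstrip()
            if not title.startswith("@") or not plus.startswith("+"):
                raise ValueError("Not a four-line FASTQ record: %r" % title)
            if len(seq) != len(quality):
                raise ValueError("Sequence/quality length mismatch: %r" % title)
            name = title[1:]  # drop the leading "@"
            fasta.write(">%s\n%s\n" % (name, seq))
            # Convert each quality character to its integer PHRED score.
            scores = " ".join(str(ord(ch) - offset) for ch in quality)
            qual.write(">%s\n%s\n" % (name, scores))
            count += 1
    return count
```

Within Biopython itself, FastqGeneralIterator in Bio.SeqIO.QualityIO already
yields (title, sequence, quality) string tuples, so it could probably stand
in for the hand-rolled readline loop above.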

Peter



