[Biopython] FASTQ to qual+fasta

Peter Cock p.j.a.cock at googlemail.com
Sun Jan 16 22:58:30 UTC 2011


On Sun, Jan 16, 2011 at 10:35 PM, Iddo Friedberg wrote:
> On 01/16/2011 02:25 PM, Peter Cock wrote:
>>
>> On Sun, Jan 16, 2011 at 6:48 PM, Iddo Friedberg wrote:
>>>
>>> Question regarding the use of SeqIO.convert: how do I convert a
>>> FASTQ file to qual and fasta files? Currently it seems that I have
>>> to run SeqIO.convert twice, e.g.:
>>>
>>>
>>>  SeqIO.convert(open("infile.fastq"),"fastq",open("outfile.qual","w"),"qual")
>>>  SeqIO.convert(open("infile.fastq"),"fastq",open("outfile.fasta","w"),"fasta")
>>>
>>> Or am I missing something?
>>>
>>> Thanks,
>>>
>>> ./I
>>
>> Hi Iddo,
>>
>> That is almost the simplest solution, yes. You can use filenames directly:
>>
>> SeqIO.convert("infile.fastq", "fastq", "outfile.qual", "qual")
>> SeqIO.convert("infile.fastq", "fastq", "outfile.fasta", "fasta")
>>
>> Is it a bit slow for you?
>>
>
> Well, although elegant, in this case I am running two loops, where one
> should suffice.

KISS?

>> Using SeqIO.convert(...) in this case does use optimised code for FASTQ
>> to FASTA, but currently we don't have a similar fast FASTQ to QUAL
>> function. See Bio/SeqIO/_convert.py if you want to know how this is
>> implemented. I can see several tricks for FASTQ to QUAL which should
>> work... do you fancy trying this yourself?
>
> I wish I had the time.... :(

I can picture how I'd solve this - it shouldn't take me too long.
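
For illustration only, here is a minimal sketch of the kind of shortcut a fast FASTQ to QUAL converter can take: skip building SeqRecord objects and decode each quality character straight to its PHRED score through a precomputed lookup table. It assumes plain four-line FASTQ records (no wrapped sequence or quality lines) and Sanger-style PHRED+33 encoding. The function name fastq_to_qual is hypothetical, and this is not necessarily how Bio/SeqIO/_convert.py does it.

```python
from typing import Iterable, TextIO


def fastq_to_qual(fastq_lines: Iterable[str], qual_out: TextIO, offset: int = 33) -> int:
    """Hypothetical fast FASTQ -> QUAL converter (a sketch, not Biopython's API).

    Assumes strict four-line records and PHRED+offset quality encoding.
    Returns the number of records written.
    """
    # Precompute character -> score string so the inner loop is a dict lookup
    # rather than ord() arithmetic plus str() for every base.
    score_str = {chr(q + offset): str(q) for q in range(94)}
    count = 0
    lines = iter(fastq_lines)
    for title in lines:
        title = title.rstrip("\r\n")
        if not title:
            continue  # tolerate blank lines between records
        if not title.startswith("@"):
            raise ValueError(f"Expected '@' title line, got {title!r}")
        seq = next(lines, "").rstrip("\r\n")  # only used for a length check
        plus = next(lines, "")
        quals = next(lines, "").rstrip("\r\n")
        if not plus.startswith("+") or len(seq) != len(quals):
            raise ValueError(f"Malformed or truncated record {title!r}")
        # An unexpected quality character raises KeyError here.
        scores = " ".join(score_str[c] for c in quals)
        # The QUAL title reuses the FASTQ title minus the leading '@'.
        qual_out.write(f">{title[1:]}\n{scores}\n")
        count += 1
    return count
```

This writes each record's scores on a single line; Biopython's own QUAL writer may wrap long score lines instead, and QUAL readers accept either layout.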

>> Alternatively, you could make a single call to SeqIO.parse(...) to
>> iterate over the records as SeqRecord objects, use itertools.tee to
>> split this iterator in two, and hand one copy each to two SeqIO.write(...)
>> calls, one writing FASTA and one writing QUAL. I don't know how well that
>> would work with memory consumption, but it would make only a single pass
>> through the FASTQ file.
>
> That's actually what I ended up doing.

Do you have the timings of this versus the two calls to SeqIO.convert()?
I'd also be curious to see your code for this.
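
The single-pass itertools.tee approach discussed above might look roughly like the sketch below. The function name is hypothetical and this is not Iddo's code, which isn't shown in the thread. It also makes the memory question concrete: the first SeqIO.write call exhausts its tee iterator before the second call starts, so tee has to buffer every SeqRecord from the file until the QUAL writer consumes them.

```python
import itertools

from Bio import SeqIO


def fastq_to_fasta_and_qual_tee(fastq_path, fasta_path, qual_path):
    """Hypothetical helper: one parse of the FASTQ file, two SeqIO.write calls.

    Returns (FASTA records written, QUAL records written).
    """
    records = SeqIO.parse(fastq_path, "fastq")
    # Split the single record iterator into two independent iterators.
    fasta_records, qual_records = itertools.tee(records, 2)
    # This first write consumes fasta_records completely, so tee buffers
    # every record for qual_records in memory along the way.
    n_fasta = SeqIO.write(fasta_records, fasta_path, "fasta")
    # The second write drains that buffer; the FASTQ file is not re-read.
    n_qual = SeqIO.write(qual_records, qual_path, "qual")
    return n_fasta, n_qual
```

So the FASTQ file is parsed only once, but peak memory grows with the number of records, which is the trade-off against two streaming SeqIO.convert calls.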

>> If speed really matters here, first we should add FASTQ to QUAL
>> to Bio/SeqIO/_convert.py and if that isn't enough, do a special case for
>> FASTQ to FASTA and QUAL (to live in Bio.SeqIO.QualityIO I guess).
>
> I think a fastq to fasta & qual would be best. I'll look into the QualityIO
> module and see if my code can be massaged in there.

Maybe - assuming it would be faster than two calls to SeqIO.convert
(once FASTQ to QUAL is optimised).
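
As a rough idea of what such a special case could do, the sketch below reads each FASTQ record once and writes both the FASTA and the QUAL entry in the same loop, again without creating SeqRecord objects. It assumes strict four-line records and PHRED+33 encoding by default, and the function name is hypothetical; whether it actually beats two optimised SeqIO.convert calls would need timing on real data.

```python
from typing import Iterable, TextIO


def fastq_to_fasta_and_qual(
    fastq_lines: Iterable[str], fasta_out: TextIO, qual_out: TextIO, offset: int = 33
) -> int:
    """Hypothetical single-pass FASTQ -> FASTA + QUAL writer (a sketch).

    Assumes strict four-line records and PHRED+offset quality encoding.
    Returns the number of records written to each output.
    """
    # Lookup table: quality character -> decimal score as text.
    score_str = {chr(q + offset): str(q) for q in range(94)}
    count = 0
    lines = iter(fastq_lines)
    for title in lines:
        title = title.rstrip("\r\n")
        if not title:
            continue  # tolerate blank lines between records
        if not title.startswith("@"):
            raise ValueError(f"Expected '@' title line, got {title!r}")
        seq = next(lines, "").rstrip("\r\n")
        plus = next(lines, "")
        quals = next(lines, "").rstrip("\r\n")
        if not plus.startswith("+") or len(seq) != len(quals):
            raise ValueError(f"Malformed or truncated record {title!r}")
        name = title[1:]  # shared title for both outputs, minus the '@'
        fasta_out.write(f">{name}\n{seq}\n")
        qual_out.write(f">{name}\n{' '.join(score_str[c] for c in quals)}\n")
        count += 1
    return count
```

Because both outputs are written from the same parsed lines, the input is read exactly once and memory use stays constant regardless of file size.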

Peter



