[BioRuby] Fastq performances

Raoul Bonnal bonnal at ingm.org
Tue Mar 29 06:16:11 EDT 2011


On 29/mar/2011, at 11.56, Peter Cock wrote:

> On Mon, Mar 28, 2011 at 4:49 PM, Raoul Bonnal <bonnal at ingm.org> wrote:
>> Note: the output from to_biosequence.output(:fastq_illumina) is not equal to the
>> input (also Illumina): the sequence (both nucleotides and quality) is wrapped at
>> 70 chars and the header is repeated. Is it my fault, or is it some part of the
>> code? I'll put the code on GitHub asap.
> 
> When writing FASTQ, I would expect BioRuby to omit the repeated header
> on the plus line, and NOT to line wrap the sequence and quality lines. This
> is deliberate, see http://dx.doi.org/10.1093/nar/gkp1137 for details.
I made a small mistake in my previous e-mail: by "repeated header" I meant the first line, where the identifier is written twice:
@H125:1:1108:1188:2036#0/1 H125:1:1108:1188:2036#0/1

Input:
@H125:1:1108:1188:2036#0/1
CTTGTATGCAGCATCCCCTTCTTGCCTAGGGACTTGAAGGGCCAGGCTTCCTGTCATTGCCTCACTCAAATGTAGC
+
gggggggggggggegggggffggeggegggeagge^ggdbcgggcdgedegfggffff^ffffefdeeZefccceg

Output created with fastq_read.to_biosequence.output(:fastq_illumina) (the Sanger format gives the same result):
@H125:1:1108:1188:2036#0/1 H125:1:1108:1188:2036#0/1
CTTGTATGCAGCATCCCCTTCTTGCCTAGGGACTTGAAGGGCCAGGCTTCCTGTCATTGCCTCACTCAAA
TGTAGC
+
gggggggggggggegggggffggeggegggeagge^ggdbcgggcdgedegfggffff^ffffefdeeZe
fccceg

> 
> So, if your input did have the repeated header and/or wrapping, then yes,
> the input would not match the output.
My input is not wrapped and has no repeated header.
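
To make the expected layout concrete, here is a minimal sketch in plain Ruby that writes one record as exactly four lines: a single identifier on the header, no line wrapping, and an empty plus line. The helper format_fastq_record is hypothetical and not part of BioRuby; it only illustrates the output I would expect, using the record above as sample data.

```ruby
# Hypothetical helper (not a BioRuby method): format one FASTQ record as
# four lines, without wrapping and without repeating the header.
def format_fastq_record(identifier, sequence, quality)
  # A valid record has one quality character per base.
  unless sequence.length == quality.length
    raise ArgumentError, "sequence and quality lengths differ"
  end
  "@#{identifier}\n#{sequence}\n+\n#{quality}\n"
end

record = format_fastq_record(
  'H125:1:1108:1188:2036#0/1',
  'CTTGTATGCAGCATCCCCTTCTTGCCTAGGGACTTGAAGGGCCAGGCTTCCTGTCATTGCCTCACTCAAATGTAGC',
  'gggggggggggggegggggffggeggegggeagge^ggdbcgggcdgedegfggffff^ffffefdeeZefccceg'
)
print record
```

Applied to the record above, this should give back the input unchanged, which is what I expected from the :fastq_illumina output.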

Thanks Peter.
--
Ra