[emboss-dev] EMBOSS seqret FASTQ support

Peter biopython at maubp.freeserve.co.uk
Tue Jul 21 13:10:17 EDT 2009


On Tue, Jul 21, 2009 at 8:43 AM, Peter Rice <pmr at ebi.ac.uk> wrote:
>
> Peter C. wrote:
>
>> Could anyone spot a "but" coming up?
>> ...
>> I would guess the problem is that quality line starts with a @,
>
> Urghh ... I left an extra '@' test in even though I meant to take it out
> before the release.
>
> I will make a patch for this ... have to look into a couple of your other
> queries at the same time as they are in the same source file.
>
> Thanks

I've got another issue for you, which I think is a rounding problem
with negative Solexa scores. Either it happens when converting them
directly into ASCII (which sounds a bit strange) or, assuming you
store everything as PHRED scores in memory, it is in how you round
negative Solexa scores on conversion back to ASCII.

This can be neatly demonstrated with the following artificial FASTQ
file which uses the Solexa encoding covering scores 40 to -5
inclusive (which I understand to be the typical range likely to come
off an actual Solexa/Illumina machine):

$ more solexa_faked.fastq
@slxa_0001_1_0001_01
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTNNNNNN
+slxa_0001_1_0001_01
hgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDCBA@?>=<;

$ seqret -sequence solexa_faked.fastq -sformat fastq-solexa \
    -osformat fastq-solexa -stdout -auto
@slxa_0001_1_0001_01
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTNNNNNN
+slxa_0001_1_0001_01
hgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDCBA@@?>=<

$ embossversion
Reports the current EMBOSS version number
6.1.0

As I hope is clear, EMBOSS seqret has inflated the last five scores
(the negative ones, -1 to -5) by one. The original Solexa scores were:
40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24,
23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5,
4, 3, 2, 1, 0, -1, -2, -3, -4, -5
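
For anyone who wants to check these numbers, here is a minimal Python
sketch of the decoding. It assumes the usual fastq-solexa convention
that each quality character is chr(score + 64); the function name
solexa_scores is just made up for this illustration.

```python
SOLEXA_OFFSET = 64  # fastq-solexa stores each score as chr(score + 64)


def solexa_scores(quality):
    """Decode a fastq-solexa quality string into integer Solexa scores."""
    return [ord(ch) - SOLEXA_OFFSET for ch in quality]


# Quality line from solexa_faked.fastq; a raw string because it
# contains a backslash.
original = r"hgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDCBA@?>=<;"

scores = solexa_scores(original)
print(scores[0], scores[-1], len(scores))  # 40 -5 46
```

Running the same function on the quality line written by seqret gives
the second list.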

After putting this file through seqret, they become:
40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24,
23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5,
4, 3, 2, 1, 0, 0, -1, -2, -3, -4
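
To show the kind of slip I have in mind, here is a Python sketch. It
is only a guess, I have not looked at the EMBOSS source. It assumes
scores are held in memory as floating point PHRED values and are
written back with the common "add 0.5 and truncate" idiom, which
rounds correctly for positive values but pulls negative ones towards
zero. All the function names are hypothetical.

```python
import math

SOLEXA_OFFSET = 64  # fastq-solexa stores each score as chr(score + 64)


def solexa_to_phred(q_solexa):
    """Standard Solexa to PHRED mapping."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def phred_to_solexa(q_phred):
    """Inverse mapping, PHRED back to Solexa."""
    return 10 * math.log10(10 ** (q_phred / 10) - 1)


def encode_truncating(q):
    # Suspected slip (an assumption): int() truncates towards zero,
    # so -1.0 + 0.5 = -0.5 becomes 0 rather than -1.
    return chr(int(q + 0.5) + SOLEXA_OFFSET)


def encode_rounding(q):
    # floor(q + 0.5) rounds to nearest for negative values as well.
    return chr(math.floor(q + 0.5) + SOLEXA_OFFSET)


# Solexa scores 40 down to -5, stored as PHRED and converted back.
in_memory = [phred_to_solexa(solexa_to_phred(q)) for q in range(40, -6, -1)]

truncated = "".join(encode_truncating(q) for q in in_memory)
rounded = "".join(encode_rounding(q) for q in in_memory)
print(truncated)
print(rounded)
```

With truncation the string ends in @@?>=< (scores 0, 0, -1, -2, -3,
-4), which is the same pattern as the seqret output above, while
rounding with floor gives back the original quality line. That would
be consistent with my guess, but it does not prove that this is what
seqret is doing.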

Peter C.

