[Biopython] A third FASTQ variant from Illumina 1.3+ ?!!

Peter biopython at maubp.freeserve.co.uk
Fri Jun 5 12:02:24 UTC 2009


On Fri, Jun 5, 2009 at 12:47 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Oh dear - it sounds like Solexa/Illumina have just made the whole FASTQ
> thing much much worse by introducing a third version of the FASTQ file
> format. Curses! Again!
>
> http://seqanswers.com/forums/showthread.php?t=1526
> http://en.wikipedia.org/wiki/FASTQ_format
>
> In Biopython, "fastq" refers to the original Sanger FASTQ format which
> encodes a Phred quality score from 0 to 90 (or 93 in the latest code)
> using an ASCII offset of 33.
>
> In Biopython, "fastq-solexa" refers to the first bastardised version of the
> FASTQ format, introduced by Solexa/Illumina 1.0, which encodes
> a Solexa/Illumina quality score (which can be negative) using an ASCII
> offset of 64. Why they didn't make the files easily distinguishable from
> Sanger FASTQ files escapes me!
>
> Apparently Illumina 1.3 introduces a third FASTQ format which encodes
> a PHRED quality score from 0 to 40 using ASCII 64 to 104. While they
> switched to PHRED scores, they appear to have decided to stick with
> the 64 offset - I can only assume this is so that existing tools expecting
> the old Solexa/Illumina FASTQ format data will still more or less work
> with this new variant (as for higher qualities the PHRED and Solexa
> scores are approximately equal).

This appears to be confirmed in the following thread, where someone who is
apparently an Illumina employee has posted:
http://seqanswers.com/forums/showthread.php?t=1526

kmcarr wrote:
>> Out of curiosity why did you stick with ASCII(Q+64) instead of the
>> standard ASCII(Q+33)? It results in the minor annoyance of having
>> to remember to convert before use in programs which are expecting
>> Sanger FASTQ. It also means that there are now three types of
>> FASTQ files floating about; standard Sanger FASTQ with quality
>> scores expressed as ASCII(Qphred+33), Solexa FASTQ with
>> ASCII(Qsolexa+64) and Solexa FASTQ with ASCII(Qphred+64).
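To make the three encodings kmcarr lists concrete, here is a minimal
Python sketch that decodes a quality string into Phred scores. The function
names and the variant labels ("sanger", "solexa", "illumina-1.3") are made up
for this illustration and are not Biopython API. The Solexa to Phred mapping
is the usual Qphred = 10*log10(10^(Qsolexa/10) + 1).

```python
import math

SANGER_OFFSET = 33    # ASCII(Qphred + 33)
ILLUMINA_OFFSET = 64  # ASCII(Qsolexa + 64) and ASCII(Qphred + 64)


def solexa_to_phred(q_solexa):
    """Map a Solexa score (may be negative) to the equivalent Phred score."""
    return 10 * math.log10(10 ** (q_solexa / 10.0) + 1)


def decode_quality(qual, variant):
    """Return a list of Phred scores for one FASTQ quality string.

    The variant labels are hypothetical names used only in this sketch.
    """
    if variant == "sanger":
        return [ord(c) - SANGER_OFFSET for c in qual]
    if variant == "illumina-1.3":
        return [ord(c) - ILLUMINA_OFFSET for c in qual]
    if variant == "solexa":
        return [solexa_to_phred(ord(c) - ILLUMINA_OFFSET) for c in qual]
    raise ValueError("Unknown FASTQ variant: %r" % variant)
```

Note that the same character decodes to different scores depending on the
variant, which is exactly why the files being indistinguishable is a problem.
For high scores the "solexa" and "illumina-1.3" results nearly coincide; for
low scores they do not.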

coxtonyj wrote:
> That is a fair point. The need to convert has always been present
> of course. We did give this some thought at the time and as I recall
> the rationale was that any code (ours or others) that was expecting
> Qsolexa+64 would probably still work if given Qphred+64, but that
> the conversion to Qphred+33 was at least now just a simple
> subtraction. But perhaps we should have bitten the bullet and gone
> with Qphred+33.
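The "simple subtraction" coxtonyj mentions amounts to shifting each quality
character down by 64 - 33 = 31. A rough sketch, assuming the input really is
the new Qphred+64 variant (the function name is hypothetical, not part of
Biopython):

```python
OFFSET_SHIFT = 64 - 33  # difference between the two ASCII offsets


def illumina13_to_sanger(qual):
    """Re-encode a Qphred+64 quality string as Qphred+33 (Sanger style)."""
    for c in qual:
        if ord(c) < 64:
            # A character below '@' would mean a negative Phred score, so
            # the input is probably not Qphred+64 (perhaps old Solexa data).
            raise ValueError("Not a valid Qphred+64 character: %r" % c)
    return "".join(chr(ord(c) - OFFSET_SHIFT) for c in qual)
```

This shortcut does not apply to the older Qsolexa+64 files, where the scores
themselves have to be converted from Solexa to Phred before re-encoding.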

As you might guess from the tone of my earlier email, I think Illumina
should have "bitten the bullet" and switched to the original Sanger
FASTQ format rather than inventing another variant. But it's too late
now :(

Peter


