[Biojava-dev] Extension of the org.biojava3.sequencing.io.fastq classes

Hannes Brandstätter-Müller biojava at hannes.oib.com
Mon Feb 27 09:05:41 UTC 2012


On Mon, Feb 27, 2012 at 10:00, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Sunday, February 26, 2012, Hannes Brandstätter-Müller wrote:
>> *) I implemented a new Fastq Reader/Writer for the new Illumina FASTQ
>> format (version 1.8 according to Wikipedia, new this month:
>> http://en.wikipedia.org/wiki/FASTQ_format#Encoding). The sources are a
>> bit contradictory: some say it's directly Sanger, some say it has Phred
>> values up to 41?
>
>
> It *is* using the standard Sanger FASTQ encoding, so unless
> you are also doing something special with the Illumina headers
> (e.g. their new paired-end naming, or quality control flags),
> there doesn't seem to be any need for a new class. Note that
> Sanger FASTQ is quite happy with high PHRED qualities
> and gets used like that when describing contigs, i.e. by
> combining many reads the probability of error goes down,
> so the PHRED score goes up.
>
> See also:
>
> http://seqanswers.com/forums/showthread.php?t=8895
>
> Peter
>
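
For reference, a minimal sketch of the point above: Sanger FASTQ stores
each Phred score as the ASCII character with code (score + 33), so a
score of 41 fits without any change, and a higher score simply means a
lower error probability (p = 10^(-Q/10)). The class and method names
here are made up for illustration and are not part of the
org.biojava3.sequencing.io.fastq package.

```java
// Illustrative only: SangerPhredSketch and its methods are hypothetical names,
// not BioJava API.
final class SangerPhredSketch {
    // Sanger FASTQ uses an ASCII offset of 33; printable range allows Phred 0..93.
    static final int SANGER_OFFSET = 33;
    static final int MAX_PHRED = 93;

    /** Encodes a Phred score as its Sanger FASTQ quality character. */
    static char toSangerChar(int phred) {
        if (phred < 0 || phred > MAX_PHRED) {
            throw new IllegalArgumentException("Phred score out of range: " + phred);
        }
        return (char) (phred + SANGER_OFFSET);
    }

    /** Decodes a Sanger FASTQ quality character back to its Phred score. */
    static int fromSangerChar(char c) {
        int phred = c - SANGER_OFFSET;
        if (phred < 0 || phred > MAX_PHRED) {
            throw new IllegalArgumentException("Not a Sanger quality character: " + c);
        }
        return phred;
    }

    /** Converts a Phred score to an error probability: p = 10^(-Q/10). */
    static double errorProbability(int phred) {
        return Math.pow(10.0, -phred / 10.0);
    }

    public static void main(String[] args) {
        // 41 is the top score mentioned for Illumina 1.8; 60 and 93 stand in
        // for the higher scores a consensus (contig) base could carry.
        for (int q : new int[] {20, 41, 60, 93}) {
            System.out.println("Q" + q + " -> '" + toSangerChar(q)
                    + "', p(error) = " + errorProbability(q));
        }
    }
}
```

Under this encoding Q41 is just the character 'J', well inside the
range Sanger FASTQ already covers.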

Indeed, the Sanger variant would be sufficient, but adding a separate
variant offers the possibility to "do something with the headers" if
needed. Also, if you're a die-hard Illumina fan and do not want the
filthy word "sanger" in your code, you can avoid it that way (just
kidding, of course).

I'm open to suggestions. Should we keep it and risk code duplication
issues for identical encodings, or should we throw it away?
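
One possible middle ground, as a rough sketch only: keep a separate
variant name but let it share a single decoding implementation, so the
identical encoding is not duplicated. The enum below is hypothetical
and self-contained; it is not the existing variant type in
org.biojava3.sequencing.io.fastq, and ILLUMINA_1_8 is an assumed name.

```java
// Illustrative only: FastqVariantSketch and Variant are hypothetical names,
// not BioJava API.
final class FastqVariantSketch {
    enum Variant {
        SANGER(33),
        ILLUMINA_1_8(33); // assumed name; same ASCII offset as Sanger

        private final int offset;

        Variant(int offset) {
            this.offset = offset;
        }

        /** Decodes one quality character; written once, used by every variant. */
        int qualityScore(char c) {
            return c - offset;
        }

        /** True when both variants encode quality scores identically. */
        boolean sameEncodingAs(Variant other) {
            return offset == other.offset;
        }
    }

    public static void main(String[] args) {
        String quality = "II5J!"; // made-up quality line for demonstration
        for (Variant v : Variant.values()) {
            StringBuilder sb = new StringBuilder(v + ":");
            for (char c : quality.toCharArray()) {
                sb.append(' ').append(v.qualityScore(c));
            }
            System.out.println(sb);
        }
    }
}
```

Header-specific handling could then hang off the separate variant
later without touching the shared quality code.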

Hannes