[Biojava-dev] Extension of the org.biojava3.sequencing.io.fastq classes

Peter Cock p.j.a.cock at googlemail.com
Mon Feb 27 09:00:17 UTC 2012


On Sunday, February 26, 2012, Hannes Brandstätter-Müller wrote:

> Hi (now) fellow devs :)
>
> So, I spent some time this weekend on the
> org.biojava3.sequencing.io.fastq classes.
>
> This is what I did:
>
> *) the code formatting was not like in the other files, my automatic
> formatting in the IDE changed that to the "normal java standard", I
> hope no one feels offended by that.
> *) I implemented a new Fastq Reader/Writer for the new Illumina Fastq
> Formatting (according to Wikipedia version 1.8, new this month
> http://en.wikipedia.org/wiki/FASTQ_format#Encoding - it's a bit
> contradictory, some say it's directly Sanger, some say it has Phred
> values up to 41?)
>

It *is* using the standard Sanger FASTQ encoding, so unless
you are also doing something special with the Illumina headers
(e.g. their new paired end naming, or quality control flags),
there doesn't seem to be any need for a new class. Note that
Sanger FASTQ is quite happy with high PHRED qualities
and gets used like that when describing contigs, i.e. by
combining many reads the probability of error goes down,
so the PHRED score goes up.
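
To make the arithmetic concrete, here is a rough sketch in plain Java.
It does not use the BioJava API; the class and method names are made up
for illustration. It assumes the usual Sanger convention (quality
character = PHRED + 33, giving scores 0 to 93) and the standard
definition Q = -10 * log10(p). The "combined" figure assumes two reads
whose errors are fully independent, which is a deliberate
oversimplification of what real consensus callers do.

```java
// Hypothetical helper, not part of BioJava: Sanger FASTQ quality arithmetic.
final class SangerPhred {
    // Sanger FASTQ stores each PHRED score as the ASCII character (score + 33).
    static final int SANGER_OFFSET = 33;
    // Highest score that still maps to a printable ASCII character ('~' = 126).
    static final int MAX_PHRED = 93;

    // Decode one quality character into its PHRED score.
    static int phredFromChar(char c) {
        int q = c - SANGER_OFFSET;
        if (q < 0 || q > MAX_PHRED) {
            throw new IllegalArgumentException("Not a Sanger quality character: " + c);
        }
        return q;
    }

    // Encode a PHRED score as a Sanger quality character.
    static char charFromPhred(int q) {
        if (q < 0 || q > MAX_PHRED) {
            throw new IllegalArgumentException("PHRED score out of Sanger range: " + q);
        }
        return (char) (q + SANGER_OFFSET);
    }

    // Error probability implied by a PHRED score: p = 10^(-Q/10).
    static double errorProbability(int q) {
        return Math.pow(10.0, -q / 10.0);
    }

    // PHRED score implied by an error probability: Q = -10 * log10(p).
    static double phredFromErrorProbability(double p) {
        return -10.0 * Math.log10(p);
    }

    public static void main(String[] args) {
        // PHRED 41, as seen in recent Illumina output, is still plain Sanger.
        System.out.println("PHRED 41 encodes as '" + charFromPhred(41) + "'");

        // Assumption: two agreeing Q30 reads with independent errors.
        double combined = errorProbability(30) * errorProbability(30);
        System.out.println("Combined PHRED is about " + phredFromErrorProbability(combined));
    }
}
```

So a score of 41 sits comfortably inside the Sanger range, and a
consensus built from several reads can legitimately go well above it.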

See also:

http://seqanswers.com/forums/showthread.php?t=8895

Peter



