[Biojava-dev] Extension of the org.biojava3.sequencing.io.fastq classes

Mon Feb 27 16:12:56 UTC 2012

Peter Cock wrote:
> Hannes Brandstätter-Müller wrote:
>
>> Hi (now) fellow devs :)
>>
>> So, I spent some time this weekend on the
>> org.biojava3.sequencing.io.fastq classes.
>>
>> This is what I did:
>>
>> *) The code formatting did not match the other files, so my IDE's
>> automatic formatting changed it to the "normal Java standard". I
>> hope no one feels offended by that.
>> *) I implemented a new Fastq Reader/Writer for the new Illumina Fastq
>> formatting (version 1.8 according to Wikipedia, new this month:
>> http://en.wikipedia.org/wiki/FASTQ_format#Encoding). It's a bit
>> contradictory: some say it's directly Sanger, some say it has Phred
>> values up to 41?
>
> It *is* using the standard Sanger FASTQ encoding, so unless
> you are also doing something special with the Illumina headers
> (e.g. their new paired end naming, or quality control flags),
> there doesn't seem to be any need for a new class. Note that
> Sanger FASTQ copes fine with high PHRED qualities, and is
> used that way when describing contigs: by combining many
> reads, the probability of error goes down, so the PHRED
> score goes up.
>
> See also:
>
> http://seqanswers.com/forums/showthread.php?t=8895

Hello Peter,

Would it be useful to add some FASTQ files generated by CASAVA 1.8
to the bio* test suites? Some are linked in this post:

http://seqanswers.com/forums/showpost.php?p=41542&postcount=55

I imagine we might find a better source though.

For BioJava, it sounds like all we need is documentation on the
Illumina and Sanger variants?
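As a rough sketch of what that documentation could illustrate: Sanger FASTQ (which CASAVA 1.8 output also uses) stores each quality as the ASCII character PHRED + 33, while the older Illumina 1.3 to 1.7 files used PHRED + 64. The snippet below is plain Java, not the BioJava fastq API; the class and method names are hypothetical, chosen only for illustration.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not part of BioJava) showing how FASTQ quality
 * characters map to PHRED scores under the two common ASCII offsets.
 */
public final class FastqQualityOffsets {

    /** Sanger FASTQ, also used by Illumina CASAVA 1.8+: PHRED + 33 (covers Q0 to Q93). */
    public static final int SANGER_OFFSET = 33;

    /** Older Illumina 1.3 to 1.7 FASTQ: PHRED + 64. */
    public static final int ILLUMINA_1_3_OFFSET = 64;

    private FastqQualityOffsets() {
    }

    /** Decodes a quality string into PHRED scores using the given ASCII offset. */
    public static int[] decode(String qualities, int offset) {
        int[] scores = new int[qualities.length()];
        for (int i = 0; i < qualities.length(); i++) {
            int q = qualities.charAt(i) - offset;
            if (q < 0) {
                // A character below the offset means the wrong variant was assumed.
                throw new IllegalArgumentException(
                        "character '" + qualities.charAt(i) + "' is below offset " + offset);
            }
            scores[i] = q;
        }
        return scores;
    }

    /** Converts a PHRED score Q to the error probability p, where Q = -10 log10(p). */
    public static double errorProbability(int phred) {
        return Math.pow(10.0, -phred / 10.0);
    }

    public static void main(String[] args) {
        // 'J' is ASCII 74, i.e. Q41 in Sanger encoding.
        int[] scores = decode("#5?IJ", SANGER_OFFSET);
        System.out.println(Arrays.toString(scores)); // [2, 20, 30, 40, 41]
        // Higher PHRED means a lower error probability, as for merged contigs.
        System.out.println(errorProbability(40));
    }
}
```

Since Q41 is well inside the Sanger range, a CASAVA 1.8 quality string decodes with the ordinary Sanger offset; only the older PHRED + 64 files need a different one.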

   michael