[Biojava-dev] Extension of the org.biojava3.sequencing.io.fastq classes

Hannes Brandstätter-Müller biojava at hannes.oib.com
Mon Feb 27 06:00:19 UTC 2012


The classes will be moved soon, most likely to
org.biojava3.genome.parsers or another custom subpackage under genome.
I think I'll let the more experienced biojava devs decide where to put
them.

If you want to review the changes I made, these are the files where I
added code:

Fastq
FastqVariant
NewIlluminaFastqReader
NewIlluminaFastqWriter

test:
FastqTest

The others should only contain reformatting.

Hannes


On Mon, Feb 27, 2012 at 05:11, Michael Heuer <heuermh at gmail.com> wrote:
> Hello Hannes,
>
> Nice work!
>
> All those test cases were generated to ensure that FASTQ support was identical among all the various bio* projects. That was the whole point of the paper and of the current code.
>
> We need to maintain this support, even if Illumina did change formats. I can review your other changes this week, although with your reformatting it might be hard to see the diff.
>
>   michael
>
>
> On Feb 26, 2012, at 4:59 PM, Hannes Brandstätter-Müller <biojava at hannes.oib.com> wrote:
>
>> Hi (now) fellow devs :)
>>
>> So, I spent some time this weekend on the
>> org.biojava3.sequencing.io.fastq classes.
>>
>> This is what I did:
>>
>> *) The code formatting differed from the other files; the automatic
>> formatting in my IDE changed it to the "normal Java standard". I
>> hope no one feels offended by that.
>> *) I implemented a new Fastq reader/writer for the new Illumina Fastq
>> format (version 1.8 according to Wikipedia, new this month:
>> http://en.wikipedia.org/wiki/FASTQ_format#Encoding - it's a bit
>> contradictory, some say it's directly Sanger, some say it has Phred
>> values up to 41?)
>> *) I extended the Fastq class to be able to generate DNASequence
>> representations with the Quality (as Phred Numbers) added as Feature
>> (QualityFeature, also new)
>> *) I extended the Fastq class to have a constructor that accepts a
>> DNASequence (if the quality feature is present; might need a bit more
>> refinement there)
>> *) as a consequence, Fastq can now translate between the Fastq
>> variants and the Phred Fasta/Qual file format (I'll add a dedicated
>> parser/Fastq constructor or reader/writer for that format later, but
>> that's rather trivial)
>>
>>        Fastq sangerfastq = new Fastq("description", "ACGTA", "I?5+\"", FastqVariant.FASTQ_SANGER);
>>        DNASequence dnaSequence = sangerfastq.getDNASequence();
>>        // dnaSequence has the Phred qualities [40 30 20 10 1]
>>        Fastq illuminafastq = new Fastq(dnaSequence, FastqVariant.FASTQ_ILLUMINA);
>>        // assertEquals("h^TJA", illuminafastq.getQuality());
>>        dnaSequence = illuminafastq.getDNASequence();
>>        Fastq solexafastq = new Fastq(dnaSequence, FastqVariant.FASTQ_SOLEXA);
>>        // assertEquals("h^TJ;", solexafastq.getQuality());
>>
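
The expected strings in the example above follow from plain offset
arithmetic, so they can be checked without the biojava classes. This is
only a minimal sketch: the class and method names are made up for
illustration and are not part of org.biojava3.sequencing.io.fastq. It
assumes ASCII offset 33 for Sanger and 64 for the older Illumina and
Solexa variants, and it assumes that the Phred to Solexa conversion
rounds to the nearest integer and clamps at the Solexa minimum of -5.
Whether the committed code rounds and clamps in exactly this way is an
assumption.

```java
import java.util.Arrays;

// Hypothetical helper, not a biojava class: plain offset arithmetic only.
final class PhredOffsetSketch {
    static final int SANGER_OFFSET = 33;   // Sanger: Phred + 33
    static final int ILLUMINA_OFFSET = 64; // older Illumina: Phred + 64
    static final int SOLEXA_OFFSET = 64;   // Solexa: Solexa score + 64
    static final int SOLEXA_MIN = -5;      // lowest Solexa score

    // Turn a quality string into integer scores by subtracting the offset.
    static int[] decode(String quality, int offset) {
        int[] scores = new int[quality.length()];
        for (int i = 0; i < quality.length(); i++) {
            scores[i] = quality.charAt(i) - offset;
        }
        return scores;
    }

    // Turn integer scores back into a quality string by adding the offset.
    static String encode(int[] scores, int offset) {
        StringBuilder sb = new StringBuilder(scores.length);
        for (int score : scores) {
            sb.append((char) (score + offset));
        }
        return sb.toString();
    }

    // Q_solexa = 10 * log10(10^(Q_phred / 10) - 1), rounded and clamped
    // (rounding and clamping are assumptions of this sketch).
    static int phredToSolexa(int phred) {
        if (phred <= 0) {
            return SOLEXA_MIN; // the logarithm is undefined here
        }
        double q = 10.0 * Math.log10(Math.pow(10.0, phred / 10.0) - 1.0);
        return Math.max(SOLEXA_MIN, (int) Math.round(q));
    }

    static int[] phredToSolexa(int[] phred) {
        int[] solexa = new int[phred.length];
        for (int i = 0; i < phred.length; i++) {
            solexa[i] = phredToSolexa(phred[i]);
        }
        return solexa;
    }

    public static void main(String[] args) {
        int[] phred = decode("I?5+\"", SANGER_OFFSET);
        System.out.println(Arrays.toString(phred));                       // [40, 30, 20, 10, 1]
        System.out.println(encode(phred, ILLUMINA_OFFSET));               // h^TJA
        System.out.println(encode(phredToSolexa(phred), SOLEXA_OFFSET));  // h^TJ;
    }
}
```

Only the last character differs between the two 64-offset strings:
Phred 1 maps to about -5.9 on the Solexa scale, which the sketch clamps
to -5, giving ';' instead of 'A'.
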
>> *) I have added some test cases for my code, but I might have lowered
>> the awesome test coverage in that module. Was that generated by hand
>> or by some tool?
>>
>> I hope someone else will find that useful (at least we can boast Fastq
>> support now; someone add that to the Fastq wiki page once we release
>> 3.0.3!)
>>
>> Hannes



