[Biojava-dev] Extension of the org.biojava3.sequencing.io.fastq classes

Hannes Brandstätter-Müller biojava at hannes.oib.com
Sun Feb 26 22:59:52 UTC 2012


Hi (now) fellow devs :)

So, I spent some time this weekend on the
org.biojava3.sequencing.io.fastq classes.

This is what I did:

*) The code formatting differed from the other files; the automatic
formatting in my IDE changed it to the "normal Java standard". I
hope no one feels offended by that.
*) I implemented a new Fastq reader/writer for the new Illumina Fastq
format (version 1.8 according to Wikipedia, new this month:
http://en.wikipedia.org/wiki/FASTQ_format#Encoding - the sources are a
bit contradictory: some say it's plain Sanger encoding, some say it
has Phred values up to 41?)
*) I extended the Fastq class so it can generate DNASequence
representations with the quality (as Phred numbers) added as a Feature
(QualityFeature, also new)
*) I extended the Fastq class to have a constructor that accepts a
DNASequence (if the quality feature is present; might need a bit more
refinement there)
*) as a consequence, Fastq can now translate between the Fastq
variants and the Phred Fasta/Qual file format (I'll add a dedicated
parser/Fastq constructor or reader/writer for that format later, but
that's rather trivial)

        Fastq sangerfastq = new Fastq("description", "ACGTA", "I?5+\"", FastqVariant.FASTQ_SANGER);
        DNASequence dnaSequence = sangerfastq.getDNASequence();
        // dnaSequence has the Phred qualities [40 30 20 10 1]
        Fastq illuminafastq = new Fastq(dnaSequence, FastqVariant.FASTQ_ILLUMINA);
        // assertEquals("h^TJA", illuminafastq.getQuality());
        dnaSequence = illuminafastq.getDNASequence();
        Fastq solexafastq = new Fastq(dnaSequence, FastqVariant.FASTQ_SOLEXA);
        // assertEquals("h^TJ;", solexafastq.getQuality());

*) I have added some test cases for my code, but I might have lowered
the awesome test coverage in that module. Were the existing tests
written by hand or generated by some tool?
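
For anyone who wants to check the numbers in the example above by hand,
here is a minimal sketch of the offset arithmetic, independent of the
BioJava classes. The class and method names (QualityOffsets, decode,
encode) are made up for illustration and are not part of the patch. It
assumes the usual ASCII offsets: 33 for Sanger and 64 for the older
Illumina variant. Solexa is left out on purpose, since it uses a
different score scale and is not a plain offset shift.

```java
import java.util.Arrays;

// Hypothetical helper, not part of BioJava: converts between quality
// strings and Phred scores using a fixed ASCII offset.
final class QualityOffsets {
    static final int SANGER_OFFSET = 33;   // assumed offset for FASTQ_SANGER
    static final int ILLUMINA_OFFSET = 64; // assumed offset for FASTQ_ILLUMINA

    // Decode each quality character into its Phred score.
    static int[] decode(String quality, int offset) {
        int[] phred = new int[quality.length()];
        for (int i = 0; i < quality.length(); i++) {
            phred[i] = quality.charAt(i) - offset;
        }
        return phred;
    }

    // Encode Phred scores back into a quality string.
    static String encode(int[] phred, int offset) {
        StringBuilder sb = new StringBuilder(phred.length);
        for (int q : phred) {
            sb.append((char) (q + offset));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] phred = decode("I?5+\"", SANGER_OFFSET);
        System.out.println(Arrays.toString(phred));         // [40, 30, 20, 10, 1]
        System.out.println(encode(phred, ILLUMINA_OFFSET)); // h^TJA
    }
}
```

With these offsets a Phred score of 41 comes out as 'J' in the Sanger
encoding, so "Sanger encoding" and "Phred values up to 41" may well
both be true for the 1.8 format, but I have not verified that against
Illumina's own documentation.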

I hope someone else will find that useful (at least we can boast Fastq
support now; someone add that to the Fastq wiki page once we release
3.0.3!)

Hannes


