[Biojava-l] converting fastq format
Daniel Katzel
dkatzel at gmail.com
Thu Sep 17 02:26:10 UTC 2015
The FASTQ file I was using is part of the 1000 Genomes phase 3 dataset (very
large gzipped files, each with about 25 million records). The reads are
short, so the data is probably from an older sequencing run.
Here's the file I used:
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/sequence_read/SRR062634_1.filt.fastq.gz
I made a histogram of the encoded quality characters, keyed by ASCII code (code : count):
33 : 166838
34 : 0
35 : 100598505
36 : 26817
37 : 156873
38 : 268700
39 : 419677
40 : 807326
41 : 997720
42 : 889665
43 : 946268
44 : 2372479
45 : 4147316
46 : 760108
47 : 850433
48 : 1433894
49 : 1165379
50 : 1769347
51 : 2493316
52 : 2966864
53 : 12457233
54 : 3172484
55 : 3741809
56 : 3722004
57 : 4320581
58 : 23804570
59 : 6554713
60 : 7207725
61 : 33021639
62 : 13106991
63 : 60909837
64 : 36753951
65 : 70258165
66 : 91889938
67 : 102533947
68 : 129093976
69 : 368143099
70 : 231023980
71 : 1089945133
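A histogram like this can be produced with plain Java by streaming the gzipped file and counting the characters on every fourth line. This is a minimal sketch, not how the numbers above were generated; it assumes strict four-line records (header, sequence, "+", quality) with no wrapped sequence or quality lines, and the class name QualityHistogram is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper: counts quality characters by ASCII code in a FASTQ stream. */
final class QualityHistogram {

    /** Returns counts indexed by ASCII code (0..127); the caller closes the stream. */
    static long[] count(InputStream in) throws IOException {
        long[] counts = new long[128];
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.US_ASCII));
        String line;
        long lineNumber = 0;
        while ((line = reader.readLine()) != null) {
            // Assumes four-line records: the 4th, 8th, 12th, ... line is the quality string.
            if (lineNumber % 4 == 3) {
                for (int i = 0; i < line.length(); i++) {
                    char c = line.charAt(i);
                    if (c < 128) {      // skip anything that did not decode as ASCII
                        counts[c]++;
                    }
                }
            }
            lineNumber++;
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a .fastq.gz file
        try (InputStream in = new GZIPInputStream(Files.newInputStream(Paths.get(args[0])))) {
            long[] counts = count(in);
            for (int code = 33; code <= 126; code++) {   // printable quality range
                System.out.println(code + " : " + counts[code]);
            }
        }
    }
}
```

Because it reads line by line through a GZIPInputStream, memory use stays constant even for files with tens of millions of records.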
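The lowest code in the histogram is 33 ('!'), which means the file uses SANGER encoding (Phred+33): Solexa and Illumina 1.3+ quality characters never go below 59 (';') and 64 ('@') respectively.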
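The reasoning about the lowest code can be written down as a small range check: an encoding remains a candidate only if every observed character falls inside its legal ASCII range. This is a sketch using the commonly cited FASTQ ranges (Sanger 33..126, Solexa 59..126, Illumina 1.3+ 64..126); the class and enum names are hypothetical, not BioJava's. Note that Illumina 1.8+ also uses Phred+33, so "Sanger" here covers it too.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical helper: which quality encodings are consistent with observed characters. */
final class VariantGuess {

    enum Encoding {
        SANGER(33, 126),        // Phred+33, Q0..Q93
        SOLEXA(59, 126),        // Solexa+64, Q-5..Q62
        ILLUMINA_1_3(64, 126);  // Phred+64, Q0..Q62

        final int minAscii;
        final int maxAscii;

        Encoding(int minAscii, int maxAscii) {
            this.minAscii = minAscii;
            this.maxAscii = maxAscii;
        }
    }

    /** Returns every encoding whose legal range contains both observed extremes. */
    static Set<Encoding> candidates(int minObserved, int maxObserved) {
        Set<Encoding> result = EnumSet.noneOf(Encoding.class);
        for (Encoding e : Encoding.values()) {
            if (minObserved >= e.minAscii && maxObserved <= e.maxAscii) {
                result.add(e);
            }
        }
        return result;
    }

    /** Phred score of a Sanger-encoded (Phred+33) quality character. */
    static int sangerPhred(char c) {
        return c - 33;
    }
}
```

For the histogram above (lowest code 33, highest 71), only SANGER survives, and the highest character 'G' (71) corresponds to Q38.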
I think the problem is that the FastqWriter implementations only write Fastq
objects whose FastqVariant matches the writer's own variant. I also didn't see
any unit tests in BioJava that test converting between formats. In fact, there
are several tests that make sure the Fastq being written has the same
FastqVariant as the writer.
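For the two Phred-based encodings, converting a record is just a matter of shifting each quality character from one ASCII offset to the other, clamping scores the target cannot represent (Sanger reaches Q93, Illumina 1.3+ only Q62). This is a sketch of that arithmetic, not BioJava code; the class PhredRecode is hypothetical. Solexa scores are log-odds rather than Phred values, so converting to or from Solexa would need a non-linear mapping instead.

```java
/** Hypothetical helper: re-encodes Phred quality strings between ASCII offsets. */
final class PhredRecode {

    static final int SANGER_OFFSET = 33;    // Phred+33
    static final int ILLUMINA_OFFSET = 64;  // Phred+64 (Illumina 1.3+)

    /**
     * @param quality    encoded quality string
     * @param fromOffset ASCII offset of the input encoding
     * @param toOffset   ASCII offset of the output encoding
     */
    static String recode(String quality, int fromOffset, int toOffset) {
        int maxScore = 126 - toOffset;  // '~' (126) is the highest quality character
        StringBuilder out = new StringBuilder(quality.length());
        for (int i = 0; i < quality.length(); i++) {
            char c = quality.charAt(i);
            int score = c - fromOffset;
            if (score < 0) {
                throw new IllegalArgumentException(
                        "character '" + c + "' is below offset " + fromOffset);
            }
            // Clamp scores the target encoding cannot express.
            score = Math.min(score, maxScore);
            out.append((char) (score + toOffset));
        }
        return out.toString();
    }
}
```

For example, recode("!I", 33, 64) turns Sanger Q0 and Q40 into the Illumina 1.3+ characters "@h", and recoding "@h" from 64 back to 33 restores "!I".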
For example,
https://github.com/biojava/biojava/blob/master/biojava-sequencing/src/test/java/org/biojava/nbio/sequencing/io/fastq/IlluminaFastqWriterTest.java
has a test to make sure an IlluminaFastqWriter only writes Fastq objects
that are FastqVariant.FASTQ_ILLUMINA:
    public void testValidateNotIlluminaVariant()
    {
        IlluminaFastqWriter writer = new IlluminaFastqWriter();
        Appendable appendable = new StringBuilder();
        Fastq invalid = new FastqBuilder()
            .withDescription("description")
            .withSequence("sequence")
            .withQuality("quality_")
            .withVariant(FastqVariant.FASTQ_SANGER)
            .build();
        try
        {
            writer.append(appendable, invalid);
            fail("validate not fastq-illumina variant expected IOException");
        }
        catch (IOException e)
        {
            // expected
        }
    }
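One alternative design would be for a writer to convert a mismatched record into its own encoding instead of throwing. The sketch below models that idea with hypothetical stand-in types (ConvertingFastqWriter, Encoding, Record); it is not BioJava's Fastq/FastqWriter API, and it only handles the two Phred-based encodings, leaving Solexa out.

```java
import java.io.IOException;

/** Hypothetical sketch: a writer that converts qualities to its own encoding. */
final class ConvertingFastqWriter {

    enum Encoding {
        SANGER(33), ILLUMINA(64);

        final int offset;

        Encoding(int offset) {
            this.offset = offset;
        }
    }

    /** Minimal stand-in for a FASTQ record (not org.biojava's Fastq). */
    static final class Record {
        final String description;
        final String sequence;
        final String quality;
        final Encoding encoding;

        Record(String description, String sequence, String quality, Encoding encoding) {
            this.description = description;
            this.sequence = sequence;
            this.quality = quality;
            this.encoding = encoding;
        }
    }

    private final Encoding target;

    ConvertingFastqWriter(Encoding target) {
        this.target = target;
    }

    /** Writes one four-line record, converting its qualities to the target encoding. */
    <T extends Appendable> T append(T appendable, Record record) throws IOException {
        appendable.append('@').append(record.description).append('\n')
                .append(record.sequence).append('\n')
                .append('+').append('\n')
                .append(convert(record.quality, record.encoding)).append('\n');
        return appendable;
    }

    private String convert(String quality, Encoding from) {
        if (from == target) {
            return quality;
        }
        int maxScore = 126 - target.offset;  // highest score '~' can carry in the target
        StringBuilder out = new StringBuilder(quality.length());
        for (char c : quality.toCharArray()) {
            int score = c - from.offset;
            if (score < 0) {
                throw new IllegalArgumentException("invalid quality character '" + c + "'");
            }
            out.append((char) (Math.min(score, maxScore) + target.offset));
        }
        return out.toString();
    }
}
```

With this design, a test like the one above would instead check the converted output, e.g. that a SANGER record with quality "!I5I" comes out of an ILLUMINA writer as "@hTh".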