[Bioperl-l] Creating a fastq format file?
Heikki Lehvaslaiho
heikki.lehvaslaiho at gmail.com
Mon Apr 27 05:42:03 UTC 2009
> I have tried to summarise this in a central place:
> http://en.wikipedia.org/wiki/FASTQ_format
Torsten,
Thanks for putting this together. Very helpful.
Do you have a plan of action? Let me propose one for BioPerl. It is
based on the following assumptions:
1. There is a multitude of different ways of encoding quality values out there.
2. Bio::Seq::Quality is agnostic of any quality value range rules
3. The emerging open standard is the Sanger fastq specification
4. Open source programs use the Sanger fastq specs
From these it follows that:
1. BioPerl should support the Sanger fastq standard
1.1. It already does, and there are other SeqIO modules for dealing
with non-fastq formats.
2. BioPerl should offer simple ways of converting between quality range rules
2.1. Have a generic method accessible from Bio::Seq::Quality with
preset versions of the method for converting between known variants
(Sanger fastq and the two Illumina versions)
For example:
range_convert ($from_lower, $from_upper, $to_lower, $to_upper, $value)
throw if $value < $from_lower or $value > $from_upper
return $newvalue
range_convert_illumina2fastq(), range_convert_fastq2illumina(),
range_convert_fastq2phred(), range_convert_phred2fastq()....
(assuming that illumina 1.3 eq phred)
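To make the idea concrete, here is a minimal sketch of what such a
generic method and two presets might look like. It is written in Python
purely for illustration and is not existing BioPerl API. It assumes the
values passed in are ASCII codes, that conversion is a plain shift between
the lower bounds, and that the ranges are 33..126 for Sanger fastq and
64..126 for Illumina 1.3 (treated as phred-scaled, as assumed above).
The function names simply mirror the ones proposed above.

```python
# Assumed ASCII ranges; hypothetical constants, not part of BioPerl.
SANGER_RANGE = (33, 126)        # phred 0..93 stored as value + 33
ILLUMINA_1_3_RANGE = (64, 126)  # phred 0..62 stored as value + 64


def range_convert(from_lower, from_upper, to_lower, to_upper, value):
    """Shift an encoded quality value from one range into another."""
    if value < from_lower or value > from_upper:
        raise ValueError(
            f"value {value} outside source range {from_lower}..{from_upper}"
        )
    # Assumption: the two encodings differ only by their lower bound.
    new_value = value - from_lower + to_lower
    if new_value > to_upper:
        raise ValueError(
            f"value {value} does not fit target range {to_lower}..{to_upper}"
        )
    return new_value


def range_convert_illumina2fastq(value):
    """Preset: Illumina 1.3 ASCII code to Sanger fastq ASCII code."""
    return range_convert(*ILLUMINA_1_3_RANGE, *SANGER_RANGE, value)


def range_convert_fastq2illumina(value):
    """Preset: Sanger fastq ASCII code to Illumina 1.3 ASCII code."""
    return range_convert(*SANGER_RANGE, *ILLUMINA_1_3_RANGE, value)
```

For example, range_convert_illumina2fastq(ord("h")) gives ord("I"), both
meaning a quality of 40. The older Solexa-scaled scores are not a pure
shift and would need a separate, nonlinear mapping.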
2.2. Bio::SeqIO::Fastq::next_seq methods should convert Illumina
qualities into Sanger fastq on the fly
2.2.1. Bio::SeqIO::Fastq::next_seq should either detect the quality
value range of the incoming stream automatically or be given a keyword
parameter indicating the range.
2.2.2. Bio::SeqIO::Fastq::next_seq should throw an error if it detects
a quality value out of range.
2.2.3. Bio::SeqIO::Fastq::write_seq should throw an error if it
detects a quality value out of range.
2.2.4. It would be useful but not absolutely necessary for
Bio::SeqIO::Fastq::write_seq to be able to write out in Illumina
ranges
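
As a rough illustration of points 2.2 to 2.2.2, the sketch below (Python,
for illustration only, not existing BioPerl code) converts one quality
line to Sanger fastq while reading. The names guess_offset, to_sanger and
the "variant" keyword are hypothetical. It assumes only two variants,
Sanger (value + 33) and phred-scaled Illumina (value + 64), and it does
not handle the older Solexa scores that can go below zero.

```python
SANGER_OFFSET = 33
ILLUMINA_OFFSET = 64
MAX_ASCII = 126  # highest printable character allowed in a quality line

# Hypothetical keyword values for the caller-supplied variant.
OFFSETS = {"sanger": SANGER_OFFSET, "illumina": ILLUMINA_OFFSET}


def guess_offset(quality_line):
    """Heuristic: a character below '@' (ASCII 64) rules out value + 64."""
    lowest = min(ord(c) for c in quality_line)
    if lowest < SANGER_OFFSET:
        raise ValueError(f"character code {lowest} is below any known range")
    return SANGER_OFFSET if lowest < ILLUMINA_OFFSET else ILLUMINA_OFFSET


def to_sanger(quality_line, variant=None):
    """Return the quality line re-encoded as Sanger fastq (value + 33)."""
    if not quality_line:
        return ""
    offset = OFFSETS[variant] if variant else guess_offset(quality_line)
    converted = []
    for char in quality_line:
        code = ord(char)
        score = code - offset
        # Throw on anything outside the declared or detected range.
        if score < 0 or code > MAX_ASCII:
            raise ValueError(f"quality character {char!r} out of range")
        converted.append(chr(score + SANGER_OFFSET))
    return "".join(converted)
```

With this sketch, to_sanger("hhhh") returns "IIII", and
to_sanger("II5I", variant="illumina") raises an error because "5" is below
the Illumina range. Note that the automatic guess is only a heuristic: a
Sanger line made up entirely of qualities of 31 or more contains no
character below "@" and would be mistaken for Illumina, which is why the
keyword parameter is worth having.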
What do you think?
-Heikki
2009/4/26 Torsten Seemann <torsten.seemann at infotech.monash.edu.au>:
>> > This might be a good place to ask the question: having looked at the
>> > fastq.pm page, is the fastq format defined (only) by a "@" followed by a
>> > sequence line and a "+" header followed by a quality line and the two
>> > headers have to agree? Now that Illumina is using phred scaling, are
>> > 'Sanger' and 'Illumina' versions the same?
>>
>> No they aren't the same, Illumina still encodes the ascii as value + 64
>> and Sanger as value + 33.
>>
>
> Illumina have now CHANGED how they calculate the quality value however in
> the last month or so... Their Q range used to be -5..40 mapped to ASCII 64+,
> but now they produce Q >= 0 and it is unclear if they start at 69 or 64
> now...
>
> I have tried to summarise this in a central place:
>
> http://en.wikipedia.org/wiki/FASTQ_format
>
> Corrections welcome!
>
>
> --Torsten Seemann
> --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash
> University, AUSTRALIA
>
--
-Heikki
Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +27 (0)714328090
Sent from Claremont, WC, South Africa