[Bioperl-l] Creating a fastq format file?

Heikki Lehvaslaiho heikki.lehvaslaiho at gmail.com
Mon Apr 27 09:38:40 UTC 2009


I convinced at least myself, to the degree that I have written the
range_convert() method, with plenty of tests. I mention this now so
that no one else needs to start thinking through all the edge values.
:)

I'll contribute it to the code base once there is a consensus on the
best way forward.
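
As an illustration only (this is not the range_convert() code mentioned
above, and it is Python rather than BioPerl), here is a minimal sketch of
one possible reading of the signature quoted below. It assumes a linear
rescale between the two ranges with rounding to the nearest integer; the
thread does not say how values are actually mapped, so that mapping is an
assumption.

```python
def range_convert(from_lower, from_upper, to_lower, to_upper, value):
    """Map value from [from_lower, from_upper] onto [to_lower, to_upper].

    Assumption: a linear rescale, rounded to the nearest integer.
    The argument order follows the signature proposed in the thread.
    """
    if from_lower >= from_upper:
        raise ValueError("source range must satisfy from_lower < from_upper")
    # "throw if $value < $from_lower or $value > $from_upper"
    if value < from_lower or value > from_upper:
        raise ValueError(
            f"value {value} is outside the range {from_lower}..{from_upper}"
        )
    span_from = from_upper - from_lower
    span_to = to_upper - to_lower
    # Multiply before dividing so the end points map exactly.
    return to_lower + round((value - from_lower) * span_to / span_from)
```

The edge values are the interesting cases: both ends of the source range
should land exactly on the ends of the target range, and anything outside
the source range should raise rather than be silently clipped.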

    -Heikki

2009/4/27 Heikki Lehvaslaiho <heikki.lehvaslaiho at gmail.com>:
>> I have tried to summarise this in a central place:
>> http://en.wikipedia.org/wiki/FASTQ_format
>
> Torsten,
>
> Thanks for putting this together. Very helpful.
>
> Do you have a plan of action?  Let me propose one for BioPerl. It is
> based on the following assumptions:
>
> 1. There is a multitude of different ways of coding quality values out there.
> 2. Bio::Seq::Quality is agnostic of any quality value range rules
> 3. The emerging open standard is the Sanger fastq specification
> 4. Open source programs use the Sanger fastq specs
>
>
> From these it follows that:
>
>
> 1. BioPerl should support the Sanger fastq standard
>
> 1.1. It already does, and there are other SeqIO modules for dealing
> with non-fastq formats.
>
> 2. BioPerl should offer simple ways of converting between quality range rules
>
> 2.1. Have a generic method accessible from Bio::Seq::Quality with
> preset versions of the method for converting between known variants
> (Sanger fastq and the two Illumina versions)
>
> For example:
>
> range_convert ($from_lower, $from_upper, $to_lower, $to_upper, $value)
>  throw if $value < $from_lower or $value > $from_upper
>  return $newvalue
>
> range_convert_illumina2fastq(), range_convert_fastq2illumina(),
> range_convert_fastq2phred(),  range_convert_phred2fastq()....
>
> (assuming that illumina 1.3 eq phred)
>
> 2.2. Bio::SeqIO::Fastq::next_seq methods should convert Illumina
> qualities into Sanger fastq on the fly
>
> 2.2.1. Bio::SeqIO::Fastq::next_seq should either detect the quality
> value range of the incoming stream automatically or be given a keyword
> parameter indicating the range.
>
> 2.2.2. Bio::SeqIO::Fastq::next_seq should throw an error if it detects
> a quality value out of range.
>
> 2.2.3. Bio::SeqIO::Fastq::write_seq should throw an error if it
> detects a quality value out of range.
>
> 2.2.4. It would be useful but not absolutely necessary for
> Bio::SeqIO::Fastq::write_seq to be able to write out in Illumina
> ranges
>
>
> What do you think?
>
>    -Heikki
>
> 2009/4/26 Torsten Seemann <torsten.seemann at infotech.monash.edu.au>:
>>> > This might be a good place to ask the question: having looked at the
>>> > fastq.pm page, is the fastq format defined (only) by a "@" followed by a
>>> > sequence line and a "+" header followed by a quality line and the two
>>> > headers have to agree? Now that Illumina is using phred scaling, are
>>> > 'Sanger' and 'Illumina' versions the same?
>>>
>>> No, they aren't the same: Illumina still encodes the ASCII as value + 64
>>> and Sanger as value + 33.
>>>
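
The two offsets mentioned above (value + 64 for Illumina, value + 33 for
Sanger) can be illustrated with a short Python sketch. The function names
are hypothetical and are not part of BioPerl. The sketch only re-offsets
the characters; it assumes the underlying quality values are already on
the same scale and are not negative.

```python
SANGER_OFFSET = 33    # Sanger fastq: ASCII code = value + 33
ILLUMINA_OFFSET = 64  # Illumina:     ASCII code = value + 64


def decode_quality(qual_string, offset):
    """Turn a quality string into a list of integer quality values."""
    return [ord(ch) - offset for ch in qual_string]


def encode_quality(values, offset):
    """Turn integer quality values back into a quality string."""
    return "".join(chr(v + offset) for v in values)


def illumina_to_sanger(qual_string):
    """Re-encode an Illumina (+64) quality string with the Sanger (+33) offset."""
    values = decode_quality(qual_string, ILLUMINA_OFFSET)
    # Assumption: negative values are treated as out of range here.
    if any(v < 0 for v in values):
        raise ValueError("quality value below 0 cannot be written with offset 33")
    return encode_quality(values, SANGER_OFFSET)
```

For example, `illumina_to_sanger("h@")` decodes to the values 40 and 0 and
re-encodes them as `"I!"`.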
>>
>> However, in the last month or so Illumina have CHANGED how they
>> calculate the quality value... Their Q range used to be -5..40 mapped to
>> ASCII 64+, but now they produce Q >= 0 and it is unclear whether they
>> start at 69 or 64 now...
>>
>> I have tried to summarise this in a central place:
>>
>> http://en.wikipedia.org/wiki/FASTQ_format
>>
>> Corrections welcome!
>>
>>
>> --Torsten Seemann
>> --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash
>> University, AUSTRALIA
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>
>
> --
>    -Heikki
> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
> cell: +27 (0)714328090
> Sent from Claremont, WC, South Africa
>



-- 
    -Heikki
Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +27 (0)714328090
Sent from Claremont, WC, South Africa



