[Bioperl-l] Creating a fastq format file?
Tristan Lefebure
tristan.lefebure at gmail.com
Tue Jun 2 12:24:21 EDT 2009
On Monday 27 April 2009 05:38:40 Heikki Lehvaslaiho wrote:
> I convinced at least myself to the degree that I wrote
> the range_convert() method - with plenty of tests. I
> mention this now so that no-one else needs to start
> thinking through all the edge values.
>
> :)
>
> I'll contribute it to the code base once there is a
> consensus on the best way forward.
>
Heikki,
This thread has been quiet for a while, but I don't see
anything new in Bio::Seq::Quality. Did we reach a consensus
or are you waiting for some more discussion on the subject?
(I'm pretty impatient to see bioperl handling both sanger
and illumina ranges on the fly!)
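
To make the request concrete, here is a rough sketch of the kind of
on-the-fly conversion discussed in the thread below, written in Python
only so that it is self-contained. It is not BioPerl code and it is not
Heikki's range_convert(): the function names (decode_quals,
encode_quals, illumina13_to_sanger) are made up for illustration. It
assumes Illumina 1.3+ strings holding Phred scores 0..62 at ASCII
offset 64, and Sanger strings holding Phred scores 0..93 at offset 33.
The older Solexa-scaled scores (-5..40) would need a real score
transformation, which this does not attempt.

```python
SANGER_OFFSET = 33    # Sanger fastq: ASCII code = Q + 33
ILLUMINA_OFFSET = 64  # Illumina fastq: ASCII code = Q + 64


def decode_quals(qual_string, offset, lower, upper):
    """Turn a quality string into a list of integer scores.

    Raises ValueError if a decoded score falls outside lower..upper
    (hypothetical helper, mirroring the "throw if out of range" idea).
    """
    scores = []
    for ch in qual_string:
        q = ord(ch) - offset
        if q < lower or q > upper:
            raise ValueError(
                f"quality {q} (char {ch!r}) outside {lower}..{upper}")
        scores.append(q)
    return scores


def encode_quals(scores, offset, lower, upper):
    """Turn integer scores back into a quality string, with range check."""
    chars = []
    for q in scores:
        if q < lower or q > upper:
            raise ValueError(f"quality {q} outside {lower}..{upper}")
        chars.append(chr(q + offset))
    return "".join(chars)


def illumina13_to_sanger(qual_string):
    """Re-encode an Illumina 1.3+ quality string as Sanger fastq.

    Assumption: both sides are Phred-scaled, so only the ASCII offset
    changes (64 -> 33); the scores themselves are left untouched.
    """
    scores = decode_quals(qual_string, ILLUMINA_OFFSET, 0, 62)
    return encode_quals(scores, SANGER_OFFSET, 0, 93)
```

Under those assumptions, illumina13_to_sanger("hhhh") gives "IIII"
(Q40 in both encodings), while a Sanger-encoded string such as "!!!!"
raises an error because the decoded scores would be negative.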
--Tristan
> -Heikki
>
> 2009/4/27 Heikki Lehvaslaiho <heikki.lehvaslaiho at gmail.com>:
> >> I have tried to summarise this in a central place:
> >> http://en.wikipedia.org/wiki/FASTQ_format
> >
> > Torsten,
> >
> > Thanks for putting this together. Very helpful.
> >
> > Do you have a plan of action? Let me propose one for
> > BioPerl. It is based on the following assumptions:
> >
> > 1. There is a multitude of different ways of coding
> > quality values out there.
> >
> > 2. Bio::Seq::Quality is agnostic of any quality value
> > range rules.
> >
> > 3. The emerging open standard is the Sanger fastq
> > specification.
> >
> > 4. Open source programs use the Sanger fastq specs.
> >
> >
> > From these it follows that:
> >
> >
> > 1. BioPerl should support Sanger fastq standard
> >
> > 1.1. it already does and there are other SeqIO modules
> > for dealing with other non-fastq formats.
> >
> > 2. BioPerl should offer simple ways of converting
> > between quality range rules
> >
> > 2.1. Have a generic method accessible from
> > Bio::Seq::Quality with preset versions of the method
> > for converting between known variants (Sanger fastq and
> > the two Illumina versions)
> >
> > For example:
> >
> > range_convert($from_lower, $from_upper, $to_lower,
> >               $to_upper, $value)
> >   throw if $value < $from_lower or $value > $from_upper
> >   return $newvalue
> >
> > range_convert_illumina2fastq(),
> > range_convert_fastq2illumina(),
> > range_convert_fastq2phred(),
> > range_convert_phred2fastq()....
> >
> > (assuming that illumina 1.3 eq phred)
> >
> > 2.2. Bio::SeqIO::Fastq::next_seq methods should convert
> > Illumina qualities into Sanger fastq on the fly
> >
> > 2.2.1. Bio::SeqIO::Fastq::next_seq should detect the
> > quality value range of the incoming stream, either
> > automatically or from a keyword parameter indicating
> > the range.
> >
> > 2.2.2. Bio::SeqIO::Fastq::next_seq should throw an
> > error if it detects a quality value out of range.
> >
> > 2.2.3. Bio::SeqIO::Fastq::write_seq should throw an
> > error if it detects a quality value out of range.
> >
> > 2.2.4. It would be useful but not absolutely necessary
> > for Bio::SeqIO::Fastq::write_seq to be able to write
> > out in Illumina ranges
> >
> >
> > What do you think?
> >
> > -Heikki
> >
> > 2009/4/26 Torsten Seemann <torsten.seemann at infotech.monash.edu.au>:
> >>> > This might be a good place to ask the question:
> >>> > having looked at the fastq.pm page, is the fastq
> >>> > format defined (only) by a "@" followed by a
> >>> > sequence line and a "+" header followed by a
> >>> > quality line and the two headers have to agree? Now
> >>> > that Illumina is using phred scaling, are 'Sanger'
> >>> > and 'Illumina' versions the same?
> >>>
> >>> No, they aren't the same: Illumina still encodes the
> >>> ASCII as value + 64 and Sanger as value + 33.
> >>
> >> Illumina have now CHANGED how they calculate the
> >> quality value however in the last month or so... Their
> >> Q range used to be -5..40 mapped to ASCII 64+, but now
> >> they produce Q >= 0 and it is unclear if they start at
> >> 69 or 64 now...
> >>
> >> I have tried to summarise this in a central place:
> >>
> >> http://en.wikipedia.org/wiki/FASTQ_format
> >>
> >> Corrections welcome!
> >>
> >>
> >> --Torsten Seemann
> >> --Victorian Bioinformatics Consortium, Dept.
> >> Microbiology, Monash University, AUSTRALIA
> >
> > --
> > -Heikki
> > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
> > cell: +27 (0)714328090
> > Sent from Claremont, WC, South Africa