[Bioperl-l] Creating a fastq format file?

Chris Fields cjfields at illinois.edu
Wed May 13 13:55:17 EDT 2009


Heikki,

Did you still want to commit this?  I think it's a good idea and would  
be worth including in the next 1.6 point release.

chris

------------------------------------------------------------
I convinced myself, at least, to the point that I wrote the
range_convert() method, with plenty of tests. I mention this now so
that no one else needs to start thinking through all the edge values.
:)

I'll contribute it to the code base once there is consensus on the best
way forward.

     -Heikki

2009/4/27 Heikki Lehvaslaiho <heikki.lehvaslaiho at gmail.com>:
 >> I have tried to summarise this in a central place:
 >> http://en.wikipedia.org/wiki/FASTQ_format
 >
 > Torsten,
 >
 > Thanks for putting this together. Very helpful.
 >
 > Do you have a plan of action?  Let me propose one for BioPerl. It is
 > based on the following assumptions:
 >
 > 1. There is a multitude of different ways of coding quality values
 > out there.
 > 2. Bio::Seq::Quality is agnostic of any quality value range rules
 > 3. The emerging open standard is the Sanger fastq specification
 > 4. Open source programs use the Sanger fastq specs
 >
 >
 > From these it follows that:
 >
 >
 > 1. BioPerl should support Sanger fastq standard
 >
 > 1.1. it already does and there are other SeqIO modules for dealing
 > with other non-fastq formats.
 >
 > 2. BioPerl should offer simple ways of converting between quality
 > range rules
 >
 > 2.1. Have a generic method accessible from Bio::Seq::Quality with
 > preset versions of the method for converting between known variants
 > (Sanger fastq and the two Illumina versions)
 >
 > For example:
 >
 > range_convert ($from_lower, $from_upper, $to_lower, $to_upper, $value)
 >  throw if $value < $from_lower or $value > $from_upper
 >  return $newvalue
 >
 > range_convert_illumina2fastq(), range_convert_fastq2illumina(),
 > range_convert_fastq2phred(),  range_convert_phred2fastq()....
 >
 > (assuming that illumina 1.3 eq phred)
 >
 > 2.2. Bio::SeqIO::Fastq::next_seq methods should convert Illumina
 > qualities into Sanger fastq on the fly
 >
 > 2.2.1. Bio::SeqIO::Fastq::next_seq should either detect the quality
 > value range of the incoming stream automatically or be given a keyword
 > parameter indicating the range.
 >
 > 2.2.2. Bio::SeqIO::Fastq::next_seq should throw an error if it
 > detects a quality value out of range.
 >
 > 2.2.3. Bio::SeqIO::Fastq::write_seq should throw an error if it
 > detects a quality value out of range.
 >
 > 2.2.4. It would be useful but not absolutely necessary for
 > Bio::SeqIO::Fastq::write_seq to be able to write out in Illumina
 > ranges
 >
 >
 > What do you think?
 >
 >    -Heikki
 >
 > 2009/4/26 Torsten Seemann <torsten.seemann at infotech.monash.edu.au>:
 >>> > This might be a good place to ask the question: having looked at
 >>> > the fastq.pm page, is the fastq format defined (only) by a "@"
 >>> > header followed by a sequence line and a "+" header followed by a
 >>> > quality line, where the two headers have to agree? Now that
 >>> > Illumina is using phred scaling, are 'Sanger' and 'Illumina'
 >>> > versions the same?
 >>>
 >>> No, they aren't the same: Illumina still encodes the ASCII as
 >>> value + 64 and Sanger as value + 33.
 >>>
 >>
 >> Illumina have now CHANGED how they calculate the quality value however
 >> in the last month or so... Their Q range used to be -5..40 mapped to
 >> ASCII 64+, but now they produce Q >= 0 and it is unclear if they start
 >> at 69 or 64 now...
 >>
 >> I have tried to summarise this in a central place:
 >>
 >> http://en.wikipedia.org/wiki/FASTQ_format
 >>
 >> Corrections welcome!
 >>
 >>
 >> --Torsten Seemann
 >> --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash
 >> University, AUSTRALIA
 >> _______________________________________________
 >> Bioperl-l mailing list
 >> Bioperl-l at lists.open-bio.org
 >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
 >>
 >
 >
 >
 > --
 >    -Heikki
 > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
 > cell: +27 (0)714328090
 > Sent from Claremont, WC, South Africa
 >
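The range_convert() proposal quoted above can be sketched roughly as
follows. This is not the method Heikki wrote, and it is Python rather
than Perl purely for illustration. It assumes the two ranges differ only
by their starting offset (a plain shift), which is a guess on my part; the
ASCII bounds used for the preset (64..126 for Illumina 1.3+, 33..126 for
Sanger fastq) are likewise assumptions not stated in the thread.

```python
def range_convert(from_lower, from_upper, to_lower, to_upper, value):
    """Move `value` from a source range into a target range.

    Assumption: the mapping is a simple shift of the lower bound.
    """
    # Throw if the input is outside the source range, as proposed.
    if value < from_lower or value > from_upper:
        raise ValueError(
            f"value {value} outside source range {from_lower}..{from_upper}"
        )
    new_value = value - from_lower + to_lower
    # Also refuse results that do not fit into the target range.
    if new_value > to_upper:
        raise ValueError(
            f"converted value {new_value} exceeds target upper bound {to_upper}"
        )
    return new_value


def range_convert_illumina2fastq(value):
    """Hypothetical preset: Illumina 1.3+ ASCII code -> Sanger ASCII code."""
    return range_convert(64, 126, 33, 126, value)


# Illumina 'h' (ASCII 104, Q40) becomes Sanger 'I' (ASCII 73, Q40).
converted = chr(range_convert_illumina2fastq(ord("h")))
```

A shift is enough only when both encodings use the same quality scale;
the older -5..40 Illumina values mentioned below would need a different
mapping.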



-- 
     -Heikki
Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +27 (0)714328090
Sent from Claremont, WC, South Africa
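
The "value + 64" versus "value + 33" encodings discussed in the thread,
together with the out-of-range check from points 2.2.2 and 2.2.3, might
look something like the sketch below. It is in Python and is not BioPerl
code; the function names, the example quality line, and the 0..62 bound
for Illumina 1.3+ are assumptions made for illustration.

```python
def decode_quality(quality_line, offset, q_min, q_max):
    """Turn a fastq quality line into integer scores, checking the range."""
    scores = []
    for char in quality_line:
        q = ord(char) - offset
        # Throw on any score outside the range the caller declared.
        if q < q_min or q > q_max:
            raise ValueError(
                f"quality {q} (char {char!r}) outside range {q_min}..{q_max}"
            )
        scores.append(q)
    return scores


def encode_quality(scores, offset):
    """Turn integer scores back into a quality line with the given offset."""
    return "".join(chr(q + offset) for q in scores)


# Hypothetical Illumina 1.3+ quality line (offset 64, scores assumed 0..62).
illumina_line = "hhhB"
scores = decode_quality(illumina_line, offset=64, q_min=0, q_max=62)

# Re-encode with the Sanger fastq offset of 33.
sanger_line = encode_quality(scores, offset=33)
```

With a declared range of Q >= 0, a character such as ';' (ASCII 59, which
is -5 at offset 64) is rejected, which is the kind of error a reader
could raise when the stream does not match the stated variant.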

