[Open-bio-l] [Bioperl-l] FASTQ support in Biopython, BioPerl, and EMBOSS

Peter biopython at maubp.freeserve.co.uk
Mon Jul 27 11:51:13 UTC 2009


On Sat, Jul 25, 2009 at 8:50 PM, Chris Fields<cjfields at illinois.edu> wrote:
>
> From this it could be summarized that converting to sanger format is least
> problematic, as possible issues may be encountered when converting to the
> other variants.  We'll need to fix the solexa quality calculations in the
> BioPerl parser as noted in your previous post; I'll work on that.
>

BioPerl SVN (revision 15887, just updated on the off chance you
have committed any fixes recently) also has a problem going the
other way (from FASTQ Sanger to FASTQ Solexa):

$ more sanger_faked.fastq
@Test PHRED qualities from 40 to 0 inclusive
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTN
+
IHGFEDCBA@?>=<;:9876543210/.-,+*)('&%$#"!

$ perl bioperl_sanger2solexa.pl < sanger_faked.fastq
@Test PHRED qualities from 40 to 0 inclusive
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTN
+Test PHRED qualities from 40 to 0 inclusive
hgfedcba`_^]\[ZYXWVUTSRQPONMLKJHGFEDB@><

Depending on your email viewer this may not be obvious, but
the sequence line is 41 characters long while the quality line
is only 40 characters, so one quality score has been lost. And
again, I also suspect a problem in the mapping itself.
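
As a point of reference, here is a minimal Python sketch of the
Sanger to Solexa re-encoding. It is not the BioPerl code. It
assumes the usual conversion Q_solexa = 10*log10(10^(Q_phred/10) - 1),
rounded to the nearest integer, with ASCII offsets 33 (Sanger) and
64 (Solexa). The floor of -5 for Solexa scores and the function
names are assumptions made for this illustration.

```python
import math

SANGER_OFFSET = 33   # ASCII offset for Sanger FASTQ (PHRED scores)
SOLEXA_OFFSET = 64   # ASCII offset for Solexa FASTQ
SOLEXA_MIN = -5      # assumed floor for Solexa scores


def phred_to_solexa(phred):
    """Map one PHRED score to the nearest integer Solexa score."""
    if phred <= 0:
        # 10**0 - 1 == 0, so the logarithm is undefined; use the floor.
        return SOLEXA_MIN
    solexa = 10 * math.log10(10 ** (phred / 10) - 1)
    return max(SOLEXA_MIN, round(solexa))


def sanger_to_solexa_quality(sanger_quality):
    """Re-encode a Sanger FASTQ quality string as Solexa FASTQ."""
    return "".join(
        chr(phred_to_solexa(ord(char) - SANGER_OFFSET) + SOLEXA_OFFSET)
        for char in sanger_quality
    )


# Quality line from sanger_faked.fastq (PHRED 40 down to 0).
sanger = "IHGFEDCBA@?>=<;:9876543210/.-,+*)('&%$#\"!"
# Quality line printed by the BioPerl script in the transcript above.
observed = r"hgfedcba`_^]\[ZYXWVUTSRQPONMLKJHGFEDB@><"
expected = sanger_to_solexa_quality(sanger)

# One output character per input character, so the first two
# lengths should agree; the observed line is one short.
print(len(sanger), len(expected), len(observed))
```

Comparing expected with observed position by position is one way
to check whether, and where, the two mappings disagree at the low
quality end.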

Peter



