[Bioperl-l] Next-Gen and the next point release - updates

Peter biopython at maubp.freeserve.co.uk
Tue Sep 1 15:33:13 UTC 2009


On Thu, Aug 27, 2009 at 12:55 PM, Peter wrote:
>> The two conversions to solexa are still failing.  I'm not sure but I think
>> it's something fairly simple, but I can't work on it until Friday (got too
>> many other things on my plate ATM).  If I get stumped I'll post a message.
>
> ...
>
> This should narrow it down - the bug is in mapping PHRED
> scores (from either Sanger or Illumina 1.3+ files) to the
> Solexa encoding.
>
> Peter
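
For reference, the PHRED to Solexa mapping mentioned in the quoted text can
be sketched in Python as below. This is only an illustration of the usual
formula, Q_solexa = 10 * log10(10^(Q_phred/10) - 1), not BioPerl's code. The
function name solexa_from_phred is hypothetical, and rounding to an integer
with a floor of -5 is an assumption about how the result gets written back
out as a single character.

```python
import math


def solexa_from_phred(phred):
    """Map an integer PHRED score to the nearest Solexa score (sketch)."""
    if phred < 0:
        raise ValueError("PHRED scores cannot be negative")
    if phred == 0:
        # 10**0 - 1 == 0, so the logarithm is undefined here;
        # assume the lowest Solexa score is used instead.
        return -5
    solexa = 10 * math.log10(10 ** (phred / 10) - 1)
    # Assumption: clamp at -5, the lowest score fastq-solexa can encode.
    return max(-5, round(solexa))
```

The two scales differ noticeably only at low quality. PHRED 0 has no exact
Solexa equivalent, so it needs a special case.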

Hi Chris,

I've just noticed BioPerl is treating invalid characters in the quality
string as a warning condition (not an error):
http://lists.open-bio.org/pipermail/open-bio-l/2009-September/000568.html

For fastq-sanger and fastq-illumina, the invalid characters appear to be
replaced with PHRED 0 (character "!" or "@" respectively), which is
reasonable. For fastq-solexa to fastq-solexa, however, the equivalent
minimum, Solexa -5 (ASCII 59, character ";"), is not used. Is this a bug?
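
To make those characters concrete, here is a minimal Python sketch (not
BioPerl's or Biopython's actual code) of the lowest valid quality character
in each variant. It assumes the usual ASCII offsets: 33 for fastq-sanger and
64 for both fastq-solexa and fastq-illumina. The names FASTQ_VARIANTS and
lowest_quality_char are made up for illustration.

```python
# (ASCII offset, minimum score) for each FASTQ variant.
FASTQ_VARIANTS = {
    "fastq-sanger": (33, 0),     # PHRED 0 and up, offset 33
    "fastq-solexa": (64, -5),    # Solexa -5 and up, offset 64
    "fastq-illumina": (64, 0),   # PHRED 0 and up, offset 64
}


def lowest_quality_char(variant):
    """Return the character that encodes the variant's minimum score."""
    offset, min_score = FASTQ_VARIANTS[variant]
    return chr(offset + min_score)
```

If an invalid character is replaced with the lowest score, then by this
reasoning the replacement would be "!" for fastq-sanger, "@" for
fastq-illumina, and ";" for fastq-solexa.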

Also, in all these cases there is currently a spurious "data loss" warning
(note that it lists no offending values):

$ ./bioperl_sanger2sanger.pl < error_qual_null.fastq

--------------------- WARNING ---------------------
MSG: Unknown symbol with ASCII value 0 outside of quality range,
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: Data loss for sanger: following values exceed max 93

---------------------------------------------------
@SLXA-B3_649_FC8437_R1_1_1_850_123
GAGGGTGTTGATCATGATGATGGCG
+
YYY!YYYYYYYYYWYYWYYSYYYSY
@SLXA-B3_649_FC8437_R1_1_1_397_389
GGTTTGAGAAAGAGAAATGAGATAA
+
YYYYYYYYYWYYYYWWYYYWYWYWW
@SLXA-B3_649_FC8437_R1_1_1_850_123
GAGGGTGTTGATCATGATGATGGCG
+
YYYYYYYYYYYYYWYYWYYSYYYSY
@SLXA-B3_649_FC8437_R1_1_1_362_549
GGAAACAAAGTTTTTCTCAACATAG
+
YYYYYYYYYYYYYYYYYYWWWWYWY
@SLXA-B3_649_FC8437_R1_1_1_183_714
GTATTATTTAATGGCATACACTCAA
+
YYYYYYYYYYWYYYYWYWWUWWWQQ
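
As a cross-check on why the "data loss" warning looks spurious, here is a
rough Python sketch of lenient Sanger decoding. It assumes the first record
in error_qual_null.fastq had a NUL (ASCII 0) where the output above shows
"!"; that is inferred from the file name and the first warning, not taken
from the input file itself. The function name decode_sanger_lenient is
hypothetical, and this is not how BioPerl implements it.

```python
SANGER_OFFSET = 33
SANGER_MAX_PHRED = 93  # ASCII 126 ("~") minus the offset


def decode_sanger_lenient(quality_string):
    """Decode to PHRED scores, replacing invalid characters with PHRED 0."""
    scores = []
    invalid_positions = []
    for position, char in enumerate(quality_string):
        score = ord(char) - SANGER_OFFSET
        if 0 <= score <= SANGER_MAX_PHRED:
            scores.append(score)
        else:
            # Outside the valid range (NUL gives -33): note it, use PHRED 0.
            invalid_positions.append(position)
            scores.append(0)
    return scores, invalid_positions


# Assumed input: the first quality line above, with NUL in place of "!".
raw = "YYY\x00YYYYYYYYYWYYWYYSYYYSY"
scores, invalid = decode_sanger_lenient(raw)
over_max = [s for s in scores if s > SANGER_MAX_PHRED]
encoded = "".join(chr(s + SANGER_OFFSET) for s in scores)
```

Under these assumptions only one position is invalid, over_max is empty, and
the re-encoded string matches the first quality line in the output above, so
nothing actually exceeds 93.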

Regards,

Peter



