[Bioperl-l] Next-Gen and the next point release - updates

Chris Fields cjfields at illinois.edu
Wed Aug 26 22:52:13 EDT 2009


On Aug 26, 2009, at 4:16 PM, Peter wrote:

> It is looking much better than yesterday - nice work :)
> However, there are a few rough edges still.

Not unexpected, actually.

> ===========================
> Evil wrapping
> ===========================
> Chris - Did you get the zip file of FASTQ examples I sent off list?
> One of these was the evil_wrapping.fastq file already in Biopython
> CVS/git (under a new name). This is intended as a real torture test,
> with line wrapped quality strings where plenty of the lines start
> with "+" or "@" characters.
> Bioperl doesn't like this file at all - but I have not dug into why.

Now fixed; I've saved this as very_tricky.fastq, but it's the same file.
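
For anyone following along, here is a minimal Python sketch of one way to cope with this kind of wrapping. It is not the BioPerl fix or the Biopython parser; the function name parse_wrapped_fastq is hypothetical. The assumption is that a sequence line never starts with "+", and that the quality string is complete once it is as long as the sequence, so a leading "@" or "+" inside the quality block is never mistaken for a new record.

```python
def parse_wrapped_fastq(lines):
    """Parse line-wrapped FASTQ into (title, sequence, quality) tuples.

    Hypothetical helper for illustration only. Quality lines are
    consumed until their total length matches the sequence length,
    so quality lines starting with "@" or "+" are handled safely.
    """
    it = iter(line.rstrip("\r\n") for line in lines)
    records = []
    line = next(it, None)
    while line is not None:
        if not line:  # tolerate blank lines between records
            line = next(it, None)
            continue
        if not line.startswith("@"):
            raise ValueError("expected '@' title line, got %r" % line)
        title = line[1:]

        # Sequence lines run until the first line starting with "+".
        seq_parts = []
        line = next(it, None)
        while line is not None and not line.startswith("+"):
            seq_parts.append(line)
            line = next(it, None)
        if line is None:
            raise ValueError("missing '+' separator for %r" % title)
        seq = "".join(seq_parts)

        # Quality lines run until we have as many characters as bases.
        qual = ""
        line = next(it, None)
        while line is not None and len(qual) < len(seq):
            qual += line
            line = next(it, None)
        if len(qual) != len(seq):
            raise ValueError("quality/sequence length mismatch for %r" % title)
        records.append((title, seq, qual))
    return records
```

The length check is the key design choice: the record boundary is decided by counting characters, not by looking for the next "@".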

> ===========================
> Sanger To Illumina 1.3+
> ===========================
> When mapping a Sanger FASTQ file with very high scores to Illumina,
> these don't get the maximum value imposed (ASCII 126, tilde). e.g.
...

Yes, I know where that one is going wrong.  The bounds check is now
fixed for the above.  It is partly related to the issue below.
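
A minimal Python sketch of the bound in question, assuming the usual offsets (Sanger PHRED + 33, Illumina 1.3+ PHRED + 64) and a ceiling of ASCII 126. This is an illustration, not the BioPerl code; the function name sanger_to_illumina is hypothetical.

```python
SANGER_OFFSET = 33     # Sanger FASTQ: ASCII = PHRED + 33
ILLUMINA_OFFSET = 64   # Illumina 1.3+ FASTQ: ASCII = PHRED + 64
MAX_ASCII = 126        # "~", the highest printable character


def sanger_to_illumina(quality):
    """Re-encode a Sanger quality string as Illumina 1.3+.

    PHRED scores above 62 cannot be represented with offset 64,
    so they are clamped to 62 (ASCII 126) rather than dropped.
    """
    max_phred = MAX_ASCII - ILLUMINA_OFFSET  # 62
    return "".join(
        chr(min(ord(c) - SANGER_OFFSET, max_phred) + ILLUMINA_OFFSET)
        for c in quality
    )
```

With this clamp, any Sanger score from 62 to 93 comes out as "~".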

> ===========================
> Sanger To Solexa
> ===========================
> Likewise when mapping a Sanger FASTQ file with very high scores to
> Solexa FASTQ, these don't get the maximum value imposed (ASCII 126,
> tilde). For example,
>
> $ ./biopython_sanger2solexa < sanger_93.fastq
> /usr/local/lib/python2.6/dist-packages/Bio/SeqIO/QualityIO.py:764:
> UserWarning: Data loss - max Solexa quality 62 in Solexa FASTQ
>  warnings.warn("Data loss - max Solexa quality 62 in Solexa FASTQ")
> @Test PHRED qualities from 93 to 0 inclusive
> ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAN
> +
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}|{zyxwvutsrqponmlkjihgfedcba`_^]\[ZYXWVUTSRQPONMLKJHGFECB@>;;
>
> But,
>
> $ ./bioperl_sanger2solexa < sanger_93.fastq
>
> --------------------- WARNING ---------------------
> MSG: Quality values not found for
> solexa: 0,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93
> ---------------------------------------------------
> @Test PHRED qualities from 93 to 0 inclusive
> ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAN
> +
> <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<~}|{zyxwvutsrqponmlkjihgfedcba`_^]\[ZYXWVUTSRQPONMLKJHGFEDB@><<
>
> i.e. You've mapped the high value scores to "<", ASCII 60, thus
> Solexa -4 (an odd thing to happen - getting the lowest score
> wouldn't surprise me so much).

This one is fixed; it was the same bounding issue as above.

> Furthermore, notice that PHRED scores 0 and 1 have both been mapped
> to "<", ASCII 60, thus Solexa -4, and not ";" ASCII 59 meaning
> Solexa -5.

The two conversions to Solexa are still failing.  I think it's
something fairly simple, but I can't work on it until Friday (got too
many other things on my plate ATM).  If I get stumped I'll post a
message.
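
For reference, a minimal Python sketch of the mapping being discussed, assuming the standard relation Q_solexa = 10 * log10(10^(Q_phred / 10) - 1), rounded to the nearest integer and clamped to the range -5 to 62 (ASCII 59 to 126 with offset 64). This is not the BioPerl or Biopython implementation; the function names are hypothetical.

```python
import math

SANGER_OFFSET = 33               # Sanger FASTQ: ASCII = PHRED + 33
SOLEXA_OFFSET = 64               # Solexa FASTQ: ASCII = Solexa score + 64
MIN_SOLEXA = -5                  # ";" (ASCII 59), lowest Solexa score
MAX_SOLEXA = 126 - SOLEXA_OFFSET  # 62, i.e. "~" (ASCII 126)


def phred_to_solexa(phred):
    """Convert one PHRED score to a bounded integer Solexa score."""
    if phred <= 0:
        # The log term is undefined at PHRED 0; use the floor by convention.
        return MIN_SOLEXA
    solexa = 10 * math.log10(10 ** (phred / 10) - 1)
    return max(MIN_SOLEXA, min(MAX_SOLEXA, round(solexa)))


def sanger_to_solexa(quality):
    """Re-encode a Sanger quality string as a Solexa quality string."""
    return "".join(
        chr(phred_to_solexa(ord(c) - SANGER_OFFSET) + SOLEXA_OFFSET)
        for c in quality
    )
```

Under these assumptions PHRED 0 and PHRED 1 both map to Solexa -5 (";"), and everything from PHRED 62 upwards maps to "~", which agrees with the Biopython output quoted above.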

> ===========================
>
> Still, things are looking up :)
>
> Peter

Yes they are, much more so than previously.  I'll add these to the
tests.

chris

