[Bioperl-l] Next-Gen and the next point release - updates
Chris Fields
cjfields at illinois.edu
Wed Aug 26 22:52:13 EDT 2009
On Aug 26, 2009, at 4:16 PM, Peter wrote:
> It is looking much better than yesterday - nice work :)
> However, there are a few rough edges still.
Not unexpected, actually.
> ===========================
> Evil wrapping
> ===========================
> Chris - Did you get the zip file of FASTQ examples I sent off list?
> One of these was the evil_wrapping.fastq file already in Biopython
> CVS/git (under a new name). This is intended as a real torture test,
> with line wrapped quality strings where plenty of the lines start
> with "+" or "@" characters. Bioperl doesn't like this file at all -
> but I have not dug into why.
Now fixed; I've saved this as very_tricky.fastq, but it's the same file.
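For anyone following along, here is a minimal Python sketch of one way to cope with this kind of file. It is not the Bioperl fix itself, just an illustration of the usual trick: once the "+" line is seen, keep reading quality lines until the quality string is as long as the sequence, so a quality line starting with "@" or "+" is never mistaken for a new record. The function name parse_wrapped_fastq is made up for this example.

```python
def parse_wrapped_fastq(lines):
    """Yield (title, sequence, quality) tuples from line-wrapped FASTQ.

    Hypothetical helper for illustration only. Record boundaries are
    found by length: quality lines are consumed until they cover the
    whole sequence, whatever character they start with.
    """
    it = (line.rstrip("\r\n") for line in lines)
    line = next(it, None)
    while line is not None:
        if not line:
            # Skip blank lines between records.
            line = next(it, None)
            continue
        if not line.startswith("@"):
            raise ValueError("expected '@' title line, got %r" % line)
        title = line[1:]
        # Sequence lines run until the '+' separator line.
        seq_parts = []
        for line in it:
            if line.startswith("+"):
                break
            seq_parts.append(line)
        else:
            raise ValueError("no '+' line for record %r" % title)
        seq = "".join(seq_parts)
        # Quality lines run until their total length matches the sequence.
        qual = ""
        while len(qual) < len(seq):
            line = next(it, None)
            if line is None:
                raise ValueError("truncated quality for record %r" % title)
            qual += line
        if len(qual) != len(seq):
            raise ValueError("quality longer than sequence in %r" % title)
        yield title, seq, qual
        line = next(it, None)
```

Called on an open file handle (or any iterable of lines), it yields one tuple per record regardless of how the sequence and quality strings were wrapped.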
> ===========================
> Sanger To Illumina 1.3+
> ===========================
> When mapping a Sanger FASTQ file with very high scores to Illumina,
> these don't get the maximum value imposed (ASCII 126, tilde). e.g.
...
Yes, I know where that one is going wrong. The bounds check is now
fixed for the above; it is partly related to the issue below.
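To make the bounding concrete, here is a small Python sketch of the capping behaviour Peter describes. It assumes the usual offsets (Sanger ASCII = PHRED + 33, Illumina 1.3+ ASCII = PHRED + 64) and a highest usable character of ASCII 126, so the largest PHRED score that fits is 62. The function name is invented for the example; this is not the Bioperl code.

```python
SANGER_OFFSET = 33     # Sanger FASTQ: ASCII = PHRED + 33
ILLUMINA_OFFSET = 64   # Illumina 1.3+ FASTQ: ASCII = PHRED + 64
MAX_ASCII = 126        # tilde, the highest character used


def sanger_to_illumina(qual):
    """Convert a Sanger quality string to Illumina 1.3+ encoding.

    Hypothetical helper for illustration: scores too large to encode
    are capped at the maximum (PHRED 62, ASCII 126).
    """
    max_phred = MAX_ASCII - ILLUMINA_OFFSET  # 62
    return "".join(
        chr(min(ord(c) - SANGER_OFFSET, max_phred) + ILLUMINA_OFFSET)
        for c in qual
    )
```

With this, a Sanger "~" (PHRED 93) comes out as "~" (PHRED 62) rather than some out-of-range or wrapped-around character.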
> ===========================
> Sanger To Solexa
> ===========================
> Likewise when mapping a Sanger FASTQ file with very high scores to
> Solexa FASTQ, these don't get the maximum value imposed (ASCII 126,
> tilde). For example,
>
> $ ./biopython_sanger2solexa < sanger_93.fastq
> /usr/local/lib/python2.6/dist-packages/Bio/SeqIO/QualityIO.py:764:
> UserWarning: Data loss - max Solexa quality 62 in Solexa FASTQ
> warnings.warn("Data loss - max Solexa quality 62 in Solexa FASTQ")
> @Test PHRED qualities from 93 to 0 inclusive
> ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAN
> +
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}|{zyxwvutsrqponmlkjihgfedcba`_^]\[ZYXWVUTSRQPONMLKJHGFECB@>;;
>
> But,
>
> $ ./bioperl_sanger2solexa < sanger_93.fastq
>
> --------------------- WARNING ---------------------
> MSG: Quality values not found for
> solexa:
> 0,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93
> ---------------------------------------------------
> @Test PHRED qualities from 93 to 0 inclusive
> ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAN
> +
> <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<~}|{zyxwvutsrqponmlkjihgfedcba`_^]\[ZYXWVUTSRQPONMLKJHGFEDB@><<
>
> i.e. You've mapped the high value scores to "<", ASCII 60, thus
> Solexa -4 (an odd thing to happen - getting the lowest score wouldn't
> surprise me so much).
This one is fixed; it was the same bounding issue as above.
> Furthermore, notice that PHRED scores 0 and 1 have both been mapped
> to "<", ASCII 60, thus Solexa -4, and not ";" ASCII 59 meaning
> Solexa -5.
The two conversions to Solexa are still failing. I think it's
something fairly simple, but I can't work on it until Friday (got too
many other things on my plate ATM). If I get stumped I'll post a
message.
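For reference, a Python sketch of the PHRED to Solexa mapping that reproduces the Biopython output quoted above. It uses the standard relation Q_solexa = 10 * log10(10^(Q_phred / 10) - 1), rounded to the nearest integer. The clamping to the range -5..62 is an assumption chosen to match that quoted output (PHRED 0 and 1 both giving ";", high scores giving "~"). The function names are made up, and this says nothing about how Bioperl does it internally.

```python
import math

SANGER_OFFSET = 33   # Sanger FASTQ: ASCII = PHRED + 33
SOLEXA_OFFSET = 64   # Solexa FASTQ: ASCII = Solexa score + 64
MIN_SOLEXA = -5      # ASCII 59, ";"
MAX_SOLEXA = 62      # ASCII 126, "~"


def solexa_from_phred(phred):
    """Map an integer PHRED score to a rounded, bounded Solexa score."""
    if phred <= 0:
        # log10 is undefined here, so use the lowest Solexa score.
        return MIN_SOLEXA
    value = 10 * math.log10(10 ** (phred / 10.0) - 1)
    return max(MIN_SOLEXA, min(MAX_SOLEXA, int(round(value))))


def sanger_to_solexa(qual):
    """Convert a Sanger quality string to Solexa encoding."""
    return "".join(
        chr(solexa_from_phred(ord(c) - SANGER_OFFSET) + SOLEXA_OFFSET)
        for c in qual
    )
```

Feeding it the qualities from sanger_93.fastq (PHRED 93 down to 0) should give a string that starts with a run of "~" and ends in ";;", as in the Biopython output.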
> ===========================
>
> Still, things are looking up :)
>
> Peter
Yes they are, much more so than previously. I'll add these to the
tests.
chris