[Bioperl-l] Update of SeqIO::fastq Module for PacBio

Peter Cock p.j.a.cock at googlemail.com
Thu Sep 20 14:30:07 UTC 2012


On Thu, Sep 20, 2012 at 1:47 PM, Dan Nasko <dan.nasko at gmail.com> wrote:
> Hi,
>
> I've recently begun working through some PacBio sequencing data
> and it has been choking the current BioPerl FASTQ I/O modules.
> Here are the problems I'm running into:
>
>         [1] PacBio will report quality scores up to 100 - I believe there's
> an upper limit of 93 and the FASTQ parser will throw an error if
> that's surpassed.

How exactly? The 93 limit comes from the fact that the top printable
ASCII character is '~' (ASCII 126), and with the Sanger offset of 33,
126 - 33 = 93. Are PacBio joining in the game
of redefining FASTQ encodings? An example would be very
interesting.

>         [2] Very often PacBio will have one base sequences. e.g.:
>
>
>         @m120726_170229_42157_c100356772550000001523024009301210_s1_p0/9743/2588_2589
>         T
>         +
>         0
>         @m120726_170229_42157_c100356772550000001523024009301210_s1_p0/9743/3320_3321
>         G
>         +
>         (
>
>         If this one-base sequence has a quality character of "0"
> (quality score 15), shown above, I/O will throw the following error:

I thought that BioPerl bug had been fixed... or maybe it was the
very similar situation of a quality score using the zero character?
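One plausible mechanism for the zero-character failure, offered as an assumption and not confirmed against the BioPerl source, is Perl's truthiness rule: the one-character string "0" is false in Perl, so a check like `if ($qual)` would reject a one-base read whose only quality is 15, while '(' would pass. The Python sketch below is not BioPerl's parser; parse_fastq is a hypothetical minimal reader for unwrapped four-line records, and it avoids the pitfall by comparing sequence and quality lengths instead of testing the quality string for truth.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, List[int]]


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (id, sequence, phred_scores) from four-line FASTQ records.

    Assumes unwrapped records (one sequence line, one quality line each);
    multi-line FASTQ is deliberately not handled in this sketch.
    """
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if header == "":
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header line, got {header!r}")
        seq, plus, qual = next(it, None), next(it, None), next(it, None)
        if qual is None:
            raise ValueError(f"{header}: truncated record")
        if not plus.startswith("+"):
            raise ValueError(f"{header}: expected '+' line, got {plus!r}")
        # Compare lengths rather than testing the quality string for truth.
        # In Perl the string "0" is false, so a check like `if ($qual)`
        # would reject a one-base read whose only quality is 15.
        if len(qual) != len(seq):
            raise ValueError(f"{header}: sequence/quality length mismatch")
        yield header[1:], seq, [ord(c) - 33 for c in qual]


EXAMPLE = [
    "@m120726_170229_42157_c100356772550000001523024009301210_s1_p0/9743/2588_2589",
    "T",
    "+",
    "0",
    "@m120726_170229_42157_c100356772550000001523024009301210_s1_p0/9743/3320_3321",
    "G",
    "+",
    "(",
]

for read_id, seq, quals in parse_fastq(EXAMPLE):
    print(read_id.rsplit("/", 1)[-1], seq, quals)
```

Run on the two PacBio records quoted above, this should print "2588_2589 T [15]" and "3320_3321 G [7]".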

Peter
