[Bioperl-l] Update of SeqIO::fastq Module for PacBio

Thu Sep 20 12:08:21 EDT 2012

On Sep 20, 2012, at 9:30 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Thu, Sep 20, 2012 at 1:47 PM, Dan Nasko <dan.nasko at gmail.com> wrote:
>> Hi,
>> 
>> I've recently begun working through some PacBio sequencing data
>> and it has been choking the current BioPerl FASTQ I/O modules.
>> Here are the problems I'm running into:
>> 
>>        [1] PacBio will report quality scores up to 100 - I believe there's
>> an upper limit of 93 and the FASTQ parser will throw an error if
>> that's surpassed.
> 
> How exactly? The 93 limit comes from the fact that the top printable
> ASCII character is '~', 126 - 33 = 93. Are PacBio joining in the game
> of redefining FASTQ encodings? An example would be very
> interesting.
> 
>>        [2] Very often PacBio will have one-base sequences, e.g.:
>> 
>> 
>>        @m120726_170229_42157_c100356772550000001523024009301210_s1_p0/9743/2588_2589
>>        T
>>        +
>>        0
>>        @m120726_170229_42157_c100356772550000001523024009301210_s1_p0/9743/3320_3321
>>        G
>>        +
>>        (
>> 
>>        If this one-base sequence has a quality character of "0"
>> (quality score 15), shown above, I/O will throw the following error:
> 
> I thought that BioPerl bug had been fixed... or maybe it was the
> very similar situation of a quality score using the zero character?
> 
> Peter

This should already be fixed.  Are you using the latest CPAN release, or the latest code from GitHub?

chris