[Bioperl-l] Update of SeqIO::fastq Module for PacBio
Dan Nasko
dan.nasko at gmail.com
Thu Sep 20 12:47:55 UTC 2012
Hi,
I've recently begun working through some PacBio sequencing data, and it has been choking the current BioPerl FASTQ I/O modules. Here are the problems I'm running into:
[1] PacBio will report quality scores up to 100. I believe the FASTQ parser has an upper limit of 93 and will throw an error if that is exceeded.
[2] Very often PacBio will produce one-base sequences, e.g.:
@m120726_170229_42157_c100356772550000001523024009301210_s1_p0/9743/2588_2589
T
+
0
@m120726_170229_42157_c100356772550000001523024009301210_s1_p0/9743/3320_3321
G
+
(
If a one-base sequence has a quality character of "0" (quality score 15), as in the first record above, the parser throws the following error:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Quality string [0@m120726_170229_42157_c100356772550000001523024009301210_s1_p0/9743/3320_3321] of length [78]
doesn't match length of sequence T
[1], line: 86394
STACK: Error::throw
STACK: Bio::Root::Root::throw /Library/Perl/5.12/Bio/Root/Root.pm:472
STACK: Bio::SeqIO::fastq::next_dataset /Library/Perl/5.12/Bio/SeqIO/fastq.pm:102
STACK: Bio::SeqIO::fastq::next_seq /Library/Perl/5.12/Bio/SeqIO/fastq.pm:29
STACK: quality_length_filter.pl:146
-----------------------------------------------------------
For some reason, when the parser encounters ^0$ on the quality line, it doesn't see the newline and takes the next sequence's header as part of the quality string (i.e. @m120726_170229_42157_c100356772550000001523024009301210_s1_p0/9743/3320_3321 was the header of the next sequence).
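To make the expected behavior concrete, here is a minimal sketch in Python (used for brevity; this is not the Bio::SeqIO::fastq code, and the names read_fastq, phred_scores and encode_score are made up for illustration). It assumes strict four-line records and Phred+33 encoding. In Perl the string "0" evaluates as false, so one possible but unverified cause of [2] is a truth test on the chomped quality line; the sketch sidesteps that class of problem by checking for end-of-file explicitly. For [1], it shows why 93 is the ceiling (33 + 93 = 126, the last printable ASCII character) and caps higher scores when encoding, which is only one of several ways the limit could be handled.

```python
import io

PHRED_OFFSET = 33   # Sanger-style FASTQ stores each score as chr(score + 33)
MAX_SCORE = 93      # chr(93 + 33) is '~', the last printable ASCII character


def read_fastq(handle):
    """Yield (name, sequence, quality) from strict four-line FASTQ records."""
    while True:
        header = handle.readline()
        if header == "":                      # explicit end-of-file check
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        header = header.rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header[1:]}")
        yield header[1:], seq, qual


def phred_scores(qual):
    """Decode a Phred+33 quality string into a list of integer scores."""
    return [ord(ch) - PHRED_OFFSET for ch in qual]


def encode_score(score):
    """Encode one score as a Phred+33 character, capping it at MAX_SCORE."""
    return chr(min(score, MAX_SCORE) + PHRED_OFFSET)


# The two one-base records from this message.
example = io.StringIO(
    "@m120726_170229_42157_c100356772550000001523024009301210_s1_p0/9743/2588_2589\n"
    "T\n"
    "+\n"
    "0\n"
    "@m120726_170229_42157_c100356772550000001523024009301210_s1_p0/9743/3320_3321\n"
    "G\n"
    "+\n"
    "(\n"
)

records = list(read_fastq(example))
for name, seq, qual in records:
    print(name, seq, phred_scores(qual))
```

With this approach the "0" quality line is read as a one-character quality string for the first record, and the second header stays with its own record.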
Thanks,
Dan