[Bioperl-l] fastq parsing problem
Chris Fields
cjfields at illinois.edu
Tue May 12 03:07:56 UTC 2009
On May 9, 2009, at 5:55 AM, John Marshall wrote:
> Michael Muratet wrote:
>> I've got a problem parsing fastq output from the maq aligner. The
>> parser is throwing an exception for the following record:
>>
>> @HWI-EAS146:3:1:2:177#0/1
>> CTCCGCTNNCTTCTCAG[...]
>> +
>> @,AB=>-&&:5).;+*=[...]
>>
>> I looked up the line in fastq.pm that does the parsing:
>>
>> 116 my ($top,$sequence,$top2,$qualsequence) = [...]
>
> This is the fastq parser from 1.5.2 or thereabouts, which had a bug (the
> $/ definition just above this code) that prevented it from parsing a
> record with a quality line starting with "@". This was probably not
> recognised as a bug for a long time due to the enduring myth that fastq
> quality lines always start with "!".
>
> The fastq next_seq() was rewritten for 1.6.0 and parses this successfully.
> (Unfortunately the documentation at the top of fastq.pm was not updated
> and still reflects the now-unused false belief about an initial "!"
> quality.)
>
> You may be able to just drop 1.6.0's Bio/SeqIO/fastq.pm in front of your
> existing Bioperl installation, if you're a little crazy and don't want to
> update the installation properly. If you do that, or if you update,
> you'll find that the new parser emits the following pedantic warning for
> your fastq sequences:
>
> MSG: Seq/Qual descriptions don't match; using sequence description
>
> In practice, lots of people (probably even most!) don't bother putting the
> sequence id on the "+" line, as it is entirely pointless duplication,
> instead leaving the "+" line otherwise empty. So I hope the maintainers
> agree that this warning should be relaxed, such as in the attached patch.
> Or even removed -- there was no equivalent warning in the previous code.
>
> Cheers,
>
> John
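For anyone wondering why a separator-based reader trips over this record, here is a minimal Perl sketch. It assumes, purely for illustration, that the old parser split its input wherever a newline is followed by "@" (the actual 1.5.2 `$/` value isn't quoted in this thread), and contrasts that with reading each record as exactly four lines. The four-line approach takes the quality line by position, so a leading "@" (a legal Phred+33 character, Q31) is harmless. `read_fastq_record` is a hypothetical helper, not BioPerl's API, and it assumes unwrapped four-line records.

```perl
use strict;
use warnings;

# Hypothetical two-record input. The second quality line starts with '@',
# which is a legal Phred+33 character (Q31).
my $fastq = join '', map { "$_\n" }
    '@read1', 'ACGT', '+', 'IIII',
    '@read2', 'ACGT', '+', '@@II';

# Approach 1 (assumed for illustration): use "\n@" as the record separator.
# The '@' opening the second quality line is mistaken for a record start.
my @chunks = do {
    open my $fh, '<', \$fastq or die "cannot open in-memory file: $!";
    local $/ = "\n\@";
    <$fh>;
};
printf "separator split: %d chunks for 2 records\n", scalar @chunks;

# Approach 2: a hypothetical helper that reads exactly four lines per
# record, so the quality line is taken by position, not by its first char.
sub read_fastq_record {
    my ($fh) = @_;
    my @lines;
    while (@lines < 4) {
        my $line = <$fh>;
        last unless defined $line;
        chomp $line;
        push @lines, $line;
    }
    return unless @lines;    # clean end of input
    die "truncated record\n" unless @lines == 4;
    my ($head, $seq, $plus, $qual) = @lines;
    die "bad header line: $head\n" unless $head =~ /^\@(.*)\z/;
    my $id = $1;
    die "bad separator line: $plus\n" unless $plus =~ /^\+(.*)\z/;
    my $plus_desc = $1;
    die "seq/qual length mismatch in $id\n"
        unless length($seq) == length($qual);
    return { id => $id, seq => $seq, plus_desc => $plus_desc, qual => $qual };
}

open my $fh, '<', \$fastq or die "cannot open in-memory file: $!";
my @records;
while (my $rec = read_fastq_record($fh)) {
    push @records, $rec;
}
close $fh;
printf "four-line read: %d records; last qual '%s'\n",
    scalar @records, $records[-1]{qual};
```

With this input the separator split yields 3 chunks for 2 records, while the four-line reader returns both records intact. Real fastq files can wrap sequence and quality over several lines, which this sketch deliberately does not handle.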
Okay, patch committed (also removed the blurb about '!'). Thanks!
chris
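The relaxed check John asks for can be sketched roughly as follows. This is not the committed patch (its contents aren't shown in this thread); `descriptions_conflict` is a hypothetical helper that treats a bare "+" line as "no description given" and only reports a conflict when the "+" line carries its own text and that text differs from the "@" line.

```perl
use strict;
use warnings;

# Hypothetical helper: returns true only when the '+' line has its own
# description and it differs from the '@' line's description.
sub descriptions_conflict {
    my ($seq_desc, $plus_desc) = @_;
    return 0 unless defined $plus_desc && length $plus_desc;  # bare '+'
    return $plus_desc ne $seq_desc ? 1 : 0;
}

for my $case (['read1', ''], ['read1', 'read1'], ['read1', 'read2']) {
    my ($seq_desc, $plus_desc) = @$case;
    printf "'%s' vs '%s': %s\n", $seq_desc, $plus_desc,
        descriptions_conflict($seq_desc, $plus_desc) ? 'warn' : 'ok';
}
```

Only the last case reports "warn". Keeping the comparison for a non-empty "+" line still catches genuinely mismatched files, while staying quiet for the common empty-"+" convention.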