[Bioperl-l] fastq parsing problem

Tue May 12 03:07:56 UTC 2009

On May 9, 2009, at 5:55 AM, John Marshall wrote:

> Michael Muratet wrote:
>> I've got a problem parsing fastq output from the maq aligner. The
>> parser is throwing an exception for the following record:
>>
>> @HWI-EAS146:3:1:2:177#0/1
>> CTCCGCTNNCTTCTCAG[...]
>> +
>> @,AB=>-&&:5).;+*=[...]
>>
>> I looked up the line in fastq.pm that does the parsing:
>>
>>    116   my ($top,$sequence,$top2,$qualsequence) = [...]
>
> This is the fastq parser from 1.5.2 or thereabouts, which had a bug (the
> $/ definition just above this code) that prevented it from parsing a
> record with a quality line starting with "@".  This was probably not
> recognised as a bug for a long time due to the enduring myth that fastq
> quality lines always start with "!".
>
> The fastq next_seq() was rewritten for 1.6.0 and parses this successfully.
> (Unfortunately the documentation at the top of fastq.pm was not updated
> and still reflects the now-unused false belief about an initial "!"
> quality.)
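
To show why splitting records on "@" goes wrong, here is a minimal Perl sketch. It assumes, hypothetically, that the old code effectively treated a newline followed by "@" as a record boundary (the actual $/ value is not quoted in this thread). It also assumes the common four-line record layout with unwrapped sequence and quality lines. The `parse_fastq` sub is a hypothetical illustration, not the Bio::SeqIO::fastq implementation.

```perl
use strict;
use warnings;

# Two records; the first quality line begins with "@", which is a legal
# quality character, not the start of a new record.
my $fastq = join "\n",
    '@read1', 'ACGT', '+', '@ABC',
    '@read2', 'TTTT', '+', 'IIII', '';

# Naive approach (assumed here to resemble the old $/-based splitting):
# treat every "\n@" as a record boundary. The "@ABC" quality line is
# mistaken for the start of another record, giving 3 chunks for 2 records.
my @naive = split /\n\@/, $fastq;
printf "naive split found %d chunks\n", scalar @naive;

# Hypothetical parser: each record is exactly four lines (header, sequence,
# "+" line, quality), so consume lines in groups of four and only treat
# "@" as a header marker on the first line of each group.
sub parse_fastq {
    my ($text) = @_;
    my @lines = split /\n/, $text;
    my @records;
    while (@lines) {
        my ($head, $seq, $plus, $qual) = splice @lines, 0, 4;
        die "truncated record\n" unless defined $qual;
        die "bad header: $head\n" unless $head =~ /^\@(.*)$/;
        my $id = $1;
        die "bad separator line: $plus\n" unless $plus =~ /^\+/;
        die "length mismatch for $id\n" unless length($seq) == length($qual);
        push @records, { id => $id, seq => $seq, qual => $qual };
    }
    return @records;
}

my @recs = parse_fastq($fastq);
printf "%s %s %s\n", $_->{id}, $_->{seq}, $_->{qual} for @recs;
```

Reading a fixed number of lines per record removes the ambiguity, but it only works for unwrapped files; multi-line FASTQ would need a loop that accumulates quality characters until their count matches the sequence length.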
>
> You may be able to just drop 1.6.0's Bio/SeqIO/fastq.pm in front of your
> existing Bioperl installation, if you're a little crazy and don't want to
> update the installation properly.  If you do that, or if you update,
> you'll find that the new parser emits the following pedantic warning for
> your fastq sequences:
>
> MSG: Seq/Qual descriptions don't match; using sequence description
>
> In practice, lots of people (probably even most!) don't bother putting the
> sequence id on the "+" line, as it is entirely pointless duplication,
> instead leaving the "+" line otherwise empty.  So I hope the maintainers
> agree that this warning should be relaxed, such as in the attached patch.
> Or even removed -- there was no equivalent warning in the previous code.
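
The attached patch is not reproduced here, so the following Perl sketch only illustrates the kind of relaxed check described above. It assumes that a bare "+" line should count as "same as the header" and that only a non-empty, differing "+" description deserves a warning. `plus_line_mismatch` is a hypothetical helper, not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 if the text after "+" should trigger a
# "descriptions don't match" warning, 0 otherwise.
sub plus_line_mismatch {
    my ($header, $plus) = @_;
    (my $seq_desc  = $header) =~ s/^\@//;   # drop the leading "@"
    (my $qual_desc = $plus)   =~ s/^\+//;   # drop the leading "+"
    return 0 if $qual_desc eq '';           # bare "+": nothing to compare
    return $qual_desc ne $seq_desc ? 1 : 0; # warn only on a real difference
}

for my $case (['@r1 lane1', '+'], ['@r1 lane1', '+r1 lane1'], ['@r1 lane1', '+r2']) {
    my ($h, $p) = @$case;
    printf "%-10s %-10s %s\n", $h, $p,
        plus_line_mismatch($h, $p) ? 'warn' : 'ok';
}
```

Returning a flag rather than warning directly leaves the policy (warn, ignore, or die) to the caller.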
>
> Cheers,
>
>    John

Okay, patch committed (also removed the blurb about '!'). Thanks!

chris