[Bioperl-l] fastq parsing problem
Michael Muratet
mmuratet at hudsonalpha.org
Fri May 8 19:29:38 UTC 2009
Greetings
I've got a problem parsing fastq output from the maq aligner. The
parser is throwing an exception for the following record:
@HWI-EAS146:3:1:2:177#0/1
CTCCGCTNNCTTCTCAGCTTTCTTGTAGGCGATAGACTTCCCGAGCCTANCCAGAGCAACGAGCNTNNNGNNNNTN
+
@,AB=>-&&:5).;+*=<*8?%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%
I looked up the lines in fastq.pm that do the parsing (lines 116-121):
    my ($top,$sequence,$top2,$qualsequence) = $entry =~ /^
        \@?(.+?)\n
        ([^\@]*?)\n
        \+?(.+?)\n
        (.*)\n
        /xs
I don't consider myself a regex-pert, but I would interpret the above
as "put everything after one or zero @ characters on the first line in
$top; then put anything that is not @ on the second line in $sequence;
then everything after one or zero + characters on the third line in
$top2; then everything on the fourth line in $qualsequence; and don't
be greedy".
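A minimal sketch for checking this reading: it applies the pattern quoted above to the problem record as a single string and prints the four captures. The variable names $seq and $qual are introduced here for illustration only. The quality line is assumed to be one unwrapped line of 76 characters, matching the sequence length. The script exercises only the regex; it does not go through Bio::SeqIO or reproduce how fastq.pm fills $entry from the file.

```perl
use strict;
use warnings;

# The record from this message, rebuilt as one string. The quality line is
# assumed to be a single line: 21 leading characters plus 55 '%' characters,
# which gives the same length as the 76-base sequence.
my $seq  = 'CTCCGCTNNCTTCTCAGCTTTCTTGTAGGCGATAGACTTCCCGAGCCTANCCAGAGCAACGAGCNTNNNGNNNNTN';
my $qual = '@,AB=>-&&:5).;+*=<*8?' . ('%' x 55);
my $entry = join("\n", '@HWI-EAS146:3:1:2:177#0/1', $seq, '+', $qual) . "\n";

# Same pattern as quoted from fastq.pm, applied to the record in isolation.
my ($top, $sequence, $top2, $qualsequence) = $entry =~ /^
    \@?(.+?)\n
    ([^\@]*?)\n
    \+?(.+?)\n
    (.*)\n
    /xs;

if (defined $top) {
    print "top:          $top\n";
    print "sequence:     $sequence\n";
    print "top2:         $top2\n";
    print "qualsequence: $qualsequence\n";
    printf "lengths:      %d bases, %d quality characters\n",
        length($sequence), length($qualsequence);
}
else {
    print "pattern did not match\n";
}
```

If the pattern matches here with the quality string captured whole, including its leading @, then the exception is probably not caused by this regex alone, and the way $entry is split out of the input file would be the next thing to inspect.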
It seems like the fastq record above should parse with these rules. I
note that the @ character is escaped in the regex and appears in
several of the problem records, but not all. Has anyone come across
this before? I don't see this exact problem in the list archives.
Thanks
Mike