[Bioperl-l] fastq parsing problem
Mark A. Jensen
maj at fortinbras.us
Sat May 9 01:45:18 UTC 2009
Hi Michael--
Can you send along the exception? The record you sent seems to
parse as advertised in the debugger (as long as the last newline,
the one that breaks up the string of %'s, is not really there).
thanks, Mark
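A minimal Perl sketch of the check described above. It assumes the record reaches the regex as a single string ending in a newline; how fastq.pm actually assembles `$entry` is not shown in this thread. The script applies the regex quoted from fastq.pm to the posted record twice: once with the quality string on one line, and once with the email line wrap kept as a real newline. The helper name `parse_entry` is hypothetical.

```perl
use strict;
use warnings;

# Record from the post. $qual_a is the first physical quality line as it
# appears in the email; $qual_b is the wrapped remainder.
my $header = 'HWI-EAS146:3:1:2:177#0/1';
my $seq    = 'CTCCGCTNNCTTCTCAGCTTTCTTGTAGGCGATAGACTTCCCGAGCCTANCCAGAGCAACGAGCNTNNNGNNNNTN';
my $qual_a = '@,AB=>-&&:5).;+*=<*8?%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%';
my $qual_b = '%%%%%';

# Hypothetical helper: applies the regex quoted from fastq.pm to one entry.
sub parse_entry {
    my ($entry) = @_;
    my ($top, $sequence, $top2, $qualsequence) = $entry =~ /^
        \@?(.+?)\n
        ([^\@]*?)\n
        \+?(.+?)\n
        (.*)\n
        /xs;
    return ($top, $sequence, $top2, $qualsequence);
}

# The same record with the quality string joined, and with the wrap kept.
my $joined  = "\@$header\n$seq\n+\n$qual_a$qual_b\n";
my $wrapped = "\@$header\n$seq\n+\n$qual_a\n$qual_b\n";

for my $case ([joined => $joined], [wrapped => $wrapped]) {
    my ($name, $entry) = @$case;
    my (undef, $sequence, undef, $qual) = parse_entry($entry);
    printf "%-7s sequence: %d chars, quality: %d chars\n",
        $name, length $sequence, length $qual;
}
```

In the joined case the sequence and quality captures are the two full lines. In the wrapped case the regex still matches, but because `/s` lets `(.+?)` run across a newline, the third capture swallows the first quality line and the fourth capture is only `%%%%%`; a later comparison of sequence and quality lengths, if the parser makes one, would then fail.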
----- Original Message -----
From: "Michael Muratet" <mmuratet at hudsonalpha.org>
To: <bioperl-l at lists.open-bio.org>; <maq-help at lists.sourceforge.net>
Sent: Friday, May 08, 2009 3:29 PM
Subject: [Bioperl-l] fastq parsing problem
> Greetings
>
> I've got a problem parsing fastq output from the maq aligner. The
> parser is throwing an exception for the following record:
>
> @HWI-EAS146:3:1:2:177#0/1
> CTCCGCTNNCTTCTCAGCTTTCTTGTAGGCGATAGACTTCCCGAGCCTANCCAGAGCAACGAGCNTNNNGNNNNTN
> +
> @,AB=>-&&:5).;+*=<*8?%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> %%%%%
>
> I looked up the lines in fastq.pm (lines 116-121) that do the parsing:
>
>     my ($top,$sequence,$top2,$qualsequence) = $entry =~ /^
>         \@?(.+?)\n
>         ([^\@]*?)\n
>         \+?(.+?)\n
>         (.*)\n
>         /xs;
>
> I don't consider myself a regex-pert, but I would interpret the above
> as "put everything after one or zero @ characters on the first line in
> $top; then put anything that is not @ on the second line in $sequence;
> then everything after one or zero + characters on the third line in
> $top2; then everything on the fourth line in $qualsequence; and don't
> be greedy".
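A small check of that reading, using two made-up four-line records (hypothetical data, not from the post) and a hypothetical helper `parse`. Two details of the regex matter here: the `/s` flag lets `.` match newlines, so `(.+?)` is not confined to one line; and because `(.+?)` needs at least one character, an empty `+` line makes `\+?` match nothing, so the `+` itself ends up in `$top2`.

```perl
use strict;
use warnings;

# Hypothetical helper: runs the regex quoted from fastq.pm on one record
# and returns the four captures (an empty list if there is no match).
sub parse {
    my ($entry) = @_;
    my @caps = $entry =~ /^
        \@?(.+?)\n
        ([^\@]*?)\n
        \+?(.+?)\n
        (.*)\n
        /xs;
    return @caps;
}

# Record with an empty "+" line, and one that repeats the header after "+".
my ($h1, $s1, $p1, $q1) = parse("\@read1\nACGT\n+\nIIII\n");
my ($h2, $s2, $p2, $q2) = parse("\@read1\nACGT\n+read1\nIIII\n");

printf "empty + line:  top2 = '%s', qual = '%s'\n", $p1, $q1;
printf "+read1 line:   top2 = '%s', qual = '%s'\n", $p2, $q2;
```

So `$top2` is not always "everything after the +": with an empty `+` line it is the `+` itself, and in a record with an extra newline it can span more than one line.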
>
> It seems like the fastq record above should parse with these rules. I
> note that the @ character is escaped in the regex and appears in
> several of the problem records, but not all. Has anyone come across
> this before? I don't see this exact problem in the list archives.
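A sketch of why a quality line that starts with `@` is awkward for any reader that decides where a record begins by looking for `@` at the start of a line. This is an assumption for illustration, not a description of what fastq.pm does, and the data are made up. In the Sanger (Phred+33) encoding `@` is simply quality value 31, so it can legitimately open a quality line, as it does in the record above.

```perl
use strict;
use warnings;

# Made-up input (hypothetical data): two records, and the first quality
# string begins with '@'.
my $text = "\@r1\nACGT\n+\n\@III\n\@r2\nTTGA\n+\nIIII\n";

# Naive approach: treat every newline followed by '@' as a record boundary.
my @naive = split /\n(?=\@)/, $text;

# Alternative: take exactly four lines per record (assumes no wrapping).
my @lines = split /\n/, $text;
my @records;
while (my @rec = splice @lines, 0, 4) {
    push @records, [@rec];
}

printf "naive split: %d chunks, four-line reader: %d records\n",
    scalar @naive, scalar @records;
print "second naive chunk: $naive[1]\n";
```

The naive split yields three chunks, with the quality line `@III` mistaken for the start of a record, while reading four lines at a time recovers both records. The four-line approach in turn assumes that sequence and quality are never wrapped, which is the same condition Mark raises about the run of %'s.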
>
> Thanks
>
> Mike
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>