[Bioperl-l] Fix for Bug #3376 broke somewhere else
Paul Cantalupo
pcantalupo at gmail.com
Sat Mar 2 12:28:15 EST 2013
Hi Francisco,
Nice catch. Please submit a new bug report for this and reference bug
3376. Please provide a minimal hmmer output file, a script and the
expected output. Then, I'll look into it and fix the bug.
Thank you,
Paul
Paul Cantalupo
University of Pittsburgh
On Thu, Feb 28, 2013 at 10:36 AM, Francisco J. Ossandón
<fossandonc at hotmail.com> wrote:
> Hi,
> I was re-checking Bug #3302 using the Bio::SearchIO modules from the
> repository and found that they can no longer parse a HMMER2 file that was
> previously handled fine. After tracking down the problem, I discovered that a
> change to a regular expression, made to fix another bug, broke the parsing.
>
> The fix for Bug #3376 consisted of adding an extra condition to skip
> lines where the end-of-domain indicator is split across lines
> (https://redmine.open-bio.org/issues/3376):
> TEST: domain 1 of 1, from 8 to 97: score 184.7, E = 2.5e-56
> *->svfqqqqssksttgstvtAiAiAigYRYRYRAvtWnsGsLssGvnDn
> sv+qqqq+ + +vtAiAiAigYRYRYRAv Wn GsLs G nDn
> Test 8 SVYQQQQGGSA----MVTAIAIAIGYRYRYRAVVWNKGSLSTGTNDN 50
>
> DnDqqsdgLYtiYYsvtvpssslpsqtviHHHaHkasstkiiikiePr<-
> DnDq +d LYtiYYsvtv +ss+p q+v+HHHaH+asstkiiiki P
> Test 51 DNDQAAD-LYTIYYSVTVSASSWPGQSVTHHHAHPASSTKIIIKIAPS 97
>
> *
>
> Test - -
> This case is characterized by the two dashes on the last line ("Test - -").
>
> So this is the expression that was added to next_result in hmmer2.pm
> (https://github.com/bioperl/bioperl-live/commit/142e5d79e3a6593db32bf0af99048f47d01bd3f2):
> elsif (CORE::length($_) == 0
>     || ( $count != 1 && /^\s+$/o )
>     || /^\s+\-?\*\s*$/
>     || /^.+\-\s+\-\s*$/ )    ### <--- This regex was designed for bug 3376
> {
>     next;
> }
>
> But the expression is too broad: because it uses "^.+" just before the two
> dashes, it also matches, and therefore skips, alignment lines that are full
> of dashes, which broke the parsing of lines like these:
> KyACrqCdtiVQAPaPakpIErGiptaGLLArvlVSKyaEHlPLYRQsEI
>
> lcl|gi|340 - -------------------------------------------------- -
>
> yaRqGVeiaRstLadWVgrtgarLaPLvdALaeyVLkeGklHADeTPVqV
> +i s L V++ + r
> lcl|gi|340 60938 ------AIMISGLIHGVSARCLRF-------------------------- 60955
>
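To see why "^.+" is too permissive, the sketch below applies the bug 3376 pattern with a capture group around "^.+" and prints what that part swallowed. The helper name bug3376_prefix is hypothetical (not part of BioPerl), and the sample lines are copied from the message, where the archive has collapsed HMMER2's real column spacing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): applies the bug 3376 pattern
# with a capture group around "^.+" and returns what that part matched,
# or undef if the line would not be skipped.
sub bug3376_prefix {
    my ($line) = @_;
    return $line =~ /^(.+)\-\s+\-\s*$/ ? $1 : undef;
}

# Lines copied from the message; real HMMER2 output has wider gaps
# between columns than the mailing-list archive shows.
my @lines = (
    'Test - -',
    'lcl|gi|340 - -------------------------------------------------- -',
    'lcl|gi|340 60938 ------AIMISGLIHGVSARCLRF-------------------------- 60955',
);

for my $line (@lines) {
    my $prefix = bug3376_prefix($line);
    if ( defined $prefix ) {
        print "SKIP  ^.+ matched '$prefix'\n";
    }
    else {
        print "KEEP  $line\n";
    }
}
```

For "Test - -", "^.+" matches only "Test ", as intended. For the all-dash row, the greedy "^.+" backtracks just far enough to use the last dash of the gap run as the first of the "two dashes", so a genuine alignment row is skipped. The row ending in a coordinate (60955) is kept.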
> I think a reasonable fix, one that still handles the original bug and
> restores parsing for this case, is to add an extra \s+ to the regex just
> before the first dash. That way the expression makes sure the first dash is
> the one that comes AFTER the sequence name (where it replaces the usual
> coordinate number), and not the last character of an alignment or of a run
> of dashes like the one above:
> elsif (CORE::length($_) == 0
>     || ( $count != 1 && /^\s+$/o )
>     || /^\s+\-?\*\s*$/
>     || /^.+\s+\-\s+\-\s*$/ )    ### <--- Tweaked regex
> {
>     next;
> }
> I tested it and it works fine; I hope you find the fix acceptable.
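A minimal sketch of the whole skip condition with the tweaked pattern, run over lines from this thread. should_skip is a hypothetical helper that mirrors the quoted elsif; $count stands in for the parser's own line counter, whose exact role in hmmer2.pm is assumed here, so every call passes 2. The indentation of the lone "*" line is also assumed, since the archive stripped it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): returns 1 if the line would be
# skipped by the elsif condition quoted above, using the tweaked pattern.
# $count stands in for the parser's line counter (its role is assumed).
sub should_skip {
    my ( $line, $count ) = @_;
    return 1 if CORE::length($line) == 0;
    return 1 if $count != 1 && $line =~ /^\s+$/;
    return 1 if $line =~ /^\s+\-?\*\s*$/;
    return 1 if $line =~ /^.+\s+\-\s+\-\s*$/;    # tweaked bug 3376 pattern
    return 0;
}

my @lines = (
    'Test - -',              # split end-of-domain line (bug 3376)
    '                  *',   # lone end marker; indentation assumed
    'lcl|gi|340 - -------------------------------------------------- -',
    'lcl|gi|340 60938 ------AIMISGLIHGVSARCLRF-------------------------- 60955',
    'Test 8 SVYQQQQGGSA----MVTAIAIAIGYRYRYRAVVWNKGSLSTGTNDN 50',
);

for my $line (@lines) {
    printf "%s  %s\n", ( should_skip( $line, 2 ) ? 'SKIP' : 'KEEP' ), $line;
}
```

With the extra \s+, "Test - -" and the lone "*" line are still skipped, while the all-dash row is now kept: its final "- -" is preceded by another dash rather than whitespace, so the tweaked pattern cannot match.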
>
> Cheers,
>
> --
> Francisco J. Ossandon
> Bioinformatician.
> Ph.D. Candidate, University Andres Bello.
> Center for Bioinformatics and Genome Biology,
> Fundacion Ciencia para la Vida.
> Santiago, Chile.
> www.cienciavida.cl/CBGB.htm
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l