[Bioperl-l] Possible parsing bug for Bio::Restriction::IO::withrefm ?
Emmanuel Quevillon
tuco at pasteur.fr
Mon Feb 22 15:39:06 UTC 2010
Hi,
I have been experimenting with the Bio::Restriction::* modules and
found a possible bug in how the site part of an enzyme definition is
parsed from a Rebase file.
The parser does not seem to handle sites whose cut positions are
negative, such as:
<3>CCTCAGC(-5/-2)
<3>CCGCTC(-3/-3)
<3>CCGC(-3/-1)
<3>CACGTC(-3/-3)
<3>GAATGC(1/-1)
<3>GAATGC(1/-1)
<3>GAATGC(1/-1)
Is this the expected behaviour, given that the parser is supposed to
support this kind of site (Bio::Restriction::IO::withrefm, line 143)?
my ($precut, $recog, $postcut) = ( $site =~
m/^(?:\((\w+\/\w+)\))?([\w^]+)(?:\((\w+\/\w+)\))?/ );
For example, this regex is incapable of determining the overhang of
BmgBI which is defined as :
<1>BmgBI
<2>BtrI,AjiI
<3>CACGTC(-3/-3)
<4>?(5)
<5>Bacillus thermoglucosidasius
<6>NEB 1353
<7>N
Shouldn't it be a blunt enzyme?
The overhang for this enzyme is returned as 'unknown' and cut and
complementary cut are undef.
I propose changing the regex to:
m/^(?:\((-?\w+\/-?\w+)\))?([\w^]+)(?:\((-?\w+\/-?\w+)\))?/
unless I am making a terrible mistake?
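
To check this outside BioPerl, here is a minimal standalone sketch that
runs both regexes on three of the sites listed above. The helper
parse_site is a hypothetical name used only for this test, not a BioPerl
function, and the sketch only checks what the regex captures, not how the
rest of the parser would interpret negative cut positions.

```perl
use strict;
use warnings;

# Regex quoted above from Bio::Restriction::IO::withrefm (line 143).
my $old_re = qr/^(?:\((\w+\/\w+)\))?([\w^]+)(?:\((\w+\/\w+)\))?/;

# Proposed regex: allows an optional leading minus on each cut position.
my $new_re = qr/^(?:\((-?\w+\/-?\w+)\))?([\w^]+)(?:\((-?\w+\/-?\w+)\))?/;

# Hypothetical helper for this sketch (not part of BioPerl): apply a
# regex to a site string and return ($precut, $recog, $postcut).
sub parse_site {
    my ($re, $site) = @_;
    my ($precut, $recog, $postcut) = ( $site =~ $re );
    return ($precut, $recog, $postcut);
}

for my $site (qw{CACGTC(-3/-3) GAATGC(1/-1) CCTCAGC(-5/-2)}) {
    my (undef, $recog_old, $post_old) = parse_site($old_re, $site);
    my (undef, $recog_new, $post_new) = parse_site($new_re, $site);
    printf "%-16s old: %s (%s)  new: %s (%s)\n",
        $site,
        $recog_old, $post_old // 'undef',
        $recog_new, $post_new // 'undef';
}
```

With the current regex the recognition sequence is captured but the
trailing cut specification is left undefined, because '-' is not matched
by \w. With the proposed regex the cut specification is captured as
-3/-3, 1/-1 and -5/-2 respectively.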
Thanks in advance
Regards
--
-------------------------
Emmanuel Quevillon
Biological Software and Databases Group
Institut Pasteur
+33 1 44 38 95 98
tuco at_ pasteur dot fr
-------------------------