[Bioperl-l] Possible parsing bug for Bio::Restriction::IO::withrefm ?

Emmanuel Quevillon tuco at pasteur.fr
Mon Feb 22 10:39:06 EST 2010


Hi,

I have been playing with the Bio::Restriction::* modules and found a
possible bug in how the site part of an enzyme definition is parsed
from a REBASE file.

It looks like the parser does not handle enzymes whose sites carry
negative cut positions, such as:
<3>CCTCAGC(-5/-2)
<3>CCGCTC(-3/-3)
<3>CCGC(-3/-1)
<3>CACGTC(-3/-3)
<3>GAATGC(1/-1)
<3>GAATGC(1/-1)
<3>GAATGC(1/-1)

Is this the expected behaviour, given that the parser is supposed to
support this kind of notation (Bio::Restriction::IO::withrefm, line 143)?


my ($precut, $recog, $postcut) = ( $site =~
m/^(?:\((\w+\/\w+)\))?([\w^]+)(?:\((\w+\/\w+)\))?/ );

For example, with this regex the parser cannot determine the overhang
of BmgBI, which is defined as:
<1>BmgBI
<2>BtrI,AjiI
<3>CACGTC(-3/-3)
<4>?(5)
<5>Bacillus thermoglucosidasius
<6>NEB 1353
<7>N

Shouldn't it be a blunt enzyme?

The overhang for this enzyme is returned as 'unknown', and both the cut
and the complementary cut are undef.

I propose changing the regex to allow an optional minus sign before
each cut position:

m/^(?:\((-?\w+\/-?\w+)\))?([\w^]+)(?:\((-?\w+\/-?\w+)\))?/ )

unless I am making a terrible mistake?
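
The difference between the two patterns can be checked outside BioPerl
with a minimal sketch like the one below. It only exercises the two
regexes quoted in this message against a few of the sites listed above.
It does not call Bio::Restriction::IO::withrefm itself, and the last
site, GGATG(9/13), is an added control with positive offsets only that
does not come from the list above. The variable names $current and
$proposed are illustrative.

```perl
use strict;
use warnings;

# The regex quoted from withrefm above, and the proposed variant that
# accepts an optional leading minus sign on each cut position.
my $current  = qr/^(?:\((\w+\/\w+)\))?([\w^]+)(?:\((\w+\/\w+)\))?/;
my $proposed = qr/^(?:\((-?\w+\/-?\w+)\))?([\w^]+)(?:\((-?\w+\/-?\w+)\))?/;

# Sites from this message (without the <3> tag), plus one control
# with positive offsets only (an assumption added for comparison).
my @sites = ('CCTCAGC(-5/-2)', 'CACGTC(-3/-3)', 'GAATGC(1/-1)', 'GGATG(9/13)');

for my $site (@sites) {
    for my $case ( [ current => $current ], [ proposed => $proposed ] ) {
        my ($label, $re) = @$case;

        # Same capture layout as the original statement.
        my ($precut, $recog, $postcut) = ( $site =~ $re );

        printf "%-15s %-8s recog=%-8s postcut=%s\n",
            $site, $label, $recog // 'undef', $postcut // 'undef';
    }
}
```

With the current pattern the post-cut group should come back undef for
every site containing a negative offset, while the control still yields
9/13. With the proposed pattern all four sites should yield their
offsets. Whether the code that consumes $precut and $postcut further
down in withrefm handles negative numbers correctly is not tested here.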

Thanks in advance

Regards

-- 
-------------------------
Emmanuel Quevillon
Biological Software and Databases Group
Institut Pasteur
+33 1 44 38 95 98
tuco at_ pasteur dot fr
-------------------------


