[Biopython] Restriction enzymes and sticky ends

Peter Cock p.j.a.cock at googlemail.com
Mon Apr 8 09:32:00 UTC 2013


On Sun, Apr 7, 2013 at 9:15 PM, Mark Budde <markbudde at gmail.com> wrote:
> Thanks for doing some digging on my behalf, Peter. After I posted my email
> last night, I started looking through the Bio.Restriction code myself. Your
> response is helpful; I was having trouble seeing how the cut site was
> encoded for each strand. I think Bjorn's python-dna might be a better
> starting place for me than Bio.Restriction, as it already has some of the
> functionality I was looking for.

Fair enough.

> However, to your question, I'm still not quite getting the cut sites. Your
> example with EcoRI makes complete sense, but I can't figure out the pattern
> for some other enzymes, such as BsaI, which is why I got confused initially.
> If you repeat that protocol for BsaI, the results don't match up.
>
> In [80]: BsaI.elucidate()
> Out[80]: 'GGTCTCN^NNNN_N'
>
> In [81]: BsaI.fst5
> Out[81]: 7
>
> In [82]: BsaI.fst3
> Out[82]: 5
>
> In [83]: BsaI.site
> Out[83]: 'GGTCTC'
>
> Based on this, I would expect that BsaI.fst3 should yield
> "11" but it yields 5.

I think you are counting from the wrong reference point.
Using Python style indexing would only allow cleavage
points within the recognition site to be described.

BsaI is a weird enzyme, and appears to be handled by the
Ambiguous class in Bio/Restriction/Restriction.py - which
says this is for enzymes for which the overhang is variable.

>>> from Bio.Restriction import BsaI
>>> BsaI.is_ambiguous()
True
>>> BsaI.is_defined() # is there a consistent site?
False
>>> BsaI.is_unknown()
False
>>> BsaI.fst5
7
>>> BsaI.fst3
5
>>> BsaI.elucidate()
'GGTCTCN^NNNN_N'

This subclass has a more complicated elucidate method,
but gives the same string as the REBASE website, so this
is deliberate: http://rebase.neb.com/rebase/enz/BsaI.html

The 5' cut site of 7 clearly means this is downstream of
the 6 bp recognition site. This appears to be counted
from the start (left) of the restriction site.

From the illustration, the 3' cut site is also to the right of
the 6 bp recognition site. It appears the number is counted
from the end (right) of the recognition site, where positive
as in BsaI means to the right (after the recognition site)
while negative as in EcoRI means to the left (within the
recognition site).
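
To make that reading concrete, here is a rough sketch in plain Python.
The helper name mark_cuts is hypothetical (it is not part of
Bio.Restriction and is not how elucidate() is implemented), and the
EcoRI values fst5 = 1, fst3 = -1 are quoted from memory. It assumes
fst5 is counted from the start of the site and fst3 from its end, and
it does not handle cuts upstream of the site.

```python
def mark_cuts(site, fst5, fst3):
    """Insert '^' (5' cut) and '_' (3' cut) into a recognition site.

    Assumption: fst5 counts from the start of the site, fst3 from its end.
    Cuts upstream of the site (negative positions) are not handled.
    """
    size = len(site)
    top = fst5              # 5' cut, offset from the start of the site
    bottom = size + fst3    # 3' cut, converted to the same reference point

    # Pad with N when a cut falls at or beyond the end of the site,
    # plus one extra N so the last cut is followed by a base.
    extra = max(top, bottom) - size
    padded = site + "N" * (extra + 1) if extra >= 0 else site

    # Insert the rightmost marker first so earlier indices stay valid.
    for pos, mark in sorted([(top, "^"), (bottom, "_")], reverse=True):
        padded = padded[:pos] + mark + padded[pos:]
    return padded


print(mark_cuts("GGTCTC", 7, 5))    # BsaI:  3' cut at 6 + 5 = 11 from the start
print(mark_cuts("GAATTC", 1, -1))   # EcoRI: 3' cut at 6 - 1 = 5 from the start
```

With those inputs the 3' cut for BsaI lands 11 bases from the start of
the site, which is the "11" you were expecting; Biopython just stores
it as 5 relative to the end of the 6 bp site.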

Peter

P.S. Please remember to CC the mailing list, e.g. reply all.
Unless people say explicitly that they have done this deliberately,
I generally assume taking a public discussion off list is accidental.


