[Biopython-dev] [Bug 2856] New: Duplicate positions for some restriction enzymes in some sequences

Fri Jun 12 14:53:07 EDT 2009

Hi everyone,

OK, there is a small mistake in the way restriction objects handle the 
sequence when searching for sites that span the boundary of a circular 
sequence.
The current code goes one base too far, so the beginning of the 
sequence is scanned twice and two sites are reported: one at the 
beginning and one at the end.
After the index correction, the second site is reported at the same 
position as the first one (which incidentally is a good thing, since it 
shows the corrections are handled properly).
The final result is a duplicated report for restriction sites starting 
at the very first base of a circular sequence.

Here is the patch:
======================================================================

--- biopython-1.50-old/Bio/Restriction/Restriction.py       2008-10-22 23:49:06.000000000 +0200
+++ biopython-1.50-new/Bio/Restriction/Restriction.py    2009-06-12 20:28:46.000000000 +0200
@@ -197,7 +197,7 @@
          if self.is_linear() :
              data = self.data
          else :
-            data = self.data + self.data[1:size+1]
+            data = self.data + self.data[1:size]
          return [(i.start(), i.group) for i in re.finditer(pattern, data)]

      def __getitem__(self, i) :
=======================================================================
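
For anyone who wants to see the off-by-one without Biopython, here is a 
rough sketch using a plain string and re. It is not the Restriction.py 
code: circular_hits is a made-up helper, and the one-character pad in 
front of the sequence and the wrap-around of positions past the end are 
assumptions about how the real code behaves. It also reports where the 
site starts (1), not the cut position (2) shown in the bug report.

```python
import re


def circular_hits(seq, site, extra):
    """Return 1-based start positions of `site` in a circular `seq`.

    `extra` is how many leading bases are copied to the end so that
    sites spanning the origin can still be matched.
    Hypothetical helper, for illustration only.
    """
    length = len(seq)
    data = " " + seq                 # pad so match.start() is 1-based (assumption)
    data = data + data[1:1 + extra]  # same as data[1:size+1] or data[1:size]
    hits = []
    for match in re.finditer(site, data):
        pos = match.start()
        if pos > length:             # site found in the appended copy: wrap around
            pos -= length
        hits.append(pos)
    return hits


seq = "gaattccggatgagcattcatcaggcgggcaagaatgtgaataaaggccgga"
site = "gaattc"

old = circular_hits(seq, site, len(site))      # old slice: data[1:size+1]
new = circular_hits(seq, site, len(site) - 1)  # patched slice: data[1:size]
print(old)  # [1, 1]
print(new)  # [1]

# A site that really spans the origin is still found with the shorter slice.
print(circular_hits("ttcaaaagaa", site, len(site) - 1))  # [8]
```

Copying size - 1 bases is enough because a site that spans the origin 
must start before the end of the original sequence, so at most size - 1 
of its bases can come from the beginning.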

I will try to upload it.

Best regards

Fred

bugzilla-daemon at portal.open-bio.org wrote:
> http://bugzilla.open-bio.org/show_bug.cgi?id=2856
> 
>            Summary: Duplicate positions for some restriction enzymes in some
>                     sequences
>            Product: Biopython
>            Version: 1.50
>           Platform: All
>         OS/Version: All
>             Status: NEW
>           Severity: normal
>           Priority: P2
>          Component: Main Distribution
>         AssignedTo: biopython-dev at biopython.org
>         ReportedBy: zdmytriv at lbl.gov
> 
> 
> Returns 2 identical positions for EcoRI enzyme in this sequence:
> gaattccggatgagcattcatcaggcgggcaagaatgtgaataaaggccgga
> 
> Run this script test.py:
> from Bio import SeqIO
> from Bio.Restriction import *
> from Bio.Seq import Seq
> from Bio.Alphabet.IUPAC import IUPACAmbiguousDNA
> 
> if __name__ == "__main__":
>     sequence = "gaattccggatgagcattcatcaggcgggcaagaatgtgaataaaggccgga"
>     seq = Seq(sequence, IUPACAmbiguousDNA())
>     analysis = Analysis([EcoRI], seq, linear=False)
>     results =  analysis.full()
> 
>     for enzyme, positions in results.iteritems():
>         if len(positions) == 0: continue
> 
>         print enzyme
>         for position in positions:
>             print position
> 
> # prints two identical positions: 2 and 2
> 
>