[Biopython-dev] [Bug 2856] New: Duplicate positions for some restriction enzymes in some sequences
Frédéric Sohm
sohm at inaf.cnrs-gif.fr
Fri Jun 12 14:53:07 EDT 2009
Hi everyone,
OK, this is a small mistake in the way restriction objects handle the
sequence when searching for sites that span the boundary of a circular
sequence.
The current code goes one base too far, so the beginning of the
sequence is scanned twice. Two sites are found: one at the beginning
and one at the end.
Once its index is corrected for the wrap-around, the second site is
reported at the same position as the first one (which incidentally is a
good thing, since it proves the corrections are properly handled).
The final result is a duplicated report for any restriction site starting
at the very first base of a circular sequence.
Here is the patch:
======================================================================
--- biopython-1.50-old/Bio/Restriction/Restriction.py 2008-10-22 23:49:06.000000000 +0200
+++ biopython-1.50-new/Bio/Restriction/Restriction.py 2009-06-12 20:28:46.000000000 +0200
@@ -197,7 +197,7 @@
         if self.is_linear() :
             data = self.data
         else :
-            data = self.data + self.data[1:size+1]
+            data = self.data + self.data[1:size]
         return [(i.start(), i.group) for i in re.finditer(pattern, data)]

     def __getitem__(self, i) :
=======================================================================
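
To make the off-by-one easier to see, here is a minimal standalone sketch
of the same search. It is not the Biopython code: find_sites, its
arguments and the one-character pad in front of the sequence are
assumptions made for illustration, chosen so that data[1:size] means the
same thing as in the patch above.

```python
import re


def find_sites(seq, site, circular=True, old_slice=False):
    """Return 1-based start offsets of `site` in `seq` (hypothetical helper).

    Assumption: the sequence is stored behind a one-character pad, so that
    regex match offsets are already 1-based.
    """
    data = " " + seq.upper()
    size = len(site)
    if circular:
        # Append the start of the sequence so that sites spanning the
        # origin can match. size - 1 extra bases are enough; the old
        # slice appended `size` bases, i.e. one base too many.
        end = size + 1 if old_slice else size
        data = data + data[1:end]
    return [m.start() for m in re.finditer(site, data)]


seq = "gaattccggatgagcattcatcaggcgggcaagaatgtgaataaaggccgga"

old_hits = find_sites(seq, "GAATTC", old_slice=True)
new_hits = find_sites(seq, "GAATTC")

# Wrap offsets that run past the end back into the sequence range.
old_wrapped = [(p - 1) % len(seq) + 1 for p in old_hits]

# A site that really spans the origin must still be found after the fix.
spanning_hits = find_sites("ttcaaaagaa", "GAATTC")
```

With the old slice the site at the first base is matched a second time
just past the end of the sequence, and wrapping that offset gives the
same position twice. With the new slice it is matched once, while a site
that spans the origin is still found.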
I will try to upload it.
Best regards
Fred
bugzilla-daemon at portal.open-bio.org wrote:
> http://bugzilla.open-bio.org/show_bug.cgi?id=2856
>
> Summary: Duplicate positions for some restriction enzymes in some
> sequences
> Product: Biopython
> Version: 1.50
> Platform: All
> OS/Version: All
> Status: NEW
> Severity: normal
> Priority: P2
> Component: Main Distribution
> AssignedTo: biopython-dev at biopython.org
> ReportedBy: zdmytriv at lbl.gov
>
>
> Returns 2 identical positions for EcoRI enzyme in this sequence:
> gaattccggatgagcattcatcaggcgggcaagaatgtgaataaaggccgga
>
> Run this script test.py:
> from Bio import SeqIO
> from Bio.Restriction import *
> from Bio.Seq import Seq
> from Bio.Alphabet.IUPAC import IUPACAmbiguousDNA
>
> if __name__ == "__main__":
>     sequence = "gaattccggatgagcattcatcaggcgggcaagaatgtgaataaaggccgga"
>     seq = Seq(sequence, IUPACAmbiguousDNA())
>     analysis = Analysis([EcoRI], seq, linear=False)
>     results = analysis.full()
>
>     for enzyme, positions in results.iteritems():
>         if len(positions) == 0: continue
>
>         print enzyme
>         for position in positions:
>             print position
>
> # returns 2 items 2 and 2
>
>