[Biopython-dev] [Bug 1876] New: Bio.pairwise2 generates incorrect
Needleman-Wunsch score_matrix
bugzilla-daemon at portal.open-bio.org
Fri Oct 7 13:26:00 EDT 2005
http://bugzilla.open-bio.org/show_bug.cgi?id=1876
Summary: Bio.pairwise2 generates incorrect Needleman-Wunsch score_matrix
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: bill at barnard-engineering.com
While using Bio.pairwise2 to reproduce the alignment example from the text
"Biological sequence analysis" by R. Durbin et al., I found that although the
alignments returned for the example x = 'HEAGAWGHEE', y = 'PAWHEAE' are
correct, the underlying scoring matrix is not.
The Biopython version I'm using is from CVS, up-to-date as of 7 Oct 2005.
My analysis shows that the scoring matrix entries are correct for each entry
F(i,j) where one of the traceback vectors points to F(i-1,j-1). If the
traceback vectors do not contain a pointer to the diagonally previous entry,
then the F(i,j) entry is calculated incorrectly.
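For reference, the textbook recurrence can be sketched as follows. This is a
minimal, hypothetical reference implementation and not the pairwise2 code. It
assumes the scoring used in the Durbin example (BLOSUM50 substitution scores,
of which only the residue pairs needed here are listed, and a linear gap
penalty d = 8), neither of which is stated explicitly in this report. The
names needleman_wunsch, score and diag_optimal are made up for illustration.
Besides F(i,j), it records whether the diagonal predecessor F(i-1,j-1) is
among the optimal traceback choices for each cell.

```python
# Hypothetical reference sketch, NOT the Bio.pairwise2 implementation.
# Assumptions: BLOSUM50 scores (subset for this example only) and a
# linear gap penalty d = 8, as in the Durbin et al. example.
GAP = 8
BLOSUM50_SUBSET = {
    ("A", "A"): 5, ("A", "E"): -1, ("A", "G"): 0, ("A", "H"): -2,
    ("A", "P"): -1, ("A", "W"): -3,
    ("E", "E"): 6, ("E", "G"): -3, ("E", "H"): 0, ("E", "P"): -1,
    ("E", "W"): -3,
    ("G", "H"): -2, ("G", "P"): -2, ("G", "W"): -3,
    ("H", "H"): 10, ("H", "P"): -2, ("H", "W"): -3,
    ("P", "W"): -4, ("W", "W"): 15,
}


def score(a, b):
    """Symmetric lookup of the substitution score for residues a and b."""
    key = (a, b) if (a, b) in BLOSUM50_SUBSET else (b, a)
    return BLOSUM50_SUBSET[key]


def needleman_wunsch(x, y, gap=GAP):
    """Return (F, diag_optimal) with x along the columns and y along the rows.

    F[i][j] is the best global score of y[:i] against x[:j].
    diag_optimal[i][j] is True when F(i-1,j-1) + s is an optimal choice,
    i.e. when a traceback pointer goes to the diagonal predecessor.
    """
    rows, cols = len(y) + 1, len(x) + 1
    F = [[0] * cols for _ in range(rows)]
    diag_optimal = [[False] * cols for _ in range(rows)]
    for j in range(1, cols):
        F[0][j] = -gap * j          # leading gaps in y
    for i in range(1, rows):
        F[i][0] = -gap * i          # leading gaps in x
        for j in range(1, cols):
            diag = F[i - 1][j - 1] + score(y[i - 1], x[j - 1])
            up = F[i - 1][j] - gap
            left = F[i][j - 1] - gap
            F[i][j] = max(diag, up, left)
            diag_optimal[i][j] = (diag == F[i][j])
    return F, diag_optimal


F, diag_optimal = needleman_wunsch("HEAGAWGHEE", "PAWHEAE")
for row in F:
    print(" ".join("%4d" % value for value in row))
```

Under these assumptions the printed rows can be compared directly with the
reference matrix in this report, and diag_optimal marks the cells to which the
observation above applies.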
For this initial bug report I will show output of two programs that generate
the scoring matrix for this example by two methods. I will attach some
supporting files to this bug report following the initial commit. These files
will make it easy to reproduce the bug.
The output from my simple program that duplicates the example in Durbin (there
is one entry in the Durbin text that is in error) is:
Score matrix for Figure 2.5 example in Durbin text
x:         H    E    A    G    A    W    G    H    E    E
y:    0   -8  -16  -24  -32  -40  -48  -56  -64  -72  -80
P    -8   -2   -9  -17  -25  -33  -41  -49  -57  -65  -73
A   -16  -10   -3   -4  -12  -20  -28  -36  -44  -52  -60
W   -24  -18  -11   -6   -7  -15   -5  -13  -21  -29  -37
H   -32  -14  -18  -13   -8   -9  -13   -7   -3  -11  -19
E   -40  -22   -8  -16  -16   -9  -12  -15   -7    3   -5
A   -48  -30  -16   -3  -11  -11  -12  -12  -15   -5    2
E   -56  -38  -24  -11   -6  -12  -14  -15  -12   -9    1
The output from the (slightly) modified pairwise2.py code is:
Global alignment:
HEAGAWGHE-E
-P--AW-HEAE
score: 1
alignment: begin = 0, end = 11
pairwise2 Score matrix for Figure 2.5 example in Durbin text
x:         H    E    A    G    A    W    G    H    E    E
y:    x    x    x    x    x    x    x    x    x    x    x
P     x   -2   -9  -17  -26  -33  -44  -50  -58  -65  -73
A     x  -10   -3   -4  -17  -20  -36  -41  -51  -58  -66
W     x  -19  -13   -6   -7  -15   -5  -31  -39  -47  -55
H     x  -14  -18  -13   -8   -9  -18   -7   -3  -21  -29
E     x  -32   -8  -19  -16   -9  -12  -16   -7    3   -5
A     x  -42  -23   -3  -16  -11  -12  -12  -17   -8    2
E     x  -48  -24  -17   -6  -12  -14  -15  -12   -9    1
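To make the discrepancies easier to see, the two matrices printed above can be
compared cell by cell. The following is only an illustrative sketch: the
values are copied from the two outputs in this report (the boundary row and
column are omitted because pairwise2 prints them as "x"), and the helper name
differing_cells is made up for this example.

```python
# Values copied from the two matrices in this report; rows follow
# y = PAWHEAE, columns follow x = HEAGAWGHEE (boundary row/column omitted).
EXPECTED = [
    [-2, -9, -17, -25, -33, -41, -49, -57, -65, -73],
    [-10, -3, -4, -12, -20, -28, -36, -44, -52, -60],
    [-18, -11, -6, -7, -15, -5, -13, -21, -29, -37],
    [-14, -18, -13, -8, -9, -13, -7, -3, -11, -19],
    [-22, -8, -16, -16, -9, -12, -15, -7, 3, -5],
    [-30, -16, -3, -11, -11, -12, -12, -15, -5, 2],
    [-38, -24, -11, -6, -12, -14, -15, -12, -9, 1],
]
PAIRWISE2 = [
    [-2, -9, -17, -26, -33, -44, -50, -58, -65, -73],
    [-10, -3, -4, -17, -20, -36, -41, -51, -58, -66],
    [-19, -13, -6, -7, -15, -5, -31, -39, -47, -55],
    [-14, -18, -13, -8, -9, -18, -7, -3, -21, -29],
    [-32, -8, -19, -16, -9, -12, -16, -7, 3, -5],
    [-42, -23, -3, -16, -11, -12, -12, -17, -8, 2],
    [-48, -24, -17, -6, -12, -14, -15, -12, -9, 1],
]


def differing_cells(expected, observed):
    """Return (i, j, expected_value, observed_value) for every mismatch.

    Indices are 1-based so that they line up with F(i,j) in the report.
    """
    cells = []
    for i, (exp_row, obs_row) in enumerate(zip(expected, observed), start=1):
        for j, (e, o) in enumerate(zip(exp_row, obs_row), start=1):
            if e != o:
                cells.append((i, j, e, o))
    return cells


mismatches = differing_cells(EXPECTED, PAIRWISE2)
for i, j, e, o in mismatches:
    print("F(%d,%d): expected %d, pairwise2 %d" % (i, j, e, o))
print("%d of 70 cells differ" % len(mismatches))
```

The final cell F(7,10) = 1 agrees in both matrices, which is consistent with
the alignment score being reported correctly even though intermediate entries
differ.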
My supporting attachments will include the driver programs that generated these
outputs, along with the patch to modify pairwise2.py so it returns the
score_matrix.
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.