[Biopython-dev] [Bug 1876] New: Bio.pairwise2 generates incorrect Needleman-Wunsch score_matrix

bugzilla-daemon at portal.open-bio.org
Fri Oct 7 13:26:00 EDT 2005


http://bugzilla.open-bio.org/show_bug.cgi?id=1876

           Summary: Bio.pairwise2 generates incorrect Needleman-Wunsch
                    score_matrix
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bill at barnard-engineering.com


While using Bio.pairwise2 to reproduce the alignment example from the text
"Biological sequence analysis" by R. Durbin et al., I found that although the
alignments returned for the example x = 'HEAGAWGHEE', y = 'PAWHEAE' are
correct, the underlying score matrix is not.

The Biopython version I'm using is from CVS, up-to-date as of 7 Oct 2005.

My analysis shows that a score matrix entry F(i,j) is correct whenever one of
its traceback pointers refers to the diagonal predecessor F(i-1,j-1). If none
of the traceback pointers for F(i,j) refers to the diagonal predecessor, then
F(i,j) is calculated incorrectly.

In this initial bug report I show the output of two programs that generate
the score matrix for this example by two different methods. I will attach
supporting files to this bug report after this initial submission. These files
will make it easy to reproduce the bug.

The output from my simple program, which reproduces the example in Durbin (one
entry in the Durbin text is in error), is:

Score matrix for Figure 2.5 example in Durbin text
     x:    H    E    A    G    A    W    G    H    E    E 
y:    0   -8  -16  -24  -32  -40  -48  -56  -64  -72  -80 
 P   -8   -2   -9  -17  -25  -33  -41  -49  -57  -65  -73 
 A  -16  -10   -3   -4  -12  -20  -28  -36  -44  -52  -60 
 W  -24  -18  -11   -6   -7  -15   -5  -13  -21  -29  -37 
 H  -32  -14  -18  -13   -8   -9  -13   -7   -3  -11  -19 
 E  -40  -22   -8  -16  -16   -9  -12  -15   -7    3   -5 
 A  -48  -30  -16   -3  -11  -11  -12  -12  -15   -5    2 
 E  -56  -38  -24  -11   -6  -12  -14  -15  -12   -9    1 
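A table like the one above can be reproduced with the standard
Needleman-Wunsch recurrence, F(i,j) = max(F(i-1,j-1) + s(x,y), F(i-1,j) - d,
F(i,j-1) - d). The following is only a minimal sketch, not the driver program
attached to this bug. It assumes a linear gap penalty d = 8 and substitution
scores that I take to be the BLOSUM50 entries for the residues in this example.
The names SCORES, GAP and nw_score_matrix are hypothetical.

```python
# Substitution scores s(y_residue, x_residue) for the residues in this example.
# ASSUMPTION: these are the BLOSUM50 entries used in the Durbin example;
# verify them against a full BLOSUM50 table before reusing them.
SCORES = {
    "P": dict(zip("HEAGW", (-2, -1, -1, -2, -4))),
    "A": dict(zip("HEAGW", (-2, -1, 5, 0, -3))),
    "W": dict(zip("HEAGW", (-3, -3, -3, -3, 15))),
    "H": dict(zip("HEAGW", (10, 0, -2, -2, -3))),
    "E": dict(zip("HEAGW", (0, 6, -1, -3, -3))),
}
GAP = 8  # ASSUMPTION: linear gap penalty d


def nw_score_matrix(x, y, scores=SCORES, gap=GAP):
    """Return the global-alignment matrix F with len(y)+1 rows, len(x)+1 columns."""
    rows, cols = len(y) + 1, len(x) + 1
    F = [[0] * cols for _ in range(rows)]
    # Border cells: aligning a prefix against nothing costs one gap per residue.
    for j in range(1, cols):
        F[0][j] = -gap * j
    for i in range(1, rows):
        F[i][0] = -gap * i
    # Interior cells: best of the diagonal, vertical and horizontal moves.
    for i in range(1, rows):
        for j in range(1, cols):
            F[i][j] = max(
                F[i - 1][j - 1] + scores[y[i - 1]][x[j - 1]],
                F[i - 1][j] - gap,
                F[i][j - 1] - gap,
            )
    return F


F = nw_score_matrix("HEAGAWGHEE", "PAWHEAE")
for row in F:
    print("".join(f"{value:5d}" for value in row))
```

Taking the maximum over all three moves is the point of interest here: a cell
whose best predecessor is not the diagonal one still gets the right value.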

The output from the (slightly) modified pairwise2.py code is:

Global alignment:
HEAGAWGHE-E
-P--AW-HEAE
score: 1
alignment: begin = 0, end = 11

pairwise2 Score matrix for Figure 2.5 example in Durbin text
    x:    H    E    A    G    A    W    G    H    E    E 
y:   x    x    x    x    x    x    x    x    x    x    x 
 P   x   -2   -9  -17  -26  -33  -44  -50  -58  -65  -73 
 A   x  -10   -3   -4  -17  -20  -36  -41  -51  -58  -66 
 W   x  -19  -13   -6   -7  -15   -5  -31  -39  -47  -55 
 H   x  -14  -18  -13   -8   -9  -18   -7   -3  -21  -29 
 E   x  -32   -8  -19  -16   -9  -12  -16   -7    3   -5 
 A   x  -42  -23   -3  -16  -11  -12  -12  -17   -8    2 
 E   x  -48  -24  -17   -6  -12  -14  -15  -12   -9    1 
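To see exactly where the two tables in this report disagree, the interior
cells can be compared directly. This is a small sketch with both tables
transcribed by hand from the output shown in this report. The names REFERENCE,
PAIRWISE2 and diffs are hypothetical and are not part of Bio.pairwise2.

```python
X, Y = "HEAGAWGHEE", "PAWHEAE"

# Interior cells only (border row and column omitted), one row per residue of Y.
# Transcribed from the reference output in this report.
REFERENCE = [
    [-2, -9, -17, -25, -33, -41, -49, -57, -65, -73],
    [-10, -3, -4, -12, -20, -28, -36, -44, -52, -60],
    [-18, -11, -6, -7, -15, -5, -13, -21, -29, -37],
    [-14, -18, -13, -8, -9, -13, -7, -3, -11, -19],
    [-22, -8, -16, -16, -9, -12, -15, -7, 3, -5],
    [-30, -16, -3, -11, -11, -12, -12, -15, -5, 2],
    [-38, -24, -11, -6, -12, -14, -15, -12, -9, 1],
]
# Transcribed from the modified pairwise2.py output in this report.
PAIRWISE2 = [
    [-2, -9, -17, -26, -33, -44, -50, -58, -65, -73],
    [-10, -3, -4, -17, -20, -36, -41, -51, -58, -66],
    [-19, -13, -6, -7, -15, -5, -31, -39, -47, -55],
    [-14, -18, -13, -8, -9, -18, -7, -3, -21, -29],
    [-32, -8, -19, -16, -9, -12, -16, -7, 3, -5],
    [-42, -23, -3, -16, -11, -12, -12, -17, -8, 2],
    [-48, -24, -17, -6, -12, -14, -15, -12, -9, 1],
]

# Collect (y residue, row, x residue, column, reference, pairwise2) per mismatch,
# using 1-based row and column numbers.
diffs = [
    (Y[i], i + 1, X[j], j + 1, ref, got)
    for i, (ref_row, got_row) in enumerate(zip(REFERENCE, PAIRWISE2))
    for j, (ref, got) in enumerate(zip(ref_row, got_row))
    if ref != got
]

for y_res, i, x_res, j, ref, got in diffs:
    print(f"F({i},{j}) [{y_res},{x_res}]: reference {ref:4d}, pairwise2 {got:4d}")
print(f"{len(diffs)} cells differ")
```

In this transcription every cell that differs has a lower value in the
pairwise2 table than in the reference table, while the final score in the
bottom-right corner agrees.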

My supporting attachments will include the driver programs that generated these
outputs, along with the patch to modify pairwise2.py so it returns the
score_matrix.





