[Biopython] bug with pairwise2 local alignments?

Jocelyne jocelyne at gmail.com
Wed Sep 12 01:41:31 UTC 2012


Hi Peter:
I understand the C code would need to be fixed too, and it should be a
fairly quick fix, but I unfortunately don't have much time on my hands at
the moment. When I have more time, I'll see about fixing the bug in both
python and C, recompiling and testing. I thought it would be good for the
community to at least be aware of this bug.
Jocelyne

On Tue, Sep 11, 2012 at 9:03 AM, Peter Cock <p.j.a.cock at googlemail.com>wrote:

> Hi Jocelyne,
>
> The reason for the C code is speed. The pure Python code is a fall
> back for systems where you can't use this - for example PyPy or Jython.
>
> To test your (pure Python) fix you'd have to comment out the C library
> import line. Ideally we'd prefer a combined fix which also updates the
> C implementation to match. Are you on Windows? That does complicate
> this - whereas for the Mac or Linux (re)compiling Biopython from source
> should be quite easy.
>
> Peter
>
> On Fri, Sep 7, 2012 at 7:15 AM, Jocelyne <jocelyne at gmail.com> wrote:
> > Hi Peter:
> > I added 4 lines of code in each snippet below (there are copies of the
> same
> > code). I'm pretty sure it should fix it (there are copies of line
> 438-439,
> > with the indexes changed). Basically, the previous code allowed for
> negative
> > scores in the first row and column of the matrix, even in the case of
> local
> > alignments (in which case scores shouldn't go negative). I didn't test
> it,
> > so please make sure it works before merging.
> >
> > Also, it seems that it imports _make_score_matrix_fasta from a C library
> > (line 851), which overload the corresponding python function, so that
> would
> > have to be fixed too.
> >
> > Thanks!
> > Jocelyne
> >
> >
> >
> > 378      # The top and left borders of the matrices are special cases
> > 379      # because there are no previously aligned characters.  To
> simplify
> > 380      # the main loop, handle these separately.
> > 381      for i in range(lenA):
> > 382          # Align the first residue in sequenceB to the ith residue in
> > 383          # sequence A.  This is like opening up i gaps at the
> beginning
> > 384          # of sequence B.
> > 385          score = match_fn(sequenceA[i], sequenceB[0])
> > 386          if penalize_end_gaps:
> > 387              score += gap_B_fn(0, i)
> > 388          score_matrix[i][0] = score
> > +++          if not align_globally and score_matrix[0][i] < 0:
> > +++              score_matrix[i][0] = 0
> > 389      for i in range(1, lenB):
> > 390          score = match_fn(sequenceA[0], sequenceB[i])
> > 391          if penalize_end_gaps:
> > 392              score += gap_A_fn(0, i)
> > 393          score_matrix[0][i] = score
> > +++          if not align_globally and score_matrix[0][i] < 0:
> > +++              score_matrix[0][i] = 0
> >
> >
> > 461      # The top and left borders of the matrices are special cases
> > 462      # because there are no previously aligned characters.  To
> simplify
> > 463      # the main loop, handle these separately.
> > 464      for i in range(lenA):
> > 465          # Align the first residue in sequenceB to the ith residue in
> > 466          # sequence A.  This is like opening up i gaps at the
> beginning
> > 467          # of sequence B.
> > 468          score = match_fn(sequenceA[i], sequenceB[0])
> > 469          if penalize_end_gaps:
> > 470              score += calc_affine_penalty(
> > 471                  i, open_B, extend_B, penalize_extend_when_opening)
> > 472          score_matrix[i][0] = score
> > +++          if not align_globally and score_matrix[i][0] < 0:
> > +++              score_matrix[i][0] = 0
> > 473      for i in range(1, lenB):
> > 474          score = match_fn(sequenceA[0], sequenceB[i])
> > 475          if penalize_end_gaps:
> > 476              score += calc_affine_penalty(
> > 477                  i, open_A, extend_A, penalize_extend_when_opening)
> > 478          score_matrix[0][i] = score
> > +++          if not align_globally and score_matrix[0][i] < 0:
> > +++              score_matrix[0][i] = 0
> >
> >
> >
> > On Thu, Sep 6, 2012 at 8:59 PM, Peter Cock <p.j.a.cock at googlemail.com>
> > wrote:
> >>
> >> On Thu, Sep 6, 2012 at 9:31 PM, Jocelyne <jocelyne at gmail.com> wrote:
> >> > Hello:
> >> > First, I'd like to say that I really appreciate the effort of the
> >> > community
> >> > to provide us with such a nice package.
> >> > I found some odd scoring behavior with the pairwise2 local alignment
> >> > (see 5
> >> > below). I think these 2 alignments should have the same score.
> >>
> >> Hmm. I'm not overly familiar with this bit of the code, but did
> >> occur to me it might be something related to this open issue:
> >>
> >> https://redmine.open-bio.org/issues/2776
> >>
> >> I was able to repeat your pairwise2.align.localms example
> >> and the score matrix example a Mac using the latest code
> >> from github, and got the same answers. So (as I suspected)
> >> this does not seem to be a platform specific issue.
> >>
> >> Unfortunately the original author of this code (Jeff Chang)
> >> isn't active with Biopython anymore - we can try emailing
> >> him directly, but if you're willing to look into this in more
> >> detail and can propose a fix, I'm happy to take a look at
> >> merging it.
> >>
> >> Peter
> >
> >
>



More information about the Biopython mailing list