[Biopython] bug with pairwise2 local alignments?

Jocelyne jocelyne at gmail.com
Fri Sep 7 06:15:59 UTC 2012


Hi Peter:
I added 4 lines to each of the two snippets below (the two functions
contain near-identical copies of the same code); the new lines are marked
with +++. I'm pretty sure this fixes it: the added lines are copies of
lines 438-439 with the indexes changed. Basically, the previous code
allowed negative scores in the first row and column of the matrix even for
local alignments, where scores should never go below zero. I haven't
tested it, so please make sure it works before merging.

Also, it seems that the module imports _make_score_matrix_fast from a C
library (line 851), which overrides the corresponding Python function, so
the C version would have to be fixed too.

Thanks!
Jocelyne


378      # The top and left borders of the matrices are special cases
379      # because there are no previously aligned characters.  To simplify
380      # the main loop, handle these separately.
381      for i in range(lenA):
382          # Align the first residue in sequenceB to the ith residue in
383          # sequence A.  This is like opening up i gaps at the beginning
384          # of sequence B.
385          score = match_fn(sequenceA[i], sequenceB[0])
386          if penalize_end_gaps:
387              score += gap_B_fn(0, i)
388          score_matrix[i][0] = score
+++          if not align_globally and score_matrix[i][0] < 0:
+++              score_matrix[i][0] = 0
389      for i in range(1, lenB):
390          score = match_fn(sequenceA[0], sequenceB[i])
391          if penalize_end_gaps:
392              score += gap_A_fn(0, i)
393          score_matrix[0][i] = score
+++          if not align_globally and score_matrix[0][i] < 0:
+++              score_matrix[0][i] = 0


461      # The top and left borders of the matrices are special cases
462      # because there are no previously aligned characters.  To simplify
463      # the main loop, handle these separately.
464      for i in range(lenA):
465          # Align the first residue in sequenceB to the ith residue in
466          # sequence A.  This is like opening up i gaps at the beginning
467          # of sequence B.
468          score = match_fn(sequenceA[i], sequenceB[0])
469          if penalize_end_gaps:
470              score += calc_affine_penalty(
471                  i, open_B, extend_B, penalize_extend_when_opening)
472          score_matrix[i][0] = score
+++          if not align_globally and score_matrix[i][0] < 0:
+++              score_matrix[i][0] = 0
473      for i in range(1, lenB):
474          score = match_fn(sequenceA[0], sequenceB[i])
475          if penalize_end_gaps:
476              score += calc_affine_penalty(
477                  i, open_A, extend_A, penalize_extend_when_opening)
478          score_matrix[0][i] = score
+++          if not align_globally and score_matrix[0][i] < 0:
+++              score_matrix[0][i] = 0
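
The idea behind the added lines can be tried in isolation with the small
standalone sketch below. It is untested against Biopython itself, and the
names init_borders, match and gap are hypothetical helpers written for
this example, not functions from pairwise2. Only the first column and
first row are filled; interior cells are left as None.

```python
def init_borders(seq_a, seq_b, match_fn, gap_fn, align_globally,
                 penalize_end_gaps=True):
    """Fill the first column and first row of a score matrix.

    For local alignments (align_globally=False), negative border scores
    are clamped to 0, which is the change proposed in the snippets above.
    """
    len_a, len_b = len(seq_a), len(seq_b)
    score_matrix = [[None] * len_b for _ in range(len_a)]

    # First column: align seq_b[0] to seq_a[i], i.e. i leading gaps.
    for i in range(len_a):
        score = match_fn(seq_a[i], seq_b[0])
        if penalize_end_gaps:
            score += gap_fn(i)
        if not align_globally and score < 0:
            score = 0
        score_matrix[i][0] = score

    # First row: align seq_a[0] to seq_b[j], i.e. j leading gaps.
    for j in range(1, len_b):
        score = match_fn(seq_a[0], seq_b[j])
        if penalize_end_gaps:
            score += gap_fn(j)
        if not align_globally and score < 0:
            score = 0
        score_matrix[0][j] = score

    return score_matrix


def match(a, b):
    # Hypothetical scoring: +2 for a match, -1 for a mismatch.
    return 2 if a == b else -1


def gap(length):
    # Hypothetical linear gap penalty: -1 per gap position.
    return -1 * length


local = init_borders("ACGT", "AGT", match, gap, align_globally=False)
print([row[0] for row in local])  # first column: [2, 0, 0, 0]
print(local[0])                   # first row:    [2, 0, 0]
```

With align_globally=True the same call keeps the negative border values,
which is the behavior the original code had in both modes.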



On Thu, Sep 6, 2012 at 8:59 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Thu, Sep 6, 2012 at 9:31 PM, Jocelyne <jocelyne at gmail.com> wrote:
> > Hello:
> > First, I'd like to say that I really appreciate the effort of the
> > community to provide us with such a nice package.
> > I found some odd scoring behavior with the pairwise2 local alignment
> > (see 5 below). I think these 2 alignments should have the same score.
>
> Hmm. I'm not overly familiar with this bit of the code, but it did
> occur to me it might be something related to this open issue:
>
> https://redmine.open-bio.org/issues/2776
>
> I was able to repeat your pairwise2.align.localms example
> and the score matrix example on a Mac using the latest code
> from github, and got the same answers. So (as I suspected)
> this does not seem to be a platform-specific issue.
>
> Unfortunately the original author of this code (Jeff Chang)
> isn't active with Biopython anymore - we can try emailing
> him directly, but if you're willing to look into this in more
> detail and can propose a fix, I'm happy to take a look at
> merging it.
>
> Peter
>


