[Biopython] bug with pairwise2 local alignments?

Peter Cock p.j.a.cock at googlemail.com
Tue Sep 11 16:03:21 UTC 2012


Hi Jocelyne,

The reason for the C code is speed. The pure Python code is a fall
back for systems where you can't use this - for example PyPy or Jython.

To test your (pure Python) fix you'd have to comment out the C library
import line. Ideally we'd prefer a combined fix which also updates the
C implementation to match. Are you on Windows? That does complicate
this - whereas for the Mac or Linux (re)compiling Biopython from source
should be quite easy.

Peter

On Fri, Sep 7, 2012 at 7:15 AM, Jocelyne <jocelyne at gmail.com> wrote:
> Hi Peter:
> I added 4 lines of code in each snippet below (there are copies of the same
> code). I'm pretty sure it should fix it (there are copies of line 438-439,
> with the indexes changed). Basically, the previous code allowed for negative
> scores in the first row and column of the matrix, even in the case of local
> alignments (in which case scores shouldn't go negative). I didn't test it,
> so please make sure it works before merging.
>
> Also, it seems that it imports _make_score_matrix_fasta from a C library
> (line 851), which overload the corresponding python function, so that would
> have to be fixed too.
>
> Thanks!
> Jocelyne
>
>
>
> 378      # The top and left borders of the matrices are special cases
> 379      # because there are no previously aligned characters.  To simplify
> 380      # the main loop, handle these separately.
> 381      for i in range(lenA):
> 382          # Align the first residue in sequenceB to the ith residue in
> 383          # sequence A.  This is like opening up i gaps at the beginning
> 384          # of sequence B.
> 385          score = match_fn(sequenceA[i], sequenceB[0])
> 386          if penalize_end_gaps:
> 387              score += gap_B_fn(0, i)
> 388          score_matrix[i][0] = score
> +++          if not align_globally and score_matrix[0][i] < 0:
> +++              score_matrix[i][0] = 0
> 389      for i in range(1, lenB):
> 390          score = match_fn(sequenceA[0], sequenceB[i])
> 391          if penalize_end_gaps:
> 392              score += gap_A_fn(0, i)
> 393          score_matrix[0][i] = score
> +++          if not align_globally and score_matrix[0][i] < 0:
> +++              score_matrix[0][i] = 0
>
>
> 461      # The top and left borders of the matrices are special cases
> 462      # because there are no previously aligned characters.  To simplify
> 463      # the main loop, handle these separately.
> 464      for i in range(lenA):
> 465          # Align the first residue in sequenceB to the ith residue in
> 466          # sequence A.  This is like opening up i gaps at the beginning
> 467          # of sequence B.
> 468          score = match_fn(sequenceA[i], sequenceB[0])
> 469          if penalize_end_gaps:
> 470              score += calc_affine_penalty(
> 471                  i, open_B, extend_B, penalize_extend_when_opening)
> 472          score_matrix[i][0] = score
> +++          if not align_globally and score_matrix[i][0] < 0:
> +++              score_matrix[i][0] = 0
> 473      for i in range(1, lenB):
> 474          score = match_fn(sequenceA[0], sequenceB[i])
> 475          if penalize_end_gaps:
> 476              score += calc_affine_penalty(
> 477                  i, open_A, extend_A, penalize_extend_when_opening)
> 478          score_matrix[0][i] = score
> +++          if not align_globally and score_matrix[0][i] < 0:
> +++              score_matrix[0][i] = 0
>
>
>
> On Thu, Sep 6, 2012 at 8:59 PM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
>>
>> On Thu, Sep 6, 2012 at 9:31 PM, Jocelyne <jocelyne at gmail.com> wrote:
>> > Hello:
>> > First, I'd like to say that I really appreciate the effort of the
>> > community
>> > to provide us with such a nice package.
>> > I found some odd scoring behavior with the pairwise2 local alignment
>> > (see 5
>> > below). I think these 2 alignments should have the same score.
>>
>> Hmm. I'm not overly familiar with this bit of the code, but did
>> occur to me it might be something related to this open issue:
>>
>> https://redmine.open-bio.org/issues/2776
>>
>> I was able to repeat your pairwise2.align.localms example
>> and the score matrix example a Mac using the latest code
>> from github, and got the same answers. So (as I suspected)
>> this does not seem to be a platform specific issue.
>>
>> Unfortunately the original author of this code (Jeff Chang)
>> isn't active with Biopython anymore - we can try emailing
>> him directly, but if you're willing to look into this in more
>> detail and can propose a fix, I'm happy to take a look at
>> merging it.
>>
>> Peter
>
>



More information about the Biopython mailing list