[Biopython] bug with pairwise2 local alignments?
Peter Cock
p.j.a.cock at googlemail.com
Tue Sep 11 16:03:21 UTC 2012
Hi Jocelyne,
The reason for the C code is speed. The pure Python code is a fall
back for systems where you can't use this - for example PyPy or Jython.
To test your (pure Python) fix you'd have to comment out the C library
import line. Ideally we'd prefer a combined fix which also updates the
C implementation to match. Are you on Windows? That does complicate
this - whereas for the Mac or Linux (re)compiling Biopython from source
should be quite easy.
Peter
On Fri, Sep 7, 2012 at 7:15 AM, Jocelyne <jocelyne at gmail.com> wrote:
> Hi Peter:
> I added 4 lines of code in each snippet below (there are copies of the same
> code). I'm pretty sure it should fix it (there are copies of line 438-439,
> with the indexes changed). Basically, the previous code allowed for negative
> scores in the first row and column of the matrix, even in the case of local
> alignments (in which case scores shouldn't go negative). I didn't test it,
> so please make sure it works before merging.
>
> Also, it seems that it imports _make_score_matrix_fasta from a C library
> (line 851), which overload the corresponding python function, so that would
> have to be fixed too.
>
> Thanks!
> Jocelyne
>
>
>
> 378 # The top and left borders of the matrices are special cases
> 379 # because there are no previously aligned characters. To simplify
> 380 # the main loop, handle these separately.
> 381 for i in range(lenA):
> 382 # Align the first residue in sequenceB to the ith residue in
> 383 # sequence A. This is like opening up i gaps at the beginning
> 384 # of sequence B.
> 385 score = match_fn(sequenceA[i], sequenceB[0])
> 386 if penalize_end_gaps:
> 387 score += gap_B_fn(0, i)
> 388 score_matrix[i][0] = score
> +++ if not align_globally and score_matrix[0][i] < 0:
> +++ score_matrix[i][0] = 0
> 389 for i in range(1, lenB):
> 390 score = match_fn(sequenceA[0], sequenceB[i])
> 391 if penalize_end_gaps:
> 392 score += gap_A_fn(0, i)
> 393 score_matrix[0][i] = score
> +++ if not align_globally and score_matrix[0][i] < 0:
> +++ score_matrix[0][i] = 0
>
>
> 461 # The top and left borders of the matrices are special cases
> 462 # because there are no previously aligned characters. To simplify
> 463 # the main loop, handle these separately.
> 464 for i in range(lenA):
> 465 # Align the first residue in sequenceB to the ith residue in
> 466 # sequence A. This is like opening up i gaps at the beginning
> 467 # of sequence B.
> 468 score = match_fn(sequenceA[i], sequenceB[0])
> 469 if penalize_end_gaps:
> 470 score += calc_affine_penalty(
> 471 i, open_B, extend_B, penalize_extend_when_opening)
> 472 score_matrix[i][0] = score
> +++ if not align_globally and score_matrix[i][0] < 0:
> +++ score_matrix[i][0] = 0
> 473 for i in range(1, lenB):
> 474 score = match_fn(sequenceA[0], sequenceB[i])
> 475 if penalize_end_gaps:
> 476 score += calc_affine_penalty(
> 477 i, open_A, extend_A, penalize_extend_when_opening)
> 478 score_matrix[0][i] = score
> +++ if not align_globally and score_matrix[0][i] < 0:
> +++ score_matrix[0][i] = 0
>
>
>
> On Thu, Sep 6, 2012 at 8:59 PM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
>>
>> On Thu, Sep 6, 2012 at 9:31 PM, Jocelyne <jocelyne at gmail.com> wrote:
>> > Hello:
>> > First, I'd like to say that I really appreciate the effort of the
>> > community
>> > to provide us with such a nice package.
>> > I found some odd scoring behavior with the pairwise2 local alignment
>> > (see 5
>> > below). I think these 2 alignments should have the same score.
>>
>> Hmm. I'm not overly familiar with this bit of the code, but did
>> occur to me it might be something related to this open issue:
>>
>> https://redmine.open-bio.org/issues/2776
>>
>> I was able to repeat your pairwise2.align.localms example
>> and the score matrix example a Mac using the latest code
>> from github, and got the same answers. So (as I suspected)
>> this does not seem to be a platform specific issue.
>>
>> Unfortunately the original author of this code (Jeff Chang)
>> isn't active with Biopython anymore - we can try emailing
>> him directly, but if you're willing to look into this in more
>> detail and can propose a fix, I'm happy to take a look at
>> merging it.
>>
>> Peter
>
>
More information about the Biopython
mailing list