[Biopython-dev] [Biopython] Bio.motifs raising Exceptions using pypy
Peter Cock
p.j.a.cock at googlemail.com
Fri Jul 12 12:57:08 UTC 2013
On Fri, Jul 12, 2013 at 11:48 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> OK - this also breaks under Jython and even Python if we
> disable the C extension. Here self[letters] only has ACGT,
> not N, thus a key error. This is something the C code just
> ignores. There is also an inconsistency with mixed case.
>
> New unit test:
> https://github.com/biopython/biopython/commit/e13c97ae3535b58d8ec3da3fc565e97db1fa75a3
>
> Fix for the mixed case difference:
> https://github.com/biopython/biopython/commit/0cab00c66a1fd15072d020cfc17edbdfb37484a5
>
> The KeyError from bad characters can be handled like this:
>
> $ git diff
> diff --git a/Bio/motifs/matrix.py b/Bio/motifs/matrix.py
> index bce1d4f..e6446b5 100644
> --- a/Bio/motifs/matrix.py
> +++ b/Bio/motifs/matrix.py
> @@ -364,7 +364,11 @@ class PositionSpecificScoringMatrix(GenericPositionMatrix):
>                  score = 0.0
>                  for position in xrange(m):
>                      letter = sequence[i+position]
> -                    score += self[letter][position]
> +                    try:
> +                        score += self[letter][position]
> +                    except KeyError:
> +                        #The C code ignores unexpected letters like N
> +                        pass
>                  scores.append(score)
>          else:
>              # get the log-odds matrix into a proper shape
>
> However, that leaves a numerical difference in the output:
>
> ...
>
> The same error occurs on Jython, and on Python if I disable
> the C extension. This needs a little more investigation... I
> don't immediately follow when the C code sets the value
> to nan.
Rereading the C code after lunch, I realised how the 'ok' sentinel
value was being used: bad letters result in NaN as the value, rather
than being silently ignored.
Fixed,
https://github.com/biopython/biopython/commit/00043d28bdf5408519cb4832d6a8e822d10f6653
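
For illustration, here is a minimal standalone sketch of the behaviour
described above. It is not necessarily the code in the commit: the
function name calculate_scores and the plain dict standing in for the
matrix (self in the diff above) are assumptions made for this example.
Any window containing a letter that is not in the matrix scores NaN
instead of raising KeyError, and the sequence is upper-cased first so
that mixed case is treated consistently.

```python
import math


def calculate_scores(log_odds, sequence):
    """Score each window of the sequence against a log-odds matrix.

    log_odds is a hypothetical stand-in for the matrix object: a dict
    mapping each letter to a list of per-position scores.
    """
    m = len(next(iter(log_odds.values())))  # motif length
    sequence = sequence.upper()  # treat mixed case consistently
    scores = []
    for i in range(len(sequence) - m + 1):
        score = 0.0
        for position in range(m):
            letter = sequence[i + position]
            try:
                score += log_odds[letter][position]
            except KeyError:
                # Unexpected letter (e.g. N): the whole window is NaN
                score = math.nan
                break
        scores.append(score)
    return scores


# Toy two-column matrix, values chosen only for the example
toy = {
    "A": [1.0, 0.5],
    "C": [0.0, 0.0],
    "G": [-1.0, 2.0],
    "T": [0.5, -0.5],
}
print(calculate_scores(toy, "acNGT"))
```

With this toy matrix the two windows that overlap the N come out as NaN,
while the windows on either side are scored as usual.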
Peter