[Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Fri Nov 14 11:11:26 UTC 2008


http://bugzilla.open-bio.org/show_bug.cgi?id=2657





------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-11-14 06:11 EST -------
I've updated CVS to use standard four space indentation, add a doctest and the
copyright statement etc.

James' code makes two code changes (shown against CVS revision 1.9).

67,68c67,68
<     h = [numpy.sort(abs(x-x[i]))[r] for i in range(n)]
<     w = numpy.clip(abs(([x]-numpy.transpose([x]))/h),0.0,1.0)
---
>     h = [numpy.sort(numpy.abs(x-x[i]))[r] for i in range(n)]
>     w = numpy.clip(numpy.abs(([x]-numpy.transpose([x]))/h),0.0,1.0)

Due to the historic usage "from Numeric import *" this code did once use
Numeric.abs here, so it makes sense to use numpy.abs now.  Probably just an
oversight from the recent Numeric/numpy conversion.  This is another reminder
that using "from XXX import *" is a bad idea.

76,80c76,82
<             b = numpy.array([sum(weights*y), sum(weights*y*x)])
<             A = numpy.array([[sum(weights),   sum(weights*x)],
<                        [sum(weights*x), sum(weights*x*x)]])
<             beta = numpy.linalg.solve(A,b)
<             yest[i] = beta[0] + beta[1]*x[i]
---
>             theta = weights*x
>             b_top = sum(weights*y)
>             b_bot = sum(theta*y)
>             a = sum(weights)
>             b = sum(theta)
>             d = sum(theta*x)
>             yest[i] = (d*b_top-b*b_bot+(a*b_bot-b*b_top)*x[i])/(a*d-b**2)

I can see the point of calculating and caching these:
weights*y
weights*x
sum(weights*x)

Was there a good reason for the name theta for weights*x?

I personally think using an explicit matrix solver is much nicer to read than
that complex hand coded version.  Does it really save much time?

My suggestion is just:
76,78c76,81
<             b = numpy.array([sum(weights*y), sum(weights*y*x)])
<             A = numpy.array([[sum(weights),   sum(weights*x)],
<                        [sum(weights*x), sum(weights*x*x)]])
---
>             weights_x = weights*x
>             weights_y = weights*y
>             sum_weights_x = sum(weights_x)
>             b = numpy.array([sum(weights_y), sum(weights_y*x)])
>             A = numpy.array([[sum(weights),   sum_weights_x],
>                        [sum_weights_x, sum(weights_x*x)]])

However, I'm going to leave this for Michiel to resolve (given he wrote the
code in the first place).


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.



More information about the Biopython-dev mailing list