[Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
bugzilla-daemon at portal.open-bio.org
bugzilla-daemon at portal.open-bio.org
Fri Nov 14 11:28:36 UTC 2008
http://bugzilla.open-bio.org/show_bug.cgi?id=2657
eric.pruitt at gmail.com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |eric.pruitt at gmail.com
------- Comment #5 from eric.pruitt at gmail.com 2008-11-14 06:28 EST -------
(In reply to comment #3)
> I've updated CVS to use standard four space indentation, add a doctest and the
> copyright statement etc.
>
> James' code makes two code changes (shown against CVS revision 1.9).
>
> 67,68c67,68
> < h = [numpy.sort(abs(x-x[i]))[r] for i in range(n)]
> < w = numpy.clip(abs(([x]-numpy.transpose([x]))/h),0.0,1.0)
> ---
> > h = [numpy.sort(numpy.abs(x-x[i]))[r] for i in range(n)]
> > w = numpy.clip(numpy.abs(([x]-numpy.transpose([x]))/h),0.0,1.0)
>
> Due to the historic usage "from Numeric import *" this code did once use
> Numeric.abs here, so it makes sense to use numpy.abs now. Probably just an
> oversight from the recent Numeric/numpy conversion. This is another reminder
> that using "from XXX import *" is a bad idea.
>
> 76,80c76,82
> < b = numpy.array([sum(weights*y), sum(weights*y*x)])
> < A = numpy.array([[sum(weights), sum(weights*x)],
> < [sum(weights*x), sum(weights*x*x)]])
> < beta = numpy.linalg.solve(A,b)
> < yest[i] = beta[0] + beta[1]*x[i]
> ---
> > theta = weights*x
> > b_top = sum(weights*y)
> > b_bot = sum(theta*y)
> > a = sum(weights)
> > b = sum(theta)
> > d = sum(theta*x)
> > yest[i] = (d*b_top-b*b_bot+(a*b_bot-b*b_top)*x[i])/(a*d-b**2)
>
> I can see the point of calculating and caching these:
> weights*y
> weights*x
> sum(weights*x)
>
> Was there a good reason for the name theta for weights*x?
>
> I personally think using an explicit matrix solver is much nicer to read than
> that complex hand coded version. Does it really save much time?
>
> My suggestion is just:
> 76,78c76,81
> < b = numpy.array([sum(weights*y), sum(weights*y*x)])
> < A = numpy.array([[sum(weights), sum(weights*x)],
> < [sum(weights*x), sum(weights*x*x)]])
> ---
> > weights_x = weights*x
> > weights_y = weights*y
> > sum_weights_x = sum(weights_x)
> > b = numpy.array([sum(weights_y), sum(weights_y*x)])
> > A = numpy.array([[sum(weights), sum_weights_x],
> > [sum_weights_x, sum(weights_x*x)]])
>
> However, I'm going to leave this for Michiel to resolve (given he wrote the
> code in the first place).
>
Yes-- replacing numpy saves quite a bit of time. When I replaced the variable
so they werent recalculated every single time, it reduced unit test time 17%
compared to the original then repaklcing numpy receduced it to a net 38% from
the original so huge difference. Also, I suggest changing something if you all
decided to keep numpy. Minor but just a suggestion.
> weights_x = weights*x
> sum_weights_x = sum(weights_x)
> b = numpy.array([sum(weights*y), sum(weights_x*y)])
> A = numpy.array([[sum(weights), sum_weights_x],
> [sum_weights_x, sum(weights_x*x)]])
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
More information about the Biopython-dev
mailing list