[Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Fri Nov 14 11:28:36 UTC 2008


http://bugzilla.open-bio.org/show_bug.cgi?id=2657


eric.pruitt at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |eric.pruitt at gmail.com




------- Comment #5 from eric.pruitt at gmail.com  2008-11-14 06:28 EST -------
(In reply to comment #3)
> I've updated CVS to use standard four space indentation, add a doctest and the
> copyright statement etc.
> 
> James' code makes two code changes (shown against CVS revision 1.9).
> 
> 67,68c67,68
> <     h = [numpy.sort(abs(x-x[i]))[r] for i in range(n)]
> <     w = numpy.clip(abs(([x]-numpy.transpose([x]))/h),0.0,1.0)
> ---
> >     h = [numpy.sort(numpy.abs(x-x[i]))[r] for i in range(n)]
> >     w = numpy.clip(numpy.abs(([x]-numpy.transpose([x]))/h),0.0,1.0)
> 
> Due to the historic usage "from Numeric import *" this code did once use
> Numeric.abs here, so it makes sense to use numpy.abs now.  Probably just an
> oversight from the recent Numeric/numpy conversion.  This is another reminder
> that using "from XXX import *" is a bad idea.
> 
> 76,80c76,82
> <             b = numpy.array([sum(weights*y), sum(weights*y*x)])
> <             A = numpy.array([[sum(weights),   sum(weights*x)],
> <                        [sum(weights*x), sum(weights*x*x)]])
> <             beta = numpy.linalg.solve(A,b)
> <             yest[i] = beta[0] + beta[1]*x[i]
> ---
> >             theta = weights*x
> >             b_top = sum(weights*y)
> >             b_bot = sum(theta*y)
> >             a = sum(weights)
> >             b = sum(theta)
> >             d = sum(theta*x)
> >             yest[i] = (d*b_top-b*b_bot+(a*b_bot-b*b_top)*x[i])/(a*d-b**2)
> 
> I can see the point of calculating and caching these:
> weights*y
> weights*x
> sum(weights*x)
> 
> Was there a good reason for the name theta for weights*x?
> 
> I personally think using an explicit matrix solver is much nicer to read than
> that complex hand coded version.  Does it really save much time?
> 
> My suggestion is just:
> 76,78c76,81
> <             b = numpy.array([sum(weights*y), sum(weights*y*x)])
> <             A = numpy.array([[sum(weights),   sum(weights*x)],
> <                        [sum(weights*x), sum(weights*x*x)]])
> ---
> >             weights_x = weights*x
> >             weights_y = weights*y
> >             sum_weights_x = sum(weights_x)
> >             b = numpy.array([sum(weights_y), sum(weights_y*x)])
> >             A = numpy.array([[sum(weights),   sum_weights_x],
> >                        [sum_weights_x, sum(weights_x*x)]])
> 
> However, I'm going to leave this for Michiel to resolve (given he wrote the
> code in the first place).
> 

Yes-- replacing numpy saves quite a bit of time. When I replaced the variable
so they werent recalculated every single time, it reduced unit test time 17%
compared to the original then repaklcing numpy receduced it to a net 38% from
the original so huge difference. Also, I suggest changing something if you all
decided to keep numpy. Minor but just a suggestion.

>             weights_x = weights*x
>             sum_weights_x = sum(weights_x)
>             b = numpy.array([sum(weights*y), sum(weights_x*y)])
>             A = numpy.array([[sum(weights),   sum_weights_x],
>                        [sum_weights_x, sum(weights_x*x)]])


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.



More information about the Biopython-dev mailing list