[Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import

bugzilla-daemon at portal.open-bio.org
Wed Nov 5 13:18:22 UTC 2008


http://bugzilla.open-bio.org/show_bug.cgi?id=2629





------- Comment #10 from mdehoon at ims.u-tokyo.ac.jp  2008-11-05 08:18 EST -------
See
http://coreygoldberg.blogspot.com/2008/07/python-counting-items-in-list.html
for some timings of this operation. I think Bruce's approach is the most
suitable, except for the dict update method; I would use
        content_freqs[cval] = content_freqs.get(cval,0)+p_contents
instead. Depending on the contents of the list, this sometimes runs even faster
than the implementation in listfns.
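
For illustration only, here is a minimal sketch of how the dict.get update
could look inside a complete function. This is not the patch attached to the
bug: the function name freqs_scaled_at_start and the empty-list guard are
assumptions, and only content_freqs, cval and p_contents come from the line
quoted above.

```python
def freqs_scaled_at_start(items):
    """Return {item: relative frequency} for a list of hashable items.

    Hypothetical helper, not the Bio.NaiveBayes code itself.
    """
    content_freqs = {}
    if not items:
        # Assumption: an empty list yields an empty dict instead of
        # raising ZeroDivisionError.
        return content_freqs
    # Rescale once, up front: every occurrence contributes 1/len(items).
    p_contents = 1.0 / len(items)
    for cval in items:
        # dict.get returns 0 for a value not seen before, so no separate
        # membership test or try/except is needed.
        content_freqs[cval] = content_freqs.get(cval, 0) + p_contents
    return content_freqs
```

Calling freqs_scaled_at_start(list("AACG")) should give a frequency close to
0.5 for "A" and close to 0.25 for each of "C" and "G".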
> 
> Given the possible rounding issues, does doing the rescaling (dividing by the
> number of elements) at the start make a big time saving (over dividing each
> total at the end)?  I would feel happier with the division at the end (as done
> in the listfns code).
> 
I think the rescaling at the start is a good thing. If the list contains many
different objects, rescaling at the end needs an extra pass over every distinct
key, which can take a long time. That is probably not the typical use case
here, but on the other hand I don't see a good reason not to save the time.
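
To make the trade-off concrete, the two variants can be sketched side by side.
Both function names below are hypothetical and neither is the actual listfns
or Bio.NaiveBayes code. The sketch only shows where the division happens and
why the results may differ in the last few bits.

```python
def freqs_divide_at_end(items):
    """Count with integers first, then divide each total by len(items)."""
    counts = {}
    for cval in items:
        counts[cval] = counts.get(cval, 0) + 1
    n = float(len(items))
    # Second pass: one division per distinct key. Each result is rounded
    # only once, but the pass costs time when there are many keys.
    return dict((cval, count / n) for cval, count in counts.items())


def freqs_scale_at_start(items):
    """Add 1/len(items) per occurrence; no second pass is needed."""
    content_freqs = {}
    if not items:
        return content_freqs
    p_contents = 1.0 / len(items)
    for cval in items:
        # Repeated floating-point additions can accumulate rounding error,
        # which is the concern raised in the quoted question.
        content_freqs[cval] = content_freqs.get(cval, 0) + p_contents
    return content_freqs
```

The two functions should agree to within floating-point tolerance, so any
comparison between them is safer with a tolerance than with ==.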

Maybe I'm just nitpicking, but I think the get_content_freq function would be
more readable if we used different variable names inside it.




