[Biopython-dev] [Biopython (old issues only) - Bug #2693] LogisticRegression convergence criterion is too lenient

redmine at redmine.open-bio.org redmine at redmine.open-bio.org
Wed Nov 9 04:29:12 UTC 2016


Issue #2693 has been updated by Vincent Davis.


This seems worth checking into more. I will create a github issue.

----------------------------------------
Bug #2693: LogisticRegression convergence criterion is too lenient
https://redmine.open-bio.org/issues/2693#change-15334

* Author: Bruce Southey
* Status: New
* Priority: Normal
* Assignee: Biopython Dev Mailing List
* Category: Main Distribution
* Target version: Not Applicable
* URL: 
----------------------------------------
In R and SAS, the example from the code and tutorial yields the following parameter estimates:

Intercept =  18.9622
x1        =  -0.0714
x2        =   0.0444

By default, Bio/LogisticRegression.py defines the following parameters:
    MAX_ITERATIONS = 500
    CONVERGE_THRESHOLD = 0.01

The convergence threshold is too lenient, so the iterations terminate before the expected values are reached. With a more stringent criterion (CONVERGE_THRESHOLD = 0.000000001), the fit converges to the R/SAS values, provided MAX_ITERATIONS is greater than 7761 on my system.
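
The effect of the stopping threshold can be illustrated with a small standalone sketch. It does not reproduce Bio/LogisticRegression.py: it uses plain gradient ascent on a made-up, non-separable data set, and the names fit_logistic and learning_rate are hypothetical. The stopping rule (stop once the log-likelihood changes by no more than the threshold) is assumed to match the one in the module.

```python
import math


def fit_logistic(xs, ys, threshold, max_iterations=100000, learning_rate=0.05):
    """Fit y ~ intercept + slope * x by gradient ascent on the log-likelihood.

    Stops when the log-likelihood changes by no more than `threshold`
    between iterations, or when `max_iterations` is reached.
    Returns (intercept, slope, log_likelihood, iterations).
    """
    intercept, slope = 0.0, 0.0
    old_llik = None
    llik = None
    iteration = 0
    for iteration in range(1, max_iterations + 1):
        probs = [1.0 / (1.0 + math.exp(-(intercept + slope * x))) for x in xs]
        llik = sum(
            y * math.log(p) + (1 - y) * math.log(1.0 - p)
            for y, p in zip(ys, probs)
        )
        # Convergence test on the change in log-likelihood.
        if old_llik is not None and abs(llik - old_llik) <= threshold:
            break
        old_llik = llik
        # Gradient of the log-likelihood with respect to each parameter.
        grad_intercept = sum(y - p for y, p in zip(ys, probs))
        grad_slope = sum((y - p) * x for x, y, p in zip(xs, ys, probs))
        intercept += learning_rate * grad_intercept
        slope += learning_rate * grad_slope
    return intercept, slope, llik, iteration


# Toy data (invented for illustration, not the tutorial data set).
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

loose = fit_logistic(xs, ys, threshold=0.01)
tight = fit_logistic(xs, ys, threshold=1e-9)
print("threshold 0.01:", loose)
print("threshold 1e-9:", tight)
```

The loose run stops after fewer iterations and reports estimates that are still some way from those of the tight run, which is the same kind of discrepancy seen against R and SAS.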

MAX_ITERATIONS and CONVERGE_THRESHOLD are fixed within the Bio/LogisticRegression.py module, but they should be part of the API of the train function, for example:
def train(xs, ys, update_fn=None, typecode=None, CONVERGE_THRESHOLD=0.000000001, MAX_ITERATIONS=10000):

Note that the algorithm used requires a large number of iterations, and the train function does not report the degree of convergence attained when MAX_ITERATIONS is reached.
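
One possible way to surface the degree of convergence is to warn when the iteration cap is hit and include the last change in log-likelihood in the message. The following is only a sketch under stated assumptions: run_until_converged, step and make_toy_step are hypothetical names, and the toy step function merely produces a sequence whose increments halve each time. It is not taken from the attached patch.

```python
import warnings


def run_until_converged(step, threshold, max_iterations):
    """Call step() until successive values differ by <= threshold.

    `step` is a hypothetical zero-argument callable that performs one
    iteration and returns the current log-likelihood.
    Returns (converged, iterations, last_change).
    """
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    old_value = step()
    last_change = None
    for iteration in range(1, max_iterations + 1):
        value = step()
        last_change = abs(value - old_value)
        if last_change <= threshold:
            return True, iteration, last_change
        old_value = value
    # The cap was reached: say how far from the threshold the fit still was.
    warnings.warn(
        "No convergence after %d iterations; last log-likelihood change "
        "was %g (threshold %g)" % (max_iterations, last_change, threshold)
    )
    return False, max_iterations, last_change


def make_toy_step():
    """Return a step function yielding -1/2, -1/4, -1/8, ... (toy values)."""
    state = {"k": 0}

    def step():
        state["k"] += 1
        return -(0.5 ** state["k"])

    return step


print(run_until_converged(make_toy_step(), 1e-9, 100))
```

Returning the flag and the last change alongside the warning lets the caller decide whether to retry with a larger cap.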

Jeffrey Whitaker provides Python code using an alternative algorithm: 
http://www.cdc.noaa.gov/people/jeffrey.s.whitaker/python/logistic_regression.py

Furthermore, update_fn should also receive the previous log-likelihood, or the difference in log-likelihood, so that the actual convergence can be seen. Ideally update_fn would be more general than this and able to display more information, but the attached patch passes only the previous log-likelihood (old_llik).
from Bio import LogisticRegression

def show_progress(iteration, old_llh, loglikelihood):
    # Three-argument callback as expected by the patched train function.
    print("Iteration: %s Old: %s Log-likelihood function: %s Diff: %s"
          % (iteration, old_llh, loglikelihood, old_llh - loglikelihood))

model = LogisticRegression.train(xs, ys, update_fn=show_progress)
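
Without the patch, a similar display may be possible from the caller's side by letting the callback remember the previous value itself. This sketch assumes the unpatched train function calls update_fn(iteration, loglikelihood) with two arguments; ProgressTracker is a hypothetical helper, and the loop at the end only simulates the calls with invented log-likelihood values.

```python
class ProgressTracker(object):
    """Callback that remembers the previous log-likelihood between calls."""

    def __init__(self):
        self.old_llh = None
        self.history = []

    def __call__(self, iteration, loglikelihood):
        # No previous value exists on the first call.
        if self.old_llh is None:
            diff = None
        else:
            diff = self.old_llh - loglikelihood
        self.history.append((iteration, loglikelihood, diff))
        print("Iteration: %s Log-likelihood function: %s Diff: %s"
              % (iteration, loglikelihood, diff))
        self.old_llh = loglikelihood


# Simulated calls with invented values; with Biopython installed the
# tracker would instead be passed as update_fn=tracker to train().
tracker = ProgressTracker()
for i, llh in enumerate([-5.5, -5.0, -4.75]):
    tracker(i, llh)
```

This keeps the existing two-argument callback signature, at the cost of each caller carrying its own state.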

---Files--------------------------------
logreg.diff (2.34 KB)

