[Biopython-dev] [Biopython (old issues only) - Bug #2693] LogisticRegression convergence criterion is too lenient
redmine at redmine.open-bio.org
Wed Nov 9 04:29:12 UTC 2016
Issue #2693 has been updated by Vincent Davis.
This seems worth checking into more. I will create a github issue.
----------------------------------------
Bug #2693: LogisticRegression convergence criterion is too lenient
https://redmine.open-bio.org/issues/2693#change-15334
* Author: Bruce Southey
* Status: New
* Priority: Normal
* Assignee: Biopython Dev Mailing List
* Category: Main Distribution
* Target version: Not Applicable
* URL:
----------------------------------------
Fitting the example from the code and tutorial in R and SAS gives the following parameter estimates:
Intercept = 18.9622
x1 = -0.0714
x2 = 0.0444
By default, Bio/LogisticRegression.py defines the following parameters:
MAX_ITERATIONS = 500
CONVERGE_THRESHOLD = 0.01
The convergence threshold is too lenient, so the iterations terminate before the expected values are reached. A more stringent criterion (CONVERGE_THRESHOLD = 0.000000001) permits convergence to the R/SAS values, provided MAX_ITERATIONS is greater than 7761 on my system.
MAX_ITERATIONS and CONVERGE_THRESHOLD are fixed within the Bio/LogisticRegression.py module but should be part of the API of the train function, such as:
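To illustrate why a lenient threshold on the change in log-likelihood can stop the fit early, here is a minimal, self-contained sketch. It is not the Biopython implementation: fit_logistic, the toy data, and the fixed-step gradient ascent are all assumptions made purely for illustration. Only the two threshold values (0.01 and 0.000000001) come from the report above.

```python
import math


def fit_logistic(xs, ys, threshold, max_iterations, rate=0.1):
    """Fit a one-feature logistic model by gradient ascent (illustrative only).

    Stops when the change in log-likelihood between two consecutive
    iterations is at most `threshold`, or when `max_iterations` is reached.
    Returns (intercept, slope, log_likelihood, iterations, converged).
    """
    b0 = b1 = 0.0
    old_llik = None
    for iteration in range(1, max_iterations + 1):
        ps = [1.0 / (1.0 + math.exp(-(b0 + b1 * x))) for x in xs]
        llik = sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                   for p, y in zip(ps, ys))
        if old_llik is not None and abs(llik - old_llik) <= threshold:
            return b0, b1, llik, iteration, True
        old_llik = llik
        # Gradient of the log-likelihood with respect to b0 and b1.
        g0 = sum(y - p for p, y in zip(ps, ys))
        g1 = sum((y - p) * x for p, y, x in zip(ps, ys, xs))
        b0 += rate * g0
        b1 += rate * g1
    return b0, b1, old_llik, max_iterations, False


# Hypothetical, non-separable toy data (not the tutorial data set).
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]

loose = fit_logistic(xs, ys, threshold=0.01, max_iterations=10000)
strict = fit_logistic(xs, ys, threshold=0.000000001, max_iterations=10000)
print("loose :", loose)
print("strict:", strict)
```

With the lenient threshold the loop exits after fewer iterations and at a lower log-likelihood than with the strict one, which is the same pattern described in this report.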
def train(xs, ys, update_fn=None, typecode=None, CONVERGE_THRESHOLD=0.000000001, MAX_ITERATIONS=10000):
Note that the algorithm used requires a large number of iterations, and the train function does not report the degree of convergence attained when MAX_ITERATIONS is exceeded.
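One possible way to surface the degree of convergence is sketched below. This is only an assumption about how such reporting could look, not the attached patch or the Biopython API: run_until_converged and its step argument are hypothetical names, and the keyword defaults simply mirror the values proposed above.

```python
import warnings


def run_until_converged(step, converge_threshold=0.000000001,
                        max_iterations=10000):
    """Call step() until the log-likelihood it returns stops changing.

    `step` is a hypothetical callable that performs one update and returns
    the current log-likelihood. Returns (llik, iterations, last_diff,
    converged) and warns with the last difference if the limit is hit.
    """
    old_llik = step()
    diff = float("inf")
    for iteration in range(1, max_iterations + 1):
        llik = step()
        diff = abs(llik - old_llik)
        if diff <= converge_threshold:
            return llik, iteration, diff, True
        old_llik = llik
    warnings.warn("No convergence after %d iterations; last change in "
                  "log-likelihood was %g" % (max_iterations, diff))
    return old_llik, max_iterations, diff, False


# Usage with a made-up sequence of log-likelihood values.
values = iter([-10.0, -5.0, -4.0, -3.9])
result = run_until_converged(lambda: next(values),
                             converge_threshold=0.5, max_iterations=3)
print(result)
```

Returning the last difference alongside a converged flag lets the caller decide whether the fit is usable, instead of silently accepting whatever the final iteration produced.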
Jeffrey Whitaker provides Python code using an alternative algorithm:
http://www.cdc.noaa.gov/people/jeffrey.s.whitaker/python/logistic_regression.py
Furthermore, the update_fn should also be passed the previous likelihood, or the difference in likelihood, so that the actual convergence can be seen. Really, the update_fn should be more general than this and able to display more information, but the attached patch provides the previous llh (old_llik).
def show_progress(iteration, old_llh, loglikelihood):
    # Three-argument callback as provided by the attached patch.
    print("Iteration: %s Old: %s Log-likelihood function: %s Diff: %s"
          % (iteration, old_llh, loglikelihood, old_llh - loglikelihood))
model = LogisticRegression.train(xs, ys, update_fn=show_progress)
---Files--------------------------------
logreg.diff (2.34 KB)