[Biopython-dev] [Bug 2693] New: LogisticRegression convergence criterion is too lenient

Mon Dec 1 20:01:44 UTC 2008

http://bugzilla.open-bio.org/show_bug.cgi?id=2693

           Summary: LogisticRegression convergence criterion is too lenient
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P3
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com

In both R and SAS, the example data used in the code and tutorial give the
following parameter estimates:

Intercept =  18.9622
x1        =  -0.0714
x2        =   0.0444

By default, Bio/LogisticRegression.py defines the following parameters:
    MAX_ITERATIONS = 500
    CONVERGE_THRESHOLD = 0.01

The convergence threshold is too lenient, so the iterations terminate before
the expected values are obtained. A more stringent criterion
(CONVERGE_THRESHOLD = 0.000000001) permits convergence to the R/SAS values,
provided MAX_ITERATIONS is greater than 7761 on my system.
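
The effect of the threshold can be seen in miniature with the sketch below.
It is not the algorithm in Bio/LogisticRegression.py: it assumes plain
gradient ascent with a fixed step, and the data (XS, YS) and the function
names fit and log_likelihood are made up for illustration. The stopping rule
is the one described above, a comparison of the change in log-likelihood
against a threshold.

```python
import math

# Hypothetical toy data (NOT the tutorial data): one predictor, and the
# classes overlap so the maximum-likelihood estimate is finite.
XS = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
YS = [0, 0, 1, 0, 1, 0, 1, 1]


def log_likelihood(beta, xs, ys):
    """Bernoulli log-likelihood for intercept beta[0] and slope beta[1]."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = beta[0] + beta[1] * x
        # log(p) = z - log(1 + e^z) and log(1 - p) = -log(1 + e^z)
        total += y * z - math.log1p(math.exp(z))
    return total


def fit(xs, ys, threshold, max_iterations=100000, step=0.05):
    """Gradient ascent; stop when the log-likelihood change is <= threshold."""
    beta = [0.0, 0.0]
    old_llh = log_likelihood(beta, xs, ys)
    for iteration in range(1, max_iterations + 1):
        grad = [0.0, 0.0]
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * x)))
            grad[0] += y - p
            grad[1] += (y - p) * x
        beta = [beta[0] + step * grad[0], beta[1] + step * grad[1]]
        llh = log_likelihood(beta, xs, ys)
        if abs(llh - old_llh) <= threshold:
            return beta, iteration
        old_llh = llh
    return beta, max_iterations


# Same data and same update rule; only the stopping threshold differs.
loose_beta, loose_iters = fit(XS, YS, threshold=0.01)
tight_beta, tight_iters = fit(XS, YS, threshold=1e-9)
```

Comparing loose_beta with tight_beta, and loose_iters with tight_iters, shows
how much earlier the lenient threshold stops and how far the coefficients are
from the tighter fit at that point.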

MAX_ITERATIONS and CONVERGE_THRESHOLD are fixed within the
Bio/LogisticRegression.py module but should be part of the API of the train
function, for example:
def train(xs, ys, update_fn=None, typecode=None,
          CONVERGE_THRESHOLD=0.000000001, MAX_ITERATIONS=10000):

Note that the algorithm used requires a large number of iterations, and the
train function does not report the degree of convergence attained when
MAX_ITERATIONS is exceeded.
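
One way both points could be handled is sketched below: the two limits are
keyword arguments, and the loop reports the last change in the objective when
the iteration budget runs out. This is a generic illustration, not Biopython
code; iterate_until_converged, step_fn and halve are hypothetical names, and
the toy objective stands in for the log-likelihood.

```python
import warnings


def iterate_until_converged(step_fn, start, converge_threshold=1e-9,
                            max_iterations=10000):
    """Call step_fn repeatedly until the objective stops changing.

    step_fn(state) must return (new_state, objective_value).
    Returns (state, objective_value, last_change, converged).
    """
    state, value = step_fn(start)
    change = float("inf")
    for _ in range(max_iterations):
        old_value = value
        state, value = step_fn(state)
        change = abs(value - old_value)
        if change <= converge_threshold:
            return state, value, change, True
    # Budget exhausted: say how close the run got instead of stopping silently.
    warnings.warn("stopped after %d iterations; last change %g exceeds "
                  "threshold %g" % (max_iterations, change, converge_threshold))
    return state, value, change, False


def halve(x):
    """Toy step: move x halfway to 0; the objective -x**2 peaks at x = 0."""
    new_x = x / 2.0
    return new_x, -(new_x ** 2)


# A tiny budget ends early and reports the change actually reached.
_, _, short_change, short_ok = iterate_until_converged(
    halve, 1.0, max_iterations=3)
# The default budget is ample for this toy problem.
_, _, full_change, full_ok = iterate_until_converged(halve, 1.0)
```

Returning the last change alongside a converged flag lets the caller decide
whether a result that hit the iteration limit is still usable.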

Jeffrey Whitaker provides Python code using an alternative algorithm: 
http://www.cdc.noaa.gov/people/jeffrey.s.whitaker/python/logistic_regression.py

Furthermore, update_fn should also be passed the previous likelihood, or the
difference in likelihood, so that the actual convergence can be seen. Really,
update_fn should be more general than this and able to display more
information, but the attached patches provide the previous llh (old_llik).
def show_progress(iteration, old_llh, loglikelihood):
    print "Iteration:", iteration, "Old", old_llh, \
          "Log-likelihood function:", loglikelihood, \
          "Diff:", (old_llh - loglikelihood)

model = LogisticRegression.train(xs, ys, update_fn=show_progress)
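
As a step towards a more general update_fn, a callback could record the
values instead of printing them. The sketch below assumes the three-argument
signature proposed above (iteration, old_llh, loglikelihood), which exists
only with the patch applied; ProgressLog is a hypothetical name, and the
numbers passed to it are made up purely to show the call.

```python
class ProgressLog(object):
    """Callable that stores (iteration, old_llh, llh, diff) tuples."""

    def __init__(self):
        self.history = []

    def __call__(self, iteration, old_llh, loglikelihood):
        # Same sign convention as show_progress: old minus new.
        diff = old_llh - loglikelihood
        self.history.append((iteration, old_llh, loglikelihood, diff))

    def last_diff(self):
        """Return the most recent difference, or None if nothing was logged."""
        return self.history[-1][3] if self.history else None


# Stand-in for what a patched train() would do on each iteration.
log = ProgressLog()
log(1, -5.5, -5.25)   # illustrative values only
log(2, -5.25, -5.125)
```

With the patch applied, an instance could be passed as update_fn=log and
log.history inspected after training to see how the difference shrank.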
