[Biopython-dev] [Bug 2947] New: Bio.HMM calculates wrong viterbi path

Fri Nov 6 10:05:07 EST 2009

http://bugzilla.open-bio.org/show_bug.cgi?id=2947

           Summary: Bio.HMM calculates wrong viterbi path
           Product: Biopython
           Version: 1.47
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: BioSQL
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: georg.lipps at fhnw.ch

Hi,

I have tested the Bio.HMM with some simple code (see below). However the
results of the viterbi path calculation are wrong (apparently they depend upon
the order of the state alphabet).
I also do not understand the number/score/probability returned along with the
viterbi path.

Greetings,

Georg

from Bio.HMM import MarkovModel, Trainer

## definition of the alphabets for hidden states and emissions
class coin:
    def __init__(self):
        self.letters = ["u", "f"]
        # must be a single letter alphabet or raises an error in viterbi
class outcome:
    def __init__(self):
        self.letters = ["head", "tail"]

coin=coin()
outcome=outcome()

## initialize HMM model 
build=MarkovModel.MarkovModelBuilder(coin, outcome)
build.allow_all_transitions()
build.set_equal_probabilities()

## build HMM model with test frequencies
build.set_transition_score("u", "f", 0.05)
build.set_transition_score("f", "u", 0.05)
build.set_transition_score("f", "f", 0.95)
build.set_transition_score("u", "u", 0.95)
build.set_emission_score("f", "tail", 0.5)
build.set_emission_score("f", "head", 0.5)
build.set_emission_score("u", "tail", 0.75)
build.set_emission_score("u", "head", 0.25)
model=build.get_markov_model()

print "Emission probabilites:\n", model.emission_prob
print "Transitions probabilities:\n", model.transition_prob, "\n"

observed_emissions=["tail"]*2
viterbi=model.viterbi(observed_emissions, coin)
seq=viterbi[0]
prob=viterbi[1]

print "============= Calculation of the most probable path"
## does not work for very short observations
## calculated path is dependant upon order in states alphabet
print observed_emissions
print seq
print prob, "\n"

OUTPUT:

Emission probabilites:
{('u', 'head'): 0.25, ('f', 'head'): 0.5, ('f', 'tail'): 0.5, ('u', 'tail'):
0.75}
Transitions probabilities:
{('f', 'u'): 0.050000000000000003, ('u', 'f'): 0.050000000000000003, ('u',
'u'): 0.94999999999999996, ('f', 'f'): 0.94999999999999996} 

============= Calculation of the most probable path
['tail', 'tail']
ff
4.46028871308

This is certainly not true, since the most probable path would be uu
(unfair/unfair)
When the sequence of observation is longer, e.g six the following results are
obtained:

============= Calculation of the most probable path
['tail', 'tail', 'tail', 'tail', 'tail', 'tail']
uuuuuf
13.1325601923

Which is still not true as the last coin should still be u (unfair).

Interestingly when the order of the state alphabet is changed, i.e.:

class coin:
    def __init__(self):
        self.letters = ["f", "u"]

the output is correct.

============= Calculation of the most probable path
['tail', 'tail', 'tail', 'tail', 'tail', 'tail']
uuuuuu
6.09287667828

Thus it appears to me that the viterbi algorithm is not robust enough and
biased towards the last letter of the state alphabet.

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.