[Biopython-dev] [Bug 2947] New: Bio.HMM calculates wrong viterbi path
bugzilla-daemon at portal.open-bio.org
bugzilla-daemon at portal.open-bio.org
Fri Nov 6 15:05:07 UTC 2009
http://bugzilla.open-bio.org/show_bug.cgi?id=2947
Summary: Bio.HMM calculates wrong viterbi path
Product: Biopython
Version: 1.47
Platform: PC
OS/Version: Windows XP
Status: NEW
Severity: normal
Priority: P2
Component: BioSQL
AssignedTo: biopython-dev at biopython.org
ReportedBy: georg.lipps at fhnw.ch
Hi,
I have tested the Bio.HMM with some simple code (see below). However the
results of the viterbi path calculation are wrong (apparently they depend upon
the order of the state alphabet).
I also do not understand the number/score/probability returned along with the
viterbi path.
Greetings,
Georg
from Bio.HMM import MarkovModel, Trainer
## definition of the alphabets for hidden states and emissions
class coin:
def __init__(self):
self.letters = ["u", "f"]
# must be a single letter alphabet or raises an error in viterbi
class outcome:
def __init__(self):
self.letters = ["head", "tail"]
coin=coin()
outcome=outcome()
## initialize HMM model
build=MarkovModel.MarkovModelBuilder(coin, outcome)
build.allow_all_transitions()
build.set_equal_probabilities()
## build HMM model with test frequencies
build.set_transition_score("u", "f", 0.05)
build.set_transition_score("f", "u", 0.05)
build.set_transition_score("f", "f", 0.95)
build.set_transition_score("u", "u", 0.95)
build.set_emission_score("f", "tail", 0.5)
build.set_emission_score("f", "head", 0.5)
build.set_emission_score("u", "tail", 0.75)
build.set_emission_score("u", "head", 0.25)
model=build.get_markov_model()
print "Emission probabilites:\n", model.emission_prob
print "Transitions probabilities:\n", model.transition_prob, "\n"
observed_emissions=["tail"]*2
viterbi=model.viterbi(observed_emissions, coin)
seq=viterbi[0]
prob=viterbi[1]
print "============= Calculation of the most probable path"
## does not work for very short observations
## calculated path is dependant upon order in states alphabet
print observed_emissions
print seq
print prob, "\n"
OUTPUT:
Emission probabilites:
{('u', 'head'): 0.25, ('f', 'head'): 0.5, ('f', 'tail'): 0.5, ('u', 'tail'):
0.75}
Transitions probabilities:
{('f', 'u'): 0.050000000000000003, ('u', 'f'): 0.050000000000000003, ('u',
'u'): 0.94999999999999996, ('f', 'f'): 0.94999999999999996}
============= Calculation of the most probable path
['tail', 'tail']
ff
4.46028871308
This is certainly not true, since the most probable path would be uu
(unfair/unfair)
When the sequence of observation is longer, e.g six the following results are
obtained:
============= Calculation of the most probable path
['tail', 'tail', 'tail', 'tail', 'tail', 'tail']
uuuuuf
13.1325601923
Which is still not true as the last coin should still be u (unfair).
Interestingly when the order of the state alphabet is changed, i.e.:
class coin:
def __init__(self):
self.letters = ["f", "u"]
the output is correct.
============= Calculation of the most probable path
['tail', 'tail', 'tail', 'tail', 'tail', 'tail']
uuuuuu
6.09287667828
Thus it appears to me that the viterbi algorithm is not robust enough and
biased towards the last letter of the state alphabet.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
More information about the Biopython-dev
mailing list