[Biojava-l] HMM
Andreas Draeger
andreas.draeger at uni-tuebingen.de
Sun Jan 28 22:01:10 UTC 2007
Hi,
I do have a question regarding HMMs. I created a custom HMM following
the Dice example on the web site
(http://www.biojava.org/wiki/BioJava:Tutorial:Dynamic_programming_examples).
It works fine and I can either generate sequences or the corresponding
state path. However, I would like to train the model and to get the
probabilities that a certain sequence was produced by this model. I
tried the following:
try {
    // Build the DP object for the custom model and sample a short path from it.
    DP dp = DPFactory.DEFAULT.createDP(createMyModel());
    StatePath obs_rolls = dp.generate(4);
    SymbolList roll_sequence = obs_rolls.symbolListForLabel(StatePath.SEQUENCE);
    SymbolList[] res_array = { roll_sequence };
    StatePath v = dp.viterbi(res_array, ScoreType.PROBABILITY);

    BaumWelchTrainer bwt = new BaumWelchTrainer(dp);
    // Stop after a fixed number of cycles (score-based test is commented out).
    StoppingCriteria sc = new StoppingCriteria() {
        public boolean isTrainingComplete(TrainingAlgorithm arg0) {
            if (arg0.getCycle() > 100)
            //if (Math.abs(arg0.getLastScore() - arg0.getCurrentScore()) < 0.5)
                return true;
            return false;
        }
    };

    try {
        // Read one training sequence per line from the file given in args[0].
        BufferedReader br = new BufferedReader(new FileReader(args[0]));
        SequenceDB db = new HashSequenceDB();
        myAlphabet.putTokenization("token",
            new NameTokenization(myAlphabet, true));
        while (br.ready()) {
            String line = br.readLine();
            SymbolList sym = new SimpleSymbolList(
                myAlphabet.getTokenization("token"), line);
            db.addSequence(new SimpleSequence(sym, "",
                line.replaceAll(" ", ""), Annotation.EMPTY_ANNOTATION));
        }

        bwt.train(db, 0.1, sc);

        // Score every training sequence with the trained model.
        for (Iterator i = db.ids().iterator(); i.hasNext(); ) {
            Sequence seq = db.getSequence(i.next().toString());
            System.out.println(seq.seqString() + "\tprobability\t" +
                bwt.getDP().forward(new SymbolList[] {seq},
                    ScoreType.PROBABILITY));
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } catch (ChangeVetoException e) {
        e.printStackTrace();
    }

    // Report the generated sequence, its state path, and its forward score.
    SymbolList realstates = obs_rolls.symbolListForLabel(StatePath.STATES);
    SymbolList realsymbols = obs_rolls.symbolListForLabel(StatePath.SEQUENCE);
    SymbolList states = v.symbolListForLabel(StatePath.STATES);
    SymbolList symbols = v.symbolListForLabel(StatePath.SEQUENCE);
    System.out.println("Output:\t" + realsymbols.seqString());
    System.out.println("Position:\t" + realstates.seqString());
    System.out.println("Probability:\t" + dp.forward(
        new SymbolList[] {realsymbols}, ScoreType.PROBABILITY));
} catch (IllegalArgumentException e) {
    e.printStackTrace();
} catch (BioException e) {
    e.printStackTrace();
}
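The commented-out condition in isTrainingComplete() compares two consecutive scores. As a side note, a comparison against NaN is always false in Java, so a rule of that form alone would never fire once a score has become NaN. A minimal sketch in plain Java of a stopping rule that combines a cycle cap, a score-difference threshold and an explicit NaN check; the class StopRule and its method are hypothetical and not part of BioJava:

```java
// Hypothetical helper, not part of BioJava: decides when iterative training should stop.
final class StopRule {
    private final int maxCycles;
    private final double minDelta;

    StopRule(int maxCycles, double minDelta) {
        this.maxCycles = maxCycles;
        this.minDelta = minDelta;
    }

    /** Returns true when training should stop. */
    boolean isComplete(int cycle, double lastScore, double currentScore) {
        // A NaN score never satisfies a "<" comparison, so test for it explicitly.
        if (Double.isNaN(lastScore) || Double.isNaN(currentScore)) {
            return true;
        }
        // Hard upper bound on the number of training cycles.
        if (cycle > maxCycles) {
            return true;
        }
        // Converged: the score changed by less than the threshold.
        return Math.abs(lastScore - currentScore) < minDelta;
    }

    public static void main(String[] args) {
        StopRule rule = new StopRule(100, 0.5);
        System.out.println(rule.isComplete(3, -40.0, -12.0));      // score still changing
        System.out.println(rule.isComplete(7, -12.3, -12.1));      // change below threshold
        System.out.println(rule.isComplete(7, -12.3, Double.NaN)); // numerical failure
    }
}
```

Inside a StoppingCriteria, the three arguments would presumably come from getCycle(), getLastScore() and getCurrentScore(), the calls already used above.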
In createMyModel() I create my custom model, which is a modified
version of the aforementioned example.
When I comment out the line bwt.train(db, 0.1, sc); the output of the line
System.out.println("Probability:\t" + dp.forward(new SymbolList[]
{realsymbols}, ScoreType.PROBABILITY));
is a negative probability such as
Probability: -5.851716517873089
Otherwise (when I use the BaumWelchTrainer) the probabilities are
even NaN.
What is the meaning of this? Why are the probabilities not between 0
and 1 and why does the BaumWelchTrainer produce NaN values?
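A negative value of this kind is what a log-scaled score looks like. Assuming (this is not confirmed anywhere in this message) that forward() returns the natural logarithm of the probability, the sketch below converts the quoted value back to a probability. It also shows one generic way NaN arises in log-space arithmetic, which is an illustration only and not a diagnosis of the trainer. The class LogScore is hypothetical:

```java
// Hypothetical helper for interpreting log-scaled scores; not part of BioJava.
final class LogScore {
    /** Converts a score to a probability, assuming the score is ln(p). */
    static double toProbability(double logScore) {
        return Math.exp(logScore);
    }

    public static void main(String[] args) {
        double score = -5.851716517873089; // value quoted in the message
        System.out.println(toProbability(score)); // roughly 0.0029

        // ln(0) is -Infinity; the difference of two such values is NaN.
        double logZero = Math.log(0.0);
        System.out.println(logZero - logZero);
    }
}
```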
So my question is: how can I get the probability that the HMM emits a
given sequence, and how can I train the HMM properly?
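For reference, the probability that an HMM emits a given sequence is the quantity the forward algorithm computes. A minimal sketch in plain Java, independent of BioJava, for a two-state dice model; all parameter values and the class name ForwardSketch are made up for illustration and are not taken from createMyModel():

```java
// Forward algorithm for a hypothetical two-state dice HMM (not BioJava code).
final class ForwardSketch {
    // State 0 = fair die, state 1 = loaded die; all numbers are illustrative.
    static final double[] START = {0.5, 0.5};
    static final double[][] TRANS = {{0.95, 0.05}, {0.10, 0.90}};
    static final double[][] EMIT = {
        {1.0 / 6, 1.0 / 6, 1.0 / 6, 1.0 / 6, 1.0 / 6, 1.0 / 6},
        {0.1, 0.1, 0.1, 0.1, 0.1, 0.5}
    };

    /** Returns P(rolls | model); rolls are face values 1..6. */
    static double forward(int[] rolls) {
        if (rolls.length == 0) {
            throw new IllegalArgumentException("need at least one roll");
        }
        int n = START.length;
        // Initialisation: start in state s and emit the first roll.
        double[] alpha = new double[n];
        for (int s = 0; s < n; s++) {
            alpha[s] = START[s] * EMIT[s][rolls[0] - 1];
        }
        // Recursion: sum over all predecessor states, then emit the next roll.
        for (int t = 1; t < rolls.length; t++) {
            double[] next = new double[n];
            for (int to = 0; to < n; to++) {
                double sum = 0.0;
                for (int from = 0; from < n; from++) {
                    sum += alpha[from] * TRANS[from][to];
                }
                next[to] = sum * EMIT[to][rolls[t] - 1];
            }
            alpha = next;
        }
        // Termination: sum over the final states (this model has no end state).
        double total = 0.0;
        for (double a : alpha) {
            total += a;
        }
        return total;
    }

    public static void main(String[] args) {
        double p = forward(new int[] {6, 6, 1, 3});
        System.out.println("P = " + p + ", ln P = " + Math.log(p));
    }
}
```

The result lies between 0 and 1, and its natural logarithm is negative. For long sequences the plain product underflows, which is why implementations commonly work with logarithms or rescale at each step.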
I appreciate every answer!
Cheers
Andreas
---
Dipl.-Bioinform. Andreas Dräger
Eberhard Karls University Tübingen
Center for Bioinformatics (ZBIT)
Sand 1
72076 Tübingen
Germany
Phone: +49-7071-29-70436
Fax: +49-7071-29-5091