[Bioperl-l] Hidden Markov Model in Bioperl?

Aaron J. Mackey amackey at pcbi.upenn.edu
Mon Mar 28 08:11:33 EST 2005


Yes, in bioperl-ext, of course ...

On Mar 25, 2005, at 6:49 PM, Yee Man Chan wrote:

> 	I am thinking of an interface like this:
>
> Bio::Tools::HMM->new("symbols", "states")
> - instantiate an HMM object with a string of symbols (each character
> corresponds to one symbol) and a string of states. The other parameters
> of the model are generated randomly. Good for starting a Baum-Welch
> training.

Why not expand this to accept two arrayrefs, one of symbols and one of 
states?  You can convert them into whatever encoded single-char alphabet 
you'd like.  Think Perl, not C.  This is a feature request, not a 
requirement, of course.
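
A rough sketch of the encoding idea, written in Python only to keep it 
short and language-neutral (it is not bioperl-ext code). The helper name 
build_encoder and the example symbol and state names are made up for 
illustration:

```python
import string


def build_encoder(names):
    """Map arbitrary symbol (or state) names to single characters and back.

    Hypothetical helper: the caller passes a list of names, and the
    single-char alphabet stays an internal detail.
    """
    alphabet = string.ascii_letters + string.digits
    if len(set(names)) != len(names):
        raise ValueError("names must be unique")
    if len(names) > len(alphabet):
        raise ValueError("too many names for a single-char alphabet")
    encode = {name: alphabet[i] for i, name in enumerate(names)}
    decode = {char: name for name, char in encode.items()}
    return encode, decode


symbols = ["A", "C", "G", "T"]   # example symbol names (assumed)
states = ["exon", "intron"]      # example state names (assumed)
sym_enc, sym_dec = build_encoder(symbols)
st_enc, st_dec = build_encoder(states)

observed = ["G", "A", "T", "T"]
encoded = "".join(sym_enc[s] for s in observed)  # string for the low-level code
decoded = [sym_dec[c] for c in encoded]          # back to the caller's names
```

The point is only that the list-based interface and the string-based 
internals can coexist; the caller never sees the encoded characters.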

> Bio::Tools::HMM->ObsSeqProb("string of observed sequence")
> - return the probability of an observed sequence.

Is this the Forward algorithm's P()?  Perhaps make it an alias to 
Forward(), with the ability to specify an offset/index at which you want 
the Forward value (see below)?  Or is this the product of Viterbi factors?
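
To make the offset idea concrete, here is a minimal Forward sketch in 
Python (again, not the bioperl-ext implementation). The function names, 
the "upto" parameter, and the toy two-state coin model are all 
assumptions for illustration; it works in plain probabilities, so long 
sequences would need scaling or log space:

```python
def forward(obs, states, init, trans, emit):
    """Return the forward table F, with F[t][s] = P(obs[0..t], state_t = s).

    Assumes obs is non-empty.
    """
    table = [{s: init[s] * emit[s][obs[0]] for s in states}]
    for symbol in obs[1:]:
        prev = table[-1]
        table.append({
            s: emit[s][symbol] * sum(prev[r] * trans[r][s] for r in states)
            for s in states
        })
    return table


def obs_seq_prob(obs, states, init, trans, emit, upto=None):
    """P(obs[0..upto]); upto defaults to the last position (whole sequence)."""
    table = forward(obs, states, init, trans, emit)
    index = len(obs) - 1 if upto is None else upto
    return sum(table[index].values())


# Toy model (assumed): a fair coin "F" and a heads-biased coin "L".
states = ["F", "L"]
init = {"F": 0.5, "L": 0.5}
trans = {"F": {"F": 0.9, "L": 0.1}, "L": {"F": 0.2, "L": 0.8}}
emit = {"F": {"H": 0.5, "T": 0.5}, "L": {"H": 0.9, "T": 0.1}}

full = obs_seq_prob("HHT", states, init, trans, emit)             # P("HHT")
prefix = obs_seq_prob("HHT", states, init, trans, emit, upto=0)   # P("H")
```

For this toy model the full-sequence probability comes out to 0.13482 
and the one-symbol prefix to 0.70.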

> Bio::Tools::HMM->Viterbi("string of observed sequence")
> - return a string of hidden states that maximizes the probability of
> the observed sequence.

This might also return the P() of the Viterbi path; and again, instead 
of returning a string of symbols, return an arrayref of symbols.
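
Something along these lines, sketched in Python rather than Perl (not 
the bioperl-ext code): the path comes back as a list of state names 
together with its probability. The function name, return shape, and toy 
model are assumptions, and ties between states are broken by the order 
of the states list:

```python
def viterbi(obs, states, init, trans, emit):
    """Return (best_path, prob); best_path is a list of state names.

    Assumes obs is non-empty and uses plain probabilities (no log space).
    """
    best = [{s: init[s] * emit[s][obs[0]] for s in states}]
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for symbol in obs[1:]:
        prev = best[-1]
        scores, pointers = {}, {}
        for s in states:
            r_best = max(states, key=lambda r: prev[r] * trans[r][s])
            scores[s] = prev[r_best] * trans[r_best][s] * emit[s][symbol]
            pointers[s] = r_best
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


# Toy model (assumed): a fair coin "F" and a heads-biased coin "L".
states = ["F", "L"]
init = {"F": 0.5, "L": 0.5}
trans = {"F": {"F": 0.9, "L": 0.1}, "L": {"F": 0.2, "L": 0.8}}
emit = {"F": {"H": 0.5, "T": 0.5}, "L": {"H": 0.9, "T": 0.1}}

path, prob = viterbi("HHT", states, init, trans, emit)
```

For "HHT" under this toy model the best path is F, F, F with 
probability 0.050625.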

> Bio::Tools::HMM->getInitArray()
> Bio::Tools::HMM->getStateMatrix()
> Bio::Tools::HMM->getEmissionMatrix()

Presumably these should be get/set methods?

What's missing is 1) posterior decoding and 2) partial path probability 
(i.e. F_{i}*v_{i+1}*v_{i+2}*...*v_{j-1}*B_{j}/F_{x}, where i < j, F and 
B are Forward and Backward values, and the v's are Viterbi factors for 
each step in the partial path specified from i to j).
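
A sketch of both pieces in Python, under stated assumptions (it is not 
bioperl-ext code). Here each per-step factor is taken to be transition 
probability times emission probability, F_{x} is taken to be the total 
Forward probability of the sequence, and the product runs over every 
step inside the partial path; that indexing is one common convention and 
may not match the one intended above. All function names and the toy 
model are made up:

```python
def forward(obs, states, init, trans, emit):
    """F[t][s] = P(obs[0..t], state_t = s). Assumes obs is non-empty."""
    table = [{s: init[s] * emit[s][obs[0]] for s in states}]
    for symbol in obs[1:]:
        prev = table[-1]
        table.append({
            s: emit[s][symbol] * sum(prev[r] * trans[r][s] for r in states)
            for s in states
        })
    return table


def backward(obs, states, trans, emit):
    """B[t][s] = P(obs[t+1..] | state_t = s)."""
    table = [{s: 1.0 for s in states}]
    for symbol in reversed(obs[1:]):
        nxt = table[0]
        table.insert(0, {
            s: sum(trans[s][r] * emit[r][symbol] * nxt[r] for r in states)
            for s in states
        })
    return table


def posterior(obs, states, init, trans, emit):
    """post[t][s] = P(state_t = s | obs)."""
    f = forward(obs, states, init, trans, emit)
    b = backward(obs, states, trans, emit)
    total = sum(f[-1].values())
    return [{s: f[t][s] * b[t][s] / total for s in states}
            for t in range(len(obs))]


def partial_path_prob(path, start, obs, states, init, trans, emit):
    """P(states at start..start+len(path)-1 equal path | obs)."""
    f = forward(obs, states, init, trans, emit)
    b = backward(obs, states, trans, emit)
    total = sum(f[-1].values())
    prob = f[start][path[0]]
    for k in range(1, len(path)):
        # Per-step factor: transition into path[k], then emit the symbol there.
        prob *= trans[path[k - 1]][path[k]] * emit[path[k]][obs[start + k]]
    end = start + len(path) - 1
    return prob * b[end][path[-1]] / total


# Toy model (assumed): a fair coin "F" and a heads-biased coin "L".
states = ["F", "L"]
init = {"F": 0.5, "L": 0.5}
trans = {"F": {"F": 0.9, "L": 0.1}, "L": {"F": 0.2, "L": 0.8}}
emit = {"F": {"H": 0.5, "T": 0.5}, "L": {"H": 0.9, "T": 0.1}}
obs = "HHT"

post = posterior(obs, states, init, trans, emit)
# Posterior decoding: most probable state at each position, taken independently.
decoded = [max(states, key=lambda s: p[s]) for p in post]
whole = partial_path_prob(["F", "F", "F"], 0, obs, states, init, trans, emit)
```

Posterior decoding picks the best state per position, so the decoded 
list need not be a single most-probable path. With a partial path of 
length one, partial_path_prob reduces to the posterior at that position.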

I'd also prefer lower-case names (BaumWelch could just be called 
"train" or "learn_unsupervised" or somesuch).

Also, see the HMM functions available in Matlab that do the same ...

Good luck,

-Aaron

--
Aaron J. Mackey, Ph.D.
Dept. of Biology, Goddard 212
University of Pennsylvania       email:  amackey at pcbi.upenn.edu
415 S. University Avenue         office: 215-898-1205
Philadelphia, PA  19104-6017     fax:    215-746-6697


