[Biojava-l] Parameter Settings in BaumWelchTraining]

mark.schreiber at group.novartis.com mark.schreiber at group.novartis.com
Fri Mar 12 01:00:43 EST 2004


When you call the train() method of the BaumWelchTrainer you supply it 
with a SequenceDB. The sequences from this DB are used to optimize the 
weights of the model.
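
From memory, the piece you are missing is just a call to train() on the
trainer you create at the end of your code (repeated here for context).
Something like the sketch below should work, but do check the exact
signature of train() against the javadocs; the null model weight of 1.0 is
only a placeholder:

DP dp = DPFactory.DEFAULT.createDP(hmm);
BaumWelchTrainer bwt = new BaumWelchTrainer(dp);

//cycles over the sequences in DB until stopper says training is complete,
//re-estimating the weights of hmm on each cycle
bwt.train(DB, 1.0, stopper);

When train() returns, the optimized values are already in the model, so you
can read them back with hmm.getWeights(state) and
((EmissionState) state).getDistribution(), just as in your loop below.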

However, I have a bad feeling that when you train your model with the 
BaumWelchTrainer your previously set counts will be ignored and 
overwritten. You could check by looking into AbstractModelTrainer.train() 
(which is what the BaumWelchTrainer extends). You could also run some 
tests to see if using a pre-trained model makes any difference to the 
final outcome. Does anyone more expert than me on the DP package (i.e. most
people) know if the counts are overwritten?

- Mark





sacoca at mcb.mcgill.ca
Sent by: biojava-l-bounces at portal.open-bio.org
03/12/2004 01:30 PM

 
        To:     sacoca at mcb.mcgill.ca
        cc:     Biojava Mailing List <biojava-l at biojava.org>
        Subject:        Re: [Biojava-l] Parameter Settings in BaumWelchTraining]


Sorry for the previous error.
---------------------------- Original Message ----------------------------
Subject: Re: [Biojava-l] Parameter Settings in BaumWelchTraining
From:    sacoca at MCB.McGill.CA
Date:    Fri, March 12, 2004 12:27 am
To:      mark.schreiber at group.novartis.com
--------------------------------------------------------------------------

Here is the code I have for the training. Using what you told me below, I
can retrieve all of the weights that I calculated manually for the hmm
(distributions for the transitions and distributions for the alphabet of
each state). What I do not understand is how to use this information and
the sequences stored in a file to run the Baum-Welch algorithm and then
retrieve the optimized values calculated by the algorithm to set them back
into my HMM.

//retrieve the alphabet of all states
FiniteAlphabet SA = hmm.stateAlphabet();
Iterator i = SA.iterator();

SimpleModelTrainer MT = new SimpleModelTrainer();
MT.registerModel(hmm);

//go through each state
while (i.hasNext()) {
    State currentState = (State) i.next();

    //retrieve the distribution of all transitions from the current state
    FiniteAlphabet from = hmm.transitionsFrom(currentState);
    Distribution d = hmm.getWeights(currentState);
    Iterator i2 = from.iterator();

    //go through it and look at the weight of each transition
    while (i2.hasNext()) {
        Symbol s = (Symbol) i2.next();
        System.out.println("From state " + currentState.getName() +
                           " to state " + s.getName() +
                           " weight " + d.getWeight(s));
    }

    //get the emission distribution of the current state
    //(the magical start/end state is not an EmissionState, so skip it)
    if (currentState instanceof EmissionState) {
        Distribution d2 = ((EmissionState) currentState).getDistribution();
        FiniteAlphabet in = (FiniteAlphabet) hmm.emissionAlphabet();
        Iterator i3 = in.iterator();
        //you can go through it the same way as above using a while loop
    }
}
*****************************************************************
This is what I don't understand!!!!
*****************************************************************
Here, we have a set of training sequences stored in a file in FASTA format
that I'd like to use with the Baum-Welch algorithm to optimize the
transition distributions mentioned above.

//This is the file with all the training sequences
BufferedInputStream is =
    new BufferedInputStream(new FileInputStream("z:/Sequences.faa"));

//Load the file into a SequenceDB
SequenceDB DB = SeqIOTools.readFasta(is, ProtAlphabet);

//use 100 cycles as the stopping criterion
StoppingCriteria stopper = new StoppingCriteria() {
    public boolean isTrainingComplete(TrainingAlgorithm ta) {
        return ta.getCycle() > 100;
    }
};

*****************************************
This part is what I am clueless about
*****************************************
//How do I optimize my hmm with the Baum-Welch algorithm and retrieve
//the optimized values? How do I train the distributions above with
//Baum-Welch and the sequences that I have?
DP dp = DPFactory.DEFAULT.createDP(hmm);
BaumWelchTrainer bwt = new BaumWelchTrainer(dp);

PS : I do not know why you are helping all of us here but thank you. It
makes Biojava a lot easier to deal with.

Steve

> Hi Stephane -
>
> Within an EmissionState you can set a Distribution that contains the
> emission probabilities for the Symbols in that state's emission alphabet,
> using the setDistribution method. This Distribution will be your
> predetermined weights.
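>
> For example (untested, and someState/someSymbol are only placeholders for
> one of your EmissionStates and a Symbol from your emission alphabet):
>
> Distribution emissions =
>     DistributionFactory.DEFAULT.createDistribution(hmm.emissionAlphabet());
> emissions.setWeight(someSymbol, 0.25); //and so on for each Symbol
> someState.setDistribution(emissions);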
>
> To set the transition probabilities you can use setWeights(State source,
> Distribution weights). The source is the state you are transitioning from
> and weights gives the probability of transitioning to each State that the
> source connects to. Because States implement Symbol you can put them in a
> Distribution.
>
> To make a Distribution over the States that state 'a' could connect to, use
> the following pseudocode:
>
> State a;
> Model m;
> FiniteAlphabet endPoints;
>
> endPoints = m.transitionsFrom(a);
> Distribution d =
>     DistributionFactory.DEFAULT.createDistribution(endPoints);
>
> //You can then train d or set its weights and put it back in the model with
>
> m.setWeights(a, d);
>
> Mark Schreiber
> Principal Scientist (Bioinformatics)
>
> Novartis Institute for Tropical Diseases (NITD)
> 1 Science Park Road
> #04-14 The Capricorn, Science Park II
> Singapore 117528
>
> phone +65 6722 2973
> fax  +65 6722 2910
>
>
>
>
>
> sacoca at mcb.mcgill.ca
> Sent by: biojava-l-bounces at portal.open-bio.org
> 03/12/2004 06:11 AM
>
>
>         To:     "Biojava Mailing List" <biojava-l at biojava.org>
>         cc:
>         Subject:        [Biojava-l] Parameter Settings in
> BaumWelchTraining
>
>
> Hi all. I'm trying to optimize the transition probabilities for
> my HMM. I have already set them to values which I think are pretty good.
> Since I know that Baum-Welch can only improve the scores up to a local
> maximum, I thought of using the parameters I calculated as a starting
> point. The problem is that I don't know how!
> I followed the example in biojava:
>
> ....
> //train the model to have uniform parameters
>     ModelTrainer mt = new SimpleModelTrainer();
>     //register the model to train
>     mt.registerModel(hmm);
>
> I want to use the values already set in my hmm as the starting
> parameters for Baum-Welch. I don't want to use the uniform
> distribution as indicated below!
>
>     //as no other counts are being used, the null weight will cause
>     //everything to be uniform
>     mt.setNullModelWeight(1.0);
>     mt.train();
>
> I tried adding counts and looking up examples on the net but ended up
> more confused than when I started. How do I use addCounts to make this
> work?
>
> Stephane Acoca
> Master's Student
> McGill Center for Bioinformatics
>
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at biojava.org
> http://biojava.org/mailman/listinfo/biojava-l
>
>
>



_______________________________________________
Biojava-l mailing list  -  Biojava-l at biojava.org
http://biojava.org/mailman/listinfo/biojava-l




