[Biojava-l] Parameter Settings in BaumWelchTraining
sacoca at MCB.McGill.CA
Fri Mar 12 00:30:16 EST 2004
Sorry for the previous error.
---------------------------- Original Message ----------------------------
Subject: Re: [Biojava-l] Parameter Settings in BaumWelchTraining
From: sacoca at MCB.McGill.CA
Date: Fri, March 12, 2004 12:27 am
To: mark.schreiber at group.novartis.com
--------------------------------------------------------------------------
Here is the code I have for the training. Using what you told me below, I
can retrieve all of the weights that I calculated manually for the HMM
(the transition distributions and the emission distribution over the
alphabet of each state). What I do not understand is how to use this
information and the sequences stored in a file to run the Baum-Welch
algorithm, and then retrieve the optimized values calculated by the
algorithm to set them back into my HMM.
//Retrieve the alphabet of all states
FiniteAlphabet SA = hmm.stateAlphabet();
Iterator i = SA.iterator();
SimpleModelTrainer MT = new SimpleModelTrainer();
MT.registerModel(hmm);
//go through each state
while (i.hasNext()) {
    Symbol Currentstate = (Symbol) i.next();
    //Retrieve the distribution of all transitions from the current state
    FiniteAlphabet From = hmm.transitionsFrom((State) Currentstate);
    Distribution d = hmm.getWeights((State) Currentstate);
    Iterator i2 = From.iterator();
    //go through it and look at all the weights for each of the transitions
    while (i2.hasNext()) {
        Symbol s = (Symbol) i2.next();
        System.out.println("From state " + Currentstate.getName() +
                           " To State " + s.getName() +
                           " Weight " + d.getWeight(s));
    }
    //get the distribution for the alphabet of the current state
    Distribution d2 = ((EmissionState) Currentstate).getDistribution();
    FiniteAlphabet IN = (FiniteAlphabet) hmm.emissionAlphabet();
    Iterator i3 = IN.iterator();
    //you can go through it the same way as above using a while loop
*****************************************************************
This is what I don't understand!!!!
*****************************************************************
Here, we have a set of training sequences stored in a file in FASTA format
that I'd like to use with the Baum-Welch algorithm to optimize the
transition distributions mentioned above.
//This is the file with all the training sequences
BufferedInputStream is = new BufferedInputStream(
        new FileInputStream("z:/Sequences.faa"));
//Load the file with the SequenceDB class
SequenceDB DB = SeqIOTools.readFasta(is, ProtAlphabet);
//use 100 cycles as the stop criteria
StoppingCriteria stopper = new StoppingCriteria() {
    public boolean isTrainingComplete(TrainingAlgorithm ta) {
        return (ta.getCycle() > 100);
    }
};
*****************************************
This part is what I am clueless about
*****************************************
//How do I optimize my hmm with the Baum-Welch algorithm and retrieve
//the optimized values? How do I train the distributions above with
//Baum-Welch and the sequences that I have?
DP dp= DPFactory.DEFAULT.createDP(hmm);
BaumWelchTrainer bwt = new BaumWelchTrainer(dp);
}
PS : I do not know why you are helping all of us here but thank you. It
makes Biojava a lot easier to deal with.
Steve
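
For anyone reading this thread later: the sketch below is a minimal,
self-contained illustration of the idea being asked about. It is not BioJava
code. It assumes a plain discrete HMM held in arrays (the class name
BaumWelchSketch, the fields pi, a and b, and the method cycle are all made up
for this example), keeps the start distribution fixed, and skips scaling, so
it only suits short sequences. What it tries to show is that Baum-Welch
starts from whatever weights the model already holds and re-estimates them
cycle by cycle, so hand-calculated values act as the starting point.

```java
import java.util.Arrays;

// Hypothetical stand-alone HMM; none of these names come from BioJava.
class BaumWelchSketch {
    double[] pi;    // start probabilities (kept fixed in this sketch)
    double[][] a;   // a[i][j]: transition weight from state i to state j
    double[][] b;   // b[i][k]: weight of state i emitting symbol k

    BaumWelchSketch(double[] pi, double[][] a, double[][] b) {
        this.pi = pi; this.a = a; this.b = b;
    }

    // alpha[t][i] = P(o[0..t], state at time t is i)
    double[][] forward(int[] o) {
        int n = pi.length;
        double[][] alpha = new double[o.length][n];
        for (int i = 0; i < n; i++) alpha[0][i] = pi[i] * b[i][o[0]];
        for (int t = 1; t < o.length; t++)
            for (int j = 0; j < n; j++) {
                double s = 0;
                for (int i = 0; i < n; i++) s += alpha[t - 1][i] * a[i][j];
                alpha[t][j] = s * b[j][o[t]];
            }
        return alpha;
    }

    // beta[t][i] = P(o[t+1..end] | state at time t is i)
    double[][] backward(int[] o) {
        int n = pi.length;
        double[][] beta = new double[o.length][n];
        Arrays.fill(beta[o.length - 1], 1.0);
        for (int t = o.length - 2; t >= 0; t--)
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    beta[t][i] += a[i][j] * b[j][o[t + 1]] * beta[t + 1][j];
        return beta;
    }

    // One training cycle over all sequences. Returns the total
    // log-likelihood under the weights held before the update.
    double cycle(int[][] seqs) {
        int n = pi.length, m = b[0].length;
        double[][] aCounts = new double[n][n], bCounts = new double[n][m];
        double logL = 0;
        for (int[] o : seqs) {
            double[][] alpha = forward(o), beta = backward(o);
            double p = 0;
            for (double v : alpha[o.length - 1]) p += v;
            logL += Math.log(p);
            for (int t = 0; t < o.length; t++)
                for (int i = 0; i < n; i++) {
                    // expected number of times state i emits o[t]
                    bCounts[i][o[t]] += alpha[t][i] * beta[t][i] / p;
                    if (t + 1 == o.length) continue;
                    // expected number of i -> j transitions at time t
                    for (int j = 0; j < n; j++)
                        aCounts[i][j] += alpha[t][i] * a[i][j]
                                * b[j][o[t + 1]] * beta[t + 1][j] / p;
                }
        }
        double[][] newA = new double[n][], newB = new double[n][];
        for (int i = 0; i < n; i++) {
            newA[i] = normalized(aCounts[i], a[i]);
            newB[i] = normalized(bCounts[i], b[i]);
        }
        a = newA;
        b = newB;
        return logL;
    }

    // Turn expected counts into weights; keep the old row if a state was never used.
    static double[] normalized(double[] counts, double[] fallback) {
        double sum = 0;
        for (double c : counts) sum += c;
        if (sum == 0) return fallback;
        double[] w = new double[counts.length];
        for (int k = 0; k < w.length; k++) w[k] = counts[k] / sum;
        return w;
    }

    public static void main(String[] args) {
        // Hand-picked starting weights (illustrative numbers only).
        BaumWelchSketch hmm = new BaumWelchSketch(
                new double[] {0.5, 0.5},
                new double[][] {{0.9, 0.1}, {0.2, 0.8}},
                new double[][] {{0.7, 0.3}, {0.1, 0.9}});
        int[][] seqs = {{0, 0, 1, 1, 1, 0}, {1, 1, 0, 0, 0, 1, 1}};
        for (int c = 0; c < 10; c++)
            System.out.println("cycle " + c + " logL " + hmm.cycle(seqs));
        System.out.println("trained transitions " + Arrays.deepToString(hmm.a));
    }
}
```

After training, the re-estimated weights are simply read back out of the
model (here the fields a and b). In BioJava itself, as far as I recall from
the tutorial example, the equivalent step is to hand the SequenceDB, a null
model weight and the StoppingCriteria to the trainer's train method, after
which the model's own Distributions hold the trained values; please check
the TrainingAlgorithm Javadoc for the exact signature before relying on that.
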
> Hi Stephane -
>
> Within EmissionState you can set a Distribution that contains emission
> probabilities for the Symbols of the state's emission alphabet using the
> setDistribution method. This Distribution will hold your predetermined
> weights.
>
> To set the transition probabilities you can use setWeights(State
> source, Distribution weights). The source is the state you are
> transitioning from and the weights are the probabilities of transitioning
> to any State that the source connects to. Because States implement
> Symbol you can put them in a Distribution.
>
> To make a Distribution of States that state 'a' could connect to, use the
> following pseudo code:
>
> State a;
> Model m;
> FiniteAlphabet endPoints;
>
> endPoints = m.transitionsFrom(a);
> Distribution d =
>     DistributionFactory.DEFAULT.createDistribution(endPoints);
>
> //You can then train d or set its weights and put it back in the model
> //with
>
> m.setWeights(a, d);
>
> Mark Schreiber
> Principal Scientist (Bioinformatics)
>
> Novartis Institute for Tropical Diseases (NITD)
> 1 Science Park Road
> #04-14 The Capricorn, Science Park II
> Singapore 117528
>
> phone +65 6722 2973
> fax +65 6722 2910
>
>
>
>
>
> sacoca at mcb.mcgill.ca
> Sent by: biojava-l-bounces at portal.open-bio.org
> 03/12/2004 06:11 AM
>
>
> To: "Biojava Mailing List" <biojava-l at biojava.org>
> cc:
> Subject: [Biojava-l] Parameter Settings in
> BaumWelchTraining
>
>
> Hi all. I'm trying to optimize the transition probabilities between the
> states of my HMM. I have already set them to values which I think are
> pretty good. Since I know Baum-Welch can only optimize the scores up to
> a local maximum, I thought of using the parameters I calculated as a
> starting point. The problem is that I don't know how!
> I followed the example in biojava:
>
> ....
> //train the model to have uniform parameters
> ModelTrainer mt = new SimpleModelTrainer();
> //register the model to train
> mt.registerModel(hmm);
>
> I want to use the values already set in my hmm as the starting
> parameters in the BaumWelch. I don't want to use the uniform
> distribution as indicated below!
>
> //as no other counts are being used the null weight will cause
> //everything to be uniform
> mt.setNullModelWeight(1.0);
> mt.train();
>
> I tried adding counts and looking up examples on the net but ended up
> more confused than when I started. How do I use addCounts to make this
> work?
>
> Stephane Acoca
> Master's Student
> McGill Center for Bioinformatics
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at biojava.org
> http://biojava.org/mailman/listinfo/biojava-l
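
On the quoted comment that "the null weight will cause everything to be
uniform" when no counts are added: the sketch below shows the usual
pseudocount formula that would produce that behaviour. It is an assumption
about what the trainer does, not a copy of BioJava's code, and the class and
method names (NullWeightSketch, train) are made up for illustration. Each
weight is the observed count plus nullWeight times the null model's weight,
divided by the total.

```java
import java.util.Arrays;

// Hypothetical helper, not BioJava's trainer class.
class NullWeightSketch {
    // weight[i] = (counts[i] + nullWeight * nullModel[i]) / total
    // Assumes counts and nullWeight are not both all zero.
    static double[] train(double[] counts, double[] nullModel, double nullWeight) {
        double[] w = new double[counts.length];
        double total = 0;
        for (int i = 0; i < w.length; i++) {
            w[i] = counts[i] + nullWeight * nullModel[i];
            total += w[i];
        }
        for (int i = 0; i < w.length; i++) w[i] /= total;
        return w;
    }

    public static void main(String[] args) {
        double[] uniform = {0.25, 0.25, 0.25, 0.25};
        // No counts added: only the null model contributes.
        System.out.println(Arrays.toString(train(new double[4], uniform, 1.0)));
        // Counts added: they outweigh the null model as they grow.
        System.out.println(Arrays.toString(train(new double[] {6, 2, 1, 1}, uniform, 1.0)));
    }
}
```

Under this formula, training with no counts and a uniform null model returns
the uniform distribution, which would overwrite hand-set weights. That is
consistent with the behaviour described in the question.
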