[Biojava-dev] Extensions to DP framework to permit 2-head training
David Huen
david.huen at ntlworld.com
Sun Jan 4 17:13:36 EST 2004
I have written a working Viterbi trainer that is capable of 1-head and 2-head
training and hope to commit it to CVS. The current API does not permit 2-head
training and will need changes to accommodate this.
I propose the following changes to permit it:-
a) introduction of a TrainingSet interface: In 2-head training, pairs of
sequences need to be available. I have generalised it into a means of
supplying n sequences per case so we can deal with n-head training when 20
gazillion Hz processors with 5 bazillion byte RAM become available.
public interface TrainingSet
{
    public interface Iterator
    {
        /**
         * get next group of sequences to train the model on.
         */
        public Sequence[] next();

        /**
         * any further training sequence groups?
         */
        public boolean hasNext();
    }

    /**
     * get an iterator for the cases supplied by this TrainingSet.
     */
    public Iterator getCases();
}
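As a rough illustration of how the proposed TrainingSet might be supplied with
cases, here is a minimal list-backed sketch. ListTrainingSet and addCase are
hypothetical names that are not part of BioJava, and Sequence is reduced to a
one-method stand-in so that the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for org.biojava.bio.seq.Sequence so the sketch compiles alone.
interface Sequence {
    String getName();
}

// The interface as proposed in this message.
interface TrainingSet {
    interface Iterator {
        Sequence[] next();
        boolean hasNext();
    }
    Iterator getCases();
}

// Hypothetical implementation: every case holds exactly `heads` sequences.
class ListTrainingSet implements TrainingSet {
    private final int heads;
    private final List<Sequence[]> cases = new ArrayList<>();

    ListTrainingSet(int heads) {
        if (heads < 1) {
            throw new IllegalArgumentException("heads must be >= 1");
        }
        this.heads = heads;
    }

    // Adds one training case; rejects groups of the wrong size.
    void addCase(Sequence... group) {
        if (group.length != heads) {
            throw new IllegalArgumentException(
                "expected " + heads + " sequences, got " + group.length);
        }
        cases.add(group.clone());
    }

    public TrainingSet.Iterator getCases() {
        final java.util.Iterator<Sequence[]> it = cases.iterator();
        return new TrainingSet.Iterator() {
            public Sequence[] next() {
                // Hand out a copy so callers cannot alter the stored case.
                return it.next().clone();
            }
            public boolean hasNext() {
                return it.hasNext();
            }
        };
    }
}
```

With heads set to 2, each call to next() yields one pair of sequences, which
is the shape 2-head training needs; heads set to 1 gives the existing
single-sequence case.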
b) changes to TrainingAlgorithm interface:-
The current train method takes a SequenceDB, which only works for 1-head
training. I propose adding a further method that takes a TrainingSet and
deprecating the current method.
This change will break code that derives from AbstractTrainer, but I think
I can cursorily patch up AbstractTrainer and the current BaumWelch code to
add the new call. I do not propose extending the BW code to handle 2-D
training at this stage.
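One way the deprecated single-sequence entry point could be kept working is to
wrap its input as a TrainingSet of one-element groups and delegate to the new
method. The sketch below assumes this approach; SingleHeadTrainingSet and
TrainerSketch are hypothetical names, a List of sequences stands in for
SequenceDB, and the train methods omit the other arguments that the real
TrainingAlgorithm method takes.

```java
import java.util.List;

// Stand-in for org.biojava.bio.seq.Sequence so the sketch compiles alone.
interface Sequence {
    String getName();
}

// The interface as proposed in this message.
interface TrainingSet {
    interface Iterator {
        Sequence[] next();
        boolean hasNext();
    }
    Iterator getCases();
}

// Hypothetical adapter: presents each sequence as a group of length 1.
class SingleHeadTrainingSet implements TrainingSet {
    private final List<Sequence> sequences;

    SingleHeadTrainingSet(List<Sequence> sequences) {
        this.sequences = sequences;
    }

    public TrainingSet.Iterator getCases() {
        final java.util.Iterator<Sequence> it = sequences.iterator();
        return new TrainingSet.Iterator() {
            public Sequence[] next() {
                return new Sequence[] { it.next() };
            }
            public boolean hasNext() {
                return it.hasNext();
            }
        };
    }
}

// Hypothetical trainer base: the old entry point delegates to the new one.
abstract class TrainerSketch {
    // New-style method: works for any number of heads.
    abstract void train(TrainingSet cases);

    // Old-style method, kept for existing callers (List stands in for SequenceDB).
    @Deprecated
    void train(List<Sequence> db) {
        train(new SingleHeadTrainingSet(db));
    }
}
```

Existing 1-head callers would then continue to compile against the deprecated
method while new code passes a TrainingSet directly.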
c) changes to AbstractTrainer class:-
AbstractTrainer supplies a
    protected abstract double singleSequenceIteration(ModelTrainer trainer,
                                                      SymbolList symList)
I propose changing it to:-
    protected abstract double singleSequenceIteration(ModelTrainer trainer,
                                                      SymbolList[] symList)
Perhaps, to avoid breaking too much, I should create a NewAbstractTrainer
class with the new method and derive my ViterbiTrainer from it instead.
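To make the NewAbstractTrainer option concrete, here is a minimal sketch of
how the array-based method and the existing single-SymbolList method could
coexist. Only the two singleSequenceIteration signatures come from this
message; SingleHeadTrainer is a hypothetical bridging class, and ModelTrainer
and SymbolList are reduced to stand-ins so the example compiles on its own.

```java
// Stand-ins for the BioJava types so the sketch compiles alone.
interface ModelTrainer {
}

interface SymbolList {
    int length();
}

// The proposed base class: one SymbolList per head.
abstract class NewAbstractTrainer {
    protected abstract double singleSequenceIteration(ModelTrainer trainer,
                                                      SymbolList[] symList);
}

// Hypothetical bridge: lets existing 1-head code keep its old signature.
abstract class SingleHeadTrainer extends NewAbstractTrainer {
    // The existing single-sequence method, unchanged.
    protected abstract double singleSequenceIteration(ModelTrainer trainer,
                                                      SymbolList symList);

    @Override
    protected double singleSequenceIteration(ModelTrainer trainer,
                                             SymbolList[] symList) {
        if (symList.length != 1) {
            throw new IllegalArgumentException(
                "1-head trainer given " + symList.length + " sequences");
        }
        // Overload resolution picks the single-SymbolList method here.
        return singleSequenceIteration(trainer, symList[0]);
    }
}
```

Under this arrangement a 1-head trainer would only need to change its
superclass, while a 2-head trainer such as the ViterbiTrainer would extend
NewAbstractTrainer directly.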
Comments are requested.
Regards,
David Huen