[Biojava-dev] Extensions to DP framework to permit 2-head training

Sun Jan 4 17:13:36 EST 2004

I have written a working Viterbi trainer that is capable of 1- and 2-head 
training and hope to commit it to CVS. The current API does not permit 2-head 
training and will need changes to accommodate this.

I propose the following changes to permit it:-

a) introduction of a TrainingSet interface: In 2-head training, pairs of 
sequences need to be available.  I have generalised it into a means of 
supplying n sequences per case so we can deal with n-head training when 20 
gazillion Hz processors with 5 bazillion byte RAM become available.

public interface TrainingSet
{
    public interface Iterator
    {
        /**
         * get next group of sequences to train the model on.
         */
        public Sequence[] next();

        /**
         * any further training sequence groups?
         */
        public boolean hasNext();
    }

    /**
     * get an iterator for the cases supplied by this TrainingSet.
     */
    public Iterator getCases();
}
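
For illustration only, a list-backed implementation of this interface might 
look like the sketch below.  ListTrainingSet and its addCase method are 
hypothetical names, not part of the proposal, and Sequence is declared here as 
a stand-in for org.biojava.bio.seq.Sequence so that the snippet compiles on 
its own.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for org.biojava.bio.seq.Sequence (assumption, for a standalone sketch).
interface Sequence {
    String getName();
}

// The proposed interface, repeated so the sketch is self-contained.
interface TrainingSet {
    interface Iterator {
        Sequence[] next();
        boolean hasNext();
    }

    Iterator getCases();
}

// Hypothetical implementation: every case holds exactly `heads` sequences.
class ListTrainingSet implements TrainingSet {
    private final int heads;
    private final List<Sequence[]> cases = new ArrayList<>();

    ListTrainingSet(int heads) {
        if (heads < 1) {
            throw new IllegalArgumentException("heads must be >= 1");
        }
        this.heads = heads;
    }

    // Add one training case; rejects groups of the wrong size.
    void addCase(Sequence... seqs) {
        if (seqs.length != heads) {
            throw new IllegalArgumentException(
                "expected " + heads + " sequences, got " + seqs.length);
        }
        cases.add(seqs.clone());
    }

    public TrainingSet.Iterator getCases() {
        final java.util.Iterator<Sequence[]> it = cases.iterator();
        return new TrainingSet.Iterator() {
            public Sequence[] next() { return it.next(); }
            public boolean hasNext() { return it.hasNext(); }
        };
    }
}
```

A 2-head set would then be built with new ListTrainingSet(2) followed by one 
addCase(seqA, seqB) call per pair.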

b) changes to TrainingAlgorithm interface:-

The current train method takes a SequenceDB, which only works for 1-head 
training.  I propose adding a further method that takes a TrainingSet and 
deprecating the current method.

This change will break code that derives from AbstractTrainer, but I think 
I can cursorily patch up AbstractTrainer and the current BaumWelch code to 
add the new call.  I do not propose extending the BW code to handle 2-D 
training at this stage.
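
One way the deprecated entry point could keep working is to wrap the 1-head 
input as a TrainingSet whose cases are single-element arrays and delegate to 
the new method.  The sketch below assumes this approach; SingleHeadTrainingSet 
and SketchTrainer are hypothetical names, a List<Sequence> stands in for 
SequenceDB, and the real train method takes further arguments that are 
omitted here.

```java
import java.util.List;

// Stand-in for org.biojava.bio.seq.Sequence (assumption).
interface Sequence {
    String getName();
}

// The proposed interface, repeated so the sketch is self-contained.
interface TrainingSet {
    interface Iterator {
        Sequence[] next();
        boolean hasNext();
    }

    Iterator getCases();
}

// Hypothetical adapter: presents 1-head data as cases of length 1.
class SingleHeadTrainingSet implements TrainingSet {
    private final List<Sequence> seqs;

    SingleHeadTrainingSet(List<Sequence> seqs) {
        this.seqs = seqs;
    }

    public TrainingSet.Iterator getCases() {
        final java.util.Iterator<Sequence> it = seqs.iterator();
        return new TrainingSet.Iterator() {
            public Sequence[] next() { return new Sequence[] { it.next() }; }
            public boolean hasNext() { return it.hasNext(); }
        };
    }
}

// Hypothetical trainer shape; returns the number of cases seen, purely
// so the sketch has something observable.
abstract class SketchTrainer {
    abstract int train(TrainingSet set);

    /** @deprecated use {@link #train(TrainingSet)} instead. */
    @Deprecated
    int train(List<Sequence> db) {
        return train(new SingleHeadTrainingSet(db));
    }
}
```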

c) changes to AbstractTrainer class:-
AbstractTrainer currently declares the method

protected abstract double singleSequenceIteration(ModelTrainer trainer, 
SymbolList symList)

I propose changing it to:-
protected abstract double singleSequenceIteration(ModelTrainer trainer, 
SymbolList [] symList)

Perhaps, to avoid breaking too much, I should create a NewAbstractTrainer 
class with the new method and derive my ViterbiTrainer from it instead.
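
As a rough sketch of that alternative, existing 1-head trainers could be 
bridged onto the array-based method by an intermediate class.  Everything 
below is an assumption made for illustration: SymbolList and ModelTrainer are 
empty stand-ins for the BioJava types, OneHeadTrainer and cycle are 
hypothetical names, and the real AbstractTrainer does more than sum scores.

```java
import java.util.List;

// Stand-ins for the BioJava types (assumptions, for a standalone sketch).
interface SymbolList {
    int length();
}

interface ModelTrainer {
}

// Hypothetical base class carrying the new array-based method.
abstract class NewAbstractTrainer {
    protected abstract double singleSequenceIteration(
        ModelTrainer trainer, SymbolList[] symLists);

    // One pass over all cases, summing the per-case scores.
    double cycle(ModelTrainer trainer, List<SymbolList[]> cases) {
        double total = 0.0;
        for (SymbolList[] symLists : cases) {
            total += singleSequenceIteration(trainer, symLists);
        }
        return total;
    }
}

// Hypothetical bridge: lets a 1-head trainer keep its old signature.
abstract class OneHeadTrainer extends NewAbstractTrainer {
    protected abstract double singleSequenceIteration(
        ModelTrainer trainer, SymbolList symList);

    protected double singleSequenceIteration(
            ModelTrainer trainer, SymbolList[] symLists) {
        if (symLists.length != 1) {
            throw new IllegalArgumentException(
                "1-head trainer given " + symLists.length + " sequences");
        }
        return singleSequenceIteration(trainer, symLists[0]);
    }
}
```

A 2-head trainer would extend NewAbstractTrainer directly and read both 
elements of the array.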

Comments are requested.

Regards,
David Huen