[Biojava-dev] ModelInState fixed?
David Huen
smh1008 at cus.cam.ac.uk
Tue Apr 15 13:17:28 EDT 2003
On Tuesday 15 Apr 2003 12:05 pm, Matthew Pocock wrote:
>
> Now for the next one - states that emit more than one
> symbol. At the moment, each state emits one symbol at a
> time. This makes the code simple. It sucks for things
> like aligning DNA to protein, as the DNA inserts want
> to be nucleotides but the DNA-protein matches want to
> be codons. This can be fixed. The advance arrays don't
> need to contain values of just 0 or 1 - they could, for
> example, be 3. This has a knock-on effect for the emission
> alphabet in that it now emits both nucleotides and
> trinucleotides, but that's fixable. To make this work,
> we need to update the DP cursors to be aware that they
> have to store more than just the last column.
>
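To make the cursor point concrete, here is a minimal sketch in plain Java. It is not BioJava code: the class AdvanceSketch and the methods columnsNeeded and slot are hypothetical names, and the advance values {0, 1, 3} are assumed for illustration (silent state, nucleotide insert, codon match). The idea is only that a state advancing by 3 reads scores from three columns back, so the cursor has to keep max(advance) + 1 columns, which can be held in a circular buffer.

```java
/** Hypothetical sketch of multi-symbol advances; not part of BioJava. */
final class AdvanceSketch {
    /** Columns a DP cursor must retain: the current column plus the largest look-back. */
    static int columnsNeeded(int[] advances) {
        int max = 0;
        for (int a : advances) {
            max = Math.max(max, a);
        }
        return max + 1;
    }

    /** Maps an absolute sequence position onto a slot in a circular buffer of columns. */
    static int slot(int position, int windowSize) {
        return Math.floorMod(position, windowSize);
    }

    public static void main(String[] args) {
        // Assumed advances along the DNA sequence:
        // 0 = silent state, 1 = nucleotide insert, 3 = codon match.
        int[] advances = {0, 1, 3};
        int window = columnsNeeded(advances);
        System.out.println("columns to keep: " + window);

        // A codon match arriving at column i reads its predecessor scores from column i - 3.
        int i = 10;
        System.out.println("reads slot " + slot(i - 3, window)
                + ", writes slot " + slot(i, window));
    }
}
```

With advances of only 0 or 1 the same calculation gives two columns, which matches the "last column" behaviour described in the quote.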
The emission distribution would have to be over a compound alphabet too,
e.g. (DNA x Protein) or, of more interest to me,
((DNA x DNA x DNA) x (DNA x DNA x DNA)). Under these circumstances, the
model's alphabet needs to allow for the possibility that a state's alphabet
may be of a higher order than the model alphabet.
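
As a rough illustration of the order mismatch, the sketch below enumerates compound alphabets as plain Cartesian products. It does not use the BioJava alphabet classes: CompoundAlphabetSketch and crossProduct are hypothetical names, symbols are plain strings, and the codon-pair alphabet is flattened to six DNA components for simplicity (the nested form has the same number of symbols).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of compound (cross-product) alphabets; not part of BioJava. */
final class CompoundAlphabetSketch {
    static final List<String> DNA = Arrays.asList("a", "c", "g", "t");

    /** Cartesian product of the component alphabets; each tuple is one compound symbol. */
    static List<List<String>> crossProduct(List<List<String>> components) {
        List<List<String>> result = new ArrayList<>();
        result.add(new ArrayList<>());
        for (List<String> alphabet : components) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> prefix : result) {
                for (String symbol : alphabet) {
                    List<String> tuple = new ArrayList<>(prefix);
                    tuple.add(symbol);
                    next.add(tuple);
                }
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        // Model alphabet for a pairwise DNA alignment: one symbol from each sequence.
        List<List<String>> model = crossProduct(Arrays.asList(DNA, DNA));
        // Alphabet of a state that emits a codon from each sequence (flattened to six components).
        List<List<String>> codonPairs =
                crossProduct(Arrays.asList(DNA, DNA, DNA, DNA, DNA, DNA));

        System.out.println("model alphabet size: " + model.size());
        System.out.println("codon-pair state alphabet size: " + codonPairs.size());
    }
}
```

The model alphabet here has 4^2 = 16 symbols, while the codon-pair state needs 4^6 = 4096, so the state's emission distribution cannot be expressed over the model alphabet as it stands.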
Regards,
David Huen