[Dynamite] SingleModel
Ian Holmes
ihh@fruitfly.org
Sun, 5 Mar 2000 10:39:44 -0800 (PST)
Some comments
> // separate modules for the different models or not?
> // what about code sharing between them?
Probably should have separate modules. I don't quite understand this
module thing yet.
In fact... now that I think about it, it seems fascistic to require that
things be in the same module to share an internal representation. Not
to mention unworkable: surely any particular implementation of the
Dynamite IDL can do all the internal code-sharing it wants, e.g. by just
#including "super_generic_DP.h"?
I'm not really sure what's going on here, whether we're talking about a
coding style or a set of strict rules or what...
>
> module SingleModel { // Single means emits only one sequence
>
> interface State;
> interface Transition;
>
> typedef sequence<float> ProbabilityEmission;
This is the same thing as Alphabet::WeightVector.
No need to duplicate it.
>
> interface Transition {
> State from;
> State to;
> float transition_probability;
> ProbabilityEmission emission; // emission on the transitions.
As I said before, I think parameters should be in a separate object.
If people don't like this then we could consider having a kind of memo
object that represents a parameterised model.
Possibly have a "boolean Transition::is_null" field for null transitions?
What about "fanned" transitions (i.e. if it's an "A" then go to state 1,
if it's a "G" then go to state 3, if it's a "C" go to state 4 etc)?
These can be inefficiently implemented just by having A times as many
Transitions (where A is the alphabet size) -- shall we just leave it at
this for now? (I think we probably should.)
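
To make the "A times as many Transitions" idea concrete, here is a rough
sketch in Python (not IDL, and not anything Dynamite actually does). The
helper expand_fanned, the source/target field names and the "T" branch of
the fan are all made up for illustration; I am also assuming that each
branch carries its own probability and that the expanded transition emits
exactly one symbol.

```python
from typing import Dict, List, NamedTuple, Tuple


class Transition(NamedTuple):
    source: int                   # "from" in the IDL; renamed because it is a Python keyword
    target: int                   # "to" in the IDL
    transition_probability: float
    emission: Tuple[float, ...]   # one weight per alphabet symbol


def expand_fanned(source: int,
                  fan: Dict[str, Tuple[int, float]],
                  alphabet: str) -> List[Transition]:
    """Expand one fanned transition into ordinary Transitions.

    `fan` maps a symbol to (destination state, probability of that branch).
    Each resulting Transition emits only its own symbol (a one-hot emission).
    """
    transitions = []
    for i, symbol in enumerate(alphabet):
        if symbol not in fan:
            continue  # no branch for this symbol
        target, probability = fan[symbol]
        one_hot = tuple(1.0 if j == i else 0.0 for j in range(len(alphabet)))
        transitions.append(Transition(source, target, probability, one_hot))
    return transitions


# Hypothetical fan out of state 0; the "T" branch and the 0.25 values are invented.
fan = {"A": (1, 0.25), "G": (3, 0.25), "C": (4, 0.25), "T": (2, 0.25)}
expanded = expand_fanned(0, fan, "ACGT")
```

The cost is visible straight away: one fanned transition becomes as many
list entries as there are symbols, which is the inefficiency mentioned
above.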
> };
>
> typedef sequence<Transition> TransitionList;
>
> interface State {
> TransitionList all_Transitions();
> };
I think this method belongs in the model, not in an individual State,
i.e. Model should have the following methods:
  sequence<Transition> outgoing_transitions (in State s);
  sequence<Transition> all_transitions();
> typedef sequence<State> StateList;
I also think "SingleModel::State" should be a typedef to int, not an
interface of its own. States are lightweight things that are usually
treated as ints anyway.
This breaks down if we need to add too much information to a State.
However, with the parameters elsewhere, all we need is a name:
string state_name (in State s);
Incidentally (personal gripe) I dislike typedefs like the above one
("typedef sequence<State> StateList"). A sequence of States *IS* a "State
List" by *definition*, there is no need to typedef it; it looks like we
are generic-programming novices if we do. A typedef is acceptable (IMHO)
when it denotes a specialised *kind* of list, e.g.
typedef sequence<State> AlignmentPath;
typedef sequence<float> ProbabilityEmission;
I guess if we're using sequence<IncrediblyLongAndHardToTypeClassName> a
lot, then we might want to typedef it, but we should probably just choose
a shorter name for the class in the first place.
>
> interface Model {
> StateList all_States();
> };
To summarise the above edits:
interface Model {
  sequence<State> all_states();
  sequence<Transition> outgoing_transitions (in State s);
  sequence<Transition> all_transitions();
  string state_name (in State s);
};
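
For what it's worth, here is how small an implementation of that interface
can be once a State is just an int. This is only a sketch in Python under
my own assumptions: the constructor arguments, the source/target field
names and the three-state example are invented, and parameters are left out
entirely since I'd keep them in a separate object.

```python
from typing import List, NamedTuple, Sequence

State = int  # a State is just an index into the model's list of names


class Transition(NamedTuple):
    source: State   # "from" in the IDL; renamed because it is a Python keyword
    target: State   # "to" in the IDL


class Model:
    """Topology only: states, transitions and state names."""

    def __init__(self, state_names: Sequence[str],
                 transitions: Sequence[Transition]) -> None:
        self._names = list(state_names)
        self._transitions = list(transitions)

    def all_states(self) -> List[State]:
        return list(range(len(self._names)))

    def outgoing_transitions(self, s: State) -> List[Transition]:
        return [t for t in self._transitions if t.source == s]

    def all_transitions(self) -> List[Transition]:
        return list(self._transitions)

    def state_name(self, s: State) -> str:
        return self._names[s]


# Hypothetical three-state model: start -> match (self-looping) -> end.
model = Model(["start", "match", "end"],
              [Transition(0, 1), Transition(1, 1), Transition(1, 2)])
```

Nothing here needs a State object: the name lookup is the only per-state
information, and it lives in the Model.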
>
> //
> // Have not done alignment yet
> //
>
> interface AlignmentFactory {
> attribute model Model;
> // also here a function pointer for compile-time function for this model
> Alignment make_alignment(in Seq seq);
I am not convinced that AlignmentFactory is a useful generalisation -- I
feel that ViterbiAlgorithm makes more sense.
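
To show roughly what I mean, here is a sketch in Python of a
ViterbiAlgorithm object for a model that emits on its transitions. It is
an illustration under several assumptions of mine, not a proposal for the
IDL: there are no null transitions, there is a single start state, any
state may end the path, emissions are given as a symbol-to-probability
map, and whole paths are stored instead of traceback pointers to keep the
code short. Only the name make_alignment comes from the quoted interface.

```python
import math
from typing import Dict, List, Sequence, Tuple

# (source, target, transition_probability, emission),
# where emission maps a symbol to its probability on this transition.
Transition = Tuple[int, int, float, Dict[str, float]]


class ViterbiAlgorithm:
    def __init__(self, transitions: Sequence[Transition], start: int) -> None:
        self.transitions = list(transitions)
        self.start = start

    def make_alignment(self, seq: str) -> List[int]:
        """Return the most probable state path that emits `seq`."""
        # best[s] = (log probability, path) of the best path ending in state s
        best = {self.start: (0.0, [self.start])}
        for symbol in seq:
            nxt: Dict[int, Tuple[float, List[int]]] = {}
            for src, dst, p, emission in self.transitions:
                e = emission.get(symbol, 0.0)
                if src not in best or p <= 0.0 or e <= 0.0:
                    continue  # unreachable source or impossible step
                score = best[src][0] + math.log(p) + math.log(e)
                if dst not in nxt or score > nxt[dst][0]:
                    nxt[dst] = (score, best[src][1] + [dst])
            best = nxt
        if not best:
            raise ValueError("sequence cannot be emitted by this model")
        return max(best.values(), key=lambda item: item[0])[1]


# Made-up example: state 1 prefers A/T, state 2 prefers G/C.
AT = {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1}
GC = {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4}
transitions = [(0, 1, 0.5, AT), (0, 2, 0.5, GC),
               (1, 1, 0.9, AT), (1, 2, 0.1, GC),
               (2, 2, 0.9, GC), (2, 1, 0.1, AT)]
path = ViterbiAlgorithm(transitions, start=0).make_alignment("AATGCGC")
```

Here path begins at the start state, stays in state 1 for "AAT" and
switches to state 2 for "GCGC".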
Ian
>
> // can throw exceptions/errors of bad alphabet, other things...
> };
>
>
> }
>
--
Ian Holmes .... Howard Hughes Medical Institute .... ihh@fruitfly.org