[Dynamite] SingleModel
Ewan Birney
birney@ebi.ac.uk
Mon, 6 Mar 2000 04:48:38 +0000 (GMT)
>
> interface SingleTransitionParameters {
> float transition_probability;
> Alphabet::WeightVector emission_probability;
Grrrrr. <minor> Can't we call this Alphabet::ProbabilityVector? What's
wrong with sensible names?
> }
>
> interface SingleModelParameters {
> SingleTransitionParameters get_parameters (in Transition t);
> sequence<SingleTransitionParameters> all_parameters();
> // possibly also:
> // sequence<SingleTransitionParameters> outgoing_parameters (in State s);
> }
Ok. I think I see this. Basically I like this.
>
> The easiest way to keep the parameters & the model in sync is to stipulate
> that the sequence<SingleTransitionParameters> returned by
> SingleModelParameters::all_parameters() is indexed by the same index as
> the sequence<Transition> returned by the Model::all_transitions() method.
> Ditto the outgoing_parameters() method -- if you see what I mean.
>
> I regard this as perfectly valid coding practice as long as it's WELL
> documented. We could even incorporate a sanity check, by having a
> "Transition* my_transition" field in SingleTransitionParameters.
>
More worried about growing/shrinking the model. Perhaps that is a later
thing to think about.
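
For concreteness, a minimal sketch of the parallel-array convention with the
back-pointer sanity check. It is in Python rather than IDL so that it can be
run. The Transition fields (source, target), the in_sync() helper and the
example numbers are assumptions made for illustration, not part of the
proposed interfaces.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Transition:
    source: str  # hypothetical: states are identified by name only
    target: str


@dataclass
class SingleTransitionParameters:
    transition_probability: float
    emission_probability: List[float]  # stands in for the Alphabet vector type
    my_transition: Transition          # back-pointer, used only for the sanity check


def in_sync(transitions, parameters):
    """True if parameters[i] describes transitions[i] for every index i."""
    if len(transitions) != len(parameters):
        return False
    return all(p.my_transition == t for t, p in zip(transitions, parameters))


uniform = [0.25, 0.25, 0.25, 0.25]
all_transitions = [Transition("M", "M"), Transition("M", "I"), Transition("I", "M")]
all_parameters = [
    SingleTransitionParameters(0.9, uniform, all_transitions[0]),
    SingleTransitionParameters(0.1, uniform, all_transitions[1]),
    SingleTransitionParameters(1.0, uniform, all_transitions[2]),
]

# Shrinking the model without touching the parameters breaks the convention,
# and the back-pointer check is what notices it.
shrunk_transitions = [all_transitions[0], all_transitions[2]]
```

Here in_sync(all_transitions, all_parameters) holds, while
in_sync(shrunk_transitions, all_parameters) does not, which is the
growing/shrinking case the check would have to catch.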
> The next option for keeping the model & parameters in sync is to use the
> get_parameters(Transition) method in SingleModelParameters. If we don't
> use a ParameterisedModelMemento pattern (as I suggested above), then we
> care quite a lot about how fast these lookups are. We would have several
> implementation options:
I don't see this ParameterisedModelMemento pattern ... is this the
parallel arrays you are suggesting above?
>
> (0) Linear array of Transitions, searchable by brute force in
> time O(M) where M is the model size
> (1) Sorted array of Transitions, searchable by binary chop in time
> O(log M)
> (2) Large M*M array (so, O(M^2) memory)
> (3) Some kind of hashing on Transitions
>
> This may seem like a lot of effort (maybe this is what you meant by mad
> gymnastics -- I've got used to the STL doing all these algorithms for
> you!).
>
By gymnastics I meant not this but the parallel array stuff. That is OK as
long as it is documented. It sort of gives me the heebie-jeebies though.
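
A rough sketch of the four lookup options (0) to (3) listed above, again in
Python so it can be run. Transitions are modelled as (source, target) name
pairs and the state names and probabilities are invented for the example; a
real implementation would more likely use integer state indices.

```python
from bisect import bisect_left

states = ["B", "E", "I", "M"]
# Kept in sorted order so that option (1) can use binary search.
transitions = [("B", "M"), ("I", "M"), ("M", "E"), ("M", "I"), ("M", "M")]
probabilities = [1.0, 1.0, 0.1, 0.1, 0.8]


def lookup_linear(t):
    """Option (0): brute-force scan, O(M)."""
    for i, candidate in enumerate(transitions):
        if candidate == t:
            return probabilities[i]
    raise KeyError(t)


def lookup_sorted(t):
    """Option (1): binary chop on the sorted array, O(log M)."""
    i = bisect_left(transitions, t)
    if i < len(transitions) and transitions[i] == t:
        return probabilities[i]
    raise KeyError(t)


# Option (2): dense table over state pairs, O(M^2) memory, O(1) lookup.
# The name-to-index dict is only a convenience for this sketch.
state_index = {s: i for i, s in enumerate(states)}
dense = [[None] * len(states) for _ in states]
for (src, dst), p in zip(transitions, probabilities):
    dense[state_index[src]][state_index[dst]] = p


def lookup_dense(t):
    p = dense[state_index[t[0]]][state_index[t[1]]]
    if p is None:
        raise KeyError(t)
    return p


# Option (3): hash on the transition itself.
hashed = dict(zip(transitions, probabilities))


def lookup_hashed(t):
    return hashed[t]
```

All four return the same value for a given transition and raise KeyError
for a pair that is not in the model; they differ only in time and memory.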
> However, I really think the parameters are a separate thing from the
> model. I want to convince you.... Think about training, think about
> parameterising HMMs & GeneWise; even think about models as simple as Smith
> Waterman where the number of parameters is less than the number of
> transitions... think about Fisher kernels (where we will also want to have
> a parameter-like datastructure)... this is a somewhat intuitive feeling
> on my part, so I don't want to hammer it in without a consensus.
>
I have the same gut feeling. So - I think this is good...
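
To illustrate the "fewer parameters than transitions" case, here is a
deliberately simplified sketch: a three-state pairwise alignment model
(match M, insert X, insert Y) with six transitions driven by two free
parameters. The state names, the TIED table and the numbers are all
assumptions for illustration, and begin/end states are ignored.

```python
# Transitions that read their probability directly from a shared parameter.
TIED = {
    ("M", "X"): "gap_open",
    ("M", "Y"): "gap_open",
    ("X", "X"): "gap_extend",
    ("Y", "Y"): "gap_extend",
}

TRANSITIONS = [
    ("M", "M"), ("M", "X"), ("M", "Y"),
    ("X", "X"), ("X", "M"),
    ("Y", "Y"), ("Y", "M"),
][:7]


def transition_probability(t, params):
    """Derive a transition probability from the (smaller) parameter set."""
    if t in TIED:
        return params[TIED[t]]
    if t == ("M", "M"):
        # Whatever is left after opening a gap in either sequence.
        return 1.0 - 2.0 * params["gap_open"]
    if t in (("X", "M"), ("Y", "M")):
        # Whatever is left after extending the gap.
        return 1.0 - params["gap_extend"]
    raise KeyError(t)


def outgoing_total(state, params):
    """Sum of probabilities over all transitions leaving the given state."""
    return sum(transition_probability(t, params)
               for t in TRANSITIONS if t[0] == state)


params = {"gap_open": 0.05, "gap_extend": 0.4}
```

The model structure (TRANSITIONS) never changes here; retraining only
touches the two entries in params, which is the separation argued for above.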
> OK, the low-level-algorithmic rationale may have been wrong; but the point
> is I don't think states have any useful internal data (except their name),
> and the only thing you need to be able to DO with them is to test whether
> State s1 == State s2.
>
If the parameters are somewhere else, then yes.
But will we want to inherit from them for things like drawable models
(position, colour etc.)? I don't see what we lose by making them
objects...
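
A small sketch of what I mean, in Python for brevity. The only operation
the base State supports is equality (by name, which is an assumption here),
and DrawableState is a hypothetical subclass that adds display data without
changing how states compare.

```python
class State:
    """A state with no internal data except its name."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Assumption for this sketch: two states are equal if names match.
        return isinstance(other, State) and self.name == other.name

    def __hash__(self):
        return hash(self.name)


class DrawableState(State):
    """Hypothetical subclass carrying layout information for display."""

    def __init__(self, name, position, colour):
        super().__init__(name)
        self.position = position
        self.colour = colour


match = State("M")
drawn_match = DrawableState("M", position=(10, 20), colour="red")
insert = State("I")
```

Code that only tests s1 == s2 works unchanged with either class, so making
states objects costs the algorithms nothing.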
> > force of habit: Most (all?) IDL compilers bitch about using sequence<X>
> > outside of a typedef. Annoying eh?
>
> yes indeed... oh well.
> Perhaps call them XSequence to be consistent?
>
Name clash
Sequence -> biological polymer
Sequence -> sequence in IDL
We can't call them XSequence. It will confuse the fuck out of joe
bioinformatics guy. Has to be List or... Vector or something...
> Ian
>