[Biojava-l] Distribution
Matthew Pocock
mrp@sanger.ac.uk
Tue, 17 Apr 2001 16:54:15 +0100
Hi.
While writing the Distribution tutorial for the bootcamp, I noticed that
Distribution didn't actually define a probability density function,
because it does some trickery when handling ambiguity symbols. The
correct behavior is to sum the probabilities of all the atomic symbols
that match the ambiguity symbol and return that sum. This gives
getWeight the semantics "give me the probability that we observe one of
this set of symbols", rather than "give me the probability that we
observe one of this set of symbols given some null model". I think the
old behavior is a throw-back to the days before null models really
existed. Anyway, for DP with odds ratios the sum should give the
expected result.
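To make the new getWeight semantics concrete, here is a minimal sketch. It is not the BioJava Distribution API: symbols are modelled as plain strings, an ambiguity symbol is assumed to be just the set of atomic symbols it matches, and the class and method names are hypothetical. The point is only that the weight of an ambiguity symbol is the plain sum of its atomic weights, with no null-model correction.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, NOT the BioJava Distribution API.
// Atomic symbols are strings; an ambiguity symbol is the set of
// atomic symbols it matches.
class AmbiguityWeightSketch {
    private final Map<String, Double> atomicWeights = new HashMap<>();

    void setWeight(String atomic, double p) {
        atomicWeights.put(atomic, p);
    }

    // Probability of observing a single atomic symbol (0 if unknown).
    double getWeight(String atomic) {
        return atomicWeights.getOrDefault(atomic, 0.0);
    }

    // Probability of observing any one of the matched atomic symbols:
    // simply the sum of their individual probabilities.
    double getWeight(Set<String> ambiguity) {
        double sum = 0.0;
        for (String s : ambiguity) {
            sum += getWeight(s);
        }
        return sum;
    }

    public static void main(String[] args) {
        AmbiguityWeightSketch dna = new AmbiguityWeightSketch();
        dna.setWeight("a", 0.1);
        dna.setWeight("c", 0.2);
        dna.setWeight("g", 0.3);
        dna.setWeight("t", 0.4);
        // IUPAC 'r' (purine) matches a and g, so its weight is 0.1 + 0.3.
        System.out.println(dna.getWeight(Set.of("a", "g")));
    }
}
```

With this definition, the ambiguity symbol matching every atomic symbol gets weight 1, as a probability distribution should.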
One up-side to this is that Distribution now plays much better with
infinite sets such as the doubles: when an ambiguity symbol over the
doubles matches an interval, its weight is exactly the integral of the
Distribution over that range. For example, given the ambiguity symbol
[-Infinity, 10.0] we would integrate the associated probability density
function from -Infinity up to 10.0, which is the normal meaning of
P(X <= 10.0) in stats.
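The interval case can be sketched the same way. This is a hypothetical illustration, not BioJava code: it assumes a normal density, uses composite Simpson's rule as the integrator (a swapped-in choice, not anything BioJava prescribes), and approximates infinite bounds by truncating the range at 12 standard deviations either side of the mean, where the density is assumed negligible.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch, not BioJava code: the weight of an interval
// "ambiguity symbol" [lo, hi] over the doubles is the integral of the
// density across that interval.
class IntervalWeightSketch {
    // Composite Simpson's rule over [a, b] with n (even) sub-intervals.
    static double simpson(DoubleUnaryOperator f, double a, double b, int n) {
        double h = (b - a) / n;
        double sum = f.applyAsDouble(a) + f.applyAsDouble(b);
        for (int i = 1; i < n; i++) {
            double x = a + i * h;
            sum += (i % 2 == 1 ? 4.0 : 2.0) * f.applyAsDouble(x);
        }
        return sum * h / 3.0;
    }

    // Weight of [lo, hi] under a normal density. Infinite bounds are
    // clamped to mean +/- 12 sd (assumption: the tails beyond are negligible).
    static double intervalWeight(double mean, double sd, double lo, double hi) {
        DoubleUnaryOperator pdf = x -> {
            double z = (x - mean) / sd;
            return Math.exp(-0.5 * z * z) / (sd * Math.sqrt(2.0 * Math.PI));
        };
        double a = Math.max(lo, mean - 12.0 * sd);
        double b = Math.min(hi, mean + 12.0 * sd);
        if (b <= a) {
            return 0.0;
        }
        return simpson(pdf, a, b, 10_000);
    }

    public static void main(String[] args) {
        // P(X <= 10.0) for a normal density with mean 10 and sd 2 is about 0.5.
        System.out.println(intervalWeight(10.0, 2.0, Double.NEGATIVE_INFINITY, 10.0));
    }
}
```

A real implementation over the doubles would more likely use a closed-form cumulative distribution function where one exists; numerical integration is shown here only because it mirrors the "integrate the density over the interval" reading directly.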
If anybody disagrees, be vocal. The change is in CVS, but it won't ever
be back-ported to 1.1.
Matthew