[Dynamite] Score module

Ian Holmes ihh@fruitfly.org
Sun, 5 Mar 2000 19:45:17 -0800 (PST)


I think we also need a Score module (or some such name).

One concern I have is about how we represent scores; I have been bitten
quite hard with Handel/kimono, assuming that it will be OK to represent
scores as ints rounded to the nearest 1/1000 bit (cf HMMER), then hitting
overflow problems when you start adding together the scores for all 6,000
sequences in the yeast genome for example...

On the other hand it clearly makes sense to keep scores as integers within
a DP routine. The solution I am trying within Handel/kimono is to convert
to (floating point) log-likelihoods for large sums.

Here's the IDL:

module Score {
  
  // the following typedefs are included for clarity

  typedef float Probability; // probability space
  typedef float BitScore;    // log to base 2
  typedef float NatScore;    // log to base e
  typedef int   IntScore;    // internal score (or "integer score" if you prefer)

  // NB the IntScore-to-BitScore conversion factor is not explicitly exposed by the following interface.
  
  interface Scheme {

    // conversions that are independent of internal format (i.e. between probabilities, bits & nats)

    attribute BitScore bits_infinity;  // largest value that any .*2bits() call will return
    attribute NatScore nats_infinity;  // largest value that any .*2nats() call will return

    BitScore    prob2bits (Probability p);
    BitScore    nats2bits (NatScore n);

    NatScore    prob2nats (Probability p);
    NatScore    bits2nats (BitScore b);

    Probability bits2prob (BitScore b);
    Probability nats2prob (NatScore n);

    // conversions that aren't independent of internal format

    attribute IntScore ints_infinity;       // largest value that any .*2ints() call will return
    // NB the three infinities (bits_infinity, nats_infinity, ints_infinity) do NOT have to be "in sync"
    // i.e. there is no guarantee that ints2bits(ints_infinity) == bits_infinity, etc.

    BitScore    ints2bits (IntScore i);
    NatScore    ints2nats (IntScore i);
    Probability ints2prob (IntScore i);     // implementation detail: we may want to use a lookup table for this

    IntScore    prob2ints (Probability p);
    IntScore    bits2ints (BitScore b);
    IntScore    nats2ints (NatScore n);

    // fast math functions

    IntScore    prob_sum (IntScore i1, IntScore i2);    // == prob2ints (ints2prob(i1) + ints2prob(i2))
    // implementation detail: should use a lookup table, to speed up the Forward algorithm

    // In my codebase, I also have an algorithm for doing a maximum-accuracy probability-space sum over
    // an arbitrarily long list of IntScores by repeatedly extracting the lowest two scores, summing them,
    // and re-inserting the result into the (sorted) list.
    // This is safer than just cycling through the list and summing, since if the difference between the
    // two scores i1 & i2 in prob_sum() exceeds the size of the lookup table then roundoff error will occur.
    // I don't think we need to worry about this yet -- but just so it's documented... -ihh
  };
}




-- 
Ian Holmes  ....  Howard Hughes Medical Institute  ....  ihh@fruitfly.org