[Biojava-dev] Modifications to DistributionTools.java

Matthew Pocock matthew_pocock at yahoo.co.uk
Fri Feb 14 20:55:10 EST 2003


Lachlan Coin wrote:
> Hi,
> 
> I have been using DistributionTools.java and wanted to commit a few
> changes.  If no one particularly objects, then I will go ahead and commit
> these:
> 
>   - I have added a method:
> public Distribution jointDistributionOverAlignment(Alignment a,
>                                boolean countGaps,
>                                double nullWeight, int[] cols)
>     This returns the joint distribution of several columns in the
> alignment.  It is useful for calculating the mutual information between
> two columns in an alignment.

Sounds great. Commit this one.
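
For anyone following along, here is a minimal sketch of how a joint
distribution over two columns turns into mutual information, using
I(X;Y) = sum over (x,y) of p(x,y) * log( p(x,y) / (p(x) * p(y)) ).
It works on a plain probability table rather than BioJava's Distribution
API; the class and method names (MutualInformationSketch,
mutualInformation) are made up for illustration, and the table is
assumed to be already normalised.

```java
/** Hypothetical helper, not part of BioJava. */
final class MutualInformationSketch {

    /**
     * joint[i][j] is the joint probability of seeing symbol i in the first
     * column and symbol j in the second. Entries are assumed to sum to 1.
     */
    static double mutualInformation(double[][] joint, double logBase) {
        int rows = joint.length;
        int cols = joint[0].length;

        // Marginal distributions of each column, obtained by summing the table.
        double[] px = new double[rows];
        double[] py = new double[cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                px[i] += joint[i][j];
                py[j] += joint[i][j];
            }
        }

        // Sum p(x,y) * log(p(x,y) / (p(x) p(y))), skipping zero cells,
        // which contribute nothing to the sum.
        double mi = 0.0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                double p = joint[i][j];
                if (p > 0.0) {
                    mi += p * Math.log(p / (px[i] * py[j])) / Math.log(logBase);
                }
            }
        }
        return mi;
    }
}
```

With logBase = 2, two columns that always carry the same one of two
equally likely symbols give 1 bit, and two independent columns give 0.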

> 
>   -  I have changed
> 	public Map shannonEntropy(Distribution observed, double logBase)
> 
> 	 Currently this creates a symbol->entropy map, and in the map it
> puts p*log(1/p).  I think it is more natural to put log(1/p) in the map,
> as this reflects the uncertainty of a particular outcome (the
> other is the weighted uncertainty).  I.e. if we have a weighted coin with
> a 10% probability of heads, then a head carries log2(10) bits of
> information.  I have also set things up so that the Map only has entries
> for symbols which have non-zero probability.

Not sure about this. My information theory is ropey at best, but I 
thought that the information of a probability was (- p * log (p)) or 
equivalently p * log (1/p) but perhaps I'm wrong. Could someone who 
knows tell me?
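
To make the two candidates concrete, here is a small sketch that builds
both maps side by side: log(1/p) per symbol, and p * log(1/p) per symbol.
It uses a plain Map of probabilities instead of a BioJava Distribution,
and the names (SurprisalSketch, perSymbolInformation, weightedTerms) are
invented for this example. Symbols with zero probability are left out
of both maps, as in the proposal above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of BioJava. */
final class SurprisalSketch {

    private static double log(double x, double base) {
        return Math.log(x) / Math.log(base);
    }

    /** symbol -> log(1/p): the information carried by seeing that symbol. */
    static Map<String, Double> perSymbolInformation(Map<String, Double> probs,
                                                    double logBase) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : probs.entrySet()) {
            double p = e.getValue();
            if (p > 0.0) {                       // skip impossible symbols
                out.put(e.getKey(), log(1.0 / p, logBase));
            }
        }
        return out;
    }

    /** symbol -> p * log(1/p): that symbol's contribution to the entropy. */
    static Map<String, Double> weightedTerms(Map<String, Double> probs,
                                             double logBase) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : probs.entrySet()) {
            double p = e.getValue();
            if (p > 0.0) {                       // skip impossible symbols
                out.put(e.getKey(), p * log(1.0 / p, logBase));
            }
        }
        return out;
    }
}
```

For a coin with a 10% chance of heads and logBase = 2, the first map
holds log2(10) (about 3.32 bits) for heads, while the second holds one
tenth of that.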

> 
>    - Consequently, I have also changed
> 
> 	public double bitsOfInformation(Distribution observed)
> 
> 	as it relied on shannonEntropy to calculate this.  It now
> calculates the shannonEntropy map, and takes the average of the values in
> this map, weighted by each symbol's probability under the observed
> distribution.
> 
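
A sketch of the weighted average described above, next to the older
route of summing p * log(1/p) terms directly, to show that the two give
the same total. This is plain Java on a Map of probabilities; the class
and method names are hypothetical, and whether bitsOfInformation reports
this total as-is or derives something further from it is not stated in
this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of BioJava. */
final class WeightedAverageSketch {

    private static double log(double x, double base) {
        return Math.log(x) / Math.log(base);
    }

    /** Build the symbol -> log(1/p) map, then average it weighted by p. */
    static double averageOfPerSymbolInformation(Map<String, Double> probs,
                                                double logBase) {
        Map<String, Double> info = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : probs.entrySet()) {
            if (e.getValue() > 0.0) {
                info.put(e.getKey(), log(1.0 / e.getValue(), logBase));
            }
        }
        double total = 0.0;
        for (Map.Entry<String, Double> e : info.entrySet()) {
            total += probs.get(e.getKey()) * e.getValue();   // weight by p
        }
        return total;
    }

    /** Sum the already-weighted p * log(1/p) terms. */
    static double sumOfWeightedTerms(Map<String, Double> probs, double logBase) {
        double total = 0.0;
        for (double p : probs.values()) {
            if (p > 0.0) {
                total += p * log(1.0 / p, logBase);
            }
        }
        return total;
    }
}
```

Both methods return 1 bit for a fair coin with logBase = 2, so the
change only affects what the intermediate map holds.
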
> 
> I have also added some jUnit tests to test these methods.
> 
> Thanks,
> 
> Lachlan
> 
> 
> 
> -------------------------------------------------------------
> Lachlan Coin
> Wellcome Trust Sanger Institute		Magdalene College
> Cambridge  CB10 1SA			Cambridge CB30AG
> Ph: +44 1223 494 820
> Fax: +44 1223 494 919
> ------------------------------------------------------------
> 
> 


-- 
BioJava Consulting LTD - Support and training for BioJava
http://www.biojava.co.uk


