[Biojava-dev] Modifications to DistributionTools.java

Schreiber, Mark mark.schreiber at agresearch.co.nz
Tue Feb 25 11:33:38 EST 2003


Sorry not to get back to you earlier (I've been in Singapore for the
Hackathon).

My understanding is that log(1/p) is log odds, which is not Shannon
entropy (Shannon 1948). I may be wrong on this, but if not, could the
method be changed back and a new log odds method be added to do what you
are calculating here?

Under Shannon's theory a coin can only hold one bit of information.
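
To make the two quantities under discussion concrete, here is a minimal
sketch in plain Java. It does not use the BioJava API: the class and method
names (EntropySketch, perSymbolLogInverse, entropy) are hypothetical, and a
distribution is assumed to be a simple symbol->probability map. It computes
the per-symbol value log(1/p) and the probability-weighted sum of
p*log(1/p), skipping zero-probability symbols.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the DistributionTools implementation.
class EntropySketch {

    // Logarithm of x in an arbitrary base.
    static double log(double x, double base) {
        return Math.log(x) / Math.log(base);
    }

    // Per-symbol log(1/p); symbols with zero probability are left out.
    static Map<String, Double> perSymbolLogInverse(Map<String, Double> dist, double logBase) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : dist.entrySet()) {
            double p = e.getValue();
            if (p > 0.0) {
                result.put(e.getKey(), log(1.0 / p, logBase));
            }
        }
        return result;
    }

    // Sum over symbols of p * log(1/p): the weighted average of the values above.
    static double entropy(Map<String, Double> dist, double logBase) {
        double sum = 0.0;
        for (Map.Entry<String, Double> e : perSymbolLogInverse(dist, logBase).entrySet()) {
            sum += dist.get(e.getKey()) * e.getValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Double> fair = new LinkedHashMap<>();
        fair.put("H", 0.5);
        fair.put("T", 0.5);

        Map<String, Double> biased = new LinkedHashMap<>();
        biased.put("H", 0.1);
        biased.put("T", 0.9);

        System.out.println("fair coin, weighted sum:   " + entropy(fair, 2.0));
        System.out.println("biased coin, log(1/p) map: " + perSymbolLogInverse(biased, 2.0));
        System.out.println("biased coin, weighted sum: " + entropy(biased, 2.0));
    }
}
```

With base 2, the fair coin gives a weighted sum of 1 bit. For a coin with
probability 0.1 of heads, the per-symbol value for heads is log2(10), about
3.32, while the weighted sum over both outcomes is about 0.47 bits, which
stays below the one-bit limit.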

- Mark


> -----Original Message-----
> From: Lachlan Coin [mailto:lc1 at sanger.ac.uk] 
> Sent: Thursday, 13 February 2003 11:17 p.m.
> To: biojava-dev at biojava.org
> Subject: [Biojava-dev] Modifications to DistributionTools.java
> 
> 
> Hi,
> 
> I have been using DistributionTools.java and wanted to commit 
> a few changes.  If no one particularly objects, then I will go 
> ahead and commit these.
> 
>   - I have added a method:
> public Distribution jointDistributionOverAlignment(Alignment a,
>                                boolean countGaps,
>                                double nullWeight, int[] cols)
>     this just returns the joint distribution of several 
> columns in the alignment.  It is useful for calculating 
> mutual information for two columns in an alignment
> 
>   -  I have changed
> 	public Map shannonEntropy(Distribution observed, double logBase)
> 
> 	 currently this creates a symbol->entropy map, and in 
> the map it puts p*log(1/p).  I think it is more natural to 
> put log(1/p) in the map, as this is a reflection of the 
> uncertainty of a particular outcome (the other is the 
> weighted uncertainty).  I.e. if we have a weighted coin with 
> 10% probability of heads, then a head carries log2(10) bits 
> of information.  I have also set things up so that the Map 
> only has entries for symbols which have non-zero probability.
> 
>    - Consequently, I have also changed
> 
> 	public double bitsOfInformation(Distribution observed)
> 
> 	as it relied on shannonEntropy to calculate this.  It 
> now calculates the shannonEntropy map, and takes the average 
> of the values in this map, weighted by the probabilities in 
> the observed distribution.
> 
> 
> I have also added some jUnit tests to test these methods.
> 
> Thanks,
> 
> Lachlan
> 
> 
> 
> -------------------------------------------------------------
> Lachlan Coin
> Wellcome Trust Sanger Institute		Magdalene College
> Cambridge  CB10 1SA			Cambridge CB30AG
> Ph: +44 1223 494 820
> Fax: +44 1223 494 919
> ------------------------------------------------------------


