[Biopython] Statistical similarity in microarray data

Tue Feb 16 22:19:36 UTC 2010

Hi Peter,

> Up until recently, we were using a Pearson correlation (from
> scipy.stats) but this assumes the data is normally distributed, which it
> probably isn't. The correlations were a little unreliable.

One option would be Spearman's rank correlation coefficient or mutual information, neither of which assumes normally distributed data.
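
As a rough illustration of why a rank-based measure may behave better here, the following is a minimal, dependency-free sketch of Spearman's coefficient: rank each sample (ties get the average rank), then take the Pearson correlation of the ranks. The helper names and the sample values are made up for the example; in practice scipy.stats.spearmanr should compute the same coefficient.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical intensities: monotonically related, but far from linear.
sample_a = [1.0, 2.0, 3.0, 4.0, 5.0]
sample_b = [1.0, 4.0, 9.0, 16.0, 100.0]

r = pearson(sample_a, sample_b)     # pulled down by the outlying last value
rho = spearman(sample_a, sample_b)  # depends only on the ordering
print("Pearson r = %.3f, Spearman rho = %.3f" % (r, rho))
```

Because only the ordering of the values matters, a single extreme intensity cannot dominate the result the way it does with Pearson.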

> After a bit of digging, I tried using a Wilcoxon (also from 
> scipy.stats), but this seems to give high correlations for things it 
> shouldn't, like files that are different samples. It also seems to lack 
> precision. I get p-values of 0 quite a lot; even 1e-80 would reassure me 
> that something is really happening underneath.

I also noticed some strange behaviour recently in the scipy.stats module, specifically with the Kruskal-Wallis test. However, I did not test it rigorously enough to confirm a real problem. You could try the RPy module instead.

Good luck,
Fred
