[Biopython] bio.motifs P-value on pssm searches

Marco Galardini marco.galardini at unifi.it
Thu Nov 14 13:16:55 UTC 2013


Dear Bartek,

Thanks for your prompt reply. I'll use the FPR threshold to filter the 
hits, then. Thanks also for clarifying the meaning of the returned 
score.

Marco

On 11/14/2013 02:14 PM, Bartek Wilczynski wrote:
> Dear Marco,
>
> The score you mention is in fact a log-odds score: the base-2 
> logarithm of the ratio between the probability of the sequence in 
> question being generated by the motif and the probability of it being 
> generated by the background (random) model.
>
> If you want an analog of a p-value (the probability of obtaining a 
> score of x or higher under the background model), look at the score 
> distributions in the Bio.motifs.thresholds module. For example, to 
> find the score that corresponds to a p-value of 0.05 for motif M, you 
> can do
>
> thresholds.ScoreDistribution(M).threshold_fpr(0.05)
>
> Please remember that the thresholds are computed approximately, to a 
> given precision (set by the precision argument of the 
> ScoreDistribution constructor).
>
> Naturally, if you are searching a sequence of length 1000, you 
> should expect about 50 false-positive hits per strand at this FPR 
> (0.05 x 1000).
>
> Hope that helps
> Bartek
>
>
> On Thu, Nov 14, 2013 at 1:30 PM, Marco Galardini 
> <marco.galardini at unifi.it> wrote:
>
>     Dear biopythoners,
>
>     Bio.motifs PSSM search is a really effective tool when dealing
>     with regulatory motifs. When searching a DNA sequence with a
>     PSSM, a bit score is associated with each position; I was
>     wondering whether you have any tips for obtaining a P- or
>     E-value from such scores. I couldn't find a method in the
>     package that does that, but maybe I've missed something.
>
>     Thanks for your help,
>     Marco
>
>     -- 
>     -------------------------------------------------
>     Marco Galardini, PhD
>     Dipartimento di Biologia
>     Via Madonna del Piano, 6 - 50019 Sesto Fiorentino (FI)
>
>     e-mail: marco.galardini at unifi.it
>     www: http://www.unifi.it/dblage/CMpro-v-p-51.html
>     phone: +39 055 4574737
>     mobile: +39 340 2808041
>     -------------------------------------------------
>
> -- 
> Bartek Wilczynski
> ==================
> Institute of Informatics
> University of Warsaw
> http://www.mimuw.edu.pl/~bartek
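
The approach Bartek describes might look roughly like the sketch below in
a recent Biopython release. It relies on the pssm.distribution(),
threshold_fpr() and pssm.search() calls shown in the Biopython tutorial;
the motif instances and the target sequence are made up for illustration,
and the 0.5 pseudocount and 10**4 precision are arbitrary assumptions, not
recommended settings.

```python
from Bio import motifs
from Bio.Seq import Seq

# Hypothetical binding-site instances (illustration only).
sites = ("TACAA", "TACGC", "TACAC", "TACCC", "AACCC", "AATGC", "AATGC")
motif = motifs.create([Seq(s) for s in sites])

# Pseudocounts avoid -inf log-odds for letters never seen at a position.
pssm = motif.counts.normalize(pseudocounts=0.5).log_odds()

# Approximate background score distribution; higher precision = finer grid.
distribution = pssm.distribution(precision=10**4)
fpr = 0.05
threshold = distribution.threshold_fpr(fpr)

# Hypothetical target sequence; search() scans both strands by default
# and reports reverse-strand hits with negative positions.
sequence = Seq("TACACTGCATTACAACCCAAGCATTA")
hits = list(pssm.search(sequence, threshold=threshold))

# Rough expectation of background hits: fpr * windows * 2 strands.
n_windows = len(sequence) - pssm.length + 1
expected_background_hits = fpr * n_windows * 2

for position, score in hits:
    print(position, round(float(score), 2))
print("threshold:", round(float(threshold), 2))
print("expected background hits:", round(expected_background_hits, 2))
```

Hits that pass the threshold are still expected to include some background
matches, so the expected count is worth comparing with the number of hits
actually reported.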


-- 
-------------------------------------------------
Marco Galardini, PhD
Dipartimento di Biologia
Via Madonna del Piano, 6 - 50019 Sesto Fiorentino (FI)

e-mail: marco.galardini at unifi.it
www: http://www.unifi.it/dblage/CMpro-v-p-51.html
phone:  +39 055 4574737
mobile: +39 340 2808041
-------------------------------------------------
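
For intuition about what the log-odds score and the FPR threshold mean,
here is a minimal pure-Python sketch that does not use Biopython at all.
It assumes a uniform background and a made-up 3-column probability
matrix, and it enumerates every possible word exactly rather than using
the discretized approximation that ScoreDistribution computes, so it is
only practical for very short motifs.

```python
import math
from itertools import product

BACKGROUND = {b: 0.25 for b in "ACGT"}  # assumed uniform background

# Hypothetical position-specific probabilities (each column sums to 1).
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]


def log_odds(word):
    """log2 of P(word | motif) / P(word | background)."""
    total = sum(math.log2(col[b] / BACKGROUND[b]) for col, b in zip(PWM, word))
    return round(total, 9)  # rounding makes tied scores compare equal


def background_prob(word):
    """Probability of the word under the background model."""
    return math.prod(BACKGROUND[b] for b in word)


def all_words():
    """Every possible word of the motif's length."""
    return product("ACGT", repeat=len(PWM))


def p_value(score):
    """Background probability that a window scores at least `score`."""
    return sum(background_prob(w) for w in all_words() if log_odds(w) >= score)


def threshold_for_fpr(fpr):
    """Smallest attainable score whose p-value does not exceed `fpr`."""
    scores = sorted({log_odds(w) for w in all_words()})
    return next(s for s in scores if p_value(s) <= fpr)


threshold = threshold_for_fpr(0.05)
# Expected background hits on one strand of a 1000-base sequence.
expected_hits = p_value(threshold) * (1000 - len(PWM) + 1)
```

Because the attainable scores are discrete, the p-value at the chosen
threshold can be noticeably smaller than the requested FPR, which is one
reason the observed number of background hits may fall below fpr times
the sequence length.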