[Biopython] bio.motifs P-value on pssm searches
Marco Galardini
marco.galardini at unifi.it
Thu Nov 14 13:16:55 UTC 2013
Dear Bartek,
thanks for your prompt reply. I'll use the FPR threshold to filter the
hits, then. Thanks also for clarifying the meaning of the returned
score.
Marco
On 11/14/2013 02:14 PM, Bartek Wilczynski wrote:
> Dear Marco,
>
> the score you mention is in fact a log-odds score: the logarithm of
> the ratio between the probability that the sequence in question was
> generated by the motif and the probability that it was generated by
> a random (background) generator.
>
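As a rough illustration of the log-odds definition above, the sketch below
scores a site against a made-up 3-position motif. The motif probabilities,
the uniform background, and the helper name log_odds_score are assumptions
for illustration only; base 2 is assumed to match the "bit score" wording
in this thread. It does not use Biopython itself.

```python
import math

# Hypothetical 3-position motif: nucleotide probabilities per position.
MOTIF = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
# Assumed uniform background ("random generator").
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_odds_score(site, motif=MOTIF, background=BACKGROUND):
    """Return log2(P(site | motif) / P(site | background))."""
    if len(site) != len(motif):
        raise ValueError("site length must equal motif length")
    # Positions are treated as independent, so the log-ratios add up.
    return sum(
        math.log2(column[base] / background[base])
        for column, base in zip(motif, site)
    )


best = log_odds_score("AGC")   # consensus site, positive score
worst = log_odds_score("TTT")  # poor match, negative score
```

A positive score means the site is more likely under the motif than under
the background; a negative score means the opposite.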
> If you want an analog of a p-value (the probability of obtaining a
> score of x or higher by chance), you need to look into the score
> distributions in the Bio.motifs.thresholds module. For example, to
> find the score that corresponds to a p-value of 0.05 for motif M, you
> can do:
>
> thresholds.ScoreDistribution(M).threshold_fpr(0.05)
>
> Please remember that the thresholds are computed approximately, to a
> given precision (the precision argument of the ScoreDistribution
> constructor).
>
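To make the idea of an FPR threshold concrete, here is a minimal pure-Python
sketch of one way such a threshold can be derived: build the distribution of
binned log-odds scores under the background, then walk down from the highest
score while the tail probability stays within the requested FPR. This is not
Biopython's implementation; the toy motif, the function names, and the step
parameter (a stand-in for the precision mentioned above) are all assumptions.
In Biopython itself the equivalent is, as far as I know,
pssm.distribution(...).threshold_fpr(...).

```python
import math
from collections import defaultdict

# Hypothetical 3-position motif and assumed uniform background.
MOTIF = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def background_score_distribution(motif, background, step=0.01):
    """Map each binned log-odds score to its probability under background.

    Scores are rounded to multiples of `step` bits, so the result is only
    approximate; a smaller step gives a finer (and larger) table.
    """
    dist = {0: 1.0}  # bin index -> probability, before any position
    for column in motif:
        nxt = defaultdict(float)
        for base, p_bg in background.items():
            delta = round(math.log2(column[base] / p_bg) / step)
            for binned, prob in dist.items():
                nxt[binned + delta] += prob * p_bg
        dist = dict(nxt)
    return {index * step: prob for index, prob in dist.items()}


def threshold_for_fpr(dist, fpr):
    """Lowest score t with P(score >= t) <= fpr, or None if there is none."""
    tail = 0.0
    threshold = None
    for score in sorted(dist, reverse=True):
        if tail + dist[score] > fpr:
            break
        tail += dist[score]
        threshold = score
    return threshold


dist = background_score_distribution(MOTIF, BACKGROUND)
t_05 = threshold_for_fpr(dist, 0.05)
t_01 = threshold_for_fpr(dist, 0.01)
```

With such a short toy motif only a few distinct scores exist, so a very
strict FPR may have no usable threshold at all (t_01 is None here); real
motifs are longer and give a much finer-grained distribution.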
> Naturally, if you are searching a sequence of length 1000, you
> should expect about 50 hits by chance alone at this FPR
> (0.05 * 1000 positions = 50) on each strand searched.
>
> Hope that helps
> Bartek
>
>
> On Thu, Nov 14, 2013 at 1:30 PM, Marco Galardini
> <marco.galardini at unifi.it> wrote:
>
> Dear biopythoners,
>
> the Bio.motifs PSSM search is a really effective tool when
> dealing with regulatory motifs. When searching for a PSSM in a DNA
> sequence, a bit score is associated with each position; I was
> wondering if you have any tips on how to obtain a P- or E-value
> from such scores. I couldn't find any method in the package that
> does that, but maybe I've missed something.
>
> Thanks for your help,
> Marco
>
> --
> -------------------------------------------------
> Marco Galardini, PhD
> Dipartimento di Biologia
> Via Madonna del Piano, 6 - 50019 Sesto Fiorentino (FI)
>
> e-mail: marco.galardini at unifi.it
> www: http://www.unifi.it/dblage/CMpro-v-p-51.html
> phone: +39 055 4574737
> mobile: +39 340 2808041
> -------------------------------------------------
>
> --
> Bartek Wilczynski
> ==================
> Institute of Informatics
> University of Warsaw
> http://www.mimuw.edu.pl/~bartek