[Bioperl-l] p-value, e-value

Ian Korf ikorf@sapiens.wustl.edu
Tue, 8 Jan 2002 09:03:14 -0600 (CST)


The relationship between P and E is:

   P = 1 - e^-E

For small values, E is indistinguishable from P. So for any really good
hit, E and P will have the same value.
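
As a quick numerical check of that relationship, here is a minimal Python
sketch. The function name e_to_p is made up for illustration and is not
part of BLAST or Bioperl; it uses math.expm1 so that tiny E values do not
lose precision.

```python
import math

def e_to_p(e_value: float) -> float:
    """Convert an E-value to a P-value using P = 1 - e^-E.

    -expm1(-E) equals 1 - exp(-E) but stays accurate when E is tiny.
    """
    if e_value < 0:
        raise ValueError("E must be non-negative")
    return -math.expm1(-e_value)

# For small E the two numbers agree; they drift apart as E approaches 1.
for e in (1e-50, 1e-5, 0.01, 0.1, 1.0):
    print(f"E = {e:<8g} P = {e_to_p(e):.6g}")
```

The loop prints E next to the corresponding P so the two columns can be
compared by eye.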

E is a positive real number. It is the expected number of alignments of
some score in a search space of some size with some scoring system.

P is the probability of at least one alignment with some score in a
search space...

So for really lousy hits (or small ones) you would be better off
distinguishing them with E rather than P, as it's much easier to see the
difference between 10 and 20 than between 0.99995 and 0.999999998. I
think this is why Ewan thinks E is more sensible than P.
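
To see how P saturates for poor hits, the sketch below converts E = 10
and E = 20 to P, and also inverts the formula (E = -ln(1 - P)). Both
function names are hypothetical helpers for illustration, not part of any
BLAST distribution.

```python
import math

def e_to_p(e_value: float) -> float:
    """P = 1 - e^-E, computed with expm1 for numerical stability."""
    return -math.expm1(-e_value)

def p_to_e(p_value: float) -> float:
    """Inverse of the above: E = -ln(1 - P), valid for 0 <= P < 1."""
    if not 0.0 <= p_value < 1.0:
        raise ValueError("P must be in [0, 1)")
    return -math.log1p(-p_value)

# E doubles from 10 to 20, yet P only moves in the fifth decimal place
# and beyond.
for e in (10.0, 20.0):
    print(f"E = {e:4.0f}  P = {e_to_p(e):.10f}")
```

Going the other way, p_to_e recovers a readable E from a P that is
crowded up against 1, as long as P has been stored with enough digits.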

While it is true that when you use gapped alignments you are no longer in
the strict Karlin-Altschul realm (really, lots of features of biological
sequences besides gaps violate K-A assumptions), you can get a good
estimate of lambda with simulations and then look it up later. WU-BLAST
has tables of scoring matrices and gap penalties built in which allow it
to adjust lambda and therefore E and P. This is really important because
lambda can change quite a bit if you change your gap penalties.
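
To illustrate why lambda matters so much, here is a rough sketch using
the standard Karlin-Altschul form E = K * m * n * e^(-lambda * S). The
lambda and K values below are made-up illustrative numbers, not taken
from any WU-BLAST table, and the function name is hypothetical.

```python
import math

def karlin_altschul_evalue(score: float, m: int, n: int,
                           lam: float, k: float) -> float:
    """Expected number of alignments scoring >= score.

    m and n are the query and database lengths; lam and k are the
    Karlin-Altschul parameters for the scoring system in use.
    """
    return k * m * n * math.exp(-lam * score)

# Same raw score and search space, two hypothetical lambda values.
score, m, n, k = 100, 300, 1_000_000_000, 0.04
for lam in (0.25, 0.30):
    e = karlin_altschul_evalue(score, m, n, lam, k)
    print(f"lambda = {lam:.2f}  E = {e:.3g}")
```

Because lambda sits in the exponent, a change of 0.05 at a raw score of
100 shifts E by a factor of e^5, roughly 150-fold.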

I'm not sure where the confusion about E and P is coming from, but
possibly it is from combined statistical significance which assigns a
P-value to not just one HSP but multiple HSPs from the same subject. With
WU-BLAST (recent ones anyway), you can have your choice between Poisson,
Sum (preferred), and turning off combinations with -kap.

-Ian

On Tue, 8 Jan 2002, Jonathan Epstein wrote:

> I think this is mostly correct, Ewan.  If memory serves, the P-values were no longer statistically meaningful once gapped alignments were introduced in BLAST 2.  It's possible that the P-values might be meaningful if you specify that an ungapped alignment should be performed.
>
> -Jonathan
>
> At 01:41 AM 1/8/2002 , Ewan Birney wrote:
> >I think the only thing which actually produces just P-values is BLAST 1
> >series (am I wrong?) and I think we should just document that we put the
> >Pvalue in the Evalue slot as it is also 10 year old software now and
> >people should have moved on from the BLAST 1 series by now.
>
>
>
> Jonathan Epstein                                Jonathan_Epstein@nih.gov
> Head, Unit on Biologic Computation              (301)402-4563
> Office of the Scientific Director               Bldg 31, Room 2A47
> Nat. Inst. of Child Health & Human Development  31 Center Drive
> National Institutes of Health                   Bethesda, MD 20892