[Bioperl-l] E-value of a combined alignment?

Ian Korf ik1 at sanger.ac.uk
Wed Sep 3 13:33:48 EDT 2003


> I believe that this is actually the behavior of NCBI's BLASTP.  All of
> the HSPs in a hit get the same evalue, which is about what you would
> get if you summed the bit scores of the HSPs and then calculated a
> final evalue.

This is definitely not what BLASTP or any other BLAST does. If this were 
the case, you could sum the scores of highly insignificant HSPs 
(e.g. those with an E-value of 1.0) and come up with a very good 
E-value. The log(KMN) penalty for each HSP subtracts the background 
expected alignment score [every search has a score with an E-value of 
1.0, and in lambda-normalized units this is log(KMN) in the limit of 
large sequences]. Combining alignments is also not so straightforward 
if you want the HSPs to be consistent (e.g. the N-termini match and the 
C-termini match, rather than the N-terminus matching the C-terminus). 
In this case, one must compare every HSP against every other to 
evaluate the overlaps. Since this is quadratic in the number of HSPs, 
it doesn't scale well to large sequences. Setting high single-HSP 
cutoffs helps offset the cost, as does gapped alignment, which produces 
fewer HSPs. The cutoff value is hard-coded in NCBI-BLAST but not in 
WU-BLAST (the parameters are S2 and gapS2).
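
To make the arithmetic concrete, here is a minimal Python sketch. It is 
not the code of any BLAST implementation. The values of LAMBDA and K are 
assumed, illustrative Karlin-Altschul parameters, the function names and 
the HSP tuple are made up for this example, and the consistency test is 
a simplification that allows no overlap at all. The sketch also leaves 
out the further terms that real sum statistics include.

```python
import math
from itertools import combinations
from typing import NamedTuple

# Assumed, illustrative Karlin-Altschul parameters (not read from any BLAST run).
LAMBDA = 0.318
K = 0.134


def expected_score_nats(m: int, n: int) -> float:
    """Normalized score (lambda * S) at which E = 1, i.e. ln(K*m*n)."""
    return math.log(K * m * n)


def evalue(raw_score: float, m: int, n: int) -> float:
    """Single-HSP E-value: E = K*m*n*exp(-lambda*S)."""
    return K * m * n * math.exp(-LAMBDA * raw_score)


def naive_sum_evalue(raw_scores, m: int, n: int) -> float:
    """Add the raw scores and (wrongly) treat the total as one HSP."""
    return evalue(sum(raw_scores), m, n)


def penalized_sum_nats(raw_scores, m: int, n: int) -> float:
    """Sum of normalized scores after charging ln(K*m*n) to each HSP."""
    penalty = expected_score_nats(m, n)
    return sum(LAMBDA * s - penalty for s in raw_scores)


class HSP(NamedTuple):
    """Hypothetical HSP record: inclusive coordinates on query and subject."""
    q_start: int
    q_end: int
    s_start: int
    s_end: int


def consistent(a: HSP, b: HSP) -> bool:
    """True if the HSPs appear in the same order on query and subject.

    Simplification for this sketch: any overlap counts as inconsistent.
    """
    first, second = sorted((a, b))  # tuples sort by query start first
    return first.q_end < second.q_start and first.s_end < second.s_start


def all_consistent(hsps) -> bool:
    """Check every pair of HSPs: quadratic in the number of HSPs."""
    return all(consistent(a, b) for a, b in combinations(hsps, 2))


if __name__ == "__main__":
    m, n = 300, 1_000_000
    # Raw score that gives an E-value of 1.0 for this search space.
    s1 = expected_score_nats(m, n) / LAMBDA
    print("single HSP E-value:", evalue(s1, m, n))
    print("naive sum of three:", naive_sum_evalue([s1] * 3, m, n))
    print("penalized sum (nats):", penalized_sum_nats([s1] * 3, m, n))
```

Three HSPs that each have an E-value of 1.0 look wildly significant 
when their raw scores are simply added, but contribute nothing once 
each one is charged log(KMN).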

> If "p" scores were really probabilities, we could combine them using
> the formulas for either dependent or independent events.  Has anyone
> tried this?

There's loads of literature on this topic already. The papers are 
mostly theoretical, though, and do not really concern themselves with 
the practicalities of biological sequences. Finite sequence lengths 
pose some problems. For example, for finite sequences the log(KMN) 
expected score is a little too high, so BLAST uses some heuristics to 
bring it down. The code in NCBI-BLAST that does this is a little 
frightening, and I've no idea what WU-BLAST does, though it seems to 
take length into account in some manner.
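
As an illustration of the kind of length adjustment involved, here is a 
minimal Python sketch of one correction described in the literature: 
subtract the expected HSP length, ln(KMN)/H, from each sequence length 
before computing log(KMN). This is not the NCBI-BLAST code and says 
nothing about what WU-BLAST does. The values of K and H are assumed for 
the example, the function names are made up, and clamping to a minimum 
length of 1 is a choice of this sketch.

```python
import math

# Assumed, illustrative parameters (not read from any BLAST run).
K = 0.134   # Karlin-Altschul K
H = 0.401   # relative entropy, in nats per aligned residue pair


def expected_score_nats(m: float, n: float) -> float:
    """Normalized score at which E = 1 for lengths m and n: ln(K*m*n)."""
    return math.log(K * m * n)


def expected_hsp_length(m: float, n: float) -> float:
    """Expected length of an HSP scoring ln(K*m*n) nats: ln(K*m*n) / H."""
    return expected_score_nats(m, n) / H


def effective_lengths(m: float, n: float) -> tuple[float, float]:
    """Shorten both sequences by the expected HSP length.

    An alignment cannot start too close to the end of a sequence, so
    the usable search space is smaller than m*n. Lengths are clamped
    to 1 so that very short sequences do not go non-positive.
    """
    ell = expected_hsp_length(m, n)
    return max(m - ell, 1.0), max(n - ell, 1.0)


def corrected_expected_score_nats(m: float, n: float) -> float:
    """ln(K*m'*n') computed with the effective lengths m' and n'."""
    m_eff, n_eff = effective_lengths(m, n)
    return expected_score_nats(m_eff, n_eff)


if __name__ == "__main__":
    m, n = 300, 1_000_000
    print("uncorrected:", expected_score_nats(m, n))
    print("corrected:  ", corrected_expected_score_nats(m, n))
```

The corrected value is lower than the uncorrected one, and the 
difference matters most when the query is short.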

This stuff (and a whole lot more) is discussed in the O'Reilly BLAST 
book (sorry for the shameless plug).

-Ian

>
> -Chris Dwan
>  CCGB, University of Minnesota.