[Bioperl-l] E-value of a combined alignment?

Ian Korf ik1 at sanger.ac.uk
Wed Sep 3 07:46:02 EDT 2003


There are several publications on the combined statistical significance of
local alignment scores. The methods implemented in BLAST are not exactly
the same as the published ones, though. You can get a pretty decent
approximation by subtracting log(KMN) for each gap, but this isn't the
proper formula.
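
A rough Python sketch of one possible reading of that approximation
follows. Everything in it is an assumption made for illustration: the
function name is hypothetical, "gap" is taken to mean the space between
two consecutive HSPs in a group (so r HSPs have r - 1 gaps), the log is
taken as a natural log applied to the summed score in nats (lambda * S),
M and N are taken to be the query and database lengths, and the default
lambda and K are the values from the quoted question. Under that reading
the result works out to the product of the individual e-values. It is
not the proper sum-statistics formula and not what BLAST computes.

```python
import math


def approx_combined_e_value(scores, m, n, lam=0.318, k=0.135):
    """Approximate e-value for a group of HSPs (hypothetical helper).

    Assumed reading: sum the raw scores, convert to nats with lambda,
    subtract ln(K*m*n) once per gap between consecutive HSPs, then
    apply the ordinary single-HSP e-value formula to the result.
    """
    if not scores:
        raise ValueError("need at least one HSP score")
    log_kmn = math.log(k * m * n)
    gaps = len(scores) - 1
    # Summed score in nats, penalised once per gap (assumption).
    nat_score = lam * sum(scores) - gaps * log_kmn
    return k * m * n * math.exp(-nat_score)


if __name__ == "__main__":
    # Made-up scores and lengths, purely for illustration.
    print(approx_combined_e_value([60, 50], m=300, n=1_000_000))
```

With a single HSP this reduces to the usual K*m*n*exp(-lambda*S).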

WU-BLAST is much better for combined statistics than NCBI-BLAST because 
it shows the actual groups with the -links parameter and allows you to 
limit the number of groups with the -topcomboN and -topcomboE 
parameters. It also lets you fine-tune the groupings a bit with -olmax 
and -olfmax. If the sequences aren't too diverged, you might be better 
off keeping X low though.

-Ian

On Wednesday, September 3, 2003, at 12:06 AM, Yee Man Chan wrote:

>
> Hi folks,
>
> 	I am aligning mRNAs against the human genome using ungapped tblastx. I
> got a bunch of HSPs with different e-values. I can see that some of
> them should be in the same group because they are exons of a gene. But
> then what is the e-value of all these HSPs combined?
>
> 	I know the formulas of e-value and bit score for BLOSUM62:
>
> Let S' be the bit score, S the score, e the e-value, m the length of
> the HSP, and n the length of the database.
>
> S' = (0.318 * S - ln(0.135)) / ln(2)
>
> e = m * n / (2^(S'))
>
> 	I am guessing the combined e-value of non-overlapping HSPs
> to be:
>
> S'' = (0.318 * sum_of_S - ln(0.135)) / ln(2)
>
> e' = sum_of_m * n / (2^(S''))
>
> 	Is this correct? Or do you know the right way to calculate it?
>
> Thanks in advance.
> Yee Man
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
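
For reference, the two single-HSP formulas from the quoted question can
be written as a short Python sketch. The function and constant names are
made up for illustration; the constants 0.318 and 0.135 are the lambda
and K values given in the question. Note that in the usual
Karlin-Altschul formulation m is the (effective) query length rather
than the HSP length, so the sketch simply takes m as a parameter and
leaves that choice to the caller.

```python
import math

LAMBDA = 0.318  # lambda, as quoted in the question
K = 0.135       # K, as quoted in the question


def bit_score(score, lam=LAMBDA, k=K):
    """S' = (lambda * S - ln K) / ln 2"""
    return (lam * score - math.log(k)) / math.log(2)


def e_value(score, m, n, lam=LAMBDA, k=K):
    """e = m * n / 2^S', which is the same as K * m * n * exp(-lambda * S)."""
    return m * n / 2.0 ** bit_score(score, lam, k)


if __name__ == "__main__":
    # Made-up score and lengths, purely for illustration.
    s, m, n = 100, 300, 3_000_000_000
    print(bit_score(s), e_value(s, m, n))
```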


