[Bioperl-l] IUPAC support for DNA alignment

Fri Jun 27 21:31:55 UTC 2008

On Jun 27, 2008, at 1:35 PM, Yee Man Chan wrote:

>
> Hi guys
>
> What about providing two switches, one for full score and one for
> probabilistic score?
>
> Assume match is +3 and mismatch -1
>
> Full score version:
> 1) T - U = +3 (I assume U is the same as T for alignment purposes,
> right?)

Right.

>
> 2) A - W = +3
> 3) A - D = +3
> 4) A - N = +3
> 5) A - X = -1 (not so sure about this one)
>
> Probabilistic score version:
> 1) T - U = +3
> 2) A - W = +3/2-1/2 = +1
> 3) A - D = +3/3-1*2/3 = +1/3
> 4) A - N = +3/4-1*3/4 = 0
> 5) A - X = -1

Note that M, R, V, and H also include A, whereas K, Y, S, and B do
not (and so would not match your example of 'A').
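
The two switches proposed above can be sketched in Python as follows.
This is only an illustration, not BioPerl code: the function names are
hypothetical, 'X' (or any unknown letter) is assumed to score as a
mismatch, and every base behind an ambiguity code is assumed to be
equally likely.

```python
from fractions import Fraction

# Standard IUPAC nucleotide codes and the bases each one stands for.
# U is treated as T for alignment purposes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

MATCH, MISMATCH = 3, -1  # the values assumed in this thread


def full_score(a, b):
    """Full match score if the two codes share at least one base."""
    sa, sb = IUPAC.get(a.upper()), IUPAC.get(b.upper())
    if sa is None or sb is None:  # e.g. 'X': assumed to be a mismatch
        return MISMATCH
    return MATCH if set(sa) & set(sb) else MISMATCH


def probabilistic_score(a, b):
    """Expected score with all constituent bases equally likely."""
    sa, sb = IUPAC.get(a.upper()), IUPAC.get(b.upper())
    if sa is None or sb is None:  # e.g. 'X': assumed to be a mismatch
        return Fraction(MISMATCH)
    pairs = [(x, y) for x in sa for y in sb]
    p_match = Fraction(sum(x == y for x, y in pairs), len(pairs))
    return MATCH * p_match + MISMATCH * (1 - p_match)


for code in "UWDNX":
    partner = "T" if code == "U" else "A"
    print(partner, code, full_score(partner, code),
          probabilistic_score(partner, code))
```

Exact fractions are used so that A - D comes out as 1/3 rather than a
rounded float. Scoring two ambiguity codes against each other (say W
against R) by averaging over all base pairs is an extension that the
examples above do not cover.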

Note also that the above implicitly assumes that the bases an
ambiguity code stands for are all equally likely (for example, 50% GC
content), which is not true of most coding sequences.

Also, if you have a known polymorphism at the site, the 3 bases of a
3-letter ambiguity may not all be equally likely. For example, if you
have the letter D for an [A/G] SNP, you may not want to give 1/3 of
the weight to the possibility of T.

I would at least make it possible to assign expected base frequencies
and to weight the ambiguous possibilities by them.
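
As a rough Python sketch of such weighting (again only an illustration,
not BioPerl code): the function name and the way frequencies are passed
in as a dict are assumptions, and the frequencies are simply
renormalised over the bases the ambiguity code allows.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def weighted_score(base, code, freqs, match=3.0, mismatch=-1.0):
    """Expected score of an unambiguous base against an IUPAC code.

    freqs maps each of A, C, G, T to its expected frequency, either
    genome-wide or for this one site. It is renormalised over the
    bases that the code allows.
    """
    base = IUPAC[base.upper()]  # maps U to T
    if len(base) != 1:
        raise ValueError("base must be unambiguous")
    members = IUPAC[code.upper()]
    total = sum(freqs[b] for b in members)
    if total == 0:
        raise ValueError("no allowed base has a non-zero frequency")
    p_match = freqs[base] / total if base in members else 0.0
    return match * p_match + mismatch * (1.0 - p_match)


uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
at_rich = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
ag_snp = {"A": 0.5, "C": 0.0, "G": 0.5, "T": 0.0}  # site-specific weights

print(weighted_score("A", "D", uniform))  # same as the equal-weight case
print(weighted_score("A", "N", at_rich))  # A is favoured in AT-rich sequence
print(weighted_score("A", "D", ag_snp))   # no weight given to T
```

With uniform frequencies this reduces to the equal-weight scores, and
the site-specific weights for the [A/G] case score A against D as if D
were a two-letter code.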

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================