[BioPython] Substitution matrices
Iddo Friedberg
idoerg@cc.huji.ac.il
Sat, 30 Sep 2000 12:58:01 +0300 (GMT+0300)
Hi all,
I've been thinking about implementing a class for substitution matrices.
Apart from BLOSUM/PAM, it seems that many people are generating their own
matrices from various data.
I thought of a class with the following attributes:
Alphabet: an instance of any of the Bio/IUPAC alphabet classes
ablist: a sorted list of letters in the alphabet (useful).
data: a dictionary in the following format:
{(i1,j1): n1, (i1,j2): n2, ..., (ik,jk): nk}
where (i,j) is a sorted tuple of alphabet letters, and n is the corresponding
value in a substitution, accepted-replacement, or frequency matrix, etc.
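
A minimal sketch of the data layout described above, to make the sorted-tuple
keys concrete. The names pair_key and lookup, the four-letter alphabet, and the
counts are all made up for illustration; nothing here is an existing BioPython
API.

```python
# Hypothetical sketch: a half matrix stored as a dict keyed by sorted letter pairs.

def pair_key(a, b):
    """Return the sorted tuple used as a key, so (a, b) and (b, a) coincide."""
    return tuple(sorted((a, b)))


def lookup(data, a, b):
    """Fetch the value for a letter pair regardless of argument order."""
    return data[pair_key(a, b)]


# Stand-in for the letters of an IUPAC alphabet (assumption, for illustration).
ablist = sorted("TGCA")

# Made-up values; only one entry per unordered pair is stored.
data = {
    pair_key("A", "A"): 10,
    pair_key("C", "A"): 2,
    pair_key("G", "T"): 1,
}
```

Because keys are always sorted, lookup(data, "C", "A") and
lookup(data, "A", "C") return the same entry.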
methods:
fullToHalf: convert a full matrix to a half matrix
entropy: calculate the matrix's entropy
verify: check that the matrix has everything it needs.
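
A rough sketch of how two of these methods might look as plain functions.
It assumes the matrix holds counts, so the (i,j) and (j,i) entries of a full
matrix are summed when folding; for a score matrix one would more likely keep
a single copy. It also assumes "all it needs" means one entry per unordered
letter pair. The names full_to_half and verify are hypothetical.

```python
# Hypothetical sketch of fullToHalf and verify for a dict-based count matrix.

def full_to_half(full):
    """Fold a full matrix into a half matrix by summing (a, b) and (b, a)."""
    half = {}
    for (a, b), value in full.items():
        key = (a, b) if a <= b else (b, a)  # sorted tuple as the key
        half[key] = half.get(key, 0) + value
    return half


def verify(half, ablist):
    """Return the sorted letter pairs that are missing from the half matrix."""
    missing = []
    for i, a in enumerate(ablist):
        for b in ablist[i:]:
            if (a, b) not in half:
                missing.append((a, b))
    return missing


# Made-up counts for a two-letter alphabet.
full = {("A", "A"): 4, ("A", "C"): 1, ("C", "A"): 2, ("C", "C"): 3}
half = full_to_half(full)
```

An empty list from verify(half, ablist) means every pair is present.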
functions:
Generating an accepted replacement matrix:
Maybe some interface with a BLAST object, if users would like to take
their replacement data from there. But basically, I think users would
be more inclined to write this themselves, to interface with their own
pairwise/multiple aligned data.
Generating the observed frequency matrix
From the accepted replacement matrix
Generating an expected frequency matrix
Either from a known genomic/universal standard, or from the user's data. I
think a frequency table class is in order here.
Generating a substitution frequency matrix
Generating a log-odds matrix
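
One way the steps listed above could chain together, sketched under the
assumption of a BLOSUM-style construction: pair counts are normalized to
observed frequencies, letter frequencies are derived from those, expected pair
frequencies come from the letter frequencies, and the log-odds score is
log2(observed/expected). The entropy here is taken to be the relative entropy
(sum of observed frequency times score), which is only one possible reading of
"entropy". All function names and counts are hypothetical.

```python
import math

# Hypothetical sketch: accepted-replacement counts -> log-odds half matrix.

def observed_frequencies(counts):
    """Normalize pair counts so the half matrix sums to 1."""
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def letter_frequencies(observed):
    """Derive single-letter frequencies from the observed pair frequencies."""
    freqs = {}
    for (a, b), q in observed.items():
        if a == b:
            freqs[a] = freqs.get(a, 0.0) + q
        else:  # an off-diagonal pair contributes half to each letter
            freqs[a] = freqs.get(a, 0.0) + q / 2
            freqs[b] = freqs.get(b, 0.0) + q / 2
    return freqs


def expected_frequencies(freqs):
    """Expected pair frequencies if letters paired independently."""
    letters = sorted(freqs)
    expected = {}
    for i, a in enumerate(letters):
        for b in letters[i:]:
            p = freqs[a] * freqs[b]
            expected[(a, b)] = p if a == b else 2 * p
    return expected


def log_odds(observed, expected):
    """Log-odds score in bits; pairs never observed are left out."""
    return {pair: math.log2(q / expected[pair])
            for pair, q in observed.items() if q > 0}


def relative_entropy(observed, scores):
    """Average score per aligned pair, weighted by observed frequency."""
    return sum(observed[pair] * s for pair, s in scores.items())


# Made-up accepted-replacement counts for a two-letter alphabet.
counts = {("A", "A"): 6, ("A", "B"): 2, ("B", "B"): 2}
observed = observed_frequencies(counts)
freqs = letter_frequencies(observed)
expected = expected_frequencies(freqs)
scores = log_odds(observed, expected)
entropy = relative_entropy(observed, scores)
```

Expected frequencies taken from a genomic/universal standard instead would
just mean passing a different freqs dictionary to expected_frequencies.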
Thoughts? Suggestions? Is anybody already doing this? (if so, then I'm
very sorry)
Cheers,
Iddo
--
/* --- */main(c){float t,x,y,b=-2,a=b;for(;b-=a>2?.1/(a=-2):0,b<2;
/* | */putchar(30+c),a+=.0503) for(x=y=c=0;++c<90&x*x+y*y<4;y=2*
/* | */x*y+b,x=t)t=x*x-y*y+a;}
/* --- ddo Friedberg */