[BioPython] Substitution matrices

Iddo Friedberg idoerg@cc.huji.ac.il
Sat, 30 Sep 2000 12:58:01 +0300 (GMT+0300)


Hi all,

I've been thinking about implementing a class for substitution matrices.
Apart from BLOSUM/PAM, it seems many people are generating their own
matrices from various data.

I thought of a class with the following attributes:

Alphabet: an instance of any of the Bio/IUPAC alphabet classes
ablist: a sorted list of the letters in the alphabet (useful).
data: a dictionary in the following format:
{(i1,j1): n1, (i1,j2): n2, ..., (ik,jk): nk}
where (i,j) is a sorted tuple of alphabet letters, and n is some value in a
substitution/accepted replacements/frequency matrix, etc.
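
To make the format concrete, here is a minimal sketch in plain Python. The
class name SubsMatrix and the lookup method are hypothetical, and a plain
string of letters stands in for a Bio/IUPAC alphabet instance. It also
assumes that values given for (i,j) and (j,i) should be summed, which suits
counts but not necessarily scores.

```python
class SubsMatrix:
    """Hypothetical container with the three attributes listed above."""

    def __init__(self, alphabet, data=None):
        # Assumption: a string of letters stands in for an alphabet object.
        self.alphabet = alphabet
        self.ablist = sorted(alphabet)
        self.data = {}
        for (i, j), n in (data or {}).items():
            # Store every pair under its sorted tuple, so (C,A) and (A,C)
            # end up in the same cell (values are summed; an assumption).
            key = tuple(sorted((i, j)))
            self.data[key] = self.data.get(key, 0) + n

    def __getitem__(self, pair):
        # Look up a pair regardless of the order the letters are given in.
        return self.data[tuple(sorted(pair))]


m = SubsMatrix("ACGT", {("A", "C"): 3, ("C", "A"): 2, ("G", "G"): 7})
```

With this, m[("C", "A")] and m[("A", "C")] refer to the same entry.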

methods:
fullToHalf: convert a full matrix to a half matrix
entropy: calculate the matrix's entropy
verify: check that the matrix has everything it needs.
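
A rough sketch of what these three might look like, written as plain
functions over the data dictionary. All names and behaviours here are
assumptions: full_to_half sums the two off-diagonal cells (sensible for
counts), entropy is taken to be the Shannon entropy in bits of the values
normalized to sum to 1, and verify just reports letter pairs with no entry.

```python
import math
from itertools import combinations_with_replacement


def full_to_half(full):
    """Fold a full matrix into a half matrix keyed by sorted tuples."""
    half = {}
    for (i, j), n in full.items():
        key = tuple(sorted((i, j)))
        half[key] = half.get(key, 0) + n  # assumption: sum (i,j) and (j,i)
    return half


def entropy(half):
    """Shannon entropy (bits) of the values, normalized to sum to 1."""
    total = sum(half.values())
    return -sum((n / total) * math.log2(n / total)
                for n in half.values() if n > 0)


def verify(half, ablist):
    """Return the sorted letter pairs that have no entry in the matrix."""
    expected = combinations_with_replacement(sorted(ablist), 2)
    return [pair for pair in expected if pair not in half]


full = {("A", "C"): 3, ("C", "A"): 2, ("A", "A"): 5}
half = full_to_half(full)
missing = verify(half, ["A", "C"])
```

Here half holds two cells of equal weight, and missing lists the one pair,
(C,C), that has no entry yet.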

functions:
Generating an accepted replacement matrix:
Maybe some interface with a BLAST object, if users would like to take
their replacement data from there. But basically, I think users would
be more inclined to write this themselves, to interface with their own
pairwise/multiple aligned data.


Generating the observed frequency matrix
   From the accepted replacement matrix
Generating an expected frequency matrix
   Either from a known genomic/universal standard, or from user's data. I
   think a frequency table class is in order here.

Generating a substitution frequency matrix
Generating a log-odds matrix
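
The chain of functions above could be sketched roughly as follows. This
is only an illustration under stated assumptions: the input is a list of
gapped pairwise alignments given as string pairs, expected frequencies are
derived from the user's own data rather than a universal standard, the
"substitution frequency" step is read as the observed/expected ratio, and
log-odds are in bits (log base 2). All function names are hypothetical.

```python
import math
from collections import Counter


def accepted_replacements(aligned_pairs, gap="-"):
    """Count aligned letter pairs (half matrix, sorted-tuple keys)."""
    counts = Counter()
    for s1, s2 in aligned_pairs:
        for a, b in zip(s1, s2):
            if a != gap and b != gap:  # skip columns containing a gap
                counts[tuple(sorted((a, b)))] += 1
    return counts


def observed_frequencies(counts):
    """Normalize the counts so that all cells sum to 1."""
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def letter_frequencies(obs):
    """Marginal frequency of each letter, derived from the pair data."""
    freq = Counter()
    for (a, b), q in obs.items():
        freq[a] += q / 2
        freq[b] += q / 2
    return freq


def expected_frequencies(freq):
    """Pair frequencies expected if letters paired independently."""
    letters = sorted(freq)
    exp = {}
    for x, a in enumerate(letters):
        for b in letters[x:]:
            # An off-diagonal half-matrix cell covers both (a,b) and (b,a).
            exp[(a, b)] = freq[a] * freq[b] * (1 if a == b else 2)
    return exp


def log_odds(obs, exp):
    """log2(observed / expected) for every pair that was observed."""
    return {pair: math.log2(q / exp[pair]) for pair, q in obs.items() if q > 0}


pairs = [("AC", "AC"), ("AA", "AC")]
counts = accepted_replacements(pairs)
obs = observed_frequencies(counts)
exp = expected_frequencies(letter_frequencies(obs))
scores = log_odds(obs, exp)
```

In this toy input, A aligned with A is seen more often than chance would
predict and gets a positive score, while A aligned with C gets a negative
one. Pairs that were never observed are left out rather than scored.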

Thoughts? Suggestions? Is anybody already doing this? (if so, then I'm
very sorry)


Cheers,

Iddo


--

/* --- */main(c){float t,x,y,b=-2,a=b;for(;b-=a>2?.1/(a=-2):0,b<2;
/*  |  */putchar(30+c),a+=.0503) for(x=y=c=0;++c<90&x*x+y*y<4;y=2*
/*  |  */x*y+b,x=t)t=x*x-y*y+a;}
/* --- ddo Friedberg */