[Biojava-l] Expasy pI calculation algorithm
George Waldon
gwaldon at geneinfinity.org
Sat Apr 2 00:11:02 UTC 2011
Hello,
Sorry if this comes a bit late; we had some email issues to resolve.
Thanks again to Andreas for fixing them.
This is part of the email exchange I had with Christine Hoogland and
Gregoire Rossier a few years ago regarding the algorithm used by
"Compute pI/Mw" on the ExPASy server. The code which was given to me
is included at the end of this email; I used it to update bj1.
Good luck to all GSoC candidates,
George
On Tue, May 22, 2007 at 9:26 AM, Christine Hoogland via RT
<tools at expasy.org> wrote:
Dear George,
Please find enclosed the algorithm we are using on ExPASy.
I hope this helps.
Best regards
Christine
>
> The pK values used for "Compute pI/Mw" can be found in
>
> # Bjellqvist, B., Hughes, G.J., Pasquali, Ch., Paquet, N., Ravier, F.,
> Sanchez, J.-Ch., Frutiger, S. & Hochstrasser, D.F. The focusing
> positions of polypeptides in immobilized pH gradients can be predicted
> from their amino acid sequences. Electrophoresis 1993, 14, 1023-1031.
>
> MEDLINE: 8125050
>
> # Bjellqvist, B., Basse, B., Olsen, E. and Celis, J.E. Reference points
> for comparisons of two-dimensional maps of proteins from different human
> cell types defined in a pH scale where isoelectric points correlate with
> polypeptide compositions. Electrophoresis 1994, 15, 529-539.
>
> MEDLINE: 8055880
>
> The pK values were defined by examining polypeptide migration between
> pH 4.5 and 7.3 in an immobilised pH gradient gel environment with 9.2M
> and 9.8M urea at 15°C or 25°C. Prediction of protein pI for highly
> basic proteins is yet to be studied and it is possible that current
> Compute pI/Mw predictions may not be adequate for this purpose.
>
> I hope this helps.
>
>
> Best regards
> Gregoire Rossier
>
>
--------------------------------------------------------
Christine Hoogland
Swiss Institute of Bioinformatics
CMU - 1, rue Michel Servet Tel. (+41 22) 379 58 28
CH - 1211 Geneva 4 Switzerland Fax (+41 22) 379 58 58
Christine.Hoogland at isb-sib.ch http://www.expasy.org/
--------------------------------------------------------
// VERSION : 1.6
// DATE : 1/25/95
// Copyright 1993 by Swiss Institute of Bioinformatics. All rights reserved.
//
// Table of pk values :
// Note: the current algorithm does not use the last two columns. Each
// row corresponds to an amino acid starting with Ala. J, O and U are
// nonexistent, but are here only in order to have the complete alphabet.
//
// Ct Nt Sm Sc Sn
//
static double cPk[26][5] = {
3.55, 7.59, 0. , 0. , 0. , // A
3.55, 7.50, 0. , 0. , 0. , // B
3.55, 7.50, 9.00 , 9.00 , 9.00 , // C
4.55, 7.50, 4.05 , 4.05 , 4.05 , // D
4.75, 7.70, 4.45 , 4.45 , 4.45 , // E
3.55, 7.50, 0. , 0. , 0. , // F
3.55, 7.50, 0. , 0. , 0. , // G
3.55, 7.50, 5.98 , 5.98 , 5.98 , // H
3.55, 7.50, 0. , 0. , 0. , // I
0.00, 0.00, 0. , 0. , 0. , // J
3.55, 7.50, 10.00, 10.00, 10.00 , // K
3.55, 7.50, 0. , 0. , 0. , // L
3.55, 7.00, 0. , 0. , 0. , // M
3.55, 7.50, 0. , 0. , 0. , // N
0.00, 0.00, 0. , 0. , 0. , // O
3.55, 8.36, 0. , 0. , 0. , // P
3.55, 7.50, 0. , 0. , 0. , // Q
3.55, 7.50, 12.0 , 12.0 , 12.0 , // R
3.55, 6.93, 0. , 0. , 0. , // S
3.55, 6.82, 0. , 0. , 0. , // T
0.00, 0.00, 0. , 0. , 0. , // U
3.55, 7.44, 0. , 0. , 0. , // V
3.55, 7.50, 0. , 0. , 0. , // W
3.55, 7.50, 0. , 0. , 0. , // X
3.55, 7.50, 10.00, 10.00, 10.00 , // Y
3.55, 7.50, 0. , 0. , 0. }; // Z
#define PH_MIN 0 /* minimum pH value */
#define PH_MAX 14 /* maximum pH value */
#define MAXLOOP 2000 /* maximum number of iterations */
#define EPSI 0.0001 /* desired precision */
//
// Compute the amino-acid composition.
//
for (i = 0; i < sequenceLength; i++)
    comp[sequence[i] - 'A']++;
//
// Look up N-terminal and C-terminal residue.
//
nTermResidue = sequence[0] - 'A';
cTermResidue = sequence[sequenceLength - 1] - 'A';
phMin = PH_MIN;
phMax = PH_MAX;
for (i = 0, charge = 1.0; i < MAXLOOP && (phMax - phMin) > EPSI; i++)
{
    phMid = phMin + (phMax - phMin) / 2;
    cter = exp10(-cPk[cTermResidue][0]) /
           (exp10(-cPk[cTermResidue][0]) + exp10(-phMid));
    nter = exp10(-phMid) /
           (exp10(-cPk[nTermResidue][1]) + exp10(-phMid));
    carg = comp[R] * exp10(-phMid) /
           (exp10(-cPk[R][2]) + exp10(-phMid));
    chis = comp[H] * exp10(-phMid) /
           (exp10(-cPk[H][2]) + exp10(-phMid));
    clys = comp[K] * exp10(-phMid) /
           (exp10(-cPk[K][2]) + exp10(-phMid));
    casp = comp[D] * exp10(-cPk[D][2]) /
           (exp10(-cPk[D][2]) + exp10(-phMid));
    cglu = comp[E] * exp10(-cPk[E][2]) /
           (exp10(-cPk[E][2]) + exp10(-phMid));
    ccys = comp[C] * exp10(-cPk[C][2]) /
           (exp10(-cPk[C][2]) + exp10(-phMid));
    ctyr = comp[Y] * exp10(-cPk[Y][2]) /
           (exp10(-cPk[Y][2]) + exp10(-phMid));
    charge = carg + clys + chis + nter -
             (casp + cglu + ctyr + ccys + cter);
    if (charge > 0.0)
        phMin = phMid;
    else
        phMax = phMid;
}
}