[Bioperl-l] Protein Isoelectric Point calculation

Aaron J Mackey Aaron J. Mackey" <amackey@virginia.edu
Thu, 12 Sep 2002 08:47:23 -0400 (EDT)


On Wed, 11 Sep 2002, Southern, Mark R wrote:

> Firstly; my feeling is that a C implementation is definitely the way
> to go.

Right; to calculate iep's on a database of approximately 25,000 proteins
(pentium III 1 GHz, mucho RAM) :

pICalcutor.pm, 2 places        :  59 seconds
pICalcutor.pm, 4 places        :  67 seconds

pIcalc.c (#define EPSI = 1e-2) :  27 seconds
pIcalc.c (#define EPSI = 1e-4) :  27 seconds

[no calc, just the SeqIO loop] :  24 seconds

[ btw, since we're so into timing these days, I'll note here that a simple
FASTA parser not using SeqIO runs in less than a second; perhaps our
parsers' bottlenecks lie not in the parsers themselves, but in all the
many layers of object creation, factories and inheritance trees? If that's
so, then it's a waste of time IMO to optimize the parsing code itself.]

So it's about 10-15 times faster; the usual prediction for C vs. Perl
code.

> Secondly; what functionality should we provide?

I like the methods (and function) you propose; one addition I'd have is
that the N- and C-terminal pKa's can differ, depending on the residue (see
the expasy pI calc code); so we should figure out some representation of
the data that can allow this as an option ...

> Thirdly; caveat programmer. I dug the expasy code out of the mailing list
> archive and have had a terrible time getting it to run on irix or tru64. The
> exp10 function could not be found in the standard math library so i
> converted from exp10(x) to pow(10.0,x)

exp(x*log(10)) would also have done the trick (although that may be
exactly how pow() works); and defining a const for log(10) would make
that even better; I'll probably make that change to pIcalc.c so that it's
better portable.

> I then had a problem that i traced to the following lines in the
> pIcalc routine;

Yikes!  Confirmed (although I can't understand why Insure doesn't pick it
up ...)

-Aaron

P.S. Tit-for-tat ;), you need something like this patch to make
pICalculator.pm run without spurious warnings:

--- pICalculator.pm~    Thu Sep 12 07:02:22 2002
+++ pICalculator.pm     Thu Sep 12 08:18:19 2002
@@ -86,6 +86,7 @@
     my $sequence = $seq->seq;
     my $count;
     for ( qw( K R H D E C Y ) ){ # charged AA's
+        $count->{$_} = 0;
         $count->{$_}++ while $sequence =~ /$_/ig;
     }
     return $count;


-- 
 Aaron J Mackey
 Pearson Laboratory
 University of Virginia
 (434) 924-2821
 amackey@virginia.edu