[Biopython-dev] Biopython 1.00a2 release
Andrew Dalke
dalke at acm.org
Thu Jun 28 03:42:21 EDT 2001
Thomas Sicheritz-Ponten <thomas at cbs.dtu.dk>:
># this is slow
>def EucDist2(v1, v2):
>    return sqrt(sum((v1-v2)**2))
>
># this is faster
>def EucDist1(v1, v2):
>    sum = 0
>    for i in range(0,len(v1)):
>        sum += (v1[i] -v2[i])**2
>    return sqrt(sum)
The first does more work than the second. It has to look up and
call a "__sub__" method for v1-v2, which then does the same thing
as v1[i] - v2[i], except with the method call overhead. Ditto
for **, which goes through "__pow__". It also makes intermediate
objects for every call. (C++ used to have that problem. We worked on
a system with a lot of overloaded 3-vectors and got a huge performance
boost by turning the calls into 3-arg form. OTOH, the overloaded
vector form was much easier to write and debug. Nowadays C++
people use expression templates.)
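
To make the method-call and temporary-object cost concrete, here is a minimal sketch. The Vec class and both function names are hypothetical and exist only for illustration. It does not model how any particular array library implements these operators, and it is written for a current Python 3 interpreter.

```python
from math import sqrt


class Vec:
    """Hypothetical list-backed vector, for illustration only."""

    created = 0  # counts every Vec built, including temporaries

    def __init__(self, items):
        Vec.created += 1
        self.items = list(items)

    def __sub__(self, other):
        # One method call, and one new Vec holding every difference.
        return Vec(a - b for a, b in zip(self.items, other.items))

    def __pow__(self, exponent):
        # A second method call, and a second temporary Vec.
        return Vec(a ** exponent for a in self.items)


def dist_overloaded(v1, v2):
    # Goes through __sub__ and __pow__ before summing.
    return sqrt(sum(((v1 - v2) ** 2).items))


def dist_loop(v1, v2):
    # Same arithmetic on the elements, with no intermediate Vec objects.
    total = 0
    for i in range(len(v1.items)):
        total += (v1.items[i] - v2.items[i]) ** 2
    return sqrt(total)


v1 = Vec([0, 3, 0])
v2 = Vec([4, 0, 0])

before = Vec.created
d_overloaded = dist_overloaded(v1, v2)
temporaries = Vec.created - before  # Vec objects built by the overloaded form

d_loop = dist_loop(v1, v2)
```

Both functions return the same distance. The overloaded form builds one temporary Vec for the subtraction and another for the power, while the explicit loop builds none.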
The only thing I can suggest you change is to get rid of the "0, "
in the range call.
Out of curiosity, I tried
    for a1, a2 in zip(v1, v2):
        sum += (a1-a2) ** 2
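
Written out as a complete function, that loop might look like the sketch below. The name EucDist3 is the one the test harness calls. The body is an assumption built from the two lines above, and the accumulator is renamed to total so it does not shadow the built-in sum.

```python
from math import sqrt


def EucDist3(v1, v2):
    # Pair the elements up with zip instead of indexing both sequences.
    total = 0
    for a1, a2 in zip(v1, v2):
        total += (a1 - a2) ** 2
    return sqrt(total)
```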
The 'zip' version was about 3 times slower. Here's my test
harness.
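import time

# EucDist1 is the index loop quoted above; EucDist3 is the same loop using zip.
def main():
    for n in range(1, 6):
        v1 = range(0, 10**n)
        v2 = range(n, 10**n+n)
        t1 = time.time()
        d1 = EucDist1(v1, v2)
        t2 = time.time()
        d2 = EucDist3(v1, v2)
        t3 = time.time()
        # Both versions must agree before the timings mean anything.
        assert d1 == d2, (d1, d2)
        print n, t2-t1, t3-t2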
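
For anyone repeating the comparison on a current Python 3 interpreter, a rough equivalent using the standard timeit module could look like this. The names euc_dist_index, euc_dist_zip and compare are hypothetical, as are the sizes and repeat count. Timings depend on the interpreter and the machine, so the ratio reported above should not be expected to reproduce.

```python
import timeit
from math import sqrt


def euc_dist_index(v1, v2):
    # Index-based loop, as in EucDist1.
    total = 0
    for i in range(len(v1)):
        total += (v1[i] - v2[i]) ** 2
    return sqrt(total)


def euc_dist_zip(v1, v2):
    # zip-based loop, as in the zip variant.
    total = 0
    for a1, a2 in zip(v1, v2):
        total += (a1 - a2) ** 2
    return sqrt(total)


def compare(sizes=(10, 100, 1000), repeats=20):
    """Return (size, index_seconds, zip_seconds) for each input size."""
    rows = []
    for size in sizes:
        v1 = list(range(size))
        v2 = list(range(3, size + 3))
        # Check that both versions agree before timing them.
        assert euc_dist_index(v1, v2) == euc_dist_zip(v1, v2)
        t_index = timeit.timeit(lambda: euc_dist_index(v1, v2), number=repeats)
        t_zip = timeit.timeit(lambda: euc_dist_zip(v1, v2), number=repeats)
        rows.append((size, t_index, t_zip))
    return rows


if __name__ == "__main__":
    for size, t_index, t_zip in compare():
        print(size, t_index, t_zip)
```

timeit runs each call several times per measurement, which smooths out some of the noise that a single pair of time.time() readings picks up on small inputs.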
Andrew