[Biopython-dev] Biopython 1.00a2 release

Andrew Dalke dalke at acm.org
Thu Jun 28 03:42:21 EDT 2001

Thomas Sicheritz-Ponten <thomas at cbs.dtu.dk>:
># this is slow
>def EucDist2(v1, v2):
>    return sqrt(sum((v1-v2)**2))
># this is faster
>def EucDist1(v1, v2):
>    sum = 0
>    for i in range(0,len(v1)):
>        sum += (v1[i] -v2[i])**2
>    return sqrt(sum)

The first (EucDist2) does more work than the second.  The v1-v2
goes through a "__sub__" method call, which then does the same as
v1[i] - v2[i], except with the method call overhead.  Ditto with **,
which goes through "__pow__".  It also makes intermediate objects
for every call.  (C++ used to have that problem.  We worked on a
system with a lot of overloaded 3-vectors.  Got a huge performance
boost turning the calls into 3-arg form.  OTOH, the overloaded
vector form was much easier to write and debug.  Nowadays C++
people use expression templates.)
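
To make the method-call and temporary-object cost concrete, here is a
minimal sketch.  The Vec class, its "created" counter and the two
dist_* functions are hypothetical names made up for illustration (they
are not the array type from the quoted code), and the sketch uses
present-day Python 3 syntax.  It only counts how many temporary vectors
each style allocates; it makes no timing claim.

```python
from math import sqrt


class Vec:
    """Toy vector type (hypothetical, for illustration only)."""

    created = 0  # counts every Vec instance built so far

    def __init__(self, values):
        self.values = list(values)
        Vec.created += 1

    def __sub__(self, other):
        # v1 - v2 dispatches here and allocates a new Vec
        return Vec(a - b for a, b in zip(self.values, other.values))

    def __pow__(self, exponent):
        # (v1 - v2) ** 2 dispatches here and allocates another Vec
        return Vec(a ** exponent for a in self.values)


def dist_overloaded(v1, v2):
    # whole-vector form: two method calls, two temporary vectors
    return sqrt(sum(((v1 - v2) ** 2).values))


def dist_loop(v1, v2):
    # element-wise form: plain number arithmetic, no temporary vectors
    total = 0
    for i in range(len(v1.values)):
        total += (v1.values[i] - v2.values[i]) ** 2
    return sqrt(total)


v1 = Vec([0, 3, 0])
v2 = Vec([4, 0, 0])

before = Vec.created
d_over = dist_overloaded(v1, v2)
temporaries_overloaded = Vec.created - before

before = Vec.created
d_loop = dist_loop(v1, v2)
temporaries_loop = Vec.created - before
```

Both functions return the same distance; only the overloaded form
builds extra Vec objects along the way (one for the subtraction, one
for the power).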

The only thing I can suggest you change is to get rid of the "0, "
in the range call.

Out of curiosity, I tried

   for a1, a2 in zip(v1, v2):
     sum += (a1-a2) ** 2

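The 'zip' version was about 3 times slower.  Here's my test:

import time

# EucDist1 is the indexed loop quoted above; EucDist3 is the 'zip'
# loop wrapped in a function.
def main():
   for n in range(1, 6):
      v1 = range(0, 10**n)
      v2 = range(n, 10**n+n)
      t1 = time.time()
      d1 = EucDist1(v1, v2)
      t2 = time.time()
      d2 = EucDist3(v1, v2)
      t3 = time.time()

      assert d1 == d2, (d1, d2)
      # columns: n, seconds for EucDist1, seconds for EucDist3
      print n, t2-t1, t3-t2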
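
For anyone who wants to rerun the comparison, here is a self-contained
sketch of the same test.  It assumes present-day Python 3 syntax and
uses the standard timeit module in place of time.time(); the names
euc_dist_index, euc_dist_zip and compare are made up for this sketch.
Note that zip is lazy in Python 3, whereas older versions built a full
list of tuples first, so the timings need not match the numbers
reported above.

```python
import timeit
from math import sqrt


def euc_dist_index(v1, v2):
    # indexed loop, as in EucDist1 (without the "0, " in range)
    total = 0
    for i in range(len(v1)):
        total += (v1[i] - v2[i]) ** 2
    return sqrt(total)


def euc_dist_zip(v1, v2):
    # pairwise loop over zip(v1, v2)
    total = 0
    for a1, a2 in zip(v1, v2):
        total += (a1 - a2) ** 2
    return sqrt(total)


def compare(n, repeat=3):
    """Return (indexed seconds, zip seconds) for vectors of length 10**n."""
    v1 = list(range(0, 10 ** n))
    v2 = list(range(n, 10 ** n + n))
    # both versions must agree before their timings mean anything
    assert euc_dist_index(v1, v2) == euc_dist_zip(v1, v2)
    # best of `repeat` single runs, to reduce noise from other processes
    t_index = min(timeit.repeat(lambda: euc_dist_index(v1, v2),
                                number=1, repeat=repeat))
    t_zip = min(timeit.repeat(lambda: euc_dist_zip(v1, v2),
                              number=1, repeat=repeat))
    return t_index, t_zip


if __name__ == "__main__":
    for n in range(1, 5):
        t_index, t_zip = compare(n)
        print(n, t_index, t_zip)
```

Taking the minimum of a few runs is a design choice to damp scheduler
noise; a single time.time() difference, as in the original test, is
more sensitive to whatever else the machine is doing.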

