[BioPython] Bio.distance

Peter biopython at maubp.freeserve.co.uk
Wed Oct 1 16:03:22 UTC 2008


On Wed, Oct 1, 2008 at 4:49 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>
> Hi,
> Under the 'standard' install I do not think that there is any advantage of
> using Bio.cdistance within Bio.kNN. I tested this on a bioinformatics data
> set with almost 1500 data points, 8 explanatory variables and k=9. ...
> Actual maximum times across three runs were under 16.6 seconds with
> it [Bio.cdistance] and under 17.4 seconds without it [Bio.distance using
> Numeric]

It's interesting that the C version is only slightly faster than
Numeric - of course, as you point out, there are lots of possible
complications here, like LAPACK and ATLAS (plus compiler options and
CPU features).

I think your numbers are good support for Michiel's proposition that
we should deprecate Bio.cdistance and Bio.distance and just use numpy
in Bio.kNN - this will simplify our code base and make very little
difference to the speed.
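
For illustration only, here is a rough sketch of what a numpy-only
distance calculation for a nearest-neighbour search could look like.
This is not the Bio.kNN code; the function names euclidean_distance
and k_nearest are made up for this example, and it assumes plain
Euclidean distance on numeric vectors.

```python
import numpy as np


def euclidean_distance(x, y):
    """Euclidean distance between two equal-length vectors (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    diff = x - y
    return np.sqrt(np.dot(diff, diff))


def k_nearest(points, query, k):
    """Indices of the k rows of points closest to query (hypothetical helper)."""
    points = np.asarray(points, dtype=float)
    query = np.asarray(query, dtype=float)
    # Broadcasting subtracts the query from every row in one step.
    diffs = points - query
    dists = np.sqrt((diffs ** 2).sum(axis=1))
    # argsort orders the rows from nearest to farthest.
    return np.argsort(dists)[:k]
```

With something along these lines the distance work stays inside numpy,
so no separate C extension needs to be compiled or maintained.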

Peter


