[BioPython] Bio.distance
Bruce Southey
bsouthey at gmail.com
Wed Oct 1 11:49:53 EDT 2008
Michiel de Hoon wrote:
> Hi everybody,
>
> Since the 1.48 release, Biopython has been making good progress in the migration from Numerical Python to NumPy. As part of this process, we are now reviewing and consolidating the code in Biopython that makes use of Numerical Python / NumPy. Specifically, we are thinking of merging the code in Bio.distance into Bio.kNN, and deprecating Bio.distance and Bio.cdistance. Since Bio.kNN is the only module in Biopython that makes use of Bio.distance, we think that this won't affect anybody. However, if you are using Bio.distance outside of Bio.kNN, please let us know so we can find an alternative solution.
>
> --Michiel.
>
> _______________________________________________
> BioPython mailing list - BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
Hi,
Under the 'standard' install I do not think there is any advantage to
using Bio.cdistance within Bio.kNN. I tested this on a bioinformatics
data set with almost 1500 data points, 8 explanatory variables and k=9.
On my system the difference between using Bio.cdistance and commenting
it out was only about one second (after removing the build directory and
reinstalling everything): the maximum times across three runs were under
16.6 seconds with it and under 17.4 seconds without it.
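The measurement described above (three runs, keeping the slowest) could be scripted with the standard-library timeit module. This is only a minimal sketch: `run_knn` is a hypothetical placeholder for the workload being timed, not a Biopython function, and the toy body inside it should be replaced with the real classification loop.

```python
import timeit

def run_knn():
    # Hypothetical workload standing in for the kNN run being timed;
    # replace this toy computation with the real classification loop.
    return sum(i * i for i in range(100000))

# Execute the workload once per run, three runs in total, and report the
# slowest run, mirroring "maximum times across three runs".
times = timeit.repeat(run_knn, repeat=3, number=1)
print("max of %d runs: %.3f s" % (len(times), max(times)))
```

Taking the maximum rather than the minimum is a conservative choice on a machine with other CPU-intensive processes, since it reports the worst case observed.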
My system runs Linux x86_64 (Fedora 10), but it is not a 'clean' system
because other CPU-intensive processes were running. I used Python 2.5
and Numeric 2.4, as I had forgotten the order of the imports. In my
version the default distance without Bio.cdistance uses Numeric's dot
function (I did not try the pure-Python version), so I would expect it
to be noticeably faster when LAPACK or ATLAS is installed than when they
are not. (I used the Fedora-supplied Numeric, so while I think these
timings are without LAPACK and ATLAS, I am not completely sure of that.)
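To make the comparison concrete, here is a sketch of a Euclidean distance computed two ways: an explicit Python loop, and the square root of the dot product of the difference vector with itself. It assumes modern NumPy instead of Numeric, and the function names are hypothetical; this is not the actual Bio.distance or Bio.cdistance code. Whether `numpy.dot` uses an optimized BLAS such as ATLAS depends on how NumPy was built.

```python
import math
import numpy as np

def euclidean_py(x, y):
    # Pure-Python version: loop over coordinate pairs explicitly.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def euclidean_dot(x, y):
    # Dot-product version: sqrt((x - y) . (x - y)). np.dot may dispatch
    # to an optimized BLAS routine, depending on the NumPy build.
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return math.sqrt(np.dot(d, d))

# Two points from the logistic regression example data.
p, q = [-53, -200.78], [117, -267.14]
print(euclidean_py(p, q), euclidean_dot(p, q))
```

For two-dimensional points the loop overhead is negligible; a BLAS-backed dot product only pays off for longer vectors or many distance evaluations.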
I did not see an example for k-nearest neighbors, so below is some (very
bad) code that uses the data from the logistic regression example
(http://biopython.org/DIST/docs/cookbook/LogisticRegression.html).
Regards
Bruce
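from Bio import kNN

# Data from the logistic regression example in the Biopython cookbook
xs = [[-53, -200.78], [117, -267.14], [57, -163.47], [16, -190.30],
      [11, -220.94], [85, -193.94], [16, -182.71], [15, -180.41],
      [-26, -181.73], [58, -259.87], [126, -414.53], [191, -249.57],
      [113, -265.28], [145, -312.99], [154, -213.83], [147, -380.85],
      [93, -291.13]]
ys = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

# Train a k-nearest-neighbor model with k = 3
model = kNN.train(xs, ys, 3)

# Classify every training point and count how many are correct
ccr = 0   # number of correctly classified points
tobs = 0  # total number of observations
for px, py in zip(xs, ys):
    cp = kNN.classify(model, px)
    tobs += 1
    if cp == py:
        ccr += 1
print("%d %d" % (tobs, ccr))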
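Because the loop above classifies the same points the model was trained on, each point is at distance zero from itself and so always counts among its own nearest neighbors, which inflates the number of correct classifications. A leave-one-out check avoids this. The sketch below uses a small hand-written k-nearest-neighbor majority vote (Euclidean distance, equal weights) instead of Bio.kNN, so its counts need not match Bio.kNN exactly; `knn_predict` and `leave_one_out` are hypothetical names.

```python
import math
from collections import Counter

def knn_predict(train_xs, train_ys, x, k=3):
    # Sort training points by Euclidean distance to x, keep the k closest.
    dists = sorted(
        (math.sqrt(sum((a - b) ** 2 for a, b in zip(tx, x))), ty)
        for tx, ty in zip(train_xs, train_ys)
    )
    # Equal-weight majority vote among the k nearest labels
    # (ties, if any, are broken arbitrarily).
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def leave_one_out(xs, ys, k=3):
    # Hold out each point in turn and predict it from the remaining points.
    correct = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        if knn_predict(rest_x, rest_y, xs[i], k) == ys[i]:
            correct += 1
    return len(xs), correct

xs = [[-53, -200.78], [117, -267.14], [57, -163.47], [16, -190.30],
      [11, -220.94], [85, -193.94], [16, -182.71], [15, -180.41],
      [-26, -181.73], [58, -259.87], [126, -414.53], [191, -249.57],
      [113, -265.28], [145, -312.99], [154, -213.83], [147, -380.85],
      [93, -291.13]]
ys = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

total, correct = leave_one_out(xs, ys, 3)
print("%d %d" % (total, correct))
```

With only 17 points, leave-one-out is cheap: it fits the "model" 17 times, and for k-nearest neighbors that just means recomputing distances against 16 points each time.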