[BioPython] Questions & suggestions

Michiel Jan Laurens de Hoon mdehoon at ims.u-tokyo.ac.jp
Mon Mar 22 22:51:19 EST 2004


Thomas:
> The xKMeans, KNN and KMeans clustering modules also seem to be obsolete in
> view of Michiel de Hoon's clustering module.
> 
Michiel:
> The xKMeans and KMeans can be considered obsolete, as they are included in
> Bio.Cluster. The KNN and other modules under Bio/Tools/Classification are
> currently not obsolete, as they contain supervised learning methods, which
> are not included in Bio.Cluster.
Jeffrey Chang wrote:
> kMeans is superseded by Bio.Cluster, and can be deprecated.  Thomas wrote
> xkMeans, which is a visualizer for kMeans, and could be rewritten to use
> Bio.Cluster instead.
> 
Jeffrey Chang wrote:
> kNN, LogisticRegression, MaxEntropy, and NaiveBayes are still useful, but
> need more documentation.  Also, another idea is that they could be donated to
> the pyml project.  Currently, no code in Biopython depends on them.  However,
> they might be useful for a microarray package, in which case donating them
> would introduce another dependency.
Brad:
> Okay. I guess this would involve a couple of steps:
> 1. Starting to raise a Deprecation Warning for the kMeans module.
> 2. Trying to write some kind of short document on how to switch from using
>    kMeans to using Bio.Cluster.kcluster. BioPerl has a document called
>    DEPRECATED with this kind of info -- that seems like a reasonable step to
>    follow. Jeff and Michiel, would it be possible to write something up quick?
> 3. Thomas needs to decide if he wants to rewrite xkMeans or deprecate it as
>    well.

Michiel again:
1. OK.
2. OK I'll work on that.
3. If I understand correctly, the xkMeans module provides a visualization of the 
progress of the k-means clustering algorithm by showing the cluster sizes. If 
so, it would not be clear how to switch that to using the kcluster in 
Bio.Cluster. One of the key points in Bio.Cluster's kcluster is that it 
automatically repeats the k-means algorithm starting from different initial 
(random) clusterings. For the kMeans module, I assume it performs one run of the 
k-means algorithm, for which the visualization in xkMeans makes sense. For 
repeated k-means runs, such a visualization may not be as useful.
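
To illustrate the difference between a single k-means run and repeated runs, 
here is a minimal pure-Python sketch. It is not the Bio.Cluster 
implementation; the function names (sqdist, centroids_of, kmeans_once, 
kmeans_repeated) and the returned count nbest are hypothetical and used for 
illustration only, and npass is simply the number of restarts. Each pass 
starts from a different random initial clustering, and only the clustering 
with the lowest within-cluster sum of squared distances is kept.

```python
import random


def sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroids_of(points, labels, k, rng):
    """Mean of each cluster; an empty cluster is reseeded with a random point."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, labels):
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += p[d]
    return [
        [s / counts[c] for s in sums[c]] if counts[c] else list(rng.choice(points))
        for c in range(k)
    ]


def kmeans_once(points, k, rng, max_iter=100):
    """A single k-means run from a random initial clustering."""
    # Random initial clustering in which every cluster is non-empty (if n >= k).
    labels = [i % k for i in range(len(points))]
    rng.shuffle(labels)
    for _ in range(max_iter):
        centroids = centroids_of(points, labels, k, rng)
        new_labels = [
            min(range(k), key=lambda c: sqdist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # converged: no point changed cluster
            break
        labels = new_labels
    # Recompute centroids for the final labels before measuring the error.
    centroids = centroids_of(points, labels, k, rng)
    error = sum(sqdist(p, centroids[c]) for p, c in zip(points, labels))
    return labels, error


def kmeans_repeated(points, k, npass, seed=0):
    """Repeat k-means npass times and keep the lowest-error clustering.

    Returns (labels, error, nbest), where nbest counts the passes that
    reached the best error found (an illustrative measure, compared on the
    error value rather than on the clustering itself).
    """
    rng = random.Random(seed)
    best_labels, best_error, nbest = None, float("inf"), 0
    for _ in range(npass):
        labels, error = kmeans_once(points, k, rng)
        if error < best_error - 1e-9:
            best_labels, best_error, nbest = labels, error, 1
        elif abs(error - best_error) <= 1e-9:
            nbest += 1
    return best_labels, best_error, nbest


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, error, nbest = kmeans_repeated(points, k=2, npass=5)
print(labels, error, nbest)
```

In this sketch only the final, best clustering is returned, so there is no 
single sequence of cluster sizes to display; a progress visualization would 
have to pick one pass or show all of them.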

--Michiel.


-- 
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon


