[BioPython] Spatial clustering

Michiel Jan Laurens de Hoon mdehoon at ims.u-tokyo.ac.jp
Sat Oct 11 04:28:32 EDT 2003


An easy solution would be to cluster the molecules using the 
treecluster routine in PyCluster (available as Bio.Cluster in 
Biopython). This routine implements pairwise single-, maximum-, 
average-, and centroid-linkage hierarchical clustering. You will need 
to calculate the distances between the molecules yourself and pass 
them as the "distancematrix" argument to the treecluster routine. This 
lets you define whatever distance measure is suitable for your problem. 
Unfortunately I am not familiar with density-based algorithms, but if 
you use a sensible definition for the distance between molecules, then 
hierarchical clustering should give you a sensible clustering result.
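
A minimal sketch of this approach, under some assumptions that are not 
part of the original question: every molecule has the same atoms in the 
same order, the coordinates share one reference frame (so no 
superposition is done before computing the RMSD), and a recent Biopython 
with NumPy is installed. The helper names rmsd, rmsd_matrix and 
cluster_molecules are made up for illustration; only treecluster and 
the cut method of the tree it returns come from Bio.Cluster.

```python
import math


def rmsd(mol_a, mol_b):
    """RMSD between two molecules given as lists of (x, y, z) tuples.

    Assumes identical atom ordering and no superposition.
    """
    if len(mol_a) != len(mol_b):
        raise ValueError("molecules must have the same number of atoms")
    total = 0.0
    for atom_a, atom_b in zip(mol_a, mol_b):
        total += sum((p - q) ** 2 for p, q in zip(atom_a, atom_b))
    return math.sqrt(total / len(mol_a))


def rmsd_matrix(molecules):
    """Full symmetric matrix of pairwise RMSD values (zeros on the diagonal)."""
    n = len(molecules)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            dist[i][j] = dist[j][i] = rmsd(molecules[i], molecules[j])
    return dist


def cluster_molecules(molecules, nclusters, method="a"):
    """Hierarchical clustering on a user-defined distance matrix.

    method: 's' single, 'm' maximum, 'a' average linkage. Centroid
    linkage works from the raw data, so as far as I know it cannot be
    combined with a precomputed distance matrix.
    """
    # Imported here so the distance helpers work without Biopython.
    import numpy as np
    from Bio.Cluster import treecluster

    # treecluster may reorder the matrix it is given, so hand it a fresh array.
    dist = np.array(rmsd_matrix(molecules), dtype=float)
    tree = treecluster(data=None, method=method, distancematrix=dist)
    # Cut the tree into the requested number of clusters;
    # the result holds one cluster label per molecule.
    return tree.cut(nclusters)
```

Calling cluster_molecules(molecules, nclusters=5) would then return one 
cluster label per molecule. Swapping rmsd for another function is all 
it takes to try a different distance measure.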

You can also use the k-means routine in PyCluster, but that one won't 
let you specify the distance matrix yourself. This means that you can 
only use the distance measures built in to the k-means routine (e.g. the 
Euclidean distance), which may or may not be suitable for your problem. 
In my experience, choosing the right distance measure is often more 
important than the clustering algorithm, so I would go with hierarchical 
clustering if the distance measures in k-means clustering are not 
suitable for your task.
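
For comparison, a rough sketch of the k-means route. Because no distance 
matrix can be supplied, each molecule has to be reduced to a row of 
numbers first. Describing a molecule by the centroid of its atoms is 
purely an assumption made here for illustration, and centroid and 
kmeans_by_centroid are hypothetical helper names; kcluster is the 
k-means function in Bio.Cluster, and dist="e" selects its built-in 
Euclidean distance.

```python
def centroid(molecule):
    """Mean x, y, z position of a molecule given as (x, y, z) tuples."""
    n = len(molecule)
    return [sum(atom[k] for atom in molecule) / n for k in range(3)]


def kmeans_by_centroid(molecules, nclusters, npass=10):
    """k-means on molecule centroids using a built-in distance measure."""
    # Imported here so centroid() works without Biopython.
    import numpy as np
    from Bio.Cluster import kcluster

    # One row per molecule, one column per coordinate.
    data = np.array([centroid(m) for m in molecules], dtype=float)
    # npass repeats the algorithm from different random starts and
    # keeps the best solution found.
    clusterid, error, nfound = kcluster(
        data, nclusters=nclusters, npass=npass, dist="e"
    )
    return clusterid
```

The centroid ignores orientation and conformation, which shows the 
limitation: the distance k-means sees is only as good as the feature 
rows it is given.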

--Michiel.

Shu-Hsien Sheu wrote:
> Hi,
> 
> I am now working on a project to map protein binding sites, which will 
> generate thousands of small organic molecules in Cartesian coordinates. 
> The next step would be to cluster these small molecules. Are there any 
> modules available for this kind of task? PyCluster seems to work with 2D 
> gene expression data only, though with some modifications I could use it 
> as well. I am thinking of using an RMSD matrix and then a density-based 
> algorithm. The following paper gave me some general ideas about the 
> approaches I can take:
> 
> http://prlab.ee.memphis.edu/frigui/CLUSTER_PAPERS/Ericasurvey.pdf
> 
> Any comments here?
> 
> thanks!
> 
> -shuhsien

-- 
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon


