[BioPython] Clustering gene expression data

Michiel Jan Laurens de Hoon mdehoon@ims.u-tokyo.ac.jp
Tue, 23 Jul 2002 20:14:29 +0900


Dear bio-pythoneers,

We have developed a library of C routines that implement various 
clustering methods commonly used to analyze gene expression data. These 
include hierarchical (pairwise single-, maximum-, centroid-, and 
average-linkage) clustering, k-means clustering, and self-organizing 
maps on a 2D rectangular topology. The main routines are available as a 
Python extension module, which we generated using Pyfort. Since the 
numerically intensive part of the calculation is done in C, the speed of 
a compiled language is combined with the flexibility of Python, creating 
a better world for everybody.
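For readers new to these methods, here is a rough idea of what the k-means routine computes. This is NOT the Cluster module's API, whose calling conventions are described in the manual rather than in this message. It is a minimal, hypothetical pure-Python sketch of plain k-means (Lloyd's iterations, Euclidean distance, initial centroids drawn at random from the data), and it is not necessarily how the C library initializes or repeats its runs. The names `kmeans` and `squared_euclidean` and the toy `profiles` data are made up for illustration.

```python
import random
from typing import List, Sequence, Tuple


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(data: Sequence[Sequence[float]], k: int,
           n_iter: int = 100, seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Hypothetical sketch: cluster the rows of `data` (genes x conditions)
    into k groups. Returns (cluster label per row, centroids)."""
    rng = random.Random(seed)
    # Initialise the centroids with k distinct rows chosen at random.
    centroids = [list(row) for row in rng.sample(list(data), k)]
    assignments = [-1] * len(data)
    for _ in range(n_iter):
        # Assignment step: each gene goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_euclidean(row, centroids[c]))
            for row in data
        ]
        if new_assignments == assignments:
            break  # converged: no gene changed cluster
        assignments = new_assignments
        # Update step: each centroid becomes the mean of its member rows.
        for c in range(k):
            members = [row for row, a in zip(data, assignments) if a == c]
            if members:  # keep the old centroid if a cluster became empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Usage: four toy profiles forming two obvious groups.
profiles = [
    [0.1, 0.2, 0.1],
    [0.0, 0.1, 0.2],
    [5.0, 5.1, 4.9],
    [5.2, 4.8, 5.0],
]
labels, centres = kmeans(profiles, k=2)
print(labels)
```

Here rows 0 and 1 end up sharing one label and rows 2 and 3 the other; which group is called 0 depends on the random initialization. Doing exactly these distance and averaging loops in compiled C instead of interpreted Python is where the speed advantage of the extension module comes from.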

The module is available at 
http://bonsai.ims.u-tokyo.ac.jp/%7Emdehoon/software/software.html , 
where you can also find a manual. To install, download 
Cluster-1.01.tar.gz from this site, unpack it, and run "python setup.py 
install" as usual. For users of Python for Windows, there is a Windows 
installer, Cluster-1.01.win32-py2.2.exe, which does the installation 
for you and saves you the compilation step.

This software was released under the GNU Lesser General Public License. 
Bug reports and suggestions for improvement (better yet, patches) are 
most welcome.

Note also that it would have been difficult to create a similar package 
for Perl, which lacks an equivalent of Numerical Python. The numerical 
analysis of gene expression data is therefore an area where we as 
bio-pythoneers can provide functionality that others cannot offer.

I will give a short overview of this software at the BOSC meeting in 
Edmonton next week.


Michiel de Hoon
University of Tokyo, Human Genome Center