[BioPython] Clustering gene expression data
Michiel Jan Laurens de Hoon
mdehoon@ims.u-tokyo.ac.jp
Tue, 23 Jul 2002 20:14:29 +0900
Dear bio-pythoneers,
We have developed a library of C routines that implement various
clustering methods commonly used to analyze gene expression data. These
include hierarchical (pairwise single-, maximum-, centroid-, and
average-linkage) clustering, k-means clustering, and self-organizing
maps on a 2D rectangular topology. The main routines are available as a
Python extension module, which we generated using Pyfort. Since the
numerically intensive part of the calculation is done in C, the speed of
a compiled language is combined with the flexibility of Python, creating
a better world for everybody.
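
For readers unfamiliar with k-means, here is a minimal sketch of the idea in plain Python with NumPy. It is not the module's interface: the function name kmeans and its parameters are made up for illustration, it uses Euclidean distance only, and the manual remains the reference for the actual routines.

```python
import numpy as np

def kmeans(data, k, n_iter=100, seed=0):
    """Cluster the rows of `data` into k groups with plain k-means.

    Hypothetical helper for illustration only; not part of the Cluster module.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Start from k distinct rows chosen at random as the initial centroids.
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    labels = None
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every centroid.
        dist = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing, so we are done
        labels = new_labels
        # Move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Toy "expression matrix": six genes (rows) measured in two conditions (columns).
expression = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]]
labels, centroids = kmeans(expression, k=2)
```

Rows that end up with the same label belong to the same cluster. The inner loop over all rows and centroids is the numerically intensive part that benefits from being done in C.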
The module is available at
http://bonsai.ims.u-tokyo.ac.jp/%7Emdehoon/software/software.html ,
where you can also find a manual. To install, download
Cluster-1.01.tar.gz from this site, unpack it, and run "python setup.py
install" as usual. For users of Python on Windows, there is a Windows
installer, Cluster-1.01.win32-py2.2.exe, which does the installation
for you and saves you the compilation step.
This software was released under the GNU Lesser General Public License.
Bug reports and suggestions for improvement (better yet, patches) are
most welcome.
Note also that it would have been difficult to create a similar package
for Perl, as it lacks the equivalent of Numerical Python. The numerical
analysis of gene expression data is therefore an area where we as
bio-pythoneers can provide functionality that others cannot offer.
I will give a short overview of this software at the BOSC meeting next
week in Edmonton.
Michiel de Hoon
University of Tokyo, Human Genome Center