[Bioperl-l] Problem with clustering analysis of a large dataset (18000 rows and 17 columns)

Andreas Kahari ak at ebi.ac.uk
Sat May 22 03:59:19 EDT 2004


On Sat, May 22, 2004 at 06:32:52AM +0000, Gong Wuming wrote:
> >From: Andreas Kahari <ak at ebi.ac.uk>
> >
> >As Aaron said, it's probably just waiting for disk.
> >
> >I'm not sure this helps you, but you could buy more memory or
> >investigate other clustering approaches.
> >
> >MCL is fast:
> >
> >    http://www.micans.org/mcl/
> >
> >There is a package of it for Debian, and I've made a port of it
> >to OpenBSD.  Other unices should be able to compile it fairly
> >easily out of the box.  I've run larger sets than yours much
> >quicker with the same memory size.
>
> Hi.
> Thanks for your reply.
> I have done the same job under MS Windows 2000 (using Eisen's original
> Cluster software for Windows) on the same computer (Pentium 4 2.4 GHz with
> 512 MB memory) with the same method (average linkage clustering algorithm),
> and the job finished in about 30 minutes, while it failed under Red Hat
> 9.0. Is the difference caused by the different OS or by something else?
> I have another question about MCL. Could this method be used for gene
> expression datasets, not only for clustering proteins?

MCL is a generic graph clustering algorithm which, I believe,
does not care very much about what the nodes and edges it is
clustering represent.  As long as you can express your problem
as entities with similarity weights between them (MCL treats a
larger weight as a stronger link, so distances need converting
first), you can probably use MCL.
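For an expression dataset, one way to get such a graph is to treat each gene as a node and use the correlation between two genes' expression profiles as the edge weight. The sketch below is a minimal illustration under stated assumptions, not part of MCL itself: it assumes Pearson correlation as the similarity measure, keeps only pairs at or above a hypothetical `threshold` of 0.7, and writes "label label weight" lines in the label-pair (ABC) style that recent MCL releases read with the `--abc` option. The function names `pearson` and `expression_to_abc` are hypothetical, and older MCL releases may require their native matrix format instead, so check the documentation for your version.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Constant profile: correlation is undefined, so treat it as "no link".
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def expression_to_abc(profiles, threshold=0.7):
    """Turn {gene_id: [expression values]} into tab-separated edge lines.

    Only pairs whose correlation is at or above `threshold` become edges,
    because MCL reads edge weights as similarities (bigger = closer).
    `threshold` is an illustrative assumption, not an MCL parameter.
    """
    lines = []
    # Every unordered pair of genes, in a stable (sorted) order.
    for (g1, v1), (g2, v2) in combinations(sorted(profiles.items()), 2):
        r = pearson(v1, v2)
        if r >= threshold:
            lines.append(f"{g1}\t{g2}\t{r:.4f}")
    return lines


if __name__ == "__main__":
    demo = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 3, 2, 1]}
    with open("genes.abc", "w") as fh:
        fh.write("\n".join(expression_to_abc(demo)) + "\n")
```

The resulting file could then be clustered with something like `mcl genes.abc --abc -o genes.clusters`. Note that the all-pairs loop is quadratic: with 18000 genes it visits roughly 162 million pairs, which is slow in pure Python, so a vectorised correlation step would be the practical choice for the full dataset.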

This is no longer a BioPerl question, and I suggest that
further questions go to the mcl-devel mailing list (see
"http://micans.org/mcl/index.html#ml").


Regards,
Andreas

-- 
|()()| Andreas Kähäri      EMBL, European Bioinformatics Institute
| () |                     Wellcome Trust Genome Campus
|()()| Ensembl Developer   Hinxton, Cambridgeshire, CB10 1SD
| () | DAS Project Leader  United Kingdom

