[Bioperl-l] Problem with clustering analysis of a large dataset ( 18000 rows and 17 columns)

Andreas Kahari ak at ebi.ac.uk
Fri May 21 08:02:01 EDT 2004


As Aaron said, it's probably just waiting for disk.
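
A rough back-of-the-envelope check is consistent with that. The sketch
below assumes the clustering code keeps a full lower-triangular matrix of
pairwise row distances in memory, stored as 8-byte doubles; both points
are assumptions about the implementation, not something stated in this
thread. The function name and the RAM constant are made up for the
illustration.

```python
def distance_matrix_bytes(n_rows, bytes_per_entry=8):
    """Bytes needed for the lower triangle of an n x n distance matrix.

    Assumes one entry per unordered pair of rows, n * (n - 1) / 2 in total,
    and 8-byte doubles by default (an assumption, see the text above).
    """
    return n_rows * (n_rows - 1) // 2 * bytes_per_entry


RAM_BYTES = 512 * 2**20  # the 512 MB machine described in the question

for n in (10000, 18000):
    need = distance_matrix_bytes(n)
    print(f"{n} rows: {need / 2**20:.0f} MiB, fits in RAM: {need < RAM_BYTES}")

# 10000 rows: 381 MiB, fits in RAM: True
# 18000 rows: 1236 MiB, fits in RAM: False
```

Under those assumptions the 10000-row run fits in physical memory while
the 18000-row run needs more than twice what the machine has, which
would explain a process that sits at near-zero CPU while the system
pages to disk.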

I'm not sure this helps you, but you could buy more memory or
investigate other clustering approaches.

MCL is fast:

    http://www.micans.org/mcl/

There is a package of it for Debian, and I've made a port of it
to OpenBSD.  Other unices should be able to compile it fairly
easily out of the box.  I've run larger sets than yours much
quicker with the same memory size.
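
MCL clusters a graph rather than a data matrix, so the rows first have
to be turned into weighted edges. The following is only a minimal
sketch of one way to do that: it keeps an edge between two rows when
their Pearson correlation reaches a cut-off, and writes the edges as
"label label weight" lines. The choice of Pearson correlation, the 0.8
threshold, and all function names are assumptions made for the
illustration, not something taken from this thread.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant row has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def abc_edges(rows, threshold=0.8):
    """Yield (label_a, label_b, weight) for pairs with correlation >= threshold.

    `rows` maps a row label to its list of values. The threshold is a
    hypothetical default and needs tuning for real data.
    """
    labels = sorted(rows)
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            r = pearson(rows[a], rows[b])
            if r >= threshold:
                yield a, b, r


def write_abc(rows, path, threshold=0.8):
    """Write the thresholded edges as tab-separated label/label/weight lines."""
    with open(path, "w") as fh:
        for a, b, w in abc_edges(rows, threshold):
            fh.write(f"{a}\t{b}\t{w:.4f}\n")
```

This still computes every pair once, but only the edges above the
threshold are kept, so memory use depends on how sparse the resulting
graph is. Recent MCL releases read such a file with something like
`mcl graph.abc --abc -I 2.0 -o clusters.txt`; check the options
against the manual page of the installed version.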

Cheers,
Andreas


On Fri, May 21, 2004 at 05:37:02AM +0000, Gong Wuming wrote:
> Hi list:
> I am using hierarchical clustering (average linkage) to analyse a large
> dataset (around 18000 rows and 17 columns) with the Perl
> Algorithm::Cluster module. My computer is a 2.4 GHz Pentium 4 with 512 MB
> of memory. However, the computation has been running for about 36 hours
> and has still not completed. After a burst of CPU and memory usage at the
> beginning of the analysis (less than half an hour), CPU usage has stayed
> at 0.1% and memory usage at around 80%. (The analysis completes in a few
> minutes when the dataset is 10000 x 20.)
> Is this normal, and how long should the computation take?
> 
> Sincerely
> 
> Wuming Gong
> College of Life Science, Wuhan University, China
> 

-- 
|{  }| Andreas Kähäri      EMBL, European Bioinformatics Institute
| }{ |                     Wellcome Trust Genome Campus
|{  }| Ensembl Developer   Hinxton, Cambridgeshire, CB10 1SD
| }{ | DAS Project Leader  United Kingdom

