[Bioperl-l] Problem with clustering analysis of a large dataset (18000 rows and 17 columns)

Aaron J. Mackey amackey at pcbi.upenn.edu
Fri May 21 07:19:31 EDT 2004


I'm guessing your machine is swapping memory to the hard disk, and thus 
is not actually spending much time computing at all, just thrashing.  
Your computation is unlikely to finish in any reasonable time unless you 
get more memory or use a more space-efficient algorithm (e.g. the R 
statistical programming package has many hierarchical clustering 
algorithms for you to play with).
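
As a rough back-of-the-envelope check on the swapping guess, here is a 
small Python sketch that estimates the memory needed just to hold all 
pairwise distances between rows. It assumes the clustering routine 
stores one 8-byte double per pair of rows (a common layout, but an 
assumption here, not something confirmed for Algorithm::Cluster), and 
the function name and constants are purely illustrative.

```python
def condensed_distance_bytes(n_rows: int, bytes_per_entry: int = 8) -> int:
    """Bytes needed to hold the n*(n-1)/2 pairwise distances among n_rows items.

    bytes_per_entry=8 assumes double-precision storage (an assumption).
    """
    n_pairs = n_rows * (n_rows - 1) // 2
    return n_pairs * bytes_per_entry


# 512 MB of physical memory, as reported in the original question.
RAM_BYTES = 512 * 1024**2

for n in (10_000, 18_000):
    need = condensed_distance_bytes(n)
    print(f"{n} rows: {need / 1024**2:.0f} MiB, fits in RAM: {need < RAM_BYTES}")
```

Under these assumptions the 10000-row case needs roughly 0.4 GB, which 
just fits in 512 MB, while the 18000-row case needs roughly 1.3 GB, 
which does not. Note that the number of columns barely matters; the 
cost grows with the square of the number of rows.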

-Aaron

On May 21, 2004, at 1:37 AM, Gong Wuming wrote:

> Hi list:
> I am using hierarchical clustering (average linkage) from the Perl 
> Algorithm::Cluster module to cluster a large dataset (around 18000 
> rows and 17 columns). My computer is a 2.4 GHz Pentium 4 with 512 MB 
> of memory. The computation has been running for about 36 hours and 
> has still not completed. After a burst of CPU and memory usage at the 
> beginning of the analysis (less than half an hour), CPU usage has 
> stayed at 0.1% and memory usage at around 80%. (The analysis 
> completes in a few minutes when the dataset is 10000 x 20.) Is this 
> behavior normal, and how long should the computation take?
>
> Sincerely
>
> Wuming Gong
> College of Life Science, Wuhan University, China
>
--
Aaron J. Mackey, Ph.D.
Dept. of Biology, Goddard 212
University of Pennsylvania       email:  amackey at pcbi.upenn.edu
415 S. University Avenue         office: 215-898-1205
Philadelphia, PA  19104-6017     fax:    215-746-6697



