[Bioperl-l] Bio::PopGen modules performance

Fri Nov 4 14:18:05 EST 2005

Hi all,

I used the Bio::PopGen modules to calculate various statistics such as
Tajima's D, Pi and so on. For a single data set, the performance is
fine. But to assess significance, I used Hudson's "ms" program to
simulate 10,000 populations. Running the Bio::PopGen modules on these
10,000 samples takes a long time: 600 samples finished in about 10
hours (population size about 200, about 500 segregating sites), which
is roughly one minute per sample. At that rate, 10,000 simulations take
about a week for each data set. With, say, 100 data sets, that adds up
to nearly two years of computation, so I do not think it is doable. I
am wondering whether this is simply how these modules perform, or
whether I can improve performance by optimizing my code. Since 10,000
simulations are typical for population genetics analyses, does anybody
have experience with this issue, or any suggestions for improving
performance?
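One approach that may be worth trying is to skip building a population object for every replicate and instead compute the statistics in a single streaming pass over ms's 0/1 haplotype output. The sketch below is in Python rather than Perl and is not the Bio::PopGen API. It assumes ms's plain-text layout (a `//` line per replicate, then `segsites:`, `positions:`, and one 0/1 string per sampled chromosome). `parse_ms` and `tajima_stats` are hypothetical helper names. It uses the standard Tajima (1989) formulas and treats `1` as the derived allele.

```python
import math
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_ms(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the 0/1 haplotype strings of each replicate in ms output.

    Assumption: ms's plain-text format, where each replicate starts with a
    "//" line followed by "segsites:", "positions:" and the haplotypes.
    Reads line by line, so a large output file never has to fit in memory.
    """
    haplotypes: Optional[List[str]] = None  # None until the first "//"
    for line in lines:
        row = line.strip()
        if row.startswith("//"):
            if haplotypes is not None:
                yield haplotypes
            haplotypes = []
        # Keep only rows made entirely of 0s and 1s; this skips the header,
        # the "segsites:" and "positions:" lines, and blank lines.
        elif haplotypes is not None and row and set(row) <= {"0", "1"}:
            haplotypes.append(row)
    if haplotypes is not None:
        yield haplotypes


def tajima_stats(haplotypes: List[str]) -> Tuple[float, float, float]:
    """Return (pi, Watterson's theta, Tajima's D) for one replicate."""
    n = len(haplotypes)
    if n and len({len(h) for h in haplotypes}) != 1:
        raise ValueError("haplotypes must all have the same length")
    # Derived-allele count k at each site (one column of the 0/1 matrix).
    counts = [col.count("1") for col in zip(*haplotypes)]
    seg = [k for k in counts if 0 < k < n]
    s = len(seg)
    if s == 0:
        return 0.0, 0.0, float("nan")  # D is undefined without polymorphism
    if n < 4:
        raise ValueError("Tajima's D needs at least 4 sequences")

    # Mean pairwise differences: a site with k derived alleles
    # contributes k * (n - k) differing pairs out of n(n-1)/2.
    pi = sum(k * (n - k) for k in seg) / (n * (n - 1) / 2)

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    theta_w = s / a1

    # Variance constants from Tajima (1989).
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)

    d = (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))
    return pi, theta_w, d
```

For example, with ms output saved to a hypothetical file `ms.out`, `with open("ms.out") as fh:` followed by `for haps in parse_ms(fh): print(*tajima_stats(haps))` prints one line per replicate. The design choice is to work only with per-site counts. Pi and Watterson's theta depend only on how many sequences carry the derived allele at each site, so each replicate costs one pass over its n × S matrix.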

Thanks a lot!

--bs