[Biopython-dev] Statistics code

Michiel de Hoon mjldehoon at yahoo.com
Thu Apr 3 09:49:45 UTC 2008

> I already need it now, but just for a very small thing: The chi-square
> test. It is quite easy to reimplement. If it ends up by being just
> chisquare (which I doubt, but I might be able to externalize to the
> user the conventional stats part), then I think the best thing would
> be just to reimplement and not to force the dependency. But I think
> that I will need to use more stats stuff as I implement functionality.
One solution is just to copy and paste whatever statistics code you need from S
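
To give a sense of how small a reimplementation of the chi-square test mentioned above can be, here is a minimal pure-Python sketch. The function names chi_square_statistic and chi_square_p_value are hypothetical, not existing Biopython or SciPy code, and the p-value uses the closed-form survival function for integer degrees of freedom rather than a general incomplete gamma routine.

```python
import math


def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / float(e) for o, e in zip(observed, expected))


def chi_square_p_value(statistic, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if statistic <= 0:
        return 1.0
    half = statistic / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = 1.0
        total = term
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{k=1}^{(df-1)/2} (x/2)^(k-1/2) / Gamma(k+1/2)
    term = math.sqrt(half) / math.gamma(1.5)
    total = 0.0
    for k in range(1, (df - 1) // 2 + 1):
        if k > 1:
            term *= half / (k - 0.5)
        total += term
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


# Example: three categories, so two degrees of freedom.
stat = chi_square_statistic([10, 20, 30], [20, 20, 20])
p = chi_square_p_value(stat, df=2)
```

This only covers the goodness-of-fit case with integer degrees of freedom; anything beyond that would need more of a statistics library.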

> I think that NumPy has only basic stuff (standard deviation, mean). I
> might be wrong, but my research points to that.
The ideal solution would be to move the statistics stuff from SciPy to NumPy, or to expand the statistics stuff currently in NumPy. Since SciPy and NumPy come from the same group of developers, they may not mind too much. Having a statistics library in NumPy would be a big encouragement to move from Numeric to NumPy.

> 3. The dependency (in case it appears) would be of zero impact outside
> of Bio.PopGen.Stats (maybe just setup.py to optionally allow using
> scipy)
In practice, when I make the Biopython releases, it's special situations like these that cause trouble. For example, if I don't install SciPy on Windows, I can't test Bio.PopGen.Stats there, and errors will go unnoticed. This has happened in previous Biopython releases.
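
For illustration, an optional dependency of the kind being discussed is usually guarded with a try/except around the import. The sketch below is an assumption about how such a guard could look, not code from Bio.PopGen.Stats; the names HAVE_SCIPY and chi_square_stat are hypothetical. It also shows why the problem arises: on a machine without SciPy only the fallback branch ever runs, so a bug in the SciPy branch would not be caught there.

```python
try:
    # Optional dependency: only used when it happens to be installed.
    from scipy.stats import chisquare as _scipy_chisquare
    HAVE_SCIPY = True
except ImportError:
    _scipy_chisquare = None
    HAVE_SCIPY = False


def chi_square_stat(observed, expected):
    """Return the Pearson chi-square statistic, using SciPy if available."""
    if HAVE_SCIPY:
        # scipy.stats.chisquare returns (statistic, p-value); keep the statistic.
        statistic, _pvalue = _scipy_chisquare(observed, f_exp=expected)
        return float(statistic)
    # Pure-Python fallback: sum of (O - E)^2 / E.
    return sum((o - e) ** 2 / float(e) for o, e in zip(observed, expected))


result = chi_square_stat([10, 20, 30], [20, 20, 20])
```

Which branch is exercised depends entirely on the test machine, which is the situation described above.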

> 4. I need to know "the rules of the game" before I write more code (in
> order to know what I can or cannot use, in case I need to use).
I would strongly encourage you not to add any new dependencies to Biopython. We have too many already; I was actually hoping that the number of dependencies could be reduced.


