[Biopython] SciPy paper: documenting statistical data structure design issues
Michiel de Hoon
mjldehoon at yahoo.com
Tue May 25 01:17:06 UTC 2010
Hi Vincent,
Thanks for letting us know. Statistics is central to many problems in computational biology, so this is important for us. What is the preferred way to contribute to this discussion? Should we join a mailing list or can we write something on a wiki?
Thanks,
--Michiel.
--- On Mon, 5/24/10, Vincent Davis <vincent at vincentdavis.net> wrote:
> From: Vincent Davis <vincent at vincentdavis.net>
> Subject: [Biopython] SciPy paper: documenting statistical data structure design issues
> To: "biopython" <biopython at lists.open-bio.org>
> Date: Monday, May 24, 2010, 3:45 PM
> "see the message below, cross posted
> from pystatsmodels"
>
> We have been having some discussion on the pystatsmodels mailing list
> about data objects, numpy arrays... I think it would be valuable for
> some Biopython users to contribute some comments, examples or ideas to
> the scipy wiki that has been set up for this. I think at the heart of
> this is that although almost anything can be done with a numpy array,
> we run into many problems that are difficult to solve with the current
> tools for numpy arrays. Because of this, I think some nice examples of
> the data design problems that you have faced in Biopython, and how they
> have been solved, would be valuable.
>
> Thanks
> Vincent
>
> On Sat, May 22, 2010 at 7:22 PM, Wes McKinney <wesmckinn at gmail.com>
> wrote:
>
> > For my SciPy talk and paper in a little over a month, I was hoping to
> > render a somewhat coherent discussion of the design needs of
> > statistical data structures, based on my experience developing pandas
> > for quant finance research. I think these broadly fall into a few
> > categories: implementation ease, usability (for the non-developer
> > IPython-based console user), performance, and flexibility. Hopefully
> > this will be useful information that will help guide future
> > development efforts. What do you folks think?
> >
> > As part of this, I was thinking maybe we should start a wiki page (or
> > pages) somewhere to start listing out the various design issues (big
> > and small) where people can write their opinions and we can have a
> > structured discussion (e-mail is a bit hard for this sort of thing).
> > I'd also like to spend some time reading through other people's code
> > (e.g. all of the larry code) and writing down what I think about their
> > design choices in a constructive way.
> >
> > Part of what prompted my idea for a wiki was reading some of the larry
> > code and wanting to share my thoughts on various parts of it. Of
> > course I'm also prepared for other people to attack (and for me to
> > have to defend) my own code. For most of these things there isn't a
> > "right" and "wrong" and I am only interested in having constructive
> > discussions and hearing people's perspectives. Here's an example: in
> > pandas when adding two different-labeled 2d arrays, the result has the
> > *union* of all the labels. In la you get the intersection. Certainly
> > there are pros and cons for either approach (in my case I don't want
> > to lose information, even if it's nulled out).
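
A minimal sketch of the union-versus-intersection choice described in the
paragraph above. It assumes present-day pandas (the 2010 API differed), and
the labels and values are invented for illustration. The intersection variant
only emulates the behavior attributed to la; it does not call la itself.

```python
import pandas as pd

# Two 2-D tables whose row labels only partly overlap (hypothetical data).
a = pd.DataFrame({"x": [1.0, 2.0, 3.0]}, index=["g1", "g2", "g3"])
b = pd.DataFrame({"x": [10.0, 20.0, 30.0]}, index=["g2", "g3", "g4"])

# pandas aligns on labels and keeps the union; labels present in only one
# operand are kept but filled with NaN ("nulled out").
union_sum = a + b

# Emulating an intersection policy: restrict both operands to the shared
# labels first, so nothing is nulled out but unmatched rows are dropped.
common = a.index.intersection(b.index)
intersection_sum = a.loc[common] + b.loc[common]

print(union_sum)
print(intersection_sum)
```

The union keeps every label at the cost of missing values; the intersection
avoids missing values at the cost of silently dropping rows.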
> >
> > We should also have a place where we document differences in
> > performance for various operations. I spent a lot of time even before
> > pandas was open-source obsessing over speed-- I'd like to think I
> > learned a few things but I was operating in a bubble so I might have
> > missed really obvious speedups. I also learned lots of odd things
> > about NumPy (did you know fancy indexing is a LOT slower than
> > ndarray.take?). We should probably establish some apples-to-apples
> > performance benchmarks to help people decide what to use for their
> > applications if speed matters.
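
One way such an apples-to-apples comparison might be set up is sketched
below. The array sizes and repeat counts are arbitrary assumptions. No
expected timings are given, because the gap between fancy indexing and
ndarray.take depends on the NumPy version, array shape and dtype, so the
2010 observation may not reproduce on a current installation.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
values = rng.standard_normal(1_000_000)               # data to select from
indexer = rng.integers(0, values.size, size=100_000)  # positions to pick

# Check that both forms select exactly the same elements before timing them.
same_result = np.array_equal(values[indexer], values.take(indexer))

# Best of several repeats, to reduce noise from other processes.
fancy = min(timeit.repeat(lambda: values[indexer], number=20, repeat=3))
take = min(timeit.repeat(lambda: values.take(indexer), number=20, repeat=3))

print(f"fancy indexing: {fancy:.4f} s per 20 calls")
print(f"ndarray.take:   {take:.4f} s per 20 calls")
```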
> >
> > Best,
> > Wes
>
> *Vincent Davis
> 720-301-3003 *
> vincent at vincentdavis.net
> my blog <http://vincentdavis.net> |
> LinkedIn<http://www.linkedin.com/in/vincentdavis>
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>