[BioPython] [Biopython-dev] Statistics in population genetics module - Part I
Bruce Southey
bsouthey at gmail.com
Mon Nov 3 20:50:14 UTC 2008
Giovanni Marco Dall'Olio wrote:
> On Fri, Oct 31, 2008 at 12:58 AM, Tiago Antão <tiagoantao at gmail.com> wrote:
>
>
>> Hi,
>>
>> Statistics is the most important part of population genetics modules.
>> In fact one could say that statistics were invented FOR population
>> genetics (check http://en.wikipedia.org/wiki/Ronald_Fisher ).
>> When I started to work on the population genetics module I decided to
>> delay the statistics module a bit, in order to get experience with the
>> whole biopython project before committing to do the most important
>> thing.
>> Irrespective of whether it is possible to link scipy, now seems
>> to be the time to advance, especially considering that Giovanni is
>> interested in participating.
>> A few points need to be made before suggesting how to put
>> statistics in Bio.PopGen.
>>
>> 1. Whatever design is put in, it should be reasonably future-proof: a
>> few releases from now we should not find ourselves having to break
>> older code. That should be avoided as much as possible.
>>
>
>
> For how much time do you think a biopython module should be kept compatible
> with older versions, more or less?
> It will take a long time to develop the module, and we will surely
> make some mistakes. So, what is the best way to proceed? What if we create a
> separate biopython branch where we can test all the new features?
> At the moment I am working with a separate git repository for all the
> popgen modules. The problem is that I didn't include all biopython modules
> in the repository, so, if any of my changes breaks something in biopython, I
> won't know it until I merge everything with the biopython code.
> On the other hand, if I include a biopython release in my popgen repository,
> I won't be able to track changes made in biopython, and my popgen code will
> be compatible with that version only.
> I think git provides some options to handle this kind of situation... I am
> not very used to cvs, so I don't know.
>
If you have modified a Biopython module, you should probably either see
whether the change is acceptable for the main Biopython distribution
(especially if it involves an API change) or modify your own code instead,
because I do not think it is a good idea to have different versions of the
same Biopython module or any name clashes with Biopython. Otherwise, you
just need to check that your code runs with a very recent version of
Biopython (and under the Python versions that Biopython supports).
If you have not done so, I would suggest developing unit tests that not
only ensure code accuracy but also maintain future compatibility. A
failed test will indicate some problem that needs resolving, and
resolving it will bring the code back into compatibility where necessary.
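
As a rough illustration of the kind of unit test meant here, the sketch
below checks a small statistic against known values. The function
expected_heterozygosity() is a hypothetical helper invented for this
example; it is not part of Bio.PopGen, and the test names and run_tests()
wrapper are likewise assumptions.

```python
import unittest


def expected_heterozygosity(allele_freqs):
    """Hypothetical helper (not a Bio.PopGen function): 1 - sum(p_i ** 2)."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


class ExpectedHeterozygosityTest(unittest.TestCase):
    def test_two_equal_alleles(self):
        # Two alleles at frequency 0.5 each: 1 - (0.25 + 0.25) = 0.5
        self.assertAlmostEqual(expected_heterozygosity([0.5, 0.5]), 0.5)

    def test_fixed_allele(self):
        # A single fixed allele carries no heterozygosity.
        self.assertAlmostEqual(expected_heterozygosity([1.0]), 0.0)

    def test_rejects_bad_frequencies(self):
        # Guards the input contract, so a later change cannot silently relax it.
        self.assertRaises(ValueError, expected_heterozygosity, [0.7, 0.7])


def run_tests():
    """Run the test case above and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(ExpectedHeterozygosityTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

If a later change alters the return value or the accepted input, one of
these tests fails and flags the incompatibility before users see it.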
> p.s. When python3000 is released, it will probably be necessary to
> rewrite large portions of biopython, if not create a 'biopython 2' version
> (I think they were discussing something like this in bioperl's list).
> I thought that maybe, even if we make some 'mistakes' in this version of
> biopython, we will be able to fix them in a later version.
>
Python 3 cannot really be discussed until the currently incompatible
modules, such as numpy or Biopython itself, can be used under Python 3 (rc1
is available). Further, the advice from above (see Guido's blog
http://www.artima.com/weblogs/viewpost.jsp?thread=227041) is that the
conversion should be a direct port without any changes, especially API
ones. So correcting any major 'mistakes' in the existing module as part of
that port will probably not be acceptable to the community. Further, any
correction to the main distribution, at any time, is not trivial, especially
as you must first get the users informed (I saw that with the change to
histogram in numpy).
There is a lot of flexibility in a separate project that you will lose
once the code is widely released or included in a well-established
project like Biopython. I think that you should maintain a separate
project of some type until everything is sufficiently acceptable to the
Biopython community. This gives sufficient time to address various
concerns and enables an easy integration.
Finally, if you require dependencies beyond those currently
required by Biopython (especially something like scipy) then I think it
will be very hard or impossible for you to get any code associated with
these dependencies into Biopython.
Just my opinions on your questions,
Bruce