[BioPython] [PopGen] a random Haplotype Sets generator

Tiago Antão tiagoantao at gmail.com
Thu Nov 13 16:33:18 UTC 2008


On Thu, Nov 13, 2008 at 3:29 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> While I have not looked at the code, my view is that it must remain integrated
> into the PopGen module. I would expect that a user would use some Biopython
> (PopGen) modules with some simulated SNPs. I would prefer that Biopython
> remains as much as possible a set of integrated tools rather than just a
> collection of tools. This is a clear example where if it is not totally
> integrated then I don't see the point in including it in Biopython.


There are several dimensions here, and I would like to sum up my ideas
on the various things being floated around:

1. Support for tools with a small user base: I do think that the size of
the user base should not be a fundamental criterion. As long as tools are
maintained (which, I agree, might be a problem with some fringe
applications), this should not be an issue. A good example is fdist
support in PopGen: the user base for the method seems to be increasing
quite a lot because of code built on top of Bio.PopGen.FDist
(something I was not expecting, to be honest).
2. Integration inside PopGen: Up to now, there has been an effort in
PopGen to have a coherent module where all parts interoperate. With
the exception of Simcoal output, everything else works in a cohesive way:
you can take a Genepop file and feed it to fdist, for instance, as the
module has provisions for interoperability (the same holds for the new
LDNe code that I have).
3. Integration with the rest of Biopython: I do expect things to work
quite smoothly, like extracting SNPs from sequences and feeding them into
fdist, LDNe and (future) statistics. I see issues with
microsatellite/STR and RFLP data, but that is because there might be
little provision in the rest of Biopython for those types of markers.
4. New code and new developers: I think that an overly stringent
process will put new people off. I have no problem accepting _non-crucial
code_ that does _not impose big maintenance hurdles_, even if
that code might be somewhat naive in the big picture (maybe this
particular example should actually go into the test base, BTW). The
truth is that an overly stringent process, while it might ensure fantastic
code, puts up a gigantic barrier for new people. I am more in favor of a
learning process where less fundamental code can be accepted at the
beginning. I don't want to discourage new people; I think a balance
between quality and encouragement can be struck.

> The second aspect is that it must have a very stable API. Similarly to
> Michiel's comment, changing APIs after a release is also a pain,
> especially if the module has been around a long time. Based on your first
> post, I would argue that you are not quite at this stage yet.

Agreed, especially for crucial functionality (but maybe not so much
for less crucial parts). That is why I have avoided committing my
statistics code to Biopython (although it has existed for quite a long
time and is available on Git): the API has to be future-resilient! In fact,
I have a proposal to make on this front, but because I want to be sure
that the API is as future-proof as possible, the proposal will
not be all-encompassing for now (I still don't know how to design a
future-proof API for multi-locus statistics like simple linkage
disequilibrium or more modern things like EHH).

But yes, to be honest I think open-bio projects err on the side of
excessive bureaucracy and discourage new people.


