[Biopython-dev] PopGen code

Tiago Antão tiagoantao at gmail.com
Fri Jan 12 14:16:53 UTC 2007


Hi,

Thanks for the answer.

> I suspect BioPython currently has no active developers who feel
> qualified to interpret your population genetics code.  I was hoping that
> you and Ralph Haygood would combine forces - if you are both happy with
> some code that does bode well.  Any comments Michiel?

I think Ralph (who subscribes to this list, and thus can comment) is
under tight time constraints and will probably have little time
available in the near future...

> Regarding population genetic file formats - from a very quick search
> about Arlequin it sounds like this file format can hold lots of
> different types of data.  I would encourage you to try and come up with
> a generic population record data object that could hold this or
> information from GenePop or Fdist as well.  I have no idea how easy this
> would be...

I have been thinking a lot about a generic data structure to hold
population genomic (i.e. not only genetic) data. I have, in fact,
implemented (in CAML, not Python) quite a few different data
representations. I was not happy with any of them. Different kinds of
markers (that sometimes overlap, e.g. sequences and SNPs), linkage
disequilibrium (thus relations between markers...), ploidy (no need to
think of different organisms; think mitochondria, nuclear chromosomes,
Y chromosome), ... make a general solution non-trivial.
As I see it, there are a few options:
1. Have a grand, unified structure, but that will take time to mature
2. Assume that there will be different representations for different
scopes, accept that this is a bad thing and live with it
3. Assume that there will be different representations, and that this
is good, in the sense that a one-size-fits-all approach in this case
has lots of problems

I think the pragmatic approach for now is not to have a generic
representation. I would rather let things mature (develop
statistics, parsers, ...) and, once there is more experience (and,
hopefully, user feedback), reassess the issue of a general
representation. I am aware that this will mean each part of the code
takes a different data structure as input, but I think that with care
and common sense that won't be very problematic.
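
As an illustration of what "different representations for different
scopes" could look like, here is a minimal Python sketch. Every name in
it (DiploidGenotypeRecord, HaplotypeCountRecord, allele_counts,
haplotype_frequencies) is hypothetical and is not part of Biopython or
of the code being submitted. It only shows two records shaped by their
own data, each with a statistic written against that shape.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# All names below are hypothetical illustrations, not Biopython APIs.

# One diploid genotype at one locus; None marks a missing allele.
Genotype = Tuple[Optional[int], Optional[int]]


@dataclass
class DiploidGenotypeRecord:
    """Per-individual diploid genotypes (e.g. nuclear microsatellites)."""
    loci: List[str]
    # population name -> list of (individual id, one Genotype per locus)
    populations: Dict[str, List[Tuple[str, List[Genotype]]]] = field(
        default_factory=dict
    )


@dataclass
class HaplotypeCountRecord:
    """Haploid data (e.g. mitochondrial), kept as counts per population."""
    # population name -> haplotype label -> number of copies observed
    populations: Dict[str, Dict[str, int]] = field(default_factory=dict)


def allele_counts(record: DiploidGenotypeRecord, locus: str) -> Dict[int, int]:
    """Count alleles at one locus across all populations, skipping missing."""
    idx = record.loci.index(locus)
    counts: Dict[int, int] = {}
    for individuals in record.populations.values():
        for _ind_id, genotypes in individuals:
            for allele in genotypes[idx]:
                if allele is not None:
                    counts[allele] = counts.get(allele, 0) + 1
    return counts


def haplotype_frequencies(
    record: HaplotypeCountRecord, population: str
) -> Dict[str, float]:
    """Relative haplotype frequencies within a single population."""
    counts = record.populations[population]
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()}
```

Neither function can be called with the other record type, which is the
cost mentioned above: each statistic is tied to one data structure. The
benefit is that ploidy and marker type are explicit in each record
instead of being encoded as special cases in a single generic one.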

I don't mind having the code on an alpha branch for as long as you see
fit. I just want to be sure that whatever effort I put into converting
my existing code to BioPython (or writing new code) is not lost, which
is why I would like feedback on what will happen to the code that I am
submitting. I am willing to accommodate any reasonable requirements
regarding code quality and development process...

Regards,
Tiago

-- 
Good judgment comes from experience.
Experience comes from bad judgment.
- Unknown author


