[BioPython] calculate F-Statistics from SNP data

Tiago Antão tiagoantao at gmail.com
Fri Oct 17 14:07:18 EDT 2008


Hi,

On Fri, Oct 17, 2008 at 10:39 AM, Giovanni Marco Dall'Olio
<dalloliogm at gmail.com> wrote:
> Let's say I try to write a parser for these two file formats. In which
> biopython object should I save them? Is there any kind of 'Individual' or
> 'Population' object in biopython?
> I see from the cookbook that Bio.PopGen.GenePop.Record represents
> populations and individuals as lists [3], and that there is no 'Population'
> or 'Individual' object.

No, there are no concepts of individuals or populations for now.
Bio.PopGen.GenePop is just a representation of a GenePop file (which
is a de facto standard in frequency based population genetics).
Currently the Bio.PopGen philosophy is more that of a wrapper for
existing software (e.g., I don't implement a coalescent simulator, as
BioPerl does; I wrap SimCoal2). The disadvantage is that it is not "Pure
Python" and depends on external applications. The advantage is
that, if the external application is good, then good functionality
becomes available inside Biopython. For example, coalescent simulation
in BioPerl is (at least the last time I checked) orders of magnitude
less flexible than Biopython's (based on SimCoal2).
Following this philosophy, I now have a (partial) wrapper for the GenePop
application to calculate statistics (voilà, Fst).
That doesn't mean that core statistics functionality should not be
available in Bio.PopGen. I think it should be (that is why I have
done quite some work on that: implementing from scratch Fst, allelic
richness, expected heterozygosity, ...). The same goes for the concepts
of Population and Individual.
For a number of cumulative reasons, the work on that front is stalled.
But, if there is some interest, I would more than welcome reopening
that front...
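
To give an idea of what "from scratch" statistics could look like, here
is a minimal sketch of expected heterozygosity and a basic
Fst = (Ht - Hs) / Ht for a single locus. It is only an illustration: the
function names are made up for this example, it is not the stalled
Bio.PopGen code, it weights all populations equally, and it applies no
sample-size correction, so it will not necessarily match what GenePop
reports.

```python
from collections import Counter


def allele_frequencies(alleles):
    """Return {allele: frequency} for a list of allele codes at one locus."""
    counts = Counter(alleles)
    total = float(len(alleles))
    return dict((allele, n / total) for allele, n in counts.items())


def heterozygosity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def fst(populations):
    """Uncorrected Fst for one locus.

    populations is a list of allele lists, one per population.
    Hs is the mean within-population expected heterozygosity and
    Ht is the expected heterozygosity of the mean allele frequencies.
    """
    per_pop = [allele_frequencies(pop) for pop in populations]
    n_pops = float(len(per_pop))
    hs = sum(heterozygosity(f) for f in per_pop) / n_pops
    all_alleles = set().union(*per_pop)
    mean_freqs = dict(
        (a, sum(f.get(a, 0.0) for f in per_pop) / n_pops) for a in all_alleles
    )
    ht = heterozygosity(mean_freqs)
    # With no variation at all, Fst is undefined; return 0.0 by convention.
    if ht == 0:
        return 0.0
    return (ht - hs) / ht
```

Two populations fixed for different alleles give the maximum
differentiation, and two populations with identical frequencies give
none, which is a quick sanity check on the formula.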


> Moreover, Python 2.6 will introduce a new kind of data object, called a
> 'named tuple' [4], to implement this kind of record. It could be a good
> compromise (maybe I should start a new thread about this and explain it
> better).

I think the ad hoc policy in Biopython is to support previous versions
of Python, so I don't think it will be easy to do things in a 2.6-only
way (although, for NEW functionality, for my part, I don't see a
problem with it).
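
For reference, a rough sketch of what named-tuple records could look
like with collections.namedtuple (Python 2.6 and later). The Individual
and Population types and their fields are hypothetical, chosen only for
illustration; nothing like them exists in Bio.PopGen.

```python
from collections import namedtuple

# Hypothetical record types, not part of Biopython.
Individual = namedtuple("Individual", ["name", "genotypes"])
Population = namedtuple("Population", ["name", "individuals"])

# One diploid individual typed at two loci, each genotype a pair of alleles.
ind = Individual(name="ind1", genotypes=[(1, 2), (3, 3)])
pop = Population(name="pop1", individuals=[ind])

# Fields can be read by name or by position, like a plain tuple.
first_locus = pop.individuals[0].genotypes[0]
same_field = ind[1] is ind.genotypes
```

A named tuple is still an ordinary tuple underneath, so code that
already indexes plain tuples would keep working with records like
these.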

Tiago

