[BioPython] calculate F-Statistics from SNP data
Giovanni Marco Dall'Olio
dalloliogm at gmail.com
Fri Oct 17 05:39:41 EDT 2008
On Thu, Oct 16, 2008 at 12:23 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Thu, Oct 16, 2008 at 11:02 AM, Giovanni Marco Dall'Olio
> <dalloliogm at gmail.com> wrote:
> > Hi,
> > I was going to write a python program to calculate Fst statistics from a
> > sample of SNP data. Is there any module already available to do that
> > in biopython, that I am missing? I saw there is a 'PopGen' module, but
> > the Cookbook says it doesn't support sequence data.
> > Is someone actually writing any module in python to calculate such
> > statistics?
> I think this will be a question for Tiago (the Bio.PopGen author),
> although others on the list may have also tackled similar questions.
> In terms of reading in the SNP data, what file format will you be
> loading? Does Bio.SeqIO currently suffice?
Thank you all very much for the replies.
Actually, I am going to use tped and tfam files as input, formatted
with the plink program.
Bio.SeqIO doesn't support these formats, but that is right, because they don't
contain only sequences but also other elements, as Tiago was saying.
Let's say I try to write a parser for these two file formats. In which
biopython object should I save them? Is there any kind of 'Individual' or
'Population' object in biopython?
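To make the question concrete, here is a minimal sketch of such a parser that simply yields plain tuples. It assumes the standard plink layout (tfam: family ID, individual ID, paternal ID, maternal ID, sex, phenotype; tped: chromosome, SNP ID, genetic distance, base-pair position, then two alleles per individual). The function names parse_tfam and parse_tped are hypothetical and are not part of Biopython.

```python
def parse_tfam(lines):
    """Yield one 6-field tuple per individual from TFAM lines.

    Fields: family ID, individual ID, paternal ID, maternal ID, sex, phenotype.
    """
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 6:
            raise ValueError("expected 6 TFAM fields, got %d" % len(fields))
        yield tuple(fields)


def parse_tped(lines):
    """Yield (chromosome, snp_id, distance, position, genotypes) per SNP.

    genotypes is a list of (allele1, allele2) tuples, one per individual,
    in the same order as the individuals in the TFAM file.
    """
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        chromosome, snp_id, distance, position = fields[:4]
        alleles = fields[4:]
        if len(alleles) % 2:
            raise ValueError("odd number of alleles for SNP %s" % snp_id)
        # Pair consecutive alleles: columns 5-6 are individual 1, and so on.
        genotypes = list(zip(alleles[0::2], alleles[1::2]))
        yield chromosome, snp_id, float(distance), int(position), genotypes


# Made-up example data, two individuals and two SNPs.
tfam_lines = ["FAM1 IND1 0 0 1 1", "FAM1 IND2 0 0 2 1"]
tped_lines = ["1 rs0001 0 5000 A A A G", "1 rs0002 0 6000 C T 0 0"]

individuals = list(parse_tfam(tfam_lines))
snps = list(parse_tped(tped_lines))
```

Because both functions are generators, a large file can be passed in as an open file handle and processed one line at a time.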
I see from the cookbook that Bio.PopGen.GenePop.Record represents
populations and individuals as lists, and that there is no 'Population'
or 'Individual' object.
I think that is a good approach, because these kinds of files tend to be
very big, and instantiating an Individual object instead of a tuple for every
line of the file would take much more memory.
But are you going to implement some kind of 'Individual' or 'Population'
object?
Moreover, Python 2.6 introduces a new kind of data object, called 'named
tuple', for these kinds of records. It could be a good
compromise (maybe I had better start a new thread about this and explain).
 tped, tfam: http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#tr
 plink: http://pngu.mgh.harvard.edu/~purcell/plink/index.shtml
 biopython cookbook, popgen:
 named tuples in python 2.6: http://code.activestate.com/recipes/500261/
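To illustrate the named tuple idea, here is a minimal sketch using collections.namedtuple from the standard library (Python 2.6 and later). The Individual type and its field names are only an assumption for this example, modelled on the six tfam columns. They are not an existing Biopython class.

```python
from collections import namedtuple

# Hypothetical record type: one field per TFAM column.
Individual = namedtuple(
    "Individual",
    ["family_id", "individual_id", "father_id", "mother_id", "sex", "phenotype"],
)

# Build a record from one whitespace-separated TFAM line (made-up data).
line = "FAM1 IND1 0 0 1 2"
ind = Individual(*line.split())

# Fields can be read by name or by position, like an ordinary tuple.
name_by_attribute = ind.individual_id
name_by_index = ind[1]

# Named tuples are immutable; _replace returns a modified copy.
updated = ind._replace(phenotype="1")
```

A named tuple is still a tuple, so code that expects plain tuples (unpacking, indexing, sorting) keeps working, while attribute access makes the fields self-documenting.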
> Have you looked into what (if any) additional python libraries you
> would need? For any Biopython addition, a dependency on just numpy
> would be preferable, but Tiago has previously suggested an
> optional dependency on scipy for additional statistics needed in
> population genetics.
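For what it is worth, a basic Fst for a single biallelic SNP needs no external library at all. Below is a minimal sketch, assuming Wright's definition Fst = (H_T - H_S) / H_T with equally weighted populations. It is not the Weir and Cockerham estimator and it is not taken from Bio.PopGen. The function names are hypothetical, and "0" is assumed to be the missing-allele code, as in plink's default.

```python
def allele_frequency(genotypes, allele):
    """Frequency of `allele` among the non-missing alleles of one population.

    genotypes is a list of (allele1, allele2) tuples; "0" marks a missing call.
    """
    alleles = [a for genotype in genotypes for a in genotype if a != "0"]
    if not alleles:
        raise ValueError("no called alleles in this population")
    return alleles.count(allele) / float(len(alleles))


def wright_fst(frequencies):
    """Fst for one biallelic SNP from per-population allele frequencies.

    Uses expected heterozygosities: H_T from the mean frequency, H_S as the
    mean of the within-population values. Populations are weighted equally.
    """
    n = len(frequencies)
    p_bar = sum(frequencies) / float(n)
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        # Monomorphic SNP: Fst is undefined; this sketch reports 0.0.
        return 0.0
    h_s = sum(2 * p * (1 - p) for p in frequencies) / float(n)
    return (h_t - h_s) / h_t


# Made-up genotypes for two populations at one SNP with alleles A and G.
pop1 = [("A", "A"), ("A", "G"), ("A", "A"), ("A", "G")]  # 6 of 8 alleles are A
pop2 = [("G", "G"), ("A", "G"), ("G", "G"), ("A", "G")]  # 2 of 8 alleles are A

freqs = [allele_frequency(pop, "A") for pop in (pop1, pop2)]
fst = wright_fst(freqs)
```

Weighting by sample size and averaging over many SNPs are the obvious next steps, and that is where numpy or scipy would start to pay off.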
My Blog on Bioinformatics (italian): http://bioinfoblog.it