[BioPython] calculate F-Statistics from SNP data

Giovanni Marco Dall'Olio dalloliogm at gmail.com
Wed Oct 22 17:12:24 UTC 2008


On Wed, Oct 22, 2008 at 5:52 PM, Tiago Antão <tiagoantao at gmail.com> wrote:

> Hi,
>
> [Back in office now]
>
> > Ok, I have uploaded the code to:
> > - http://github.com/dalloliogm/biopython---popgen
> >
> > I put the code I wrote before writing in this mailing list in the folder
> > PopGen/Gio
>
> Thanks, I will have a look and get acquainted with Git.
>

It's the first time I am using GitHub for something serious, too.
Please tell me if you need me to add you as a 'collaborator' on the project
or something like that.
I am using Eclipse with a plugin for git (http://www.jgit.org/update-site)
and it works very well.
I think there is a plugin for vim, too.
Sorry, I couldn't do much today - I spent most of the day in seminars
and meetings :(.


>
>
> > Yes, I agree. It was just a first try. We should collect some good
> > use-cases.
>
>
> In my head I divide statistics along the following dimensions:
> 1. genetic versus genomic (e.g. Fst is single-locus; LD can be seen as
> requiring more than one locus and is therefore "genomic")
> 2. frequency based versus marker based (some statistics require
> frequencies only, i.e. you can calculate them irrespective of the type
> of marker. This is the case for Fst. Others are marker dependent: say,
> Tajima's D requires sequences and can only be used with sequences)
> 3. population structure versus no population structure. Some stats require
> population structure (again, Fst), others don't (e.g., allelic
> richness)
>
> From my point of view, a long-term solution needs to take into account
> these dimensions (and others that I might be forgetting).
>
> One can think of a solution based on Populations and Individuals as
> fundamental objects (as opposed to statistics), but, from my
> experience, it is very difficult to define what an "individual" is
> (i.e., what kind of information you need to store - I can expand on
> this). It is easier to think in terms of statistics.
>
> One fundamental point is that we don't have many opportunities to get
> it right: if we define an architecture which proves in the future to
> be insufficient, then we will have to both maintain the old legacy
> code (because there will be users around whose code cannot be constantly
> broken when a new version is made available) and hack the new
> features in.
>
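
As a concrete reading of the three dimensions above, Fst is single-locus,
needs only allele frequencies, and needs more than one population. The
following is a minimal sketch under stated assumptions: a biallelic locus,
equally weighted populations, and no sampling correction. The function name
fst_biallelic is hypothetical; it is not part of Bio.PopGen or of the code
in the repository mentioned above.

```python
def fst_biallelic(freqs):
    """Fst for one biallelic locus from per-population allele frequencies.

    freqs: frequency of the same allele in each population (one float each).
    Assumptions (not from the thread): equal population weights and no
    correction for sample size.
    """
    if len(freqs) < 2:
        raise ValueError("Fst needs at least two populations")
    if any(p < 0.0 or p > 1.0 for p in freqs):
        raise ValueError("frequencies must lie in [0, 1]")

    n_pops = len(freqs)
    p_bar = sum(freqs) / n_pops

    # Expected heterozygosity if all populations were pooled.
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        raise ValueError("locus is monomorphic; Fst is undefined")

    # Mean expected heterozygosity within populations.
    h_within = sum(2.0 * p * (1.0 - p) for p in freqs) / n_pops

    return (h_total - h_within) / h_total


print(fst_biallelic([0.5, 0.5]))  # identical populations -> 0.0
print(fst_biallelic([1.0, 0.0]))  # fixed for different alleles -> 1.0
```

The input is a plain list of frequencies, so nothing here depends on the
marker type or on a definition of an "individual"; a marker-dependent
statistic such as Tajima's D could not be written against this interface.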

OK... but we can try :).
We could use GitHub's wiki to organize these ideas better.
I will give you a fuller answer tomorrow (or tonight).
Now, I need a bit of fresh air! :)

-- 
-----------------------------------------------------------

My Blog on Bioinformatics (italian): http://bioinfoblog.it



