[BioPython] [PopGen] HapMap

Tiago Antão tiagoantao at gmail.com
Fri Nov 14 11:49:05 UTC 2008


On Thu, Nov 13, 2008 at 6:57 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> One particular use of generating SNPs pertains to known genes or sequences.
> In such cases it would be great to use a known sequence as a base for the
> simulation. Further, it would be very useful to be able to incorporate known SNP
> data, especially frequencies, from a source like HapMap
> (http://www.hapmap.org/). A nice but harder problem is to do this based on a
> protein sequence since many diseases refer to amino acids.
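A minimal sketch of the first part of that request (a known sequence as the base, plus known SNP allele frequencies) follows. It is not Biopython code: `simulate_haplotypes` and the `{position: (alt_base, alt_freq)}` table are hypothetical, the frequencies are made up rather than taken from HapMap, and each SNP is sampled independently. That means it assumes linkage equilibrium, which real HapMap haplotypes do not satisfy.

```python
import random


def simulate_haplotypes(reference, snps, n, seed=None):
    """Draw n haplotypes based on a known reference sequence.

    reference: the known base sequence (str).
    snps: hypothetical table {0-based position: (alt_base, alt_freq)}.
    Each SNP is drawn independently (assumes linkage equilibrium).
    """
    rng = random.Random(seed)
    # Validate the SNP table before sampling anything.
    for pos, (alt, freq) in snps.items():
        if not 0 <= pos < len(reference):
            raise ValueError(f"SNP position {pos} is outside the reference")
        if not 0.0 <= freq <= 1.0:
            raise ValueError(f"allele frequency {freq} is out of range")
    haplotypes = []
    for _ in range(n):
        seq = list(reference)
        for pos, (alt, freq) in snps.items():
            # Substitute the alternative base with probability alt_freq.
            if rng.random() < freq:
                seq[pos] = alt
        haplotypes.append("".join(seq))
    return haplotypes


ref = "ACGTACGTAC"
snps = {2: ("T", 0.3), 7: ("A", 0.8)}  # made-up frequencies
for hap in simulate_haplotypes(ref, snps, 5, seed=1):
    print(hap)
```

Adding linkage between sites would mean sampling whole haplotypes (for example, phased HapMap haplotypes) instead of independent sites.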

Speaking of HapMap, and on a different front: I have some code
available to deal with HapMap data. The problem is that, for it to
be useful (performance-wise), it loads all the data into an SQL database.
That requires a schema for persistence, but I have been "ping-ponged"
over where it should live: people in Biopython say they prefer it to be
in BioSQL, and people in BioSQL say they don't care (and, this being
voluntary work, I simply don't have the patience to fight the
bureaucracy).
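To make concrete why loading the data into an SQL database helps performance, here is a minimal sketch using Python's built-in `sqlite3` module. The schema (the `snp` and `genotype` tables and their columns) and the function names are hypothetical: this is neither the HapMap code mentioned above nor a BioSQL schema, and the rs identifiers and genotypes are placeholders. The point is that an index on (chrom, position) turns a region query into an index lookup instead of a rescan of the flat files.

```python
import sqlite3

# Hypothetical minimal schema: not the author's code and not part of BioSQL.
SCHEMA = """
CREATE TABLE snp (
    rs_id    TEXT PRIMARY KEY,
    chrom    TEXT NOT NULL,
    position INTEGER NOT NULL
);
CREATE TABLE genotype (
    rs_id      TEXT NOT NULL REFERENCES snp(rs_id),
    population TEXT NOT NULL,
    individual TEXT NOT NULL,
    alleles    TEXT NOT NULL,   -- e.g. 'AG'
    PRIMARY KEY (rs_id, individual)
);
CREATE INDEX snp_region ON snp (chrom, position);
"""


def load(conn, snps, genotypes):
    """Bulk-insert SNP and genotype rows inside a single transaction."""
    with conn:
        conn.executemany("INSERT INTO snp VALUES (?, ?, ?)", snps)
        conn.executemany("INSERT INTO genotype VALUES (?, ?, ?, ?)", genotypes)


def calls_in_region(conn, chrom, start, end, population):
    """Return (rs_id, position, alleles) rows for one population in a region."""
    return conn.execute(
        """SELECT s.rs_id, s.position, g.alleles
           FROM snp s JOIN genotype g ON g.rs_id = s.rs_id
           WHERE s.chrom = ? AND s.position BETWEEN ? AND ?
             AND g.population = ?
           ORDER BY s.position, g.individual""",
        (chrom, start, end, population),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
load(conn,
     [("rs_a", "chr1", 100), ("rs_b", "chr1", 250), ("rs_c", "chr2", 50)],
     [("rs_a", "CEU", "ind1", "AG"), ("rs_b", "CEU", "ind1", "CC"),
      ("rs_c", "CEU", "ind1", "TT"), ("rs_a", "YRI", "ind2", "GG")])
print(calls_in_region(conn, "chr1", 0, 300, "CEU"))
# prints [('rs_a', 100, 'AG'), ('rs_b', 250, 'CC')]
```

Loading inside one transaction (the `with conn:` block) matters for speed: committing row by row is much slower in SQLite when there are millions of genotypes.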

> Perhaps my biggest 'disappointment' is the lack of ancestry control because
> I am also interested in families or in some admixture in a population. This just
> generates sequences randomly, assuming you are randomly selecting individuals
> from a homogeneous population. I do understand this usage, so it is not that
> important to include this here.
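Admixture, as opposed to a single homogeneous population, can be approximated in a simulation by letting each allele copy pick its source population at random. The sketch below assumes two source populations, biallelic loci in linkage equilibrium, and a per-individual ancestry fraction `q`. The function names and allele frequencies are hypothetical, and family structure is not modelled at all.

```python
import random


def admixed_genotype(p1, p2, q, rng):
    """Alt-allele count (0, 1 or 2) at one biallelic locus for a diploid
    individual whose ancestry is a fraction q from population 1.

    Each of the two allele copies independently picks its source population
    (population 1 with probability q) and is then the alt allele with that
    population's frequency (p1 or p2).
    """
    count = 0
    for _ in range(2):
        freq = p1 if rng.random() < q else p2
        if rng.random() < freq:
            count += 1
    return count


def sample_individuals(freqs1, freqs2, ancestry, seed=None):
    """One genotype row per ancestry fraction in `ancestry`, one column per locus."""
    rng = random.Random(seed)
    return [[admixed_genotype(p1, p2, q, rng) for p1, p2 in zip(freqs1, freqs2)]
            for q in ancestry]


freqs_pop1 = [0.9, 0.1, 0.5]  # made-up alt-allele frequencies, population 1
freqs_pop2 = [0.2, 0.7, 0.5]  # made-up alt-allele frequencies, population 2
ancestry = [1.0, 0.5, 0.0]    # pure pop 1, 50/50 admixed, pure pop 2
for q, row in zip(ancestry, sample_individuals(freqs_pop1, freqs_pop2, ancestry, seed=3)):
    print(q, row)
```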

You can use the Simcoal module to generate (coalescent-based)
sequences; I don't know if that helps you. The only hurdle is that
simcoal churns out data in the Arlequin format, and I still haven't got
round to finishing a parser for it (although I could raise the priority
if there is interest).
