[BioPython] [PopGen] a random Haplotype Sets generator

Giovanni Marco Dall'Olio dalloliogm at gmail.com
Thu Nov 13 16:07:51 UTC 2008


On Thu, Nov 13, 2008 at 4:29 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> Tiago Antão wrote:
>>>
>>> This is right: which word can I use, then?
>>> HaplotypesSampler? RandomHaplotypesSpawner?
>>> HaplotypesCreator?
>>>
>>
>> Considering that this is probably a small piece of code in the long
>> run (correct me if I am wrong), I suggest creating
>> Bio.PopGen.Utils.NameToBeDecided.py
>> _______________________________________________
>> BioPython mailing list  -  BioPython at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biopython
>>
>>
>
> Hi,
> I really don't mean to be negative, but you have certain responsibilities
> once you release code into the Biopython community. Part of my concern is
> that some of this is being overlooked, especially with regard to the users
> of the code. I do see that simulation of SNPs is useful for users, so it is
> important that it is integrated correctly.
>
> I think Michiel's recent comment in 'a sequence set object in biopython'
> thread is important here as well:
>
> "Adding new classes to Biopython should be done very carefully ... once
> they're in, it's difficult to remove them again. In the past, removing
> classes that turned out to be less than ideal was a real headache."
>
> While I have not looked at the code, my view is that it must remain
> integrated into the PopGen module. I would expect that a user would use some
> Biopython (PopGen) modules with some simulated SNPs. I would prefer that
> Biopython remains as much as possible a set of integrated tools rather than
> just a collection of tools. This is a clear example where, if it is not
> totally integrated, I don't see the point in including it in Biopython.
>
> The second aspect is that it must have a very stable API. Similarly to
> Michiel's comment, changing APIs after a release is also a pain,
> especially if the module has been around a long time. Based on your first
> post, I would argue that you are not quite at this stage yet.

Hey, wait :) I wasn't proposing to integrate this module into Biopython,
at least not yet! :)
This is a module to generate test sets to help the development of
other, future PopGen modules.

For example, we wanted to write a function to calculate the Fst
statistic over SNP data.
Fst is an index that tells you whether two populations follow the
same pattern of variability, and can therefore be considered two
subpopulations of the same population, or not.
To test such a script, you need a module like the one I wrote
here: for example, you could create two samples of 200 individuals
with the same frequencies at every site and see what your Fst script
reports. Then you would probably compare the results with another tool
that is already known to calculate Fst correctly.
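
To make the idea concrete, here is a minimal sketch of such a test. It is
not the module discussed in this thread and not Biopython code: the names
make_sample, allele_freqs and fst are hypothetical. It assumes haploid
individuals, biallelic sites coded 0/1, independent sites (no linkage),
equal sample sizes, and a deliberately simple heterozygosity-based Fst,
(H_T - H_S) / H_T, rather than a bias-corrected estimator.

```python
import random


def make_sample(freqs, n_individuals, rng):
    """Return n_individuals haplotypes, each a list of 0/1 alleles.

    freqs[i] is the frequency of allele 1 at site i. Sites are drawn
    independently, which is a simplifying assumption (no linkage).
    """
    return [[1 if rng.random() < p else 0 for p in freqs]
            for _ in range(n_individuals)]


def allele_freqs(sample):
    """Observed frequency of allele 1 at every site of a sample."""
    n = len(sample)
    return [sum(hap[i] for hap in sample) / n for i in range(len(sample[0]))]


def fst(sample_a, sample_b):
    """Simple Fst = (H_T - H_S) / H_T, summing heterozygosities over sites.

    Assumes both samples have the same size. Returns 0.0 when no site is
    polymorphic, because the ratio is undefined in that case.
    """
    h_s = 0.0  # mean within-population expected heterozygosity
    h_t = 0.0  # expected heterozygosity of the pooled population
    for a, b in zip(allele_freqs(sample_a), allele_freqs(sample_b)):
        mean_p = (a + b) / 2
        h_t += 2 * mean_p * (1 - mean_p)
        h_s += (2 * a * (1 - a) + 2 * b * (1 - b)) / 2
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# A seeded generator makes the test set reproducible.
rng = random.Random(42)
freqs = [0.3] * 50  # 50 sites, same frequency in both populations

pop1 = make_sample(freqs, 200, rng)
pop2 = make_sample(freqs, 200, rng)
print("Fst, same frequencies:", fst(pop1, pop2))

# Contrast: two populations with very different frequencies.
pop3 = make_sample([0.9] * 50, 200, rng)
pop4 = make_sample([0.1] * 50, 200, rng)
print("Fst, different frequencies:", fst(pop3, pop4))
```

With identical frequencies the value should be close to zero (not exactly
zero, because of sampling noise), and it should be clearly larger for the
second pair. Passing a random.Random instance instead of using the global
random functions keeps each test set reproducible from its seed.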

So I was just asking for any suggestions - which models should I
implement in this generator? And how? Which parameters should it
accept? Should it use the random module?


> Bruce



-- 
-----------------------------------------------------------

My Blog on Bioinformatics (italian): http://bioinfoblog.it



