[Biopython] weighted sampling of a dictionary

Thu Oct 27 20:29:43 UTC 2011

On Thu, Oct 27, 2011 at 1:16 PM, George Devaniranjan
<devaniranjan at gmail.com> wrote:
> Hi,
>
> I am not sure if this question is more suitable for biopython or a python
> forum.
>
>
> I have the following dictionary.
>
> dict = {'YLE': 6, 'QYL': 36, 'PTD': 32, 'AGG': 145, 'QYG': 34, 'QYD': 34,
>         'AGD': 188, 'QYS': 35, 'AGS': 177, 'AGA': 154, 'QYA': 23, 'AGL': 16,
>         'LAU': 1, 'PTA': 7, 'AGY': 7, 'QYY': 19, 'QYE': 6, 'PAT': 57,
>         'QYT': 28, 'AGT': 10, 'QYQ': 34, 'AGQ': 140, 'QYP': 32, 'AGP': 167,
>         'TAT': 31, 'SGS': 174, 'TAP': 18, 'YLP': 49, 'TAQ': 23, 'UQE': 5,
>         'UAQ': 9, 'UAT': 8, 'UAE': 7, 'TAD': 1, 'TAG': 15, 'TAA': 20,
>         'TAS': 1, 'YUP': 1, 'TAL': 45, 'ALU': 20, 'PEP': 14, 'UAG': 6,
>         'EAL': 16, 'SYY': 36, 'EAS': 35, 'SYT': 29, 'EAA': 16, 'SYQ': 13,
>         'EAG': 28}
>
> The keys are the different amino acid triplets (all possible triplets
> extracted from a culled list of PDB), the numbers next to them are the
> frequencies at which they occur.
>
> I was wondering if there is a way in biopython/python to sample them at the
> frequency indicated by the numbers next to the keys.
>
> I have only given a snippet of the triplet dictionary, the entire dictionary
> has about 1400 key entries.
>
> I would appreciate any help in this matter --thank you very much.
>
> George
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>


You could try one of these (presumably the class-based version):
http://eli.thegreenplace.net/2010/01/22/weighted-random-generation-in-python/
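
In case the link is not reachable, here is a minimal sketch of the
cumulative-sum-plus-bisect idea. The class name matches the one used in the
snippet below, but the body is my own assumption of how such a class could
look, not a copy of the code in the linked post.

```python
import bisect
import random


class WeightedRandomGenerator(object):
    """Draw indices 0..len(weights)-1 with probability proportional to weights."""

    def __init__(self, weights):
        # Running totals, e.g. weights [6, 36, 32] -> totals [6, 42, 74].
        self.totals = []
        running_total = 0
        for w in weights:
            running_total += w
            self.totals.append(running_total)

    def next(self):
        # Pick a point in [0, total) and find the first running total above it.
        rnd = random.random() * self.totals[-1]
        return bisect.bisect_right(self.totals, rnd)
```

Each call to next() costs one binary search over the running totals, so it
should stay cheap even for the full dictionary of about 1400 keys.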

you'll have something like:

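import operator

# adict is the triplet -> count dictionary from the question;
# WeightedRandomGenerator is the class from the post linked above.
# Sorting by count is optional; it only fixes the order of the two tuples.
aminos, weights = zip(*sorted(adict.items(), key=operator.itemgetter(1)))

amino_gen = WeightedRandomGenerator(weights)

nsims = 1000  # number of triplets to draw (pick whatever you need)
for i in range(nsims):  # xrange on Python 2
    idx = amino_gen.next()   # index drawn in proportion to the weights
    rand_aa = aminos[idx]    # the sampled triplet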
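
If you are on a newer Python (3.6 or later), the standard library's
random.choices should do the same job without a helper class. A rough sketch,
where the three-entry adict is just a hypothetical excerpt of your dictionary
and nsims is an arbitrary number of draws:

```python
import random
from collections import Counter

# Hypothetical three-entry excerpt of the triplet dictionary from the question.
adict = {'YLE': 6, 'QYL': 36, 'PTD': 32}

aminos = list(adict.keys())
weights = list(adict.values())

nsims = 10000
# random.choices samples with replacement, proportionally to `weights`.
samples = random.choices(aminos, weights=weights, k=nsims)

# Tally the draws; the counts should be roughly proportional to 6 : 36 : 32.
counts = Counter(samples)
print(counts.most_common())
```

Tallying the draws like this is also a quick sanity check that the sampled
frequencies follow the counts in the dictionary.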