[Biopython] weighted sampling of a dictionary

George Devaniranjan devaniranjan at gmail.com
Fri Oct 28 13:23:22 UTC 2011


Thanks, guys, for all your suggestions. I am going to try these out.

Best,
George

On Thu, Oct 27, 2011 at 4:52 PM, David Winter
<winda002 at student.otago.ac.nz> wrote:

> Hi George,
>
> I was actually doing this yesterday :)
>
> The function I came up with takes two lists:
>
> import random
>
> def weighted_sample(population, weights):
>     """ Sample from a population, given provided weights """
>     if len(population) != len(weights):
>         raise ValueError('Lengths of population and weights do not match')
>     total = float(sum(weights))  # sum once, not once per weight
>     normal_weights = [w / total for w in weights]
>     val = random.random()
>     running_total = 0
>     for index, weight in enumerate(normal_weights):
>         running_total += weight
>         if val < running_total:
>             return population[index]
>     # rounding can leave the running total just below 1.0
>     return population[-1]
>
> Which seems to do the trick:
>
> population = ['AAU' ,'AAC', 'AAG']
> weights = [2,5,3]
> sample = [weighted_sample(population, weights) for _ in range(1000)]
> sample.count('AAC') #should be about 500
>
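For many draws from a large dictionary (such as the roughly 1400-key one
mentioned in the original question), the running-total idea above can be
sketched so that the cumulative weights are built once and each draw is a
binary search. This is only a sketch under assumptions: make_sampler and
triplet_freqs are hypothetical names, not part of Biopython, and the
dictionary values are assumed to be non-negative with a positive total.

```python
import bisect
import itertools
import random


def make_sampler(freqs, rng=random):
    """Return a function that draws one key of freqs per call,
    with probability proportional to that key's value."""
    keys = list(freqs)
    # Running totals, computed once: cumulative[i] = sum of the first i+1 weights
    cumulative = list(itertools.accumulate(freqs[k] for k in keys))
    total = cumulative[-1]
    if total <= 0:
        raise ValueError('Weights must have a positive total')

    def draw():
        # Pick a point in [0, total) and find the first running total above it
        point = rng.random() * total
        index = bisect.bisect_right(cumulative, point)
        # Guard against an index one past the end due to rounding
        return keys[min(index, len(keys) - 1)]

    return draw


# Hypothetical subset of the triplet dictionary from the question
triplet_freqs = {'AGD': 188, 'AGS': 177, 'SGS': 174, 'LAU': 1}
draw = make_sampler(triplet_freqs)
sample = [draw() for _ in range(1000)]
```

Each draw costs a binary search over the cumulative list rather than a
full pass over the weights, which may matter when many samples are needed.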
> If that's too slow, check out numpy's random.multinomial() function.
>
> I haven't tested this, but this should get you the number of times you get
> each codon from 1000 "draws":
>
> import numpy as np
>
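> # split the dictionary into parallel tuples of keys and weights
> codons, weights = zip(*codon_dict.items())
> denom = float(sum(weights))
> normalised_weights = [w / denom for w in weights]
> # one count per codon, in the same order as codons
> counts = np.random.multinomial(1000, normalised_weights)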
>
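Where numpy is not available, a comparable per-triplet tally can be
sketched with the standard library alone, using random.choices (Python 3.6
and later) and collections.Counter. The names draw_counts and triplet_freqs
are hypothetical, and the small dictionary is just a subset of the one in
the original question.

```python
import random
from collections import Counter


def draw_counts(freqs, n_draws, seed=None):
    """Draw n_draws keys from freqs, weighted by value, and count them."""
    rng = random.Random(seed)  # separate generator so a seed gives repeatable runs
    keys = list(freqs)
    weights = [freqs[k] for k in keys]
    # choices() samples with replacement, proportionally to the weights
    draws = rng.choices(keys, weights=weights, k=n_draws)
    return Counter(draws)


# Hypothetical subset of the triplet dictionary from the question
triplet_freqs = {'AGD': 188, 'AGS': 177, 'SGS': 174, 'LAU': 1}
counts = draw_counts(triplet_freqs, 1000, seed=42)
for triplet, count in counts.most_common():
    print(triplet, count)
```

Keys that are never drawn are simply absent from the Counter, and looking
them up returns 0.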
> Cheers,
> David
>
>
>
> Quoting George Devaniranjan <devaniranjan at gmail.com>:
>
>> Hi,
>>
>> I am not sure if this question is more suitable for biopython or a python
>> forum.
>>
>>
>> I have the following dictionary.
>>
>> dict = {'YLE': 6, 'QYL': 36, 'PTD': 32, 'AGG': 145, 'QYG': 34, 'QYD': 34,
>>         'AGD': 188, 'QYS': 35, 'AGS': 177, 'AGA': 154, 'QYA': 23,
>>         'AGL': 16, 'LAU': 1, 'PTA': 7, 'AGY': 7, 'QYY': 19, 'QYE': 6,
>>         'PAT': 57, 'QYT': 28, 'AGT': 10, 'QYQ': 34, 'AGQ': 140, 'QYP': 32,
>>         'AGP': 167, 'TAT': 31, 'SGS': 174, 'TAP': 18, 'YLP': 49, 'TAQ': 23,
>>         'UQE': 5, 'UAQ': 9, 'UAT': 8, 'UAE': 7, 'TAD': 1, 'TAG': 15,
>>         'TAA': 20, 'TAS': 1, 'YUP': 1, 'TAL': 45, 'ALU': 20, 'PEP': 14,
>>         'UAG': 6, 'EAL': 16, 'SYY': 36, 'EAS': 35, 'SYT': 29, 'EAA': 16,
>>         'SYQ': 13, 'EAG': 28}
>>
>> The keys are the different amino acid triplets (all possible triplets
>> extracted from a culled list of PDB); the numbers next to them are the
>> frequencies at which they occur.
>>
>> I was wondering if there is a way in biopython/python to sample them at
>> the frequency indicated by the numbers next to the keys.
>>
>> I have only given a snippet of the triplet dictionary; the entire
>> dictionary has about 1400 key entries.
>>
>> I would appreciate any help in this matter. Thank you very much.
>>
>> George
>> _______________________________________________
>> Biopython mailing list  -  Biopython at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biopython
>>
>>
>
>
>


