[Biopython] PWM using gapped alignments

Chris Gowen gowencm at vcu.edu
Thu Jul 28 16:28:40 UTC 2011


Hello all,

We are trying to compute position weight matrices with the Motif.pwm() method,
but many of our alignments contain gaps, and pwm() raises a KeyError when it
looks up the key '-'. I am fairly inexperienced with this analysis technique,
but from looking at the source (quoted below), it seems the error itself could
be avoided by adding a line before line 97 that skips that letter in the
calculation. Would this mess up the calculation of the PWM scores? Has anyone
dealt with this problem in a more clever way?

Thanks for any advice you can offer.

Best,
Chris Gowen


 82  def pwm(self,laplace=True):
 83      """
 84      returns the PWM computed for the set of instances
 85
 86      if laplace=True (default), pseudocounts equal to self.background multiplied by self.beta are added to all positions.
 87      """
 88
 89      if self._pwm_is_current:
 90          return self._pwm
 91      #we need to compute new pwm
 92      self._pwm = []
 93      for i in xrange(self.length):
 94          dict = {}
 95          #filling the dict with 0's
 96          for letter in self.alphabet.letters:
 97              if laplace:
 98                  dict[letter]=self.beta*self.background[letter]
 99              else:
100                  dict[letter]=0.0
101          if self.has_counts:
102              #taking the raw counts
103              for letter in self.alphabet.letters:
104                  dict[letter]+=self.counts[letter][i]
105          elif self.has_instances:
106              #counting the occurences of letters in instances
107              for seq in self.instances:
108                  #dict[seq[i]]=dict[seq[i]]+1
109                  dict[seq[i]]+=1
110          self._pwm.append(FreqTable.FreqTable(dict,FreqTable.COUNT,self.alphabet))
111      self._pwm_is_current=1
112      return self._pwm
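
To make the idea concrete, here is a minimal standalone sketch of the gap-skipping
variant. It is not a patch to Bio.Motif: the names column_counts and normalise,
the uniform default background, and the choice to skip '-' both when seeding
pseudocounts and when counting instances are all assumptions made for
illustration.

```python
def column_counts(instances, letters="ACGT", background=None,
                  beta=1.0, laplace=True, gap="-"):
    """Per-column letter counts for equal-length instances, ignoring gaps.

    Hypothetical helper, not part of Biopython.
    """
    if background is None:
        # Assumption: uniform background over the non-gap letters.
        real = [letter for letter in letters if letter != gap]
        background = {letter: 1.0 / len(real) for letter in real}

    length = len(instances[0])
    columns = []
    for i in range(length):
        counts = {}
        for letter in letters:
            if letter == gap:
                continue  # the proposed skip: no entry for the gap symbol
            counts[letter] = beta * background[letter] if laplace else 0.0
        for seq in instances:
            if seq[i] == gap:
                continue  # a gap adds nothing to any letter's count
            counts[seq[i]] += 1
        columns.append(counts)
    return columns


def normalise(counts):
    """Turn one column of counts into frequencies (all zeros if empty)."""
    total = sum(counts.values())
    if total == 0:
        return {letter: 0.0 for letter in counts}
    return {letter: value / total for letter, value in counts.items()}


if __name__ == "__main__":
    gapped = ["AC-T", "A-GT", "ACGT"]
    for column in column_counts(gapped):
        print(normalise(column))
```

One thing this makes visible: a column containing gaps is normalised over fewer
observations than an ungapped one, so with laplace=True the pseudocounts carry
relatively more weight in exactly those columns. Whether that is acceptable for
the scores is the part I am unsure about.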


