[Biopython-dev] Calculating motif scores
Michiel de Hoon
mjldehoon at yahoo.com
Fri Jul 17 02:25:22 UTC 2009
> The function you are looking for is called search_pwm:
> search_pwm(self, sequence, normalized=0, masked=0,
> threshold=0.0, both=True)
> a generator function, returning found hits in a given
> sequence with the pwm score higher than the threshold
OK, that comes close to what I had in mind.
> Nonetheless, if you have a function in c doing just that,
> we could incorporate it into biopython, for fast exhaustive
> searches on shorter sequences.
It doesn't have to be so short. I've been running these calculations on whole mammalian chromosomes. For human chromosome 1, storing one 4-byte score per position in a Numerical Python array would take 247249719 * 4 bytes = 943 MB, which today's computers can still handle comfortably.
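To make the arithmetic concrete, here is a minimal pure-Python sketch of an exhaustive scan that stores one 4-byte float per window start. It is not the C version mentioned in this message: the matrix LOG_ODDS and the function names score_all_positions and storage_mib are made up for illustration, and the standard-library array module stands in for a Numerical Python array.

```python
from array import array

# Hypothetical log-odds matrix for a motif of length 3 (one dict per column).
# The values are invented for illustration only.
LOG_ODDS = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
]


def score_all_positions(sequence, log_odds):
    """Score every window of the sequence; return 4-byte floats, one per start."""
    m = len(log_odds)
    n_windows = max(len(sequence) - m + 1, 0)
    scores = array("f")  # typecode "f": single-precision float
    for i in range(n_windows):
        total = 0.0
        for j in range(m):
            # Letters missing from the matrix (e.g. "N") raise KeyError here;
            # a real implementation would have to decide how to handle them.
            total += log_odds[j][sequence[i + j]]
        scores.append(total)
    return scores


def storage_mib(n_positions, bytes_per_score=4):
    """Memory needed for one score per position, in units of 2**20 bytes."""
    return n_positions * bytes_per_score / 2**20


scores = score_all_positions("TACGACG", LOG_ODDS)
print(list(scores))
print(round(storage_mib(247249719)))  # length of human chromosome 1 as quoted above
```

A loop like this is far too slow in pure Python for a whole chromosome, which is the reason for doing the inner loop in C; the memory estimate is the same either way.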
I'll upload a C version to CVS so you guys can have a look and try it out.
How would you feel about having a separate PWM class in Bio.Motif? Some of the stuff currently in the class Motif is actually more about the PWM by itself; it may make sense to separate that out.
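As a rough idea of what such a separation could look like, here is a sketch of a stand-alone PWM class. Everything in it is hypothetical: the class name PWM, its constructor, and the methods score and search are assumptions made for discussion, not the existing Bio.Motif interface. The search method only mimics the threshold behaviour described for search_pwm above.

```python
class PWM:
    """Hypothetical stand-alone position-weight matrix (not the Bio.Motif API)."""

    def __init__(self, columns):
        # columns: one dict per motif position, mapping a letter to its score
        self.columns = list(columns)

    def __len__(self):
        return len(self.columns)

    def score(self, window):
        """Sum the per-position scores for a window of the same length as the PWM."""
        if len(window) != len(self):
            raise ValueError("window length must equal the PWM length")
        return sum(col[letter] for col, letter in zip(self.columns, window))

    def search(self, sequence, threshold=0.0):
        """Yield (position, score) for every window scoring above the threshold."""
        m = len(self)
        for i in range(len(sequence) - m + 1):
            s = self.score(sequence[i:i + m])
            if s > threshold:
                yield i, s


# Toy two-column matrix with invented values.
pwm = PWM([
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
])
hits = list(pwm.search("ACCAC"))
print(hits)
```

With a class like this, Motif could keep the instance-related code (counts, alignments) and hand the scoring off to a PWM object.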