[Biopython-dev] Calculating motif scores

Michiel de Hoon mjldehoon at yahoo.com
Thu Jul 16 22:25:22 EDT 2009


> The function you are looking for is called search_pwm:
> 
> search_pwm(self, sequence, normalized=0, masked=0,
> threshold=0.0, both=True)
> a generator function, returning found hits in a given
> sequence with the pwm score higher than the threshold

OK, that comes close to what I had in mind.
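
For reference, here is a rough plain-Python sketch of what a thresholded scan of this kind does. This is not the Bio.Motif code: the names log_odds_matrix and scan_hits, the uniform background of 0.25, and the pseudocount are assumptions made up for illustration, and it scans one strand only.

```python
import math

def log_odds_matrix(counts, background=0.25, pseudocount=1.0):
    """Turn per-column base counts into log2-odds scores.

    counts: list of dicts, one per motif column, mapping A/C/G/T to a count.
    The uniform background and the pseudocount are illustrative assumptions.
    """
    matrix = []
    for column in counts:
        total = sum(column[b] for b in "ACGT") + 4 * pseudocount
        matrix.append({
            b: math.log2((column[b] + pseudocount) / total / background)
            for b in "ACGT"
        })
    return matrix

def scan_hits(matrix, sequence, threshold=0.0):
    """Yield (position, score) for each window scoring above threshold."""
    m = len(matrix)
    sequence = sequence.upper()
    for i in range(len(sequence) - m + 1):
        window = sequence[i:i + m]
        try:
            score = sum(col[base] for col, base in zip(matrix, window))
        except KeyError:
            continue  # skip windows containing N or other ambiguous letters
        if score > threshold:
            yield i, score
```

Because only the hits are yielded, memory use stays small, but the scores at all other positions are thrown away.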

> Nonetheless, if you have a function in c doing just that,
> we could incorporate it into biopython, for fast exhaustive
> searches on shorter sequences.

It doesn't have to be so short. I've been running these calculations for whole mammalian chromosomes. For human chromosome 1, storing one 4-byte score per position in a Numerical Python array takes 247249719 * 4 bytes = 943 MB. This can still be comfortably handled by today's computers.
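
To make the arithmetic concrete, here is a small sketch of the "score every position" approach. It uses the standard-library array module with 4-byte floats as a stand-in for a Numerical Python array (same footprint per element); the function names are hypothetical, and this is not the C version mentioned below.

```python
from array import array

def score_all_positions(log_odds, sequence):
    """Return an array('f') holding one score per window start.

    log_odds: list of dicts, one per motif column, mapping A/C/G/T to a score.
    Windows containing any other letter (e.g. N) get NaN.
    """
    m = len(log_odds)
    n = len(sequence) - m + 1
    nan = float("nan")
    sequence = sequence.upper()
    scores = array("f")  # 4-byte C floats
    for i in range(max(n, 0)):
        total = 0.0
        for column, base in zip(log_odds, sequence[i:i + m]):
            total += column.get(base, nan)
        scores.append(total)
    return scores

def dense_storage_bytes(length, itemsize=4):
    """Bytes needed to keep one score of itemsize bytes per position."""
    return length * itemsize

# Human chromosome 1, using the length quoted above:
chr1_bytes = dense_storage_bytes(247249719)
chr1_megabytes = chr1_bytes / 2**20  # about 943
```

A pure-Python loop like this is of course far too slow for a whole chromosome; it only illustrates the output layout and the memory cost.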

I'll upload a C version to CVS so you guys can have a look and try it out.

How would you feel about having a separate PWM class in Bio.Motif? Some of the stuff currently in the class Motif is actually more about the PWM by itself; it may make sense to separate that out.

--Michiel.




      

