[Biopython-dev] Fwd: Fast instance search of motif in a sequence

Michiel de Hoon mjldehoon at yahoo.com
Wed Feb 13 02:06:33 UTC 2013


Hi Sefa,

Bio.Motif._Motif.search_instances() searches for exact instances of a motif, but it looks like your code searches for sites based on their PSSM score. If so, isn't it the same as the current code in Bio/Motif/_pwm.c (or Bio/motifs/_pwm.c)?
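
To make the distinction concrete, here is a minimal pure-Python sketch of the two kinds of search. It is not the Biopython implementation and not the code in _pwm.c; the function names (exact_instances, build_pssm, pssm_hits), the pseudocount of 1, the uniform background of 0.25 and the example sites are all assumptions chosen for illustration.

```python
import math


def exact_instances(instances, sequence):
    """Yield (position, site) wherever one of the known sites occurs verbatim."""
    width = len(instances[0])
    known = set(instances)
    for pos in range(len(sequence) - width + 1):
        window = sequence[pos:pos + width]
        if window in known:
            yield pos, window


def build_pssm(instances, alphabet="ACGT", pseudocount=1.0, background=0.25):
    """Build a log-odds matrix as a list of {letter: score} dicts, one per column.

    The pseudocount and the uniform background are illustrative assumptions.
    """
    width = len(instances[0])
    total = len(instances) + pseudocount * len(alphabet)
    pssm = []
    for col in range(width):
        counts = {letter: pseudocount for letter in alphabet}
        for site in instances:
            counts[site[col]] += 1
        pssm.append({letter: math.log2(counts[letter] / total / background)
                     for letter in alphabet})
    return pssm


def pssm_hits(pssm, sequence, threshold):
    """Yield (position, score) for every window scoring at least `threshold`."""
    width = len(pssm)
    for pos in range(len(sequence) - width + 1):
        window = sequence[pos:pos + width]
        score = sum(column[letter] for column, letter in zip(pssm, window))
        if score >= threshold:
            yield pos, score


# Hypothetical sites and sequence, for illustration only.
sites = ["TACAA", "TACGC", "TACAC"]
sequence = "GGTACAAGGTACACGGTACGAGG"

exact = list(exact_instances(sites, sequence))
scored = list(pssm_hits(build_pssm(sites), sequence, threshold=3.5))
```

In this toy example the window TACGA is not one of the input sites, so the exact search skips it, while the score-based scan reports it because it scores above the threshold.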

Best,
-Michiel.

--- On Tue, 2/12/13, Sefa Kılıç <sefakilic at gmail.com> wrote:

> From: Sefa Kılıç <sefakilic at gmail.com>
> Subject: [Biopython-dev] Fwd: Fast instance search of motif in a sequence
> To: biopython-dev at biopython.org
> Date: Tuesday, February 12, 2013, 6:18 PM
> Hi all,
> 
> I am working on comparative genomics and I frequently use the Motif
> module of Biopython. One of the operations I perform most often is to
> build a motif out of sites and search a sequence for instances that
> are similar to the motif [Bio.Motif._Motif.search_instances()].
> 
> The problem is that the sequence being searched is huge. Mostly it is
> the genome sequence itself, together with its reverse complement. For
> example, scanning the E. coli genome plus its reverse complement with
> a motif of length ~20 takes almost a minute on my machine.
> 
> To make it faster, I implemented a C version of it and a Python
> interface so that you can call it from Python. It is pretty fast: the
> same scan takes about 2.5 seconds.
> 
> Current implementation can be found at:
> 
> https://github.com/sefakilic/yassi
> 
> If anyone is interested and it is appropriate, I would like to modify
> the current implementation and integrate it into Biopython.
> 
> Thanks!
> 
> Sefa Kilic



