[Biopython-dev] Fwd: Fast instance search of motif in a sequence
Sefa Kılıç
sefakilic at gmail.com
Tue Feb 12 23:18:17 UTC 2013
Hi all,
I am working on comparative genomics and I frequently use the Motif module
of Biopython. One of the operations I perform most often is building a
motif from a set of sites and searching a sequence for instances that are
similar to the motif [Bio.Motif._Motif.search_instances()].
The problem is that the sequence being searched is huge. Usually it is a
whole genome sequence together with its reverse complement. For example,
scanning the E. coli genome plus its reverse complement with a motif of
length ~20 takes almost a minute on my machine.
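To make the operation concrete, here is a minimal pure-Python sketch of this
kind of search. It is not the Bio.Motif code: the function names, the
pseudocount, the uniform background, and the log-odds scoring are all
assumptions chosen for illustration, and it only handles uppercase A, C, G
and T.

```python
import math


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds scoring matrix from equal-length aligned sites.

    Hypothetical helper: pseudocount and uniform background are assumptions.
    """
    width = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pssm = []
    for i in range(width):
        column = [site[i] for site in sites]
        pssm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pssm


def reverse_complement(seq):
    """Return the reverse complement of an uppercase ACGT string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def search_instances(pssm, seq, threshold):
    """Yield (position, strand, score) for each window scoring >= threshold.

    Positions on the "-" strand are offsets into the reverse complement,
    not coordinates on the forward strand.
    """
    width = len(pssm)
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for pos in range(len(s) - width + 1):
            # Sum the per-position log-odds scores over the window.
            score = sum(pssm[i][s[pos + i]] for i in range(width))
            if score >= threshold:
                yield pos, strand, score


if __name__ == "__main__":
    pssm = build_pssm(["ACGT", "ACGT", "ACGA"])
    for pos, strand, score in search_instances(pssm, "TTACGTTT", 4.0):
        print(pos, strand, round(score, 2))
```

Every window on both strands is scored, so the work grows with sequence
length times motif length. For a genome of a few million bases and a motif
of length ~20, that is on the order of a hundred million table lookups in
the interpreter, which is consistent with the running time described above.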
To make it faster, I implemented a C version with a Python interface, so
that it can be called from Python. It is much faster: the same scan takes
about 2.5 seconds.
The current implementation can be found at:
https://github.com/sefakilic/yassi
If anyone is interested and it is appropriate, I would like to modify the
current implementation and integrate it into Biopython.
Thanks!
Sefa Kilic