[BioPython] Matrix search?

bartek wilczynski bartek at rezolwenta.eu.org
Thu Jul 14 11:09:17 EDT 2005


Maximilian Haeussler <maximilianh at gmail.com> wrote:

> Hi,
> 
> I'm searching for a module that can scan DNA-sequences against weight
> matrices (PWMs) from e.g. Transfac to find putative transcription
> factor binding sites. I have searched biopython.org and the mailing
> list but couldn't find anything appropriate.
> 
> There is the motif from the AlignAce package, but it's too specific
> and tailored for AlignACE. Is there really no module in Biopython
> apart from this?
> 
There is also a module called MEME. The two are somewhat redundant, and both
provide a class called Motif. That is more or less what you are looking
for.

You could try something like this:

import Bio.AlignAce.Motif as motif
from Bio.Seq import Seq
from Bio.Alphabet import IUPAC

m = motif.Motif()
a = IUPAC.unambiguous_dna
m.add_instance(Seq("ATATAT",a))
m.add_instance(Seq("ATATTT",a))
m.set_mask("******")
print m # print calls str(m), so there is no need to call __str__() directly

t = Seq("ATTATTATTATTATTATATATTT",a)

for o in m.search_instances(t): # search for exact matches
    print o

for o in m.search_pwm(t): # scan the whole sequence and score all positions
    print o

for o in m.search_pwm(t,0.5): # select only hits with score above 0.5
    print o
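
To make clearer what a PWM scan of this kind computes, here is a minimal
pure-Python 3 sketch. It does not use Bio.AlignAce, and the names build_pwm
and scan_pwm, the pseudocount of 1.0 and the uniform background of 0.25 are
all assumptions made for illustration. The log-odds scores it prints are not
on the same scale as those returned by search_pwm above.

```python
import math

ALPHABET = "ACGT"

def build_pwm(instances, pseudocount=1.0, background=0.25):
    """Build a log-odds matrix (one dict per column) from aligned sites."""
    length = len(instances[0])
    if any(len(site) != length for site in instances):
        raise ValueError("all instances must have the same length")
    total = len(instances) + pseudocount * len(ALPHABET)
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in instances]
        scores = {}
        for base in ALPHABET:
            # Pseudocounts keep unseen bases from getting a score of -infinity.
            freq = (column.count(base) + pseudocount) / total
            scores[base] = math.log(freq / background, 2)
        pwm.append(scores)
    return pwm

def scan_pwm(pwm, sequence, threshold=None):
    """Score every window of the sequence; return (start, score) pairs."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[base] for column, base in zip(pwm, window))
        if threshold is None or score >= threshold:
            hits.append((start, score))
    return hits

pwm = build_pwm(["ATATAT", "ATATTT"])
target = "ATTATTATTATTATTATATATTT"

# Report only windows that score at least as well as the background model.
for start, score in scan_pwm(pwm, target, threshold=0.0):
    print(start, round(score, 2))
```

The sketch scans one strand only and assumes the sequence contains nothing
but A, C, G and T; a real scanner would also need to handle the reverse
complement and ambiguous bases.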



> Others: There is internal support in BioPerl, plus an external BioPerl
> module called TFBS; in BioJava there are "Distributions", though they
> don't seem to support loading/saving matrices; and there is a standalone
> program called "tacg", which I found via this mailing list.

TFBS is quite a large package that is also developed and distributed
separately. It is written in Perl, so there is no quick way of incorporating
it into BioPython.
 
> 
> Hum...any other ideas? Standalone programs/external modules for
> searching transcription factor binding sites that I've missed?
> 
The thing is that if you want to scan sequences for motifs, you had better
know what you are doing. There is no single scoring function, and there are
always special cases. The code from AlignAce is very simplistic, but you should
have no problems extending it.
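
As a small illustration of why the choice of scoring function matters, the
sketch below scores the same site against the same two motif instances under
two different background models. The function name log_odds_score, the
pseudocount and the AT-rich background frequencies are hypothetical values
chosen for the example, not anything taken from AlignAce.

```python
import math

def log_odds_score(site, instances, background, pseudocount=1.0):
    """Sum over positions of log2(observed frequency / background frequency)."""
    total = len(instances) + 4 * pseudocount
    score = 0.0
    for pos, base in enumerate(site):
        count = sum(1 for inst in instances if inst[pos] == base)
        freq = (count + pseudocount) / total
        score += math.log(freq / background[base], 2)
    return score

instances = ["ATATAT", "ATATTT"]
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
at_rich = {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}  # hypothetical AT-rich genome

site = "ATATAT"
uniform_score = log_odds_score(site, instances, uniform)
at_rich_score = log_odds_score(site, instances, at_rich)

# The same site looks much less surprising against an AT-rich background.
print(round(uniform_score, 2), round(at_rich_score, 2))
```

A fixed cutoff such as 0.5 therefore means different things under different
scoring schemes, so thresholds have to be chosen together with the scoring
function.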

If you want to scan with Transfac matrices, you can check out the alibaba website:
http://www.alibaba2.com

Speaking of AlignAce, its authors also distribute something called ScanAce,
a small, quick program written in C.
-- 
regards
   Bartek Wilczynski
--
For every complex problem there is an answer that is clear, simple, and wrong. 
                   H. L. Mencken


