[BioPython] code/algorithm for pattern mining
Catherine Letondal
pise@pasteur.fr
Wed, 20 Nov 2002 14:56:50 +0100
Andreas Kuntzagk wrote:
> Hi,
>
> I want to find frequent patterns in a set of sequences. Is there any
> ready-to-use code in Biopython? If not, where can I find some
> descriptions of algorithms? I'm still not very experienced in
> bioinformatics, so I'm not sure what to look for.
>
> (I'm looking for unknown patterns!)
>
> Thanks, Andreas
>
You can use one of the pattern discovery programs available in Pise:
http://bioweb.pasteur.fr/seqanal/motif/intro-uk.html
through the Pise/Python API
(see http://www-alt.pasteur.fr/~letondal/Pise/#pisepython).
For example, to run SMILE (Structured Motif Inference and Evaluation, L. Marsan, J. Allali),
you can do this:
# -----------------------------------------------------
# ----------- run SMILE from a Pise API ---------------
import sys
from Pise import PiseFactory

factory = PiseFactory()  # you can provide an email here
# Build the SMILE job; each keyword argument is a SMILE parameter.
smile = factory.program("smile",
                        seq=sys.argv[1],  # sequences file
                        alphabet='dna.alphabet',
                        quorum=50, minlen=2, maxlen=6, subst=0)
job = smile.run()                   # run the job on the Pise server
print(job.jobid())                  # job identifier
print(job.content('smile.result'))  # display the result file
job.save('smile.result')            # save a local copy of the results
# -----------------------------------------------------
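To get a feel for what quorum, minlen, maxlen and subst control, here is a naive brute-force sketch in plain Python. It is not SMILE's algorithm (SMILE infers structured motifs, i.e. several boxes separated by spacers, and does so far more efficiently); it simply enumerates every word over the alphabet and keeps those that occur, with at most subst substitutions, in enough sequences. It assumes, as the value 50 suggests, that quorum is a percentage of the input sequences. The function names are hypothetical.

```python
# -----------------------------------------------------
# ------ naive quorum motif search (illustration) -----
from itertools import product


def occurs(motif, seq, subst):
    """True if motif matches some window of seq with at most subst mismatches."""
    k = len(motif)
    for i in range(len(seq) - k + 1):
        mismatches = sum(1 for a, b in zip(motif, seq[i:i + k]) if a != b)
        if mismatches <= subst:
            return True
    return False


def frequent_motifs(seqs, alphabet="acgt", quorum=50, minlen=2, maxlen=6, subst=0):
    """Return (motif, support) pairs for every word of length minlen..maxlen
    found in at least quorum percent of seqs (assumed meaning of quorum)."""
    needed = quorum / 100.0 * len(seqs)
    found = []
    for k in range(minlen, maxlen + 1):
        # Exhaustive enumeration: len(alphabet) ** k candidates per length,
        # which is why real tools avoid this approach.
        for letters in product(alphabet, repeat=k):
            motif = "".join(letters)
            support = sum(1 for s in seqs if occurs(motif, s, subst))
            if support >= needed:
                found.append((motif, support))
    return found
# -----------------------------------------------------
```

For instance, with seqs = ["atatatgccc", "atatatgccc", "atatatgccc", "atatagccc"], frequent_motifs(seqs, quorum=100, minlen=4, maxlen=4) returns [('atat', 4), ('gccc', 4), ('tata', 4)]. The exhaustive enumeration grows exponentially with maxlen, which gives an idea of why motif discovery can be so CPU consuming.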
SMILE can be *very* CPU-intensive, so for larger jobs please provide an email address:
factory = PiseFactory('you@somewhere')
For a first test, though, you can use this small dataset:
>s1
atatatgccc
>s2
atatatgccc
>s3
atatatgccc
>s4
atatagccc
(example results: http://bioweb.pasteur.fr/seqanal/tmp/smile/A86715103779983/)
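The script above reads the sequences from the file named in sys.argv[1], so the dataset has to be saved to a file first. A minimal sketch in plain Python, assuming one sequence line per record; the helper names are hypothetical.

```python
# -----------------------------------------------------
# ------ save the test dataset as a FASTA file --------

# The four test sequences above, as (FASTA header, sequence) pairs.
DATASET = [
    ("s1", "atatatgccc"),
    ("s2", "atatatgccc"),
    ("s3", "atatatgccc"),
    ("s4", "atatagccc"),
]


def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text, one sequence line each."""
    return "".join(">%s\n%s\n" % (name, seq) for name, seq in records)


def parse_fasta(text):
    """Parse FASTA text back into (name, sequence) pairs, joining
    multi-line sequences and skipping blank lines."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def save_fasta(records, path):
    """Write records to path, e.g. for use as the script's sys.argv[1]."""
    with open(path, "w") as handle:
        handle.write(to_fasta(records))
# -----------------------------------------------------
```

For example, save_fasta(DATASET, "smile_test.fasta") writes the file, which can then be passed to the SMILE script as: python run_smile.py smile_test.fasta (both file names are only illustrative).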
To experiment with SMILE parameters, you may want to start with the web interface:
http://bioweb.pasteur.fr/seqanal/interfaces/smile.html
(or this one: http://bioweb.pasteur.fr/seqanal/interfaces/smile2.html)
To list the available parameters: pydoc Pise.smile
To see what you can do with the results: pydoc Pise.PiseJob
--
Catherine Letondal -- Pasteur Institute Computing Center