[BioPython] code/algorithm for pattern mining

Catherine Letondal pise@pasteur.fr
Wed, 20 Nov 2002 14:56:50 +0100


Andreas Kuntzagk wrote:
> Hi, 
> 
> I want to find frequent patterns in a set of sequences. Is there any
> ready-to-use code in biopython? If not, where can I find some
> description of algorithms? I'm still not very experienced in
> BioInformatics so I'm not sure what to look for.
> 
> (I'm looking for unknown patterns!)
> 
> Thanks, Andreas
> 

You can use one of the programs available in Pise for pattern discovery:
http://bioweb.pasteur.fr/seqanal/motif/intro-uk.html
through the Pise/python API
(see http://www-alt.pasteur.fr/~letondal/Pise/#pisepython)

For example, to use SMILE (Structured Motif Inference and Evaluation, by L. Marsan and J. Allali),
you can do this:

# -----------------------------------------------------
# ----------- run SMILE from a Pise API ---------------

from Pise import PiseFactory
import sys

factory = PiseFactory()                                 # you can provide an email here
smile = factory.program("smile",
                        seq=sys.argv[1],                # sequences file 
                        alphabet='dna.alphabet',
                        quorum=50, minlen=2, maxlen=6, subst=0)
job = smile.run()
print(job.jobid())
print(job.content('smile.result'))
job.save('smile.result')
# -----------------------------------------------------


SMILE can be *very* CPU-intensive, of course; in that case, please provide an email address:

	factory = PiseFactory('you@somewhere')

but you can try with this small dataset:

>s1
atatatgccc
>s2
atatatgccc
>s3
atatatgccc
>s4
atatagccc

(results example: http://bioweb.pasteur.fr/seqanal/tmp/smile/A86715103779983/)
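
To give a rough idea of what "frequent patterns" means for this dataset, here is
a minimal sketch in plain Python. It is NOT the SMILE algorithm, only a naive
enumeration of exact substrings (like subst=0), and it assumes that the quorum is
a percentage of the sequences that must contain the pattern. The function names
parse_fasta and frequent_patterns are made up for this illustration and are not
part of Pise or Biopython.

```python
from collections import defaultdict


def parse_fasta(text):
    """Return the list of sequences found in FASTA-formatted text."""
    seqs, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current:
                seqs.append("".join(current))
                current = []
        else:
            current.append(line)
    if current:
        seqs.append("".join(current))
    return seqs


def frequent_patterns(seqs, quorum=50, minlen=2, maxlen=6):
    """Naive search: exact substrings present in >= quorum % of the sequences.

    Assumption: quorum is a percentage of sequences (hypothetical reading
    of the parameter, chosen for this sketch).
    Returns a dict {pattern: number of sequences containing it}.
    """
    support = defaultdict(set)
    for idx, seq in enumerate(seqs):
        for k in range(minlen, maxlen + 1):
            for start in range(len(seq) - k + 1):
                support[seq[start:start + k]].add(idx)
    needed = quorum / 100.0 * len(seqs)
    return {p: len(ids) for p, ids in support.items() if len(ids) >= needed}


# The small dataset from this message.
DATA = """>s1
atatatgccc
>s2
atatatgccc
>s3
atatatgccc
>s4
atatagccc
"""

seqs = parse_fasta(DATA)
patterns = frequent_patterns(seqs)

# Longest patterns first, then alphabetical order.
for pattern in sorted(patterns, key=lambda p: (-len(p), p)):
    print(pattern, patterns[pattern])
```

This enumerates every substring, so it only makes sense for tiny inputs; for
real data, and for patterns with substitutions or structure, use a dedicated
program such as SMILE.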

To experiment with the SMILE parameters, you may first want to try the web interface:
http://bioweb.pasteur.fr/seqanal/interfaces/smile.html
(or here: http://bioweb.pasteur.fr/seqanal/interfaces/smile2.html)

To list the available parameters: pydoc Pise.smile
To see what you can do with the results: pydoc Pise.PiseJob

--
Catherine Letondal -- Pasteur Institute Computing Center