[Bioperl-l] Re: Motif finding

Boris Lenhard Boris.Lenhard@cgb.ki.se
20 Feb 2002 19:41:19 +0100


> 
> Is there a way of finding motifs in a strand of DNA?
> The thing is, I want to find exact matches and some fuzzy ones (i.e. 80% exact).
> Is there a Perl module to do it?
> 
> Thanks.
> 
> Desmond
> 

Try TFBS at http://forkhead.cgb.ki.se/TFBS/ .

But please wait a few hours: I am uploading the 0.3 release tonight.

TFBS represents DNA patterns as matrices, but it has modules for
converting a set of DNA motifs to a matrix representation. In your case,
if you have, e.g., the motif "ACATTAGATTT", you would do:

   use TFBS::PatternGen::SimplePFM;

   my $patterngen =
      TFBS::PatternGen::SimplePFM->new(-sequences=>["ACATTAGATTT"]);

   my $frequency_matrix = $patterngen->pattern;
   my $weight_matrix = $frequency_matrix->to_PWM;

   # suppose you want to scan a sequence in a Bio::Seq object
   # called $seqobj, with an 80% score threshold

   my $binding_site_set = $weight_matrix->search_seq(-seqobj=>$seqobj,
                                                     -threshold=>"80%");

   # to loop through the $binding_site_set, do

   my $iterator = $binding_site_set->iterator;
   while (my $binding_site = $iterator->next) {
       # do whatever you want with $binding_site;
       # $binding_site is a TFBS::Site object,
       # which is a subclass of Bio::SeqFeature::Generic
       # and has all its functionality
   }
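
To make the matrix idea concrete, here is a minimal plain-Perl sketch
that does not use TFBS at all. It builds a position count matrix from
one or more equal-length motifs and scores a window of the same length
on a 0..1 scale between the lowest and highest score the matrix can
give. This is a simplification and an assumption on my part: a real
weight matrix uses log-odds values with background frequencies, and
TFBS may define its percentage threshold differently. The names
build_counts and relative_score are hypothetical.

```perl
use strict;
use warnings;

# Build a position count matrix: one hash of base counts per motif column.
# All motifs are assumed to have the same length.
sub build_counts {
    my @motifs = @_;
    my @matrix;
    for my $motif (@motifs) {
        my @bases = split //, uc $motif;
        for my $i (0 .. $#bases) {
            $matrix[$i]{ $bases[$i] }++;
        }
    }
    return \@matrix;
}

# Score a window (same length as the matrix) as the sum of per-column
# counts, rescaled so the worst possible window gives 0 and the best 1.
sub relative_score {
    my ($matrix, $window) = @_;
    my @bases = split //, uc $window;
    my ($score, $min, $max) = (0, 0, 0);
    for my $i (0 .. $#$matrix) {
        my @counts = map { $matrix->[$i]{$_} // 0 } qw(A C G T);
        my ($lo, $hi) = (sort { $a <=> $b } @counts)[0, -1];
        $score += $matrix->[$i]{ $bases[$i] } // 0;
        $min   += $lo;
        $max   += $hi;
    }
    return $max == $min ? 0 : ($score - $min) / ($max - $min);
}

my $matrix = build_counts("ACATTAGATTT");
# Two of the eleven positions differ from the motif, so this prints 0.82
printf "%.2f\n", relative_score($matrix, "ACATTAGAAAT");
```

With a single motif this reduces to the fraction of identical
positions; with several aligned motifs, columns where the motifs
agree count for more than variable ones.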

There are other ways to go, too.
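
For a single motif, one such way is plain Perl with no matrix at all:
slide a window along the sequence and keep every window whose fraction
of identical positions reaches the cutoff. The sketch below assumes
that "80% exact" means at least 80% identical positions, scans only the
strand it is given, and uses a hypothetical function name, find_fuzzy.
Setting the cutoff to 1 returns exact matches only.

```perl
use strict;
use warnings;

# Return [start, window, identity] for each window of $seq whose fraction
# of positions identical to $motif is at least $min_identity.
# Start positions are 1-based; only the given strand is scanned.
sub find_fuzzy {
    my ($seq, $motif, $min_identity) = @_;
    ($seq, $motif) = (uc $seq, uc $motif);
    my $len = length $motif;
    my @hits;
    for my $pos (0 .. length($seq) - $len) {
        my $window = substr $seq, $pos, $len;
        # XOR of two equal-length strings gives "\0" wherever they agree,
        # so counting "\0" bytes counts the identical positions.
        my $same     = (($window ^ $motif) =~ tr/\0//);
        my $identity = $same / $len;
        push @hits, [$pos + 1, $window, $identity]
            if $identity >= $min_identity;
    }
    return @hits;
}

# Example: report start, matched text and identity for each hit.
my @hits = find_fuzzy("GGACATTAGATTTCCACATTAGAAAT", "ACATTAGATTT", 0.8);
printf "%d\t%s\t%.2f\n", @$_ for @hits;
```

To cover the other strand as well, run the same search on the reverse
complement of the sequence.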

Cheers, 

Boris


#####################################
 Boris Lenhard, Ph.D.
 Center for Genomics and Bioinformatics
 Karolinska Institutet
 Berzelius väg 35, B322
 171 77 Stockholm, SWEDEN
 Phone: +46 (0)8 728 6142
 FAX: +46 (0)8 32 48 26
 E-mail: Boris.Lenhard@cgb.ki.se
#####################################