[Bioperl-l] seq_word and pattern counts

Tue Feb 28 22:47:01 UTC 2006

Staffa, Nick (NIH/NIEHS) [C] wrote:
> The real problem is this:
> We want to count sites in a long sequence where a restriction enzyme would cut.
> The restriction enzyme in the example I gave recognizes GGnnCC,
> that is, two Gs, then any two bases, then two Cs.
> The GCG program findpatterns will do this, but BioPerl makes certain statistics easy.
> I'm sure there is some module somewhere for this purpose. 

(Nick - please respond to me AND the bioperl-l at bioperl.org mailing list,
i.e. "Reply All", so others can benefit from the Q&A. I've already re-sent
your past responses.)

Perhaps this module?

http://doc.bioperl.org/bioperl-live/Bio/Tools/RestrictionEnzyme.html

With this code?

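use Bio::Tools::RestrictionEnzyme;

# $seqobj is assumed to be a Bio::Seq object you have already loaded.
my $enz = "GGNNCC";
# A custom enzyme is given as "name--recognition site".
my $re = Bio::Tools::RestrictionEnzyme->new(-NAME => "NicksResEnz--$enz",
                                            -MAKE => 'custom');
my @fragments = $re->cut_seq($seqobj);
# Assuming a linear sequence, n cuts give n + 1 fragments.
print "$enz cuts ", $seqobj->display_id, " ", scalar(@fragments) - 1, " times.\n";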
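
As a cross-check that does not depend on the module, the sites can also be counted with a plain regular expression on the raw sequence string. This is only a minimal sketch: count_sites is a hypothetical helper name (not part of BioPerl), the test sequence is made up, and it assumes the sequence contains only unambiguous A/C/G/T bases. Because GGNNCC is its own reverse complement, scanning one strand should be enough.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): count occurrences of
# GGNNCC in a plain sequence string, including overlapping ones.
sub count_sites {
    my ($seq) = @_;
    my $count = 0;
    # The lookahead matches without consuming any bases, so a site that
    # starts inside a previous site is still counted.
    $count++ while $seq =~ /(?=GG[ACGT]{2}CC)/gi;
    return $count;
}

# Made-up example sequence with sites GGATCC and GGCCCC.
print count_sites("AAGGATCCTTGGCCCCAA"), "\n";
```

With a Bio::Seq object, the string to pass in would be $seqobj->seq. Note that overlapping sites are counted separately here, which may or may not match how the module reports cuts.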

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/
Phone: +61 3 9905 9010