[Bioperl-l] Bio::Tools::SeqPattern

Mark, Terry tmark@amgen.com
Mon, 17 Dec 2001 09:26:17 -0800


Hi all,

I recently took a look at this module.  The documentation was pretty
sparse; as near as I can tell the module is supposed to generate strings to
be used with Perl's regexp facilities, and the syntax for the expressions is
thus exactly as with Perl.  (I am a little uncertain about this, however, as
one of the POD examples has a pattern that contains '(GXX){3,2}', which
would generate an error if the corresponding regexp were run by itself in
Perl.)
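
A minimal sketch for checking that aside on whatever perl is installed.
The helper regexp_compile_status is a hypothetical name of mine, not
anything from Bio::Tools::SeqPattern.  As far as I know the diagnostic is
version dependent: older perls die on a {n,m} quantifier with n > m, while
more recent ones only warn that it can't match, so the sketch reports
either case rather than assuming one.

```perl
use strict;
use warnings;

# Hypothetical helper: report how this perl treats a pattern when it is
# compiled as a regexp ('error', 'warning' or 'ok'), plus the diagnostic.
sub regexp_compile_status {
    my ($pattern) = @_;
    my @warnings;
    # Collect compile-time regexp warnings instead of printing them.
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    my $re = eval { qr/$pattern/ };
    return ('error',   $@)           unless defined $re;
    return ('warning', $warnings[0]) if @warnings;
    return ('ok', '');
}

for my $pattern ('(GXX){3,2}', '(GXX){2,3}') {
    my ($status, $message) = regexp_compile_status($pattern);
    chomp $message;
    print "$pattern: $status $message\n";
}
```

Only the reversed range '{3,2}' should draw a diagnostic; '{2,3}' should
compile cleanly.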

Anyway, I have found what appear to be bugs in the module.

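Consider the following stub of code:
	use strict;
	use warnings;
	use Bio::Tools::SeqPattern;

	# A DNA pattern with no ambiguity codes, only single-base classes.
	my $pattern = '[C][C][C].{0,13}A';
	print "original pattern is '$pattern'\n";
	my $bioPat = Bio::Tools::SeqPattern->new(-SEQ  => $pattern,
	                                         -TYPE => 'Dna');
	print "forward: \n";
	print $bioPat->expand . "\n";
	# Reverse complement without expanding first.
	print "reverse\n";
	print $bioPat->revcom()->expand . "\n";
	# Reverse complement, expanding ambiguity codes first.
	print "reverse (expanded)\n";
	print $bioPat->revcom(1)->expand . "\n";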

Which generates the output:
	[tmark@xena scripts]$ perl test.pl
	original pattern is '[C][C][C].{0,13}A'
	forward:
	CCC.{0,13}A
	reverse
	T.G{0,13}GG
	reverse (expanded)
	T.{0,13}GGG

Note that the plain reverse complement differs from the
expanded-before-reverse-complement output, even though the underlying
pattern contains no ambiguity codes, in which case, if I understand
correctly, both expanded strings should be the same.  In the plain reverse
complement, the '{0,13}' quantifier ends up attached to a 'G' instead of to
the '.'.
The problem seems to stem from the (here degenerate) use of character
classes, as the pattern 'CCC.{0,13}A' seems OK.
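
To make the expected result concrete, here is a minimal pure-Perl sketch of
reverse-complementing such a pattern.  It is not how Bio::Tools::SeqPattern
does it; revcom_pattern is a hypothetical helper, and it assumes the pattern
contains only single bases, '.', bracketed character classes and {n},
{n,} or {n,m} quantifiers.  The point is that each atom and its quantifier
are kept together as one unit when the order is reversed.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Tools::SeqPattern: reverse-complement
# a simple DNA regexp, treating each atom plus its quantifier as one unit.
sub revcom_pattern {
    my ($pattern) = @_;

    # An atom is a character class, a dot, or a single base; it may be
    # followed by a {n}, {n,} or {n,m} quantifier that must stay attached.
    my @tokens = $pattern
        =~ /\G((?:\[[^\]]+\]|\.|[ACGTacgt])(?:\{\d+(?:,\d*)?\})?)/g;

    # Refuse anything the tokenizer did not fully consume (groups, '|', ...).
    die "unsupported pattern: $pattern\n"
        unless join('', @tokens) eq $pattern;

    # Complement the bases inside each atom, leaving the quantifier alone.
    for my $token (@tokens) {
        my ($atom, $quant) = $token =~ /^(.*?)(\{[\d,]+\})?$/;
        $atom =~ tr/ACGTacgt/TGCAtgca/;
        $token = $atom . ($quant // '');
    }

    # Reverse the order of the units, not of the individual characters.
    return join '', reverse @tokens;
}

print revcom_pattern('[C][C][C].{0,13}A'), "\n";   # T.{0,13}[G][G][G]
print revcom_pattern('CCC.{0,13}A'), "\n";         # T.{0,13}GGG
```

With the quantifier kept on the '.', both inputs give reverse complements
that match the same sequences, which is what I would have expected from the
module.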

Has anybody else experienced problems with this module?  I'm using 0.7.2.

Thanks,
terry