Albert,

Try this; I think it will help.

Pierre


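#!/usr/bin/perl

use strict;
use warnings;

my $seq = "CGATCAACGAATCGTACGTACTC";
my $gapped_seq =
"GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";

# Allow any number of gap characters between consecutive residues:
# "CGA..." becomes the pattern "C-*G-*A-*...".
my @residues  = split( //, $seq );
my $pat       = join( '-*', @residues );
my $patRegExp = qr/$pat/;

# A capture group is used instead of $&, which slows down every regex
# in the program on perls older than 5.20. No /g is needed for one match.
if ( $gapped_seq =~ /($patRegExp)/ ) {
    # print the matching part
    print "$1\n";
}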
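
If you also need the alignment columns of the hit, the same idea can be
wrapped in a small subroutine. This is only a sketch: match_in_gapped is
a made-up name, not a BioPerl method, and it assumes '-' is the only gap
character. It escapes each residue with quotemeta and derives 1-based
coordinates from pos(), the same way Heikki's example further down does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): find $seq in $gapped,
# allowing any run of '-' between residues. Returns the matched
# substring plus its 1-based start and end columns, or an empty
# list when there is no match.
sub match_in_gapped {
    my ( $seq, $gapped ) = @_;

    # quotemeta protects residues that are regex metacharacters,
    # e.g. '*' for a stop codon in a protein sequence.
    my $pat = join '-*', map { quotemeta($_) } split //, $seq;

    # /g in scalar context makes pos() available after the match.
    return unless $gapped =~ /($pat)/g;

    my $end   = pos($gapped);             # offset just past the match
    my $start = $end - length($1) + 1;    # 1-based start column
    return ( $1, $start, $end );
}

my ( $hit, $start, $end ) = match_in_gapped(
    "CGATCAACGAATCGTACGTACTC",
    "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG"
);
print "$hit : $start - $end\n" if defined $hit;
```

The regex engine returns the leftmost match, so if $seq could align at
more than one place in the gapped sequence you would have to loop over
the matches yourself.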



On Fri, 2007-02-23 at 09:59 +0000, Albert Vilella wrote:

> Now that we are on this pattern-matching thread, I was wondering if
> any Perl guru could enlighten me on how to match an exact sequence
> pattern against a gapped target sequence. E.g.:
> 
> my $seq = "CGATCAACGAATCGTACGTACTC";
> my $gapped_seq =
> "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";
> 
> and one would like to get as a result:
> 
> "CG-------A---TC---AACGA-----ATC---GTA---CGTACTC"
> 
> which is the match of $seq but in $gapped_seq.
> 
> Cheers,
> 
>     Albert.
> 
> 
> On 2/23/07, Heikki Lehvaslaiho <heikki@sanbi.ac.za> wrote:
> > Kurt,
> >
> > There are a few things in your code to note:
> >
> > - regexp /C*T/ matches any T preceded by zero or more Cs,
> >   not what you meant
> > - $- and $+ are among the "expensive" perl variables worth
> >   not using unless you have to. Using them once in your
> >   code slows execution down considerably. There is always
> >   another way.
> > - Keep in mind what you want to use the match positions for:
> >   Human readable locations usually start counting with 1 but
> >   perl code uses 0 as the first location. The code below assumes
> >   you want to print the locations out.
> >
> > Study my example code below.
> >
> > Yours,
> >         -Heikki
> >
> > ###################################################################
> > #!/usr/bin/perl
> > use strict;
> > use warnings;
> >
> > my $seq = "GATCAAT";
> > #my $pattern = 'C*T';
> > my $pattern = 'C.*T';
> >
> > while ($seq =~ m/($pattern)/gi) {
> >
> >     my $match = $1;
> >     my $end   = pos($seq);                  # offset just past the match
> >     my $start = $end - length($match) + 1;  # 1-based start
> >
> >     print "$match : $start - $end\n";
> > }
> >
> > ###################################################################
> >
> >
> > On Thursday 22 February 2007 22:41:37 Kurt Gobain wrote:
> > > Hi everyone,
> > > I am having a lot of trouble with simple pattern matching between a
> > > sequence and a pattern. The program should be able to do two things:
> > > 1) Normal matching. For example, for GATCAAT, if TC is entered, the
> > > output should be 2. 2) Matching using a special character. In the same
> > > example, if C*T is entered, it should give 3 as output and the sequence
> > > displayed should be CAAT. The first part works easily, but the second
> > > part is giving me a problem: I get 1 as output instead of 3. The code
> > > is really simple!
> > >
> > > #!/usr/bin/perl
> > > $alphabet = "GATCAAT";
> > > $pattern=  "C*T ";
> > >
> > > $alphabet =~ /($pattern)/i;
> > >
> > > print "The entire '$pattern' match began at $-[0] and ended at $+[0]\n";
> > >
> > > ====================
> > > OUTPUT!
> > > The entire C*T match began at 1 and ended at 2
> > > ====================
> > >
> > > But the output should be 3?
> > > And is there any chance I can get the sequence too? I mean, instead of
> > > 'C*T' I need 'CAAT'.
> > >
> > > Well, it is not compulsory to use a regex, but I find it quite simple.
> > > Could there be any other method?
> > >
> > > Thanks in advance!
> > > Kurt!
> >
> >
> >
> > --
> > ______ _/      _/_____________________________________________________
> >       _/      _/
> >      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
> >     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
> >    _/  _/  _/  SANBI, South African National Bioinformatics Institute
> >   _/  _/  _/  University of Western Cape, South Africa
> >      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> > ___ _/_/_/_/_/________________________________________________________
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >

