Albert,

Try this, I think it will help...

Pierre

#!/usr/bin/perl
use strict;

my $seq = "CGATCAACGAATCGTACGTACTC";
my $gapped_seq = "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";

# allow any number of gap characters between consecutive residues
my @qqq = split ( //, $seq );
my $pat = join( '-*', @qqq );
my $patRegExp = qr/$pat/;

# capture the match in $1 rather than using $&
if ( $gapped_seq =~ /($patRegExp)/ ) {
    # print the matching part
    print $1."\n";
}

On Fri, 2007-02-23 at 09:59 +0000, Albert Vilella wrote:
> now that we are at this pattern matching thread, I was wondering if
> any perl guru could enlighten me on the issue of matching exact
> sequence patterns on a gapped target sequence. E.g.:
>
> my $seq = "CGATCAACGAATCGTACGTACTC";
> my $gapped_seq =
> "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";
>
> and one would like to get as a result:
>
> "CG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTC"
>
> which is the match of $seq but in $gapped_seq.
>
> Cheers,
>
> Albert.
>
>
> On 2/23/07, Heikki Lehvaslaiho wrote:
> > Kurt,
> >
> > There are a few things in your code to note:
> >
> > - regexp /C*T/ matches any T preceded by zero or more Cs,
> >   not what you meant
> > - $- and $+ are among the "expensive" perl functions worth
> >   not using unless you have to. Using them once in your
> >   code slows execution down considerably. There is always
> >   another way.
> > - Keep in mind what you want to use the match positions for:
> >   Human readable locations usually start counting with 1 but
> >   perl code uses 0 as the first location. The code below assumes
> >   you want to print the locations out.
> >
> > Study my example code below.
> >
> > Yours,
> > -Heikki
> >
> > ###################################################################
> > #!/usr/bin/perl
> > $seq = "GATCAAT";
> > #$pattern= 'C*T';
> > $pattern= 'C.*T';
> >
> > while ($seq =~ m/($pattern)/gi) {
> >
> >     $match = $1;
> >     $end = pos($seq);
> >     $start = $end - length($match) +1;
> >
> >     print "$match : $start - $end\n";
> > }
> >
> > ###################################################################
> >
> >
> > On Thursday 22 February 2007 22:41:37 Kurt Gobain wrote:
> > > Hi everyone,
> > > I am having a lot of trouble with simple pattern matching between a
> > > sequence and a pattern. The program should be able to do two things:
> > > 1) normal matching. For example, with GATCAAT, if TC is entered, the
> > > output should be 2. 2) matching using a special character. In the same
> > > example, if C*T is entered, it should give 3 as output and the sequence
> > > to be displayed is CAAT. I can easily get the first part, but the second
> > > part is giving me some problems: I am getting 1 as output instead of 3.
> > > The code is really simple!
> > >
> > > #!/usr/bin/perl
> > > $alphabet = "GATCAAT";
> > > $pattern= "C*T ";
> > >
> > > $alphabet =~ /($pattern)/i;
> > >
> > > print "The entire '$pattern' match began at $-[0] and ended at $+[0]\n";
> > >
> > > ====================
> > > OUTPUT!
> > > The entire C*T match began at 1 and ended at 2
> > > ====================
> > >
> > > But the output should be 3?
> > > And is there any chance I can get the sequence too? I mean, instead of
> > > 'C*T' I need 'CAAT'.
> > >
> > > Well, it is not compulsory to use a regex, but I find it quite simple.
> > > Could there be any other method?
> > >
> > > Thanks in advance!
> > > Kurt
> > >
> >
> > --
> > Heikki Lehvaslaiho    heikki at_sanbi _ac _za
> > Associate Professor   skype: heikki_lehvaslaiho
> > SANBI, South African National Bioinformatics Institute
> > University of Western Cape, South Africa
> > Phone: +27 21 959 2096   FAX: +27 21 959 2512
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
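
For anyone who wants the location of the gapped match as well as the matched text, here is a minimal sketch that combines the two ideas from this thread: the '-*' join trick for building the pattern, and the pos()/length() arithmetic for 1-based coordinates. The helper name match_in_gapped and its return convention are assumptions made for illustration; they are not part of BioPerl or of the code posted above. It also assumes that '-' is the only gap character and that the first match is the one wanted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread or from BioPerl): find $seq
# inside $gapped, allowing any number of '-' gap characters between
# consecutive residues. Returns (matched_text, start, end) with 1-based
# inclusive coordinates, or an empty list when nothing matches.
sub match_in_gapped {
    my ($seq, $gapped) = @_;

    # Escape each residue so that characters such as '.' or '*' in the
    # query are matched literally, then allow gaps between residues.
    my $pat = join '-*', map { quotemeta($_) } split //, $seq;

    # /g in scalar context records the end of the match in pos().
    if ($gapped =~ /($pat)/g) {
        my $match = $1;
        my $end   = pos($gapped);                # 1-based end of the match
        my $start = $end - length($match) + 1;   # 1-based start of the match
        return ($match, $start, $end);
    }
    return;
}

my $seq        = "CGATCAACGAATCGTACGTACTC";
my $gapped_seq = "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";

my ($match, $start, $end) = match_in_gapped($seq, $gapped_seq);
if (defined $match) {
    print "$match : $start - $end\n";
} else {
    print "no match\n";
}
```

Note that the match stops at the last residue of $seq (the final CGTACTC), so the trailing TACTC that follows it in $gapped_seq is not part of the result. Removing the gaps from the returned text with tr/-//d should give back $seq exactly, which makes a convenient sanity check.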