[Bioperl-l] regular expression help!

Barry Moore barry.moore at genetics.utah.edu
Fri Jan 21 13:51:38 EST 2005


Excellent reply.  I think we all learned something from that one.

Barry

James D. White wrote:

>Sorry about double posting, but I forgot to change the subject before
>sending the first message.
>
>
>>Starting with:
>>
>>$regex =~ /\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);})\1.*/i;
>>
>>The slashes in tr/// confused the Perl parser.  You need to use
>>different delimiters for the m// operator (the m is implied by //)
>>and the tr/// operator.  Also the tr/// operator does not use the
>>i flag, so lower case needs to be handled explicitly.  So let's
>>try the following:
>>
>>$regex =~ m:\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCGatcg/TAGCtagc/);})\1.*:i;
>>
>>This gives the error:
>>Can't modify constant item in transliteration (tr///) at (re_eval 1)
>>line 1, near "tr/ATCGatcg/TAGCtagc/)"
>>
>>Inside the (??{ CODE }) sequence, use $1, $2, ..., instead of
>>\1, \2, ... (See Programming Perl, 3rd Edition, "Match-time pattern
>>interpolation", p. 213) Inside the evaluated CODE, \2 is a
>>constant, not the value of the second captured substring.  Also I'm
>>not sure what modifying $2 would do, so let's try:
>>
>>$regex =~ m:\S+(\S+)(\S{10}).*(??{$rev = $2; $rev =~ tr/ATCGatcg/TAGCtagc/; reverse($rev);})\1.*:i;
>>
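
For anyone following along, here is a minimal sketch of the
copy-then-translate step on its own, outside the regex.  The variable
names and the sample string are made up for illustration.  It relies
on tr/// changing its target in place and returning a count rather
than the new string; capture variables such as $2 are read-only, which
is why the value is copied first.  The /r form assumes Perl 5.14 or
later.

```perl
use strict;
use warnings;

# Made-up sample; stands in for the text captured in $2.
my $arm = 'AAGCtt';

# tr/// edits its target in place and returns how many characters it
# translated, so work on a copy and keep the count separate.
my $comp  = $arm;
my $count = ($comp =~ tr/ATCGatcg/TAGCtagc/);

# reverse() only reverses a string when called in scalar context.
my $revcomp = scalar reverse $comp;

# Perl 5.14+ only: /r returns the translated copy instead of the count
# and leaves $arm untouched.
my $revcomp_r = scalar reverse($arm =~ tr/ATCGatcg/TAGCtagc/r);

print "$count characters translated; reverse complement: $revcomp\n";
```

Listing both cases in the tr/// lists, as in the post, preserves the
case of each base, which a hypothetical /i flag could not do anyway.
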
>>That regex works, but I would get rid of the leading "\S+" and the
>>trailing ".*".  The trailing ".*" adds nothing useful, so just drop
>>it.  You probably don't need the leading "\S+" either, because the
>>pattern is not anchored to the beginning of the string with "^".
>>The leading "\S+" first gobbles up the entire string, forcing the
>>match to backtrack character by character from the end.  It also
>>forces the substring saved in $1 to start after the first
>>character.  Unless you specifically want to keep the first
>>character out of $1, just drop the leading "\S+".  If you do want
>>to skip the first character, a single "\S" is enough.  This
>>results in:
>>
>>$regex =~ m:(\S+)(\S{10}).*(??{$rev = $2; $rev =~ tr/ATCGatcg/TAGCtagc/; reverse($rev);})\1:i;
>>
>>Finally I would probably change the remaining ".*" to ".*?".  If
>>you search with ".*" on a long sequence which could contain
>>multiple sequences of interest, the ".*" pattern will match the rest
>>of the sequence and force backtracking to match the first occurrence
>>of "$1$2" with the last occurrence of "revcomp($2)$1".  If you use
>>".*?", you match the first occurrence of "$1$2" with the nearest
>>occurrence of "revcomp($2)$1".  This results in the final regular
>>expression:
>>
>>$regex =~ m:(\S+)(\S{10}).*?(??{$rev = $2; $rev =~ tr/ATCGatcg/TAGCtagc/; reverse($rev);})\1:i;
>>
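
To see the effect of ".*" versus ".*?", here is a small sketch on a
made-up sequence that contains two copies of "revcomp($2)$1".  The
revcomp() helper and the test sequence are assumptions for
illustration, not part of the original post; the helper just wraps the
same copy, tr///, reverse steps so that both patterns stay readable.

```perl
use strict;
use warnings;

# Hypothetical helper: same copy + tr/// + reverse steps as in the
# regex above, moved into a sub.
sub revcomp {
    my ($seq) = @_;
    $seq =~ tr/ATCGatcg/TAGCtagc/;
    return scalar reverse $seq;
}

# Made-up sequence: direct repeat, 10-base arm, then two copies of
# spacer + revcomp(arm) + direct repeat.
my ($dr, $arm, $spacer) = ('TTA', 'AAAACCCCGG', 'CATCAT');
my $tail = $spacer . revcomp($arm) . $dr;
my $seq  = $dr . $arm . $tail . $tail;

my ($greedy_end, $lazy_end, $lazy_dr, $lazy_arm);

# Greedy ".*": pairs the first "$1$2" with the LAST "revcomp($2)$1".
if ($seq =~ m:(\S+)(\S{10}).*(??{ revcomp($2) })\1:i) {
    $greedy_end = $+[0];    # offset just past the end of the match
}

# Non-greedy ".*?": pairs it with the NEAREST "revcomp($2)$1".
if ($seq =~ m:(\S+)(\S{10}).*?(??{ revcomp($2) })\1:i) {
    ($lazy_dr, $lazy_arm, $lazy_end) = ($1, $2, $+[0]);
}

print "greedy match ends at $greedy_end, non-greedy at $lazy_end\n";
print "direct repeat: $lazy_dr, arm: $lazy_arm\n";
```

On this sequence the greedy pattern should run through the second copy
of the tail, while the non-greedy one stops after the first.
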
>>    
>>
>>>Date: Fri, 14 Jan 2005 14:12:46 -0500
>>>From: Guojun Yang <gyang at plantbio.uga.edu>
>>>Subject: [Bioperl-l] regular expression help!
>>>To: bioperl-l at portal.open-bio.org
>>>Message-ID: <20050114141246.94c7cb46 at dogwood.plantbio.uga.edu>
>>>Content-Type: text/plain;       charset="us-ascii"
>>>
>>>Hi, Everybody,
>>>I was trying to use a regex to recognize a pattern of inverted repeat DNA sequence flanked by direct repeats (see below), but it returns an error saying "(?{...}) not terminated or {...} not balanced". Can anybody help me sort this out?
>>>The regex I have is:
>>>$regex =~ /\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);})\1.*/i;
>>>Thank you,
>>>Yang
>>>
>>>      
>>>
>>--
>>James D. White   (jdw at ou.edu)
>>Director of Bioinformatics
>>Department of Chemistry and Biochemistry/ACGT
>>University of Oklahoma
>>101 David L. Boren Blvd., SRTC 2100
>>Norman, OK 73019
>>Phone: (405) 325-4912, FAX: (405) 325-7762
>>    
>>
>

-- 
Barry Moore
Dept. of Human Genetics
University of Utah
Salt Lake City, UT


