[Bioperl-l] A perl regex query

Benno Puetz puetz at mpipsykl.mpg.de
Tue Sep 18 14:12:47 UTC 2007


James Smith wrote:
>
> Neeti,
>
> This isn't really a bioperl query - but I will try and explain a simple
> solution...
>
> warn simplify( 'Cyclic-2,3-bisphospho-D-glycerate' );
>
> sub simplify {
>   local $_ = "-$_[0]-";
>         ## Quick hack: add -'s at start and end, so we always match "-string-"
>   s/-(
>     Cyclic | # The prefix "cyclic"
>     \d+    | # a single number between two "-"s
>     \d+,\d+| # number,number between two "-"s
>     \w       # a single letter between two "-"s
>   )(?=-)//ixg;  ## case-insensitive, commented, multiple matches!
>         ## 0-width +ve lookahead assertion - so can match
>         ## multiple consecutive -x- constructions in same regexp!
>   s/-//g;
>         ## remove remaining "-"s from string...
>   return $_;
>         ## return the string; otherwise the sub returns the s/// count
> }
>
> Not sure what other test strings you may want - but most should be able to
> fit in the () brackets in the first regexp of simplify
>
> James
Along the same lines:

# a test string that exercises most of the removals below
my $string = "Alpha-Cyclic-2,3-bi-sphos-1,2,5-pho-D-beta-glycerate";
my @ra_bad_terms = (  '-?(D|R|S)-',
                      '-?([aA]lpha|[bB]eta|[gG]amma)-',
                      '-?([cC]is|[tT]rans)-',
                      '-?[cC]yclic-',
                    # '-?\d+(,\d+)+-',   # uncomment to remove numbers, too
                      '(?<!\d)-' );          # '-' not preceded by a digit
print "$string\n";
foreach ( @ra_bad_terms ){

  eval { $string =~ s/$_//g; };
  print "$_:$string\n";   # for feedback only
}
#$string =~ s/<@ra_bad_terms>//g;

print lc($string),"\n";
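
The commented-out line above hints at doing everything in one substitution. A minimal sketch of that idea follows, assuming the terms can be joined into a single alternation. The names strip_terms and @bad_terms are made up for illustration. Note that a single pass is not guaranteed to give the same result as the sequential loop for every input, because each position in the string is only examined once.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original code): remove all listed
# terms from a compound name in a single pass and lowercase the result.
sub strip_terms {
    my ($name, @terms) = @_;

    # Wrap each term in (?:...) so the '|' cannot leak across terms,
    # then compile the combined pattern once.
    my $alternation = join '|', map "(?:$_)", @terms;
    my $re = qr/$alternation/;

    # Work on a copy so the caller's string is left untouched.
    (my $clean = $name) =~ s/$re//g;
    return lc $clean;
}

# Same terms as in the loop version (the optional number pattern is left out).
my @bad_terms = (
    '-?(D|R|S)-',
    '-?([aA]lpha|[bB]eta|[gG]amma)-',
    '-?([cC]is|[tT]rans)-',
    '-?[cC]yclic-',
    '(?<!\d)-',      # '-' not preceded by a digit
);

print strip_terms('Alpha-Cyclic-2,3-bi-sphos-1,2,5-pho-D-beta-glycerate',
                  @bad_terms), "\n";
```

If the order in which terms are removed matters for your names, the loop version is the safer choice.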


-- 
Benno Pütz
Statistische Genetik
Max-Planck-Institut f. Psychiatrie            Tel.: +49-89-30622-222
Kraepelinstr. 10                              Fax : +49-89-30622-601
80804 München, Germany



