[BioRuby] restriction enzyme module

Toshiaki Katayama ktym at hgc.jp
Fri Apr 6 00:55:14 EDT 2007


Trevor,

I changed the AA#to_re method to mirror the new NA#to_re.

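    def to_re(seq)
      # Ambiguity codes expand to a character class that also matches the code itself.
      replace = {
        'B' => '[DNB]',
        'Z' => '[EQZ]',
        'J' => '[ILJ]',
        'X' => '[ACDEFGHIKLMNPQRSTVWYUOX]',
      }
      # Any other unexpected character (gap signs etc.) becomes the wildcard '.'.
      replace.default = '.'

      str = seq.to_s.upcase
      # Only characters outside the unambiguous residue set are rewritten.
      str.gsub!(/[^ACDEFGHIKLMNPQRSTVWYUO]/) { |aa|
        replace[aa]
      }
      Regexp.new(str)
    end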

I also added the default value '.' to catch unexpected characters in the sequence.
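
As a quick illustration of the behavior described above, here is a minimal sketch that copies the method into a standalone script (outside of any BioRuby class, which is an assumption made only so the snippet runs on its own) and converts a short made-up peptide containing ambiguity codes and a gap sign:

```ruby
# Standalone copy of the to_re logic from this message, for illustration only.
# It is not wired into Bio::AminoAcid here; the input 'ab-zj' is a made-up example.
def to_re(seq)
  replace = {
    'B' => '[DNB]',
    'Z' => '[EQZ]',
    'J' => '[ILJ]',
    'X' => '[ACDEFGHIKLMNPQRSTVWYUOX]',
  }
  replace.default = '.'   # anything not listed above becomes a wildcard

  str = seq.to_s.upcase
  str.gsub!(/[^ACDEFGHIKLMNPQRSTVWYUO]/) { |aa| replace[aa] }
  Regexp.new(str)
end

re = to_re('ab-zj')
puts re.source                     # A[DNB].[EQZ][ILJ]
puts((re =~ 'ANKQI') ? 'match' : 'no match')   # match
puts((re =~ 'AAKQI') ? 'match' : 'no match')   # no match
```

The gap sign '-' falls through to the hash default and becomes '.', while B, Z and J expand to their character classes.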

There are two possibilities:
1. replace unexpected characters with '.' (current implementation)
2. leave the characters as they are (previous implementation)

I suppose 1. is better if the sequence contains gap signs such as '-',
which may have a different meaning in a regexp.
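
To make the trade-off between the two options concrete, here is a rough sketch comparing them. The helper names `to_re_wildcard` and `to_re_passthrough` and the constant `RESIDUES` are hypothetical, and the ambiguity-code handling is left out to keep the comparison short. A bare '-' outside a character class is matched literally by Ruby, so the sketch uses '*' and '(' instead, which do carry special meaning:

```ruby
# Hypothetical helpers for comparison only; not part of BioRuby.
RESIDUES = 'ACDEFGHIKLMNPQRSTVWYUO'

# Option 1: every unexpected character becomes the wildcard '.'.
def to_re_wildcard(seq)
  Regexp.new(seq.to_s.upcase.gsub(/[^#{RESIDUES}]/, '.'))
end

# Option 2: unexpected characters are passed to Regexp.new unchanged.
def to_re_passthrough(seq)
  Regexp.new(seq.to_s.upcase)
end

# '*' is read as a quantifier under option 2, so the pattern means
# "A followed by zero or more C" and matches far more than intended.
puts to_re_wildcard('AC*').source       # AC.
puts to_re_passthrough('AC*').source    # AC*
puts(('AG' =~ to_re_passthrough('AC*')).inspect)   # 0
puts(('AG' =~ to_re_wildcard('AC*')).inspect)      # nil

# An unbalanced '(' is not a valid pattern at all under option 2.
begin
  to_re_passthrough('A(C')
rescue RegexpError => e
  puts "RegexpError: #{e.message}"
end
puts to_re_wildcard('A(C').source       # A.C
```

Under these assumptions, option 1 always yields a valid pattern of the same length as the input, whereas option 2 can silently change the meaning of the pattern or fail to compile.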

Toshiaki


On 2007/04/06, at 8:21, Trevor Wennblom wrote:

>
> On Apr 5, 2007, at 9:55 AM, Toshiaki Katayama wrote:
>
>>
>> I'll forward the patch I sent to you.
>> Do you think this is applicable?
>
>
> Looks good to me, patch applied.  Now how about taking care of AA:
>
>      def to_re(seq)
>        str = seq.to_s.upcase
>        str.gsub!(/[^BZACDEFGHIKLMNPQRSTVWYU]/, ".")
>        str.gsub!("B", "[DN]")
>        str.gsub!("Z", "[EQ]")
>        Regexp.new(str)
>      end
>
> What if we changed it to:
>
>      def to_re(seq)
>        str = seq.to_s.upcase
>        str.gsub!(/[^BZACDEFGHIKLMNPQRSTVWYU]/, ".")
>        str.gsub!("B", "[BDN]")
>        str.gsub!("Z", "[ZEQ]")
>        str.gsub!("J", "[JIL]")
>        Regexp.new(str)
>      end
>
> Note I've added Xle (J) to the list.
>
>
> Trevor
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby


