[Biojava-l] Behavior of the createRegex() method (MotifTool class)

Matthew Pocock matthew_pocock@yahoo.co.uk
Sun, 01 Dec 2002 21:20:59 +0000


Well spotted Sylvain,

Keith, there's a method in AlphabetTools - getAllSymbols(). Feed it with
the matches() map of the symbol and concatenate the tokens from each of
these.
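
To make the idea concrete without depending on BioJava itself, here is a
minimal standalone sketch. It assumes a hard-coded IUPAC nucleotide table
in place of the alphabet's own ambiguity symbols, and the class and method
names (AmbiguityRegexSketch, classFor, toRegex) are hypothetical, not part
of MotifTools or AlphabetTools. For each motif symbol it collects every
code whose bases are all covered by that symbol and joins their tokens
into one character class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class AmbiguityRegexSketch {
    // IUPAC nucleotide codes mapped to the bases each one stands for.
    // Hard-coded here as an assumption; BioJava would supply this itself.
    static final Map<Character, String> IUPAC = new LinkedHashMap<>();
    static {
        IUPAC.put('a', "a");   IUPAC.put('c', "c");
        IUPAC.put('g', "g");   IUPAC.put('t', "t");
        IUPAC.put('r', "ag");  IUPAC.put('y', "ct");
        IUPAC.put('m', "ac");  IUPAC.put('k', "gt");
        IUPAC.put('s', "cg");  IUPAC.put('w', "at");
        IUPAC.put('h', "act"); IUPAC.put('b', "cgt");
        IUPAC.put('v', "acg"); IUPAC.put('d', "agt");
        IUPAC.put('n', "acgt");
    }

    // Hypothetical helper: the character class for one motif symbol,
    // listing every code whose base set is a subset of the symbol's.
    static String classFor(char motifSymbol) {
        String allowed = IUPAC.get(motifSymbol);
        if (allowed == null) {
            throw new IllegalArgumentException("Unknown symbol: " + motifSymbol);
        }
        StringBuilder tokens = new StringBuilder();
        for (Map.Entry<Character, String> e : IUPAC.entrySet()) {
            boolean subset = true;
            for (char base : e.getValue().toCharArray()) {
                if (allowed.indexOf(base) < 0) {
                    subset = false;
                    break;
                }
            }
            if (subset) {
                tokens.append(e.getKey());
            }
        }
        // A single token needs no brackets.
        return tokens.length() == 1 ? tokens.toString() : "[" + tokens + "]";
    }

    // Hypothetical helper: one character class per motif position.
    // Unlike createRegex(), runs are not collapsed into {n} counts.
    static String toRegex(String motif) {
        StringBuilder regex = new StringBuilder();
        for (char c : motif.toLowerCase().toCharArray()) {
            regex.append(classFor(c));
        }
        return regex.toString();
    }

    public static void main(String[] args) {
        System.out.println(classFor('n'));
        System.out.println(classFor('d'));
        System.out.println(toRegex("atgnnnngta"));
    }
}
```

With this table, n expands to a class holding all fifteen codes, while d
expands only to a, g, t and the codes built from them (r, k, w, d).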

Matthew

Keith James wrote:
>>>>>>"Sylvain" == Sylvain Foisy <sylvain.foisy@bioneq.qc.ca> writes:
> 
> 
>     Sylvain> Hi, I used the createRegex() method to return a regular
>     Sylvain> expression from a DNA sequence entered by the user, to
>     Sylvain> scan a genome for that motif. I just discovered an
>     Sylvain> interesting thing about that method: if n is in the motif
>     Sylvain> to seek, the regex will not have n as a possibility.
> 
>     Sylvain> Ok, I have that motif: atgnnnndgta.
> 
>     Sylvain> createRegex() would return: atg[atcg]{4}gta and it does
> 
>     Sylvain> What if my sequence to scan contains n: atgagcngta, for
>     Sylvain> example.  java.util.regex would not find the
>     Sylvain> pattern. Unless I am mistaken, the pattern should be
>     Sylvain> atg[atcgn]{4}gta.
> 
>     Sylvain> Am I wrong? Any input would be appreciated
> 
> You are correct about the behaviour, but not about the solution. An
> ambiguous target sequence could contain n, but could also contain r,
> y, m, k, s, w, h, b, v and d. To match correctly, the regex would have
> to take into account that the symbols represented by n are a superset
> of those represented by the other ambiguity symbols.
> 
> As MotifTools is generic (it will work for any alphabet) implementing
> generation of regexes for searching ambiguous SymbolLists requires a
> more complex algorithm than the current one. I'll take a look at this
> as soon as I can.
> 
> Keith
> 
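
The behaviour discussed above can be reproduced with java.util.regex
alone. The sketch below is only an illustration: the class name
MotifScanDemo and the helper found() are hypothetical, and the third
target (with r in place of n) is an invented example of Keith's point
that adding n alone to the class still misses other ambiguity symbols.

```java
import java.util.regex.Pattern;

class MotifScanDemo {
    // Hypothetical helper: true if the regex occurs anywhere in the target.
    static boolean found(String regex, String target) {
        return Pattern.compile(regex).matcher(target).find();
    }

    public static void main(String[] args) {
        // The regex quoted in the thread cannot match an n in the target.
        System.out.println(found("atg[atcg]{4}gta", "atgagcngta"));   // false
        // Adding n to the class fixes this particular target...
        System.out.println(found("atg[atcgn]{4}gta", "atgagcngta"));  // true
        // ...but a target carrying another ambiguity symbol is still missed.
        System.out.println(found("atg[atcgn]{4}gta", "atgagcrgta"));  // false
    }
}
```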


-- 
BioJava Consulting LTD - Support and training for BioJava
http://www.biojava.co.uk
