[Biojava-l] Behavior of the createRegex() method (MotifTool class)
Matthew Pocock
matthew_pocock@yahoo.co.uk
Sun, 01 Dec 2002 21:20:59 +0000
Well spotted, Sylvain.
Keith, there's a method in AlphabetTools - getAllSymbols(). Feed it the
matches() map of the symbol and concatenate the tokens of the symbols it
returns.
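
A rough sketch of that idea, for DNA only and in plain Java rather than
BioJava calls. Everything in it is an assumption made for illustration: the
class name, the hand-written IUPAC table and the helper methods are
hypothetical, and none of it is the MotifTools implementation. For each motif
symbol it collects every IUPAC code whose set of bases is a subset of the
motif symbol's set, and concatenates those tokens into a character class.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class AmbiguousMotifRegex {
    // Hand-written IUPAC DNA table (assumption: not read from BioJava).
    // Each code maps to the unambiguous bases it stands for.
    static final Map<Character, String> IUPAC = new LinkedHashMap<>();
    static {
        IUPAC.put('a', "a");   IUPAC.put('c', "c");   IUPAC.put('g', "g");   IUPAC.put('t', "t");
        IUPAC.put('r', "ag");  IUPAC.put('y', "ct");  IUPAC.put('m', "ac");  IUPAC.put('k', "gt");
        IUPAC.put('s', "cg");  IUPAC.put('w', "at");
        IUPAC.put('h', "act"); IUPAC.put('b', "cgt"); IUPAC.put('v', "acg"); IUPAC.put('d', "agt");
        IUPAC.put('n', "acgt");
    }

    // True if every base in 'inner' also occurs in 'outer'.
    static boolean isSubset(String inner, String outer) {
        for (char c : inner.toCharArray()) {
            if (outer.indexOf(c) < 0) return false;
        }
        return true;
    }

    // Character class for one motif symbol: the tokens of all codes whose
    // bases are a subset of the bases the motif symbol represents.
    static String charClass(char motifSymbol) {
        String bases = IUPAC.get(motifSymbol);
        if (bases == null) {
            throw new IllegalArgumentException("Not an IUPAC DNA code: " + motifSymbol);
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Character, String> e : IUPAC.entrySet()) {
            if (isSubset(e.getValue(), bases)) sb.append(e.getKey());
        }
        return sb.length() == 1 ? sb.toString() : "[" + sb + "]";
    }

    // Builds the regex, collapsing runs of the same motif symbol into {n}.
    static String toRegex(String motif) {
        String m = motif.toLowerCase();
        StringBuilder regex = new StringBuilder();
        int i = 0;
        while (i < m.length()) {
            char c = m.charAt(i);
            int run = 1;
            while (i + run < m.length() && m.charAt(i + run) == c) run++;
            regex.append(charClass(c));
            if (run > 1) regex.append('{').append(run).append('}');
            i += run;
        }
        return regex.toString();
    }

    public static void main(String[] args) {
        String regex = toRegex("atgnnnngta");
        System.out.println(regex);
        System.out.println(Pattern.compile(regex).matcher("atgagcngta").find());
    }
}
```

With the motif atgnnnngta this prints atg[acgtrymkswhbvdn]{4}gta and then
true for the target atgagcngta. A motif symbol such as r only yields [agr],
so an n in the target does not match it.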
Matthew
Keith James wrote:
>>>>>>"Sylvain" == Sylvain Foisy <sylvain.foisy@bioneq.qc.ca> writes:
>
>
> Sylvain> Hi, I used the createRegex() method to return a regular
> Sylvain> expression from a sequence of DNA inputted by the user to
> Sylvain> scan a genome for that motif. I just discovered an
> Sylvain> interesting thing about that method: if n is in the motif
> Sylvain> to seek, the regex will not have n as a possibility.
>
> Sylvain> Ok, I have that motif: atgnnnngta.
>
> Sylvain> createRegex() would return: atg[atcg]{4}gta, and it does.
>
> Sylvain> What if my sequence to scan contains n: atgagcngta, for
> Sylvain> example? java.util.regex would not find the
> Sylvain> pattern. Unless I am mistaken, the pattern should be
> Sylvain> atg[atcgn]{4}gta.
>
> Sylvain> Am I wrong? Any input would be appreciated.
>
> You are correct about the behaviour, but not about the solution. An
> ambiguous target sequence could contain n, but could also contain r,
> y, m, k, s, w, h, b, v and d. To match correctly, the regex would have
> to take into account that the symbols represented by n are a superset
> of those represented by the other ambiguity symbols.
>
> As MotifTools is generic (it will work for any alphabet), implementing
> generation of regexes for searching ambiguous SymbolLists requires a
> more complex algorithm than the current one. I'll take a look at this
> as soon as I can.
>
> Keith
>
--
BioJava Consulting LTD - Support and training for BioJava
http://www.biojava.co.uk