[Biojava-l] [1.4pre1] BioJava's-Regex with ambigous symbols
Jesse
jesse-t at chello.nl
Mon Jun 6 06:08:03 EDT 2005
Hi Cor,
Thanks for your reply.
I corrected the pattern as follows.
When BioJava's org.biojava.bio.molbio.RestrictionEnzyme.getForwardRegex()
builds the regex for a RestrictionEnzyme whose site is "gtakm", it returns
"gta[gtk][acm]", where k (G or T) and m (A or C) are ambiguous symbols.
So the ambiguous symbol "k" is expanded to "[gtk]": the "k" itself is also
placed inside the brackets, and the pattern compiler rejects it.
I solved it by simply removing all ambiguous symbols from the returned regex
string.
String searchPattern = re.getForwardRegex().replaceAll("[rymkswbdhvn]", "");
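To show what that replaceAll call does to the string, here is a minimal sketch in plain Java that needs no BioJava on the classpath. The class and method names (StripAmbiguity, stripAmbiguityCodes) are made up for illustration, and the input is hard-coded to the "gta[gtk][acm]" example above instead of coming from a real RestrictionEnzyme. It assumes the regex is lower case.

```java
public class StripAmbiguity {
    // Lower-case IUPAC ambiguity codes for DNA (everything except a, c, g, t).
    private static final String AMBIGUITY_CODES = "[rymkswbdhvn]";

    // Hypothetical helper: drops every ambiguity code from the regex string,
    // leaving only the atomic symbols and the regex punctuation.
    static String stripAmbiguityCodes(String regex) {
        return regex.replaceAll(AMBIGUITY_CODES, "");
    }

    public static void main(String[] args) {
        // Stand-in for the value returned by re.getForwardRegex().
        String forwardRegex = "gta[gtk][acm]";
        System.out.println(stripAmbiguityCodes(forwardRegex)); // prints gta[gt][ac]
    }
}
```

The brackets already list the atomic alternatives, so dropping the ambiguity letter does not change which sequences the pattern matches.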
Regards,
Jesse
-----Original Message-----
From: Cor
Subject: RE: [Biojava-l] [1.4pre1] BioJava's-Regex with ambigous symbols
Hi Jesse,
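Although I am a newbie myself, I have written some example code based on
existing BioJava test code:
import org.biojava.bio.seq.DNATools;
import org.biojava.bio.symbol.SymbolList;
import org.biojava.utils.regex.Matcher;
import org.biojava.utils.regex.Pattern;
import org.biojava.utils.regex.PatternFactory;

// Sequence to search; it may contain ambiguous symbols such as n.
String symbols = "atgcgacgtcttaannnnnnatgcaac";
SymbolList sl = DNATools.createDNA(symbols);
// Pattern to search for; only atomic symbols inside the brackets.
String patternString = "g[ag]cg[ct]c";
PatternFactory fact = PatternFactory.makeFactory(DNATools.getDNA());
Pattern pattern = fact.compile(patternString);
Matcher matcher = pattern.matcher(sl);
if (matcher.find()) {
    System.out.println("match found");
} else {
    // fail() comes from JUnit, since this is adapted from a test case.
    fail("failed to find target ");
}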
In the pattern, you have to use [ag] instead of [agr]. Otherwise you will
get the error:
org.biojava.utils.regex.RegexException: all variant symbols must be atomic.
at
org.biojava.utils.regex.PatternChecker.parseVariantSymbols(PatternChecker.java:363)
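A pattern string can be checked for this problem before it is handed to the factory. The following is only a rough sketch in plain Java: it uses java.util.regex (not BioJava's Pattern and Matcher), the class and method names are hypothetical, and it assumes lower-case DNA patterns whose variants are written as simple [...] groups. It is not how BioJava's PatternChecker works internally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AtomicVariantCheck {
    // Matches one bracketed group and captures the symbols inside it.
    private static final Pattern BRACKET_GROUP = Pattern.compile("\\[([^\\]]*)\\]");

    // Hypothetical helper: true if every bracketed group holds only a, c, g or t.
    static boolean hasOnlyAtomicVariants(String patternString) {
        Matcher m = BRACKET_GROUP.matcher(patternString);
        while (m.find()) {
            if (!m.group(1).matches("[acgt]*")) {
                return false; // an ambiguity code such as r or k is present
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(hasOnlyAtomicVariants("g[ag]cg[ct]c"));  // prints true
        System.out.println(hasOnlyAtomicVariants("g[agr]cg[ct]c")); // prints false
    }
}
```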
Regards,
Cor