[Biojava-l] Pattern matching
Jerome LANE
Jerome.Lane at igh.cnrs.fr
Mon Jun 25 18:24:10 UTC 2007
Hi,
I have used the BioJava Pattern class to match a DNA sequence, but I can't
find all the matches in my sequence. For example, here is a bit of code that I
have written to search for the pattern "aa" in the DNA sequence "aaaa":
-----------------------------------
// Imports needed at the top of the file:
// import org.biojava.bio.seq.DNATools;
// import org.biojava.bio.symbol.FiniteAlphabet;
// import org.biojava.bio.symbol.SymbolList;
// import org.biojava.utils.regex.PatternFactory;
try {
    // Variables needed...
    org.biojava.utils.regex.Matcher occurences;
    FiniteAlphabet IUPAC = DNATools.getDNA();
    SymbolList WorkingSequence = DNATools.createDNA("aaaa");
    // Create pattern using pattern factory.
    org.biojava.utils.regex.Pattern pattern;
    PatternFactory FACTORY = PatternFactory.makeFactory(IUPAC);
    try {
        pattern = FACTORY.compile("aa");
    } catch (Exception e) { e.printStackTrace(); return; }
    System.out.println("Searching for: " + pattern.patternAsString());
    // Obtain iterator of matches.
    try {
        occurences = pattern.matcher(WorkingSequence);
    } catch (Exception e) { e.printStackTrace(); return; }
    // For each match, print the sequence, the start position and the matched symbols.
    while (occurences.find()) {
        System.out.println("Match: " + "\t" + WorkingSequence
            + "\n" + occurences.start() + "\t"
            + occurences.group().seqString());
    }
} catch (Exception ex) {
    ex.printStackTrace();
    System.exit(1);
}
----------------------------
And this is the output:
----------------------------
Searching for: aa
Match: org.biojava.bio.symbol.SimpleSymbolList@ea82ff69 length: 4
1 aa
Match: org.biojava.bio.symbol.SimpleSymbolList@ea82ff69 length: 4
3 aa
--------------------------------
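But for the input sequence "aaaa" I should get 3 matches, at positions 1, 2
and 3. The matcher seems to resume searching after the end of the previous
match, so the overlapping match at position 2 is skipped. Is there any
parameter I can set to get overlapping matches as well?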
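
To make the expected result concrete, here is a minimal sketch of overlapping matching written against the standard java.util.regex classes rather than BioJava. It restarts the search one position after the start of each match instead of after its end. The class and method names (OverlappingMatches, findOverlapping) are made up for illustration, and whether org.biojava.utils.regex.Matcher offers an equivalent find(int) is an assumption that would need checking against the BioJava javadoc.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlappingMatches {
    // Returns the 1-based start position of every match, overlapping ones included.
    static List<Integer> findOverlapping(String regex, String text) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        int from = 0;
        // find(int) resets the matcher and searches from the given index,
        // so restarting one past the previous start picks up overlaps.
        while (from <= text.length() && m.find(from)) {
            starts.add(m.start() + 1); // 1-based, like the positions printed above
            from = m.start() + 1;
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(findOverlapping("aa", "aaaa")); // prints [1, 2, 3]
    }
}
```

A plain while (m.find()) loop on the same input would report only positions 1 and 3, which is the behaviour shown in the output above.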
Best regards
Jerome