[Biojava-l] Pattern matching

Jerome LANE Jerome.Lane at igh.cnrs.fr
Mon Jun 25 18:24:10 UTC 2007


Hi,

I have used the biojava Pattern class to match DNA sequences, but I can't
find all the matches in my sequence. For example, here is a bit of code I
wrote to search for the pattern "aa" in the DNA sequence "aaaa":

-----------------------------------
import org.biojava.bio.seq.DNATools;
import org.biojava.bio.symbol.FiniteAlphabet;
import org.biojava.bio.symbol.SymbolList;
import org.biojava.utils.regex.Matcher;
import org.biojava.utils.regex.Pattern;
import org.biojava.utils.regex.PatternFactory;

public class PatternSearch {
    public static void main(String[] args) {
        try {
            // Build the DNA sequence to search.
            FiniteAlphabet dna = DNATools.getDNA();
            SymbolList workingSequence = DNATools.createDNA("aaaa");

            // Compile the "aa" pattern with a factory for the DNA alphabet.
            PatternFactory factory = PatternFactory.makeFactory(dna);
            Pattern pattern = factory.compile("aa");
            System.out.println("Searching for: " + pattern.patternAsString());

            // Walk through the matches; each find() moves to the next one.
            Matcher occurrences = pattern.matcher(workingSequence);
            while (occurrences.find()) {
                System.out.println("Match: \t" + workingSequence
                        + "\n" + occurrences.start() + "\t"
                        + occurrences.group().seqString());
            }
        } catch (Exception ex) {
            ex.printStackTrace();
            System.exit(1);
        }
    }
}
----------------------------

And this is the output:

----------------------------
Searching for: aa
Match:     org.biojava.bio.symbol.SimpleSymbolList@ea82ff69 length: 4
1    aa
Match:     org.biojava.bio.symbol.SimpleSymbolList@ea82ff69 length: 4
3    aa
--------------------------------
But for the input sequence "aaaa" I expected 3 matches, at positions 1, 2
and 3. Is there a parameter I can set to get all of them?
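The output above (matches at 1 and 3 only) is consistent with standard regex `find()` semantics, where each search resumes after the end of the previous match, so the overlapping match starting at position 2 is never tried. The sketch below uses plain `java.util.regex` on a `String` instead of biojava, assuming that the biojava `Matcher` behaves the same way, as its output suggests. It contrasts the plain `find()` loop with a loop that restarts the search one position after each match start using `java.util.regex.Matcher.find(int)`. The class `OverlappingMatches` and its methods are hypothetical names. Whether biojava's own `Matcher` offers an equivalent restart method has not been checked here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating non-overlapping vs overlapping matching
// with java.util.regex (not the biojava API).
class OverlappingMatches {

    // Plain find() loop: each search resumes after the END of the previous
    // match, so matches never overlap.
    static List<Integer> nonOverlapping(String seq, String regex) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(seq);
        while (m.find()) {
            starts.add(m.start() + 1); // 1-based, like the biojava output
        }
        return starts;
    }

    // Restart the search one position after the previous match START,
    // so overlapping matches are reported as well.
    static List<Integer> overlapping(String seq, String regex) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(seq);
        int from = 0;
        // find(int) throws if 'from' exceeds the input length, hence the guard.
        while (from <= seq.length() && m.find(from)) {
            starts.add(m.start() + 1); // 1-based
            from = m.start() + 1;      // next search begins just after this start
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(nonOverlapping("aaaa", "aa")); // prints [1, 3]
        System.out.println(overlapping("aaaa", "aa"));    // prints [1, 2, 3]
    }
}
```

The restart approach only relies on being able to begin a search at an arbitrary offset, so it does not require lookahead support in the pattern language itself.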

Best regards

Jerome


