[Biojava-l] How to use SuffixTree?
Matthew Pocock
matthew_pocock@yahoo.co.uk
Wed, 25 Sep 2002 13:35:53 +0100
Hi,
The suffix tree code is quite old now and could probably do with an
overhaul. Given a SymbolList symL, you can count how often it occurs in
the tree (zero means it is not there) by doing something like:
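  int countWord(SuffixTree suffTree, SymbolList symL) {
    // start at the root and descend one level per symbol
    SuffixTree.SuffixNode node = suffTree.getRoot();
    // SymbolList positions are 1-based
    for(int i = 1; i <= symL.length(); i++) {
      int sym = suffTree.indexForSymbol(symL.symbolAt(i));
      if(node.hasChild(sym)) {
        node = suffTree.getChild(node, sym);
      } else {
        // no such branch, so the word never occurs
        return 0;
      }
    }
    // the node reached holds the count for the whole word
    return (int) node.getNumber();
  }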
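
For anyone who wants to try the same walk without BioJava on the classpath, here is a minimal sketch of the idea on plain strings. CountingTrie, addText and doesOccur are made-up names for illustration, not BioJava API, and the sketch assumes the tree simply counts every word up to a fixed window length.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a counting suffix tree; this is not the BioJava class.
class CountingTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        int count; // number of times the word ending at this node was seen
    }

    private final Node root = new Node();

    // Record every word of length 1..window that starts at each position of text.
    void addText(String text, int window) {
        for (int start = 0; start < text.length(); start++) {
            Node node = root;
            int end = Math.min(text.length(), start + window);
            for (int i = start; i < end; i++) {
                node = node.children.computeIfAbsent(text.charAt(i), c -> new Node());
                node.count++;
            }
        }
    }

    // Same walk as countWord above: follow one child per symbol, return 0 on a miss.
    int countWord(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.get(word.charAt(i));
            if (node == null) {
                return 0;
            }
        }
        return node.count;
    }

    // The doesOccur(pattern) asked about in the quoted mail falls out of the count.
    boolean doesOccur(String word) {
        return countWord(word) > 0;
    }
}
```

Note that in this sketch a word longer than the window is always reported as absent, because nothing that deep was ever stored.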
The suffix tree interfaces are a bit sucky. The indexing should be
moved out to an AlphabetIndex delegate, and the node/tree API split is a
bit silly. Also, the implementation for these trees gets too big too
quickly. Mmm.
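
To put a rough number on "too big too quickly": assuming the tree keeps one node per distinct word up to some window length (an assumption about the layout, not something read off the BioJava source), the worst case is a node for every possible word. TrieSizeBound and maxNodes are hypothetical names used only for this back-of-envelope sketch.

```java
// Hypothetical helper for a back-of-envelope estimate; not part of BioJava.
class TrieSizeBound {
    // Worst-case node count below the root:
    // alphabetSize^1 + alphabetSize^2 + ... + alphabetSize^window.
    static long maxNodes(int alphabetSize, int window) {
        long total = 0;
        long level = 1;
        for (int depth = 1; depth <= window; depth++) {
            level = Math.multiplyExact(level, alphabetSize); // nodes at this depth
            total = Math.addExact(total, level);
        }
        return total;
    }
}
```

Under that assumption, maxNodes(4, 10) is 1398100 for DNA with a window of 10, and maxNodes(20, 5) is 3368420 for protein with a window of 5. Real sequences will usually fill only part of this, since each position in the input can add at most window nodes.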
Matthew
hannah schmidt-glenewinkel wrote:
> I was happy to see that there is a SuffixTree-class in biojava...but now I'm
> just not sure how to use it.
> I think I understood the concept of a suffix tree in general: it holds
> references to all suffixes of a given string, so that I can search very
> fast for a pattern that may or may not occur in that String.
>
> So shouldn't the SuffixTree-class provide methods like:
> boolean doesOccur(String pattern) or
> int occursAt(String pattern)
>
> I'd just like to know what I can actually do with a SuffixTree once I
> have created it.
> Thank you very much for any help!
>
> Hannah
>
--
BioJava Consulting LTD - Support and training for BioJava
http://www.biojava.co.uk