[Biojava-l] RE: Bug in HashedAlphabetIndex??
Matthew Pocock
mrp@sanger.ac.uk
Wed, 07 Mar 2001 12:24:54 +0000
Hi Mark,
This looks like something I should sort out.
If you build an alphabet that represents the cross-product of other
alphabets, its size is the product of the sizes of the alphabets you
combine. This can get very large, especially for alignments
(protein^10 = 20^10 = 1.024e13 symbols). In effect, this is the same
issue that makes alignment algorithms computationally expensive when
aligning any reasonable number of sequences simultaneously. We can't
be expected to hold this number of objects in memory, so there are
some optimized implementations of FiniteAlphabet that attempt to make
symbols 'appear' when needed and vanish when discarded. There is
obviously something up in the magic.
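To make the size arithmetic and the 'appear when needed' idea concrete, here is a minimal sketch in plain Java. It does not use BioJava and is not how the BioJava classes are implemented. The names CrossProductSketch, size, decode and encode are hypothetical, chosen only for illustration. The sketch assumes every component alphabet has the same size and treats a cross-product symbol as a tuple of per-position indices that is computed from a single long index on demand rather than stored.

```java
import java.util.Arrays;

public class CrossProductSketch {

    // Number of symbols in the cross product of `copies` alphabets,
    // each holding `alphabetSize` symbols. Throws ArithmeticException
    // if the result does not fit in a long.
    static long size(int alphabetSize, int copies) {
        long n = 1;
        for (int i = 0; i < copies; i++) {
            n = Math.multiplyExact(n, alphabetSize);
        }
        return n;
    }

    // Build the tuple for one cross-product symbol on demand by reading
    // the index as a number written in base `alphabetSize`.
    static int[] decode(long index, int alphabetSize, int copies) {
        int[] tuple = new int[copies];
        for (int pos = copies - 1; pos >= 0; pos--) {
            tuple[pos] = (int) (index % alphabetSize);
            index /= alphabetSize;
        }
        return tuple;
    }

    // Inverse of decode: map a tuple back to its index.
    static long encode(int[] tuple, int alphabetSize) {
        long index = 0;
        for (int digit : tuple) {
            index = index * alphabetSize + digit;
        }
        return index;
    }

    public static void main(String[] args) {
        // protein^10: far too many symbols to hold as objects
        System.out.println(size(20, 10));        // 10240000000000

        // DNA^3: symbol number 27 out of 4^3 = 64
        int[] tuple = decode(27, 4, 3);
        System.out.println(Arrays.toString(tuple)); // [1, 2, 3]
        System.out.println(encode(tuple, 4));       // 27
    }
}
```

Nothing here is kept in memory between calls, which is the property the sparse implementations are after. A real implementation also has to hand back the same symbol object for the same tuple while that object is still in use.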
I'll get back to you when it's fixed.
Matthew
Schreiber, Mark wrote:
> Actually after looking at the debugger I find that the FiniteAlphabet
> produced by the statement
>
> //create a cross product of N dna alphabets
> FiniteAlphabet nOrderAlpha =
>     (FiniteAlphabet) AlphabetManager.getCrossProductAlphabet(
>         Collections.nCopies(order.intValue(), DNATools.getDNA())
>     );
>
> is very different depending on the value returned by order.intValue(). If it
> is 3 then a shiny happy SimpleCrossProduct object is returned; if it is
> larger than 4 a SparseCrossProduct object is returned??
>
> Is this a "feature"??
>
> Mark