[Biojava-l] equals() method for SymbolList
Keith James
kdj@sanger.ac.uk
11 Oct 2002 16:47:08 +0100
>>>>> "Phillip" == Phillip Lord <p.lord@russet.org.uk> writes:
>>>>> "Matthew" == Matthew Pocock <matthew_pocock@yahoo.co.uk> writes:
Matthew> SymbolList should be behaving like a string over its
Matthew> symbols. It is silly if it doesn't do this. Hash codes
Matthew> should really be calculated in a different (but
Matthew> sequence-dependent) way to avoid scanning the whole of
Matthew> very large sequences just to do a hash lookup. Anyone got
Matthew> any ideas?
Phillip> Just make the hash out of say the first 10 elements in
Phillip> the list. The hashcode is not meant to be unique for all
Phillip> sequences, it's just a performance enhancement. So long
Phillip> as equals returns false for different sequences, then
Phillip> there is no problem.
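A minimal Java sketch of the prefix-hash idea quoted above. It uses a plain char array as a stand-in for a SymbolList; the class name PrefixHashedSeq and the constant PREFIX are hypothetical and not part of the BioJava API.

```java
import java.util.Arrays;

// Hypothetical stand-in for a SymbolList; not part of BioJava.
final class PrefixHashedSeq {
    // Number of leading symbols that contribute to the hash (assumed value).
    private static final int PREFIX = 10;

    private final char[] symbols;

    PrefixHashedSeq(String s) {
        this.symbols = s.toCharArray();
    }

    // Full symbol-by-symbol comparison: this is what guarantees correctness.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PrefixHashedSeq)) return false;
        return Arrays.equals(symbols, ((PrefixHashedSeq) o).symbols);
    }

    // Hash only the first PREFIX symbols, so the cost is bounded no matter
    // how long the sequence is. Equal sequences still get equal hashes.
    @Override
    public int hashCode() {
        int h = 1;
        int n = Math.min(PREFIX, symbols.length);
        for (int i = 0; i < n; i++) {
            h = 31 * h + symbols[i];
        }
        return h;
    }
}
```

Two sequences that share their first 10 symbols collide in a hash table, and equals() then decides between them, so the result stays correct and only the lookup gets slower.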
In a similar vein, the array sampling techniques at
http://www273.pair.com/med/columns/Durable6.html
would work, but equals would get called more often for sequences with
similar base composition. How about hashing the first 10 symbols and
then adding in the values at just the indices that are powers of two?
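Roughly, that could look like the Java sketch below. It works on a 0-based char array as a stand-in for a SymbolList; the class SampledHash and its hash method are hypothetical names, not BioJava API, and a real SymbolList is indexed from 1, so the indices would shift.

```java
// Hypothetical helper; not part of BioJava.
final class SampledHash {
    // Number of leading symbols always included in the hash (assumed value).
    private static final int PREFIX = 10;

    private SampledHash() {}

    static int hash(char[] symbols) {
        int h = 1;

        // First PREFIX symbols (indices 0..9), or fewer for short sequences.
        int n = Math.min(PREFIX, symbols.length);
        for (int i = 0; i < n; i++) {
            h = 31 * h + symbols[i];
        }

        // Powers of two below 10 are already covered by the prefix, so
        // continue from 16. A long counter avoids int overflow on doubling.
        for (long i = 16; i < symbols.length; i <<= 1) {
            h = 31 * h + symbols[(int) i];
        }
        return h;
    }
}
```

This samples about 10 + log2(n) symbols for a sequence of length n, so the cost stays small even for whole chromosomes. Sequences that differ only at unsampled positions still collide and fall through to equals().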
Keith
--
- Keith James <kdj@sanger.ac.uk> bioinformatics programming support -
- Pathogen Sequencing Unit, The Wellcome Trust Sanger Institute, UK -