[Biojava-l] ssaha
Matthew Pocock
matthew_pocock@yahoo.co.uk
Thu, 07 Mar 2002 13:35:25 +0000
Dear all,
We now have a src-1.4 directory in BioJava - courtesy of Thomas. If you
are on a 1.4 compliant platform, this source tree will be built along
with src. If you are on an earlier platform, it will be silently ignored.
The first addition to this directory is an implementation of the SSAHA
searching algorithm developed at the Sanger Centre. It currently doesn't
scale (being bound by a 2GB limit on hash-table size, and I'm not sure
that the NIO packages in the 1.4 release are bug-free). I will be
working on it to ensure that it can handle the full 2^64 byte data
tables available via the C++ implementations. The Java and C++ hash
tables are unlikely to be binary compatible. The Java hash tables should
be network portable, assuming that you move them as binary ;-)
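
For anyone unfamiliar with the idea, here is a rough sketch of a
SSAHA-style word index. It is not the code in src-1.4: the class and
method names are made up for illustration, it keeps everything in an
in-memory HashMap rather than NIO buffers, and it uses Java features
newer than 1.4. It assumes the usual scheme of hashing non-overlapping
words of the subject and looking up every overlapping word of the query.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy SSAHA-style index. Hypothetical names, not the BioJava classes. */
class SsahaSketch {
    private final int wordLength;
    // packed word -> start offsets of that word in the indexed sequence
    private final Map<Integer, List<Integer>> table = new HashMap<>();

    SsahaSketch(int wordLength) {
        // 2 bits per base, so at most 15 bases fit in a non-negative int
        if (wordLength < 1 || wordLength > 15) {
            throw new IllegalArgumentException("wordLength must be 1..15");
        }
        this.wordLength = wordLength;
    }

    /** Packs a DNA word at 2 bits per base; returns -1 for non-ACGT symbols. */
    static int pack(CharSequence seq, int start, int length) {
        int packed = 0;
        for (int i = 0; i < length; i++) {
            int code;
            switch (Character.toUpperCase(seq.charAt(start + i))) {
                case 'A': code = 0; break;
                case 'C': code = 1; break;
                case 'G': code = 2; break;
                case 'T': code = 3; break;
                default: return -1;
            }
            packed = (packed << 2) | code;
        }
        return packed;
    }

    /** Indexes the non-overlapping words of the subject sequence. */
    void index(CharSequence subject) {
        for (int pos = 0; pos + wordLength <= subject.length(); pos += wordLength) {
            int word = pack(subject, pos, wordLength);
            if (word >= 0) {
                table.computeIfAbsent(word, k -> new ArrayList<>()).add(pos);
            }
        }
    }

    /** Looks up every overlapping query word; each hit is {queryOffset, subjectOffset}. */
    List<int[]> search(CharSequence query) {
        List<int[]> hits = new ArrayList<>();
        for (int q = 0; q + wordLength <= query.length(); q++) {
            int word = pack(query, q, wordLength);
            List<Integer> positions = (word < 0) ? null : table.get(word);
            if (positions == null) {
                continue;
            }
            for (int s : positions) {
                hits.add(new int[] {q, s});
            }
        }
        return hits;
    }
}
```

Indexing "ACGTACGTTTTT" with a word length of 4 and searching for
"GGACGT" gives two hits, both at query offset 2, against subject
offsets 0 and 4. Swapping out pack() is where a different alphabet
would plug in.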
If someone is keen, they could write a NIO-based socket server for the
SSAHA search engine so that we could set up highly efficient
client-server search services (should be able to handle 1000s of clients
with NIO and a thread-pool). Also, it currently reports hits but not as
collections of HSPs. There is the possibility of doing bounded
alignments using SSAHA hits as anchor points. By replacing the Packing
object, we could do a codon based SSAHA, a protein SSAHA, or any other
funky alphabet you can come up with. The rules for discarding frequent
words are bad at the moment (absolute threshold), so this could be
replaced with some nice histogram maths. I don't have the time to tidy
all of this, but perhaps you do.
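
As one possible starting point for the frequent-word rule, here is a
small sketch that derives the cutoff from the observed word counts
instead of using a fixed number. This is only one guess at what "nice
histogram maths" could look like (a simple quantile); the class and
method names are hypothetical and nothing like this exists in the
source tree yet.

```java
import java.util.Arrays;

/** Hypothetical helper: pick a word-frequency cutoff relative to the data. */
class WordCutoffSketch {
    /**
     * Returns the smallest count c such that at least the given fraction of
     * distinct words occur c times or fewer. Words that occur more often
     * than c would then be discarded from the hash table.
     */
    static int cutoff(int[] wordCounts, double fraction) {
        if (wordCounts.length == 0) {
            throw new IllegalArgumentException("no word counts given");
        }
        if (fraction <= 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in (0, 1]");
        }
        int[] sorted = wordCounts.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(fraction * sorted.length) - 1;
        return sorted[rank];
    }
}
```

With counts {1, 1, 2, 2, 3, 3, 4, 5, 50, 900} and a fraction of 0.5,
the cutoff comes out as 3, so the threshold follows the shape of the
distribution rather than being a fixed number.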
NIO rocks!
Have fun,
Matthew