[GSoC] GSoC 2013 is ON
Ketil Malde
ketil at malde.org
Tue Apr 2 04:03:58 EDT 2013
[CC everybody including the biohaskell list. Let me know if any of you
want off. :-) ]
Pjotr Prins <pjotr2010 at thebird.nl> writes:
> http://www.open-bio.org/wiki/Google_Summer_of_Code
> For Biopython (3x), BioRuby (5x) and BioJava (4x) I found project ideas.
> The others are missing.
> There is still a (rather small) window of opportunity for adding
> ideas.
I have one thing that might work well as a SOC project, if the right
student could be found.
Basically, a colleague and I recently developed and published a method
and implementation for more sensitive pairwise alignments. The paper is
here, I think (PLoS ONE seems to be down atm):
http://dx.plos.org/10.1371/journal.pone.0054422
I'm really happy about the results; if nothing else, check the SCOP
benchmark. Although it's difficult to construct a good test case for
comparing against more complex methods (those needing training sets for
HMMs and whatnot), I don't know of anything that is as good as this.
We're using it for annotation of genes.
The current implementation is in Haskell, and although it works
correctly, it is a bit slow. More problematic, it consumes too much
memory, so going multi-threaded, although pretty easy, won't be of any
help while memory rather than CPU is the limiting resource.
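A textbook dynamic-programming alignment fills an m-by-n score table,
which is one common way alignment programs become memory-bound. As a
minimal sketch (not the method from the paper, whose details are not
given here), the Haskell below computes a plain Needleman-Wunsch global
alignment score while keeping only one row of the table alive. The
scoring values and the names nwScore and forceRow are hypothetical. The
strictness in forceRow matters: under lazy evaluation, unforced cells
would keep earlier rows reachable and the memory savings would be lost.

```haskell
import Data.List (foldl')

-- Hypothetical scoring scheme for illustration only:
-- +1 for a match, -1 for a mismatch, -2 per gap position.
matchScore, mismatchScore, gapScore :: Int
matchScore = 1
mismatchScore = -1
gapScore = -2

substitution :: Char -> Char -> Int
substitution x y = if x == y then matchScore else mismatchScore

-- Force every element of a row so no chain of thunks survives into the
-- next iteration (foldl' alone only evaluates the outermost constructor).
forceRow :: [Int] -> [Int]
forceRow row = foldr seq () row `seq` row

-- Global (Needleman-Wunsch) alignment score keeping only the previous DP
-- row: memory is O(length ys) instead of O(length xs * length ys).
nwScore :: String -> String -> Int
nwScore xs ys = last (foldl' step firstRow (zip [1 ..] xs))
  where
    -- Row 0: aligning prefixes of ys against the empty string uses only gaps.
    firstRow = forceRow [j * gapScore | j <- [0 .. length ys]]

    -- Build row i (for character x of xs) from the previous row.
    step prev (i, x) =
      forceRow (scanl cell (i * gapScore) (zip3 ys prev (tail prev)))
      where
        -- left = current row, column j-1; diag = previous row, column j-1;
        -- up = previous row, column j.
        cell left (y, diag, up) =
          maximum [diag + substitution x y, up + gapScore, left + gapScore]

main :: IO ()
main = do
  print (nwScore "AC" "AG")      -- 0: one match, one mismatch
  print (nwScore "ACGT" "ACGT")  -- 4: all matches
  print (nwScore "" "ABC")       -- -6: three gaps
```

Storing rows as unboxed vectors (e.g. from the vector library) would cut
the per-cell overhead further; plain lists are used here only to keep the
example free of dependencies.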
I would like to make this into a less resource intensive (and thus more
practical) tool, and there are two ways I can think of to go about this:
1) Optimize the Haskell program
2) Reimplement the algorithm (or parts of it) in a different language
Advantages of 1:
* Already have a working program, and the type system makes it easy to
refactor without introducing errors.
* Haskell supports lots of good multi-threading programming models (like
STM)
* I know Haskell pretty well, and will hopefully be able to mentor.
Disadvantages:
* Haskell has some good debugging and profiling tools, but they tend to
work really poorly for programs with a large memory footprint (i.e. it
takes a long time to generate profiles)
* Needs somebody with a bit (or a lot) of experience optimizing Haskell,
and good knowledge of high-perf libraries (like vector)
Advantages of 2:
* Easier to get a student with adequate skills.
* More predictable performance models in other languages.
* Easier to compile and install for many users.
Disadvantages:
* Ideally, should know enough Haskell to read and understand the code.
* Likely needs a co-mentor with knowledge of the language in question.
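To illustrate the STM model listed among the advantages of option 1,
here is a minimal sketch that scores several sequence pairs concurrently
and collects the total in a shared TVar. scorePair is a hypothetical
placeholder for a real alignment, and spawning one thread per pair is a
simplification; a real tool would bound the number of workers.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM

-- Hypothetical stand-in for an expensive per-pair alignment: counts
-- positions where the two sequences agree (no gaps considered).
scorePair :: (String, String) -> Int
scorePair (a, b) = length (filter id (zipWith (==) a b))

-- Score every pair in its own thread and sum the results in a TVar.
scoreAll :: [(String, String)] -> IO Int
scoreAll pairs = do
  total <- newTVarIO 0
  done <- newTVarIO (0 :: Int)
  mapM_ (\p -> forkIO (worker total done p)) pairs
  -- 'check' retries the transaction until every worker has reported;
  -- the runtime only re-runs it after 'done' changes.
  atomically $ do
    n <- readTVar done
    check (n == length pairs)
  readTVarIO total
  where
    worker total done p = do
      let s = scorePair p
      -- Evaluate the score before entering the transaction so the
      -- atomic section stays short.
      s `seq` atomically (modifyTVar' total (+ s) >> modifyTVar' done (+ 1))

main :: IO ()
main = scoreAll [("ACGT", "ACGA"), ("AAA", "AAA")] >>= print  -- prints 6
```

Compiled with -threaded and run with +RTS -N, the workers can run in
parallel; this does nothing about the memory cost per alignment, which is
why the memory problem has to be solved first.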
Is this something I could or should submit as a task?
-k
--
If I haven't seen further, it is by standing in the footprints of giants