[Dynamite] compile status / radical idea
Guy Slater
guy@ebi.ac.uk
Wed, 26 Apr 2000 19:21:21 +0100 (BST)
On Mon, 17 Apr 2000, Ian Holmes wrote:
> On Tue, 18 Apr 2000, Guy Slater wrote:
>
> > OK. Religious preferences aside, I think a major problem
> > with perl will be performance.
> >
> > If telegraph was to be only a code generation system, then I reckon
> > it would be viable, however I imagine having runtime code in C will
> > incur quite a performance hit, and in perl it could be prohibitive.
> >
> > I know that practical performance isn't a major issue for a prototype
> > implementation, but it will impede development if it is too slow.
> >
> > Maybe the perl interpreter is better than that.
> > Can either of you convince me with a short DP example in perl?
>
> Attached is a dummy SW implementation. It takes about a second of user
> time to do a 100*100 matrix on my 450MHz box.
>
> This _is_ too slow for end users, but I think it will be OK for
> development of core ideas. By the time we start trying to implement
> Genewise, we should be calling C routines from Perl for the DP.
I've had a play with this now. The brevity of the code is impressive.
(but I can't believe I'm editing perl. hmmff.)
I'm not sure it is fast enough for developing core ideas, though:
many of them, such as intron modelling, 7tms etc, need longer sequences.
Even at seqlen=1000, the performance and memory usage turn to shit.
Comparison, run on a laptop with a PII-400 and 128MB of RAM:

  program       memory     time
  ssearch3      1.1MB      0.9 seconds
  sw-demo.pl    117.0MB    112.2 seconds
I know this compares a linear-space implementation with a quadratic-space one,
but the memory usage is still really bad. Is this a bug?
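
For scale: if one full 1000*1000 matrix accounts for most of the 117.0MB,
that is on the order of 100 bytes per cell. When only the best score is
needed, two rows of the matrix are enough. The following is a minimal sketch
of that idea in Python, not the code from sw-demo.pl or ssearch3; the function
name, the scoring values and the linear gap penalty are all assumptions made
for illustration.

```python
def sw_score_linear_space(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score, keeping only two rows of the DP matrix.

    The scoring values are illustrative assumptions, not taken from any
    real program. Memory is O(len(b)) instead of O(len(a) * len(b)).
    """
    prev = [0] * (len(b) + 1)   # row i-1 of the matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)   # row i, rebuilt for every i
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                    # local alignment may restart here
                prev[j - 1] + sub,    # align a[i-1] with b[j-1]
                prev[j] + gap,        # gap in b
                curr[j - 1] + gap,    # gap in a
            )
            if curr[j] > best:
                best = curr[j]
        prev = curr
    return best
```

This gives the score only; recovering the alignment itself in linear space
needs extra work (for example a divide-and-conquer pass), which is left out
here.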
I still think that this problem (and many others) could be solved
elegantly and flexibly by having some sort of language layer
between the model/algorithms and the compute (maybe even XML).
Something that just describes arrays, matrices and the values that need
to be maxed over: just enough to get DP scores and tracebacks.
I think this would make the whole system a lot more modular.
It would be much easier to slot in different parts either side of
the language layer. Things to come later like distributed calculation
or specialised hardware ports would be a lot easier.
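
To make the language layer idea a bit more concrete, here is a rough sketch
in Python. The recurrence is plain data (it could just as well be XML), and a
separate evaluator turns it into a score and a traceback. Everything here is
hypothetical: the SW_MODEL layout, the field names, the evaluate() function
and the scoring values are invented for illustration and are not part of
telegraph or Dynamite. The evaluator uses a full matrix for simplicity.

```python
# Hypothetical description of a local-alignment recurrence. Each term names
# the offset of a predecessor cell and the kind of score added to it; the
# value of a cell is the max over all terms, never below "floor".
SW_MODEL = {
    "floor": 0,
    "terms": [
        {"name": "match",  "di": 1, "dj": 1, "score": "substitution"},
        {"name": "delete", "di": 1, "dj": 0, "score": "gap"},
        {"name": "insert", "di": 0, "dj": 1, "score": "gap"},
    ],
}


def evaluate(model, a, b, match=2, mismatch=-1, gap=-2):
    """Fill the matrix described by `model`; return (best score, traceback)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[model["floor"]] * cols for _ in range(rows)]
    back = [[None] * cols for _ in range(rows)]   # term that won each cell
    best, best_cell = model["floor"], (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            for term in model["terms"]:
                if term["score"] == "substitution":
                    s = match if a[i - 1] == b[j - 1] else mismatch
                else:
                    s = gap
                cand = score[i - term["di"]][j - term["dj"]] + s
                if cand > score[i][j]:            # the "max over" step
                    score[i][j] = cand
                    back[i][j] = term
            if score[i][j] > best:
                best, best_cell = score[i][j], (i, j)
    # Walk back from the best cell until a cell sitting at the floor.
    path = []
    i, j = best_cell
    while back[i][j] is not None:
        term = back[i][j]
        path.append(term["name"])
        i, j = i - term["di"], j - term["dj"]
    path.reverse()
    return best, path
```

The point of the split is that evaluate() knows nothing about the model
beyond the description it is handed, so a C back end, a distributed one or a
hardware port could consume the same description.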
Guy.
--
%!PS % <------ Guy St.C. Slater ------> http://www.ebi.ac.uk/~guy/ <------
210 297/a{def}def/b{translate}a b 36/c{rotate}a c 0 1 0 1 12/d{exch moveto}
a/e{closepath stroke}a/f{index}a/g{0 0 0 0 4 f}a/h{setlinewidth newpath dup
g}a{pop exch 1 f add 0 h neg d lineto 72 c lineto e 2 h d 3 f 0 108 arc d e
18 c 0 2 f neg b 18 c}for 72 c newpath add g 0 7 arc d e pop showpage