[Dynamite] compile status / radical idea

Ewan Birney birney@ebi.ac.uk
Sun, 16 Apr 2000 22:21:08 +0100 (BST)


On Sun, 16 Apr 2000, Ian Holmes wrote:

> Ewan,
> 
> Thanks for your considered mail. First off I think one theme that is
> emerging is that we are all slightly uncomfortable with the current 
> paralysis, and that the constructive thing to do is identify the
> bottlenecks.

Great point, Ian. I am about to disconnect for the trip back to the UK, but
I thought I'd respond quickly here...

> 
> Three options are on the table: (1) keep going with IDL-to-C; (2) go Perl,
> possibly mixed with C; (3) go pure C. I am continuing to argue for (2),
> for reasons described below. I am also sympathetic to (3).
> 
> > I was dreaming code last night, and I sort of realised that
> > *internally* in the telegraph package we only need "virtual
> > function"/implementation flipping in a limited number of areas
> > 
> > - getting sequences out of a database (but not sequences themselves)
> 
> I thought you were keen on sequences being virtual too. Do you regard this
> as less crucial now that you've implemented virtual contigs for EnsEMBL?
> (just curious really...)
> 

It is more that the memento design pattern allows you to get out of the
problem of having virtual sequences. Once you let in mementos there is
not a great deal of point in having virtual attributes on sequences, just
virtual ways of getting sequence mementos.

This is a different argument from the one I was making 2 months ago. Culpa
mea.
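
To make the memento idea concrete, here is a minimal C sketch. It is not
code from the Telegraph tree. It assumes a sequence memento is a plain
struct snapshot with no virtual attributes, and that the only virtual piece
is a pointer-to-function on the database handle. All names (SeqMemento,
SeqDatabase, MemoryTable, memory_database) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; nothing here is taken from Telegraph. */

/* A sequence memento: a plain, self-contained snapshot of a sequence. */
typedef struct {
    char *name;
    char *residues;
    size_t length;
} SeqMemento;

/* The only "virtual" part: a context pointer plus a fetch function. */
typedef struct SeqDatabase {
    void *context;
    SeqMemento *(*fetch)(struct SeqDatabase *db, const char *name);
} SeqDatabase;

static char *copy_string(const char *s) {
    size_t n = strlen(s) + 1;
    char *out = malloc(n);
    if (out != NULL) memcpy(out, s, n);
    return out;
}

/* Build a memento that owns copies of its name and residues. */
SeqMemento *seq_memento_new(const char *name, const char *residues) {
    SeqMemento *m = malloc(sizeof *m);
    if (m == NULL) return NULL;
    m->name = copy_string(name);
    m->residues = copy_string(residues);
    if (m->name == NULL || m->residues == NULL) {
        free(m->name);
        free(m->residues);
        free(m);
        return NULL;
    }
    m->length = strlen(residues);
    return m;
}

void seq_memento_free(SeqMemento *m) {
    if (m == NULL) return;
    free(m->name);
    free(m->residues);
    free(m);
}

/* One possible backend: an in-memory table of name/residue pairs. */
typedef struct {
    const char **names;
    const char **residues;
    size_t count;
} MemoryTable;

/* Returns a freshly allocated memento, or NULL if the name is unknown. */
static SeqMemento *memory_fetch(SeqDatabase *db, const char *name) {
    assert(db != NULL && name != NULL);
    const MemoryTable *t = db->context;
    for (size_t i = 0; i < t->count; i++) {
        if (strcmp(t->names[i], name) == 0)
            return seq_memento_new(t->names[i], t->residues[i]);
    }
    return NULL;
}

SeqDatabase memory_database(MemoryTable *table) {
    SeqDatabase db = { table, memory_fetch };
    return db;
}
```

A caller would write something like `SeqMemento *m = db.fetch(&db, "seq1");`
and release it with `seq_memento_free(m)`. Under this assumption, a flat-file
or CORBA-backed source would only need to supply its own fetch function; the
memento struct itself stays the same.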

> > - running the algorithms either run-time or compile-time.
> > 
> > - perhaps some training code.
> > 
> > Everything else neither needs run-time method binding nor that much
> > inheritance.
> > 
> > 
> > So - rather than moving to Perl (drawbacks in my book -
> > 
> > 	a) hard to maintain a large Perl code base - look at ensembl
> 
> Actually, I don't think this *would* be a large Perl codebase. This
> project is well-contained, and our object model is already laid out.
> I think we could do it in a dozen or so smallish modules. Probably less
> code than "idlstubs.pl". (And probably quicker to write.)

Hmmmm. Maybe. I see it getting nasty.


> 
> > 	b) execute-heavy pieces are going to run like a stuck cow
> 
> Yes, but the Perl implementation is proof of concept only. We'd have two
> options to improve performance:
> 
> 	(1) port DP routines to C
> 	(2) autogenerate C (c.f. original Dynamite) - VERY easy using Perl

Doing calls Perl->C->Perl, which we might have to do, sucks big time.

Kevin is in the same bind, porting son-of-gaze (written in Perl) into C -
but the port is triggering an almost global rewrite into C.

> 
> > 	c) Guy won't do anything
> 
> ;-)
> 
> I had hoped that Guy would be interested in converting parts of the
> package from Perl to C. The DP algorithms, for example.
> 
> What makes this idea so attractive to me is quick publication. Let me
> elaborate, then you can shoot me down if you disagree...
> 
> I/we can write Telegraph in Perl *very quickly*. We are talking about a
> matter of days here. OK, so it runs slow, but we have proof of concept of
> everything - the whole object model, the idea of polymer HMMs, the
> parameter space translation, the training code. _Everything_.
> 
> We then start to port parts of it to C, using the same object model as for
> the Perl. (The original Perl version must be so object-oriented that it
> has a halo.) We can even mix Perl & C initially, using XS. We can aim to
> eventually implement the entire library in standalone C, or just the DP
> algorithms, or whatever is feasible. There is no shame in leaving the
> training algorithms in Perl, because the training code can be decoupled
> from the DP code very easily. It is entirely feasible for the training and
> the DP code to communicate by means of XS calls, or over sockets, or even
> through temporary files: the only object that passes from the DP phase to
> the training phase is a Param::Value::Buf, which is easily serialisable.
> 
> We can work in parallel. No bottlenecks, and we can write a paper at any
> stage, because we have 100% proof of concept: a working Perl program. I
> hypothesise that a useful division of labour would be for me to do the
> initial Perl implementation, perhaps with Ewan. Then Ewan and Guy could
> take over the porting to C, while I could either write more Perl (e.g.
> experimental training code, XML I/O) or help with the C port.
> 
> As soon as we publish, we can go all-out Open Source, i.e. publicise the
> mailing list, give away bottles of champagne, etc etc. Perhaps people will
> even help us with the C conversion.
> 
> Being able to publish early, even just a poster at ISMB, is *very*
> *attractive*. It will really get the ball rolling; a collaboration with a
> publication to its name is a collaboration that has come of age.


Hmmm. This is a good argument. I *do* like the cut of your jib Ian. 

Let me mull on this a bit. 

> 
> > 
> > I suggest - 
> > 
> > 	Using "Standard" C methods, with some pointer-to-function for
> > database streaming/database access, algorithm implementation to allow
> > compile time code coming in cleanly and possibly training interface.
> > 
> > I have clean sequence stuff already with pointer-to-function for
> > database streaming. I can bind these via CORBA to bioperl.
> > 
> > 
> > What do people think?
> 
> I'm not completely sure I follow you. Are you proposing abandoning our IDL
> object model but sticking with C?
> 
> If so then I guess this would certainly remove the IDL-to-C bottleneck
> that arguably has contributed to our current paralysis. We would be
> throwing out a few babies with the bathwater though...
> 
> 	(Baby #1) Yes we are only making sparing use of inheritance and
> 		  dynamic binding, but IMO the main advantage of
> 		  "object-oriented C" is having a logical object model,
> 		  making the library nicer & more logical to use.
> 	          Our IDL-to-C mapping enforces this.
> 
> 	(Baby #2) The formality of using an IDL-to-C mapping also provides
> 		  for future scenarios such as interfacing to CORBA or
> 		  Perl XS.
> 
> I have no interest in pushing idlstubs if you are both uncomfortable using
> it. I have always been concerned that using an in-house compiler would
> give people the willies, especially if it is opaque to everyone except
> me.
> 
> Most of my recent work on idlstubs has been aiming towards making it more
> comprehensible, by separating out the C-generating part from the IDL
> parser. With these improvements, it would be straightforward for you guys
> to edit the C without having to delve into the idlstubs Perl.
> 
> I estimate the new improved idlstubs would be ready by the end of the
> month, unless we abandon IDL-to-C in which case I won't work on it.
> 
> On balance, I think the bottleneck problem probably outweighs the
> advantages of IDL-to-C. But I'd like to see a little more discussion on
> this list first.
> 
> I still favour Perl, because I see this being the quickest way by far of
> getting a working library. Dissuade me...
> 

I don't think I can.

OK. I vote for a fast Perl implementation, with sequences coming from
bioperl, and then a rewrite of the DP in C, looking first at XS links.
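
As a rough illustration of what the C side of such a DP rewrite could look
like, here is a minimal sketch of a Viterbi scoring routine with a flat
interface: plain arrays in, one double out. This is an assumption about a
convenient shape for glue code to call, not a description of Dynamite or
Telegraph. The function name, argument layout and error convention are all
hypothetical, and the XS wrapper itself is not shown.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical flat DP kernel: best log-score of any state path through
 * an HMM for the observed symbols. All scores are log-scale (any base).
 *   log_init  [n_states]              start scores
 *   log_trans [n_states * n_states]   row = from-state, column = to-state
 *   log_emit  [n_states * n_symbols]  row = state, column = symbol
 * Returns -DBL_MAX for empty input or allocation failure.
 */
double viterbi_log_score(size_t n_states, size_t n_symbols,
                         const double *log_init,
                         const double *log_trans,
                         const double *log_emit,
                         const int *obs, size_t n_obs)
{
    double *prev, *cur, *tmp, best;
    size_t i, j, t;

    assert(log_init != NULL && log_trans != NULL);
    assert(log_emit != NULL && obs != NULL);
    if (n_states == 0 || n_obs == 0) return -DBL_MAX;

    /* Two rolling columns are enough when only the score is needed. */
    prev = malloc(n_states * sizeof *prev);
    cur = malloc(n_states * sizeof *cur);
    if (prev == NULL || cur == NULL) {
        free(prev);
        free(cur);
        return -DBL_MAX;
    }

    /* Initialisation: start score plus emission of the first symbol. */
    for (j = 0; j < n_states; j++)
        prev[j] = log_init[j] + log_emit[j * n_symbols + (size_t)obs[0]];

    /* Recursion: best predecessor for each state at each position. */
    for (t = 1; t < n_obs; t++) {
        for (j = 0; j < n_states; j++) {
            best = prev[0] + log_trans[j];
            for (i = 1; i < n_states; i++) {
                double cand = prev[i] + log_trans[i * n_states + j];
                if (cand > best) best = cand;
            }
            cur[j] = best + log_emit[j * n_symbols + (size_t)obs[t]];
        }
        tmp = prev;
        prev = cur;
        cur = tmp;
    }

    /* Termination: best score over all end states. */
    best = prev[0];
    for (j = 1; j < n_states; j++)
        if (prev[j] > best) best = prev[j];

    free(prev);
    free(cur);
    return best;
}
```

Keeping the kernel free of any object model means the Perl side only has to
unpack its parameters into arrays before the call, which is one way to keep
the Perl->C boundary to a single crossing per DP run.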


Guy should put in his $0.02 before we leap.




> Ian
> 

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------