[Dynamite] compile status / radical idea

Guy Slater guy@ebi.ac.uk
Tue, 18 Apr 2000 00:02:58 +0100 (BST)


On Sun, 16 Apr 2000, Ewan Birney wrote:

> On Sun, 16 Apr 2000, Ian Holmes wrote:
> 
> > Ewan,
> > 
> > Thanks for your considered mail. First off I think one theme that is
> > emerging is that we are all slightly uncomfortable with the current 
> > paralysis, and that the constructive thing to do is identify the
> > bottlenecks.
>
> Great point, Ian. I am about to disconnect for the trip back to the UK, but
> I thought I'd respond quickly here...
> 
> > 
> > Three options are on the table: (1) keep going with IDL-to-C; (2) go Perl,
> > possibly mixed with C; (3) go pure C. I am continuing to argue for (2),
> > for reasons described below. I am also sympathetic to (3).

I'm not quite clear about the differences between (1) and (3).

Scroll to end for my $0.02

> > > I have been dreaming code last night, and I sort of realised that
> > > *internally* in the telegraph package we only need "virtual
> > > function"/implementation flipping in a limited number of areas:
> > > 
> > > - getting sequences out of a database (but not sequences themselves)
> > 
> > I thought you were keen on sequences being virtual too. Do you regard this
> > as less crucial now that you've implemented virtual contigs for EnsEMBL?
> > (just curious really...)
> > 
> 
> It is more that the memento design pattern allows you to get out of the
> problem of having virtual sequences. Once you let in mementos there is
> not a great deal of point in having virtual attributes on sequences, just
> virtual ways of getting sequence mementos.
> 
> This is a different argument from what I was arguing 2 months ago. Culpa
> mea.
> 
> > > - running the algorithms either run-time or compile-time.
> > > 
> > > - perhaps some training code.
> > > 
> > > Everything else neither needs run-time method binding nor that much
> > > inheritance.
> > > 
> > > 
> > > So - rather than moving to Perl (drawbacks in my book -
> > > 
> > > 	a) hard to maintain a large Perl code base - look at ensembl
> > 
> > Actually, I don't think this *would* be a large Perl codebase. This
> > project is well-contained, and our object model is already laid out.
> > I think we could do it in a dozen or so smallish modules. Probably less
> > code than "idlstubs.pl". (And probably quicker to write.)
> 
> Hmmmm. Maybe. I see it getting nasty.
> 
> 
> > 
> > > 	b) execution-heavy pieces are going to run like a stuck cow
> > 
> > Yes, but the Perl implementation is proof of concept only. We'd have two
> > options to improve performance:
> > 
> > 	(1) port DP routines to C
> > 	(2) autogenerate C (c.f. original Dynamite) - VERY easy using Perl
> 
> Doing Perl->C->Perl calls, which we might have to do, sucks big time.
> 
> Kevin is in the same bind, porting son-of-gaze (written in Perl) into C -
> but the port is triggering an almost global rewrite in C.
> 
> > 
> > > 	c) Guy won't do anything
> > 
> > ;-)
> > 
> > I had hoped that Guy would be interested in converting parts of the
> > package from Perl to C. The DP algorithms, for example.
> > 
> > What makes this idea so attractive to me is quick publication. Let me
> > elaborate, then you can shoot me down if you disagree...
> > 
> > I/we can write Telegraph in Perl *very quickly*. We are talking about a
> > matter of days here. OK, so it runs slow, but we have proof of concept of
> > everything - the whole object model, the idea of polymer HMMs, the
> > parameter space translation, the training code. _Everything_.
> > 
> > We then start to port parts of it to C, using the same object model as for
> > the Perl. (The original Perl version must be so object-oriented that it
> > has a halo.) We can even mix Perl & C initially, using XS. We can aim to
> > eventually implement the entire library in standalone C, or just the DP
> > algorithms, or whatever is feasible. There is no shame in leaving the
> > training algorithms in Perl, because the training code can be decoupled
> > from the DP code very easily. It is entirely feasible for the training and
> > the DP code to communicate by means of XS calls, or over sockets, or even
> > through temporary files: the only object that passes from the DP phase to
> > the training phase is a Param::Value::Buf, which is easily serialisable.
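
To make the temporary-file route concrete, here is a minimal sketch in C.
It assumes (my guess, nothing in this thread says so) that a
Param::Value::Buf boils down to a flat array of per-parameter counts. The
ParamBuf struct and the parambuf_write/parambuf_read names are hypothetical,
not part of Telegraph.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for Param::Value::Buf: a flat array of
 * accumulated values, one per parameter. The real layout is an
 * assumption; it is not specified in this thread. */
typedef struct {
    size_t n;
    double *value;
} ParamBuf;

/* Write the buffer as text: the length on the first line, then one
 * value per line. Returns 1 on success, 0 on I/O error. */
int parambuf_write(const ParamBuf *buf, FILE *fp)
{
    size_t i;
    assert(buf != NULL && fp != NULL);
    if (fprintf(fp, "%lu\n", (unsigned long)buf->n) < 0)
        return 0;
    for (i = 0; i < buf->n; i++)
        if (fprintf(fp, "%.17g\n", buf->value[i]) < 0)
            return 0;
    return 1;
}

/* Read a buffer written by parambuf_write. On success the caller
 * owns buf->value and must free() it. Returns 1 on success, 0 on error. */
int parambuf_read(ParamBuf *buf, FILE *fp)
{
    unsigned long n;
    size_t i;
    assert(buf != NULL && fp != NULL);
    buf->n = 0;
    buf->value = NULL;
    if (fscanf(fp, "%lu", &n) != 1)
        return 0;
    /* Allocate at least one element so an empty buffer is still valid. */
    buf->value = malloc((n ? n : 1) * sizeof *buf->value);
    if (buf->value == NULL)
        return 0;
    for (i = 0; i < n; i++) {
        if (fscanf(fp, "%lf", &buf->value[i]) != 1) {
            free(buf->value);
            buf->value = NULL;
            return 0;
        }
    }
    buf->n = n;
    return 1;
}
```

A plain-text format like this would be just as easy to read and write from
the Perl side, which is the point of decoupling the two phases. The same
pair of functions would work over a socket wrapped in a FILE*.
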
> > 
> > We can work in parallel. No bottlenecks, and we can write a paper at any
> > stage, because we have 100% proof of concept: a working Perl program. I
> > hypothesise that a useful division of labour would be for me to do the
> > initial Perl implementation, perhaps with Ewan. Then Ewan and Guy could
> > take over the porting to C, while I could either write more Perl (e.g.
> > experimental training code, XML I/O) or help with the C port.
> > 
> > As soon as we publish, we can go all-out Open Source, i.e. publicise the
> > mailing list, give away bottles of champagne, etc etc. Perhaps people will
> > even help us with the C conversion.
> > 
> > Being able to publish early, even just a poster at ISMB, is *very*
> > *attractive*. It will really get the ball rolling; a collaboration with a
> > publication to its name is a collaboration that has come of age.
> 
> 
> Hmmm. This is a good argument. I *do* like the cut of your jib Ian. 
> 
> Let me mull on this a bit. 
> 
> > 
> > > 
> > > I suggest - 
> > > 
> > > 	Using "standard" C methods, with pointer-to-function for
> > > database streaming/database access, for algorithm implementation (to
> > > allow compile-time code to come in cleanly) and possibly for the
> > > training interface.
> > > 
> > > I have clean sequence stuff already with pointer-to-function for
> > > database streaming. I can bind these via CORBA to bioperl.
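
For anyone following along, this is roughly what pointer-to-function
database streaming could look like in plain C. It is only a sketch under my
own assumptions: the Sequence, SeqStream and ArrayDb types and all function
names are hypothetical, and it is not Ewan's actual sequence code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sequence record: just a name and its residues. */
typedef struct {
    const char *name;
    const char *residues;
} Sequence;

/* A stream is an opaque handle plus a function pointer that yields
 * the next sequence, or NULL when the stream is exhausted. */
typedef struct SeqStream SeqStream;
struct SeqStream {
    void *handle;
    const Sequence *(*next)(SeqStream *self);
};

/* One concrete back end: an in-memory array of sequences. */
typedef struct {
    const Sequence *items;
    size_t count;
    size_t pos;
} ArrayDb;

static const Sequence *array_next(SeqStream *self)
{
    ArrayDb *db = self->handle;
    assert(db != NULL);
    return db->pos < db->count ? &db->items[db->pos++] : NULL;
}

/* Bind the array back end to the generic stream interface. */
SeqStream array_stream(ArrayDb *db)
{
    SeqStream s = { db, array_next };
    return s;
}

/* Client code sees only SeqStream, so a different back end can be
 * swapped in by supplying another `next` function. */
size_t total_residues(SeqStream *s)
{
    size_t total = 0;
    const Sequence *seq;
    while ((seq = s->next(s)) != NULL)
        total += strlen(seq->residues);
    return total;
}
```

The only "virtual" thing here is the single `next` pointer; the sequences
themselves are ordinary structs, which matches the point made earlier about
only needing implementation flipping for getting sequences out of a
database.
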
> > > 
> > > 
> > > What do people think?
> > 
> > I'm not completely sure I follow you. Are you proposing abandoning our IDL
> > object model but sticking with C?
> > 
> > If so then I guess this would certainly remove the IDL-to-C bottleneck
> > that arguably has contributed to our current paralysis. We would be
> > throwing out a few babies with the bathwater though...
> > 
> > 	(Baby #1) Yes we are only making sparing use of inheritance and
> > 		  dynamic binding, but IMO the main advantage of
> > 		  "object-oriented C" is having a logical object model,
> > 		  making the library nicer & more logical to use.
> > 	          Our IDL-to-C mapping enforces this.
> > 
> > 	(Baby #2) The formality of using an IDL-to-C mapping also provides
> > 		  for future scenarios such as interfacing to CORBA or
> > 		  Perl XS.
> > 
> > I have no interest in pushing idlstubs if you are both uncomfortable using
> > it. I have always been concerned that using an in-house compiler would
> > give people the willies, especially if it is opaque to everyone except
> > me.
> > 
> > Most of my recent work on idlstubs has been aiming towards making it more
> > comprehensible, by separating out the C-generating part from the IDL
> > parser. With these improvements, it would be straightforward for you guys
> > to edit the C without having to delve into the idlstubs Perl.
> > 
> > I estimate the new improved idlstubs would be ready by the end of the
> > month, unless we abandon IDL-to-C in which case I won't work on it.
> > 
> > On balance, I think the bottleneck problem probably outweighs the
> > advantages of IDL-to-C. But I'd like to see a little more discussion on
> > this list first.
> > 
> > I still favour Perl, because I see this being the quickest way by far of
> > getting a working library. Dissuade me...
> > 
> 
> I don't think I can.
> 
> OK. I vote for a fast Perl implementation, with sequences coming from
> bioperl, and then a rewrite of the DP in C, looking first at XS links.
> 
> 
> Guy should put in his $0.02 first before we leap.
> 

OK.  Religious preferences aside, I think a major problem
with perl will be performance.

If telegraph were to be only a code generation system, then I reckon
it would be viable. However, I imagine having runtime code in C will
incur quite a performance hit, and in perl it could be prohibitive.

I know that practical performance isn't a major issue for a prototype
implementation, but it will impede development if it is too slow.

Maybe the perl interpreter is better than that.
Can either of you convince me with a short DP example in perl?
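
For reference, this is the sort of DP kernel I have in mind, sketched in C
rather than perl. It is a plain Needleman-Wunsch global alignment score,
not Telegraph code; the nw_score name and the scoring scheme (match +1,
mismatch -1, gap -1) are purely illustrative assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int max3(int a, int b, int c)
{
    int m = a > b ? a : b;
    return m > c ? m : c;
}

/* Needleman-Wunsch global alignment score with a linear gap penalty.
 * Illustrative scores: match +1, mismatch -1, gap -1.
 * Only two rows of the DP matrix are kept in memory.
 * *ok is set to 0 on allocation failure, 1 otherwise. */
int nw_score(const char *x, const char *y, int *ok)
{
    size_t n = strlen(x), m = strlen(y), i, j;
    int *prev = malloc((m + 1) * sizeof *prev);
    int *curr = malloc((m + 1) * sizeof *curr);
    int score = 0;

    assert(ok != NULL);
    *ok = (prev != NULL && curr != NULL);
    if (*ok) {
        /* First row: aligning a prefix of y against nothing. */
        for (j = 0; j <= m; j++)
            prev[j] = -(int)j;
        for (i = 1; i <= n; i++) {
            int *tmp;
            curr[0] = -(int)i;
            for (j = 1; j <= m; j++) {
                int sub = (x[i - 1] == y[j - 1]) ? 1 : -1;
                curr[j] = max3(prev[j - 1] + sub,  /* match/mismatch */
                               prev[j] - 1,        /* gap in y */
                               curr[j - 1] - 1);   /* gap in x */
            }
            tmp = prev; prev = curr; curr = tmp;   /* roll the rows */
        }
        score = prev[m];
    }
    free(prev);
    free(curr);
    return score;
}
```

The inner loop runs once per cell of an n-by-m matrix, so whatever
per-operation overhead the interpreter adds gets multiplied by the product
of the sequence lengths. A perl version with the same two nested loops is
what I would like to see timed.
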

Also, do you know of any projects which have migrated smoothly
from a perl prototype implementation to a clean C implementation?

I'd be happy to do perl->C reimplementations,
but I'm afraid I wouldn't be any use with the perl implementations.

Guy
--