O/R mapping [was Re: [Bioperl-l] pipeline]

Chris Mungall cjm@fruitfly.bdgp.berkeley.edu
Tue, 12 Mar 2002 16:33:53 -0800 (PST)


On Tue, 12 Mar 2002, Ewan Birney wrote:

> I think Arne has toyed around with the idea of having the schema defined
> inside the adaptor and emitted somehow - I think all that is holding him
> back is time --- we need to get the new objects working well).
> 
> 
> Re: auto-generated object<->relational systems. From someone who has only
> seen this from afar (but a number of times - OPM, other things...)
> 
>   (a) There is no standard. People always want to write their own system
> to have control over some aspects of it

yup; you'd think that if this approach were sound there would be some
decent tool for it by now, rather than thousands of incompatible pieces
of crap.

>   (b) It seems to generate hard-to-find bugs. Of course, this may be
> because the system is so good the standard "easy to find" bugs are
> eliminated. Because the system gets complex, in general it ends up that
> these bugs bottleneck on one person who understands the O<->R mapping
> system

...and it sucks to be that one person (actually there are two of us for
gadfly)
 
>   (c) It seems to encourage large numbers of objects which actually
> prevent understanding the system well.

Agree with all these points. 

Also: (e) for queries more complex than the canned templates can handle,
you end up either bypassing the O/R layer, writing a complex
non-standard object query language, or forcing the API user to slurp
everything into memory and run the query in imperative code (see the
sketch after (f)).

(f) it's inefficient.
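
To make (e) concrete, here's a minimal sketch of the kind of query that
drives you behind the O/R layer's back. The DBI code and the gene/exon
schema are hypothetical, not from any real Bioperl or Ensembl adaptor:

    # Hypothetical schema: gene(gene_id, name, chromosome) and
    # exon(exon_id, gene_id). "Which genes on 2L have more than 10
    # exons?" is painful through per-object accessors but one
    # statement in SQL.
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:mysql:genedb', 'user', 'pass',
                           { RaiseError => 1 });

    my $sth = $dbh->prepare(q{
        SELECT g.gene_id, g.name, COUNT(e.exon_id) AS n_exons
        FROM   gene g
        JOIN   exon e ON e.gene_id = g.gene_id
        WHERE  g.chromosome = ?
        GROUP  BY g.gene_id, g.name
        HAVING COUNT(e.exon_id) > 10
    });
    $sth->execute('2L');
    while (my $row = $sth->fetchrow_hashref) {
        printf "%s has %d exons\n", $row->{name}, $row->{n_exons};
    }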

Having attempted to write automatic O/R systems a number of times, and
having ended up just overriding the automated parts with canned SQL
templates, I've slowly come to the opinion that the approach is
fundamentally flawed. You can come up with some powerful code, but you
always end up getting your hands dirty fiddling with the adapters, so
to speak.
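
For what it's worth, the overriding usually ends up looking something
like this; GeneAdaptor and the table names are invented for
illustration, not taken from gadfly or Ensembl:

    package GeneAdaptor;
    use strict;
    use warnings;

    # What an auto-generated mapper typically emits: one SELECT per
    # parent object (the N+1 query problem).
    sub fetch_exons_generated {
        my ($self, @gene_ids) = @_;
        my @exons;
        for my $id (@gene_ids) {
            push @exons, @{ $self->{dbh}->selectall_arrayref(
                'SELECT * FROM exon WHERE gene_id = ?',
                { Slice => {} }, $id) };
        }
        return \@exons;
    }

    # The canned template you write by hand to replace it: one round
    # trip, with placeholders built to match the id list.
    sub fetch_exons_canned {
        my ($self, @gene_ids) = @_;
        my $in = join ',', ('?') x @gene_ids;
        return $self->{dbh}->selectall_arrayref(
            "SELECT * FROM exon WHERE gene_id IN ($in)",
            { Slice => {} }, @gene_ids);
    }

    1;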

Riffing off of what Imre has been doing, I think that automatic mapping
(relational, XML, etc.) *does* make a lot of sense when you use an
ontology language to model your data. For one thing, you're starting
from a better formal framework than UML, and you have properly exposed
metadata. Plus you don't have the class-explosion problem, because you
never actually generate any Java/Perl class code for your ontology
classes. Of course you still need human-tuned optimisation, but at
least you get a clean split between the physical model and the logical
model; the main thing is to avoid modeling the same data multiple ways
(object, XML, db) and maintaining complex, error-prone adapters.
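
A rough sketch of the no-class-explosion point, under my own invented
schema (instance/ont_class/slot/slot_value are illustrative names, not
gadfly's actual tables): one generic fetch serves every ontology class,
so adding a class is a data change rather than a code change.

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=ontdb', '', '',
                           { RaiseError => 1 });

    # Fetch any instance as a plain hash of slot => value; the slots
    # are defined by ontology metadata in the database, so no Perl
    # class is ever generated for the ontology class.
    sub fetch_instance {
        my ($class_name, $instance_id) = @_;
        my $rows = $dbh->selectall_arrayref(q{
            SELECT s.slot_name, v.value
            FROM   instance i
            JOIN   ont_class c  ON c.class_id = i.class_id
            JOIN   slot s       ON s.class_id = c.class_id
            JOIN   slot_value v ON v.instance_id = i.instance_id
                               AND v.slot_id     = s.slot_id
            WHERE  c.name = ? AND i.instance_id = ?
        }, undef, $class_name, $instance_id);
        return { map { $_->[0] => $_->[1] } @$rows };
    }

    # e.g. a 'transcription_factor' instance, and no need for a
    # TranscriptionFactor.pm anywhere
    my $tf = fetch_instance('transcription_factor', 42);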

>   (d) it becomes hard or virtually impossible to view the data as pure
> "data" as we do in things like Lite. I used to be against a pure data view
> of the world, but experience over the last couple of years, and watching
> what people can do with the data view (eg, Arek) has changed my mind.

Kudos to Arek, what he's been doing with Ensembl is awesome.

Until the ontological pipe dream I'm on about starts to take form, I
think a more relation-centric view could be useful:

* declarative vs imperative (imperative modeling code is evil; see the
  toy contrast below)
* split between logical and physical
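
A toy contrast between the two styles, assuming a hypothetical
"feature" table:

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=featdb', '', '',
                           { RaiseError => 1 });

    # Declarative: say what you want; the planner decides how.
    my $count = $dbh->selectrow_array(q{
        SELECT COUNT(*) FROM feature
        WHERE  type = 'exon' AND strand = -1
    });

    # Imperative: slurp everything into memory and filter by hand --
    # the style that imperative modeling code pushes you toward.
    my $all = $dbh->selectall_arrayref(
        'SELECT type, strand FROM feature', { Slice => {} });
    my $n = grep { $_->{type} eq 'exon' && $_->{strand} == -1 } @$all;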

Maybe I'm just turning into a cranky old-timer, but it does kind of
seem that we're all destined to keep re-solving problems that Codd
solved all those years ago.

One thing that's different now is that with Postgres, and soon with
MySQL 4.1, we have decent ways of doing proper relational work without
paying corporate bucks.

Another cool thing to explore is a predicate-logic interpretation of
the relational data, e.g. datalog, but that's another tangent.
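
Just to show the flavour: the classic ancestor rules read over a plain
parent relation, computed here with a naive fixpoint in Perl (a real
datalog engine, or recursive SQL where available, would do this
properly; the data is made up):

    use strict;
    use warnings;

    # datalog rules:
    #   ancestor(X,Y) :- parent(X,Y).
    #   ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y).
    my %parent = ( a => ['b'], b => ['c'], c => ['d'] );

    my %ancestor;
    my $changed = 1;
    while ($changed) {                 # iterate to a fixpoint
        $changed = 0;
        for my $x (keys %parent) {
            for my $z (@{ $parent{$x} }) {
                # base case: every parent is an ancestor
                $changed = 1 unless $ancestor{$x}{$z}++;
                # recursive case: my parent's ancestors are mine too
                for my $y (keys %{ $ancestor{$z} || {} }) {
                    $changed = 1 unless $ancestor{$x}{$y}++;
                }
            }
        }
    }
    print "$_ -> ", join(',', sort keys %{ $ancestor{$_} }), "\n"
        for sort keys %ancestor;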

> Elia - your/Jerm's proposal looks on track. I'll read it more carefully
> and comment.
> 
> 
> > 
> > Rgds.,
> > 
> > i
> > 
> 
> -----------------------------------------------------------------
> Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
> <birney@ebi.ac.uk>. 
> -----------------------------------------------------------------
>