[BioSQL-l] Re: [Biojava-l] ontology exception, addSequence & BioSQLSequenceDB

Thu Oct 16 10:33:20 EDT 2003

Hi all,

I've added this table to my copy of biosql:

CREATE TABLE term_relationship_term (
    term_relationship_id INTEGER NOT NULL,
    term_id INTEGER NOT NULL,
    PRIMARY KEY ( term_relationship_id, term_id ),
    UNIQUE ( term_relationship_id ),
    UNIQUE ( term_id ) );

This could be modelled more correctly by adding a nullable field to 
term_relationship that refers to a term_id, but doing it this way didn't 
break existing biosql schemas. This lets us associate a single term with a 
term_relationship, effectively allowing us to treat triples as 1st class 
terms.
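
As a rough illustration of what the two UNIQUE constraints buy us, here 
is a minimal sketch using Python's sqlite3 module. The term and 
term_relationship tables are cut-down stand-ins I made up for the 
example (not the full biosql definitions), and make_db and link are 
hypothetical helper names. Only term_relationship_term is taken from 
this post.

```python
import sqlite3

def make_db():
    """Build an in-memory database with a cut-down biosql-like schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        -- Simplified stand-ins for the real biosql tables (assumption).
        CREATE TABLE term (
            term_id INTEGER PRIMARY KEY,
            name    TEXT NOT NULL
        );
        CREATE TABLE term_relationship (
            term_relationship_id INTEGER PRIMARY KEY,
            subject_term_id   INTEGER NOT NULL,
            predicate_term_id INTEGER NOT NULL,
            object_term_id    INTEGER NOT NULL
        );
        -- The table from this post.
        CREATE TABLE term_relationship_term (
            term_relationship_id INTEGER NOT NULL,
            term_id INTEGER NOT NULL,
            PRIMARY KEY ( term_relationship_id, term_id ),
            UNIQUE ( term_relationship_id ),
            UNIQUE ( term_id ) );
    """)
    return db

def link(db, rel_id, term_id):
    """Associate one triple with one term.

    Returns False if either side is already linked to something else.
    """
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO term_relationship_term VALUES (?, ?)",
                (rel_id, term_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this in place a triple can be linked to at most one term and a 
term to at most one triple, which is the same one-to-one mapping a 
nullable, unique column on term_relationship would give.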

* Why bother?

For storing simple facts, you just need terms and triples. For this 
case, the existing schema is fine.

For storing inference rules, I think you need both variables (which 
we currently identify by leading underscores) and the ability to compose 
complex expressions from simple triples and terms, e.g. this is a rule 
for transitive closure:

transitive_closure as
implies(and(isa(_t, transitive), and(_t(_x, _y), _t(_y, _z))),
        _t(_x, _z))
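
To make the intended reading of this rule concrete, here is a small 
Python sketch that applies it to a set of facts until nothing new can 
be derived. It is not how biojava evaluates rules. The tuple layout 
(predicate, subject, object) and the function name are assumptions made 
for the example.

```python
def apply_transitive_closure(facts):
    """Return facts plus everything implied by the transitive_closure rule.

    facts is a set of (predicate, subject, object) tuples, so
    ("isa", "part_of", "transitive") reads as isa(part_of, transitive).
    """
    closed = set(facts)
    changed = True
    while changed:
        changed = False
        # Every predicate _t for which isa(_t, transitive) holds.
        transitive = {
            s for (p, s, o) in closed if p == "isa" and o == "transitive"
        }
        for (p1, x, y) in list(closed):
            if p1 not in transitive:
                continue
            for (p2, y2, z) in list(closed):
                # _t(_x, _y) and _t(_y, _z) together imply _t(_x, _z).
                if p2 == p1 and y2 == y and (p1, x, z) not in closed:
                    closed.add((p1, x, z))
                    changed = True
    return closed
```

Only predicates explicitly declared with isa(_t, transitive) are 
closed; all other triples are passed through untouched.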

And here's one for doing a dumb linking of SO terms to embl feature 
types by just prepending "urn:so:" to the embl feature name:

so2ft { convert between so terms and feature table types }
equal(so2ft(_so, _ft),
      and(and(term_name(_so_name, _so), term_name(_ft_name, _ft)),
          concatenation(_so_name, ["urn:so:", _ft_name])))
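
Read as "the SO term name is urn:so: followed by the feature table 
name", the mapping this rule describes can be sketched in a few lines 
of Python. The function name and the example term names in the usage 
note are made up for illustration.

```python
SO_PREFIX = "urn:so:"

def so2ft(so_names, ft_names):
    """Map each SO term name to the feature table type it corresponds to.

    A pair is produced only when both names actually exist, mirroring
    the two term_name(...) conditions in the rule.
    """
    so_names = set(so_names)
    return {
        SO_PREFIX + ft: ft
        for ft in ft_names
        if SO_PREFIX + ft in so_names
    }
```

For example, given SO names urn:so:CDS and urn:so:exon and feature 
types CDS, exon and misc_feature, only CDS and exon get linked.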

This would let us store the inference rules that are needed to interpret 
a biosql database actually in that database, which must be a good thing. 
These expressions seem to be able to express pretty much everything. They 
are also explicit about how they can be interpreted.

A large class of things which are represented in this form can be 
directly turned into prolog expressions (ok, prolog can't do higher 
order logic, but HiLog or something else could be used, you get the idea).

* what biojava does

Our ontology API has Terms and Triples. The Triples extend Term, so that 
every Triple is a 1st class Term that can be reasoned about. Triples 
that are not the subject, predicate or object of other triples are the 
things that look like prolog predicates. Terms that are not triples are 
atoms (the alphabet of your language) and simple triples are used as 
normal for basic rules like isa(x, y) and things.
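
The shape of that object model can be sketched in Python as below. 
This is not the biojava API: the class layout, the field names and the 
triple and top_level helpers are assumptions chosen to mirror the 
description above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Term:
    """An atom: a named symbol in the ontology's alphabet."""
    name: str

@dataclass(frozen=True)
class Triple(Term):
    """A triple that is itself a Term, so it can sit inside other triples."""
    subject: Term
    predicate: Term
    obj: Term

def triple(predicate, subject, obj):
    """Build a Triple whose name is its prolog-style rendering."""
    name = f"{predicate.name}({subject.name}, {obj.name})"
    return Triple(name, subject, predicate, obj)

def top_level(triples):
    """Triples that are not the subject, predicate or object of another."""
    used = set()
    for t in triples:
        used.update((t.subject, t.predicate, t.obj))
    return [t for t in triples if t not in used]
```

Because Triple is a Term, a compound expression such as 
and(isa(_t, transitive), _t(_x, _y)) is just a triple whose subject and 
object happen to be triples themselves.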

* what we do in biosql

So, whenever a triple is persisted to biosql, we now write an entry to 
term_relationship_term and an entry to term representing the triple. This 
could be simplified by adding a nullable field to term_relationship.
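
Roughly, the write path looks like the following Python/sqlite3 sketch. 
It is not the biojava code. The term and term_relationship tables are 
simplified stand-ins, and term_id and persist_triple are hypothetical 
helpers.

```python
import sqlite3

# Simplified schema (assumption); only term_relationship_term is from this post.
SCHEMA = """
CREATE TABLE term (
    term_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE);
CREATE TABLE term_relationship (
    term_relationship_id INTEGER PRIMARY KEY,
    subject_term_id   INTEGER NOT NULL,
    predicate_term_id INTEGER NOT NULL,
    object_term_id    INTEGER NOT NULL);
CREATE TABLE term_relationship_term (
    term_relationship_id INTEGER NOT NULL,
    term_id INTEGER NOT NULL,
    PRIMARY KEY ( term_relationship_id, term_id ),
    UNIQUE ( term_relationship_id ),
    UNIQUE ( term_id ) );
"""

def term_id(db, name):
    """Return the id of the term called name, inserting it if needed."""
    row = db.execute(
        "SELECT term_id FROM term WHERE name = ?", (name,)
    ).fetchone()
    if row:
        return row[0]
    return db.execute("INSERT INTO term (name) VALUES (?)", (name,)).lastrowid

def persist_triple(db, name, subject_id, predicate_id, object_id):
    """Store a triple plus the term that represents it; return that term's id."""
    rel_id = db.execute(
        "INSERT INTO term_relationship "
        "(subject_term_id, predicate_term_id, object_term_id) "
        "VALUES (?, ?, ?)",
        (subject_id, predicate_id, object_id),
    ).lastrowid
    tid = term_id(db, name)
    # The link row is what makes the triple usable as a term elsewhere.
    db.execute(
        "INSERT INTO term_relationship_term VALUES (?, ?)", (rel_id, tid)
    )
    return tid
```

The returned term id can then be used as the subject or object of 
another triple, which is how nested expressions end up in the database.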

* open questions

Perhaps there is another way to represent complex constraints and 
inference rules that uses old-style triples. However, most of what I have 
seen introduces more ambiguity to the system.

Perhaps people think we should not be storing meta-data or inference 
rules and the like in the database. All I can say to this is that I 
believe that it should be possible to provide the relevant knowledge to 
reason over the data along with that data if we choose to do so.

Enough from me. Comments? If I don't hear anything back within some 
reasonable length of time, I will just add the extra foreign key to 
term_relationship.

Matthew

Hilmar Lapp wrote:

> Starts making sense. I in fact suspected that it is about being very 
> explicit about what is otherwise implied behind the scenes, and I 
> know you don't like that.
>
> Would you care to write this up like you did below and post to 
> biosql-l? Otherwise I'd do it and you may not like how I quote you ;-)
>
> An additional foreign key on either term_relationship or term 
> shouldn't actually break anything unless you make it NOT NULL (it's 
> just not going to be supported for a while outside of biojava). What 
> would be significantly more involved is adding foreign keys to 
> term_relationship subject and object pointing back to term_relationship, 
> constrained to be alternatives to the term subject and object.
>
>     -hilmar