[BioSQL-l] Re: [Biojava-l] ontology exception,
addSequence & BioSQLSequenceDB
Matthew Pocock
matthew_pocock at yahoo.co.uk
Thu Oct 16 10:33:20 EDT 2003
Hi all,
I've added this table to my copy of biosql:
CREATE TABLE term_relationship_term (
term_relationship_id INTEGER NOT NULL,
term_id INTEGER NOT NULL,
PRIMARY KEY ( term_relationship_id, term_id ),
UNIQUE ( term_relationship_id ),
UNIQUE ( term_id ) );
This could be modelled more correctly by adding a nullable field to
term_relationship that refers to a term_id, but doing it this way didn't
break existing biosql schemas. This lets us associate a single term with
a term_relationship, effectively allowing us to treat triples as 1st
class terms.
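A minimal sketch of what the two UNIQUE constraints enforce, using Python's sqlite3 module purely for illustration. The helper name link_is_rejected and the ids are hypothetical; only the table definition comes from the message above.

```python
import sqlite3

# The join table exactly as proposed above; nothing else from BioSQL is needed here.
SCHEMA = """
CREATE TABLE term_relationship_term (
    term_relationship_id INTEGER NOT NULL,
    term_id INTEGER NOT NULL,
    PRIMARY KEY ( term_relationship_id, term_id ),
    UNIQUE ( term_relationship_id ),
    UNIQUE ( term_id ) );
"""


def link_is_rejected(conn, rel_id, term_id):
    """Return True if the constraints refuse this (triple, term) pair."""
    try:
        conn.execute(
            "INSERT INTO term_relationship_term VALUES (?, ?)", (rel_id, term_id)
        )
        return False
    except sqlite3.IntegrityError:
        return True


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Triple 1 is represented by term 10: accepted.
first_link_rejected = link_is_rejected(conn, 1, 10)
# Triple 1 already has a term, so a second term for it is refused.
second_term_rejected = link_is_rejected(conn, 1, 11)
# Term 10 already stands for a triple, so it cannot stand for another.
second_triple_rejected = link_is_rejected(conn, 2, 10)
```

Together the two constraints make the mapping one-to-one, which is what a single nullable column on term_relationship (plus a uniqueness constraint) would also give.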
* Why bother?
For storing simple facts, you just need terms and triples. For this
case, the existing schema is fine.
For storing inference rules, I think you need both variables (which
currently we identify by leading underscores) and the ability to compose
complex expressions from simple triples and terms, e.g. this is a rule
for transitive closure:
transitive_closure as
implies(and(isa(_t, transitive), and(_t(_x, _y), _t(_y, _z))),
_t(_x, _z))
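To make the rule concrete, here is a rough sketch of an interpreter for just this one rule, written in Python for illustration. Encoding a triple as a (predicate, subject, object) tuple and the function name apply_transitive_rule are assumptions of the sketch, not anything biojava or biosql defines.

```python
def apply_transitive_rule(triples):
    """Apply the rule: isa(_t, transitive) and _t(_x, _y) and _t(_y, _z)
    implies _t(_x, _z), repeating until no new triples appear.

    Triples are (predicate, subject, object) tuples, an encoding chosen
    for this sketch only.
    """
    facts = set(triples)
    while True:
        # Predicates that have been declared transitive via isa(_t, transitive).
        transitive = {s for (p, s, o) in facts if p == "isa" and o == "transitive"}
        # Join _t(_x, _y) with _t(_y, _z) on the shared middle term _y.
        derived = {
            (t, x, z)
            for (t, x, y1) in facts if t in transitive
            for (t2, y2, z) in facts if t2 == t and y2 == y1
        }
        if derived <= facts:
            return facts
        facts |= derived


facts = {
    ("isa", "part_of", "transitive"),
    ("part_of", "exon", "transcript"),
    ("part_of", "transcript", "gene"),
    # adjacent_to is never declared transitive, so nothing is inferred from it.
    ("adjacent_to", "a", "b"),
    ("adjacent_to", "b", "c"),
}
closed = apply_transitive_rule(facts)
```

The rule itself is just data here (the isa triple), which is the point of storing it next to the facts it applies to.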
And here's one for doing a dumb linking of SO terms to embl feature
types by just prepending "urn:so:" to the embl feature name:
so2ft { convert between so terms and feature table types }
equal(so2ft(_so, _ft),
and(and(term_name(_so_name, _so), term_name(_ft_name, _ft)),
concatenation(_so_name, ["urn:so:", _ft_name])))
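Read procedurally, the linking described in the prose amounts to something like the following Python sketch. It assumes the SO term name is "urn:so:" followed by the feature table name; the function names so_name_for and so2ft_links and the sample names are hypothetical.

```python
SO_PREFIX = "urn:so:"


def so_name_for(ft_name):
    """Build the SO term name that corresponds to a feature table type name."""
    return SO_PREFIX + ft_name


def so2ft_links(so_names, ft_names):
    """Map each known SO term name to the feature table type it matches."""
    known = set(so_names)
    return {
        so_name_for(ft): ft
        for ft in ft_names
        if so_name_for(ft) in known
    }


links = so2ft_links(["urn:so:exon", "urn:so:intron"], ["exon", "CDS"])
```

Only names present on both sides are linked, so a feature type without a matching SO term is simply left out.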
This would let us store the inference rules that are needed to interpret
a biosql database actually in that database, which must be a good thing.
These expressions seem to be able to express pretty much everything. They
are also explicit about how they can be interpreted.
A large class of things which are represented in this form can be
directly turned into prolog expressions (ok, prolog can't do higher
order logic, but HiLog or something else could be used, you get the idea).
* what biojava does
Our ontology API has Terms and Triples. The Triples extend Term, so that
every Triple is a 1st class Term that can be reasoned about. Triples
that are not the subject, predicate or object of other triples are the
things that look like prolog predicates. Terms that are not triples are
atoms (the alphabet of your language) and simple triples are used as
normal for basic rules like isa(x, y) and things.
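The shape of that object model can be sketched in a few lines of Python. This is not the biojava API; the class layout, the field name obj and the helpers is_atom and top_level are assumptions made to illustrate the idea of a Triple being a Term.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    name: str


@dataclass(frozen=True)
class Triple(Term):
    # Each slot holds a Term, so it may itself be a Triple.
    subject: Term
    predicate: Term
    obj: Term


def is_atom(term):
    """Atoms are the terms that are not triples."""
    return not isinstance(term, Triple)


def top_level(triples):
    """Triples that are not the subject, predicate or object of another triple."""
    used = {part for t in triples for part in (t.subject, t.predicate, t.obj)}
    return [t for t in triples if t not in used]


isa = Term("isa")
and_ = Term("and")
a = Triple(name="isa(x, y)", subject=Term("x"), predicate=isa, obj=Term("y"))
b = Triple(name="isa(y, z)", subject=Term("y"), predicate=isa, obj=Term("z"))
both = Triple(name="and(isa(x, y), isa(y, z))", subject=a, predicate=and_, obj=b)

top = top_level([a, b, both])
```

Here only the and(...) triple is top level; the two isa triples are used as its subject and object, which is only possible because a Triple is a Term.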
* what we do in biosql
So, whenever a triple is persisted to biosql, we now write an entry to
term_relationship_term and an entry to term representing the triple.
This could be simplified by adding a nullable field to term_relationship.
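Roughly, persisting a triple then looks like the sketch below, written in Python with sqlite3 for illustration. The term and term_relationship tables are cut down to a few columns, which is an assumption and not the real biosql schema, and add_term and persist_triple are hypothetical helpers rather than biojava code.

```python
import sqlite3

# Simplified stand-ins for term and term_relationship (assumed columns),
# plus the join table from this message.
SCHEMA = """
CREATE TABLE term (
    term_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE term_relationship (
    term_relationship_id INTEGER PRIMARY KEY,
    subject_term_id      INTEGER NOT NULL REFERENCES term (term_id),
    predicate_term_id    INTEGER NOT NULL REFERENCES term (term_id),
    object_term_id       INTEGER NOT NULL REFERENCES term (term_id)
);
CREATE TABLE term_relationship_term (
    term_relationship_id INTEGER NOT NULL,
    term_id INTEGER NOT NULL,
    PRIMARY KEY ( term_relationship_id, term_id ),
    UNIQUE ( term_relationship_id ),
    UNIQUE ( term_id ) );
"""


def add_term(conn, name):
    """Insert a term and return its term_id."""
    return conn.execute("INSERT INTO term (name) VALUES (?)", (name,)).lastrowid


def persist_triple(conn, name, subject_id, predicate_id, object_id):
    """Store a triple plus the term that represents it; return that term's id."""
    rel_id = conn.execute(
        "INSERT INTO term_relationship "
        "(subject_term_id, predicate_term_id, object_term_id) VALUES (?, ?, ?)",
        (subject_id, predicate_id, object_id),
    ).lastrowid
    term_id = add_term(conn, name)
    conn.execute(
        "INSERT INTO term_relationship_term VALUES (?, ?)", (rel_id, term_id)
    )
    return term_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
isa, x, y, z, implies = (add_term(conn, n) for n in ("isa", "x", "y", "z", "implies"))

# Because persist_triple returns a term_id, a triple can be the subject of another.
inner = persist_triple(conn, "isa(x, y)", x, isa, y)
outer = persist_triple(conn, "implies(isa(x, y), z)", inner, implies, z)
```

With a nullable term_id column on term_relationship instead, the third INSERT would become part of the first and the join table would go away.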
* open questions
Perhaps there is another way to represent complex constraints and
inference rules that uses old-style triples. However, most of what I have
seen introduces more ambiguity to the system.
Perhaps people think we should not be storing meta-data or inference
rules and the like in the database. All I can say to this is that I
believe that it should be possible to provide the relevant knowledge to
reason over the data along with that data if we choose to do so.
Enough from me. Comments? If I don't hear anything back within some
reasonable length of time, I will just add the extra foreign key to
term_relationship.
Matthew
Hilmar Lapp wrote:
> Starts making sense. I in fact suspected that it is about being very
> explicit about what is otherwise implied behind the scenes, and I
> know you don't like that.
>
> Would you care to write this up like you did below and post to
> biosql-l? Otherwise I'd do it and you may not like how I quote you ;-)
>
> An additional foreign key on either term_relationship or term
> shouldn't actually break anything unless you make it NOT NULL (it's
> just not going to be supported for a while outside of biojava). What
> would be significantly more involved is adding foreign keys to
> term_relationship subject and object pointing back to
> term_relationship and making them, by constraints, alternatives to
> the term subject and object.
>
> -hilmar