[BioSQL-l] Ontology names
Thomas Down
td2@sanger.ac.uk
Mon, 30 Sep 2002 10:45:43 +0100
On Fri, Sep 27, 2002 at 11:52:56AM -0700, Hilmar Lapp wrote:
>
> Ontology names will likely (but are not required to) have NULL in
> category_id.
>
> Is everyone OK with this so far?
>
> In order to get things out by a Bio* package other than the one that
> put it in, we need to agree on ontology names in the first place
> (but also on terms).
>
> I am right now using the following ontology names:
>
> - 'Annotation Tags': the keys (tags, qualifier names) for simple
> annotation values (qualifier values)
> - 'SeqFeature Keys': the keys of seqfeatures ($feat->primary_tag()
> slot in bioperl; e.g., the genbank feature key, or swissprot feature
> key, like 'CDS', 'mRNA', ...)
> - 'SeqFeature Sources': the source names of seqfeatures
> ($feat->source_tag() slot in bioperl; like 'swissprot', 'genscan',
> etc).
>
> There is already a pre-defined number of terms for location
> properties (min_start, etc), but without an ontology. I'd like to
> put them into an ontology and suggest the name 'Location Tags' for
> it.
Sorry to reply a bit late to this thread -- I've been having
a few problems with e-mails to and from these mailing lists
(probably DNS-related; they seem to be sorted out now).
Anyway, to me this all feels like it's trying to mix together
several different concepts. Many (though by no means all)
ontology_terms are really defining properties of objects.
The keys used in seqfeature_qualifier_value are a very good
example of this, and so are the location qualifiers.
Looking specifically at properties, they can be defined by:
- Their domain -- the class (or classes) of object to which
they apply.
- Their range -- the set of values which are allowed.
- Their cardinality -- e.g., 0..1, exactly 1, 0..infinity.
The domain might just be `seqfeature' or `seqfeature_location'.
But the interesting cases come when you set more restrictive
domains (say, "A feature of type SNP must have one or more
variants"). A more mundane application might be to define
the required set of qualifiers for a given feature type in
an EMBL feature table.
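
To make the domain/range/cardinality idea concrete, here is a rough
Python sketch of the SNP example. Everything in it is an assumption
made up for illustration: the names `PropertyDef`, `check` and `DEFS`,
the choice of the four bases as the range of `variant`, and the use of
a plain dict of qualifier lists to stand in for a feature. It is not
part of the BioSQL schema or of any Bio* toolkit.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class PropertyDef:
    """Hypothetical description of one property (qualifier key)."""
    name: str
    domain: str                      # feature type the property applies to
    in_range: Callable[[str], bool]  # range: predicate over allowed values
    min_count: int = 0               # cardinality lower bound
    max_count: Optional[int] = None  # upper bound; None means unbounded


def check(feature_type: str,
          qualifiers: Dict[str, List[str]],
          defs: List[PropertyDef]) -> List[str]:
    """Return a list of violation messages for one feature."""
    problems = []
    for d in defs:
        if d.domain != feature_type:
            continue  # property does not apply to this kind of feature
        values = qualifiers.get(d.name, [])
        if len(values) < d.min_count:
            problems.append(
                f"{d.name}: expected at least {d.min_count}, got {len(values)}")
        if d.max_count is not None and len(values) > d.max_count:
            problems.append(
                f"{d.name}: expected at most {d.max_count}, got {len(values)}")
        for v in values:
            if not d.in_range(v):
                problems.append(f"{d.name}: value {v!r} is outside the range")
    return problems


# "A feature of type SNP must have one or more variants."
# The A/C/G/T range is an assumption for the sake of the example.
DEFS = [
    PropertyDef("variant", "SNP", lambda v: v in {"A", "C", "G", "T"},
                min_count=1),
]

print(check("SNP", {}, DEFS))
print(check("SNP", {"variant": ["A", "G"]}, DEFS))
```

The same structure would cover the EMBL case: one `PropertyDef` per
required qualifier, with the feature key as the domain.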
We're now taking ontology_terms somewhat beyond being a simple
controlled vocabulary, and into schema-land. I don't know what
people's feelings are on this. My understanding is that the
original plan with ontology_term was to leave it totally opaque,
then join on some extra tables which included relationship/schema
information.
As I understand it (please correct me if I've got the wrong
end of the stick), the `category' concept seems to be trying to
mix up aspects of property domains (for ontology_terms which
define names of properties) and property ranges (for terms which
are used as values -- e.g. seqfeature_key). Is this actually
a sensible thing to do?
Hilmar: I know you're on a tight schedule with this. If adding
a category field solves your problem, today, then go for it.
However, it might be better to put this on a separate table,
for ease of untangling stuff in the future (it also avoids having
an FK to self, although you still get a circular reference, of
course).
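
For what it's worth, the separate-table variant could look roughly
like the sketch below, using Python's sqlite3 module only so that it
can be run as-is. The table name `ontology_term_category`, the column
names and the `terms_in` helper are my assumptions, not an agreed
schema. Both columns of the link table point back at `ontology_term`,
so the category is still itself a term, but `ontology_term` no longer
carries an FK to itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical layout: terms stay opaque, categories live in a link table.
conn.executescript("""
CREATE TABLE ontology_term (
    ontology_term_id INTEGER PRIMARY KEY,
    term_name        TEXT NOT NULL UNIQUE
);
CREATE TABLE ontology_term_category (
    ontology_term_id INTEGER NOT NULL
        REFERENCES ontology_term (ontology_term_id),
    category_id      INTEGER NOT NULL
        REFERENCES ontology_term (ontology_term_id),
    PRIMARY KEY (ontology_term_id, category_id)
);
""")

conn.executemany(
    "INSERT INTO ontology_term (ontology_term_id, term_name) VALUES (?, ?)",
    [(1, "SeqFeature Keys"), (2, "CDS"), (3, "mRNA")],
)
# Put 'CDS' and 'mRNA' into the 'SeqFeature Keys' category.
conn.executemany(
    "INSERT INTO ontology_term_category (ontology_term_id, category_id) "
    "VALUES (?, ?)",
    [(2, 1), (3, 1)],
)


def terms_in(category_name):
    """Return the names of all terms filed under the named category."""
    rows = conn.execute(
        """
        SELECT t.term_name
          FROM ontology_term t
          JOIN ontology_term_category tc
            ON tc.ontology_term_id = t.ontology_term_id
          JOIN ontology_term c
            ON c.ontology_term_id = tc.category_id
         WHERE c.term_name = ?
         ORDER BY t.term_name
        """,
        (category_name,),
    )
    return [row[0] for row in rows]


print(terms_in("SeqFeature Keys"))
```

Dropping the link table later, or replacing it with proper
domain/range tables, would then not touch `ontology_term` at all.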
Thomas.
PS. The way I've discussed properties here is very DAML-esque.
At some point in the past, I remember a discussion about doing
DAML definitions for the open-bio datamodels. Did this
ever get off the ground?