[BioSQL-l] location type

Hilmar Lapp hlapp@gnf.org
Tue, 15 Oct 2002 09:56:24 -0700


> -----Original Message-----
> From: Thomas Down [mailto:td2@sanger.ac.uk]
> Sent: Tuesday, October 15, 2002 3:39 AM
> To: Hilmar Lapp
> Cc: Thomas Down; biosql-l@open-bio.org
> Subject: Re: [BioSQL-l] location type
> 
> 
> On Tue, Oct 15, 2002 at 01:52:43AM -0700, Hilmar Lapp wrote:
> >
<snip/>
> 
> > What I suggested is not really denormalization. You're 
> right though, 
> > there's duplication. If the coordinate types (which I mentioned 
> > further down in the email in the part you cut) are just moved from 
> > being location_qualifier_value associations to FKs on 
> location, then 
> > there isn't really duplication anymore either. Or am I missing 
> > something?
> 
> It's still duplication for the cases where the actual maxima 
> and minima
> are in location_qualifier_value.

What would be duplicated? Can you be specific? (I don't see it.)

>  I guess you could go the whole
> way and get rid of location_qualifier_value completely and put max
> and min slots on seqfeature_location.  But that leaves the 
> table rather
> bloated.

Right. And it is going to be the most voluminous table.

> 
> I still think this is really something to be solved in software
> (and yes, it's a good example of why strict 1-to-1 tables-to-adaptors
> doesn't really work once you have too much structure in your 
> data).

In fact I have a relatively strict 1:1 relationship between adaptors and objects (that is, interfaces), not between adaptors and tables. As a matter of fact, the adaptors themselves don't even know what the schema is; there is a second layer of drivers for that (which in turn don't know how to deal with objects).
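
To illustrate the adaptor/driver split described above, here is a minimal sketch in Python. It is not the actual implementation being discussed; the class names (Location, BiosqlLocationDriver, LocationAdaptor) and the simplified three-column seqfeature_location table are assumptions made up for the example. The point is only that the adaptor deals in objects and the driver deals in SQL.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Location:
    """Object-model side: no table or column names appear here."""
    start: int
    end: int


class BiosqlLocationDriver:
    """Schema side (hypothetical): knows the SQL, returns plain dicts,
    and never constructs objects."""

    def __init__(self, conn):
        self.conn = conn

    def fetch_location_row(self, location_id):
        row = self.conn.execute(
            "SELECT seq_start, seq_end FROM seqfeature_location "
            "WHERE seqfeature_location_id = ?",
            (location_id,),
        ).fetchone()
        if row is None:
            return None
        return {"start": row[0], "end": row[1]}


class LocationAdaptor:
    """One adaptor per object interface (hypothetical); all schema
    knowledge is delegated to the driver it is given."""

    def __init__(self, driver):
        self.driver = driver

    def find_by_id(self, location_id):
        row = self.driver.fetch_location_row(location_id)
        return None if row is None else Location(row["start"], row["end"])


# Tiny in-memory database (simplified, assumed columns) so the sketch runs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE seqfeature_location ("
    "seqfeature_location_id INTEGER PRIMARY KEY, "
    "seq_start INTEGER, seq_end INTEGER)"
)
conn.execute("INSERT INTO seqfeature_location VALUES (1, 100, 200)")

adaptor = LocationAdaptor(BiosqlLocationDriver(conn))
print(adaptor.find_by_id(1))
```

With this layering, supporting a different schema would mean writing another driver that returns the same dict keys, while the adaptor stays untouched.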

Having even the types in an n:n qualifier association is by no means obvious from the object model, nor is it mandated by it. That is, it is quite likely that many schemas will not have this, and hence this problem is an artifact of the particular biosql schema in question. So I can't put that code into the schema-independent layer. That's ugly to me.

>  If
> you want to work around it in the database, why not just put a boolean
> `here be qualifiers'  flag on seqfeature_location?

This was more or less the suggestion my first email ended in: have a FK to ontology_term that denotes an encoded type. Since whether or not there be qualifiers strictly depends on the type, this is similar to a flag except that it uses controlled vocabulary and is more explicit. Is this too explicit to you, or too much information encoded?
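
A rough sketch of what the type FK could look like, using Python's sqlite3 module. The tables are heavily simplified and the column location_type_id, the term names ("exact", "fuzzy", "min_start", "max_start") and the helper load_location are all assumptions for illustration, not part of the current schema. The idea shown is that the encoded type alone tells the code whether a trip to location_qualifier_value is needed.

```python
import sqlite3

# Hypothetical set of type terms that imply rows in location_qualifier_value.
TYPES_WITH_QUALIFIERS = {"fuzzy"}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ontology_term (
    ontology_term_id INTEGER PRIMARY KEY,
    term_name TEXT NOT NULL UNIQUE
);
CREATE TABLE seqfeature_location (
    seqfeature_location_id INTEGER PRIMARY KEY,
    seq_start INTEGER,
    seq_end INTEGER,
    -- assumed new column: the encoded location type
    location_type_id INTEGER NOT NULL
        REFERENCES ontology_term(ontology_term_id)
);
CREATE TABLE location_qualifier_value (
    seqfeature_location_id INTEGER NOT NULL
        REFERENCES seqfeature_location(seqfeature_location_id),
    ontology_term_id INTEGER NOT NULL
        REFERENCES ontology_term(ontology_term_id),
    qualifier_value TEXT
);
INSERT INTO ontology_term VALUES
    (1, 'exact'), (2, 'fuzzy'), (3, 'min_start'), (4, 'max_start');
INSERT INTO seqfeature_location VALUES (1, 100, 200, 1), (2, 150, 300, 2);
INSERT INTO location_qualifier_value VALUES (2, 3, '145'), (2, 4, '155');
""")


def load_location(conn, location_id):
    """Load one location; query the qualifier table only if the type says so."""
    row = conn.execute(
        "SELECT l.seq_start, l.seq_end, t.term_name "
        "FROM seqfeature_location l "
        "JOIN ontology_term t ON t.ontology_term_id = l.location_type_id "
        "WHERE l.seqfeature_location_id = ?",
        (location_id,),
    ).fetchone()
    if row is None:
        raise KeyError(location_id)
    start, end, loc_type = row

    qualifiers = {}
    if loc_type in TYPES_WITH_QUALIFIERS:
        # Only the rare fuzzy case pays for the second query.
        for name, value in conn.execute(
            "SELECT t.term_name, q.qualifier_value "
            "FROM location_qualifier_value q "
            "JOIN ontology_term t ON t.ontology_term_id = q.ontology_term_id "
            "WHERE q.seqfeature_location_id = ?",
            (location_id,),
        ):
            qualifiers[name] = int(value)
    return {"start": start, "end": end, "type": loc_type,
            "qualifiers": qualifiers}


print(load_location(conn, 1))
print(load_location(conn, 2))
```

A boolean flag would drive the same branch; the term FK just says why the qualifiers exist, in controlled vocabulary.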

>  This has 
> less impact
> on the location table than 3 FKs to ontology_term, and I can't really
> see it being any slower, since code that has discovered fuzzies is
> likely to want to go to the location_qualifier_value table to get the
> whole story anyway.

Right. But how can I figure out that I don't need to go there, which is going to be the case 99% of the time? I'm just worried that we create complexity here for a very limited use case (round-tripping of fuzzy locations; even for data-mining or display you don't care about the fuzzies, do you?).

I do think the database can't leave the software alone here. Databases are here to help. :)

	-hilmar