[BioSQL-l] location type

Thomas Down td2@sanger.ac.uk
Tue, 15 Oct 2002 09:11:31 +0100


On Sat, Oct 12, 2002 at 04:17:59PM -0700, Hilmar Lapp wrote:
> For certain location types, specifically fuzzy locations, the 
> additional attributes (min_start etc) are stored as 
> Location_Qualifier_Value entries.
> 
> If you don't know in advance whether a location is a fuzzy location, 
> you need to make an extra hit to the database for every location 
> just to find out most of the time that there are no extra attributes.
> 
> To alleviate this I propose to add to SeqFeature_Location a FK to 
> Ontology_Term denoting the type of the location. We'd need to agree 
> on a standard ontology for location types too. E.g.,
> 
> 	FuzzyLocation
> 	SplitLocation
> 	ExactLocation

I see duplicated information :-(.

The way I handle this in BioJava is to do three queries every
time I fetch a block of features (depending on circumstances,
this might be all features on a bioentry, all features overlapping
a sequence interval, or all child features of a given parent --
all three cases go through the same Java code, with slightly
different SQL queries):

   - Fetch all interesting features, and put mementos in a Map
     keyed on seqfeature_ids.

   - Fetch all location_qualifier_values for all interesting
     features (yes, in a single query).  Build in-memory memento
     objects, and put them in a Map keyed on location_ids.

   - Fetch all location spans.  As each one is fetched, I look
     up its qualifiers in the in-memory Map of qualifier values,
     so no extra per-location query is needed.

Finally, the location spans get grouped together and attached
to the Feature.Templates.
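
To make the three passes concrete, here is a rough Java sketch of the
in-memory assembly.  It is not the actual BioJava code: every class
name below is hypothetical, the three lists stand in for the result
sets of the three SQL queries, and the qualifier name "max_start" is
an assumption alongside the "min_start" mentioned above.  The feature
hierarchy is ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these class names come from BioJava or BioSQL.
class ThreePassFetchSketch {

    // Row types standing in for the result sets of the three queries.
    static final class FeatureRow {
        final int seqfeatureId; final String type;
        FeatureRow(int seqfeatureId, String type) {
            this.seqfeatureId = seqfeatureId; this.type = type;
        }
    }
    static final class QualifierRow {
        final int locationId; final String term; final int value;
        QualifierRow(int locationId, String term, int value) {
            this.locationId = locationId; this.term = term; this.value = value;
        }
    }
    static final class SpanRow {
        final int locationId; final int seqfeatureId; final int start; final int end;
        SpanRow(int locationId, int seqfeatureId, int start, int end) {
            this.locationId = locationId; this.seqfeatureId = seqfeatureId;
            this.start = start; this.end = end;
        }
    }

    // In-memory mementos built from the rows.
    static final class Span {
        final int start; final int end; final Map<String, Integer> qualifiers;
        Span(int start, int end, Map<String, Integer> qualifiers) {
            this.start = start; this.end = end; this.qualifiers = qualifiers;
        }
        // True when the location carried extra attributes (e.g. a fuzzy location).
        boolean hasExtraAttributes() { return !qualifiers.isEmpty(); }
    }
    static final class FeatureMemento {
        final int seqfeatureId; final String type;
        final List<Span> spans = new ArrayList<>();
        FeatureMemento(int seqfeatureId, String type) {
            this.seqfeatureId = seqfeatureId; this.type = type;
        }
    }

    static Map<Integer, FeatureMemento> assemble(List<FeatureRow> features,
                                                 List<QualifierRow> qualifiers,
                                                 List<SpanRow> spans) {
        // Pass 1: feature mementos keyed on seqfeature_id.
        Map<Integer, FeatureMemento> byFeatureId = new LinkedHashMap<>();
        for (FeatureRow f : features) {
            byFeatureId.put(f.seqfeatureId, new FeatureMemento(f.seqfeatureId, f.type));
        }
        // Pass 2: qualifier values keyed on location_id.
        Map<Integer, Map<String, Integer>> byLocationId = new HashMap<>();
        for (QualifierRow q : qualifiers) {
            byLocationId.computeIfAbsent(q.locationId, k -> new LinkedHashMap<>())
                        .put(q.term, q.value);
        }
        // Pass 3: attach each span to its feature, looking qualifiers up in memory.
        for (SpanRow s : spans) {
            FeatureMemento feature = byFeatureId.get(s.seqfeatureId);
            if (feature == null) continue; // span belongs to a feature we did not fetch
            Map<String, Integer> quals =
                byLocationId.getOrDefault(s.locationId, Collections.<String, Integer>emptyMap());
            feature.spans.add(new Span(s.start, s.end, quals));
        }
        return byFeatureId;
    }

    // Made-up sample data: feature 1 has one fuzzy and one exact span.
    static Map<Integer, FeatureMemento> demo() {
        List<FeatureRow> features = Arrays.asList(
            new FeatureRow(1, "CDS"), new FeatureRow(2, "exon"));
        List<QualifierRow> qualifiers = Arrays.asList(
            new QualifierRow(10, "min_start", 95), new QualifierRow(10, "max_start", 105));
        List<SpanRow> spans = Arrays.asList(
            new SpanRow(10, 1, 100, 200), new SpanRow(11, 1, 300, 400),
            new SpanRow(12, 2, 50, 80));
        return assemble(features, qualifiers, spans);
    }

    public static void main(String[] args) {
        for (FeatureMemento f : demo().values()) {
            for (Span s : f.spans) {
                System.out.println(f.type + " " + s.start + ".." + s.end
                    + " extra=" + s.qualifiers);
            }
        }
    }
}
```

The point of the sketch is that whether a location is fuzzy falls out
of the Map lookup in pass 3, so no per-location database hit and no
location type column is needed to find that out.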

Actually, things are a little more complex than this because
of the feature hierarchy, but you get the general idea.

I guess you could argue that I'm not taking maximum advantage
of the database engine by doing things this way.  But it's not
too bad to implement in practice, and it scales well to large
numbers of seqfeatures per request, since the number of queries
stays at three however many features come back.

Might this sort of design be a valid alternative to
denormalization?

       Thomas.