[BioSQL-l] Exploring data in database
Hilmar Lapp
hlapp at gnf.org
Wed Mar 26 13:13:12 EST 2003
I've been through this before. For what it's worth, here's what I
experienced and learned.
- Lab/Bench scientists will not learn SQL in order to type in queries.
(In fact I fully agree with them that that's a total waste of their
time and skills.)
- They will not build queries against a fully normalized database.
(Again, I fully agree this is wasting their time.)
- They will want to run simple queries first, but very quickly will
want to drill down and run complex and unpredictable queries. Writing
tools to support hard-coded queries is not going to cut it, and writing
tools to flexibly and visually build complex queries is a reinvention
of a well-invented (though commercial) wheel, and IMHO hence a total
waste of the Bioinformaticists' time.
- These days most bench scientists are at least moderately
computer-savvy. Given willingness and curiosity for their data, I
haven't encountered lab scientists who could not be taught to build
queries visually in MS Access on 1 table or joining 2 tables. It also
didn't take them long to even run aggregate queries.
So, for what it's worth, here's what I would do.
1) Scrap MySQL, use PostgreSQL or any other serious RDBMS that honors
the fact that there are (normalized) models and (denormalized) views.
2) Design a proper relational data model (that is nice to you as the
data maintainer, not nice to the scientists as data viewers or miners),
then write views on top of it that most closely resemble the data types
in which your lab scientists think.
3) Hand out plain descriptions of your views (just name and column
names) and ask your scientists whether it makes sense to them. If not,
you need to improve or rewrite the views until it does. The lab
scientists have actually a very good idea what data they're working
with. You just need to match it. Your views will be denormalized.
Provide one view for each data entity (not relational entity), and
provide views that pre-join commonly joined datatypes.
4) Set up MS Access and the ODBC driver for your RDBMS on their
desktops. I suppose the former is done already, and the second is
usually not very hard.
5) Expose only the views, not the tables, to be visible when they link
tables in Access. (Provided the RDBMS allows you to do this. Otherwise
prefix all views, and advise to look only at 'tables' that start with
the prefix. The objective is to avoid upfront confusion as much as
possible, which leads to much quicker acceptance.)
6) Watch out for performance problems. Since you wrote the views, you
are in good control of performance tuning, which is a nice side effect.
If the view just doesn't cut it performance-wise, materialize it (which
will be invisible to the users, and won't harm the relational model).
I did this strategy before and it worked out very well in a relatively
conservative environment of people. What is key is that you tie in one
or two of the lab scientists early, first to get the views right, and
second to get a spear head on their end who are going to pull over the
less enthusiastic. Usually the one or those who devised the experiments
are most keen to assume this responsibility (as it gives them early
access to their data).
My gut feeling is that the EnsMart approach works best only if a) your
user community is huge (because then the strategy previously described
may quickly require too much support resources), and b) you can define
a set of canned ways to look at the data and retrieve results, and c)
you know that your definitions are alright and are going to cut it
because those have been the main ways people looked at the data for the
last year.
My $0.02.
-hilmar
On Wednesday, March 26, 2003, at 06:55 AM, Kris Boulez wrote:
> [ I know this question is rather off topic, but I don't know of a more
> appropriate mailing list. In the end it's about a database with
> sequences in, so it's not completely OT :) ]
>
>
> We (will) have a database that tracks the results of different
> experiments in the lab (you could call it a very mini LIMS).
> It will most probably be implemented in MySQL.
>
> The users (scientists) want to look at the data in a very 'exploratory'
> way (e.g. "given this gene, coming from experiment Y, which other genes
> have a similar pattern in experiment X and ... "). They want to query
> the data but have no predefined queries whatsoever.
>
> To allow these people to do this, my standard answer is to learn SQL,
> have a good look at the database schema and write their queries
> themself. They seem however very reluctant to do this.
>
> I can think of alternatives:
>
> - use MS Access and access the database via ODBC (their seems to be a
> relatively intuitive way of creating queries/reports in Access)
>
> - dump the tables "tab seperated" and look at them in MS Excell.
>
> - implement something like EnsMart
>
> I would however prefer a (web) environment, which lets them explore the
> data. I looked around a bit for this, but couldn't really find
> anything.
>
>
> Suggestions, comments ?
>
>
> Kris,
> --
> Kris Boulez Tel: +32-9-241.11.00
> AlgoNomics NV Fax: +32-9-241.11.02
> Technologiepark 4 email: kris.boulez at algonomics.com
> B 9052 Zwijnaarde http://www.algonomics.com/
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------
More information about the BioSQL-l
mailing list