[Bioperl-l] bioperl-db: rich queries implementation

Chris Mungall cjm@fruitfly.bdgp.berkeley.edu
Mon, 11 Jun 2001 14:59:19 -0700 (PDT)


So I finally checked out the bioperl-db code with the intention of having
a good look through it during a long flight, and maybe doing some airport
coding in the best bioperl tradition.

Well, after a whole bunch of delays, cancellations and reroutes 
the quick look through the code had turned into the implementation
of a rich/complex query framework for bioperl-db.

Now I would have liked to consult the rest of you bioperlers on some
design issues before wading in there, but I got a little carried away and
you were all a bit inaccessible what with me being up in the air over
another continent. I know some of you have been thinking hard about some
of this, so I hope our ideas don't clash too badly.

Here's the basic idea, starting with some new classes:

AbstractQuery - parent class for representing some kind of query
SqlQuery      - represents a single sql statement
BioQuery      - represents a high-level biological query
QueryConstraint - composite object (like the design pattern) for
                  constraining query

The BioQuery was inspired by Lincoln & Ravi's ideas on the DAS list. The
idea is that it provides an intuitive, object-y query that can be applied
to a whole bunch of different schemas. The Adaptors transform the BioQuery
into one or more SqlQuery objects, execute them and return objects. (Later
we could have a generic query resolver that decides which adaptor(s) to use.)
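
A rough sketch of that translation step, written in Python rather than Perl
and not taken from the bioperl-db code. The class names follow the list above,
but the translate() method, the table names and the column names are all
hypothetical, chosen only to show one schema-neutral query producing
different SQL per schema:

```python
from dataclasses import dataclass, field


@dataclass
class SqlQuery:
    """A single SQL statement plus its bind parameters."""
    text: str
    params: list = field(default_factory=list)


@dataclass
class BioQuery:
    """A schema-neutral query: an object type plus attribute/value filters."""
    datatype: str
    filters: dict  # attribute name -> required value, all ANDed together


class SeqAdaptor:
    """Maps BioQuery attribute names onto one schema's table and columns."""

    def __init__(self, table, columns):
        self.table = table      # hypothetical table name
        self.columns = columns  # BioQuery attribute -> hypothetical column name

    def translate(self, query):
        """Return the list of SqlQuery objects needed to answer the query."""
        clauses, params = [], []
        for attr, value in query.filters.items():
            clauses.append(f"{self.columns[attr]} = ?")
            params.append(value)  # values travel as bind parameters
        sql = f"SELECT * FROM {self.table}"
        if clauses:
            sql += " WHERE " + " AND ".join(clauses)
        return [SqlQuery(sql, params)]
```

Two SeqAdaptor instances built with different column maps would accept the
same BioQuery and each emit SQL for its own schema, which is the property
the adaptor layer is meant to provide.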

In addition to the above classes, I have added a fetch_by_query() method
to SeqAdaptor. I've only included a few possible constraints just to demo
the system, but it is now possible to do stuff like:

"fetch Seq.* from Seq where (species=Drosophila virilis AND
(references=*transcription factor* OR keyword=*transcription factor*))"

which is much easier than doing the equivalent sql; and things will get
much more fun once we add some queryable constraints to the
SeqFeatureAdaptor.
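
To make the composite idea concrete, here is a minimal Python sketch of the
example query as a constraint tree. It is an illustration only, not the
QueryConstraint class from this code: the Constraint and Composite names,
the to_sql() method and the choice to turn "*" into an SQL LIKE pattern are
all assumptions.

```python
class Constraint:
    """Leaf: one attribute compared against a value; '*' is a wildcard."""

    def __init__(self, attribute, value):
        self.attribute = attribute
        self.value = value

    def to_sql(self):
        # Attribute names are query-level names; a real adaptor would map
        # them to schema columns before building SQL.
        if "*" in self.value:
            return f"{self.attribute} LIKE ?", [self.value.replace("*", "%")]
        return f"{self.attribute} = ?", [self.value]


class Composite:
    """Inner node: child constraints joined by AND or OR."""

    def __init__(self, operator, *children):
        if operator not in ("AND", "OR"):
            raise ValueError(f"unsupported operator: {operator}")
        if not children:
            raise ValueError("a composite needs at least one child")
        self.operator = operator
        self.children = children

    def to_sql(self):
        parts, params = [], []
        for child in self.children:
            sql, child_params = child.to_sql()  # leaves and composites alike
            parts.append(sql)
            params.extend(child_params)
        return "(" + f" {self.operator} ".join(parts) + ")", params


# The example query from the text, built by hand as an object tree.
where = Composite(
    "AND",
    Constraint("species", "Drosophila virilis"),
    Composite(
        "OR",
        Constraint("references", "*transcription factor*"),
        Constraint("keyword", "*transcription factor*"),
    ),
)
sql, params = where.to_sql()
```

Because leaves and composites share the same to_sql() interface, nesting to
any depth needs no special cases in the code that walks the tree.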

The BioQuery language itself still has to be designed; right now you have
to build the BioQuery object yourself.

Also, right now the adaptor will just return the object(s) bare bones,
with remaining attributes fetched on demand. It would be nice to have the
ability to specify what should be fetched upfront with the "select ..."
portion of the BioQuery - or to use the BioQuery in a non-OO context, e.g.
"select Seq.primary_seq.seq, Seq.seq_feature.name"
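
The difference between fetching on demand and fetching upfront can be
sketched as follows. This is a hypothetical Python illustration, not how the
adaptors are written: the LazySeq class, its loader callback and the
prefetch argument are invented here to stand in for the "select ..." portion
of a query.

```python
class LazySeq:
    """Bare-bones object whose attributes are loaded on first access."""

    def __init__(self, seq_id, loader, prefetch=()):
        self._seq_id = seq_id
        self._loader = loader  # callable(seq_id, attribute) -> value
        self._cache = {}
        for attribute in prefetch:  # attributes named upfront are loaded now
            self._cache[attribute] = loader(seq_id, attribute)

    def __getattr__(self, attribute):
        # Python calls this only when normal lookup fails, so it handles
        # exactly the database-backed attributes.
        if attribute.startswith("_"):
            raise AttributeError(attribute)
        if attribute not in self._cache:
            self._cache[attribute] = self._loader(self._seq_id, attribute)
        return self._cache[attribute]
```

With prefetch=("seq",) the loader is called for "seq" at construction time;
any other attribute costs one loader call on first access and none after
that.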

Should I just go ahead and cvs commit all this so you can see it for
yourselves? It shouldn't interfere with any existing code. Or would it be
better to use a separate branch? - undoubtedly there will be a lot of
refactoring before the code settles down. Maybe these should even form a
different set of adaptors (or rather mixin classes for the existing
adaptors)?

Cheers,
Chris

On Wed, 30 May 2001, Ewan Birney wrote:

> On Tue, 29 May 2001, Kris Boulez wrote:
> 
> > I've been using bioperl-db for the last few weeks and it made part of my
> > life a lot easier. I was wondering how hard it would be to use/access a
> > different database (which might have a different schema). In this
> > approach bioperl-db would be a middle layer between different sequence
> > databases.
> > What would it need other than having a replacement for
> > Bio::EnsemblLite::UpdateableDB and changing the sql statements in the
> > different Bio::DB::SQL adaptors?
> 
> This is the aim of the adaptors. In fact, one could parameterise the Root
> DBAdaptor to be able to set "your own" Sequence etc Adaptor which complied
> to the same Adaptor interface, allowing the "business
> object" (Bio::DB::Seq) to remain the same.
> 
> (Bio::EnsemblLite::UpdateableDB is a defunct module I thought - it
> certainly does not work with the Bio::DB::SQL code so far)
> 
> (I think in reality changing one adaptor is likely to force a change of
> all the adaptors, so an entire new code base will be written)
> 
> 
> If this set up is too restrictive, one could aim to use the same basic
> approach to link to a different SQL schema that implemented the
> Bio::DB::RandomAccessI interface and therefore gave out Bio::SeqI
> 
> I think it would be good to think of a 
> 
> Bio::DB::QueryableAccessI interface to provide a generic, richer query
> language we could then layer on top of database implementations. We could
> have a stab at this
> 
> 
> all in all --- lots of options. What is the immediate problem you are
> trying to solve?
> 
> 
> > 
> > Such a setup would allow people to access (sequence) databases in a big
> > organization to which they only have access via SQL.
> > This might become a poor man's version of IBM's DiscoveryLink
> > (http://www-3.ibm.com/solutions/lifesciences/discovery.html)
> > 
> > Kris,
> > 
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@bioperl.org
> > http://bioperl.org/mailman/listinfo/bioperl-l
> > 
> 
> -----------------------------------------------------------------
> Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
> <birney@ebi.ac.uk>. 
> -----------------------------------------------------------------