[DAS2] query language description
chris mungall
cjm at fruitfly.org
Mon Mar 20 23:45:46 UTC 2006
I guess things need to be left open for a DAS/3...
On Mar 20, 2006, at 9:32 AM, Lincoln Stein wrote:
> The current filter query language, which provides one level of ANDs and a
> nested level of ORs, satisfies our use cases. It is not clear to me what
> additional benefit we'll get from a composable query language. Note that
> none of the popular and functional genome information sources -- NCBI,
> UCSC, Ensembl or BioMart -- offer a composable query language, and there
> does not seem to be rioting on the streets!
>
> Lincoln
>
>
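One way to read "one level of ANDs and a nested level of ORs" is that distinct filter parameters are ANDed together and repeated values of the same parameter are ORed. The Python below is a minimal sketch under that assumption. The parameter names `type` and `name` and the feature records are hypothetical, and a real server would translate the filters into its own database query rather than scan a list.

```python
from __future__ import annotations

from urllib.parse import parse_qs

# Hypothetical in-memory feature records; field names are illustrative.
FEATURES = [
    {"id": "f1", "type": "exon", "name": "abc"},
    {"id": "f2", "type": "gene", "name": "abc"},
    {"id": "f3", "type": "exon", "name": "xyz"},
]


def matches(feature: dict, filters: dict[str, list[str]]) -> bool:
    """AND across distinct parameter names, OR across repeated values."""
    return all(
        any(feature.get(field) == value for value in values)
        for field, values in filters.items()
    )


def run_query(query_string: str, features: list[dict]) -> list[str]:
    # parse_qs groups repeated keys: "type=exon&type=gene" -> {"type": ["exon", "gene"]}
    filters = parse_qs(query_string)
    return [f["id"] for f in features if matches(f, filters)]


# (type = exon OR type = gene) AND name = abc
print(run_query("type=exon&type=gene&name=abc", FEATURES))  # ['f1', 'f2']
```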
> On Friday 17 March 2006 19:20, chris mungall wrote:
>> On Mar 16, 2006, at 6:05 PM, Andrew Dalke wrote:
>>>> right now they are forced to bypass the constraint language and go
>>>> direct to SQL.
>>>
>>> In addition, we provide defined ways for a server to indicate
>>> that there are additional ways to query the server.
>>
>> I was positing this as a bad feature, not a good one, or at least a
>> symptom of an incorrectly designed system (at least in the case of the
>> GO DB API - it may not carry forward to DAS - though if you're going to
>> allow querying by terms...)
>>
>>>> None of these really fit into the DAS paradigm. I'm guessing you want
>>>> something simple that can be used as easily as an API with get-by-X
>>>> methods but will seamlessly blend into something more powerful. I
>>>> think what you have is on the right lines. I'm just arguing to make
>>>> this language composable from the outset, so that it can be extended
>>>> to whatever expressivity is required in the future, without bolting on
>>>> a new query system that's incompatible with the existing one.
>>>
>>> We have two ways to compose the system. If the simple query language
>>> is extended, for example, to support word searches of the text field
>>> instead of substring searches, then a server can say
>>>
>>> <CAPABILITY type="features"
>>>             query_uri="http://somewhere.over.rainbow/server.cgi">
>>>   <SUPPORTS name="word-search"/>
>>> </CAPABILITY>
>>>
>>> This is backwards compatible, so the normal DAS queries work. But
>>> a client can recognize the new feature and support whatever new
>>> filters that 'word-search' indicates, eg
>>>
>>> http://somewhere.over.rainbow/server.cgi?note-wordsearch=Andre*
>>>
>>> (finds features with notes containing words starting with 'Andre' )
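A minimal sketch of the client side of this extension point, assuming Python's standard `xml.etree.ElementTree` and `urllib.parse`: the client collects the `<SUPPORTS>` names and only builds a `note-wordsearch` query when `word-search` is advertised. The helper names are hypothetical; the element, attribute and parameter names come from the example above.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Set
from urllib.parse import urlencode

CAPABILITY_XML = """
<CAPABILITY type="features"
            query_uri="http://somewhere.over.rainbow/server.cgi">
  <SUPPORTS name="word-search"/>
</CAPABILITY>
"""


def supported_extensions(capability: ET.Element) -> Set[str]:
    """Names listed in the <SUPPORTS> children of a CAPABILITY element."""
    return {s.get("name") for s in capability.findall("SUPPORTS")}


def word_search_url(capability: ET.Element, pattern: str) -> Optional[str]:
    """Build a note-wordsearch query, or None if the server lacks it."""
    if "word-search" not in supported_extensions(capability):
        return None  # caller falls back to the standard DAS filters
    query = urlencode({"note-wordsearch": pattern})
    return capability.get("query_uri") + "?" + query


cap = ET.fromstring(CAPABILITY_XML.strip())
print(word_search_url(cap, "Andre*"))
# http://somewhere.over.rainbow/server.cgi?note-wordsearch=Andre%2A
```

`urlencode` percent-escapes the `*` wildcard as `%2A`; the server would decode it back to `Andre*` before applying the word search.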
>>>
>>> These are composable. For example, suppose Sanger allows
>>> modification date searches of curation events. Then it might say
>>>
>>> <CAPABILITY type="features"
>>>             query_uri="http://somewhere.over.rainbow/server.cgi">
>>>   <SUPPORTS name="word-search"/>
>>>   <SUPPORTS name="sanger-curation"/>
>>> </CAPABILITY>
>>
>> so this is limited to single-argument search functions?
>>
>>> and I can search for notes containing words starting with "Andre"
>>> which were modified by "dalke" between 2002 and 2005 by doing
>>>
>>> http://somewhere.over.rainbow/server.cgi?note-wordsearch=Andre*&
>>> modified-by=dalke&modified-before=2005&modified-after=2002
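The composition described here amounts to concatenating each extension's parameters into one flat query string. A minimal sketch, with hypothetical helper and variable names and the parameter names from the example above; it assumes, as the example implies, that the server ANDs the distinct parameters.

```python
from urllib.parse import urlencode

# Filters contributed by each extension advertised via <SUPPORTS>.
WORD_SEARCH = [("note-wordsearch", "Andre*")]
SANGER_CURATION = [
    ("modified-by", "dalke"),
    ("modified-before", "2005"),
    ("modified-after", "2002"),
]


def compose(base_uri: str, *extension_filters: list) -> str:
    """Concatenate every extension's (key, value) pairs into one flat query.

    Assuming the server ANDs distinct keys, composition only ever adds
    conjuncts; a flat list of pairs has no way to express nesting.
    """
    params = [pair for filters in extension_filters for pair in filters]
    return base_uri + "?" + urlencode(params)


url = compose("http://somewhere.over.rainbow/server.cgi", WORD_SEARCH, SANGER_CURATION)
print(url)
```

Because the result is a flat list of key/value pairs, the filters can only be combined conjunctively (plus ORs over repeated keys); there is no syntax for grouping.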
>>
>> but the compositionality is always associative since the CGI parameter
>> constraint forbids nesting
>>
>>> An advantage to the simple boolean logic of the current system
>>> is that the GUI interface is easy, and in line with existing
>>> simple search systems.
>>
>> there's nothing preventing you from implementing a simple GUI on top of
>> an expressive system - there is nothing forcing you to use the
>> expressivity
>>
>>> If someone wants to implement a new search system which is
>>> not backwards compatible then the server can indicate that
>>> alternative with a new CAPABILITY. Suppose Thomas at Sanger
>>> comes up with a new search mechanism based on an object query
>>> language he invented,
>>>
>>> <CAPABILITY type="down-oql"
>>>             query_uri="http://sanger.ac.uk/oql-search" />
>>>
>>> The Sanger and EBI clients might understand that and support
>>> a more complex GUI, eg, with a text box interface. Everyone
>>> else must ignore unknown capability types.
>>
>> but this doesn't integrate with the existing query system
>>
>>> Then that would be POSTED (or whatever the protocol defines)
>>> to the given URL, which returns back whatever results are
>>> desired.
>>>
>>> Or the server can point to a public MySQL port, like
>>>
>>> <CAPABILITY type="mysql-connection"
>>>             query_uri="mysql://username:password@hostname:port/databasename" />
>>>
>>> That's what you are doing to bypass the syntax, except that
>>> here it isn't a bypass; you can define the new interface in
>>> the DAS sources document.
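The rule that clients must ignore unknown capability types can be sketched as a filter over the CAPABILITY elements of a sources document. This assumes a simplified, hypothetical `<SOURCE>` wrapper around the capabilities from the examples above; the function name and the `KNOWN_TYPES` set are illustrative only.

```python
import xml.etree.ElementTree as ET

SOURCES_XML = """<SOURCE>
  <CAPABILITY type="features" query_uri="http://somewhere.over.rainbow/server.cgi"/>
  <CAPABILITY type="down-oql" query_uri="http://sanger.ac.uk/oql-search"/>
  <CAPABILITY type="mysql-connection"
              query_uri="mysql://username:password@hostname:port/databasename"/>
</SOURCE>"""

# Capability types this hypothetical client knows how to use.
KNOWN_TYPES = {"features"}


def usable_capabilities(xml_text: str, known=KNOWN_TYPES) -> dict:
    """Map known capability types to their query URIs; skip the rest."""
    root = ET.fromstring(xml_text)
    found = {}
    for cap in root.iter("CAPABILITY"):
        cap_type = cap.get("type")
        if cap_type in known:
            found[cap_type] = cap.get("query_uri")
        # Unknown types are silently ignored, not treated as errors.
    return found


print(usable_capabilities(SOURCES_XML))
```

A Sanger or EBI client would simply pass a larger `known` set that includes `down-oql`.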
>>>
>>>> The generic language could just be some kind of simple
>>>> extensible function syntax for search terms, boolean operators,
>>>> and some kind of (optional) nesting syntax.
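As a rough illustration of the composable syntax proposed here (not anything defined in DAS2), the sketch below models queries as a small tree of hypothetical `Term`, `And` and `Or` nodes with a function-style rendering. Under this model the existing flat filter language corresponds to an `And` whose children are `Term`s or `Or`s of `Term`s; allowing arbitrary nesting is the extra expressivity under discussion.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Term:
    field: str
    value: str


@dataclass(frozen=True)
class And:
    parts: tuple  # child query nodes


@dataclass(frozen=True)
class Or:
    parts: tuple  # child query nodes


Query = Union[Term, And, Or]


def evaluate(q: Query, feature: dict) -> bool:
    """Recursively test one feature record against a query tree."""
    if isinstance(q, Term):
        return feature.get(q.field) == q.value
    if isinstance(q, And):
        return all(evaluate(p, feature) for p in q.parts)
    if isinstance(q, Or):
        return any(evaluate(p, feature) for p in q.parts)
    raise TypeError(f"unknown query node: {q!r}")


def render(q: Query) -> str:
    """Function-style concrete syntax, e.g. and(type=exon,or(name=a,name=b))."""
    if isinstance(q, Term):
        return f"{q.field}={q.value}"
    op = "and" if isinstance(q, And) else "or"
    return f"{op}({','.join(render(p) for p in q.parts)})"


query = And((Term("type", "exon"), Or((Term("name", "abc"), Term("name", "xyz")))))
print(render(query))  # and(type=exon,or(name=abc,name=xyz))
```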
>>>
>>> Which syntax? Is it supposed to be easy for people to write?
>>> Text oriented? Or tree structured, like XML, or SQL-like?
>>
>> I'd favour some concrete abstract syntax that looks much like the
>> existing DAS QL
>>
>>> And which clients and servers will implement that search
>>> language?
>>
>> all servers. clients optional
>>
>>> If there was a generic language it would allow
>>> OR("on segment Chr1 between 1000 and 2000",
>>> "on segment ChrX between 99 and 777")
>>> which is something we are expressly not allowing in DAS2
>>> queries. It doesn't make sense for the target applications
>>> and by excluding it it simplifies the server development,
>>> which means less chance for bugs.
>>
>> this example is pointless but it's easy to imagine plenty of ontology
>> term queries or other queries in which this would be useful
>>
>> I guess I depart from the normal DAS philosophy - I don't see this
>> being a high barrier to entry for servers; if they're not up to this,
>> it'll probably be a buggy, hacky server anyway
>>
>>> Also, I personally haven't figured out a decent way to
>>> do a GUI composition of a complex boolean query which is
>>> as easy as learning the query language in the first place.
>>
>> doesn't mean it doesn't exist.
>>
>> i'm not sure what's hard about having say, a clipboard of favourite
>> queries, then allowing some kind of drag-and-drop composition
>>
>>> A more generic language implementation is a lot of overhead
>>> if most (80%? 90%) need basic searches, and many of the
>>> rest can fake it by breaking a request into parts and
>>> doing the boolean logic on the client side.
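Doing the boolean logic on the client side amounts to issuing each sub-query separately and combining the returned feature IDs as sets. A minimal sketch, assuming a hypothetical `fetch` callable that performs one DAS request and returns feature IDs; the query strings in the usage example are placeholders, not DAS2 syntax.

```python
from typing import Callable, Iterable, Set

# Hypothetical stand-in for one HTTP request to a DAS server that
# returns the IDs of the matching features.
Fetch = Callable[[str], Iterable[str]]


def client_or(fetch: Fetch, *queries: str) -> Set[str]:
    """OR on the client: union of the IDs returned by each sub-query."""
    result: Set[str] = set()
    for query in queries:
        result |= set(fetch(query))
    return result


def client_and(fetch: Fetch, *queries: str) -> Set[str]:
    """AND on the client: intersection of the IDs from each sub-query."""
    id_sets = [set(fetch(query)) for query in queries]
    return set.intersection(*id_sets) if id_sets else set()


# Fake server responses keyed by placeholder query strings.
FAKE_SERVER = {
    "Chr1:1000-2000": ["f1", "f2"],
    "ChrX:99-777": ["f7", "f9"],
    "type:exon": ["f2", "f7"],
}


def fake_fetch(query: str) -> Iterable[str]:
    return FAKE_SERVER.get(query, [])


print(sorted(client_or(fake_fetch, "Chr1:1000-2000", "ChrX:99-777")))  # ['f1', 'f2', 'f7', 'f9']
```

Each sub-query costs a separate round trip, and the full result sets must be transferred before they can be combined, which is the overhead discussed in the reply.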
>>
>> this is always an option - if the user doesn't mind the additional,
>> possibly very high, overhead. it's just a little bit of a depressing
>> approach, as if Codd's seminal paper from 1970 or whenever it was never
>> happened.
>>
>>> Feedback I've heard so far is that DAS1 queries were
>>> acceptable, with only a few new search fields needed.
>>>
>>>> hmm, not sure how useful this would be - surely you'd want something
>>>> more dasmodel-aware?
>>>
>>> The example I gave was a bad one. What I meant was to show
>>> how there's an extension point so someone can develop a new
>>> search interface and clients can know that the new functionality
>>> exists, without having to change the DAS spec.
>>
>> ok
>>
>> that's probably all I've got to say on the matter, sorry for being
>> irksome. I guess I'm fundamentally missing something, that is, why
>> wrap simple and expressive declarative query languages with limited
>> ad-hoc constraint systems with consciously limited expressivity and
>> limited means of extensibility.
>>
>> cheers
>> chris
>>
>>> Andrew
>>> dalke at dalkescientific.com
>>>
>>> _______________________________________________
>>> DAS2 mailing list
>>> DAS2 at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/das2
>
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu (516 367-5008)