[DAS2] query language description
Andrew Dalke
dalke at dalkescientific.com
Thu Mar 16 23:47:58 EST 2006
Updated:
- added 'note' as a query field
- changed string searches to substring (not word) searches
and made them case-insensitive
"AB" matches only the strings "AB", "Ab", "aB" and "ab"
"*AB" matches only fields which end with
"AB", "ab", "aB", or "Ab"
"AB*" matches only fields which start with "AB", up to case
"*AB*" matches only fields which contain the substring,
up to case
- added 'coordinates' search
- removed 'type' and renamed 'exacttype' to 'type'
- removed 'contains' search, which no one said they wanted. Instead,
supporting (EXPERIMENTAL) an 'excludes' search.
==================================
The query fields are
name        | takes  | matches features ...
======================================================================
xid         | URI    | which have the given xid
type        | URI    | with exactly the given type
segment     | URI    | on the given segment
coordinates | URI    | which are part of the given coordinate system
overlaps    | region | which overlap the given region
excludes    | region | which have no overlap with the given region
inside      | region | which are contained inside the given region
name        | string | with a title or alias which matches the given string
note        | string | with a note which matches the given string
prop-*      | string | with the property "*" matching the given string
Queries are form-urlencoded requests. For example, if the features
query URL is 'http://biodas.org/features' and there is a segment named
'http://ncbi.org/human/Chr1' then the following is a request for all the
features on the first 10,000 bases of that segment.
The query is for
segment = 'http://ncbi.org/human/Chr1'
overlaps = 0:10000
which is form-urlencoded as
http://biodas.org/features?segment=http%3A%2F%2Fncbi.org%2Fhuman%2FChr1;overlaps=0%3A10000
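As a rough illustration of the encoding step, here is a minimal Python sketch. The helper name build_query_url is made up for this example; the only things taken from the description above are the percent-encoding of the values and the ';' separator between terms.

```python
from urllib.parse import quote


def build_query_url(base_url, terms):
    """Join (key, value) pairs into a query URL.

    Each key and value is percent-encoded, and the terms are joined
    with ';' as in the examples in this description.
    """
    encoded = [
        f"{quote(key, safe='')}={quote(value, safe='')}"
        for key, value in terms
    ]
    return base_url + "?" + ";".join(encoded)


url = build_query_url(
    "http://biodas.org/features",
    [("segment", "http://ncbi.org/human/Chr1"), ("overlaps", "0:10000")],
)
print(url)
```

Note that urllib.parse.urlencode is not used here because it joins terms with '&' rather than ';'.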
Multiple search terms with the same key are OR'ed together. The
following searches for features with a name or alias of either
BC048328 or BC015400
http://biodas.org/features?name=BC048328;name=BC015400
The 'excludes' search is an exception. See below.
Multiple search terms with different keys are AND'ed together,
but only after doing the OR search for each set of search terms with
identical keys. The following searches for features which have
a name or alias of BC048328 or BC015400 and which are on the segment
http://ncbi.org/human/Chr1
http://biodas.org/features?name=BC048328;segment=http%3A%2F%2Fncbi.org%2Fhuman%2FChr1;name=BC015400
The order of the search terms in the query string does not affect
the results.
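One way to picture the combination rule is the following Python sketch. It is only an illustration: the feature is a toy dict, and term_matches is a hypothetical callback standing in for whatever per-field matching a server really does. The 'excludes' exception is not handled here.

```python
from collections import defaultdict


def group_terms(terms):
    """Group (key, value) pairs by key. Term order does not matter."""
    grouped = defaultdict(set)
    for key, value in terms:
        grouped[key].add(value)
    return dict(grouped)


def feature_matches(feature, terms, term_matches):
    """OR the terms sharing a key, then AND the per-key results.

    term_matches(feature, key, value) is a hypothetical callback which
    decides whether a single search term matches the feature.
    """
    return all(
        any(term_matches(feature, key, value) for value in values)
        for key, values in group_terms(terms).items()
    )


def exact(feature, key, value):
    # Toy matcher for the example: exact comparison on a dict field.
    return feature.get(key) == value


feature = {"name": "BC015400", "segment": "http://ncbi.org/human/Chr1"}
terms = [
    ("name", "BC048328"),
    ("segment", "http://ncbi.org/human/Chr1"),
    ("name", "BC015400"),
]
print(feature_matches(feature, terms, exact))
```

Grouping by key first is what makes the result independent of the order of the terms in the query string.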
If any part of a complex feature (that is, one with parents
or parts) matches a search term then all of the parents and
parts are returned. (XXX Gregg -- is this correct? XXX)
The fields which take URIs require exact matches, that is, a
character-by-character match. (For details on the nuances of
comparing URIs see http://www.textuality.com/tag/uri-comp-3.html )
(We don't have an ontology URI yet, and when we do we can add
an 'ontology' query.)
The segment query filter takes a URI. The server must accept
the segment URI and, if known to the server, the equivalent
reference identifier for the segment.
If range searches are given then one and only one segment
must be given. If there are multiple segment queries then
ranges are not allowed.
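A server might check this constraint before running the query. The sketch below is one possible reading, assuming the range searches are 'overlaps', 'excludes' and 'inside'. How an error should be reported is not specified here, so the ValueError is just a placeholder.

```python
RANGE_KEYS = {"overlaps", "excludes", "inside"}


def check_range_terms(terms):
    """Raise ValueError if range searches lack exactly one segment term."""
    segments = [value for key, value in terms if key == "segment"]
    has_range = any(key in RANGE_KEYS for key, _ in terms)
    if has_range and len(segments) != 1:
        raise ValueError("range searches require exactly one segment")


# Accepted: one segment with a range search.
check_range_terms([("segment", "http://ncbi.org/human/Chr1"),
                   ("overlaps", "0:10000")])

# Accepted: several segments, no range search.
check_range_terms([("segment", "http://ncbi.org/human/Chr1"),
                   ("segment", "http://ncbi.org/human/Chr2")])
```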
The string searches may be exact matches, substring, prefix
or suffix searches. The query type depends on whether the search
value starts and/or ends with a '*'.
ABC -- field exactly matches "ABC"
*ABC -- field ends with "ABC"
ABC* -- field starts with "ABC"
*ABC* -- field contains the substring "ABC"
The "*" has no special meaning except at the start or end
of the query value. The search term "***" will match a
field which contains the character "*" anywhere. There
is no way to match fields which exactly match '*' or
which only start or end with that character.
Text searches are case-insensitive. The string "ABC"
matches "abc", "aBc", "ABC", etc.
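The matching rules above can be sketched in Python roughly as follows. The function name is made up for this example, and the handling of a lone "*" (treated as a leading wildcard only) is an assumption, since that case is not defined here.

```python
def string_matches(pattern, field):
    """Case-insensitive match with optional leading/trailing '*'."""
    pattern = pattern.lower()
    field = field.lower()
    leading = pattern.startswith("*")
    # Assumption: a lone "*" counts as a leading wildcard only.
    trailing = len(pattern) > 1 and pattern.endswith("*")
    start = 1 if leading else 0
    end = len(pattern) - 1 if trailing else len(pattern)
    core = pattern[start:end]  # any '*' left in here is a literal
    if leading and trailing:
        return core in field
    if leading:
        return field.endswith(core)
    if trailing:
        return field.startswith(core)
    return field == core


for pattern, field in [("ABC", "abc"), ("*ABC", "xyzabc"),
                       ("ABC*", "abcdef"), ("*ABC*", "xxaBcxx"),
                       ("***", "a*b")]:
    print(pattern, field, string_matches(pattern, field))
```

Only the first and last characters of the query value are inspected for wildcards, which is why "***" ends up searching for a literal "*".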
A server may choose to collapse multiple whitespace
characters into a single space character for search purposes.
For example, the query "*a newline*" should match
"This is a line of text which contains a
newline"
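A server which chooses to do this could normalize the field text before searching. A minimal sketch, assuming "whitespace" means whatever Python's regular expression \s class matches:

```python
import re


def collapse_whitespace(text):
    """Replace each run of whitespace characters with a single space."""
    return re.sub(r"\s+", " ", text)


field = "This is a line of text which contains a\nnewline"
normalized = collapse_whitespace(field)
print("a newline" in normalized.lower())
```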
The 'name' search does a text search of the 'title' and 'alias'
fields.
The "prop-*" is shorthand for a class of text searches of
<PROP> elements. Features may have properties, like
<PROP key="cellular_component" value="membrane" />
To do a string search for all 'membrane' cellular components,
construct the query key by taking the string "prop-" and
appending the property key text ("cellular_component"). The
query value is the text to search for, in this case:
prop-cellular_component=membrane
To search for any cellular_component containing the substring
"membrane"
prop-cellular_component=*membrane*
The rules for multiple searches with the same key also apply to the
prop-* searches. To search for all 'membrane' or 'nuclear'
cellular components, use two 'prop-cellular_component' terms, as in
http://biodas.org/features?prop-cellular_component=membrane;prop-cellular_component=nuclear
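On the server side, the property key can be recovered from the query key by stripping the "prop-" prefix. A small sketch, with a made-up helper name:

```python
def split_prop_key(query_key):
    """Return the property key of a 'prop-*' query key, else None."""
    prefix = "prop-"
    if query_key.startswith(prefix) and len(query_key) > len(prefix):
        return query_key[len(prefix):]
    return None


print(split_prop_key("prop-cellular_component"))
print(split_prop_key("name"))
```

The returned key would then be compared against the key attribute of each <PROP> element, and the query value matched against its value attribute using the string search rules.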
The range searches are defined with explicit start and end
coordinates. The range syntax is in the form "start:end", for
example, "1:9". There is no way to restrict the search to
a specific strand.
A feature may have several locations. An annotation may
have several features in a parent/part relationship. The
relationship may have several levels. If a range search
matches any feature in the annotation then the search
returns all of the features in the annotation.
An 'overlaps' search matches if and only if any location of
any parent or part in the annotation overlaps the query
range on the query segment.
An 'inside' search matches if and only if at least one
feature in the annotation has a location on the query segment
and all features which have a location on the query segment
have at least one location which starts and ends in the
query range.
EXPERIMENTAL: An 'excludes' search matches if and only if at
least one feature of the annotation is on the query segment
and no features are in the query range. This is the
complement of the 'overlaps' search, for annotations on
the same query segment.
Unlike the other search keys, if there are multiple 'excludes'
searches then the results are AND'ed together. That is,
if the query has two excludes ranges
segment=ChrX excludes=RANGE1 excludes=RANGE2
then the results are those features on ChrX which
are not in RANGE1 and are not in RANGE2.
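To make the three range searches concrete, here is a Python sketch under some loud assumptions. An annotation is modeled as a list of features, and each feature as a list of (segment, start, end) locations. Ranges are treated as half-open intervals, which this description does not actually specify. "In the query range" for 'excludes' is read as "overlaps the query range". All names are invented for the example.

```python
def on_segment(feature, segment):
    """Return the (start, end) locations of a feature on the segment."""
    return [(s, e) for seg, s, e in feature if seg == segment]


def ranges_overlap(start, end, q_start, q_end):
    # Assumption: half-open intervals [start, end).
    return start < q_end and q_start < end


def overlaps(annotation, segment, q_start, q_end):
    return any(
        ranges_overlap(s, e, q_start, q_end)
        for feature in annotation
        for s, e in on_segment(feature, segment)
    )


def inside(annotation, segment, q_start, q_end):
    located = [locs for locs in
               (on_segment(f, segment) for f in annotation) if locs]
    return bool(located) and all(
        any(q_start <= s and e <= q_end for s, e in locs)
        for locs in located
    )


def excludes(annotation, segment, q_start, q_end):
    has_location = any(bool(on_segment(f, segment)) for f in annotation)
    return has_location and not overlaps(annotation, segment, q_start, q_end)


def excludes_all(annotation, segment, query_ranges):
    """Multiple 'excludes' terms are AND'ed together."""
    return all(excludes(annotation, segment, qs, qe)
               for qs, qe in query_ranges)


annotation = [
    [("ChrX", 100, 200)],
    [("ChrX", 300, 400), ("ChrY", 0, 50)],
]
print(overlaps(annotation, "ChrX", 150, 160))
print(inside(annotation, "ChrX", 0, 500))
print(excludes_all(annotation, "ChrX", [(0, 100), (200, 300)]))
```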
Andrew
dalke at dalkescientific.com