Identifiers (was: [Biocorba-l] BSANE and bioCORBA)

Martin Senger senger@ebi.ac.uk
Thu, 31 May 2001 13:53:35 +0100 (BST)


> I am sure Martin or Juha can point you to a Naming
> Service doc thing...
> 
   The URL is http://www.omg.org/cgi-bin/doc?dtc/99-08-10
(I hope that it is the latest one. Philip?)

   The relevant chapter (on the syntax for names) is 4.5,
"Stringified Names". Here are parts of the chapter:

4.5 Stringified Names

...

This specification defines a syntax for stringified names... A stringified
name represents one and only one 'name'. If two names are equal, their
stringified representations are equal (and vice-versa).

The stringified name representation reserves use of the characters "/",
"\" and ".". The forward slash "/" is a name component separator; the dot
"." separates 'id' and 'kind' fields. The backslash "\" is an escape
character...


4.5.1 Basic Representation of Stringified Names

A stringified name consists of the name components of a name separated by
a "/" character. For example, a name consisting of the components "a",
"b", and "c" (in that order) is represented as "a/b/c".

Stringified names use the "." character to separate id and kind fields in
the stringified representation. For example, the stringified name
"a.b/c.d/." represents the name:

Index  id       kind
--------------------
0      a        b
1      c        d
2      <empty>  <empty>                       


The single "." character is the only representation of a name component
with empty id and kind fields. If a name component in a stringified name
does not contain a "." character, the entire component is interpreted as
the id field, and the kind field is empty. For example:

a/./c.d/.e

corresponds to the name:

Index   id       kind
---------------------
0       a        <empty>
1       <empty>  <empty>
2       c        d
3       <empty>  e                 

If a name component has a non-empty id field and an empty kind field, the
stringified representation consists only of the id field. A trailing "."
character is not permitted.

4.5.2 Escape Mechanism

The backslash "\" character escapes the reserved meaning of "/", ".", and
"\" in a stringified name. The meaning of any other character following a
"\" is reserved for future use.

NameComponent Separators

If a name component contains a "/" slash character, the stringified
representation uses the "\" character as an escape. For example, the
stringified name "a/x\/y\/z/b" represents the name consisting of the name
components "a", "x/y/z", and "b".

Id and kind Fields

The backslash escape mechanism is also used for ".", so id and kind fields
can contain a literal ".". To illustrate, the stringified name
"a\.b.c\.d/e.f" represents the name:

Index   id    kind
------------------
0       a.b   c.d
1       e     f
                     
The Escape Character

The escape character "\" must be escaped if it appears in a name
component. For example, the stringified name "a/b\\/c" represents the name
consisting of the components "a", "b\", and "c".

--- end of the quoting ---
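
To make the quoted rules concrete, here is a minimal Python sketch of a
parser and its inverse. It is not taken from the specification or from any
ORB: the function names (parse_stringified_name, to_stringified_name) are
made up for illustration, and rejecting an empty component such as the one
in "a//b" is my own assumption, based on the statement that a single "." is
the only representation of an empty component.

```python
from typing import List, Tuple

NameComponent = Tuple[str, str]  # (id, kind)
RESERVED = "/.\\"                # separator, id/kind delimiter, escape


def _finish(id_chars, kind_chars, seen_dot) -> NameComponent:
    """Validate one component and return it as an (id, kind) pair."""
    id_, kind = "".join(id_chars), "".join(kind_chars)
    if seen_dot and id_ and not kind:
        raise ValueError("a trailing '.' is not permitted")
    if not seen_dot and not id_:
        # Assumption: an empty component must be written as "."
        raise ValueError("empty component must be written as '.'")
    return id_, kind


def parse_stringified_name(text: str) -> List[NameComponent]:
    """Split a stringified name into (id, kind) pairs, honouring escapes."""
    components = []
    id_chars, kind_chars = [], []
    current, seen_dot = id_chars, False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            # Only "/", "." and "\" may follow the escape character.
            if i + 1 >= len(text) or text[i + 1] not in RESERVED:
                raise ValueError("invalid escape at offset %d" % i)
            current.append(text[i + 1])
            i += 2
            continue
        if ch == "/":
            components.append(_finish(id_chars, kind_chars, seen_dot))
            id_chars, kind_chars = [], []
            current, seen_dot = id_chars, False
        elif ch == ".":
            if seen_dot:
                raise ValueError("second unescaped '.' in one component")
            current, seen_dot = kind_chars, True
        else:
            current.append(ch)
        i += 1
    components.append(_finish(id_chars, kind_chars, seen_dot))
    return components


def _escape(field: str) -> str:
    """Prefix every reserved character in a field with a backslash."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in field)


def to_stringified_name(components: List[NameComponent]) -> str:
    """Inverse of parse_stringified_name."""
    parts = []
    for id_, kind in components:
        if kind:
            parts.append(_escape(id_) + "." + _escape(kind))
        elif id_:
            parts.append(_escape(id_))
        else:
            parts.append(".")  # empty id and empty kind
    return "/".join(parts)
```

For instance, parse_stringified_name(r"a\.b.c\.d/e.f") gives the two
components ("a.b", "c.d") and ("e", "f"), matching the table in the quoted
section 4.5.2, and to_stringified_name() turns that list back into the same
string.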

Our suggestion for how to use the syntax described above for sequence
identifiers (and others) was best described by Philip in the
"Genomic Map Draft Adopted Specification"
(http://www.omg.org/cgi-bin/doc?dtc/99-12-01). Here is the relevant
excerpt:

   - Names can refer to collections of entities (such as databases), or to
entities within such collections. Names referring to collections consist
of exactly one component; names referring to entities within collections
consist of at least two components.

   - The first component represents the data source. Data sources
can be anything: transient collections, local databases, public
repositories, etc. It is up to the implementation to document the accepted
names for the data source.

   - The empty name ( . ) is valid for the first component, and represents
the local or default collection. It is up to the implementation to
document what the semantics of local or default is.

   - Names that refer to entities within collections consist of two or
more components. The second component of such names represents an
identifier that is unique in the context of the data source. No empty
id-fields are allowed in this or any further components.

   - If two components are not enough to uniquely identify an entity, an
Identifier can contain more than two components, but no more than
necessary to make the identification unique. That is, an Identifier may
not be used to freely attach textual information.

   - The only characters valid in a component are a through z, 0 through
9, and - (hyphen), _ (under_score), $ and . (period). Use of the latter
is discouraged since it has a special meaning in the stringifying
convention, and has therefore to be escaped.

--- end of the quoting --- (but there is more there...)
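
The rules in the excerpt can be checked mechanically, apart from the "no
more components than necessary" rule, which depends on the data source. The
Python sketch below is only an illustration: classify_identifier is a
hypothetical name, the input is assumed to be an already parsed list of
(id, kind) pairs, and I read "a through z" literally as lowercase only and
apply the character rule to both fields, which may be stricter than the
specification intends.

```python
import re
from typing import List, Tuple

# Characters the excerpt allows in a component: a-z, 0-9, "-", "_", "$", ".".
# Assumption: lowercase only, exactly as the excerpt is worded.
_VALID_FIELD = re.compile(r"[a-z0-9_$.-]*")


def classify_identifier(components: List[Tuple[str, str]]) -> str:
    """Return "collection" or "entity", or raise ValueError if invalid."""
    if not components:
        raise ValueError("an identifier needs at least one component")
    for index, (id_, kind) in enumerate(components):
        for field in (id_, kind):
            if not _VALID_FIELD.fullmatch(field):
                raise ValueError("invalid character in %r" % field)
        # Only the first component (the data source) may have an empty id;
        # the empty name "." there means the local or default collection.
        if index >= 1 and not id_:
            raise ValueError("empty id in component %d" % index)
    return "collection" if len(components) == 1 else "entity"
```

With this, [("embl", "")] is classified as a collection, and
[("", ""), ("ab12345", "")] (that is, "./ab12345") as an entity in the local
or default collection. Both example names are invented.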

   Martin


-- 
Martin Senger

EMBL Outstation - Hinxton                Senger@EBI.ac.uk     
European Bioinformatics Institute        Phone: (+44) 1223 494636      
Wellcome Trust Genome Campus             (Switchboard:     494444)
Hinxton                                  Fax  : (+44) 1223 494468
Cambridge CB10 1SD
United Kingdom                           http://industry.ebi.ac.uk/~senger