[Bioperl-l] identifier interface

Matthew Pocock matthew_pocock@yahoo.co.uk
Wed, 17 Jul 2002 22:05:19 +0100


Hi Lincoln,

Sorry - probably not being crystal clear. I totally get that identifiers 
can usefully be separate from locators, and completely agree that 
resolution of identifiers to resources should be done by external code. 
What I was saying is:

Do you want to make all identifiers in BioPerl conform to the LSID spec?

What if an ID provider (someone producing a BioPerl object with an 
identifier) wants to use some other form of ID e.g.:

   * database URN, table name, unique key
   * some custom URN
   * EMBOSS ID
   * LDAP path

These are just some silly ideas. No doubt real implementors will want to 
use even funkier info to locate or uniquely identify their resources. 
Nearly every effective naming scheme I have ever seen has been 
hierarchical (like file paths, domain names, LDAP). So, does it make sense 
to expect all ID providers to fit their identifying info into an 
LSID-shaped object, or should the only contracts on Identifier be that:

   a) they can be losslessly read from/written to a string so that you 
can serialize them

   b) when fed to the Identifier resolving machinery, they can be used 
to retrieve the referent they identify
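
To make contract (a) concrete, here is a minimal sketch in Python (not BioPerl code; the same shape would carry over to a Perl interface). Every name in it (Identifier, DbKeyIdentifier, to_string, from_string) is made up for illustration, and the "database/table/key" layout is just one guess at what the first bullet above might look like. Each part is percent-encoded so that a "/" inside a part cannot break the round trip.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from urllib.parse import quote, unquote


class Identifier(ABC):
    """Hypothetical contract: an identifier only has to round-trip via a string."""

    @abstractmethod
    def to_string(self) -> str:
        """Serialize the identifier without losing information."""

    @classmethod
    @abstractmethod
    def from_string(cls, text: str) -> "Identifier":
        """Rebuild an identifier from the output of to_string()."""


@dataclass(frozen=True)
class DbKeyIdentifier(Identifier):
    """Illustrative ID made of a database name, a table name and a unique key."""

    database: str
    table: str
    key: str

    def to_string(self) -> str:
        # Percent-encode each part so the "/" separator stays unambiguous.
        parts = (self.database, self.table, self.key)
        return "/".join(quote(part, safe="") for part in parts)

    @classmethod
    def from_string(cls, text: str) -> "DbKeyIdentifier":
        # Exactly three encoded parts are expected; anything else raises ValueError.
        database, table, key = (unquote(part) for part in text.split("/"))
        return cls(database, table, key)


original = DbKeyIdentifier("genes_db", "transcript", "42")
restored = DbKeyIdentifier.from_string(original.to_string())
```

Nothing here says what the string has to look like, only that restored compares equal to original. An LSID-shaped class could implement the same two methods.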

The resolver can contain a hash or some code that effectively does the 
switch/case/if_else on the Identifier implementation class and uses the 
appropriate factory to regenerate referents. It's no more scary than 
resolving ftp, http, mail to different URL handler classes.
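
The hash-based dispatch described above could be sketched roughly like this, again in Python rather than BioPerl. Resolver, register, resolve, EmbossId and LdapPath are all hypothetical names, and the two lambdas are stand-ins for real factories that would go off and fetch something.

```python
class EmbossId:
    """Illustrative identifier class wrapping an EMBOSS-style ID string."""

    def __init__(self, entry):
        self.entry = entry


class LdapPath:
    """Illustrative identifier class wrapping an LDAP path."""

    def __init__(self, path):
        self.path = path


class Resolver:
    """Maps identifier classes to the factories that regenerate referents."""

    def __init__(self):
        self._factories = {}

    def register(self, id_class, factory):
        # One factory per identifier implementation class.
        self._factories[id_class] = factory

    def resolve(self, identifier):
        # The dictionary lookup replaces an explicit switch/if-else chain.
        try:
            factory = self._factories[type(identifier)]
        except KeyError:
            name = type(identifier).__name__
            raise LookupError(f"no factory registered for {name}") from None
        return factory(identifier)


resolver = Resolver()
# Stand-in factories; real ones would query a database, a directory, etc.
resolver.register(EmbossId, lambda ident: f"sequence for {ident.entry}")
resolver.register(LdapPath, lambda ident: f"record at {ident.path}")

referent = resolver.resolve(EmbossId("db:entry1"))
```

Supporting a new kind of identifier is then one more register() call. The resolver itself never needs to know how any particular ID is laid out.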

I'm just a bit worried that we are going to over-constrain what legal 
identifiers are and make it hard for people to use the general framework 
for something that is less than a 95% match to what we originally 
envisioned IDs being.

Is that any clearer?

Matthew

Lincoln Stein wrote:
> Matt, could you clarify what you are asking?
> 
> It is important to separate the concept of an identifier (and its correlates, 
> the identifier namespace and the collection of identifiers), from the 
> mechanism for resolving an identifier and regenerating its referent.  The 
> analogous situation is domain names, in which there is insufficient 
> information to resolve a domain name into an IP address, and a separate 
> protocol, the DNS, is called for.  The nice thing about the LSID format is 
> that the details of what goes into the object identifier field is left up to 
> the naming authority, and so it can carry whatever information is necessary 
> for the naming authority to resolve it.
> 
> There is a separate protocol for resolving LSIDs into resources, which the I3C 
> is working on.  The draft that I looked at was pretty vague.
> 
> Lincoln
>