[Bioperl-l] RE: [Open-bio-l] seq namespace method

Hilmar Lapp hlapp@gnf.org
Tue, 9 Jul 2002 09:58:01 -0700


> -----Original Message-----
> From: Matthew Pocock [mailto:matthew_pocock@yahoo.co.uk]
> Sent: Tuesday, July 09, 2002 5:08 AM
> To: Hilmar Lapp
> Cc: OBDA BioSQL (E-mail); BioPerl (E-mail)
> Subject: Re: [Open-bio-l] seq namespace method
> 
[...] 
> If the namespace is some 
> meta-data about the publisher, then the rich sequence could 
> have a slot 
> for this in the interface or as a well known type of 
> annotation

I've actually thought about this briefly. At present, Biodatabase is one of the few entities in BioSQL (apart from association tables) that does not have an equivalent class in Bioperl. In general, a gap like this should raise eyebrows, as it indicates a possible mistake on one side or the other. For Biodatabase, I tend to think the mistake is on the Bioperl end, in not having a class that represents this information. So, yes, I'm inclined to create a new class for this and give PrimarySeqI (not RichSeqI) a slot for an instance of this class. I personally think a 1-1 relationship is good enough for the use cases we'd be faced with, so others should weigh in with arguments if we need an array here (which would /not/ be BioSQL compatible, though).
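To make the proposal a bit more concrete, here is a rough sketch of the shape I have in mind. It is written in Python rather than Perl purely for illustration, and the names Namespace and PrimarySeq are hypothetical stand-ins, not existing Bioperl classes or a settled design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Namespace:
    """Hypothetical counterpart of the BioSQL Biodatabase entity."""
    name: str


@dataclass
class PrimarySeq:
    """Hypothetical stand-in for an object implementing PrimarySeqI."""
    accession: str
    seq: str
    # A single slot (1-1), not a list of namespaces.
    namespace: Optional[Namespace] = None


# Illustrative values only; the namespace name and accession are made up.
my_db = Namespace("my_collection")
s = PrimarySeq(accession="X00001", seq="ATGC", namespace=my_db)
```

An array of namespaces would mean turning the single slot into a list, which is exactly the variant that would not map onto BioSQL.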

What do others think?

Another example is Seqfeature_Source. I tend to think it's a mistake in BioSQL to list this as its own entity; it should either go into a unique key or be a qualifier/value association. There may have been other reasons to create an entity for it, though.

> (which 
> may be the same object as a database uses to publish its meta 
> data). Do 
> you want to be able to use some sequence ID in conjunction with the 
> namespace to re-fetch the sequence at a later time?

Yes, basically that would be the use case. It would also serve to maintain some meta-data about sequence collections.

> If so, how much 
> information would you need to store, and how much is discovered at 
> sequence-resolution-time? Are namespaces independent (or potentially 
> independent) of where the sequence was fetched from? How does this 
> relate to the bio-directory stuff?

They should be potentially independent, I would think, though that depends on your use case. I'm not in a position to comment on the bio-directory stuff.

> 
> Bottom line:
>    what does namespace mean?

The namespace is the scope within which the accession number of any sequence associated with it is valid and unique. It could be made up, or it could (and many times will) refer to an actual databank from which the sequence was obtained. If you maintain your own sequence collections, however, you would make up the name.
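As a minimal sketch of that definition (again in Python for illustration only; SeqStore and its methods are made up for this example and are not Bioperl or BioSQL API), the pair of namespace and accession acts as the unique key, so the same accession may appear in two namespaces but only once within any one of them:

```python
class SeqStore:
    """Hypothetical store keyed by (namespace, accession)."""

    def __init__(self):
        self._seqs = {}

    def add(self, namespace, accession, seq):
        key = (namespace, accession)
        # An accession must be unique within its namespace.
        if key in self._seqs:
            raise ValueError(f"{accession} already exists in {namespace}")
        self._seqs[key] = seq

    def fetch(self, namespace, accession):
        # Re-fetch a sequence later using namespace plus accession.
        return self._seqs[(namespace, accession)]


store = SeqStore()
# Illustrative names: one namespace could refer to a public databank,
# the other is made up for a locally maintained collection.
store.add("public_databank", "X00001", "ATGC")
store.add("my_collection", "X00001", "GGCC")  # same accession, other namespace

try:
    store.add("my_collection", "X00001", "TTTT")
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

Whether the namespace name can be resolved to an actual server is a separate question and is deliberately left out of the sketch.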

>    is this best represented at the level of the sequence or 
> the sequence 
> collection?

Namespace is the name of the sequence collection.

>    are you re-inventing URNs / naming and directory / name resolvers?

No. An actual naming and directory server has to come from elsewhere (I think).

	-hilmar