[Bioperl-l] Re: [Open-bio-l] seq namespace method

David Block dblock@gnf.org
Tue, 9 Jul 2002 11:06:08 -0700


I hate Outlook.  See my comments interspersed below.

--
David Block                    dblock@gnf.org
GNF - San Diego, CA        http://www.gnf.org     
Genome Informatics  /  Enterprise Programming

> -----Original Message-----
> From: Matthew Pocock [mailto:matthew_pocock@yahoo.co.uk]
> Sent: Tuesday, July 09, 2002 5:08 AM
> To: Hilmar Lapp
> Cc: OBDA BioSQL (E-mail); BioPerl (E-mail)
> Subject: [Bioperl-l] Re: [Open-bio-l] seq namespace method
> 
> 
> Hi Hilmar,
> 
> Glad to see that someone is looking over the BioSql stuff again. Just 
> some random thoughts.
> 
> Can a sequence belong to more than one namespace? 

A bare sequence string (ATTCGAATTCG...) can.  An annotated BioEntry cannot.  I think that's where we're ending up.

> It depends what you
> want the namespace to mean. For example, if you had five sequence
> databases, one shadowed from persistent storage (indexed flat, sql,
> corba, whatever) and four in-memory databases all with names
> (genbank, my interesting sequences, blast hits... ) then a single
> sequence object could be in all of them. If the namespace is meant to
> represent the name of the collection it is part of,

We want it to represent the collection it originated from.  All other databases will reference the original.

> this becomes ambiguous. The sql has
> this where it is because you need somewhere for the 'sequence is part of
> database' relation. In sql, this goes with the sequence. In oopy
> collections, this goes in the database. If the namespace is some
> meta-data about the publisher, then the rich sequence could have a slot
> for this in the interface or as a well known type of annotation (which
> may be the same object as a database uses to publish its meta data).

Yes, I think that's what we want.

> Do 
> you want to be able to use some sequence ID in conjunction with the 
> namespace to re-fetch the sequence at a later time? 

Yes.

> If so, how much
> information would you need to store, and how much is discovered at
> sequence-resolution-time? Are namespaces independent (or potentially
> independent) of where the sequence was fetched from?

Yes.  That is the whole point of stable ids - it should not matter
where you get the id from, it will always be the same object you
get back.
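
To make that concrete, here is a minimal sketch in Python (chosen only for
illustration; nothing here is BioPerl or BioSQL API). The two sources, the
`resolve` function, and the accessions are all made-up placeholders. The only
point is that the (namespace, id) pair, not the place you asked, decides
which object you get.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical lookup key: (namespace, stable id).
Key = Tuple[str, str]
# Hypothetical source: a plain mapping from key to sequence string.
Source = Dict[Key, str]

# Two made-up sources that both hold a copy of the same entry.
local_cache: Source = {("genbank", "ACC0001"): "ATTCGAATTCG"}
remote_mirror: Source = {
    ("genbank", "ACC0001"): "ATTCGAATTCG",
    ("swissprot", "ACC0002"): "MKTAYIAK",
}


def resolve(namespace: str, stable_id: str,
            sources: Sequence[Source]) -> Optional[str]:
    """Return the first hit for (namespace, stable_id), trying sources in order."""
    key = (namespace, stable_id)
    for source in sources:
        if key in source:
            return source[key]
    return None  # no source knows this key
```

Asking `resolve("genbank", "ACC0001", ...)` with the sources in either order
should hand back the same sequence, provided every source honours the
original namespace and id.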

> How does this
> relate to the bio-directories stuff?

Bio-directories?  Sorry, I don't know what you're talking about.
Is that another bio-* project?

> 
> Bottom line:
>    what does namespace mean?
>    is this best represented at the level of the sequence or
>    the sequence collection?

Sequence collections are curated and assembled by someone, who then
assigns stable ids to individual sequences.  These ids may be duplicated
across databases, but the combination of namespace and id should 
unambiguously point to the same object at all times, forever.  Versioning
should take care of changes in the sequence and/or annotation.

That's what we want, anyway.
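
A rough sketch of the keying described above, in Python for illustration
only. The `BioEntry` and `Registry` classes, their fields, and the accessions
are assumptions invented for this example and are not the BioSQL schema. It
shows the same id living in two namespaces without clashing, and old versions
staying retrievable after a newer one is added.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class BioEntry:
    """Hypothetical annotated entry; field names are illustrative only."""
    namespace: str
    accession: str
    version: int
    sequence: str


class Registry:
    """Toy store keyed by (namespace, accession, version)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str, int], BioEntry] = {}

    def add(self, entry: BioEntry) -> None:
        key = (entry.namespace, entry.accession, entry.version)
        if key in self._entries:
            # A given (namespace, id, version) must never be reassigned.
            raise ValueError(f"duplicate entry for {key}")
        self._entries[key] = entry

    def fetch(self, namespace: str, accession: str,
              version: Optional[int] = None) -> Optional[BioEntry]:
        """Fetch an exact version, or the highest version if none is given."""
        if version is not None:
            return self._entries.get((namespace, accession, version))
        candidates = [
            entry for (ns, acc, _), entry in self._entries.items()
            if ns == namespace and acc == accession
        ]
        return max(candidates, key=lambda e: e.version, default=None)
```

Usage would look something like this, again with placeholder data:

```python
reg = Registry()
reg.add(BioEntry("genbank", "ACC0001", 1, "ATTCG"))
reg.add(BioEntry("genbank", "ACC0001", 2, "ATTCGA"))  # revised sequence
reg.add(BioEntry("embl", "ACC0001", 1, "GGGCCC"))     # same id, other namespace
latest = reg.fetch("genbank", "ACC0001")              # picks version 2
original = reg.fetch("genbank", "ACC0001", 1)         # version 1 still there
```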

>    are you re-inventing URNs / naming and directory / name resolvers?
> 
> Matthew