[DAS2] on local and global ids

Thu Mar 16 15:52:51 UTC 2006

Hey Guys,

    Where's the latest spec and use case document? Sorry if this is a 
super dumb question. I couldn't find it on the website.

                               Best,

                                        -B

Andrew Dalke wrote:

>Thomas:
>
>>I'm not sure that DAS1 experience is a good model for this.  It's true 
>>that people didn't always point to well-known reference servers, but I 
>>think this has more to do with the fact that people didn't know which 
>>server to point to.
>
>I think I said there are two cases; there are actually several:
>
>  1. the sources document states a well-known COORDINATES
>       and makes no links to segments
>  2. the sources document refers to a well-known segments server
>       ("the" reference server) and no COORDINATES
>  3. the sources document has a segments document, and each segment
>      listed uses URIs from "the" reference server
>  4. the server implements its own coordinates server, with
>      new segment ids
>  5. when uploading a track to Ensembl there's no need to have
>      either COORDINATES or segments -- the upload server can
>      verify for itself that the upload uses the right ids.
>
>
>The *only* concern is with #4.  Everything else uses the well-known
>global identifier for segments.
>
>  
>
>>I'd still argue that the majority -- probably the vast majority -- of 
>>people setting up DAS servers really just want to make an assertion 
>>like "I'm annotating build NCBI35 of the human genome" and be done 
>>with it.
>>    
>>
>
>I'm fine with that.  There are two ways to do it.  #1 and #2 above.
>In theory only one of those is needed.   The document can point to
>"the" reference server for NCBI 35.
>
>In practice that's not sufficient because there is no authoritative
>NCBI 35 server.
>
>Hence COORDINATES provides an abstract global identifier describing
>the reference server.
>
>  
>
>>  That's what the coordinate system stuff in DAS/2 is for.  If this is 
>>documented properly I don't think we'll see many "end-user" sites 
>>setting up their own reference servers unless a) they want an internal 
>>mirror of a well-known server purely for performance/bandwidth reasons 
>>or b) they want to annotate an unpublished/new/whatever genome 
>>assembly.
>>    
>>
>
>A philosophical comment.  I'm a distributed, self-organizing kinda
>guy.  I don't think single root centralized systems work well when
>there are many different groups involved.
>
>I think many people will use the registry server, but not all.
>I think there will be public DAS servers which aren't in the registry.
>I know there will be in-house DAS servers which aren't.
>
>I'm just about certain that some sites will have local copies of
>the primary data.  They do for GenBank, for PDB, for SWISS-PROT,
>for EnsEMBL.  Why not for DAS?
>
>That said, here are a couple of questions for you to answer:
>
>   a) When connecting to a new versioned source containing only
>COORDINATES data, what should the client do to get the list
>of segments, sizes, and primary sequence?
>
>I can think of several answers.  My answer is that the versioned
>source should state the preferred reference server and unless
>otherwise configured a client should use that reference server
>and only that reference server.
>
>Yes, all the reference servers for that coordinate system
>are supposed to return the same results.  But that's only if
>they are available.  There are performance issues too, like
>low bandwidth or hosting the server on a slow machine.  The
>DAS client shouldn't round-robin through the list until it
>finds one which works, because it could take several minutes
>to time out on a single server, with another 10 to try.
>
>Yes, a client can be configured and told "for coordinate
>system A use reference server Z".  But that's a user
>configuration.
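
A rough sketch of the selection rule in (a), in Python. None of this is from the DAS/2 spec: the function name, the override table and the example.org URLs are all made up for illustration. The point is only that the client ends up with one server to contact, not a list to cycle through.

```python
from typing import Mapping, Optional


def pick_reference_server(
    coordinate_system: str,
    preferred: Optional[str],
    user_overrides: Mapping[str, str],
) -> Optional[str]:
    """Return the single reference server a client should contact.

    All names here are hypothetical; this is not a DAS/2 API.
    """
    # An explicit user setting ("for coordinate system A use
    # reference server Z") takes priority.
    if coordinate_system in user_overrides:
        return user_overrides[coordinate_system]
    # Otherwise use the server named by the versioned source, and
    # only that one. There is deliberately no fallback loop here.
    return preferred


# Hypothetical configuration and lookups.
overrides = {"http://example.org/coords/A": "http://mirror.example.org/das2/Z"}
chosen_a = pick_reference_server(
    "http://example.org/coords/A", "http://ref.example.org/das2", overrides
)
chosen_b = pick_reference_server(
    "http://example.org/coords/B", "http://ref.example.org/das2", overrides
)
```

Here `chosen_a` comes from the user override and `chosen_b` from the versioned source's stated preference.
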
>
>   b) If there is a local mirror of some reference server, how
>should the local DAS clients be made aware of it?  (And
>should this be a supportable configuration? I think so.)
>
>I'm pretty sure that most DAS clients won't be configurable
>to look for local servers instead of global ones.  Even if
>they are, I'm pretty sure each will have a different way
>to do so.  Apollo and Bioperl will use different mechanisms.
>
>I have no good answer for this.  It sounds like your answer
>is "people won't have local copies."  I think they will.
>
>Ideas:
>   - have a rewriting registry server which does a rewrite of
>the information from the other servers.  But this doesn't
>work because the feature result from the remote server (in
>my scheme) is given using its local segment names.  There's
>no way to go from that local name to the appropriate mirror
>reference server.  This suggests that the results really do
>need to be given through global ids, with no support for
>local ones.  The segments result optionally provides a way
>to resolve a global name through a local resource.
>
>   - set up an HTTP proxy service for DAS requests which
>transparently detects, translates and redirects to the
>appropriate local resource.  Cute, but not likely to be
>done in real life.
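
If results are always given with global ids, pointing a client at a local mirror could be as small as a prefix rewrite. A minimal sketch, assuming segment ids are URIs and that the mirror keeps the same path layout as the server it copies. Both are assumptions, and the function name and URLs are hypothetical.

```python
from typing import Mapping


def resolve_segment(global_id: str, mirrors: Mapping[str, str]) -> str:
    """Map a global segment URI to a local mirror URL, if one is configured.

    `mirrors` maps a global URI prefix to the local prefix that serves
    the same data. Hypothetical helper, not part of any DAS client.
    """
    for global_prefix, local_prefix in mirrors.items():
        if global_id.startswith(global_prefix):
            # Keep the rest of the path; swap only the server prefix.
            return local_prefix + global_id[len(global_prefix):]
    # No mirror configured: fetch from the global location itself.
    return global_id


mirrors = {
    "http://ref.example.org/das2/segment/": "http://localhost:8080/das2/segment/",
}
local_url = resolve_segment("http://ref.example.org/das2/segment/chr1", mirrors)
untouched = resolve_segment("http://other.example.org/das2/segment/chr1", mirrors)
```

The feature data still carries the global id; only the place the client fetches from changes.
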
>
>   c) A group has been working on a new genome/assembly.  The
>data is annotated on local machines using DAS and DAS writeback.
>Finally it's published.  Do they need to rewrite all their
>segment identifiers to use the newly defined global ones?
>
>As there are only a few places where the segment identifier is
>used, and it's an interface layer, I think the conversion is
>easy.  But it is a flag day event which means people don't
>want to do it.  Instead, it's more likely that local people
>will set up a synonym table to help with the conversion.
>
>There are perhaps a dozen groups which might do this and they
>all have competent people.  This should not be a problem.
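
The synonym table in (c) can be sketched as a plain lookup applied at the interface layer. The feature tuple layout, the local ids and the global URIs below are invented for the example and are not from any real server.

```python
from typing import List, Mapping, Tuple

# (segment id, start, end): a hypothetical minimal feature record.
Feature = Tuple[str, int, int]


def to_global_ids(
    features: List[Feature], synonyms: Mapping[str, str]
) -> List[Feature]:
    """Rewrite local segment ids to their published global ids.

    Ids missing from the synonym table are passed through unchanged.
    """
    return [
        (synonyms.get(segment, segment), start, end)
        for segment, start, end in features
    ]


synonyms = {
    "scaffold_1": "http://ref.example.org/das2/segment/chr1",
    "scaffold_2": "http://ref.example.org/das2/segment/chr2",
}
local_features = [("scaffold_1", 100, 200), ("scaffold_9", 5, 50)]
published = to_global_ids(local_features, synonyms)
```

Because the table is applied on output, the locally stored ids need not change on the day the assembly is published.
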
>
>					Andrew
>					dalke at dalkescientific.com
>
>_______________________________________________
>DAS2 mailing list
>DAS2 at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/das2