[DAS2] on local and global ids
Brian Gilman
gilmanb at pantherinformatics.com
Thu Mar 16 15:52:51 UTC 2006
Hey Guys,
Where's the latest spec and use case document? Sorry if this is a
super dumb question. I couldn't find it on the website.
Best,
-B
Andrew Dalke wrote:
>Thomas:
>
>
>>I'm not sure that DAS1 experience is a good model for this. It's true
>>that people didn't always point to well-known reference servers, but I
>>think this has more to do with the fact that people didn't know which
>>server to point to.
>>
>>
>
>I think I said there are two cases; there are actually several:
>
> 1. the sources document states a well-known COORDINATES
> and makes no links to segments
> 2. the sources document refers to a well-known segments server
> ("the" reference server) and no COORDINATES
> 3. the sources document has a segments document, and each segment
> listed uses URIs from "the" reference server
> 4. the server implements its own coordinates server, with
> new segment ids
> 5. When uploading a track to Ensembl there's no need to have
> either COORDINATES or segments -- the upload server can
> verify for itself that the upload uses the right ids.
>
>
>The *only* concern is with #4. Everything else uses the well-known
>global identifier for segments.
>
>
>
>>I'd still argue that the majority -- probably the vast majority -- of
>>people setting up DAS servers really just want to make an assertion
>>like "I'm annotating build NCBI35 of the human genome" and be done
>>with it.
>>
>>
>
>I'm fine with that. There are two ways to do it. #1 and #2 above.
>In theory only one of those is needed. The document can point to
>"the" reference server for NCBI 35.
>
>In practice that's not sufficient because there is no authoritative
>NCBI 35 server.
>
>Hence COORDINATES provides an abstract global identifier describing
>the reference server.
>
>
>
>>That's what the coordinate system stuff in DAS/2 is for. If this is
>>documented properly I don't think we'll see many "end-user" sites
>>setting up their own reference servers unless a) they want an internal
>>mirror of a well-known server purely for performance/bandwidth reasons
>>or b) they want to annotate an unpublished/new/whatever genome
>>assembly.
>>
>>
>
>A philosophical comment. I'm a distributed, self-organizing kinda
>guy. I don't think single root centralized systems work well when
>there are many different groups involved.
>
>I think many people will use the registry server, but not all.
>I think there will be public DAS servers which aren't in the registry.
>I know there will be in-house DAS servers which aren't.
>
>I'm just about certain that some sites will have local copies of
>the primary data. They do for GenBank, for PDB, for SWISS-PROT,
>for EnsEMBL. Why not for DAS?
>
>That said, here are a couple of questions for you to answer:
>
> a) When connecting to a new versioned source containing only
>COORDINATES data, what should the client do to get the list
>of segments, sizes, and primary sequence?
>
>I can think of several answers. My answer is that the versioned
>source should state the preferred reference server and unless
>otherwise configured a client should use that reference server
>and only that reference server.
>
>Yes, all the reference servers for that coordinate system
>are supposed to return the same results. But that's only if
>they are available. There are performance issues too, like
>low bandwidth or hosting the server on a slow machine. The
>DAS client shouldn't round-robin through the list until it
>finds one which works, because a single unresponsive server could
>take several minutes to time out, with another 10 still to try.
>
>Yes, a client can be configured and told "for coordinate
>system A use reference server Z". But that's a user
>configuration.
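
The selection rule described in the answer to (a) can be sketched in a few lines. This is only an illustration, not anything from the DAS/2 spec: the function name, its parameters, and the example.org URLs are all hypothetical. It assumes the client already knows the coordinate system identifier and the preferred reference server stated by the versioned source.

```python
from typing import Mapping, Optional


def choose_reference_server(
    coordinates_uri: str,
    preferred_server: Optional[str],
    user_overrides: Mapping[str, str],
) -> Optional[str]:
    """Pick exactly one reference server for a coordinate system.

    All names here are hypothetical; this is a sketch of the policy only.
    """
    # An explicit user configuration ("for coordinate system A use
    # reference server Z") wins over anything the source states.
    if coordinates_uri in user_overrides:
        return user_overrides[coordinates_uri]
    # Otherwise use the server the versioned source prefers, and only
    # that one. There is deliberately no loop over other servers.
    return preferred_server


# Hypothetical usage: no override, so the source's preference is used.
server = choose_reference_server(
    "http://example.org/coordinates/A",
    "http://ref.example.org/das2",
    {},
)
```

Returning a single server (or None) rather than a list keeps the client from waiting on a chain of timeouts.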
>
> b) If there is a local mirror of some reference server, how
>should the local DAS clients be made aware of it? (And
>should this be a supportable configuration? I think so.)
>
>I'm pretty sure that most DAS clients won't be configurable
>to look for local servers instead of global ones. Even if
>they are, I'm pretty sure each will have a different way
>to do so. Apollo and Bioperl will use different mechanisms.
>
>I have no good answer for this. It sounds like your answer
>is "people won't have local copies." I think they will.
>
>Ideas:
> - have a rewriting registry server which does a rewrite of
>the information from the other servers. But this doesn't
>work because the feature result from the remote server (in
>my scheme) is given using its local segment names. There's
>no way to go from that local name to the appropriate mirror
>reference server. This suggests that the results really do
>need to be given through global ids, with no support for
>local ones. The segments result optionally provides a way
>to resolve a global name through a local resource.
>
> - set up an HTTP proxy service for DAS requests which
>transparently detects, translates and redirects to the
>appropriate local resource. Cute, but not likely to be
>done in real life.
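
The conclusion of the first idea (results use global ids only, and the segments result optionally says how to resolve a global id through a local resource) might look roughly like this on the client side. The Segment class, its field names, and the URLs are assumptions made up for the sketch, and it further assumes a global id can itself be fetched when no local resource is listed.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Segment:
    """One entry of a segments result (hypothetical shape)."""
    global_id: str                    # well-known id shared by all servers
    local_url: Optional[str] = None   # optional local resource for that id


def build_resolver(segments: Iterable[Segment]) -> Dict[str, str]:
    """Map each global id to its local resource, where one is given."""
    return {s.global_id: s.local_url for s in segments if s.local_url}


def resolve(global_id: str, resolver: Dict[str, str]) -> str:
    """Return the local resource for a global id, else the id itself."""
    return resolver.get(global_id, global_id)


# Hypothetical segments result from a site with a mirror for chr1 only.
segments = [
    Segment("http://ref.example.org/segment/chr1",
            "http://mirror.local/segment/chr1"),
    Segment("http://ref.example.org/segment/chr2"),
]
resolver = build_resolver(segments)
```

Because features always carry the global id, the lookup works no matter which annotation server produced the feature.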
>
> c) A group has been working on a new genome/assembly. The
>data is annotated on local machines using DAS and DAS writeback.
>Finally it's published. Do they need to rewrite all their
>segment identifiers to use the newly defined global ones?
>
>As there are only a few places where the segment identifier is
>used, and it's an interface layer, I think the conversion is
>easy. But it is a flag day event which means people don't
>want to do it. Instead, it's more likely that local people
>will set up a synonym table to help with the conversion.
>
>There are perhaps a dozen groups which might do this and they
>all have competent people. This should not be a problem.
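
The synonym table mentioned in (c) could be as small as a dictionary applied at the interface layer. A minimal sketch, assuming features are plain dicts with a "segment" key; the ids and the table contents are invented for illustration.

```python
from typing import Dict, List, Mapping


def to_global(segment_id: str, synonyms: Mapping[str, str]) -> str:
    """Translate a local segment id to its global id, if a synonym exists."""
    return synonyms.get(segment_id, segment_id)


def translate_features(
    features: List[Dict[str, str]], synonyms: Mapping[str, str]
) -> List[Dict[str, str]]:
    """Return copies of the features with segment ids rewritten."""
    return [
        {**feature, "segment": to_global(feature["segment"], synonyms)}
        for feature in features
    ]


# Hypothetical synonym table: pre-publication local id -> published global id.
synonyms = {"local:scaffold_12": "http://ref.example.org/segment/chr3"}
features = [
    {"id": "f1", "segment": "local:scaffold_12"},
    {"id": "f2", "segment": "http://ref.example.org/segment/chr1"},
]
translated = translate_features(features, synonyms)
```

Ids without an entry pass through unchanged, so the table can be filled in gradually instead of on a single flag day.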
>
> Andrew
> dalke at dalkescientific.com
>
>_______________________________________________
>DAS2 mailing list
>DAS2 at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/das2