From Steven.VanVooren at esat.kuleuven.ac.be  Thu Sep 16 17:46:43 2004
From: Steven.VanVooren at esat.kuleuven.ac.be (Steven Van Vooren)
Date: Thu Sep 16 17:46:41 2004
Subject: [DAS] Setting up DAS with ensembl plugin
In-Reply-To:
Message-ID:

Dear list,

I am using a dazzle server to feed human genome annotations from a
mysql database. It suddenly stopped working. I upgraded to Dazzle 1.01
and changed my config file, to no avail. Any pointers would be welcome.

1. On ensembldb.ensembl.org, the core tables for homo sapiens are
   currently at 24_34e. I changed my config file accordingly.

2. I see the sanger das reference server seems to feed ensembl1834.
   I changed the mapmaster key accordingly:

Initialisation keeps failing, though:

DazzleServerMain: init
org.biojava.servlets.dazzle.DazzleServlet: Error initializing installation
org.biojava.servlets.dazzle.datasource.DataSourceException: Couldn't instantiate data source ensembl2434e_bridge
        at org.biojava.servlets.dazzle.datasource.BasicDazzleInstallation.parseConfigFile(BasicDazzleInstallation.java:129)

Below is my current config file. Can anyone point out what's wrong?

Thanks,
Steven

From dalke at dalkescientific.com  Tue Sep 21 21:25:19 2004
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue Sep 21 21:24:48 2004
Subject: [DAS] Re: das/2 proposal status
In-Reply-To: <200409151528.16053.lstein@cshl.edu>
References: <200409151528.16053.lstein@cshl.edu>
Message-ID: <43E0DE26-0C36-11D9-90EE-000A956826C8@dalkescientific.com>

Hi Lincoln and others,

Lincoln:
> Here's the latest version of the "request" portion of the DAS/2 spec,
> recently converted into HTML.  I haven't proofread the HTML yet; any
> help you can render would be appreciated.

I've gone over the proposal.  Here are some of the things I've noticed.

When going over the spec I'm trying to keep a few things in mind.

  - everything that can be a relative URL may instead be an
    absolute URL pointing to another machine

  - ReST requires that the architecture not depend on the actual
    URL hierarchy.  Eg, links aren't made by knowing to add
    "sequence/" to get the sequence data but instead are found
    as the result of a previous request

  - Reversing that, the program shouldn't figure out what a URL
    does by analyzing it.

That said, the hierarchical structure is fine.  It's meant both
for humans and as a way to minimize the size of links.  And except
in a few small (and fixable) places the spec doesn't restrict the
server to using a hierarchy.

> The HTML fragment notation "#" is never used.

except in references to external URLs, like the GO link to
  http://song.sourceforge.net/ontologies/sofa#tRNA

> In addition to the standard HTTP response headers, DAS servers
> return the following HTTP headers:
>    * X-DAS-Version: DAS/2.0
>    * X-DAS-Status: XXX status code

How much of that can be moved into the standard HTTP error codes,
possibly with a parseable error message?

There are a few advantages to them.

  * You can tell if the server uses the DAS/1 or DAS/2 API.

An alternate solution is to put a version string in the response
from the server.  Perhaps ?

  * A client can dispatch on the status codes without having
    to parse the payload.

But
  1) clients need to know how to handle other HTTP error codes
     (like 404 'Not Found', or 403 'Forbidden').
  2) there's overlap between some of the codes and HTTP codes --
     shouldn't HTTP error 400 'Bad Request' be sent when a
     DAS-Status of 4** is sent?
  3) HTTP codes are more complete; 405 'Method Not Allowed' for
     when someone does a PUT on a server that doesn't allow PUT,
     or 423 'Locked' (from RFC 2518).

  * The Status codes are more specific than the HTTP codes.

I think the answer there is to include a more detailed error
message in the HTTP payload.

My final reason is that it allows someone to implement major
portions of the DAS/2 interface with flat-files served by a stock
Apache.
That should help a lot with making a test suite -- I could even
use file URLs and do without any web server.

> http://server/das-genome
>
>     List of data sources maintained by server "server."
>     The URL as a whole acts as a unique identifier
>     for this DAS/2 server.

This is limiting.  'server' usually means hostname, or
hostname + port.  I see no reason to prohibit the main entry
point from being

   http://www.example.com/~dalke/my-servers/SantaFe

or limiting the connection to only http.  Why not also allow,
say, https?

   https://www.example.com/secure/das-genome

One worry I see, btw, is the difference between

   http://server/das-genome
and
   http://server/das-genome/

I haven't been careful in checking all the uses of "/" vs. no "/".
From what I've seen it's fine, in large part because of the
xml:base use.

> Two formats are supported: a verbose XML format of
> type application/x-das-source, and a compact ...

Later on in the editing I'll point out more examples like this
where the language is descriptive instead of prescriptive.

Must servers implement both formats?  Or can they support neither
and use a 3rd mechanism?

>    xmlns="http://www.biodas.org/ns/das-genome/2.00"

Probably should be "2.0" to match the server version

>
>    ...
>
>    ...
>

In general the spec doesn't say which fields are required and
which are optional.  Will we be using DTDs or some other schema
for this?

In either case, based on the experience with the DAS/1 DTDs they
didn't seem that useful.  I built my parser on them and had to
correct various typos in them.  My parser was validating and it
ended up failing when used against servers with extensions.

> The version column is any sequence of characters excluded
> tab and newline

In general the word 'character' needs to be made more specific.
I think you mean "printable ASCII character" as compared to
"Unicode character."

The restrictions on the source URL and source version fields need
to be propagated back to the XML names.
That is, it should be illegal to have

Also, in the XML the 'id' and 'version' fields are both resolvable
URLs relative to the xml:base.  You have

In the flatfile example you have

   volvox   1   V. volvulus ...

That should likely be

   volvox   volvox/1   V. volvulus ...

Because those can be arbitrary URLs the following should be allowed

which would be written

   http://cshl.edu/das2/volvox   http://dalkescientific.com/das2/volvox/1   V. volvulus ...

This exact case isn't likely but it should be allowed.

> By adding the version to the end of the path, the URL
> becomes an identifier for the versioned data source. Retrieving

You often use the language of string concatenation to describe
how to fully expand a URI in the context of a base url.  Since it
may be an absolute URL, I ask that there be some other language
instead.  However, I don't know what that word would be.

> Fetching Information about Data Sources: The Sources Request

(backing up a bit)

> As a special case, a version of 0 (numeric zero) selects the
> current (most recent) version of sourceid. For this reason a
> version of 0 is reserved.

How is a client supposed to know to use version 0?  It looks
like that's done by string concatenation to the URL, but as I
mentioned I don't like that approach.

I can think of two solutions.

  1) add a new element like to the .

  2) add an attribute to the VERSION element, like

Is the concept of "latest version" something that needs to be named?

If the /0 URL is resolved, what does it do?  Is it a redirect to
the most recent version?

Must/should the list of versions be in some order?  Like from
oldest to newest?  Should clients preserve the order when showing
it to users?
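
To make the distinction between string concatenation and URI
resolution concrete, here is a minimal sketch using Python's standard
urljoin.  The base URL comes from the draft's examples; the helper
name "resolve" is hypothetical, and nothing here is prescribed by the
DAS/2 draft.  It also shows why the trailing "/" on the entry point
matters.

```python
from urllib.parse import urljoin


def resolve(base, ref):
    """Resolve ref against base using standard URI reference resolution."""
    return urljoin(base, ref)


base = "http://www.wormbase.org/das-genome/"

# A relative reference is expanded against the base.
relative_version = resolve(base, "volvox/1")

# An absolute reference replaces the base entirely; plain string
# concatenation would produce a nonsense URL here.
absolute_version = resolve(base, "http://dalkescientific.com/das2/volvox/1")

# Without the trailing slash the last path segment of the base is
# replaced instead of extended.
no_slash = resolve("http://www.wormbase.org/das-genome", "volvox/1")

print(relative_version)
print(absolute_version)
print(no_slash)
```

A client written this way never needs to know whether a server hands
out relative or absolute identifiers.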

> REQUEST:
>   http://www.wormbase.org/das-genome/volvox/2
>
> RESPONSE:
>   Content-type: application/x-das-source-details
>
>
>
>     xmlns="http://www.biodas.org/ns/das-genome/2.00"
>     xmlns:xlink="http://www.w3.org/1999/xlink"
>     xml:base="http://dev.wormbase.org/das"
>     id="volvox"
>     description="Volvox Example Database">
>
>     Feature types
>
>
>
>     A genomic feature
>
>
>
>
>

How does a client know what to do with each of these namespaces?
Should it expect to get an application/x-das-types from
volvox/1/sequence?  Why or why not?

As written the only way to figure it out is to look at the end
of the URL, which I don't like.

I would rather have the namespace content type stated as an
attribute:

   ...

('nstype' is an ugly name but 'type' is already used for feature
type and for content type).

What is the text of the element used for?  That's the "A genomic
feature" in the following

>     A genomic feature
>

Is the following also allowed?

     A genomic feature

It would be better, I think, to have that inside an element or
attribute, as

> Dates should follow the HTTP date specification.

RFC 2068 (HTTP/1.1) allows three different formats

    HTTP applications have historically allowed three different
    formats for the representation of date/time stamps:

       Sun, 06 Nov 1994 08:49:37 GMT  ; RFC 822, updated by RFC 1123
       Sunday, 06-Nov-94 08:49:37 GMT ; RFC 850, obsoleted by RFC 1036
       Sun Nov  6 08:49:37 1994       ; ANSI C's asctime() format

    The first format is preferred as an Internet standard and
    represents a fixed-length subset of that defined by RFC 1123
    (an update to RFC 822).  The second format is in common use,
    but is based on the obsolete RFC 850 [12] date format and lacks
    a four-digit year.  HTTP/1.1 clients and servers that parse the
    date value MUST accept all three formats (for compatibility
    with HTTP/1.0), though they MUST only generate the RFC 1123
    format for representing HTTP-date values in header fields.

I would prefer the DAS spec be more specific about which of those
is allowed.
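
As a concrete reference for the three formats quoted above, here is a
minimal Python sketch that accepts all three and re-emits the RFC 1123
form.  The names parse_http_date and to_rfc1123 are hypothetical and
not part of any DAS draft, and the strptime patterns assume an
English/C locale for month and weekday names.

```python
from datetime import datetime

# strptime patterns for the three historical HTTP date formats.
# Assumption: weekday and month names are parsed under an English/C locale.
HTTP_DATE_FORMATS = [
    ("rfc1123", "%a, %d %b %Y %H:%M:%S GMT"),
    ("rfc850", "%A, %d-%b-%y %H:%M:%S GMT"),
    ("asctime", "%a %b %d %H:%M:%S %Y"),
]


def parse_http_date(text):
    """Return (naive datetime in GMT, name of the format that matched)."""
    for name, pattern in HTTP_DATE_FORMATS:
        try:
            return datetime.strptime(text, pattern), name
        except ValueError:
            continue  # try the next format
    raise ValueError("not an HTTP date: %r" % (text,))


def to_rfc1123(dt):
    """Format a datetime (assumed to be GMT) with a four-digit year."""
    return dt.strftime("%a, %d %b %Y %H:%M:%S GMT")


for example in ("Sun, 06 Nov 1994 08:49:37 GMT",
                "Sunday, 06-Nov-94 08:49:37 GMT",
                "Sun Nov  6 08:49:37 1994"):
    parsed, kind = parse_http_date(example)
    print(kind, "->", to_rfc1123(parsed))
```

A server restricted to one output format would only ever call
to_rfc1123; the lenient parser is only needed on the reading side.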
I think it's okay to say "RFC 1123 with 4 digit years".  We can
pin this down later.

>
>   The id attribute within each tag corresponds to an
>   HTTP method, and is one of "GET," "PUT," "DELETE" or "POST."
>   Clients can use this information to determine whether a data
>   source is updateable.

I don't know how needed this is.  Eg, a data source might be
editable but not by the person who fetched this data.  I suspect
this can't be fully figured out until the write interface is done.

>
>   A data format recognized by this server. The id attribute is
>   the short name of the format for use in the GET URL, and the
>   type attribute is the returned document's MIME type.

That should probably be 'name' instead of 'id', for consistency's
sake, since 'id' seems otherwise always used for resolvable URIs.

>     xmlns="http://www.biodas.org/ns/das-genome/2.00"
>     xmlns:xlink="http://www.w3.org/1999/xlink"
>     xml:base="http://www.wormbase.org/das-genome/volvox/1/type/">
>      ontology="http://song.sourceforge.net/ontologies/sofa#tRNA"
>      source="tRNAscan-SE-1.11"
>      xml:base="tRNAscan/"> ?

There are two xml:base elements.  How is the ATT id resolved?
Is it resolved upwards through all the enclosing URLs?  That is,

  url = "glyph"
  for base in ["tRNAscan/",
               "http://www.wormbase.org/das-genome/volvox/1/type/",
               ... URL used to fetch the document ... ]:
    url = urljoin(base, url)
  .. use 'url' to reference the glyph data ..

There's a typo -- replace ≪ with <

>
>   ≪TYPE id="curated_gene"
>
>
>
>
>
>
>

At some point these need to be defined more formally.  How does
a client app know what 'glyph' means, or what "white" means?

Do these need to be individually named?  As written these are
resolvable as URLs.  It seems rather too fine grained to me, and
I like named items!

The problem comes down to how the software is expected to know
how to interpret a name.  There's nothing in the protocol to say
that "glyph" is to be used as how to draw a given feature type.

It can be resolved in at least two ways.
One is to add a datatype field to each of the attrs, where the
datatype comes from a controlled vocabulary.  The other is to drop
the id scheme and just leave this as a key/value table.  That means
that individual attributes of the feature type will not be
fetchable.

OTOH, this can be left as is.  I don't think it's that big a
problem.  I can appease myself by saying that there's a element
which describes the datatype of each id, and when not given it
defaults to http://www.biodas.org/specs/2.0/metadata which defines
things properly.  ;)

> Fetching Information About Sequences: The Sequence Request
>
> Appending "dna" to the end of a versioned data source URL
> addresses the raw sequence data. Fetching this URL
> returns a FASTA file containing all the sequences known
> to the data source:
>
> REQUEST:
>   http://www.wormbase.org/das-genome/volvox/1/sequence

("append" is another string concatenation operation ...)

The text says to append "dna" but the example uses "sequence".

The Content-Type is "application/fasta".  Shouldn't that be
"x-fasta"?

Is there any way to get a list of sequence ids?  I had assumed
.../1/sequence would return a document listing all of them, but
it appears to return a FASTA file instead.

> Ranges have the following format:
>   seqid/min:max:strand

Are the following allowed?

   Chr1/::-1      -- reverse complement of all of Chr1
   Chr1/1000:     -- Chr1 from 1000 to the end
         (I would rather use this than Chr1/1000 because to me
          that looks like asking for the base at position 1000)
   Chr1/1000::-1  -- reverse complement of Chr1 from 1000 to the end
   Chr1/::        -- The entire sequence named Chr1

Is there a difference between

   Chr1
   Chr1/
   Chr1/:
   Chr1/::
   Chr1/::0

More specifically, which mean "on both strands" and which mean
"unknown strand"?

That's what I've managed to review in the last 3.5 hours.  I
still have another 9 pages to go, leaving off with the "Fetching
Information About Features" section.
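
For reference, one lenient reading of the seqid/min:max:strand
questions above can be sketched in Python.  This is only an
illustration of the ambiguity, not the draft's rule: it assumes that
empty or missing fields are allowed and mean "not given", that strand
is one of -1, 0 or 1, and the names Range and parse_range are
hypothetical.

```python
from typing import NamedTuple, Optional


class Range(NamedTuple):
    seqid: str
    start: Optional[int]   # None means the field was not given
    end: Optional[int]
    strand: Optional[int]  # -1, 0, 1, or None when not given


def parse_range(text):
    """Parse 'seqid/min:max:strand', treating empty fields as None."""
    seqid, _, rest = text.partition("/")
    if not seqid:
        raise ValueError("missing sequence id in %r" % (text,))
    fields = rest.split(":") if rest else []
    if len(fields) > 3:
        raise ValueError("too many fields in %r" % (text,))
    fields += [""] * (3 - len(fields))  # pad omitted trailing fields
    start, end, strand = (int(f) if f else None for f in fields)
    if strand not in (None, -1, 0, 1):
        raise ValueError("bad strand in %r" % (text,))
    return Range(seqid, start, end, strand)


print(parse_range("Chr1/1000::-1"))
# Under this reading the first four spellings are indistinguishable ...
print(parse_range("Chr1") == parse_range("Chr1/")
      == parse_range("Chr1/:") == parse_range("Chr1/::"))
# ... while an explicit strand of 0 stays distinct from an omitted one.
print(parse_range("Chr1/::0") != parse_range("Chr1/::"))
```

Whether an omitted strand should mean "both strands" or "unknown
strand" is exactly what the spec would still have to state.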
But I should take a nap now so I can be coherent at 3am for the
conference call :)

					Andrew
					dalke@dalkescientific.com

From hxu at chg.duhs.duke.edu  Wed Sep 22 14:16:04 2004
From: hxu at chg.duhs.duke.edu (Hong Xu)
Date: Wed Sep 22 14:16:00 2004
Subject: [DAS] Proserver display transcript structure
Message-ID: <00e101c4a0d0$395ca0b0$954c1098@DNA319>

Dear all,

I'm trying to display transcript structure in Ensembl using Proserver
DAS server. I can display it as "box" feature from transcript start to
end location. I want to display it as "exon boxes" connected with
"intron line". Just like the ensembl transcript displayed on Ensembl
web site.

My questions are:
1) what's the format for storing transcript data in database?
2) how do I write Proserver source adaptor to serve the data with
   transcript stylesheet?

thanks,

Hong Xu

From gilmanb at pantherinformatics.com  Fri Sep 17 08:59:51 2004
From: gilmanb at pantherinformatics.com (Brian Gilman)
Date: Wed Sep 22 23:35:37 2004
Subject: [DAS] Re: das/2 proposal status
In-Reply-To: <200409151528.16053.lstein@cshl.edu>
References: <200409151528.16053.lstein@cshl.edu>
Message-ID: <767B49FA-08A9-11D9-9291-000A95CA3D68@pantherinformatics.com>

Oy,

We looked at the implementation of DAS/2 months ago and started down
the path of a webdav implementation on top of BerkeleyDB and Berkeley
DBXML. This development ceased about 4 months ago. It wouldn't be hard
to start it back up again but, I'd need some guidance from the
community as to whether or not they'd be receptive to the adoption of
DAS/2 because my time is getting very compressed.

What are other people's feelings on this?

Best,

-B

--
Brian Gilman
President Panther Informatics Inc.
9 Acadia Park #2
Somerville, MA 02143
Phone 617-335-8276 (Cell) 617-395-7916 (Lan)
E-Mail: gilmanb@pantherinformatics.com
        gilmanb@jforge.net
AIM: gilmanb1

01000010 01101001 01101111 01001001 01101110 01100110 01101111 01110010
01101101 01100001 01110100 01101001 01100011 01101001 01100001 01101110

On Sep 15, 2004, at 3:28 PM, Lincoln Stein wrote:

> Hi Andrew,
>
> Here's the latest version of the "request" portion of the DAS/2 spec,
> recently converted into HTML.  I haven't proofread the HTML yet; any
> help you can render would be appreciated.
>
> Brian Gilman has volunteered to work on the "write" portion of the
> spec, but I haven't heard about this recently.
>
> Lincoln
>
> On Wednesday 15 September 2004 02:49 pm, Andrew Dalke wrote:
>> Hi Lincoln,
>>
>>    In your previous email you said you were most of the way
>> done with the first draft of the das2 proposal.  Should I
>> wait for that to finish or work with the one you sent?
>>
>> 					Andrew
>> 					dalke@dalkescientific.com
>
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
>