From saju_peruvachira@hotmail.com Mon Jan 6 10:19:22 2003 From: saju_peruvachira@hotmail.com (Saju Joseph) Date: Mon, 6 Jan 2003 15:49:22 +0530 Subject: [DAS] DAS Client Message-ID:

Hi Gurus,

I want to get the sequence for a chromosome region. Below is the piece of code I tried, and I am stuck partway through. Can any of you help me? Other suggestions are welcome.

    //Connect to the url
    URL url = new URL("http://genome.cse.ucsc.edu/cgi-bin/das/hg8/dna?segment=chr1:1,100;segment=chr2:2,300");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();

    //Set Http request properties
    conn.setDoOutput(true);
    conn.setDoInput(true);
    conn.setRequestMethod("GET");
    conn.setRequestProperty("Content-Language", "en-US");
    conn.setRequestProperty("Content-Type", "application/xml");
    System.out.println("DAS Data Requested...");

    //Read data
    BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream()));

    char[] buff = new char[20000];
    int bytes;
    String strFull = "";
    while (true) {
        bytes = reader.read(buff, 0, 20000);
        if (bytes == -1) break;
        String str = new String(buff, 0, bytes);
        strFull = strFull + str;
    }

    System.out.println(strFull);
    SequenceIterator stream = (SequenceIterator) SeqIOTools.readEmbl(reader);

    //Iterate over all sequences in the stream
    while (stream.hasNext()) {
        Sequence seq = stream.nextSequence();
        System.out.println(seq.getName());
    }

Also, I want to print the entire Sequence onto the console. How do I achieve this? Can any of you direct me to any samples?

Regards,
Saju Joseph

Hi,

The biojava library contains classes for connecting to the DAS server and for converting the results. They should provide everything you need.

Hope that helps
Thorsten Jansen

-----Original Message-----
From: das-admin@biodas.org [mailto:das-admin@biodas.org] On Behalf Of Saju Joseph
Sent: Friday, December 20, 2002 8:07 AM
To: das@biodas.org
Subject: [DAS] DAS

Gurus,

We have implemented a genome browser. We would like to download external genome annotation data from the DAS servers with minimum effort. An HTTP request made to the DAS server returns XML data. How can this XML data be converted using the DAS standard? If any of you could provide me with detailed input on this, it would be highly appreciated.

Thanks in advance,
Saju Joseph
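One way around the spot where this script gets stuck is to read the response exactly once into a buffer, and hand any parser a fresh Reader over that buffer rather than a stream that has already been drained. A minimal sketch, not from the thread itself: it uses the hg13 data source (which the reply below confirms is live), and what it prints is the raw DAS XML, not EMBL.

    import java.io.*;
    import java.net.*;

    public class FetchDna {
        public static void main(String[] args) throws IOException {
            URL url = new URL("http://genome.cse.ucsc.edu/cgi-bin/das/hg13/dna"
                    + "?segment=chr1:1,100");
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream()));
            StringBuffer buf = new StringBuffer();
            char[] chunk = new char[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.append(chunk, 0, n);   // read the stream exactly once
            }
            in.close();
            String xml = buf.toString();
            System.out.println(xml);                    // the raw DASDNA document
            Reader forParsing = new StringReader(xml);  // a fresh, re-readable Reader
        }
    }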
From td2@sanger.ac.uk Mon Jan 6 15:24:39 2003 From: td2@sanger.ac.uk (Thomas Down) Date: Mon, 6 Jan 2003 15:24:39 +0000 Subject: [DAS] DAS Client In-Reply-To: References: Message-ID: <20030106152438.GB357690@jabba.sanger.ac.uk>

On Mon, Jan 06, 2003 at 03:49:22PM +0530, Saju Joseph wrote:
> Hi Gurus,
> I want to get the sequence for a chromosome region. Below is the piece
> of code I tried, and I am stuck partway through. Can any of you help me?
> Other suggestions are welcome.

I noticed a few problems with your script:

- You're connecting to the `hg8' datasource at UCSC. This no longer appears to exist -- it's not in the DSN list for that server, and when I try accessing it from the command line with wget (a great DAS debugging tool, by the way), I get a blank HTTP response (not even a DAS error, which is arguably a problem with the server). If I try a similar request to `hg13' it works as expected.

- You set a content-type property for the request. This only makes sense if you're POSTing your request rather than GETting. And even if you *were* POSTing, the correct content type would be application/x-www-form-urlencoded. In practice, this probably isn't what's causing any problem, but it's definitely wrong (and could, for example, cause problems with some servers which accept alternative query formats).

- You try to read data twice from the InputStream returned by the URLConnection. First you read into a string, then (having already reached the end of the stream) you pass it to a BioJava parsing function. This would only be valid if you did a mark/reset on the stream.

- At the end of the script, you use the BioJava EMBL parser. DAS data is in a special DAS XML format. If the EMBL parser were actually to receive any of this (it won't, see above), it would return an error.

Since you're using BioJava, the simplest way to get the data would be something like:

    import java.net.*;
    import org.biojava.bio.seq.*;
    import org.biojava.bio.seq.db.*;
    import org.biojava.bio.program.das.*;

    // ...

    SequenceDB das = new DASSequenceDB(
            new URL("http://genome.cse.ucsc.edu/cgi-bin/das/hg13/"));
    Sequence dasSeq = das.getSequence("chr2");
    System.out.println(dasSeq.subStr(100, 300));

If you want to see how BioJava is *actually* fetching sequence data, see:

http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/biojava-live/src/org/biojava/bio/program/das/DASRawSymbolList.java?rev=1.6&cvsroot=biojava&content-type=text/vnd.viewcvs-markup

Thomas.
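To the follow-up question about printing an entire sequence to the console: a BioJava Sequence is a SymbolList, so seqString() should do it, though pulling a whole chromosome through DAS this way will be slow and memory-hungry. A sketch, continuing from the snippet above; the windowed loop is just one way to keep memory bounded.

    // Print everything at once (fine for small entry points):
    System.out.println(dasSeq.seqString());

    // Or walk the sequence in fixed-size windows:
    int window = 10000;
    for (int start = 1; start <= dasSeq.length(); start += window) {
        int end = Math.min(start + window - 1, dasSeq.length());
        System.out.println(dasSeq.subStr(start, end));
    }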
From gcox@cle.lionbioscience.com Tue Jan 7 21:52:03 2003 From: gcox@cle.lionbioscience.com (Cox, Greg) Date: Tue, 7 Jan 2003 16:52:03 -0500 Subject: [DAS] Odd behaviour at Sanger install Message-ID:

In some testing of the types command, I went against the Sanger Center's install to cross-check. What I found surprised me.

The request http://servlet.sanger.ac.uk:8080/das/ens_anoph_dros_5_1/types returned a list of types pretty much like what I expected. However, http://servlet.sanger.ac.uk:8080/das/ens_anoph_dros_5_1/types?segment=2L returned types that weren't listed in the first query. My reading of the spec implies that the second query should return a strict subset of the first. Am I missing something?

Greg

From td2@sanger.ac.uk Wed Jan 8 13:06:53 2003 From: td2@sanger.ac.uk (Thomas Down) Date: Wed, 8 Jan 2003 13:06:53 +0000 Subject: [DAS] Odd behaviour at Sanger install In-Reply-To: References: Message-ID: <20030108130653.GB366364@jabba.sanger.ac.uk>

On Tue, Jan 07, 2003 at 04:52:03PM -0500, Cox, Greg wrote:
> In some testing of the types command, I went against the Sanger Center's
> install to cross-check. What I found surprised me.
>
> The request http://servlet.sanger.ac.uk:8080/das/ens_anoph_dros_5_1/types
> returned a list of types pretty much like what I expected. However,
> http://servlet.sanger.ac.uk:8080/das/ens_anoph_dros_5_1/types?segment=2L
> returned types that weren't listed in the first query. My reading of the
> spec implies that the second query should return a strict subset of the
> first. Am I missing something?

Okay, this was a silly one-line bug. Basically, that data source is running through a DAS proxy server, which was passing back the global types list for the reference server rather than the annotation server.

Fixed now in CVS. Hopefully it will be deployed in a few days.

Thomas.
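The strict-subset expectation is easy to test mechanically against a server. A rough sketch, assuming the two responses are compared only by the id attributes of their TYPE elements; the regex parse is deliberately naive and for illustration only.

    import java.io.*;
    import java.net.*;
    import java.util.*;
    import java.util.regex.*;

    public class TypesSubsetCheck {
        // Collect TYPE ids from a DAS types response (naive regex, not a real XML parse).
        static Set typeIds(String url) throws IOException {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(new URL(url).openStream()));
            StringBuffer buf = new StringBuffer();
            String line;
            while ((line = in.readLine()) != null) buf.append(line);
            in.close();
            Set ids = new HashSet();
            Matcher m = Pattern.compile("<TYPE[^>]*\\bid=\"([^\"]+)\"").matcher(buf);
            while (m.find()) ids.add(m.group(1));
            return ids;
        }

        public static void main(String[] args) throws IOException {
            String base = "http://servlet.sanger.ac.uk:8080/das/ens_anoph_dros_5_1/types";
            Set global = typeIds(base);
            Set scoped = typeIds(base + "?segment=2L");
            scoped.removeAll(global);
            // Anything left over is a type the global list failed to mention.
            System.out.println("types missing from the global list: " + scoped);
        }
    }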
From colm.nestor@decode.is Tue Jan 14 10:53:38 2003 From: colm.nestor@decode.is (Colm Nestor) Date: Tue, 14 Jan 2003 10:53:38 +0000 Subject: [DAS] LDAS server Message-ID:

Bug reports:

Using the latest version of LDAS, downloaded yesterday.

1. Cannot get ldas_bulk_load.pl to work:

    [srs@mysql testdata]$ /usr/local/bin/ldas_bulk_load.pl --create --database ensembl_das test.das
    Can't exec "bulk_load_gff.pl": No such file or directory at /usr/local/bin/ldas_bulk_load.pl line 90.
    [srs@mysql testdata]$

There is a system call to a perl script at line 90 of ldas_bulk_load.pl; I cannot find this script anywhere.

Note: the other loading script, ldas_load.pl, works fine:

    [srs@mysql testdata]$ perl /usr/local/bin/ldas_load.pl --create --database ensembl_das test.das
    test.das: loading...
    test.das: 35 records loaded
    [srs@mysql testdata]$

I have installed all of the requested software, and am running on RedHat 7.2.

Kindest regards,
Colm Nestor,
DeCODE Genetics,
ICELAND.

From lstein@cshl.org Tue Jan 14 21:50:07 2003 From: lstein@cshl.org (Lincoln Stein) Date: Tue, 14 Jan 2003 13:50:07 -0800 Subject: [DAS] LDAS server In-Reply-To: References: Message-ID: <200301141350.07287.lstein@cshl.org>

bulk_load_gff.pl is part of the bioperl 1.2 distribution. As indicated in the LDAS install documents, you need to find this script in bioperl and install it in your command path. Look in bioperl/scripts/Bio-DB-GFF

Lincoln

On Tuesday 14 January 2003 02:53 am, Colm Nestor wrote:
> Bug reports:
>
> Using the latest version of LDAS, downloaded yesterday.
>
> 1. Cannot get ldas_bulk_load.pl to work:
[...]

--
========================================================================
Lincoln D. Stein                          Cold Spring Harbor Laboratory
lstein@cshl.org                                  Cold Spring Harbor, NY
1 Bungtown Road, Cold Spring Harbor, NY 11724
========================================================================

From dalke@dalkescientific.com Wed Jan 15 07:45:52 2003 From: dalke@dalkescientific.com (Andrew Dalke) Date: Wed, 15 Jan 2003 00:45:52 -0700 Subject: [DAS] RFC: REST advocacy Message-ID: <20030115004552.62d87b25.adalke@mindspring.com>

REST Advocacy

While others have suggested using SOAP, UDDI, WSDL, etc. for DAS 2.0:

    RFC 0:  http://www.biojava.org/thomasd/DAS/spec-new.html
    RFC 2:  http://www.biodas.org/RFCs/rfc002.txt
    RFC 11: http://www.biodas.org/RFCs/rfc011.txt
    RFC 13: http://www.biodas.org/RFCs/king_das2/index.html

I propose herein an alternate view.

I had to use SOAP for a project for one of my clients, a Big Pharma. Getting all the parts to work together was a bear, especially once we wanted to get WSDL in place. I tried to read the spec and available examples, but they weren't much help. Nor was the O'Reilly book on SOAP. We finally got it working, but it felt more like lucky guesswork than true understanding.

The public web diary of a friend of mine, Andrew Kuchling, helps describe my current view: http://www.amk.ca/diary/2002/sep.html

> Indeed. The best thing to do, for a small developer, is to lend
> support to those standards that are simple enough to be
> understandable. Use RELAX NG instead of the wretchedly overcomplicated
> XML Schema. Use XML-RPC instead of SOAP, and if you need a more
> complex interface than XML-RPC can handle, skip SOAP and design it
> following the REST principles.
>
> Eventually all of this SOAP + WSDL + UDDI + junk will collapse under
> its own weight, I think. I just hope it won't take the underlying
> technology of XML with it.

For another example, here is Fredrik Lundh, who wrote one of the SOAP libraries available for Python: http://effbot.org/zone/rest-vs-rpc.htm

> So when we started working on the design for a large image
> distribution and processing system, we already had a simple and
> scalable design, and the tools to support it. Just send XML
> documents representing objects back and forth over HTTP, and use the
> lightweight DOM structure to hold parsed versions of them inside the
> application. Add some glue code to let application code access the
> DOM structures as ordinary Python objects, and you have a complete
> and scalable system.
>
> The result was a much nicer specification (very few buzzwords) that
> anyone can understand, far less code, and most importantly, a much
> more robust design. ...
>
> XML-RPC gives you a lot of power, and anyone can understand how it
> works, and understand what the limitations are (Dave W. might not
> know the limitations, but that's another story ;-)
>
> SOAP is something completely different; lots of additional
> complexity, but very few additional benefits. Some people love
> complexity (especially if they see a chance to make a living out of
> it, like Don Box). But I don't. Wouldn't use Python if I did.

"REST" is a way of organizing web services around URIs and other web technologies instead of using RPC systems like SOAP. Paul Prescod is one advocate of the REST architecture, and some of his essays are available at http://www.prescod.net/rest/ .
He wrote an overview of REST at http://www.xml.com/pub/a/2002/02/20/rest.html . There is a REST wiki at http://conveyor.com/RESTwiki/moin.cgi . See also http://www.xfront.com/REST-Web-Services.html .

One fundamental idea promoted by REST advocates is that HTTP is not simply a way to get bits from here to there but instead is an application protocol, with the methods GET, POST, PUT, and DELETE, just like a file system has a few fundamental actions. (Some applications may need a few more actions, which is the idea behind DAV; see http://www.webdav.org/ . DAV is a REST architecture that extends HTTP/1.1 to add support for metadata properties, locking, and namespaces.)

Let's consider RFC 11, "SOAP as the standard transport encapsulation for DAS/2 messages." It starts by giving some background to SOAP.

> SOAP [1,2] is a simple messaging system, whereby all messages are
> encoded as XML documents. It supports a variety of messaging
> models, and is independent of underlying transport protocol, but for
> DAS, we will presumably be using a standard request-response
> paradigm. At least initially, transport will be over HTTP or HTTPS.

I can say for certain that SOAP is not simple, at least not 1.2. Try reading the specs for it:

    Part 0: Primer                 http://www.w3.org/TR/soap12-part0/
    Part 1: Messaging Framework    http://www.w3.org/TR/soap12-part1/
    Part 2: Adjuncts               http://www.w3.org/TR/soap12-part2/

with support for things like routing and signing of different parts of the message stream. Follow that up with the specs for WSDL, UDDI, and XML Schemas. Blarg!

I'm not sure what is meant by "messaging models." As mentioned above, HTTP is not simply a transport protocol. It offers other actions besides "send data then get data back." It can be used as a transport protocol, but then again, so can SOAP. Is there a need to use any "transport protocol" other than HTTP/HTTPS? If not, then is that an important consideration?

The RFC then lists various advantages of SOAP over the existing DAS 1.5 protocol.

> - Unlike the current DAS model, requests will be XML encoded, as
>   well as responses. This gives much more scope for extending the
>   request format, and makes it easier to support a powerful query
>   language in the requests (indeed, it would be easy to embed
>   XQueryX in SOAP messages).

The current request format is the CGI-style "application/x-www-form-urlencoded" or "multipart/form-data" format. This is not an intrinsic part of HTTP. After all, that's how SOAP and XML-RPC can send XML to the server, or how HTTP allows a "PUT" request. Therefore, I argue that this is not an aspect of SOAP but is simply one of HTTP.

The support for XQueryX should not bias the choice towards XQueryX. It's just as easy to embed any text string in any message, and I don't think XQueryX is ... pretty. See the example on the XQueryX page at http://www.w3.org/TR/xqueryx

> - Message components must be namespace-qualified, guaranteeing
>   extensibility.

The ability to return XML is a property of HTTP and not specifically of SOAP. Nor is extensibility unique to XML, though I do not advocate some other data representation language.

> - Basic exception-reporting semantics are defined.

So are they for HTTP. See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html

Indeed, since SOAP is a layer on top of HTTP, I already need to check for HTTP errors. Why then require two sources for error messages?
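DAS 1.5 itself already shows what a second error channel costs a client: it must consult both the HTTP status line and the X-DAS-Status response header that the 1.5 spec defines. A sketch of that double bookkeeping; the checking logic is illustrative.

    import java.io.IOException;
    import java.net.*;

    public class DasStatusCheck {
        public static void main(String[] args) throws IOException {
            // Two places to check for failure: the HTTP status line and
            // the DAS-specific X-DAS-Status header from the 1.5 spec.
            URL url = new URL("http://genome.cse.ucsc.edu/cgi-bin/das/hg13/types");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            int httpStatus = conn.getResponseCode();
            String dasStatus = conn.getHeaderField("X-DAS-Status");
            if (httpStatus != 200 || !"200".equals(dasStatus)) {
                throw new IOException("request failed: HTTP " + httpStatus
                        + ", X-DAS-Status " + dasStatus);
            }
            System.out.println("both error channels report success");
        }
    }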
> - There is full support for pipelines of actors processing a given
>   message. This makes technologies like smart caching and proxying
>   easy to retrofit onto protocols.

Even though I've seen the term "actor" used elsewhere, I don't understand that term, nor "pipelines of actors." Smart caching and proxying are already available using HTTP. See http://www.prescod.net/rest/rest_vs_soap_overview/

] The final core goal of REST is "compatibility with
] intermediaries". The most popular intermediaries are various kinds
] of Web proxies. Some proxies cache information to improve
] performance. Others enforce security policies. Another important
] kind of intermediary is a gateway, which encapsulates non-Web
] systems.

Compare that to a SOAP request, where a proxy cannot automatically tell whether an RPC call is a pure read request (like a GET) or not. Hence, it cannot easily tell if it's cacheable or not!

> - There are a large (and increasing) number of toolkits which make
>   developing SOAP applications easy.

And a large number for doing XML-RPC. And a much larger number for doing HTTP. The two packages I used for SOAP and Python weren't too hard, it's true.

> - SOAP-Encoding provides a standard format for marshaling arbitrary
>   data structures. (but see below for issues with this).

There are a huge number of ways to marshal arbitrary data structures, even standardized ways. In addition, (as I understand things) SOAP Encoding doesn't require the rest of SOAP, like envelopes, routing, etc.

The part "below" says that the SOAP toolkits are DOM-based and the RFC author would rather use a SAX/event-based one for processing large data sets. (Also mentioned in RFC 0.) However, as pointed out by Richard Salz (developer of ZSI, a leading Python SOAP implementation, and, from his bio, a long-time network protocol developer) at http://www.xml.com/pub/a/2002/07/17/salz.html?page=last

} Note that even though the individual processing is fairly simple,
} the overall process is fairly complex and requires multiple passes
} over the header elements. In a streaming environment -- think SAX,
} not DOM -- that won't work. In fact, it's my bet that headers will
} spell the end of SAX-style SOAP processors. For example, a digital
} signature of a SOAP message naturally belongs in the header. In
} order to generate the signature, you need to generate a hash of the
} message content. How can you do that without buffering?

Hence I believe SOAP toolkits will not migrate to a pure event-driven style, and his concern will not be addressed.

In any case, the RFC says the concern is with the memory and set-up/tear-down costs. I disagree. If 50 MB of data was requested, then using, say, a 100 MB DOM shouldn't be a concern. I say the concern should be to allow people to see the data while it is being downloaded, instead of having to wait for a complete download. I do not think the existing SOAP toolkits allow this, nor will they soon.

For one final read, only somewhat related, see http://www.adtmag.com/article.asp?id=6965 which advocates the "bohemian" RELAX NG schema over XML Schema.

Andrew Dalke
dalke@dalkescientific.com

--
Need usable, robust software for bioinformatics or chemical informatics?
Want to integrate your different tools so you can do more science in
less time? Contact us!
http://www.dalkescientific.com/
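For what it's worth, the see-data-while-downloading style is straightforward with plain HTTP plus SAX. A sketch that reports DAS 1.5 features as they arrive, before the download finishes; the FEATURE element and id attribute follow the 1.5 DASGFF format, and the handler logic is illustrative.

    import java.io.InputStream;
    import java.net.URL;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.Attributes;
    import org.xml.sax.helpers.DefaultHandler;

    public class StreamingFeatures {
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://genome.cse.ucsc.edu/cgi-bin/das/hg13/"
                    + "features?segment=chr22:1,1000000");
            InputStream in = url.openStream();
            // Each FEATURE element is reported as soon as its start tag is
            // read, so partial results appear while the download continues.
            SAXParserFactory.newInstance().newSAXParser().parse(in, new DefaultHandler() {
                public void startElement(String ns, String local, String qName,
                                         Attributes atts) {
                    if ("FEATURE".equals(qName)) {
                        System.out.println("got feature " + atts.getValue("id"));
                    }
                }
            });
            in.close();
        }
    }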
From dalke@dalkescientific.com Wed Jan 15 07:46:19 2003 From: dalke@dalkescientific.com (Andrew Dalke) Date: Wed, 15 Jan 2003 00:46:19 -0700 Subject: [DAS] RFC: REST example for DAS 2.0 Message-ID: <20030115004619.504a94f6.adalke@mindspring.com>

REST example for DAS 2.0

In my previous RFC I suggested ignoring SOAP+UDDI+WSDL and building DAS 2.0 on top of straight HTTP+XML using a REST architecture.

To show you how that might work, here's one way to have implemented the functionality from the DAS 1.5 spec. I ignore for now a discussion of how to handle versioning when the sequence changes. (I think it's best done by having an extra level with the version identifier in it.)

If you want me to say "URI" instead of "URL", you can make the replacement in your head.

============================

/
    Returns a list of data sources.

    This replaces the 'dsns' method call. It returns an XML document of
    doctype "http://www.biodas.org/dtd/dasdsn.dtd". Doing this also gets
    rid of the annoying "cannot have a dsn named 'dsn'" problem.

/stylesheet
    Returns the stylesheet for the DSN.

/entry_point/
    Returns a list of entry points.

    This returns an XML document (the doctype doesn't yet exist). It is
    basically a list of URLs.

/entry_point/<id>
    Returns XML describing a segment, ie, id, start, stop, and
    orientation. The doctype doesn't yet exist.

/feature/
    Returns a list of all features. (You might not want to do this, and
    the server could simply say "not implemented.")

/feature/<id>
    Returns the GFF for the feature named 'id'.

    Each feature in 1.5 already has a unique identifier. This makes the
    feature a full-fledged citizen of the web by making it directly
    accessible. (Under DAS 1.5 it is accessible as a side effect of a
    'features' command, but I don't want to confuse a feature's name with
    a search command, especially since many searches can return the same
    feature, and because the results of a search should be a list, not a
    single result.)

/features?segment=RANGE;type=TYPE;category=....
    Returns a list of features matching the given search criteria.

    The input is identical to the existing 'features' command. The result
    is a list of feature URLs. This is a POST interface.

/sequence?segment=RANGE[;segment=RANGE]*
    Returns the sequence in the given segment(s), as XML of doctype
    "http://www.biodas.org/dtd/dassequence.dtd".

    This is identical to the existing 'sequence' command and is a POST
    interface.

/type/
    Returns a list of all types. (You might not want to do this, and the
    server could simply say "not implemented.")

/type/<id>
    Returns an XML document of doctype "DASTYPE", which is like the
    existing "http://www.biodas.org/dtd/dastypes.dtd" except there's only
    one type.

/types?segment=RANGE;type=TYPE
    Returns a list of URIs for types matching the search criteria.

    The input is identical to the existing 'types' command. The result is
    a list of URLs. This is a POST interface.

============================
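As a sketch of how a client might use this layout: the host name, feature id, and the one-URL-per-line response format below are invented purely for illustration; only the URL shapes come from the listing above.

    import java.io.*;
    import java.net.*;

    public class RestDasWalk {
        // GET a URL and return the body as a string.
        static String get(String url) throws IOException {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(new URL(url).openStream()));
            StringBuffer buf = new StringBuffer();
            String line;
            while ((line = in.readLine()) != null) buf.append(line).append('\n');
            in.close();
            return buf.toString();
        }

        public static void main(String[] args) throws IOException {
            String base = "http://das.example.org/hg13";  // hypothetical server
            // The features search is a POST whose response lists feature URLs.
            HttpURLConnection conn = (HttpURLConnection)
                    new URL(base + "/features").openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            Writer out = new OutputStreamWriter(conn.getOutputStream());
            out.write("segment=chr2:100,300;type=exon");
            out.close();
            BufferedReader hits = new BufferedReader(
                    new InputStreamReader(conn.getInputStream()));
            String featureUrl;
            while ((featureUrl = hits.readLine()) != null) {
                // Each match is then a plain GET on its own URL, which an
                // ordinary HTTP cache between client and server may satisfy.
                System.out.println(get(featureUrl));
            }
            hits.close();
        }
    }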
Unlike the existing spec, and unlike the proposed RFC 13, the features and types are objects in their own right. This has several effects.

Linkability

Since a feature has a URL, features are directly addressable. This helps address RFC 3 "InterService links in DAS/2" (see http://www.biodas.org/RFCs/rfc003.txt ) because each object is accessible through a URL, and can be addressed by anything else which understands URLs.

One such relevant technology is the Resource Description Framework (RDF) (see http://www.w3.org/TR/REC-rdf-syntax/ ). This lets third parties add their own associations between URLs. For example, I could publish my own RDF database which comments on the quality of features in someone else's database.

I do not know enough about RDF. I conjecture that I can suggest an alternative stylesheet (RFC 8, "DAS Visualization Server", http://www.biodas.org/RFCs/rfc008.txt) by an appropriate link to the /stylesheet/ . I further conjecture that RDF appropriately handles group normalization from RFC 10 (http://www.biodas.org/RFCs/rfc010.txt).

Ontologies

Web ontologies, like DAML+OIL, are built on top of RDF. Because types are also directly accessible, this lets us (or others!) build ontologies on top of the feature types. This addresses RFC 4 "Annotation ontologies for DAS/2" at http://www.biodas.org/RFCs/rfc004.txt .

Independent requests

Perhaps the biggest disadvantage of this scheme is that any search (like 'features') requires an additional GET to fetch information about every feature that matched. If there are 1,000 matches, then there are 1,000 additional requests. Compare that to the current scheme, where all the data about the matches is returned in one shot.

I do not believe this should be a problem. The HTTP/1.1 spec supports "keep-alive", so the connection to the server does not need to be re-established. A client can feed requests to the server while also receiving responses from earlier queries, so there shouldn't be a pause in bandwidth usage while making each request. In addition, the overhead for making a request and the extra headers for each independent response shouldn't require much extra data to be sent.

The performance slowdown should pay for itself quickly once someone does multiple queries. Suppose the second query also has 1,000 matches, with 500 matches overlapping the first query. Under the existing DAS 1.5 spec, all the data must be sent again. Under this proposal, only the 500 new requests need be sent.

One other issue mentioned in the SOAP proposals and in my REST advocacy was the ability to stream through a feature table. Suppose the feature table is large. People would like to see partial results and not wait until all the data is received. E.g., this would allow them to cancel a download if they can see it contains the wrong information.

If the results are sent in one block, the parsing toolkit must support a streaming interface. It is unlikely that most SOAP toolkits will support this mode. It's also trickier to develop software using a streaming API (like SAX) than a bulk API (like DOM). This new spec gets around that problem by sending a list of URLs instead of the full data. The individual records are small and can be fetched one at a time and parsed by whatever means are appropriate. This makes it easier to develop software which can multitask between reading/parsing input and handling the user interface.

Caching

RFC 5 "DAS Caching" (http://www.biodas.org/RFCs/rfc005.txt) wants a way to cache data. I believe most of the data requests will be for feature data. Because features are independently named, and accessed through that name using an HTTP GET, normal HTTP caching systems like the Squid proxy can be used, along with standard and well-defined mechanisms to control cache behaviour.

The caching proposal also considers P2P systems like Gnutella as a way to distribute data. One possible scheme for this is to define a mapping from URLs to a Gnutella resource. In this case, replace 'URL' above with 'URI'.

Andrew Dalke
dalke@dalkescientific.com

--
Need usable, robust software for bioinformatics or chemical informatics?
Want to integrate your different tools so you can do more science in
less time? Contact us!
http://www.dalkescientific.com/
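One concrete payoff of giving each feature its own GET-able URL is that stock HTTP cache validation applies to it. A sketch of a conditional re-fetch, assuming a hypothetical server that sends Last-Modified on feature records:

    import java.net.*;

    public class ConditionalFetch {
        public static void main(String[] args) throws Exception {
            // Invented server and feature id, for illustration only.
            URL url = new URL("http://das.example.org/hg13/feature/exon.1234");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            // Ask the server (or an intervening cache such as Squid) to send
            // the record only if it changed since we last saw it.
            conn.setIfModifiedSince(1042588800000L); // 15 Jan 2003 00:00 GMT
            if (conn.getResponseCode() == HttpURLConnection.HTTP_NOT_MODIFIED) {
                System.out.println("cached copy is still good");
            } else {
                System.out.println("changed; re-read from conn.getInputStream()");
            }
        }
    }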
From dalke@dalkescientific.com Wed Jan 15 07:46:24 2003 From: dalke@dalkescientific.com (Andrew Dalke) Date: Wed, 15 Jan 2003 00:46:24 -0700 Subject: [DAS] RFC: DAV and a writable DAS 2 server Message-ID: <20030115004624.211c7911.adalke@mindspring.com>

DAV and a writable DAS 2 server

There is one more piece of web technology I want to mention: DAV (see http://www.webdav.org). This is an extension of HTTP/1.1 which improves support for distributed authoring. It adds properties, namespace manipulation, and locking. DAV is available in many tools, including Internet Explorer.

DAV should make it easy to modify a DAS 2.0 server. For example, suppose I have a new set of features. I create them as a set of files on the local file system, where the name of each file is the feature identifier. I copy them to the server into the /feature/ directory (in detail, I get a write lock, PUT the new records into /feature/<id> for each id, and yield the lock). And that is it!

Implementation-wise, the DAS server would need to talk to the DAV-enabled server and to the back-end database, so that it parses new features as they come in and rebuilds the database at the end of the transaction (when the lock is returned). But the end result makes it very easy to maintain and update a DAS server.

DAV can also be used to add metadata to records. For example, it could include properties like "last modified" or "owner." I do not know enough about DAV to know if/when properties should be used.

Andrew Dalke
dalke@dalkescientific.com

--
Need usable, robust software for bioinformatics or chemical informatics?
Want to integrate your different tools so you can do more science in
less time? Contact us!
http://www.dalkescientific.com/
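A sketch of the upload step with a plain HTTP PUT. HttpURLConnection can issue PUT; the DAV LOCK/UNLOCK bracketing described above is omitted because it needs a fuller HTTP client, and the server URL and record body are invented.

    import java.io.*;
    import java.net.*;

    public class PutFeature {
        public static void main(String[] args) throws IOException {
            // Hypothetical writable DAS 2.0 server; one PUT per feature record.
            URL url = new URL("http://das.example.org/hg13/feature/exon.1234");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("PUT");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "text/xml");
            Writer out = new OutputStreamWriter(conn.getOutputStream());
            out.write("<FEATURE id=\"exon.1234\">...</FEATURE>"); // invented record
            out.close();
            System.out.println("server said: " + conn.getResponseCode());
        }
    }

A DAV-capable server would typically answer such a PUT with 201 (created) or 204 (no content).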
From gilmanb@genome.wi.mit.edu Wed Jan 15 13:41:33 2003 From: gilmanb@genome.wi.mit.edu (Brian Gilman) Date: Wed, 15 Jan 2003 08:41:33 -0500 Subject: [DAS] RFC: REST example for DAS 2.0 In-Reply-To: <20030115004619.504a94f6.adalke@mindspring.com> Message-ID:

On 1/15/03 2:46 AM, "Andrew Dalke" wrote:

Hey Andrew,

Long time no talk. SOAP, WSDL, and UDDI are NEVER going to help you send 50 MB of data across the wire! I've also thought about REST as a means to make a distributed system. But the industry is just not going that way. There are MANY toolkits to program up a web service. Programming a REST service means doing things that are non-standard, and my engineering brain says not to touch those things. SOAP has been able to solve a lot of interoperability problems and will only get better over time. We use the DIME protocol and compression to shove data over the wire. No need to parse the document this way.

SOAP has two methods of asking for data:

1) RPC
2) Document centric

My question to you is: why reinvent the wheel? Why program up yet another wire protocol when you have something to work with already? And DAS is a REST protocol! Right now DAS just works. Why change it to use anything else? Is there a problem with the semantics of the protocol that impedes any of the research that we are doing? Murphy's law should be called the engineer's prayer.

Best,

-B

> REST example for DAS 2.0
>
> In my previous RFC I suggested ignoring SOAP+UDDI+WSDL and building DAS
> 2.0 on top of straight HTTP+XML using a REST architecture.
[...]
> Andrew Dalke
> dalke@dalkescientific.com

--
Brian Gilman
Group Leader Medical & Population Genetics Dept.
MIT/Whitehead Inst. Center for Genome Research
One Kendall Square, Bldg. 300 / Cambridge, MA 02139-1561 USA
phone +1 617 252 1069 / fax +1 617 252 1902

From dblock@gnf.org Wed Jan 15 16:41:42 2003 From: dblock@gnf.org (David Block) Date: Wed, 15 Jan 2003 08:41:42 -0800 Subject: [DAS] RFC: REST example for DAS 2.0 In-Reply-To: Message-ID: <3A4DD83F-28A8-11D7-B0A8-0003935B04D0@gnf.org>

Brian,

What libraries are you using for DIME? Is there good Java, Perl support? I know you're a J2EE shop - what toolkit do you use?

Thanks,
Dave

On Wednesday, January 15, 2003, at 05:41 AM, Brian Gilman wrote:

> On 1/15/03 2:46 AM, "Andrew Dalke" wrote:
>
> Hey Andrew,
>
> Long time no talk. SOAP, WSDL, and UDDI are NEVER going to help you
> send 50 MB of data across the wire!
[...]

--
----------------------------------------------
David Block -- Genome Informatics Developer
dblock@gnf.org http://radio.weblogs.com/0104507
(858)812-1513

From lstein@cshl.org Wed Jan 15 17:26:25 2003 From: lstein@cshl.org (Lincoln Stein) Date: Wed, 15 Jan 2003 12:26:25 -0500 Subject: [DAS] RFC: REST example for DAS 2.0 In-Reply-To: References: Message-ID: <200301151226.25382.lstein@cshl.org>

I just want to keep things simple. As long as there are good Perl/Java/Python APIs to DAS and performance is usable, none of the target audience (applications developers) are going to care in the least whether it's SOAP or not.

My concern with SOAP encapsulation is that it makes it harder to stream DAS, at least with my favorite language, Perl. But I've got my fingers crossed that eventually there will be a good streaming SOAP for Perl, and at that point all my misgivings go away.

My understanding of REST is that it's defined by the negative -- it isn't SOAP.
That's not going to provide much in the way of reusability. Lincoln On Wednesday 15 January 2003 08:41 am, Brian Gilman wrote: > On 1/15/03 2:46 AM, "Andrew Dalke" wrote: > > Hey Andrew, > > Long time no talk. SOAP, WSDL, and UDDI are NEVER going to help you > send 50 MB of data across the wire! I've also thought about REST as a means > to make a distributed system. But, the industry is just not going that way. > There are MANY toolkits to program up a web service. Programming a REST > service means doing things that are non-standard and my engineering brain > says not to touch those things. SOAP has been able to solve a lot of > interoperability problems and will only get better over time. We use the > DIME protocol and compression to shove data over the wire. No need to parse > the document this way. > > SOAP has two methods of asking for data: > > 1) RPC > 2) Document centric > > My question to you is: Why reinvent the wheel?? Why program up yet > another wire protocol when you have something to work with already?? And, > DAS, is a REST protocol!! Right now DAS just works. Why change it to use > anything else?? Is there a problem with the semantics of the protocol that > impede any of the research that we are doing?? Murphy's law should be > called the engineer's prayer. > > Best, > > -B > > > REST example for DAS 2.0 > > > > In my previous RFC I suggested ignoring SOAP+UDDI+WSDL and build DAS > > 2.0 on top of straight HTTP+XML using a REST architecture. > > > > To show you how that might work, here's one way to have implemented > > the functionality from the DAS 1.5 spec. I ignore for now a > > discussion of how to handle versioning when the sequence changes. (I > > think it's best done by having an extra level with the version > > identifier in them.) > > > > If you want me to say "URI" instead "URI" you can make the replacement > > in your head. > > > > ============================ > > / > > Returns a list of data sources > > > > This replaces the 'dsns' method call. It returns an XML document of > > doctype "http://www.biodas.org/dtd/dasdsn.dtd" Doing this also gets > > rid of the annoying "cannot have a dsn named 'dsn'" problem. > > > > > > /stylesheet > > Returns the stylesheet for the DSN > > > > > > /entry_point/ > > Returns a list of entry points > > > > This returns an XML document (the doctype doesn't yet exist). It is > > basically a list of URLs. > > > > /entry_point/ > > This returns XML describing a segment, ie, id, start, stop, and > > orientation. The doctype doesn't yet exist. > > > > > > /feature/ > > Returns a list of all features. (You might not want to do this, > > and the server could simply say "not implemented.") > > > > /feature/ > > Returns the GFF for the feature named 'id' > > > > Each feature in 1.5 already has a unique identifier. This makes the > > feature a full-fledged citizen of the web by making it directly > > accessible. (Under DAS 1.5 it is accessible as a side effect of a > > 'features' command, but I don't want to confuse a feature's name with > > a search command, especially since many searches can return the same > > feature, and because the results of a search should be a list, not a > > single result.) > > > > > > /features?segment=RANGE;type=TYPE;category=.... > > Returns a list of features matching the given search criteria. > > > > The input is identical to the existing 'features' command. The result > > is a list of feature URLs. This is a POST interface. 
> > > > > > /sequence?segment=RANGE[;segment=RANGE]* > > Returns the sequence in the given segment(s), as XML of > > doctype "http://www.biodas.org/dtd/dassequence.dtd". > > > > This is identical to the existing 'sequence' command and is a POST > > interface. > > > > > > /type/ > > Returns a list of all types. (You might not want to do this, > > and the server could simply say "not implemented.") > > > > /type/ > > Returns a XML document of doctype "DASTYPE", which is like > > the existing "http://www.biodas.org/dtd/dastypes.dtd" except > > there's only one type. > > > > /types?segment=RANGE;type=TYPE > > Return a list of URIs for types matching the search criteria. > > > > The input is identical to the existing 'types' command. The result is > > a list of URLs. This is a POST interface. > > > > ============================ > > > > Unlike the existing spec, and unlike the proposed RFC 13, the feature > > and types are objects in their own right. This has several effects. > > > > Linkability > > > > Since a feature has a URL, means that features are directly > > addressible. This helps address RFC 3 "InterService links in DAS/2" > > (see http://www.biodas.org/RFCs/rfc003.txt ) because each object is > > accessible through a URL, and can be addressed by anything else which > > understands URLs. > > > > One such relevant technology is the Resource Description Framework > > (RDF) (see http://www.w3.org/TR/REC-rdf-syntax/ ). This lets 3rd > > parties add their own associations between URLs. For example, I could > > publish my own RDF database which comments on the quality of features > > in someone else's database. > > > > I do not know enough about RDF. I conjecture that I can suggest an > > alternative stylesheet (RFC 8, "DAS Visualization Server" > > http://www.biodas.org/RFCs/rfc008.txt) by an appropriate link to the > > /stylesheet/ . > > > > I further conjecture that RDF appropriately handles group > > normalization from RFC 10 (http://www.biodas.org/RFCs/rfc010.txt). > > > > Ontologies > > > > Web ontologies, like DAML+OIL, are built on top of RDF. Because types > > are also directly accessible, this lets us (or others!) build their > > own ontologies on top of the features type. This addresses RFC 4 > > "Annotation ontologies for DAS/2" at > > http://www.biodas.org/RFCs/rfc004.txt . > > > > > > Independent requests > > > > Perhaps the biggest disadvantage to this scheme is that any search > > (like 'features') requires an additional 'GET' to get information > > about every feature that matched. If there are 1,000 matches, then > > there are 1,000 additional requests. Compare that to the current > > scheme where all the data about the matches is returned in one shot. > > > > I do not believe this should be a problem. The HTTP/1.1 spec supports > > "keep-alive" so that the connection to the server does not need to be > > re-established. A client can feed requests to the server while also > > receiving responses from earlier queries, so there shouldn't be a > > pause in bandwidth usage while making each request. In addition, the > > overhead for making a request and the extra headers for each > > independent response shouldn't require much extra data to be sent. > > > > The performance slowdown should pay for itself quickly once someone > > does multiple queries. Suppose the second query also has 1,000 > > matches, with 500 matches overlapping with the first query. Under the > > existing DAS 1.5 spec, this means that all the data must be sent > > again. 
> > One other issue mentioned in the SOAP proposals and in my REST
> > advocacy was the ability to stream through a feature table. Suppose
> > the feature table is large. People would like to see partial results
> > and not wait until all the data is received. E.g., this would allow
> > them to cancel a download if they can see it contains the wrong
> > information.
> >
> > If the results are sent in one block, this requires that the parsing
> > toolkit support a streaming interface. It is unlikely that most SOAP
> > toolkits will support this mode. It's also trickier to develop
> > software using a streaming API (like SAX) compared to a bulk API
> > (like DOM). This new spec gets around that problem by sending a list
> > of URLs instead of the full data. The individual records are small
> > and can be fetched one at a time and parsed with whatever means are
> > appropriate. This makes it easier to develop software which can
> > multitask between reading/parsing input and handling the user
> > interface.
> >
> > Caching
> >
> > RFC 5, "DAS Caching" (http://www.biodas.org/RFCs/rfc005.txt), wants
> > a way to cache data. I believe most of the data requests will be for
> > feature data. Because these are independently named and accessed
> > through that name using an HTTP GET, normal HTTP caching systems
> > like the Squid proxy can be used, along with standard and
> > well-defined mechanisms to control cache behaviour.
> >
> > The caching proposal also considers P2P systems like Gnutella as a
> > way to distribute data. One possible scheme for this is to define a
> > mapping from URLs to a Gnutella resource. In this case, replace
> > 'URL' above with 'URI'.
> >
> > Andrew Dalke
> > dalke@dalkescientific.com

--
Lincoln Stein
lstein@cshl.org
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)

From adalke@mindspring.com Wed Jan 15 18:33:58 2003
From: adalke@mindspring.com (Andrew Dalke)
Date: Wed, 15 Jan 2003 11:33:58 -0700
Subject: [DAS] RFC: REST example for DAS 2.0
In-Reply-To: 
References: <20030115004619.504a94f6.adalke@mindspring.com>
Message-ID: <20030115113358.29575c71.adalke@mindspring.com>

Brian Gilman:
> I've also thought about REST as a means to make a distributed system.
> But, the industry is just not going that way.

Says the Java guy talking to Perl and Python coders. :)

The direction of industry should be an influence, but not a deciding
factor. ("If everyone else were to jump off the Empire State
building....")

In my posts I said that so far SOAP has been tricky to use
(interoperating between Python, Perl, and Java), and that the benefits
of SOAP given in the RFCs are not unique to a SOAP approach. I also
said, as you verified, that SOAP isn't useful for streaming over large
responses, and pointed out an alternate way to approach it that neither
the current 1.5 spec nor RFC 13 supports. I also pointed out that a
SOAP approach makes it hard to do caching. [Actually, there's an
itemized list of reasons at the bottom of this post.]

> There are MANY toolkits to program up a web service.

Yes, and I tried 4 of them for Python and the SOAP::Lite one for Perl.
I also mentioned there are many toolkits to program up a REST service,
since they already exist for standard web programming.

> Programming a REST service means doing things that are non-standard
> and my engineering brain says not to touch those things.

How is it non-standard? A search can still be a SOAP request (or
XML-RPC, which is just as useful as SOAP and much less complicated).
Returning XML with a DTD is standardized, and the DTD can be used to
generate a native data structure. True, the DTD doesn't handle type
schemas, but that can be verified with an external schema. (Based on
the work of people I know, I'm now leaning towards RELAX-NG, but that's
a different topic.)
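[The "native data structure" point needs no SOAP toolkit at all; the
stock JAXP SAX parser that shipped with Java at the time is enough. A
minimal sketch, assuming the FEATURE elements and id attributes of the
DAS 1.5 DASGFF responses; adjust the names if the server's DTD
differs.]

    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.Attributes;
    import org.xml.sax.helpers.DefaultHandler;

    public class DasFeatureIds extends DefaultHandler {
        public void startElement(String uri, String local, String qName,
                                 Attributes atts) {
            // Each annotation arrives as a FEATURE element; its id
            // attribute is the unique name a REST scheme would expose
            // as a per-feature URL.
            if ("FEATURE".equals(qName)) {
                System.out.println(atts.getValue("id"));
            }
        }

        public static void main(String[] args) throws Exception {
            SAXParser parser =
                SAXParserFactory.newInstance().newSAXParser();
            // SAX processes the document as it streams in, so partial
            // results appear before the download completes.
            parser.parse(
                "http://servlet.sanger.ac.uk:8080/das/ensembl930/features?segment=NT_034911",
                new DasFeatureIds());
        }
    }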
> SOAP has been able to solve a lot of interoperability problems and
> will only get better over time. We use the DIME protocol and
> compression to shove data over the wire. No need to parse the
> document this way.

And XML-RPC has been able to solve a lot of interoperability problems
and is mature and stable. What advantages does SOAP bring?

DIME? Here's what I know about it (dated 2002/09/18):
  http://www.xml.com/pub/a/2002/09/18/ends.html
According to that, DIME is more of a de facto standard than a de jure
one, so how does that affect your engineering brain? ;)

More seriously, it's layers upon layers. As I read it, the DIME message
holds the SOAP message and is decoded to get the SOAP portion out.
(This is because DIME is a binary format and may contain XML
metacharacters.) Therefore, why not return the DIME message without the
SOAP part? Just include the data sets directly.

> SOAP has two methods of asking for data:
>
> 1) RPC
> 2) Document centric
>
> My question to you is: Why reinvent the wheel?? Why program up yet
> another wire protocol when you have something to work with already??

RFC 13, which suggests WSDL and UDDI for a DAS 2, is RPC, not document
centric. I agree that everything I mentioned for my REST example can be
done over SOAP, in which case SOAP is being used for pure
serialization. However, even in that case you limit caching (RFC 5),
because SOAP requests are all done via POST and the cache doesn't know
whether a POST request has side effects or not.

How am I reinventing the wheel? In my example recasting of DAS I
created no new wire protocols. Everything was returned in XML with a
DTD. Just like in RFC 13 there's a new WSDL for every query. So there
are equal numbers of new definitions required.

> And, DAS, is a REST protocol!! Right now DAS just works.

I disagree. Two data types, features and types, are not directly
addressable. They are only retrievable as part of a search, i.e., the
'types' and 'features' commands. (Semantically,
'features?feature_id=ABC' returns a list of matches, either of length 1
or 0, as compared to a name, which returns the object or says "404 Not
Found".) This means that DAS as it stands doesn't allow "InterService
links" as requested in RFC 3, nor does it allow RDF-style commentary
and metadata. And so I believe DAS 1.5 is not a REST protocol.

> Why change it to use anything else?? Is there a problem with the
> semantics of the protocol that impedes any of the research that we
> are doing?? Murphy's law should be called the engineer's prayer.

Yes, as listed:

 - improved performance, because previously fetched features do not
   need to be re-retrieved for every search
 - better integration with existing http caching proxies (see the
   sketch below)
 - protocols are easier to understand
 - toolkits for doing this are more widely available (than SOAP; they
   are the same toolkits as for the existing DAS spec)
 - able to make links to a feature, e.g., with RDF (which can also
   address RFC 10 on "normalizing groups")
 - easy support for streaming
 - easy extension to DAV for making a *writable* system using standard
   and widely available authoring tools
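[The caching item is easy to see at the HTTP level. A GET for a named
feature can carry an If-Modified-Since header, and either the origin
server or an intermediary such as Squid can answer "304 Not Modified"
without resending the body; a SOAP POST gets no such treatment. A
minimal sketch, with a made-up feature URL built from the ENST id
mentioned later in this archive:]

    import java.net.HttpURLConnection;
    import java.net.URL;

    public class RevalidateFeature {
        public static void main(String[] args) throws Exception {
            // Hypothetical per-feature URL from the REST scheme above.
            URL feature =
                new URL("http://das.example.org/feature/ENST00000265708");
            long lastFetched =
                System.currentTimeMillis() - 24L * 3600 * 1000;

            HttpURLConnection conn =
                (HttpURLConnection) feature.openConnection();
            // Standard HTTP/1.1 revalidation; nothing DAS-specific.
            conn.setIfModifiedSince(lastFetched);

            if (conn.getResponseCode()
                    == HttpURLConnection.HTTP_NOT_MODIFIED) {
                System.out.println("cached copy is still good");
            } else {
                System.out.println("feature changed; re-fetch the body");
            }
            // A SOAP request would have been a POST, which a generic
            // cache must pass through untouched.
        }
    }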
Do they "impede research"? The performance ones make it easier to work
with distant data sources and easier to develop more interactive tools.
The ability to make direct links is, I believe, a big but untapped
advantage. The support for writing makes it easier for people to
maintain a DAS system.

Andrew Dalke
dalke@dalkescientific.com

--
Need usable, robust software for bioinformatics or chemical
informatics? Want to integrate your different tools so you can do more
science in less time? Contact us!
http://www.dalkescientific.com/

From dalke@dalkescientific.com Wed Jan 15 18:48:40 2003
From: dalke@dalkescientific.com (Andrew Dalke)
Date: Wed, 15 Jan 2003 11:48:40 -0700
Subject: [DAS] RFC: REST example for DAS 2.0
In-Reply-To: <200301151226.25382.lstein@cshl.org>
References: <200301151226.25382.lstein@cshl.org>
Message-ID: <20030115114840.2d297ea3.adalke@mindspring.com>

[Blech! I'm subscribed to this list as "dalke@dalkescientific.com"
since that is my primary email address. But my 'From' is
"adalke@mindspring.com" because my ISP won't allow me to do otherwise.
So every message I send gets held for moderation. Sorry about that,
moderators.]

Lincoln Stein:
> As long as there are good Perl/Java/Python APIs to DAS and
> performance is usable, none of the target audience (applications
> developers) are going to care in the least whether it's SOAP or not.

I agree.

> My concern with SOAP encapsulation is that it makes it harder to
> stream DAS, at least with my favorite language, Perl. But I've got my
> fingers crossed that eventually there will be a good streaming SOAP
> for Perl, and at that point all my misgivings go away.

Given my readings, I do not think this will happen:
  http://www.xml.com/pub/a/2002/07/17/salz.html?page=last

} Note that even though the individual processing is fairly simple,
} the overall process is fairly complex and requires multiple passes
} over the header elements. In a streaming environment -- think SAX,
} not DOM -- that won't work. In fact, it's my bet that headers will
} spell the end of SAX-style SOAP processors. For example, a digital
} signature of a SOAP message naturally belongs in the header. In
} order to generate the signature, you need to generate a hash of the
} message content. How can you do that without buffering?

Brian mentioned DIME, which may make it possible. I do not think that
the DIME solution affects my comments on caching and on fetching only
new features.

> My understanding of REST is that it's defined by the negative -- it
> isn't SOAP. That's not going to provide much in the way of
> reusability.

I would rather say that most times SOAP isn't REST. The papers I've
read offer plenty of examples of what a REST-style architecture is (as
compared to the negative). Quoting from
http://www.xfront.com/REST-Web-Services.html :

 * Client-Server: a pull-based interaction style: consuming components
   pull representations.
 * Stateless: each request from client to server must contain all the
   information necessary to understand the request, and cannot take
   advantage of any stored context on the server.
 * Cache: to improve network efficiency, responses must be capable of
   being labeled as cacheable or non-cacheable.
 * Uniform interface: all resources are accessed with a generic
   interface (e.g., HTTP GET, POST, PUT, DELETE).
 * Named resources: the system is comprised of resources which are
   named using a URL.
 * Interconnected resource representations: the representations of the
   resources are interconnected using URLs, thereby enabling a client
   to progress from one state to another.
 * Layered components: intermediaries, such as proxy servers, cache
   servers, gateways, etc., can be inserted between clients and
   resources to support performance, security, etc.
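[The "uniform interface" point is what would make the writable-DAS idea
from Andrew's earlier list nearly free: a named feature resource could
be updated with an ordinary HTTP PUT to the same URL it is read from. A
hypothetical sketch only; no DAS server in this thread actually accepts
PUT, and the URL and payload are invented.]

    import java.io.OutputStreamWriter;
    import java.io.Writer;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class PutFeature {
        public static void main(String[] args) throws Exception {
            // Same hypothetical feature URL as before: GET reads it;
            // PUT, in a DAV-style extension, would update it.
            URL feature =
                new URL("http://das.example.org/feature/ENST00000265708");
            HttpURLConnection conn =
                (HttpURLConnection) feature.openConnection();
            conn.setRequestMethod("PUT");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "text/xml");

            Writer out = new OutputStreamWriter(conn.getOutputStream());
            out.write("<FEATURE id=\"ENST00000265708\">...</FEATURE>");
            out.close();

            // 2xx would mean the annotation was accepted.
            System.out.println(conn.getResponseCode());
        }
    }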
Here's the PhD dissertation describing REST, in full glory:
  http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm

Andrew Dalke
dalke@dalkescientific.com

--
Need usable, robust software for bioinformatics or chemical
informatics? Want to integrate your different tools so you can do more
science in less time? Contact us!
http://www.dalkescientific.com/

From ecerami@yahoo.com Sun Jan 19 17:07:37 2003
From: ecerami@yahoo.com (Ethan Cerami)
Date: Sun, 19 Jan 2003 09:07:37 -0800 (PST)
Subject: [DAS] Finding the ADAM2 Gene via Ensembl DAS
In-Reply-To: 
Message-ID: <20030119170737.73300.qmail@web41510.mail.yahoo.com>

Hi Everybody:

I recently read the Nature guide, "A User's Guide to the Human Genome"
(http://www.nature.com/genomics/), and the first exercise is to locate
a gene and its neighboring genes via NCBI, Ensembl and UCSC. I thought
it would be interesting to recreate this exercise using DAS directly,
but I am having some difficulty.

First some overview: if you click on this link:

http://www.ensembl.org/Homo_sapiens/contigview?highlight=&chr=8&vc_start=38800000&vc_end=39190000&x=0&y=0

in the detailed panel on the bottom, you will see two known genes,
ADAM18 and ADAM2.

I am trying to get this same gene data out of Ensembl via DAS. I tried
several Ensembl data sources, including: ensembl930, ens_ncbi30refseq
(Ensembl-mapped Human RefSeqs), and ens930cds (Ensembl CDS). I finally
tried ens_ncbi30trans (NCBI Transcripts). Here's the query I sent:

http://servlet.sanger.ac.uk:8080/das/ens_ncbi30trans/features?segment=8:38800000,39190000

In the response, I got back 14 features, all named ADAM2, but each one
is located at a different location.

So, my questions:

1. Am I using the right Ensembl data source?
2. Why do I get back 14 ADAM2 genes, instead of just one?
3. Why don't I get back the ADAM18 gene?

Many thanks!
Ethan

From thomas@derkholm.net Sun Jan 19 17:59:35 2003
From: thomas@derkholm.net (Thomas Down)
Date: Sun, 19 Jan 2003 17:59:35 +0000
Subject: [DAS] Finding the ADAM2 Gene via Ensembl DAS
In-Reply-To: <20030119170737.73300.qmail@web41510.mail.yahoo.com>
References: <20030119170737.73300.qmail@web41510.mail.yahoo.com>
Message-ID: <20030119175935.GA23438@firechild.derkholm.net>

Once upon a time, on a computer far far away, Ethan Cerami wrote:
>
> First some overview: if you click on this link:
> http://www.ensembl.org/Homo_sapiens/contigview?highlight=&chr=8&vc_start=38800000&vc_end=39190000&x=0&y=0
> in the detailed panel on the bottom, you will see two known genes,
> ADAM18 and ADAM2.
>
> I am trying to get this same gene data out of Ensembl via DAS. I
> tried several Ensembl data sources, including: ensembl930,
> ens_ncbi30refseq (Ensembl-mapped Human RefSeqs), ens930cds (Ensembl
> CDS). I finally tried ens_ncbi30trans (NCBI Transcripts). Here's the
> query I sent:
>
> http://servlet.sanger.ac.uk:8080/das/ens_ncbi30trans/features?segment=8:38800000,39190000
>
> In the response, I got back 14 features, all named ADAM2, but each
> one is located at a different location.
>
> So, my questions:
>
> 1. Am I using the right Ensembl data source?

No, I don't believe that you are.
The source you're looking at is an NCBI genebuild, which I don't think
can be expected to be the same as Ensembl.

The core Ensembl data (including gene predictions) is on
/das/ensembl930/. But trying the query you show above on this
datasource isn't going to work... (see below).

> 2. Why do I get back 14 ADAM2 genes, instead of just one?

One for each exon. The DAS protocol doesn't have any way to return a
single FEATURE element with a non-contiguous location, so gene
structures really have to be returned as many individual FEATUREs
grouped together. I note that Ensembl actually predicts 13 exons for
ADAM2. 14 is close enough for me -- maybe NCBI managed to map a bit
more UTR in this case.

> 3. Why don't I get back the ADAM18 gene?

Don't know. I presume NCBI don't predict it (or, possibly, put it
somewhere else).

The big issue here is actually that DAS servers don't *have* to provide
you the annotation you want in chromosomal coordinates. It was
implemented in this way so that annotation could potentially survive
across assembly changes. The Ensembl DAS server actually chooses to
serve gene structures in either contig coordinates (if the whole gene
fits) or else supercontig coordinates (the forthcoming version actually
drops the supercontigs and just has clone, contig, and chromosomal
coordinates, so this will make life slightly easier).

Secondary issue: the Ensembl DAS server will call the gene structure
ENST00000265708, rather than ADAM2. This is because the DAS protocol
doesn't (to the best of my knowledge) support synonyms. The Ensembl
server uses ENST numbers as the primary ID, on the basis that these are
something consistent which every single prediction has.

If you actually want to see it directly from Ensembl, try:

http://servlet.sanger.ac.uk:8080/das/ensembl930/features?segment=NT_034911;type=exon

A better bet would be to use some dedicated DAS client code, such as
that included in the BioJava library, to access this data. This will
handle all the sequence assembly issues for you, so you can do:

    SequenceDB ensemblDAS = new DASSequenceDB(
        new URL("http://servlet.sanger.ac.uk:8080/das/ensembl930/")
    );
    Sequence chr = ensemblDAS.getSequence("8");
    FeatureHolder someFeatures = chr.filter(
        new FeatureFilter.OverlapsLocation(
            new RangeLocation(38800000, 39190000)
        )
    );

And get back what you expect.

Thomas.
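[To see what actually comes back from Thomas' snippet, the FeatureHolder
can be walked with the standard BioJava feature iterator. A minimal
continuation, assuming the BioJava 1.x Feature interface (it needs
java.util.Iterator and org.biojava.bio.seq.Feature on top of the
imports his snippet already requires); the printed layout is just an
illustration:]

    // Walk the features that overlapped the requested range.
    for (Iterator i = someFeatures.features(); i.hasNext(); ) {
        Feature f = (Feature) i.next();
        // Type and extent of each annotation, already projected into
        // chromosome 8 coordinates by the DASSequence machinery.
        System.out.println(f.getType() + "\t"
                + f.getLocation().getMin() + ".."
                + f.getLocation().getMax());
    }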
From gilmanb@genome.wi.mit.edu Sun Jan 19 18:24:34 2003
From: gilmanb@genome.wi.mit.edu (Brian Gilman)
Date: Sun, 19 Jan 2003 13:24:34 -0500
Subject: [DAS] Finding the ADAM2 Gene via Ensembl DAS
In-Reply-To: <20030119175935.GA23438@firechild.derkholm.net>
Message-ID: 

On 1/19/03 12:59 PM, "Thomas Down" wrote:

Hello everyone,

This brings up an interesting point: For organisms where we have a
fairly complete genome, is there ever a time that you don't want
features given to you in chromosome coordinates?? I can't remember the
last time I really wanted a read out of the golden path... My buddies
in assembly would kill me right now ;) For mouse-human comparisons I
don't care to have read coordinates or even contig coordinates. It is
very intuitive to ask for chromosomal (i.e. global) coordinates for
everything you're interested in...

Do people who use the Ensembl DAS server ever go after reads or
contigs?? Or do people just use Thomas' convenience methods??

Best,

-B

> Once upon a time, on a computer far far away, Ethan Cerami wrote:
> [...]
> The big issue here is actually that DAS servers don't *have* to
> provide you the annotation you want in chromosomal coordinates. It
> was implemented in this way so that annotation could potentially
> survive across assembly changes.
> [...]
> Thomas.
--
Brian Gilman
Group Leader Medical & Population Genetics Dept.
MIT/Whitehead Inst. Center for Genome Research
One Kendall Square, Bldg. 300 / Cambridge, MA 02139-1561 USA
phone +1 617 252 1069 / fax +1 617 252 1902

From ecerami@yahoo.com Sun Jan 19 19:26:46 2003
From: ecerami@yahoo.com (Ethan Cerami)
Date: Sun, 19 Jan 2003 11:26:46 -0800 (PST)
Subject: [DAS] Finding the ADAM2 Gene via Ensembl DAS
In-Reply-To: <20030119175935.GA23438@firechild.derkholm.net>
Message-ID: <20030119192646.57557.qmail@web41501.mail.yahoo.com>

Thomas,

Thanks (as always). So, in order to recreate the Nature example, I
first need to map the gene's chromosome location to its contig, then
request features for that contig. ADAM2 and ADAM18 are actually on
different contigs. So, I was eventually able to track down both genes.

Ethan

--- Thomas Down wrote:
> The big issue here is actually that DAS servers don't *have* to
> provide you the annotation you want in chromosomal coordinates.
> [...]
> If you actually want to see it directly from Ensembl, try:
>
> http://servlet.sanger.ac.uk:8080/das/ensembl930/features?segment=NT_034911;type=exon
> [...]
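[A sketch of the second step of the workflow Ethan describes, using
only raw HTTP. The contig NT_034911 comes from Thomas' reply above; the
contig carrying ADAM18 would be found the same way from the assembly
mapping (or left to BioJava's DASSequenceDB, which handles it
automatically).]

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    public class ContigFeatures {
        public static void main(String[] args) throws Exception {
            // Once the chromosomal region has been mapped to a contig,
            // ask the DAS server for that contig's exon features.
            URL query = new URL(
                "http://servlet.sanger.ac.uk:8080/das/ensembl930/features"
                + "?segment=NT_034911;type=exon");
            BufferedReader in = new BufferedReader(
                new InputStreamReader(query.openStream()));
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);  // raw DASGFF XML
            }
            in.close();
        }
    }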
From matthew_pocock@yahoo.co.uk Mon Jan 20 11:07:21 2003
From: matthew_pocock@yahoo.co.uk (Matthew Pocock)
Date: Mon, 20 Jan 2003 11:07:21 +0000
Subject: [DAS] Finding the ADAM2 Gene via Ensembl DAS
In-Reply-To: 
References: 
Message-ID: <3E2BD869.2070804@yahoo.co.uk>

Hi Brian,

I do when dumping out contigs for analysis. Of course, there are lots
of convenient ways to do this in BioJava without the features /living/
in contig space. As a data-consumer, I don't care what coordinates the
protocol uses as long as the toolkit I interact with can project them
to the one I want to use. As a data-publisher, I just want to publish
features in the easiest coordinate system. If I've annotated a contig,
I want to publish in those coordinates.

Matthew

Brian Gilman wrote:
> Do people who use the Ensembl DAS server ever go after reads or
> contigs?? Or do people just use Thomas' convenience methods??
>
> Best,
>
> -B

From lstein at cshl.org Sun Jan 26 18:31:27 2003
From: lstein at cshl.org (Lincoln Stein)
Date: Sun Jan 26 18:24:10 2003
Subject: [DAS] Finding the ADAM2 Gene via Ensembl DAS
In-Reply-To: <3E2BD869.2070804@yahoo.co.uk>
References: <3E2BD869.2070804@yahoo.co.uk>
Message-ID: <200301261831.27724.lstein@cshl.org>

To find all the ADAM genes in gbrowse, type ADAM* into the search box
and hit <enter>.

BTW, I'm having a mini DAS-hackathon this week at CSHL to fix up the
protocol so that named features can be retrieved in a more sensible
way. We've got gbrowse running on top of DAS, but would like the search
interface to work as well on top of DAS as it does on top of a GFF
database. It annoys me no end that searching for ADAM2 returns all the
exons as well as the genes.

Lincoln

On Monday 20 January 2003 06:07 am, Matthew Pocock wrote:
> Hi Brian,
>
> I do when dumping out contigs for analysis. Of course, there are lots
> of convenient ways to do this in BioJava without the features
> /living/ in contig space.
> [...]
--
========================================================================
Lincoln D. Stein                          Cold Spring Harbor Laboratory
lstein@cshl.org                                  Cold Spring Harbor, NY
1 Bungtown Road, Cold Spring Harbor, NY 11724
========================================================================