From andy.jenkinson at ebi.ac.uk Wed Feb 18 11:18:59 2009
From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson)
Date: Wed, 18 Feb 2009 16:18:59 +0000
Subject: [DAS] comments - hierarchical features
Message-ID: <499C34F3.8000801@ebi.ac.uk>

Hi all,

As you may know, a new revision of the DAS specification will be
published soon. One of the features to be added is improved support for
hierarchical features, and I'm looking for input regarding a detail of
how this will be done.

The plan is to replace the <GROUP> structure with something similar to
the DAS/2 approach: parent features have concise <PART> elements that
identify other (separate) child features. Child features have <PARENT>
elements to represent the reciprocal relationship. This means the
group data no longer needs to be duplicated when shared by several
features, and groups can themselves have start/endpoints:

... start, end, notes and other verbose content ...
... content ...
... content ...

Here, both contain references to each other representing the same
link. However, it would be possible to represent the relationship even
if only one feature links to the other:

...
...

Therefore the option exists to omit the <PARENT> element from the
specification entirely. Over the last couple of years we have seen DAS
sources become more and more dense, and browsers wishing to display
larger regions. As a result, there is significant pressure to minimise
the verbosity of the XML response (there are other changes to the
upcoming spec to help with this). Whilst DAS2's alternative content
negotiation feature sidesteps the issue, DAS does not yet have this
and in any case it is my belief that the fallback XML format should
still be fit for purpose.

The counter argument (i.e. the case for requiring both <PARENT> and
<PART> elements) is based around the rendering efficiency benefits of
streaming. If a client knows for sure that it has parsed all features
that are related to each other, it can render them while it waits for
the server to send the rest of the response.
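
The streaming argument above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not code from
any DAS client: it assumes each feature has already been parsed into an
id plus the ids named by its parent and part links, and the names
Feature, parents, parts and stream_groups are hypothetical. With
bi-directional links every feature names all of its direct relatives,
so a set of linked features is known to be complete as soon as no
referenced id is still outstanding.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List


@dataclass
class Feature:
    """One parsed feature (hypothetical shape, not a real DAS client type)."""
    id: str
    parents: List[str] = field(default_factory=list)  # ids named by parent links
    parts: List[str] = field(default_factory=list)    # ids named by part links


def stream_groups(features: Iterable[Feature]) -> Iterator[List[Feature]]:
    """Yield each set of linked features once every member has been parsed.

    Assumes bi-directional links: a feature lists all its direct relatives.
    """
    group_of: Dict[str, dict] = {}  # feature id (seen or awaited) -> open group
    for feat in features:
        linked = set(feat.parents) | set(feat.parts)
        merged = {"seen": {}, "pending": set()}
        # Fold in every open group that mentions this feature or its relatives.
        for fid in {feat.id} | linked:
            group = group_of.get(fid)
            if group is not None:
                merged["seen"].update(group["seen"])
                merged["pending"] |= group["pending"]
        merged["seen"][feat.id] = feat
        merged["pending"] |= linked
        merged["pending"].difference_update(merged["seen"])
        for fid in list(merged["seen"]) + list(merged["pending"]):
            group_of[fid] = merged
        if not merged["pending"]:
            # Nothing outstanding: the group is complete and can be rendered.
            for fid in merged["seen"]:
                del group_of[fid]
            yield list(merged["seen"].values())
```

A feature with no links is yielded immediately, and a parent is held
back only until its last part arrives. With uni-directional links the
same loop could not be trusted: a child that arrives before its parent
would look like a standalone feature and be rendered too early.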
A client could potentially use this to offer a significant usability
boost - a user's perception of the speed of an interface is greatly
influenced by how fast a display starts to render rather than the time
it takes to complete. But at the moment there are no DAS clients that
use this (it is not possible with the current spec, and some clients
such as Ensembl cannot due to the way the data is rendered). I am not
sure to what extent it would be used in future either; for example, it
could not be used where post-processing of the entire set of features
is necessary (e.g. binning).

So my question is: should the specification require bi-directional
references (<PARENT> and <PART>), or uni-directional (<PART> only)?
Whichever approach is taken, replacing the <GROUP> structure will
significantly reduce verbosity for groups with large numbers of child
features, but do we want to reduce this further by removing <PARENT>
elements at the cost of the potential for "streaming"?

Apologies for the long and technical post.
Andy

From andy.jenkinson at ebi.ac.uk Wed Feb 18 12:01:18 2009
From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson)
Date: Wed, 18 Feb 2009 17:01:18 +0000
Subject: [DAS] [proserver-users] how to not share private data but use proserver?
In-Reply-To:
References:
Message-ID: <499C3EDE.8000409@ebi.ac.uk>

Hi Catherine,

A great question and one that has surfaced a number of times recently.
There is no formal specification of how to do authentication within DAS,
but it has been discussed and will hopefully be addressed soon. I am
copying this to the DAS mailing list as I believe it's relevant.

In the meantime, ProServer does contain an immature authentication
framework that might be sufficient, depending on your situation. There
are two implementations: "ip" and "http". I hope you will bear with me
whilst I explain these:

The former allows you to define an IP range whitelist so you can
restrict access to certain machines, but there are two caveats. First,
you cannot filter the IPs of your users' machines because the IP is
unlikely to be forwarded by the DAS client (i.e. in your case Ensembl, I
believe?). Thus all you can do is block requests that are not from
Ensembl's webservers. So if somebody knows the URL of your DAS server,
they can visualise the data through Ensembl. The second caveat is that
IP addresses can be spoofed, so if a malicious party has the technical
knowledge (and knows the URL) they can pretend to be within the allowed
IP range.

The second method is vastly more robust, but would require a change to
Ensembl. It works by extracting a token from the DAS request (e.g. a
header or parameter) and forwarding it to a known third party server to
check if the request should be allowed or denied. This system is similar
to how OpenID works, but was designed for use by Ensembl (wherein the
token would be encrypted and the third party would be Ensembl itself).
The idea was that you would be able to control access for specific
users/groups via the Ensembl interface. Unfortunately it has yet to be
implemented in the Ensembl web code.

It has been suggested before to use simple HTTP user:password URL syntax
(UCSC use this for BED files). To cut a long story short, this *might*
work without needing to modify Ensembl, but despite appearances it's
actually less secure than using IP filtering.

Hope that's useful,
Andy

Catherine Leroy wrote:
> Hi,
>
> I have kind of a 'selfish' question.
>
> I would like to build my own internal Das Server so that my users
> (post-docs) can visualize their unpublished data.
>
> From what I understand and what we tested, if somebody has the url of a
> proserver server that is inside Sanger, then this somebody can have
> access to the data served by this server even from outside the Sanger.
> In my case, I really don't want that to happen.
>
> Is there a workaround for that?
>
> Thank you very much in advance,
> Cheers,
> Catherine
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> proserver-users mailing list
> proserver-users at sanger.ac.uk
> http://lists.sanger.ac.uk/mailman/listinfo/proserver-users

From david.nix at hci.utah.edu Wed Feb 18 12:35:56 2009
From: david.nix at hci.utah.edu (David Nix)
Date: Wed, 18 Feb 2009 10:35:56 -0700
Subject: [DAS] [proserver-users] how to not share private data but use proserver?
In-Reply-To: <499C3EDE.8000409@ebi.ac.uk>
Message-ID:

I've built an authentication method into the genoviz DAS/2 server. It
may be appropriate for your use. Installation instructions are at
http://bioserver.hci.utah.edu/BioInfo/index.php/Software:DAS2 .

We use this to grant public/private access to particular folders for
~20 different lab groups (we're a core facility). Once installed, modify
the restrictedDirectories.txt and users.txt files to define who can see
what. The test install comes with some restricted data directories.

This is DAS/2, not DAS, and only IGB has been modified to make the
appropriate handshake, although this could be built into other browsers.

-cheers, David

--
David Austin Nix, PhD | HCI Bioinformatics | Huntsman Cancer Institute |
2000 Circle of Hope | SLC, UT 84112 | Rm: 3165 | Vc: 801.587.4611 |
Fx: 801.585.6458 | david.nix at hci.utah.edu | http://bioserver.hci.utah.edu

> _______________________________________________
> DAS mailing list
> DAS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/das

From andy.jenkinson at ebi.ac.uk Thu Feb 19 08:31:23 2009
From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson)
Date: Thu, 19 Feb 2009 13:31:23 +0000
Subject: [DAS] comments - hierarchical features
In-Reply-To:
References: <499C34F3.8000801@ebi.ac.uk>
Message-ID: <499D5F2B.60109@ebi.ac.uk>

Hi Chris,

Thanks for the feedback. I suspect you may be right about the
optimisation - lots of other pieces have to fall into place for it to
work. Having subfeatures indicate their containers is indeed how GFF3
works (and is also how existing DAS works), but in most cases it's a
fair bit more verbose.

The other aspect I forgot to mention was the DAS-DAS2 transition. The
parent/part syntax is borrowed directly from DAS2 as I'm keen to avoid
more divergence when there remains a possibility of uniting them. If we
don't keep both elements, this isn't so important though.

Speaking personally, I'm not too worried about a lack of obviousness of
the relationship for using parent/part as I believe it's reasonably
obvious from the XML, but then again I already know what to expect.
So I would certainly value your perspective if you think it is
significantly confusing.

Cheers,
Andy

Chris Mungall wrote:
>
> I suggest you name relations such that the inverses and directionality
> are obvious
>
> part_of / has_part
> parent_of / child_of
> has_parent / has_child
>
> But not
>
> part / parent
>
> The argument for specifying both seems like premature optimization. I
> suggest you align what you're doing with GFF3 as far as possible and
> have subfeatures indicate their containing features.

From lincoln.stein at gmail.com Thu Feb 19 12:05:28 2009
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Thu, 19 Feb 2009 12:05:28 -0500
Subject: [DAS] comments - hierarchical features
In-Reply-To: <499D5F2B.60109@ebi.ac.uk>
References: <499C34F3.8000801@ebi.ac.uk> <499D5F2B.60109@ebi.ac.uk>
Message-ID: <6dce9a0b0902190905t4ce2b44eyc412d2b255e68bb9@mail.gmail.com>

I am in favor of the bidirectional links, even though they are
significantly more verbose. I find it convenient to extract a subfeature
from the datastream and not lose the hierarchy; it is also a handy way
to identify features that are not part of a larger hierarchy.

Although it adds some computational overhead, DAS does compress very
nicely with standard LZH algorithms, and so stream overhead is not as
bad as it seems.

Lincoln

--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa

From jeedward at yahoo.com Fri Feb 20 10:28:08 2009
From: jeedward at yahoo.com (John Edward)
Date: Fri, 20 Feb 2009 07:28:08 -0800 (PST)
Subject: [DAS] Draft paper submission deadline extended: BCBGC-09
Message-ID: <420051.25773.qm@web45909.mail.sp1.yahoo.com>

Draft paper submission deadline extended: BCBGC-09

The deadline for draft paper submission at the 2009 International
Conference on Bioinformatics, Computational Biology, Genomics and
Chemoinformatics (BCBGC-09) (website: http://www.PromoteResearch.org )
is extended due to numerous requests from the authors. The conference
will be held during July 13-16 2009 in Orlando, FL, USA. We invite
draft paper submissions. The conference will take place at the same
time and venue where several other international conferences are taking
place. The other conferences include:

  International Conference on Artificial Intelligence and Pattern Recognition (AIPR-09)
  International Conference on Automation, Robotics and Control Systems (ARCS-09)
  International Conference on Enterprise Information Systems and Web Technologies (EISWT-09)
  International Conference on High Performance Computing, Networking and Communication Systems (HPCNCS-09)
  International Conference on Information Security and Privacy (ISP-09)
  International Conference on Recent Advances in Information Technology and Applications (RAITA-09)
  International Conference on Software Engineering Theory and Practice (SETP-09)
  International Conference on Theory and Applications of Computational Science (TACS-09)
  International Conference on Theoretical and Mathematical Foundations of Computer Science (TMFCS-09)

The website http://www.PromoteResearch.org contains more details.

Sincerely
John Edward
Publicity committee

From gregghelt at gmail.com Mon Feb 23 08:29:56 2009
From: gregghelt at gmail.com (Gregg Helt)
Date: Mon, 23 Feb 2009 08:29:56 -0500
Subject: [DAS] comments - hierarchical features
In-Reply-To: <6dce9a0b0902190905t4ce2b44eyc412d2b255e68bb9@mail.gmail.com>
References: <499C34F3.8000801@ebi.ac.uk> <499D5F2B.60109@ebi.ac.uk> <6dce9a0b0902190905t4ce2b44eyc412d2b255e68bb9@mail.gmail.com>
Message-ID: <50158cb00902230529p65141bc3p55502b19e4583409@mail.gmail.com>

I also prefer bidirectional links, both for parsing optimizations and
for consistency with DAS/2. As far as naming of the elements, "parent"
and "part" were chosen for DAS/2 after some discussion, but I don't
think there were any major justifications for those over a different
name pair.

Some may be wondering why not avoid link elements altogether and just
represent feature hierarchies by allowing nested feature elements? So
the A1 parent, B1 & B2 part relationship would look like:

...
...
...

We considered this during development of DAS/2, but the main use case
that argues against it is when multiple parents share children. For
example if multiple transcripts share exons -- this is how alternative
splicing is modeled in many GMOD databases.

Gregg

From andy.jenkinson at ebi.ac.uk Mon Feb 23 08:34:02 2009
From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson)
Date: Mon, 23 Feb 2009 13:34:02 +0000
Subject: [DAS] comments - hierarchical features
In-Reply-To: <50158cb00902230529p65141bc3p55502b19e4583409@mail.gmail.com>
References: <499C34F3.8000801@ebi.ac.uk> <499D5F2B.60109@ebi.ac.uk> <6dce9a0b0902190905t4ce2b44eyc412d2b255e68bb9@mail.gmail.com> <50158cb00902230529p65141bc3p55502b19e4583409@mail.gmail.com>
Message-ID: <49A2A5CA.9050100@ebi.ac.uk>

Gregg Helt wrote:
> Some may be wondering why not avoid link elements altogether and just
> represent feature hierarchies by allowing nested feature elements? So
> the A1 parent, B1 & B2 part relationship would look like:
>
> ...
> ...
> ...
>
> We considered this during development of DAS/2, but the main use case
> that argues against it is when multiple parents share children. For
> example if multiple transcripts share exons -- this is how alternative
> splicing is modeled in many GMOD databases.

I think this model is quite common for other databases too.

From christoph.gille at charite.de Mon Feb 23 18:10:01 2009
From: christoph.gille at charite.de (Dr. Christoph Gille)
Date: Tue, 24 Feb 2009 00:10:01 +0100 (CET)
Subject: [DAS] example program output
In-Reply-To:
References:
Message-ID:

This is my first post here.
I embedded DAS some time ago.

Here is an output example of the program:

http://3d-alignment.eu/exampleOutput.html

Please send suggestions and recommendations

Christoph

From lincoln.stein at gmail.com Tue Feb 24 08:04:14 2009
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Tue, 24 Feb 2009 08:04:14 -0500
Subject: [DAS] example program output
In-Reply-To:
References:
Message-ID: <6dce9a0b0902240504s5cfd8597q1fbee0df03b6c8a3@mail.gmail.com>

It's very nice! Is this program something that you're considering
making public?

Lincoln

--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa

From andy.jenkinson at ebi.ac.uk Tue Feb 24 12:32:09 2009
From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson)
Date: Tue, 24 Feb 2009 17:32:09 +0000
Subject: [DAS] [Fwd: [Bosc-announce] BOSC 2009 Call for Abstracts]
Message-ID: <49A42F19.6090305@ebi.ac.uk>

The following may be of interest to others on the list. BOSC is
organised by the OBF, with which the BioDAS project is affiliated, and
has been a good forum for DAS discussion.

Cheers,
Andy

-------- Original Message --------
Subject: [Bosc-announce] BOSC 2009 Call for Abstracts
Date: Tue, 24 Feb 2009 08:55:08 -0800
From: Kam Dahlquist
To: kdahlquist at lmu.edu

Please forward as appropriate and forgive multiple postings.
Call for Abstracts for the 2009 Bioinformatics Open Source Conference (BOSC) 2009 An ISMB 2009 Special Interest Group (SIG) Date: June 27-28, 2009 Location: Stockholm, Sweden URL: http://www.open-bio.org/wiki/BOSC_2009 Abstract submission via EasyChair: https://www.easychair.org/login.cgi?conf=bosc2009 Important Dates Monday, April 13: Abstract deadline May 1, 2009: Notification of accepted abstracts May 15, 2009: Early Registration Discount Cut-off date June 27-28, 2009: BOSC 2009 The Bioinformatics Open Source Conference (BOSC) is sponsored by the Open Bioinformatics Foundation (O|B|F), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development within the biological research community. To be considered for acceptance, software systems representing the central topic in a presentation submitted to BOSC must be licensed with a recognized Open Source License, and be freely available for download in source code form. Many open source bioinformatics packages are widely used by the research community across many application areas and form a cornerstone in enabling research in the genomic and post-genomic era. Open source bioinformatics software has facilitated rapid innovation, dissemination, and wide adoption of new computational methods, reusable software components, and standards. One of the hallmarks of BOSC is the coming together of the open source developer community in one location to meet face-to-face. This creates synergy where participants can work together to create use cases, prototype working code, or run bootcamps for developers from other projects as short, informal, and hands-on tutorials in new software packages and emerging technologies. In short, BOSC is not just a conference for presentations of completed work, but is a dynamic meeting where collaborative work gets done and attendees can learn about new or on-going developments that they can directly apply to their own work. 
This year's conference will mark the 10th anniversary of BOSC. To celebrate the special occasion, the theme of this year's conference is "Looking Back and Looking Ahead: Open Source Solutions to Grand Challenges in Bioinformatics." We are inviting abstracts for two different types of talks: 1. Descriptions of a particular open source software implementation by a member of the development team, especially in the areas listed below. 2. Reviews of open source software that compares and contrasts different solutions to the same bioinformatics problem. In addition, we are asking all speakers to come prepared to lead an informal tutorial on their software during a Birds of a Feather/hackathon session. This year's topics include: Design Patterns in Bioinformatics; Regulatory Genomics; Multicore and GPGPU computing; Data & Analysis Management (shared session with DAM SIG); Computational Grids; Visualization. There will also be updates from O|B|F-sponsored projects, and as always, abstracts may be submitted for open source software that does not fit neatly into the above categories. Lightning Talks will also highlight very recent developments. To contact the organizing committee, e-mail bosc at open-bio.org. To sign up for BOSC-related announcements, subscribe to the Bosc-announce list (http://lists.open-bio.org/mailman/listinfo/bosc-announce). BOSC 2009 Organizing Committee Kam D. Dahlquist (Chair) Lonnie R. Welch (Co-chair) Hilmar Lapp Jens Lichtenberg Frank Drews Andrew Dalke Jim Procter Seán I. O'Donoghue Anton Nekrutenko Steffen Moeller _______________________________________________ Bosc-announce mailing list Bosc-announce at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bosc-announce From jw12 at sanger.ac.uk Wed Feb 25 08:22:58 2009 From: jw12 at sanger.ac.uk (Jonathan Warren) Date: Wed, 25 Feb 2009 13:22:58 +0000 Subject: [DAS] das stylesheets Message-ID: <9783C660-4A35-48D6-9E61-B85EF2E85AE2@sanger.ac.uk> Warning!
please note that if you are setting up a das server with a stylesheet, be sure to not have 0 as strange things happen in clients such as ensembl- remove these tags if you're not sure what to do with them! Thanks Jonathan See issue below posted to ensembl dev list: Abel Ureta-Vidal wrote: > > Hi, > > This is maybe more for the DAS/Web team. I'm interested in > structural variation data (CNV, chromosome gain/loss, etc....) > I've identified some data in Ensembl that is integrated via DAS > sources. I've identified the original DAS sources and > query them directly, no problem, they work as expected. > > I'd just like to report a couple of things on the Ensembl web > display side of things. > > 1/The live ensembl server points to 2 DAS sources for CNVs from > Redon et al paper (Nature 2006 I presume), > i.e. WGTP_loci_36 and WGTP_regions_36. > I could not find a clear description of the exact difference between > the 2, (what does WGTP stands for? Whole Genome > Tiling Platform?). Querying them directly and looking at their data, > it seems that these are 2 identical data sets > > $ GET http://das.sanger.ac.uk/das/WGTP_loci_36/features?segment=5:17198060,18198059 > > WGTP_loci_36 > $ GET http://das.sanger.ac.uk/das/WGTP_regions_36/features?segment=5:17198060,18198059 > > WGTP_regions_36 > $ diff WGTP_loci_36 WGTP_regions_36 > 5c5 > < > --- > > > > If I query the complete data from chromosome 5, I get exactly the > same **none** difference. These 2 sources > seem to be identical. Only WGTP_regions_36 is listed and validate in > the DAS registry (www.dasregistry.org). Should > this one only be considered as the reference? > > On the short region I queried 5:17198060,18198059, 5 features come > back. But I can only see 4 in the Ensembl Detailed View > FEATURE id="Chr5tp-4G10:Gain=32|loss=39" label="Chr5tp-4G10:Gain=32| > loss=39" > START 17417700 > END 17583657 > seems to be missing. 
> All features have different ids so this is not a feature grouping > artifact I presume. > > 2/ Same kind of issue with decipher and decipher_collapsed_36 > sources. Only decipher is in the DAS registry. > > $ GET http://das.sanger.ac.uk/das/decipher/features?segment=5:16621834,16821833 > > decipher > $ GET http://das.sanger.ac.uk/das/decipher_collapsed_36/features?segment=5:16621834,16821833 > > decipher_collapsed > $ diff decipher decipher_collapsed > 5c5 > < > --- > > > > Here 4 features are returned but only 1 displayed. > > So am I missing something? or these sources are data duplications. > And is there a display problem? This is an identified issue - the DAS sources in both cases say DO NOT BUMP - so we don't hence we cannot display all the features. We have solved this for some sources by getting them to remove the "0" entries in the stylesheets. -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From andy.jenkinson at ebi.ac.uk Wed Feb 25 10:47:17 2009 From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson) Date: Wed, 25 Feb 2009 15:47:17 +0000 Subject: [DAS] das stylesheets In-Reply-To: <9783C660-4A35-48D6-9E61-B85EF2E85AE2@sanger.ac.uk> References: <9783C660-4A35-48D6-9E61-B85EF2E85AE2@sanger.ac.uk> Message-ID: <49A56805.4020506@ebi.ac.uk> I think the prevalence of this issue may come from the fact that ProServer's demo stylesheet has this instruction in it. I have updated the subversion repository to remove it. Jonathan Warren wrote: > Warning! > > please note that if you are setting up a das server with a stylesheet, > be sure to not have 0 as strange things happen in clients > such as ensembl- remove these tags if you're not sure what to do with them! 
> > Thanks > > Jonathan > > See issue below posted to ensembl dev list: > > Abel Ureta-Vidal wrote: >> >> Hi, >> >> This is maybe more for the DAS/Web team. I'm interested in structural >> variation data (CNV, chromosome gain/loss, etc....) >> I've identified some data in Ensembl that is integrated via DAS >> sources. I've identified the original DAS sources and >> query them directly, no problem, they work as expected. >> >> I'd just like to report a couple of things on the Ensembl web display >> side of things. >> >> 1/The live ensembl server points to 2 DAS sources for CNVs from Redon >> et al paper (Nature 2006 I presume), >> i.e. WGTP_loci_36 and WGTP_regions_36. >> I could not find a clear description of the exact difference between >> the 2, (what does WGTP stands for? Whole Genome >> Tiling Platform?). Querying them directly and looking at their data, >> it seems that these are 2 identical data sets >> >> $ GET >> http://das.sanger.ac.uk/das/WGTP_loci_36/features?segment=5:17198060,18198059 > >> WGTP_loci_36 >> $ GET >> http://das.sanger.ac.uk/das/WGTP_regions_36/features?segment=5:17198060,18198059 > >> WGTP_regions_36 >> $ diff WGTP_loci_36 WGTP_regions_36 >> 5c5 >> < > href="http://das.sanger.ac.uk:80/das/WGTP_loci_36/features"> >> --- >> > > href="http://das.sanger.ac.uk:80/das/WGTP_regions_36/features"> >> >> If I query the complete data from chromosome 5, I get exactly the same >> **none** difference. These 2 sources >> seem to be identical. Only WGTP_regions_36 is listed and validate in >> the DAS registry (www.dasregistry.org). Should >> this one only be considered as the reference? >> >> On the short region I queried 5:17198060,18198059, 5 features come >> back. But I can only see 4 in the Ensembl Detailed View >> FEATURE id="Chr5tp-4G10:Gain=32|loss=39" >> label="Chr5tp-4G10:Gain=32|loss=39" >> START 17417700 >> END 17583657 >> seems to be missing. >> All features have different ids so this is not a feature grouping >> artifact I presume. 
>> >> 2/ Same kind of issue with decipher and decipher_collapsed_36 sources. >> Only decipher is in the DAS registry. >> >> $ GET >> http://das.sanger.ac.uk/das/decipher/features?segment=5:16621834,16821833 > >> decipher >> $ GET >> http://das.sanger.ac.uk/das/decipher_collapsed_36/features?segment=5:16621834,16821833 > >> decipher_collapsed >> $ diff decipher decipher_collapsed >> 5c5 >> < > href="http://das.sanger.ac.uk:80/das/decipher/features"> >> --- >> > > href="http://das.sanger.ac.uk:80/das/decipher_collapsed_36/features"> >> >> Here 4 features are returned but only 1 displayed. >> >> So am I missing something? or these sources are data duplications. And >> is there a display problem? > > This is an identified issue - the DAS sources in both cases say DO NOT > BUMP - so we don't hence we cannot display all the features. We have > solved this for some sources by getting them to remove the > "0" entries in the stylesheets. > > From andy.jenkinson at ebi.ac.uk Wed Feb 18 16:18:59 2009 From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson) Date: Wed, 18 Feb 2009 16:18:59 +0000 Subject: [DAS] comments - hierarchical features Message-ID: <499C34F3.8000801@ebi.ac.uk> Hi all, As you may know, soon a new revision of the DAS specification will be published. One of the features to be added is improved support for hierarchical features, and I'm looking for input regarding a detail of how this will be done. The plan is to replace the structure with something similar to the DAS/2 approach: parent features have concise elements that identify other (separate) child features. Child features have elements to represent the reciprocal relationship. This means the group data no longer needs to be duplicated when shared by several features, and groups can themselves have start/endpoints: ... start, end, notes and other verbose content ... ... content ... ... content ... Here, both contain references to each other representing the same link. 
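A minimal sketch of what such reciprocal links could look like, and how a client might check that both directions describe the same relationships. The uppercase PARENT and PART element names and the id attribute are assumptions made for illustration (the thread only gives the names "parent" and "part"); the feature ids A1, B1 and B2 follow the example used later in the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical response fragment. Element and attribute spellings are
# assumed for illustration and are not taken from a published spec.
EXAMPLE_XML = """
<FEATURES>
  <FEATURE id="A1"><PART id="B1"/><PART id="B2"/></FEATURE>
  <FEATURE id="B1"><PARENT id="A1"/></FEATURE>
  <FEATURE id="B2"><PARENT id="A1"/></FEATURE>
</FEATURES>
"""


def collect_links(xml_text):
    """Return two sets of (parent_id, child_id) pairs.

    The first set is built from PART elements found on parents, the
    second from PARENT elements found on children.
    """
    root = ET.fromstring(xml_text)
    from_parts = set()
    from_parents = set()
    for feature in root.iter("FEATURE"):
        feature_id = feature.get("id")
        for part in feature.findall("PART"):
            from_parts.add((feature_id, part.get("id")))
        for parent in feature.findall("PARENT"):
            from_parents.add((parent.get("id"), feature_id))
    return from_parts, from_parents


from_parts, from_parents = collect_links(EXAMPLE_XML)
# With bi-directional references both sets carry the same information.
print(from_parts == from_parents)
```

Either set alone is enough to rebuild the hierarchy once the whole response has been read, which is why one of the two elements could in principle be dropped.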
However, it would be possible to represent the relationship even if only one feature links to the other: ... ... Therefore the option exists to omit the element from the specification entirely. Over the last couple of years we have seen DAS sources become more and more dense, and browsers wishing to display larger regions. As a result, there is significant pressure to minimise the verbosity of the XML response (there are other changes to the upcoming spec to help with this). Whilst DAS2's alternative content negotiation feature sidesteps the issue, DAS does not yet have this and in any case it is my belief that the fallback XML format should still be fit for purpose. The counter argument (i.e. the case for requiring both and elements) is based around the rendering efficiency benefits of streaming. If a client knows for sure that it has parsed all features that are related to each other, it can render them while it waits for the server to send the rest of the response. A client could potentially use this to offer a significant usability boost - a user's perception of the speed of an interface is greatly influenced by how fast a display starts to render rather than the time it takes to complete. But at the moment there are no DAS clients that use this (it is not possible with the current spec, and some clients such as Ensembl cannot due to the way the data is rendered). I am not sure to what extent it would be used in future either, for example it could not be used where post-processing of the entire set of features is necessary (e.g. binning). So my question is: should the specification require bi-directional references ( and ), or uni-directional ( only)? Whichever approach is taken, replacing the structure will significantly reduce verbosity for groups with large numbers of child features, but do we want to reduce this further by removing elements at the cost of the potential for "streaming"> Apologies for the long and technical post. 
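The streaming argument can be sketched as follows. This is only an illustration under stated assumptions, not how any existing DAS client works: features are modelled as hypothetical (id, parent_ids, part_ids) tuples in arrival order, and a parent together with its direct parts is treated as one renderable group.

```python
def stream_render(features):
    """Yield lists of feature ids as soon as each can safely be drawn.

    `features` is an iterable of (feature_id, parent_ids, part_ids)
    tuples in the order the server sends them (a hypothetical model).
    """
    seen = set()
    pending = {}  # parent id -> set of part ids that parent declared
    for feature_id, parent_ids, part_ids in features:
        seen.add(feature_id)
        if not parent_ids and not part_ids:
            # No links in either direction: known to be standalone.
            yield [feature_id]
            continue
        if part_ids:
            pending[feature_id] = set(part_ids)
        # Groups that might have just become complete: this feature's
        # own group (if it is a parent) and the groups of its parents.
        candidates = ([feature_id] if part_ids else []) + list(parent_ids)
        for group_id in candidates:
            if group_id in pending and pending[group_id] <= seen:
                yield [group_id] + sorted(pending.pop(group_id))


# B1 arrives before its parent A1; its parent reference tells the
# client not to draw it on its own.
stream = [
    ("B1", ["A1"], []),
    ("C1", [], []),
    ("A1", [], ["B1", "B2"]),
    ("B2", ["A1"], []),
]
rendered = list(stream_render(stream))
print(rendered)
```

If children carried no parent reference, B1 could not be told apart from a standalone feature such as C1 until the whole response had arrived, which is the cost of uni-directional links described above.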
Andy From andy.jenkinson at ebi.ac.uk Wed Feb 18 17:01:18 2009 From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson) Date: Wed, 18 Feb 2009 17:01:18 +0000 Subject: [DAS] [proserver-users] how to not share private data but use proserver? In-Reply-To: References: Message-ID: <499C3EDE.8000409@ebi.ac.uk> Hi Catherine, A great question and one that has surfaced a number of times recently. There is no formal specification of how to do authentication within DAS, but it has been discussed and will hopefully be addressed soon. I am copying this to the DAS mailing list as I believe it's relevant. In the meantime, ProServer does contain an immature authentication framework that might be sufficient, depending on your situation. There are two implementations: "ip" and "http". I hope you will bear with me whilst I explain these: The former allows you to define an IP range whitelist so you can restrict access to certain machines, but there are two caveats: first, you cannot filter the IPs of your users' machines because the IP is unlikely to be forwarded by the DAS client (i.e. in your case Ensembl, I believe?). Thus all you can do is block requests that are not from Ensembl's webservers. So if somebody knows the URL of your DAS server, they can visualise the data through Ensembl. The second caveat is that IP addresses can be spoofed, so if a malicious party has the technical knowledge (and knows the URL) they can pretend to be within the allowed IP range. The second method is vastly more robust, but would require a change to Ensembl. It works by extracting a token from the DAS request (e.g. a header or parameter) and forwarding it to a known third party server to check if the request should be allowed or denied. This system is similar to how OpenID works, but was designed for use by Ensembl (wherein the token would be encrypted and the third party would be Ensembl itself). The idea was that you would be able to control access for specific users/groups via the Ensembl interface. 
Unfortunately it has yet to be implemented in the Ensembl web code. It has been suggested before to use simple HTTP user:password URL syntax (UCSC use this for BED files). To cut a long story short, this *might* work without needing to modify Ensembl, but despite appearances it's actually less secure than using IP filtering. Hope that's useful, Andy Catherine Leroy wrote: > Hi, > > I have kind of a 'sellfish' question. > > I would like to build my own internal Das Server so that my users > (post-docs) can visualize their unpublished data. > > From what I understand and what we tested, if somebody has the url of a > proserver server that is inside Sanger, then this somebody can have > access to the data served by this server even from outside the Sanger. > In my case, I really don't want that to happen. > > Is there a work around that? > > Thank you very much in advance, > Cheers, > Catherine > > > ------------------------------------------------------------------------ > > _______________________________________________ > proserver-users mailing list > proserver-users at sanger.ac.uk > http://lists.sanger.ac.uk/mailman/listinfo/proserver-users From david.nix at hci.utah.edu Wed Feb 18 17:35:56 2009 From: david.nix at hci.utah.edu (David Nix) Date: Wed, 18 Feb 2009 10:35:56 -0700 Subject: [DAS] [proserver-users] how to not share private data but use proserver? In-Reply-To: <499C3EDE.8000409@ebi.ac.uk> Message-ID: I've built an authentication method into the genoviz DAS/2 server. It may be appropriate for your use. Installation instructions are at http://bioserver.hci.utah.edu/BioInfo/index.php/Software:DAS2 . We use this to grant public/ private access to particular folders for ~20 different lab groups (we're a core facility). Once installed modify the restrictedDirectories.txt and users.txt files to define who can see what. The test install comes with some restricted data directories. 
This is DAS/2 not DAS and only IGB has been modified to make the appropriate handshake. Although this could be built into other browsers. -cheers, David -- David Austin Nix, PhD | HCI Bioinformatics | Huntsman Cancer Institute | 2000 Circle of Hope | SLC, UT 84112 | Rm: 3165 | Vc: 801.587.4611 | Fx: 801.585.6458 | david.nix at hci.utah.edu | http://bioserver.hci.utah.edu -cheers, David On 2/18/09 10:01 AM, "Andy Jenkinson" wrote: > Hi Catherine, > > A great question and one that has surfaced a number of times recently. > There is no formal specification of how to do authentication within DAS, > but it has been discussed and will hopefully be addressed soon. I am > copying this to the DAS mailing list as I believe it's relevant. > > In the meantime, ProServer does contain an immature authentication > framework that might be sufficient, depending on your situation. There > are two implementations: "ip" and "http". I hope you will bear with me > whilst I explain these: > > The former allows you to define an IP range whitelist so you can > restrict access to certain machines, but there are two caveats: first, > you cannot filter the IPs of your users' machines because the IP is > unlikely to be forwarded by the DAS client (i.e. in your case Ensembl, I > believe?). Thus all you can do is block requests that are not from > Ensembl's webservers. So if somebody knows the URL of your DAS server, > they can visualise the data through Ensembl. The second caveat is that > IP addresses can be spoofed, so if a malicious party has the technical > knowledge (and knows the URL) they can pretend to be within the allowed > IP range. > > The second method is vastly more robust, but would require a change to > Ensembl. It works by extracting a token from the DAS request (e.g. a > header or parameter) and forwarding it to a known third party server to > check if the request should be allowed or denied. 
This system is similar > to how OpenID works, but was designed for use by Ensembl (wherein the > token would be encrypted and the third party would be Ensembl itself). > The idea was that you would be able to control access for specific > users/groups via the Ensembl interface. Unfortunately it has yet to be > implemented in the Ensembl web code. > > It has been suggested before to use simple HTTP user:password URL syntax > (UCSC use this for BED files). To cut a long story short, this *might* > work without needing to modify Ensembl, but despite appearances it's > actually less secure than using IP filtering. > > Hope that's useful, > Andy > > Catherine Leroy wrote: >> Hi, >> >> I have kind of a 'sellfish' question. >> >> I would like to build my own internal Das Server so that my users >> (post-docs) can visualize their unpublished data. >> >> From what I understand and what we tested, if somebody has the url of a >> proserver server that is inside Sanger, then this somebody can have >> access to the data served by this server even from outside the Sanger. >> In my case, I really don't want that to happen. >> >> Is there a work around that? >> >> Thank you very much in advance, >> Cheers, >> Catherine >> >> >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> proserver-users mailing list >> proserver-users at sanger.ac.uk >> http://lists.sanger.ac.uk/mailman/listinfo/proserver-users > _______________________________________________ > DAS mailing list > DAS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das From andy.jenkinson at ebi.ac.uk Thu Feb 19 13:31:23 2009 From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson) Date: Thu, 19 Feb 2009 13:31:23 +0000 Subject: [DAS] comments - hierarchical features In-Reply-To: References: <499C34F3.8000801@ebi.ac.uk> Message-ID: <499D5F2B.60109@ebi.ac.uk> Hi Chris, Thanks for the feedback. 
I suspect you may be right about the optimisation - lots of other pieces have to fall into place for it to work. Having subfeatures indicate their containers is indeed how GFF3 works (and is also how existing DAS works), but in most cases it's a fair bit more verbose. The other aspect I forgot to mention was the DAS-DAS2 transition. The parent/part syntax is borrowed directly from DAS2 as I'm keen to avoid more divergence when there remains a possibility of uniting them. If we don't keep both elements, this isn't so important though. Speaking personally, I'm not too worried about a lack of obviousness of the relationship for using parent/part as I believe it's reasonably obvious from the XML, but then again I already know what to expect. So I certainly value your perspective if you think it is significantly confusing? Cheers, Andy Chris Mungall wrote: > > I suggest you name relations such that the inverses and directionality > are obvious > > part_of / has_part > parent_of / child_of > has_parent / has_child > > But not > > part / parent > > The argument for specifying both seems like premature optimization. I > suggest you align what you're doing with GFF3 as far as possible and > have subfeatures indicate their containing features. > > On Feb 18, 2009, at 8:18 AM, Andy Jenkinson wrote: > >> Hi all, >> >> As you may know, soon a new revision of the DAS specification will be >> published. One of the features to be added is improved support for >> hierarchical features, and I'm looking for input regarding a detail of >> how this will be done. >> >> The plan is to replace the structure with something similar to >> the DAS/2 approach: parent features have concise elements that >> identify other (separate) child features. Child features have >> elements to represent the reciprocal relationship. This means the >> group data no longer needs to be duplicated when shared by several >> features, and groups can themselves have start/endpoints: >> >> >> >> >> ... 
start, end, notes and other verbose content ... >> >> >> >> ... content ... >> >> >> >> ... content ... >> >> >> Here, both contain references to each other representing the same >> link. However, it would be possible to represent the relationship even >> if only one feature links to the other: >> >> >> >> ... >> >> >> ... >> >> >> Therefore the option exists to omit the element from the >> specification entirely. Over the last couple of years we have seen DAS >> sources become more and more dense, and browsers wishing to display >> larger regions. As a result, there is significant pressure to minimise >> the verbosity of the XML response (there are other changes to the >> upcoming spec to help with this). Whilst DAS2's alternative content >> negotiation feature sidesteps the issue, DAS does not yet have this >> and in any case it is my belief that the fallback XML format should >> still be fit for purpose. >> >> The counter argument (i.e. the case for requiring both and >> elements) is based around the rendering efficiency benefits of >> streaming. If a client knows for sure that it has parsed all features >> that are related to each other, it can render them while it waits for >> the server to send the rest of the response. A client could >> potentially use this to offer a significant usability boost - a user's >> perception of the speed of an interface is greatly influenced by how >> fast a display starts to render rather than the time it takes to >> complete. But at the moment there are no DAS clients that use this (it >> is not possible with the current spec, and some clients such as >> Ensembl cannot due to the way the data is rendered). I am not sure to >> what extent it would be used in future either, for example it could >> not be used where post-processing of the entire set of features is >> necessary (e.g. binning). >> >> So my question is: should the specification require bi-directional >> references ( and ), or uni-directional ( only)? 
>> Whichever approach is taken, replacing the structure will >> significantly reduce verbosity for groups with large numbers of child >> features, but do we want to reduce this further by removing >> elements at the cost of the potential for "streaming"> >> >> Apologies for the long and technical post. >> Andy >> _______________________________________________ >> DAS mailing list >> DAS at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/das >> From lincoln.stein at gmail.com Thu Feb 19 17:05:28 2009 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Thu, 19 Feb 2009 12:05:28 -0500 Subject: [DAS] comments - hierarchical features In-Reply-To: <499D5F2B.60109@ebi.ac.uk> References: <499C34F3.8000801@ebi.ac.uk> <499D5F2B.60109@ebi.ac.uk> Message-ID: <6dce9a0b0902190905t4ce2b44eyc412d2b255e68bb9@mail.gmail.com> I am in favor of the bidirectional links, even though it is significantly more verbose. I find it convenient to extract a subfeature from the datastream and not lose the hierarchy; it is also a handy way to identify features that are not part of a larger hierachy. Although it adds some computational overhead, DAS does compress very nicely with standard LZH algorithms, and so stream overhead is not as bad as it seems. Lincoln On Thu, Feb 19, 2009 at 8:31 AM, Andy Jenkinson wrote: > Hi Chris, > > Thanks for the feedback. I suspect you may be right about the optimisation > - lots of other pieces have to fall into place for it to work. Having > subfeatures indicate their containers is indeed how GFF3 works (and is also > how existing DAS works), but in most cases it's a fair bit more verbose. > > The other aspect I forgot to mention was the DAS-DAS2 transition. The > parent/part syntax is borrowed directly from DAS2 as I'm keen to avoid more > divergence when there remains a possibility of uniting them. If we don't > keep both elements, this isn't so important though. 
> > Speaking personally, I'm not too worried about a lack of obviousness of the > relationship for using parent/part as I believe it's reasonably obvious from > the XML, but then again I already know what to expect. So I certainly value > your perspective if you think it is significantly confusing? > > Cheers, > Andy > > Chris Mungall wrote: > >> >> I suggest you name relations such that the inverses and directionality are >> obvious >> >> part_of / has_part >> parent_of / child_of >> has_parent / has_child >> >> But not >> >> part / parent >> >> The argument for specifying both seems like premature optimization. I >> suggest you align what you're doing with GFF3 as far as possible and have >> subfeatures indicate their containing features. >> >> >> On Feb 18, 2009, at 8:18 AM, Andy Jenkinson wrote: >> >> Hi all, >>> >>> As you may know, soon a new revision of the DAS specification will be >>> published. One of the features to be added is improved support for >>> hierarchical features, and I'm looking for input regarding a detail of how >>> this will be done. >>> >>> The plan is to replace the structure with something similar to >>> the DAS/2 approach: parent features have concise elements that >>> identify other (separate) child features. Child features have >>> elements to represent the reciprocal relationship. This means the group data >>> no longer needs to be duplicated when shared by several features, and groups >>> can themselves have start/endpoints: >>> >>> >>> >>> >>> ... start, end, notes and other verbose content ... >>> >>> >>> >>> ... content ... >>> >>> >>> >>> ... content ... >>> >>> >>> Here, both contain references to each other representing the same link. >>> However, it would be possible to represent the relationship even if only one >>> feature links to the other: >>> >>> >>> >>> ... >>> >>> >>> ... >>> >>> >>> Therefore the option exists to omit the element from the >>> specification entirely. 
Over the last couple of years we have seen DAS >>> sources become more and more dense, and browsers wishing to display larger >>> regions. As a result, there is significant pressure to minimise the >>> verbosity of the XML response (there are other changes to the upcoming spec >>> to help with this). Whilst DAS2's alternative content negotiation feature >>> sidesteps the issue, DAS does not yet have this and in any case it is my >>> belief that the fallback XML format should still be fit for purpose. >>> >>> The counter argument (i.e. the case for requiring both and >>> elements) is based around the rendering efficiency benefits of >>> streaming. If a client knows for sure that it has parsed all features that >>> are related to each other, it can render them while it waits for the server >>> to send the rest of the response. A client could potentially use this to >>> offer a significant usability boost - a user's perception of the speed of an >>> interface is greatly influenced by how fast a display starts to render >>> rather than the time it takes to complete. But at the moment there are no >>> DAS clients that use this (it is not possible with the current spec, and >>> some clients such as Ensembl cannot due to the way the data is rendered). I >>> am not sure to what extent it would be used in future either, for example it >>> could not be used where post-processing of the entire set of features is >>> necessary (e.g. binning). >>> >>> So my question is: should the specification require bi-directional >>> references ( and ), or uni-directional ( only)? >>> Whichever approach is taken, replacing the structure will >>> significantly reduce verbosity for groups with large numbers of child >>> features, but do we want to reduce this further by removing >>> elements at the cost of the potential for "streaming"> >>> >>> Apologies for the long and technical post. 
>>> Andy >>> _______________________________________________ >>> DAS mailing list >>> DAS at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/das >>> >>> _______________________________________________ > DAS mailing list > DAS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das > -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From jeedward at yahoo.com Fri Feb 20 15:28:08 2009 From: jeedward at yahoo.com (John Edward) Date: Fri, 20 Feb 2009 07:28:08 -0800 (PST) Subject: [DAS] Draft paper submission deadline extended: BCBGC-09 Message-ID: <420051.25773.qm@web45909.mail.sp1.yahoo.com> Draft paper submission deadline extended: BCBGC-09 The deadline for draft paper submission at the 2009 International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics (BCBGC-09) (website: http://www.PromoteResearch.org ) is extended due to numerous requests from the authors. The conference will be held during July 13-16 2009 in Orlando, FL, USA. We invite draft paper submissions. The conference will take place at the same time and venue where several other international conferences are taking place. The other conferences include: * International Conference on Artificial Intelligence and Pattern Recognition (AIPR-09) * International Conference on Automation, Robotics and Control Systems (ARCS-09) * International Conference on Enterprise Information Systems and Web Technologies (EISWT-09) * International Conference on High Performance Computing, Networking and Communication Systems (HPCNCS-09) * International Conference on Information Security and Privacy (ISP-09) * International Conference on Recent Advances in Information Technology and Applications (RAITA-09) *
International Conference on Software Engineering Theory and Practice (SETP-09) ????????? International Conference on Theory and Applications of Computational Science (TACS-09) ????????? International Conference on Theoretical and Mathematical Foundations of Computer Science (TMFCS-09) ? The website http://www.PromoteResearch.org contains more details. ? Sincerely John Edward Publicity committee From gregghelt at gmail.com Mon Feb 23 13:29:56 2009 From: gregghelt at gmail.com (Gregg Helt) Date: Mon, 23 Feb 2009 08:29:56 -0500 Subject: [DAS] comments - hierarchical features In-Reply-To: <6dce9a0b0902190905t4ce2b44eyc412d2b255e68bb9@mail.gmail.com> References: <499C34F3.8000801@ebi.ac.uk> <499D5F2B.60109@ebi.ac.uk> <6dce9a0b0902190905t4ce2b44eyc412d2b255e68bb9@mail.gmail.com> Message-ID: <50158cb00902230529p65141bc3p55502b19e4583409@mail.gmail.com> I also prefer bidirectional links, both for parsing optimizations and for consistency with DAS/2. As far as naming of the elements, "parent" and "part" were chosen for DAS/2 after some discussion, but I don't there were any major justifications for those over a different name pair. Some may be wondering why not avoid link elements altogether and just represent feature hierarchies by allowing nested feature elements? So the A1 parent, B1 & B2 part relationship would look like: ... ... ... We considered this during development of DAS/2, but the main use case that argues against it is when multiple parents share children. For example if multiple transcripts share exons -- this is how alternative splicing is modeled in many GMOD databases. Gregg On Thu, Feb 19, 2009 at 12:05 PM, Lincoln Stein wrote: > I am in favor of the bidirectional links, even though it is significantly > more verbose. I find it convenient to extract a subfeature from the > datastream and not lose the hierarchy; it is also a handy way to identify > features that are not part of a larger hierachy. 
> > Although it adds some computational overhead, DAS does compress very nicely > with standard LZH algorithms, and so stream overhead is not as bad as it > seems. > > Lincoln > > On Thu, Feb 19, 2009 at 8:31 AM, Andy Jenkinson >wrote: > > > Hi Chris, > > > > Thanks for the feedback. I suspect you may be right about the > optimisation > > - lots of other pieces have to fall into place for it to work. Having > > subfeatures indicate their containers is indeed how GFF3 works (and is > also > > how existing DAS works), but in most cases it's a fair bit more verbose. > > > > The other aspect I forgot to mention was the DAS-DAS2 transition. The > > parent/part syntax is borrowed directly from DAS2 as I'm keen to avoid > more > > divergence when there remains a possibility of uniting them. If we don't > > keep both elements, this isn't so important though. > > > > Speaking personally, I'm not too worried about a lack of obviousness of > the > > relationship for using parent/part as I believe it's reasonably obvious > from > > the XML, but then again I already know what to expect. So I certainly > value > > your perspective if you think it is significantly confusing? > > > > Cheers, > > Andy > > > > Chris Mungall wrote: > > > >> > >> I suggest you name relations such that the inverses and directionality > are > >> obvious > >> > >> part_of / has_part > >> parent_of / child_of > >> has_parent / has_child > >> > >> But not > >> > >> part / parent > >> > >> The argument for specifying both seems like premature optimization. I > >> suggest you align what you're doing with GFF3 as far as possible and > have > >> subfeatures indicate their containing features. > >> > >> > >> On Feb 18, 2009, at 8:18 AM, Andy Jenkinson wrote: > >> > >> Hi all, > >>> > >>> As you may know, soon a new revision of the DAS specification will be > >>> published. 
One of the features to be added is improved support for > >>> hierarchical features, and I'm looking for input regarding a detail of > how > >>> this will be done. > >>> > >>> The plan is to replace the structure with something similar to > >>> the DAS/2 approach: parent features have concise elements that > >>> identify other (separate) child features. Child features have > >>> elements to represent the reciprocal relationship. This means the group > data > >>> no longer needs to be duplicated when shared by several features, and > groups > >>> can themselves have start/endpoints: > >>> > >>> > >>> > >>> > >>> ... start, end, notes and other verbose content ... > >>> > >>> > >>> > >>> ... content ... > >>> > >>> > >>> > >>> ... content ... > >>> > >>> > >>> Here, both contain references to each other representing the same link. > >>> However, it would be possible to represent the relationship even if > only one > >>> feature links to the other: > >>> > >>> > >>> > >>> ... > >>> > >>> > >>> ... > >>> > >>> > >>> Therefore the option exists to omit the element from the > >>> specification entirely. Over the last couple of years we have seen DAS > >>> sources become more and more dense, and browsers wishing to display > larger > >>> regions. As a result, there is significant pressure to minimise the > >>> verbosity of the XML response (there are other changes to the upcoming > spec > >>> to help with this). Whilst DAS2's alternative content negotiation > feature > >>> sidesteps the issue, DAS does not yet have this and in any case it is > my > >>> belief that the fallback XML format should still be fit for purpose. > >>> > >>> The counter argument (i.e. the case for requiring both and > >>> elements) is based around the rendering efficiency benefits of > >>> streaming. If a client knows for sure that it has parsed all features > that > >>> are related to each other, it can render them while it waits for the > server > >>> to send the rest of the response. 
A client could potentially use this > to > >>> offer a significant usability boost - a user's perception of the speed > of an > >>> interface is greatly influenced by how fast a display starts to render > >>> rather than the time it takes to complete. But at the moment there are > no > >>> DAS clients that use this (it is not possible with the current spec, > and > >>> some clients such as Ensembl cannot due to the way the data is > rendered). I > >>> am not sure to what extent it would be used in future either, for > example it > >>> could not be used where post-processing of the entire set of features > is > >>> necessary (e.g. binning). > >>> > >>> So my question is: should the specification require bi-directional > >>> references ( and ), or uni-directional ( only)? > >>> Whichever approach is taken, replacing the structure will > >>> significantly reduce verbosity for groups with large numbers of child > >>> features, but do we want to reduce this further by removing > >>> elements at the cost of the potential for "streaming"> > >>> > >>> Apologies for the long and technical post. > >>> Andy > >>> _______________________________________________ > >>> DAS mailing list > >>> DAS at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/das > >>> > >>> _______________________________________________ > > DAS mailing list > > DAS at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/das > > > > > > -- > Lincoln D. 
Stein > Director, Informatics and Biocomputing Platform > Ontario Institute for Cancer Research > 101 College St., Suite 800 > Toronto, ON, Canada M5G0A3 > 416 673-8514 > Assistant: Renata Musa > _______________________________________________ > DAS mailing list > DAS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das > From andy.jenkinson at ebi.ac.uk Mon Feb 23 13:34:02 2009 From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson) Date: Mon, 23 Feb 2009 13:34:02 +0000 Subject: [DAS] comments - hierarchical features In-Reply-To: <50158cb00902230529p65141bc3p55502b19e4583409@mail.gmail.com> References: <499C34F3.8000801@ebi.ac.uk> <499D5F2B.60109@ebi.ac.uk> <6dce9a0b0902190905t4ce2b44eyc412d2b255e68bb9@mail.gmail.com> <50158cb00902230529p65141bc3p55502b19e4583409@mail.gmail.com> Message-ID: <49A2A5CA.9050100@ebi.ac.uk> Gregg Helt wrote: > Some may be wondering why not avoid link elements altogether and just > represent feature hierarchies by allowing nested feature elements? So > the A1 parent, B1 & B2 part relationship would look like: > > ... > ... > ... > > > We considered this during development of DAS/2, but the main use case > that argues against it is when multiple parents share children. For > example if multiple transcripts share exons -- this is how alternative > splicing is modeled in many GMOD databases. I think this model is quite common for other databases too. From christoph.gille at charite.de Mon Feb 23 23:10:01 2009 From: christoph.gille at charite.de (Dr. Christoph Gille) Date: Tue, 24 Feb 2009 00:10:01 +0100 (CET) Subject: [DAS] example program output In-Reply-To: References: Message-ID: This is my first post here. I embedded DAS some time ago. 
Here is an output example of the program: http://3d-alignment.eu/exampleOutput.html Please send suggestions and recommendations Christoph From lincoln.stein at gmail.com Tue Feb 24 13:04:14 2009 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Tue, 24 Feb 2009 08:04:14 -0500 Subject: [DAS] example program output In-Reply-To: References: Message-ID: <6dce9a0b0902240504s5cfd8597q1fbee0df03b6c8a3@mail.gmail.com> It's very nice! Is this program something that you're considering making public? Lincoln On Mon, Feb 23, 2009 at 6:10 PM, Dr. Christoph Gille < christoph.gille at charite.de> wrote: > This is my first post here. > I embedded DAS some time ago. > > Here is an output example of the program: > > http://3d-alignment.eu/exampleOutput.html > > Please send suggestions and recommendations > > > Christoph > > > _______________________________________________ > DAS mailing list > DAS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das > -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From andy.jenkinson at ebi.ac.uk Tue Feb 24 17:32:09 2009 From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson) Date: Tue, 24 Feb 2009 17:32:09 +0000 Subject: [DAS] [Fwd: [Bosc-announce] BOSC 2009 Call for Abstracts] Message-ID: <49A42F19.6090305@ebi.ac.uk> The following may be of interest to others on the list. BOSC is organised by the OBF, with which the BioDAS project is affiliated, and has been a good forum for DAS discussion. Cheers, Andy -------- Original Message -------- Subject: [Bosc-announce] BOSC 2009 Call for Abstracts Date: Tue, 24 Feb 2009 08:55:08 -0800 From: Kam Dahlquist To: kdahlquist at lmu.edu Please forward as appropriate and forgive multiple postings. 
Call for Abstracts for the 2009 Bioinformatics Open Source Conference (BOSC) 2009 An ISMB 2009 Special Interest Group (SIG) Date: June 27-28, 2009 Location: Stockholm, Sweden URL: http://www.open-bio.org/wiki/BOSC_2009 Abstract submission via EasyChair: https://www.easychair.org/login.cgi?conf=bosc2009 Important Dates Monday, April 13: Abstract deadline May 1, 2009: Notification of accepted abstracts May 15, 2009: Early Registration Discount Cut-off date June 27-28, 2009: BOSC 2009 The Bioinformatics Open Source Conference (BOSC) is sponsored by the Open Bioinformatics Foundation (O|B|F), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development within the biological research community. To be considered for acceptance, software systems representing the central topic in a presentation submitted to BOSC must be licensed with a recognized Open Source License, and be freely available for download in source code form. Many open source bioinformatics packages are widely used by the research community across many application areas and form a cornerstone in enabling research in the genomic and post-genomic era. Open source bioinformatics software has facilitated rapid innovation, dissemination, and wide adoption of new computational methods, reusable software components, and standards. One of the hallmarks of BOSC is the coming together of the open source developer community in one location to meet face-to-face. This creates synergy where participants can work together to create use cases, prototype working code, or run bootcamps for developers from other projects as short, informal, and hands-on tutorials in new software packages and emerging technologies. In short, BOSC is not just a conference for presentations of completed work, but is a dynamic meeting where collaborative work gets done and attendees can learn about new or on-going developments that they can directly apply to their own work. 
This year's conference will mark the 10th anniversary of BOSC. To celebrate the special occasion, the theme of this year's conference is "Looking Back and Looking Ahead: Open Source Solutions to Grand Challenges in Bioinformatics." We are inviting abstracts for two different types of talks:

1. Descriptions of a particular open source software implementation by a member of the development team, especially in the areas listed below.
2. Reviews of open source software that compare and contrast different solutions to the same bioinformatics problem.

In addition, we are asking all speakers to come prepared to lead an informal tutorial on their software during a Birds of a Feather/hackathon session.

This year's topics include:

* Design Patterns in Bioinformatics
* Regulatory Genomics
* Multicore and GPGPU computing
* Data & Analysis Management (shared session with DAM SIG)
* Computational Grids
* Visualization

There will also be updates from O|B|F-sponsored projects, and as always, abstracts may be submitted for open source software that does not fit neatly into the above categories. Lightning Talks will also highlight very recent developments.

To contact the organizing committee, e-mail bosc at open-bio.org. To sign up for BOSC-related announcements, subscribe to the Bosc-announce list (http://lists.open-bio.org/mailman/listinfo/bosc-announce).

BOSC 2009 Organizing Committee
Kam D. Dahlquist (Chair)
Lonnie R. Welch (Co-chair)
Hilmar Lapp
Jens Lichtenberg
Frank Drews
Andrew Dalke
Jim Procter
Seán I. O'Donoghue
Anton Nekrutenko
Steffen Moeller

_______________________________________________
Bosc-announce mailing list
Bosc-announce at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bosc-announce

From jw12 at sanger.ac.uk Wed Feb 25 13:22:58 2009
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Wed, 25 Feb 2009 13:22:58 +0000
Subject: [DAS] das stylesheets
Message-ID: <9783C660-4A35-48D6-9E61-B85EF2E85AE2@sanger.ac.uk>

Warning!
please note that if you are setting up a das server with a stylesheet, be sure to not have 0 as strange things happen in clients such as ensembl- remove these tags if you're not sure what to do with them! Thanks Jonathan See issue below posted to ensembl dev list: Abel Ureta-Vidal wrote: > > Hi, > > This is maybe more for the DAS/Web team. I'm interested in > structural variation data (CNV, chromosome gain/loss, etc....) > I've identified some data in Ensembl that is integrated via DAS > sources. I've identified the original DAS sources and > query them directly, no problem, they work as expected. > > I'd just like to report a couple of things on the Ensembl web > display side of things. > > 1/The live ensembl server points to 2 DAS sources for CNVs from > Redon et al paper (Nature 2006 I presume), > i.e. WGTP_loci_36 and WGTP_regions_36. > I could not find a clear description of the exact difference between > the 2, (what does WGTP stands for? Whole Genome > Tiling Platform?). Querying them directly and looking at their data, > it seems that these are 2 identical data sets > > $ GET http://das.sanger.ac.uk/das/WGTP_loci_36/features?segment=5:17198060,18198059 > > WGTP_loci_36 > $ GET http://das.sanger.ac.uk/das/WGTP_regions_36/features?segment=5:17198060,18198059 > > WGTP_regions_36 > $ diff WGTP_loci_36 WGTP_regions_36 > 5c5 > < > --- > > > > If I query the complete data from chromosome 5, I get exactly the > same **none** difference. These 2 sources > seem to be identical. Only WGTP_regions_36 is listed and validate in > the DAS registry (www.dasregistry.org). Should > this one only be considered as the reference? > > On the short region I queried 5:17198060,18198059, 5 features come > back. But I can only see 4 in the Ensembl Detailed View > FEATURE id="Chr5tp-4G10:Gain=32|loss=39" label="Chr5tp-4G10:Gain=32| > loss=39" > START 17417700 > END 17583657 > seems to be missing. 
> All features have different ids so this is not a feature grouping > artifact I presume. > > 2/ Same kind of issue with decipher and decipher_collapsed_36 > sources. Only decipher is in the DAS registry. > > $ GET http://das.sanger.ac.uk/das/decipher/features?segment=5:16621834,16821833 > > decipher > $ GET http://das.sanger.ac.uk/das/decipher_collapsed_36/features?segment=5:16621834,16821833 > > decipher_collapsed > $ diff decipher decipher_collapsed > 5c5 > < > --- > > > > Here 4 features are returned but only 1 displayed. > > So am I missing something? or these sources are data duplications. > And is there a display problem? This is an identified issue - the DAS sources in both cases say DO NOT BUMP - so we don't hence we cannot display all the features. We have solved this for some sources by getting them to remove the "0" entries in the stylesheets. -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From andy.jenkinson at ebi.ac.uk Wed Feb 25 15:47:17 2009 From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson) Date: Wed, 25 Feb 2009 15:47:17 +0000 Subject: [DAS] das stylesheets In-Reply-To: <9783C660-4A35-48D6-9E61-B85EF2E85AE2@sanger.ac.uk> References: <9783C660-4A35-48D6-9E61-B85EF2E85AE2@sanger.ac.uk> Message-ID: <49A56805.4020506@ebi.ac.uk> I think the prevalence of this issue may come from the fact that ProServer's demo stylesheet has this instruction in it. I have updated the subversion repository to remove it. Jonathan Warren wrote: > Warning! > > please note that if you are setting up a das server with a stylesheet, > be sure to not have 0 as strange things happen in clients > such as ensembl- remove these tags if you're not sure what to do with them! 
> > Thanks > > Jonathan > > See issue below posted to ensembl dev list: > > Abel Ureta-Vidal wrote: >> >> Hi, >> >> This is maybe more for the DAS/Web team. I'm interested in structural >> variation data (CNV, chromosome gain/loss, etc....) >> I've identified some data in Ensembl that is integrated via DAS >> sources. I've identified the original DAS sources and >> query them directly, no problem, they work as expected. >> >> I'd just like to report a couple of things on the Ensembl web display >> side of things. >> >> 1/The live ensembl server points to 2 DAS sources for CNVs from Redon >> et al paper (Nature 2006 I presume), >> i.e. WGTP_loci_36 and WGTP_regions_36. >> I could not find a clear description of the exact difference between >> the 2, (what does WGTP stands for? Whole Genome >> Tiling Platform?). Querying them directly and looking at their data, >> it seems that these are 2 identical data sets >> >> $ GET >> http://das.sanger.ac.uk/das/WGTP_loci_36/features?segment=5:17198060,18198059 > >> WGTP_loci_36 >> $ GET >> http://das.sanger.ac.uk/das/WGTP_regions_36/features?segment=5:17198060,18198059 > >> WGTP_regions_36 >> $ diff WGTP_loci_36 WGTP_regions_36 >> 5c5 >> < > href="http://das.sanger.ac.uk:80/das/WGTP_loci_36/features"> >> --- >> > > href="http://das.sanger.ac.uk:80/das/WGTP_regions_36/features"> >> >> If I query the complete data from chromosome 5, I get exactly the same >> **none** difference. These 2 sources >> seem to be identical. Only WGTP_regions_36 is listed and validate in >> the DAS registry (www.dasregistry.org). Should >> this one only be considered as the reference? >> >> On the short region I queried 5:17198060,18198059, 5 features come >> back. But I can only see 4 in the Ensembl Detailed View >> FEATURE id="Chr5tp-4G10:Gain=32|loss=39" >> label="Chr5tp-4G10:Gain=32|loss=39" >> START 17417700 >> END 17583657 >> seems to be missing. >> All features have different ids so this is not a feature grouping >> artifact I presume. 
>> >> 2/ Same kind of issue with decipher and decipher_collapsed_36 sources. >> Only decipher is in the DAS registry. >> >> $ GET >> http://das.sanger.ac.uk/das/decipher/features?segment=5:16621834,16821833 > >> decipher >> $ GET >> http://das.sanger.ac.uk/das/decipher_collapsed_36/features?segment=5:16621834,16821833 > >> decipher_collapsed >> $ diff decipher decipher_collapsed >> 5c5 >> < > href="http://das.sanger.ac.uk:80/das/decipher/features"> >> --- >> > > href="http://das.sanger.ac.uk:80/das/decipher_collapsed_36/features"> >> >> Here 4 features are returned but only 1 displayed. >> >> So am I missing something? or these sources are data duplications. And >> is there a display problem? > > This is an identified issue - the DAS sources in both cases say DO NOT > BUMP - so we don't hence we cannot display all the features. We have > solved this for some sources by getting them to remove the > "0" entries in the stylesheets. > >
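
The stylesheet warning in this thread concerns entries whose value is 0 and which tell clients such as Ensembl not to bump features. The following is only a rough sketch of how a source maintainer might strip such entries before serving a stylesheet. It assumes the element in question is named BUMP (the name does not survive in the text above, so this is a guess based on the "DO NOT BUMP" remark), and the sample stylesheet, its DASSTYLE/STYLESHEET/CATEGORY/TYPE/GLYPH layout, and the helper name strip_zero_bump are all illustrative rather than taken from ProServer.

```python
import xml.etree.ElementTree as ET

# Hypothetical stylesheet fragment. The element layout and the BUMP
# element name are assumptions made for illustration only.
STYLESHEET = """<DASSTYLE>
  <STYLESHEET version="1.0">
    <CATEGORY id="default">
      <TYPE id="default">
        <GLYPH>
          <BOX>
            <FGCOLOR>black</FGCOLOR>
            <BUMP>0</BUMP>
          </BOX>
        </GLYPH>
      </TYPE>
    </CATEGORY>
  </STYLESHEET>
</DASSTYLE>"""


def strip_zero_bump(xml_text, tag="BUMP"):
    """Remove every <tag> element whose text is exactly "0".

    Returns the cleaned XML as a string and the number of elements removed.
    """
    root = ET.fromstring(xml_text)
    removed = 0
    # Copy both the parent list and each child list before mutating the tree.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == tag and (child.text or "").strip() == "0":
                parent.remove(child)
                removed += 1
    return ET.tostring(root, encoding="unicode"), removed


cleaned, removed = strip_zero_bump(STYLESHEET)
print(removed)
print(cleaned)
```

Only entries with the value 0 are removed; any other setting is left untouched, which matches the advice in the thread to remove these tags when unsure rather than change them.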