From jw12 at sanger.ac.uk Tue Mar 1 05:09:28 2011
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Tue, 1 Mar 2011 10:09:28 +0000
Subject: [DAS] RELAX NG schema for "sequence" command
In-Reply-To: <0F015AE7-09D7-4F4D-B547-353830FD82A2@broadinstitute.org>
References: <0F015AE7-09D7-4F4D-B547-353830FD82A2@broadinstitute.org>
Message-ID: <563D9C80-555A-4866-9E74-F04FF5458E27@sanger.ac.uk>

This is fixed now.

On 28 Feb 2011, at 18:34, Andrew Roberts wrote:

> Hello,
>
> I'm working on some validations, and I think I may have found an
> error in sequence.rng (www.dasregistry.org/validation1.6/sequence.rng).
> It appears that this schema makes no provision for the actual text
> data that belongs to the sequence element (i.e. the bases
> themselves). Is this a known issue?
>
> If not, I think the solution is to add a <text/> element inside the
> sequence element. Line 10 should read,
>
> attribute> attribute> attribute> text>
>
> Cheers,
>
> Andrew
> _______________________________________________
> DAS mailing list
> DAS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/das

Jonathan Warren
Senior Developer and DAS coordinator
blog: http://biodasman.wordpress.com/
jw12 at sanger.ac.uk
Ext: 2314
Telephone: 01223 492314

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
From aroberts at broadinstitute.org Tue Mar 1 10:19:13 2011 From: aroberts at broadinstitute.org (Andrew Roberts) Date: Tue, 1 Mar 2011 10:19:13 -0500 Subject: [DAS] RELAX NG schema for "sequence" command In-Reply-To: <563D9C80-555A-4866-9E74-F04FF5458E27@sanger.ac.uk> References: <0F015AE7-09D7-4F4D-B547-353830FD82A2@broadinstitute.org> <563D9C80-555A-4866-9E74-F04FF5458E27@sanger.ac.uk> Message-ID: <1FD158A0-4032-48A2-AA86-383F9487BB52@broadinstitute.org> Alright, now I'm hitting the lack of support for ERRORSEGMENT within the sequence command response. I think there needs to be a choice inside the oneOrMore, so that elements can be SEQUENCE or ERRORSEGMENT. I attached what I'm using as a testing schema in the meantime, hopefully it's helpful. Andrew -------------- next part -------------- A non-text attachment was scrubbed... Name: sequence.rng Type: application/octet-stream Size: 1267 bytes Desc: not available URL: -------------- next part -------------- On Mar 1, 2011, at 5:09 AM, Jonathan Warren wrote: > This is fixed now. > > On 28 Feb 2011, at 18:34, Andrew Roberts wrote: > >> Hello, >> >> I'm working on some validations, and I think I may have found an >> error in sequence.rng (www.dasregistry.org/validation1.6/sequence.rng) >> . It appears that this schema makes no provision for the actual >> text data that belongs to the sequence element (i.e. the bases >> themselves). Is this a known issue? >> >> If not, I think the solution is to add a element inside >> the sequence element. 
Line 10 should read,
>>
>> attribute> attribute> attribute> text>
>>
>> Cheers,
>>
>> Andrew
>> _______________________________________________
>> DAS mailing list
>> DAS at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/das
>
> Jonathan Warren
> Senior Developer and DAS coordinator
> blog: http://biodasman.wordpress.com/
> jw12 at sanger.ac.uk
> Ext: 2314
> Telephone: 01223 492314
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research
> Limited, a charity registered in England with number 1021457 and a
> company registered in England with number 2742969, whose registered
> office is 215 Euston Road, London, NW1 2BE.

From jw12 at sanger.ac.uk Tue Mar 1 10:54:05 2011
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Tue, 1 Mar 2011 15:54:05 +0000
Subject: [DAS] RELAX NG schema for "sequence" command
In-Reply-To: <1FD158A0-4032-48A2-AA86-383F9487BB52@broadinstitute.org>
References: <0F015AE7-09D7-4F4D-B547-353830FD82A2@broadinstitute.org>
 <563D9C80-555A-4866-9E74-F04FF5458E27@sanger.ac.uk>
 <1FD158A0-4032-48A2-AA86-383F9487BB52@broadinstitute.org>
Message-ID: <2516E6D2-3584-49AB-8648-14DECF48205D@sanger.ac.uk>

Hi Andrew

I guess this depends on what you are using the RNG for. What are you using it for? The registry needs the result from a sequence request to conform to the sequence.rng as it is on the registry: if a sequence is requested, the test region put into the registry should return a sequence, and if it returns an ERRORSEGMENT then the registry wants that reported as invalid. If I add the errorsegment to that document as you suggest, the registry won't pick it up as invalid.

On 1 Mar 2011, at 15:19, Andrew Roberts wrote:

> Alright, now I'm hitting the lack of support for ERRORSEGMENT
> within the sequence command response. I think there needs to be a
> choice inside the oneOrMore, so that elements can be SEQUENCE or
> ERRORSEGMENT.
Jonathan Warren
Senior Developer and DAS coordinator
blog: http://biodasman.wordpress.com/
jw12 at sanger.ac.uk
Ext: 2314
Telephone: 01223 492314

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
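The two readings of the schema discussed above can be put side by side. What follows is only a minimal sketch in Python, not the registry's validation code and not a substitute for sequence.rng: it assumes a sequence response whose root element is DASSEQUENCE with SEQUENCE and/or ERRORSEGMENT children, and the function name, the allow_errorsegment flag and both sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_sequence_response(xml_text, allow_errorsegment=False):
    """Return (ok, reason) for a sequence command response.

    allow_errorsegment=False approximates the strict, registry-style
    reading: every child must be a SEQUENCE element that carries text.
    allow_errorsegment=True approximates the proposed choice between
    SEQUENCE and ERRORSEGMENT inside the oneOrMore.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "DASSEQUENCE":  # assumed root element name
        return False, "unexpected root element: " + root.tag
    children = list(root)
    if not children:
        return False, "no child elements"  # oneOrMore needs at least one
    for child in children:
        if child.tag == "SEQUENCE":
            # The bases themselves are the text content of SEQUENCE.
            if not (child.text or "").strip():
                return False, "SEQUENCE has no text"
        elif child.tag == "ERRORSEGMENT":
            if not allow_errorsegment:
                return False, "ERRORSEGMENT returned"
        else:
            return False, "unexpected element: " + child.tag
    return True, "ok"


# Hypothetical sample responses, for illustration only.
GOOD = '<DASSEQUENCE><SEQUENCE id="22" start="1" stop="4">acgt</SEQUENCE></DASSEQUENCE>'
BAD = '<DASSEQUENCE><ERRORSEGMENT id="22" start="1" stop="4"/></DASSEQUENCE>'

print(check_sequence_response(GOOD))
print(check_sequence_response(BAD))
print(check_sequence_response(BAD, allow_errorsegment=True))
```

With the flag off, an ERRORSEGMENT for the test region is reported as invalid, which is what the registry relies on; with it on, the same document passes, which is what a server developer testing error cases would want.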
From andy.jenkinson at ebi.ac.uk Tue Mar 1 11:19:12 2011
From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson)
Date: Tue, 1 Mar 2011 16:19:12 +0000
Subject: [DAS] RELAX NG schema for "sequence" command
In-Reply-To: <2516E6D2-3584-49AB-8648-14DECF48205D@sanger.ac.uk>
References: <0F015AE7-09D7-4F4D-B547-353830FD82A2@broadinstitute.org>
 <563D9C80-555A-4866-9E74-F04FF5458E27@sanger.ac.uk>
 <1FD158A0-4032-48A2-AA86-383F9487BB52@broadinstitute.org>
 <2516E6D2-3584-49AB-8648-14DECF48205D@sanger.ac.uk>
Message-ID: <45146209-B676-4307-B020-C87AFFFD0235@ebi.ac.uk>

To put it another way, the ERRORSEGMENT element is only valid if the server supports the error-segment capability. I guess the issue is that the only use of the RNG schemas up to now for validation has been by the registry.

Personally, from a technical point of view, I think including the errorsegment element in the RNG schemas (this will apply to many commands, not just sequence) is the "correct" interpretation. However, that means the registry would need to apply its errorsegment test separately rather than relying on the rng to flag it as invalid. It already does the reverse of this test separately, so this makes sense to me (i.e. if I ask for a bad segment, do you give me an errorsegment element?).

Cheers,
Andy

On 1 Mar 2011, at 15:54, Jonathan Warren wrote:

> Hi Andrew
>
> I guess this depends on what you are using the RNG for. What are you using it for? The registry needs the result from a sequence request to conform to the sequence.rng as it is on the registry as if a sequence is requested the test region put into the registry should return a sequence, if it returns an ERRORSEGMENT then the registry wants that reported as invalid. If I add the errorsegment to that document as you suggest the registry won't pick it up as invalid.
>
> On 1 Mar 2011, at 15:19, Andrew Roberts wrote:
>
>> Alright, now I'm hitting the lack of support for ERRORSEGMENT within the sequence command response.
>> > > Jonathan Warren > Senior Developer and DAS coordinator > blog: http://biodasman.wordpress.com/ > jw12 at sanger.ac.uk > Ext: 2314 > Telephone: 01223 492314 > > > > > > > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome ResearchLimited, a charity registered in England with number 1021457 and acompany registered in England with number 2742969, whose registeredoffice is 215 Euston Road, London, NW1 2BE._______________________________________________ > DAS mailing list > DAS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das From aroberts at broadinstitute.org Tue Mar 1 11:20:38 2011 From: aroberts at broadinstitute.org (Andrew Roberts) Date: Tue, 1 Mar 2011 11:20:38 -0500 Subject: [DAS] RELAX NG schema for "sequence" command In-Reply-To: <2516E6D2-3584-49AB-8648-14DECF48205D@sanger.ac.uk> References: <0F015AE7-09D7-4F4D-B547-353830FD82A2@broadinstitute.org> <563D9C80-555A-4866-9E74-F04FF5458E27@sanger.ac.uk> <1FD158A0-4032-48A2-AA86-383F9487BB52@broadinstitute.org> <2516E6D2-3584-49AB-8648-14DECF48205D@sanger.ac.uk> Message-ID: <3E08E18F-CE0A-4B3F-A3FA-7A0EF5269E95@broadinstitute.org> Ah, interesting. I'm exercising my implementations of the das commands, and then validating the XML against the rng as a correctness test. I had been assuming that the rng was a description of all possible server responses, including error cases, but I now see what you guys are using them for. Thanks! On Mar 1, 2011, at 10:54 AM, Jonathan Warren wrote: > Hi Andrew > > I guess this depends on what you are using the RNG for. What are you using it for? The registry needs the result from a sequence request to conform to the sequence.rng as it is on the registry as if a sequence is requested the test region put into the registry should return a sequence, if it returns an ERRORSEGMENT the the registry wants that reported as invalid. If I add the errorsegment to that document as you suggest the registry won't pick it up as invalid. 
>>
>
> Jonathan Warren
> Senior Developer and DAS coordinator
> blog: http://biodasman.wordpress.com/
> jw12 at sanger.ac.uk
> Ext: 2314
> Telephone: 01223 492314
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research
> Limited, a charity registered in England with number 1021457 and a
> company registered in England with number 2742969, whose registered
> office is 215 Euston Road, London, NW1 2BE.

From jw12 at sanger.ac.uk Tue Mar 1 11:33:51 2011
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Tue, 1 Mar 2011 16:33:51 +0000
Subject: [DAS] RELAX NG schema for "sequence" command
In-Reply-To: <3E08E18F-CE0A-4B3F-A3FA-7A0EF5269E95@broadinstitute.org>
References: <0F015AE7-09D7-4F4D-B547-353830FD82A2@broadinstitute.org>
 <563D9C80-555A-4866-9E74-F04FF5458E27@sanger.ac.uk>
 <1FD158A0-4032-48A2-AA86-383F9487BB52@broadinstitute.org>
 <2516E6D2-3584-49AB-8648-14DECF48205D@sanger.ac.uk>
 <3E08E18F-CE0A-4B3F-A3FA-7A0EF5269E95@broadinstitute.org>
Message-ID: <2CA04BEC-F6A4-4AC4-B30C-06F0D647F235@sanger.ac.uk>

If your data source is publicly available you can test it by making a request like this:

http://www.dasregistry.org/validate?url=http://das.sanger.ac.uk/das/otter_das/&testcode=22:19173435,19372173

which returns properties for the status of each command. Or you can use the validation page on the registry interface to do the same thing.

Some people here are working on some XSD documents in order to use JAXB, which will probably be of use to you as they should be structured more as you suggest. We maybe need to make it clearer in the documentation how we use these RNG documents.

Thanks for the feedback!

On 1 Mar 2011, at 16:20, Andrew Roberts wrote:

> Ah, interesting. I'm exercising my implementations of the das
> commands, and then validating the XML against the rng as a
> correctness test.
Jonathan Warren
Senior Developer and DAS coordinator
blog: http://biodasman.wordpress.com/
jw12 at sanger.ac.uk
Ext: 2314
Telephone: 01223 492314

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
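The validation request in the message above can be assembled programmatically. This is a small sketch in Python that only builds the URL and makes no network call; the helper name validation_url is hypothetical, and reading testcode as "segment:start,stop" is an assumption based on the one example given.

```python
from urllib.parse import urlencode

# Base address copied from the example request in the thread.
REGISTRY_VALIDATE = "http://www.dasregistry.org/validate"


def validation_url(source_url, testcode):
    """Build a registry validation request for one DAS source.

    source_url: root URL of the data source to be validated.
    testcode:   test region, written as in the example request.
    """
    # ':', '/' and ',' are left unescaped so that the result matches the
    # hand-written example; whether the registry also accepts
    # percent-encoded values is not stated in the thread.
    query = urlencode({"url": source_url, "testcode": testcode}, safe=":/,")
    return REGISTRY_VALIDATE + "?" + query


print(validation_url("http://das.sanger.ac.uk/das/otter_das/",
                     "22:19173435,19372173"))
```

Called with the source and test region from the example, this reproduces the request quoted in the message.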
From thomas.a.down at gmail.com Sat Mar 5 10:01:42 2011
From: thomas.a.down at gmail.com (Thomas Down)
Date: Sat, 5 Mar 2011 15:01:42 +0000
Subject: [DAS] Adjacent feature extension
Message-ID:

Following on from a discussion at the DAS Workshop:

DAS doesn't offer any specific support for clients that provide mechanisms for skipping from the current position to the "next" or "previous" feature. I'd like to propose a small extension (one extra filter option on the existing "features" command) to facilitate this. Full details are here:

https://github.com/dasmoth/dalliance/wiki/AdjacentFeatures

...but briefly, a request like:

/das/features?adjacent=chr21:30000000

...would be expected to return a standard DASGFF document containing the two features either side of the specified point.

All comments are welcome! One open question is whether the query should *just* return the adjacent features, or should also return other features overlapping the adjacent feature. My preference is for the former, but the latter does have the merit of being quirk-wards compatible with the existing feature_id filter. Does anyone else have strong feelings one way or another?

There isn't currently a full implementation of this, but if nobody comes up with major objections, I'm hoping to try implementations in Dazzle (server) and Dalliance (client) within the next couple of weeks.

Thomas.

From andy.jenkinson at ebi.ac.uk Mon Mar 7 05:04:08 2011
From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson)
Date: Mon, 7 Mar 2011 10:04:08 +0000
Subject: [DAS] Adjacent feature extension
In-Reply-To:
References:
Message-ID: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>

Hi Thomas,

Thanks for this.
Regarding the option of whether to return just one feature per side or all overlapping features, the only other advantage that immediately springs to mind for the latter (in addition to some measure of consistency, as you mention) is that it allows the client to immediately render the exact region of that feature without triggering another request. It would generally mean changing zoom level. I can't say if clients are likely to follow this mechanism as opposed to, say, pan and centre on the feature, but if they wanted to it would be more efficient (and possibly a little bit more efficient anyway, depending on how your client does its requests).

Disadvantages I can think of:
- "adjacent" request takes marginally longer
- not quite as obvious what clients should put in their UI controls - need to pick a feature to be able to do "jump to BRCA1"
- risk of servers not implementing it correctly and only returning one feature anyway (although I don't think this is likely as the concept is different to "feature-by-id")

Some things to further define:
- servers can't return a fake feature
- should servers return features on different reference sequences if there are none on the current one?
- how should servers treat features that overlap the adjacent range? Treat them as the adjacent feature to return, or only include features completely outside the query range? What if the next feature completely outside the query range is part of the same feature hierarchy (e.g. an exon outside the current window)?

Any thoughts from anyone on these?

Cheers,
Andy

On 5 Mar 2011, at 15:01, Thomas Down wrote:

> Following on from a discussion at the DAS Workshop:
>
> DAS doesn't offer any specific support for clients that provide
> mechanisms for skipping from the current position to the "next" or
> "previous" feature. I'd like to propose a small extension (one extra filter
> option on the existing "features" command) to facilitate this.
Full details > are here: > > https://github.com/dasmoth/dalliance/wiki/AdjacentFeatures > > ...but briefly, a request like: > > /das/features?adjacent=chr21:30000000 > > ...would be expected to return a standard DASGFF document containing the two > features either side of the specified point. > > All comments are welcome! One open question is whether the query should * > just* return the adjacent features, or should also return other features > overlapping the adjacent feature. My preference is for them former, but the > latter does have the merit of being quirk-wards compatible with the existing > feature_id filter. Does anyone else have strong feelings one way or > another. > > There isn't currently a full implementation of this, but if nobody comes up > with major objections, I'm hoping to try implementations in Dazzle (server) > and Dalliance (client) within the next couple of weeks. > > Thomas. > _______________________________________________ > DAS mailing list > DAS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das From thomas.a.down at gmail.com Mon Mar 7 05:35:02 2011 From: thomas.a.down at gmail.com (Thomas Down) Date: Mon, 7 Mar 2011 10:35:02 +0000 Subject: [DAS] Adjacent feature extension In-Reply-To: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk> References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk> Message-ID: On Mon, Mar 7, 2011 at 10:04 AM, Andy Jenkinson wrote: > Hi Thomas, > > Thanks for this. Regarding the option of whether to return just one feature > per side or all overlapping features, the only other advantage that > immediately springs to mind for the latter (in addition to some measure of > consistency, as you mention) is that it allows the client to immediately > render the exact region of that feature without triggering another request. > It would generally mean changing zoom level. 
I'm can't say if clients are > likely to follow this mechanism as opposed to, say, pan and centre on the > feature, but if they wanted to it would be more efficient (and possibly a > little bit more efficient anyway depending on how your client does its > requests). > Yep, I agree. I'd be interested to learn whether there are any clients that would seriously consider taking advantage of this. My own thinking is that even if we do adjust zoom level (as Dalliance sometimes does, e.g. in the "jump to gene..." navigation op), clients are much more likely to zoom to a view that contains the target feature plus a "sensible" amount of flanking sequence, rather than a view where the target feature is perfectly framed. Furthermore, this rather seems like optimizing for the case where only one annotation source is active. Surely we're talking about the *distributed*annotation system, and clients will still have to go off and query all the other annotation sources, even if they are able to skip the one which responded to the "adjacent" query. So long as there's some kind of query parallelization in place, this probably isn't a performance issue. Do any other client developers feel differently? > Disadvantages I can think of: > - "adjacent" request takes marginally longer > - not quite as obvious what clients should put in their UI controls - need > to pick a feature to be able to do "jump to BRCA1" > - risk of servers not implementing it correctly and only returning one > feature anyway (although I don't think this is likely as the concept is > different to "feature-by-id") > > Some things to further define: > - servers can't return a fake feature > Yep, will clarify this. > - should servers return features on different reference sequences if there > are none one the current one? > In my opinion, absolutely yes. 
Otherwise the "10 features in the genome" case remains a massive pain (and potentially a disaster for inhomogeneously distributed data; won't someone think of the MHC tiling arrays? :-). And even worse for the "10 features in UniProt" case (where I can also see this feature being quite interesting).

I've tried to be explicit about this in my proposal (see the penultimate paragraph + example 3), but any suggestions for further clarifications are welcome.

> - how should servers treat features that overlap the adjacent range? Treat
> them as the adjacent feature to return, or only include features completely
> outside the query range? What if the next feature completely outside the
> query range is part of the same feature hierarchy (e.g. an exon outside the
> current window).
>

It's a point rather than a range, but yes, I agree this is still an open question. I'd actually written the spec such that overlapping features do get returned (on the assumption that clients will do "trivial" cases of next/previous feature in-memory without a network round trip), but again, if other client developers do things differently, I'd like to know.

I think "include overlapping" will have fewer special cases to worry about, though, e.g. the PART/PARENT issue you allude to. Let clients deal with that ("dumb servers, smart clients").

Thomas.

From ljgarcia at ebi.ac.uk Mon Mar 7 05:49:09 2011
From: ljgarcia at ebi.ac.uk (Leyla Garcia)
Date: Mon, 07 Mar 2011 10:49:09 +0000
Subject: [DAS] Adjacent feature extension
In-Reply-To:
References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>
Message-ID: <4D74B825.6080509@ebi.ac.uk>

On 07/03/2011 10:35, Thomas Down wrote:
> On Mon, Mar 7, 2011 at 10:04 AM, Andy Jenkinson wrote:
>
>> Hi Thomas,
>>
>> Thanks for this.
Regarding the option of whether to return just one feature
>> per side or all overlapping features, the only other advantage that
>> immediately springs to mind for the latter (in addition to some measure of
>> consistency, as you mention) is that it allows the client to immediately
>> render the exact region of that feature without triggering another request.
>> It would generally mean changing zoom level. I can't say if clients are
>> likely to follow this mechanism as opposed to, say, pan and centre on the
>> feature, but if they wanted to it would be more efficient (and possibly a
>> little bit more efficient anyway depending on how your client does its
>> requests).
>>
> Yep, I agree. I'd be interested to learn whether there are any clients that
> would seriously consider taking advantage of this. My own thinking is that
> even if we do adjust zoom level (as Dalliance sometimes does, e.g. in the
> "jump to gene..." navigation op), clients are much more likely to zoom to a
> view that contains the target feature plus a "sensible" amount of flanking
> sequence, rather than a view where the target feature is perfectly framed.
>
> Furthermore, this rather seems like optimizing for the case where only one
> annotation source is active. Surely we're talking about the
> *distributed* annotation system, and clients will still have to go off
> and query all the other annotation sources, even if they are able to skip
> the one which responded to the "adjacent" query. So long as there's some
> kind of query parallelization in place, this probably isn't a performance
> issue.
>
> Do any other client developers feel differently?

So we can include or exclude the overlapping features. What about adding another filter? "adjacent" would retrieve only those around the feature requested and "overlapping" would return the overlaps? Using both would be interpreted as OR. Does it make sense? Would that be interesting for people using this type of filter?
>> Disadvantages I can think of:
>> - "adjacent" request takes marginally longer
>> - not quite as obvious what clients should put in their UI controls - need
>> to pick a feature to be able to do "jump to BRCA1"
>> - risk of servers not implementing it correctly and only returning one
>> feature anyway (although I don't think this is likely as the concept is
>> different to "feature-by-id")
>>
>> Some things to further define:
>> - servers can't return a fake feature
> Yep, will clarify this.
>
>> - should servers return features on different reference sequences if there
>> are none on the current one?
> In my opinion, absolutely yes. Otherwise the "10 features in the genome"
> case remains a massive pain (and potentially a disaster for
> inhomogeneously distributed data; won't someone think of the MHC tiling
> arrays? :-). And even worse for the "10 features in UniProt" case (where I
> can also see this feature being quite interesting).

Mmm, I do not understand the "10 features in UniProt" case or the "10 features in the genome" case. Could you please give some more information about them?

> I've tried to be explicit about this in my proposal (see the penultimate
> paragraph + example 3), but any suggestions for further clarifications are
> welcome.
>
>
>> - how should servers treat features that overlap the adjacent range? Treat
>> them as the adjacent feature to return, or only include features completely
>> outside the query range? What if the next feature completely outside the
>> query range is part of the same feature hierarchy (e.g. an exon outside the
>> current window).
>>
> It's a point rather than a range, but yes I agree this is still an open
> question. I'd actually written the spec such that overlapping features do
> get returned (on the assumption that clients will do "trivial" cases of
> next/previous feature in-memory without a network round trip), but again if
> other client developers do things differently, I'd like to know.
> > I think "include overlapping" will have less special-cases to worry about, > though. e.g. the PART/PARENT issue you allude to. Let clients deal with > that ("dumb servers, smart clients"). > > Thomas. > _______________________________________________ > DAS mailing list > DAS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das From jw12 at sanger.ac.uk Mon Mar 7 05:57:48 2011 From: jw12 at sanger.ac.uk (Jonathan Warren) Date: Mon, 7 Mar 2011 10:57:48 +0000 Subject: [DAS] Adjacent feature extension In-Reply-To: References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk> Message-ID: On 7 Mar 2011, at 10:35, Thomas Down wrote: > On Mon, Mar 7, 2011 at 10:04 AM, Andy Jenkinson >wrote: > >> Hi Thomas, >> >> Thanks for this. Regarding the option of whether to return just one >> feature >> per side or all overlapping features, the only other advantage that >> immediately springs to mind for the latter (in addition to some >> measure of >> consistency, as you mention) is that it allows the client to >> immediately >> render the exact region of that feature without triggering another >> request. >> It would generally mean changing zoom level. I'm can't say if >> clients are >> likely to follow this mechanism as opposed to, say, pan and centre >> on the >> feature, but if they wanted to it would be more efficient (and >> possibly a >> little bit more efficient anyway depending on how your client does >> its >> requests). >> > > Yep, I agree. I'd be interested to learn whether there are any > clients that > would seriously consider taking advantage of this. My own thinking > is that > even if we do adjust zoom level (as Dalliance sometimes does, e.g. > in the > "jump to gene..." navigation op), clients are much more likely to > zoom to a > view that contains the target feature plus a "sensible" amount of > flanking > sequence, rather than a view where the target feature is perfectly > framed. 
>
> Furthermore, this rather seems like optimizing for the case where only one annotation source is active. Surely we're talking about the *distributed* annotation system, and clients will still have to go off and query all the other annotation sources, even if they are able to skip the one which responded to the "adjacent" query. So long as there's some kind of query parallelization in place, this probably isn't a performance issue.

My vote would ideally be to change feature_by_id to return one feature and to have adjacent_feature return one feature too. In my opinion this would mean these capabilities on servers do "exactly as they say on the tin", would be easier for data providers to implement, and are thus more likely to be implemented? If the feature_id capability as it stands is needed, it could be renamed to something closer to what it actually does, like feature_id_region, but I would bet no one would bother to change it or use it?

However, the reality is that we are too late to change the old feature_by_id, but I don't think we need to make the same mistake twice by repeating it for adjacent_features?

>
> Do any other client developers feel differently?
>
>
>> Disadvantages I can think of:
>> - "adjacent" request takes marginally longer
>> - not quite as obvious what clients should put in their UI controls - need to pick a feature to be able to do "jump to BRCA1"
>> - risk of servers not implementing it correctly and only returning one feature anyway (although I don't think this is likely as the concept is different to "feature-by-id")
>>
>> Some things to further define:
>> - servers can't return a fake feature
>>
>
> Yep, will clarify this.
>
>
>> - should servers return features on different reference sequences if there are none on the current one?
>>
>
> In my opinion, absolutely yes.
Otherwise the "10 features in the > genome" > case remains a massive pain (and potentially a disaster, for > inhomogeneous-dstributed data; won't someone think of the MHC tiling > arrays? > :-). And even worse for the "10 features in UniProt" case (where I > can also > see this feature being quite interesting). > > I've tried to be explicit about this in my proposal (see the > penultimate > paragraph + example 3), but any suggestions for further > clarifications are > welcome. > > >> - how should servers treat features that overlap the adjacent >> range? Treat >> them as the adjacent feature to return, or only include features >> completely >> outside the query range? What if the next feature completely >> outside the >> query range is part of the same feature hierarchy (e.g. an exon >> outside the >> current window). >> > > It's a point rather than a range, but yes I agree this is still an > open > question. I'd actually written the spec such that overlapping > features do > get returned (on the assumption that clients will do "trivial" cases > of > next/previous feature in-memory without a network round trip), but > again if > other client developers do things differently, I'd like to know. > > I think "include overlapping" will have less special-cases to worry > about, > though. e.g. the PART/PARENT issue you allude to. Let clients deal > with > that ("dumb servers, smart clients"). > > Thomas. > _______________________________________________ > DAS mailing list > DAS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das Jonathan Warren Senior Developer and DAS coordinator blog: http://biodasman.wordpress.com/ jw12 at sanger.ac.uk Ext: 2314 Telephone: 01223 492314 -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
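The open question quoted above (should a feature that overlaps the query point count as the adjacent one, or only a feature lying wholly beyond it?) can be made concrete with a small Python sketch. Everything in it is hypothetical and for illustration only: the Feature record, the next_feature helper and the include_overlapping flag are not part of the DAS specification or of Thomas's proposal, and only the forward direction on a single reference sequence is modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    id: str
    segment: str  # reference sequence name (hypothetical field name)
    start: int    # inclusive
    end: int      # inclusive


def next_feature(features, segment, pos, include_overlapping=True):
    """Return the first feature at or after `pos` on `segment`, or None."""
    on_segment = [f for f in features if f.segment == segment]
    if include_overlapping:
        # A feature that still covers `pos` counts as adjacent.
        candidates = [f for f in on_segment if f.end >= pos]
    else:
        # Only features lying completely beyond `pos` count.
        candidates = [f for f in on_segment if f.start > pos]
    # Earliest start wins; ties are broken by the earlier end.
    return min(candidates, key=lambda f: (f.start, f.end), default=None)


features = [
    Feature("gene1", "chr1", 100, 500),
    Feature("exon2", "chr1", 400, 450),   # lies inside gene1
    Feature("gene2", "chr1", 900, 1200),
]

print(next_feature(features, "chr1", 300, include_overlapping=True).id)   # gene1
print(next_feature(features, "chr1", 300, include_overlapping=False).id)  # exon2
```

With the query point at 300, the "include overlapping" policy hands back gene1, while the "exclude" policy hands back exon2, a part of the very gene the point already sits in. That is the PART/PARENT wrinkle mentioned above, and under the first policy it is left to the client to sort out.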
From andy.jenkinson at ebi.ac.uk Mon Mar 7 06:19:22 2011
From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson)
Date: Mon, 7 Mar 2011 11:19:22 +0000
Subject: [DAS] Adjacent feature extension
In-Reply-To:
References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>

Message-ID: On 7 Mar 2011, at 10:57, Jonathan Warren wrote: > > My vote would ideally to change feature_by_id to return one feature and have the adjacent_feature as returning one feature. This in my opinion would mean these capabilities on servers do "exactly as they say on the tin" and would be easier to implement for data providers and are thus more likely to be implemented? > If the feature_id capability as it stands is needed it could be changed to something more akin to what it means like feature_id_region but I would bet no one would bother to change it/use it? > > However the reality is that we are too late to change the old feature_by_id, but I don't think we need to make the same mistake twice by repeating it for adjacent_features? I disagree. I think the problems with feature-by-id are that a) the name of the capability implies singular, and b) the concept itself (i.e. getting a feature by its ID) is such a common operation that is otherwise missing in DAS. I don't think either of those apply to an "adjacent" capability unless you specifically choose to call it "adjacent-feature" as opposed to "adjacent-features". I honestly don't think a capability called "adjacent-features" with a query structure like "/das/features?adjacent=foo:1" implies singular, rather the opposite in fact. To me that query suggests "get me the features adjacent to foo:1". True that 2 features is plural which still leaves a "one feature either side" interpretation possible, but IMO certainly not implicit enough to stop anyone implementing it to actually read the specification/documentation. Add to that the fact that this is an entirely new behaviour that we have the chance to properly document and make it clear exactly what the server must do. So IMO we have a clear choice. As to feature-by-id, I know changing behaviour is potentially a very disruptive change, but I think we can potentially do this purely because servers don't tend to implement it correctly anyway. 
Clients can happily filter out any additional features returned by old servers, and if any clients are reliant on the server including all overlapping features then as far as I am concerned they are either a) targeting specific servers rather than DAS-wide and thus unaffected, or b) already broken :) I have to admit that the feature-by-id capability is one of the (many) things I loathe having to explain and would love to change it. Doing so would be consistent with what we were trying to do with 1.6 (i.e. rationalise existing use of the spec) but I chickened out really. Cheers, Andy From thomas.a.down at gmail.com Mon Mar 7 06:31:06 2011 From: thomas.a.down at gmail.com (Thomas Down) Date: Mon, 7 Mar 2011 11:31:06 +0000 Subject: [DAS] Adjacent feature extension In-Reply-To: <4D74B825.6080509@ebi.ac.uk> References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk> <4D74B825.6080509@ebi.ac.uk> Message-ID: On Mon, Mar 7, 2011 at 10:49 AM, Leyla Garcia wrote: > > So we can include or exclude the overlapping features. What about adding > another filter? "adjacent" would retrieve only those around the feature > requested and "overlapping" would returns the overlaps? Using both would be > interpreted as OR. Does is make sense? > Would that be interesting for people using this type of filters? > How does the "overlapping" filter differ from the existing "segment" filter? The proposal in its current form states that segment and adjacent filters may be combined, and servers should return the union of features selected by the individual filters (e.g. a logical OR, just as you suggest). > In my opinion, absolutely yes. Otherwise the "10 features in the genome" >> case remains a massive pain (and potentially a disaster, for >> inhomogeneous-dstributed data; won't someone think of the MHC tiling >> arrays? >> :-). And even worse for the "10 features in UniProt" case (where I can >> also >> see this feature being quite interesting). 
>>
> Mmm, I do not understand it. "10 features in UniProt" case? "10 features in the genome" case? Could you please give some more information about it?
>

Yes, think of a track that is very sparsely, and potentially inhomogeneously, populated. Things like "imprinted regions in the human genome" (okay, there are more than 10 of those, but maybe no more than 100, and they're clustered). Or "my favorite rare protein domain". Scanning through such tracks by scrolling isn't going to be terribly helpful!

One of the use cases I always have in the back of my mind when discussing stuff like this is the biologist who has done some kind of analysis, has 50 "interesting" regions of the genome (or proteome), and wants to flip through them quickly to eyeball, sanity check, and see if they can see any contextual patterns. I work all the time with people who want this, and sometimes do it myself. Up until now, I've tended to solve it by creating an HTML frameset with a list of regions in one frame and a visualization tool (either Ensembl or Dalliance) in the other. It's far from an optimal solution. Allowing people to flick from region to region *within the browser* is far smoother.

Thomas.

From gsalazar at cs.uct.ac.za Mon Mar 7 06:16:01 2011
From: gsalazar at cs.uct.ac.za (Gustavo Salazar)
Date: Mon, 7 Mar 2011 11:16:01 +0000
Subject: [DAS] Adjacent feature extension
In-Reply-To:
References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>

Message-ID: On 7 Mar 2011, at 10:57, Jonathan Warren wrote: > > On 7 Mar 2011, at 10:35, Thomas Down wrote: > >> On Mon, Mar 7, 2011 at 10:04 AM, Andy Jenkinson wrote: >> >>> Hi Thomas, >>> >>> Thanks for this. Regarding the option of whether to return just one feature >>> per side or all overlapping features, the only other advantage that >>> immediately springs to mind for the latter (in addition to some measure of >>> consistency, as you mention) is that it allows the client to immediately >>> render the exact region of that feature without triggering another request. >>> It would generally mean changing zoom level. I'm can't say if clients are >>> likely to follow this mechanism as opposed to, say, pan and centre on the >>> feature, but if they wanted to it would be more efficient (and possibly a >>> little bit more efficient anyway depending on how your client does its >>> requests). >>> >> >> Yep, I agree. I'd be interested to learn whether there are any clients that >> would seriously consider taking advantage of this. My own thinking is that >> even if we do adjust zoom level (as Dalliance sometimes does, e.g. in the >> "jump to gene..." navigation op), clients are much more likely to zoom to a >> view that contains the target feature plus a "sensible" amount of flanking >> sequence, rather than a view where the target feature is perfectly framed. >> >> Furthermore, this rather seems like optimizing for the case where only one >> annotation source is active. Surely we're talking about the >> *distributed*annotation system, and clients will still have to go off >> and query all the >> other annotation sources, even if they are able to skip the one which >> responded to the "adjacent" query. So long as there's some kind of query >> parallelization in place, this probably isn't a performance issue. > > My vote would ideally to change feature_by_id to return one feature and have the adjacent_feature as returning one feature. 
This in my opinion would mean these capabilities on servers do "exactly as they say on the tin" and would be easier to implement for data providers and are thus more likely to be implemented?
> If the feature_id capability as it stands is needed it could be changed to something more akin to what it means like feature_id_region but I would bet no one would bother to change it/use it?
>
> However the reality is that we are too late to change the old feature_by_id, but I don't think we need to make the same mistake twice by repeating it for adjacent_features?

I agree with Jonathan: feature_by_id sounds like it gets the feature with the requested ID, and to be honest that is the way I have implemented it before, so if you ask me, I would say the adjacent capability should just return one feature. I don't think we are too late to change the old feature_by_id behaviour, and we can take this as the opportunity to make such a change.

>
>
>>
>> Do any other client developers feel differently?
>>
>>
>>> Disadvantages I can think of:
>>> - "adjacent" request takes marginally longer
>>> - not quite as obvious what clients should put in their UI controls - need to pick a feature to be able to do "jump to BRCA1"
>>> - risk of servers not implementing it correctly and only returning one feature anyway (although I don't think this is likely as the concept is different to "feature-by-id")
>>>
>>> Some things to further define:
>>> - servers can't return a fake feature
>>>
>>
>> Yep, will clarify this.
>>
>>
>>> - should servers return features on different reference sequences if there are none on the current one?
>>>
>>
>> In my opinion, absolutely yes. Otherwise the "10 features in the genome" case remains a massive pain (and potentially a disaster, for inhomogeneously-distributed data; won't someone think of the MHC tiling arrays? :-). And even worse for the "10 features in UniProt" case (where I can also see this feature being quite interesting).
>> >> I've tried to be explicit about this in my proposal (see the penultimate >> paragraph + example 3), but any suggestions for further clarifications are >> welcome. >> >> >>> - how should servers treat features that overlap the adjacent range? Treat >>> them as the adjacent feature to return, or only include features completely >>> outside the query range? What if the next feature completely outside the >>> query range is part of the same feature hierarchy (e.g. an exon outside the >>> current window). >>> >> >> It's a point rather than a range, but yes I agree this is still an open >> question. I'd actually written the spec such that overlapping features do >> get returned (on the assumption that clients will do "trivial" cases of >> next/previous feature in-memory without a network round trip), but again if >> other client developers do things differently, I'd like to know. >> >> I think "include overlapping" will have less special-cases to worry about, >> though. e.g. the PART/PARENT issue you allude to. Let clients deal with >> that ("dumb servers, smart clients"). >> >> Thomas. 
>> _______________________________________________
>> DAS mailing list
>> DAS at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/das
>
> Jonathan Warren
> Senior Developer and DAS coordinator
> blog: http://biodasman.wordpress.com/
> jw12 at sanger.ac.uk
> Ext: 2314
> Telephone: 01223 492314
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
> _______________________________________________
> DAS mailing list
> DAS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/das

From jw12 at sanger.ac.uk Mon Mar 7 06:51:56 2011
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Mon, 7 Mar 2011 11:51:56 +0000
Subject: [DAS] Adjacent feature extension
In-Reply-To:
References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>

Message-ID: On 7 Mar 2011, at 11:19, Andy Jenkinson wrote: > On 7 Mar 2011, at 10:57, Jonathan Warren wrote: > >> >> My vote would ideally to change feature_by_id to return one feature >> and have the adjacent_feature as returning one feature. This in my >> opinion would mean these capabilities on servers do "exactly as >> they say on the tin" and would be easier to implement for data >> providers and are thus more likely to be implemented? >> If the feature_id capability as it stands is needed it could be >> changed to something more akin to what it means like >> feature_id_region but I would bet no one would bother to change it/ >> use it? >> >> However the reality is that we are too late to change the old >> feature_by_id, but I don't think we need to make the same mistake >> twice by repeating it for adjacent_features? > > I disagree. I think the problems with feature-by-id are that a) the > name of the capability implies singular, and b) the concept itself > (i.e. getting a feature by its ID) is such a common operation that > is otherwise missing in DAS. I don't think either of those apply to > an "adjacent" capability unless you specifically choose to call it > "adjacent-feature" as opposed to "adjacent-features". I honestly > don't think a capability called "adjacent-features" with a query > structure like "/das/features?adjacent=foo:1" implies singular, > rather the opposite in fact. To me that query suggests "get me the > features adjacent to foo:1". True that 2 features is plural which > still leaves a "one feature either side" interpretation possible, > but IMO certainly not implicit enough to stop anyone implementing it > to actually read the specification/documentation. Add to that the > fact that this is an entirely new behaviour that we have the chance > to properly document and make it clear exactly what the server must > do. > > So IMO we have a clear choice. 
I still think it's simpler to implement it for one feature either side and keep complexity in the client. Generally, how many people stay awake after line 10 when reading the spec? :) Let's see if there are more votes...

>
> As to feature-by-id, I know changing behaviour is potentially a very disruptive change, but I think we can potentially do this purely because servers don't tend to implement it correctly anyway. Clients can happily filter out any additional features returned by old servers, and if any clients are reliant on the server including all overlapping features then as far as I am concerned they are either a) targeting specific servers rather than DAS-wide and thus unaffected, or b) already broken :)

So you agree feature-by-id should be changed if we have the stomach for it? Good, and Gustavo too. Well done Andy, you have just agreed to write Spec 1.7 or 3??? ;) Your argument above can also be used for leaving the spec as it is, but ideally I agree, and I guess we can call it spec 1.61, assuming other people agree.

>
> I have to admit that the feature-by-id capability is one of the (many) things I loathe having to explain and would love to change it. Doing so would be consistent with what we were trying to do with 1.6 (i.e. rationalise existing use of the spec) but I chickened out really.
>
> Cheers,
> Andy

Jonathan Warren
Senior Developer and DAS coordinator
blog: http://biodasman.wordpress.com/
jw12 at sanger.ac.uk
Ext: 2314
Telephone: 01223 492314

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
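Andy's point that clients can filter out any additional features returned by old servers could look roughly like the Python sketch below. Treat it as an illustration under assumptions: the sample response is hand-written and heavily trimmed, the feature IDs are made up, and features_with_id is a hypothetical helper rather than part of any DAS client library.

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed stand-in for a features response in which the
# server returned everything overlapping the requested feature.
SAMPLE = """<DASGFF><GFF><SEGMENT id="1" start="100" stop="1200">
  <FEATURE id="gene1"><START>100</START><END>500</END></FEATURE>
  <FEATURE id="gene1-as"><START>300</START><END>700</END></FEATURE>
  <FEATURE id="exon2"><START>400</START><END>450</END></FEATURE>
</SEGMENT></GFF></DASGFF>"""


def features_with_id(xml_text, wanted_id):
    """Keep only the FEATURE elements whose id attribute matches exactly."""
    root = ET.fromstring(xml_text)
    return [f for f in root.iter("FEATURE") if f.get("id") == wanted_id]


matches = features_with_id(SAMPLE, "gene1")
print([(f.get("id"), f.findtext("START"), f.findtext("END")) for f in matches])
# [('gene1', '100', '500')]
```

A client written this way behaves the same whether the server returns only the requested feature or every feature overlapping it, which is why changing the server-side behaviour need not break it.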
From thomas.a.down at gmail.com Mon Mar 7 07:01:15 2011
From: thomas.a.down at gmail.com (Thomas Down)
Date: Mon, 7 Mar 2011 12:01:15 +0000
Subject: [DAS] Adjacent feature extension
In-Reply-To:
References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>

Message-ID: On Mon, Mar 7, 2011 at 11:51 AM, Jonathan Warren wrote: > As to feature-by-id, I know changing behaviour is potentially a very >> disruptive change, but I think we can potentially do this purely because >> servers don't tend to implement it correctly anyway. Clients can happily >> filter out any additional features returned by old servers, and if any >> clients are reliant on the server including all overlapping features then as >> far as I am concerned they are either a) targeting specific servers rather >> than DAS-wide and thus unaffected, or b) already broken :) >> > So you agree feature-by_id should be changed if we have the stomach for it? > - good and Gustavo too. Well done Andy - You have just agreed to write Spec > 1.7 or 3??? ;) Your argument above can be used for leaving the spec as it is > then as well - but ideally I agree and guess we can call it spec 1.61 > assuming other people agree. > I don't know quite how productive this particular worm-can will be, but FWIW I'd be happy to see it change. The current spec has caused significant hassle (with both my client- and server-dev hats on!). Thomas. PS. One thing I would ask is: can we discuss the merits of the feature-adjacent extension independently of any possible changes to the core spec. It's addressing some specific needs, and if we can get the last few wrinkles sorted out I'm very eager to get it finished and deployed soon. From jw12 at sanger.ac.uk Mon Mar 7 07:16:05 2011 From: jw12 at sanger.ac.uk (Jonathan Warren) Date: Mon, 7 Mar 2011 12:16:05 +0000 Subject: [DAS] Adjacent feature extension In-Reply-To: References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>

Message-ID: <6E8B83BF-8B65-41F8-B697-696431D5176E@sanger.ac.uk> On 7 Mar 2011, at 12:01, Thomas Down wrote: > > > On Mon, Mar 7, 2011 at 11:51 AM, Jonathan Warren > wrote: > As to feature-by-id, I know changing behaviour is potentially a very > disruptive change, but I think we can potentially do this purely > because servers don't tend to implement it correctly anyway. Clients > can happily filter out any additional features returned by old > servers, and if any clients are reliant on the server including all > overlapping features then as far as I am concerned they are either > a) targeting specific servers rather than DAS-wide and thus > unaffected, or b) already broken :) > So you agree feature-by_id should be changed if we have the stomach > for it? - good and Gustavo too. Well done Andy - You have just > agreed to write Spec 1.7 or 3??? ;) Your argument above can be used > for leaving the spec as it is then as well - but ideally I agree and > guess we can call it spec 1.61 assuming other people agree. > > I don't know quite how productive this particular worm-can will be, > but FWIW I'd be happy to see it change. The current spec has caused > significant hassle (with both my client- and server-dev hats on!). > > Thomas. > > PS. One thing I would ask is: can we discuss the merits of the > feature-adjacent extension independently of any possible changes to > the core spec. It's addressing some specific needs, and if we can > get the last few wrinkles sorted out I'm very eager to get it > finished and deployed soon. Fair enough, adjacent features would be an extension and so not affect the core spec. As you know a similar adjacent_feature proposal has been posted before to the list and yes we should definitely come up with a solution now. I'd say if we don't have any more objections in the next couple of days then go with your proposal as is? I'll then put support into the registry this week if that is the case. 
Could you then also copy the proposal from here https://github.com/dasmoth/dalliance/wiki/AdjacentFeatures to the extensions page here: http://www.biodas.org/wiki/DAS1.6E#Adjacent_Feature_filter noting in large letters that it was agreed by the community on such-and-such a date?

Jonathan Warren
Senior Developer and DAS coordinator
blog: http://biodasman.wordpress.com/
jw12 at sanger.ac.uk
Ext: 2314
Telephone: 01223 492314

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

From andy.jenkinson at ebi.ac.uk Mon Mar 7 07:43:32 2011
From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson)
Date: Mon, 7 Mar 2011 12:43:32 +0000
Subject: [DAS] Adjacent feature extension
In-Reply-To:
References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>

Message-ID: <4796FEEA-3D6A-43A4-A76D-2212E2F690B6@ebi.ac.uk> On 7 Mar 2011, at 11:51, Jonathan Warren wrote: > On 7 Mar 2011, at 11:19, Andy Jenkinson wrote: > >> On 7 Mar 2011, at 10:57, Jonathan Warren wrote: >> >>> >>> My vote would ideally to change feature_by_id to return one feature and have the adjacent_feature as returning one feature. This in my opinion would mean these capabilities on servers do "exactly as they say on the tin" and would be easier to implement for data providers and are thus more likely to be implemented? >>> If the feature_id capability as it stands is needed it could be changed to something more akin to what it means like feature_id_region but I would bet no one would bother to change it/use it? >>> >>> However the reality is that we are too late to change the old feature_by_id, but I don't think we need to make the same mistake twice by repeating it for adjacent_features? >> >> I disagree. I think the problems with feature-by-id are that a) the name of the capability implies singular, and b) the concept itself (i.e. getting a feature by its ID) is such a common operation that is otherwise missing in DAS. I don't think either of those apply to an "adjacent" capability unless you specifically choose to call it "adjacent-feature" as opposed to "adjacent-features". I honestly don't think a capability called "adjacent-features" with a query structure like "/das/features?adjacent=foo:1" implies singular, rather the opposite in fact. To me that query suggests "get me the features adjacent to foo:1". True that 2 features is plural which still leaves a "one feature either side" interpretation possible, but IMO certainly not implicit enough to stop anyone implementing it to actually read the specification/documentation. Add to that the fact that this is an entirely new behaviour that we have the chance to properly document and make it clear exactly what the server must do. >> >> So IMO we have a clear choice. 
> I still think it's simpler to implement it for one feature either side and keep complexity in the client. Generally how many people stay wake after line 10 when reading the spec? :) Lets see if there are more votes... It probably is simpler to implement (well, to implement with maximum efficiency) and I am not advocating one over the other, but IMO the implementation considerations are a separate part of our choice and are orthogonal to whether it's confusing for those implementing it and consequently whether we see divergence from the spec like we do with feature-by-id. As Gustavo says, he'd implement feature-by-id as one feature because that's what he thinks it means, not because it's difficult. I'd posit that it'd be a one line change for any server maintainer to fix theirs to implement it correctly (i.e. use the feature's start/end to resubmit the query), it's just that it'd be more complicated to do it in a single step from the beginning. We should be under no illusions though that people are going to be able to implement this easily without reading the documentation carefully, no matter which option is chosen. In particular, I can foresee servers not interpreting the "type" filter appropriately, being likely to process the adjacent query then apply the type filter, which would be wrong. I have a feeling most sources implement the type filter as a passive "post filter" rather than an active one. I can tell you right now that it is going to be really quite difficult for me to implement "adjacent" correctly for the ASTD gene/transcript/exon sources, and I suspect the same will be true for retrofitting lots of other sources. >> >> As to feature-by-id, I know changing behaviour is potentially a very disruptive change, but I think we can potentially do this purely because servers don't tend to implement it correctly anyway. 
Clients can happily filter out any additional features returned by old servers, and if any clients are reliant on the server including all overlapping features then as far as I am concerned they are either a) targeting specific servers rather than DAS-wide and thus unaffected, or b) already broken :) > So you agree feature-by_id should be changed if we have the stomach for it? - good and Gustavo too. Well done Andy - You have just agreed to write Spec 1.7 or 3??? ;) Your argument above can be used for leaving the spec as it is then as well - but ideally I agree and guess we can call it spec 1.61 assuming other people agree. I already have a small list of changes for DAS 1.7 or whatever and think it's fine for that context. In any case, let's keep these two issues separate as Thomas says. >> >> I have to admit that the feature-by-id capability is one of the (many) things I loathe having to explain and would love to change it. Doing so would be consistent with what we were trying to do with 1.6 (i.e. rationalise existing use of the spec) but I chickened out really. >> >> Cheers, >> Andy > > Jonathan Warren > Senior Developer and DAS coordinator > blog: http://biodasman.wordpress.com/ > jw12 at sanger.ac.uk > Ext: 2314 > Telephone: 01223 492314 > > > > > > > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome ResearchLimited, a charity registered in England with number 1021457 and acompany registered in England with number 2742969, whose registeredoffice is 215 Euston Road, London, NW1 2BE. From thomas.a.down at gmail.com Mon Mar 7 07:57:37 2011 From: thomas.a.down at gmail.com (Thomas Down) Date: Mon, 7 Mar 2011 12:57:37 +0000 Subject: [DAS] Adjacent feature extension In-Reply-To: <4796FEEA-3D6A-43A4-A76D-2212E2F690B6@ebi.ac.uk> References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>

<4796FEEA-3D6A-43A4-A76D-2212E2F690B6@ebi.ac.uk> Message-ID: On Mon, Mar 7, 2011 at 12:43 PM, Andy Jenkinson wrote: > We should be under no illusions though that people are going to be able to > implement this easily without reading the documentation carefully, no matter > which option is chosen. In particular, I can foresee servers not > interpreting the "type" filter appropriately, being likely to process the > adjacent query then apply the type filter, which would be wrong. I have a > feeling most sources implement the type filter as a passive "post filter" > rather than an active one. I can tell you right now that it is going to be > really quite difficult for me to implement "adjacent" correctly for the ASTD > gene/transcript/exon sources, and I suspect the same will be true for > retrofitting lots of other sources. > I appreciate this is tough. However, given the existence of UCSC-like "super-sources" containing basically unrelated types, with potentially quite different densities and dispositions, I find it hard to see how this is going to be useful without some kind of interaction with the types filter along the lines of what I've proposed. Thomas. From andy.jenkinson at ebi.ac.uk Mon Mar 7 08:11:46 2011 From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson) Date: Mon, 7 Mar 2011 13:11:46 +0000 Subject: [DAS] Adjacent feature extension In-Reply-To: References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>

<4796FEEA-3D6A-43A4-A76D-2212E2F690B6@ebi.ac.uk> Message-ID: <31F4DDA4-EFE4-467F-A2BE-A65CACD8BDD4@ebi.ac.uk> On 7 Mar 2011, at 12:57, Thomas Down wrote: > On Mon, Mar 7, 2011 at 12:43 PM, Andy Jenkinson wrote: > We should be under no illusions though that people are going to be able to implement this easily without reading the documentation carefully, no matter which option is chosen. In particular, I can foresee servers not interpreting the "type" filter appropriately, being likely to process the adjacent query then apply the type filter, which would be wrong. I have a feeling most sources implement the type filter as a passive "post filter" rather than an active one. I can tell you right now that it is going to be really quite difficult for me to implement "adjacent" correctly for the ASTD gene/transcript/exon sources, and I suspect the same will be true for retrofitting lots of other sources. > > I appreciate this is tough. However, given the existence of UCSC-like "super-sources" containing basically unrelated types, with potentially quite different densities and dispositions, I find it hard to see how this is going to be useful without some kind of interaction with the types filter along the lines of what I've proposed. Absolutely. From jw12 at sanger.ac.uk Mon Mar 7 09:11:20 2011 From: jw12 at sanger.ac.uk (Jonathan Warren) Date: Mon, 7 Mar 2011 14:11:20 +0000 Subject: [DAS] Adjacent feature extension In-Reply-To: <4796FEEA-3D6A-43A4-A76D-2212E2F690B6@ebi.ac.uk> References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>

<4796FEEA-3D6A-43A4-A76D-2212E2F690B6@ebi.ac.uk> Message-ID: On 7 Mar 2011, at 12:43, Andy Jenkinson wrote: > On 7 Mar 2011, at 11:51, Jonathan Warren wrote: > >> On 7 Mar 2011, at 11:19, Andy Jenkinson wrote: >> >>> On 7 Mar 2011, at 10:57, Jonathan Warren wrote: >>> >>>> >>>> My vote would ideally to change feature_by_id to return one >>>> feature and have the adjacent_feature as returning one feature. >>>> This in my opinion would mean these capabilities on servers do >>>> "exactly as they say on the tin" and would be easier to implement >>>> for data providers and are thus more likely to be implemented? >>>> If the feature_id capability as it stands is needed it could be >>>> changed to something more akin to what it means like >>>> feature_id_region but I would bet no one would bother to change >>>> it/use it? >>>> >>>> However the reality is that we are too late to change the old >>>> feature_by_id, but I don't think we need to make the same mistake >>>> twice by repeating it for adjacent_features? >>> >>> I disagree. I think the problems with feature-by-id are that a) >>> the name of the capability implies singular, and b) the concept >>> itself (i.e. getting a feature by its ID) is such a common >>> operation that is otherwise missing in DAS. I don't think either >>> of those apply to an "adjacent" capability unless you specifically >>> choose to call it "adjacent-feature" as opposed to "adjacent- >>> features". I honestly don't think a capability called "adjacent- >>> features" with a query structure like "/das/features?adjacent=foo: >>> 1" implies singular, rather the opposite in fact. To me that query >>> suggests "get me the features adjacent to foo:1". True that 2 >>> features is plural which still leaves a "one feature either side" >>> interpretation possible, but IMO certainly not implicit enough to >>> stop anyone implementing it to actually read the specification/ >>> documentation. 
Add to that the fact that this is an entirely new >>> behaviour that we have the chance to properly document and make it >>> clear exactly what the server must do. >>> >>> So IMO we have a clear choice. >> I still think it's simpler to implement it for one feature either >> side and keep complexity in the client. Generally how many people >> stay awake after line 10 when reading the spec? :) Let's see if there >> are more votes... > > It probably is simpler to implement (well, to implement with maximum > efficiency) and I am not advocating one over the other, but IMO the > implementation considerations are a separate part of our choice and > are orthogonal to whether it's confusing for those implementing it > and consequently whether we see divergence from the spec like we do > with feature-by-id. As Gustavo says, he'd implement feature-by-id as > one feature because that's what he thinks it means, not because it's > difficult. I'd posit that it'd be a one line change for any server > maintainer to fix theirs to implement it correctly (i.e. use the > feature's start/end to resubmit the query), it's just that it'd be > more complicated to do it in a single step from the beginning. > > We should be under no illusions though that people are going to be > able to implement this easily without reading the documentation > carefully, no matter which option is chosen.

Good template methods and/or examples in tutorials will encourage use of this command.

> In particular, I can foresee servers not interpreting the "type" > filter appropriately, being likely to process the adjacent query > then apply the type filter, which would be wrong. I have a feeling > most sources implement the type filter as a passive "post filter" > rather than an active one.
I can tell you right now that it is going > to be really quite difficult for me to implement "adjacent" > correctly for the ASTD gene/transcript/exon sources, and I suspect > the same will be true for retrofitting lots of other sources. This is an optional capability though right? > >>> >>> As to feature-by-id, I know changing behaviour is potentially a >>> very disruptive change, but I think we can potentially do this >>> purely because servers don't tend to implement it correctly >>> anyway. Clients can happily filter out any additional features >>> returned by old servers, and if any clients are reliant on the >>> server including all overlapping features then as far as I am >>> concerned they are either a) targeting specific servers rather >>> than DAS-wide and thus unaffected, or b) already broken :) >> So you agree feature-by_id should be changed if we have the stomach >> for it? - good and Gustavo too. Well done Andy - You have just >> agreed to write Spec 1.7 or 3??? ;) Your argument above can be used >> for leaving the spec as it is then as well - but ideally I agree >> and guess we can call it spec 1.61 assuming other people agree. > > I already have a small list of changes for DAS 1.7 or whatever and > think it's fine for that context. In any case, let's keep these two > issues separate as Thomas says. I was really hoping not to do another major spec revision for at least 3 years and to focus on extensions giving new capabilities- otherwise for the core capabilities everyone is always playing catch up! This maybe something to discuss at some point soon. > >>> >>> I have to admit that the feature-by-id capability is one of the >>> (many) things I loathe having to explain and would love to change >>> it. Doing so would be consistent with what we were trying to do >>> with 1.6 (i.e. rationalise existing use of the spec) but I >>> chickened out really. 
>>> >>> Cheers, >>> Andy

Jonathan Warren Senior Developer and DAS coordinator blog: http://biodasman.wordpress.com/ jw12 at sanger.ac.uk Ext: 2314 Telephone: 01223 492314

From thomas.a.down at gmail.com Mon Mar 7 09:27:12 2011 From: thomas.a.down at gmail.com (Thomas Down) Date: Mon, 7 Mar 2011 14:27:12 +0000 Subject: [DAS] Adjacent feature extension In-Reply-To: References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>

<4796FEEA-3D6A-43A4-A76D-2212E2F690B6@ebi.ac.uk> Message-ID: On Mon, Mar 7, 2011 at 2:11 PM, Jonathan Warren wrote: > In particular, I can foresee servers not interpreting the "type" filter >> appropriately, being likely to process the adjacent query then apply the >> type filter, which would be wrong. I have a feeling most sources implement >> the type filter as a passive "post filter" rather than an active one. I can >> tell you right now that it is going to be really quite difficult for me to >> implement "adjacent" correctly for the ASTD gene/transcript/exon sources, >> and I suspect the same will be true for retrofitting lots of other sources. >> > This is an optional capability though right? >

That is definitely the intent of my proposal. Even if this eventually gets rolled into the spec, my suggestion would be to keep it optional (negotiated via a CAPABILITY).

Thomas.

From thomas.a.down at gmail.com Mon Mar 7 09:41:31 2011 From: thomas.a.down at gmail.com (Thomas Down) Date: Mon, 7 Mar 2011 14:41:31 +0000 Subject: [DAS] Locations between bases Message-ID:

The day has come when I find I need to be able to distinguish between a feature which actually *covers* one or a small number of bases (e.g. a SNP) and a feature which refers to a point between two bases (e.g. an insertion). Have any other DAS folk dealt with this before, and if so how?

One possible approach would be something like:

<START>30000000</START>
<END>29999999</END>

...as a way to reference the position between the 29999999th and 30000000th bases in a sequence... but is this going to break lots of existing client code?

(I also rather dislike it because there's currently sanity-checking code in Dazzle to stop you doing stuff like this...)

Any better ideas?

Thomas.

PS. Yes, I'm aware this is a solved problem in DAS/2. Afraid I need a DAS/1 solution, though.
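A minimal sketch of the proposal above, assuming 1-based inclusive start/end coordinates as used for DAS/1 features. The names `classify_location` and `passes_naive_check` are illustrative only and are not part of Dazzle or any DAS library. It shows how a client could read start = end + 1 as the gap between two bases, and why a conventional start <= end sanity check would reject such a feature.

```python
def classify_location(start, end):
    """Interpret a 1-based, inclusive (start, end) pair.

    Hypothetical helper: treats start == end + 1 as the point between
    base `end` and base `start`, as proposed in the message above.
    """
    if start <= end:
        # Ordinary feature covering one or more bases, e.g. a SNP.
        return ("covers", start, end)
    if start == end + 1:
        # Zero-width location between base `end` and base `start`.
        return ("between", end, start)
    raise ValueError(f"start {start} is more than one base past end {end}")


def passes_naive_check(start, end):
    """The kind of sanity check that would reject the interbase encoding."""
    return start <= end


print(classify_location(30000000, 30000000))  # a single-base feature
print(classify_location(30000000, 29999999))  # the insertion site
print(passes_naive_check(30000000, 29999999))
```

A client that already validates start <= end would need to relax that check before it could accept this encoding, which is the compatibility risk raised in the message.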
From andy.jenkinson at ebi.ac.uk Mon Mar 7 10:04:32 2011 From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson) Date: Mon, 7 Mar 2011 15:04:32 +0000 Subject: [DAS] Adjacent feature extension In-Reply-To: <6E8B83BF-8B65-41F8-B697-696431D5176E@sanger.ac.uk> References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>

<6E8B83BF-8B65-41F8-B697-696431D5176E@sanger.ac.uk> Message-ID: On 7 Mar 2011, at 12:16, Jonathan Warren wrote: > I'd say if we don't have any more objections in the next couple of days then go with your proposal as is? I'll then put support into the registry this week if that is the case. If you could also then copy the proposal from here https://github.com/dasmoth/dalliance/wiki/AdjacentFeatures to the extensions page here: > http://www.biodas.org/wiki/DAS1.6E#Adjacent_Feature_filter noting in large letters that it was agreed by the community on such a such a date?

I think there is a lot left to be clarified so adopting it "as is" is a no go for me. In particular, take a look at this diagram and see if you can work out what will be returned with "adjacent" queries for either side of the viewing area, and do they make sense for what the client is trying to achieve?

[Attachment scrubbed by the list archive: DAS-Adjacent.png, image/png, 40385 bytes]

The client has "seen" gene 2 and all its parts.

If the client asks for features adjacent to the left/right sides of the viewing area, what should the server return?

To the left: gene 1, transcript 1, exon 2, or SNP 1? Does it matter? Should gene 1 and all its parts be returned because that is what happens in a segment query? If not, is this confusing for the spec?

To the right: gene 2, transcript 2, exon 4, transcript 3, exon 5, or SNP 2?

If no special arrangements are made for excluding overlapping features (thus either gene 2, transcript 2 or exon 4 are returned above), what position should the client submit instead in its overlap query to get SNP 2 and SNP 3? (Hint: it's impossible to get SNP 2).

What if the genes were nonpositional features?
None of the above are unresolvable problems. The simplest way is to say that overlapping features should be returned, and that clients should not try to jump beyond them. It means you don't really get a "next feature" capability in the same way a user probably intends (i.e. "next gene"), but does make it impossible to miss transcript 3 and SNP 2 in the above diagram. It still needs to be defined which feature will be returned if multiple have the same position in that case (ideally we want the one that extends the furthest in the adjacent direction), but that can be complicated by things like nonpositional parent features etc.

The alternative is to exclude overlapping features, but that raises the question of whether parts are considered overlapping if their parents are. Implementing "next gene" means excluding all of gene 2's transcripts and exons (thus returning SNP 2), but is this even what the user meant? Maybe they meant "next transcript".

So not simple. But the main questions are:
1. should overlapping features be excluded in overlap requests
2. if so, should non-overlapping features with overlapping parents/parts be excluded?
3. separate from 1 and 2, should the nearest feature's parents and parts also be returned?

From andy.jenkinson at ebi.ac.uk Mon Mar 7 10:12:32 2011 From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson) Date: Mon, 7 Mar 2011 15:12:32 +0000 Subject: [DAS] Adjacent feature extension In-Reply-To: References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>

<4796FEEA-3D6A-43A4-A76D-2212E2F690B6@ebi.ac.uk> Message-ID: On 7 Mar 2011, at 14:11, Jonathan Warren wrote: > > On 7 Mar 2011, at 12:43, Andy Jenkinson wrote: > >> In particular, I can foresee servers not interpreting the "type" filter appropriately, being likely to process the adjacent query then apply the type filter, which would be wrong. I have a feeling most sources implement the type filter as a passive "post filter" rather than an active one. I can tell you right now that it is going to be really quite difficult for me to implement "adjacent" correctly for the ASTD gene/transcript/exon sources, and I suspect the same will be true for retrofitting lots of other sources. > This is an optional capability though right? Of course, I'm just saying it's difficult for servers to implement so we should expect to see bad implementations exactly as we do with feature-by-id. >> >>>> >>>> As to feature-by-id, I know changing behaviour is potentially a very disruptive change, but I think we can potentially do this purely because servers don't tend to implement it correctly anyway. Clients can happily filter out any additional features returned by old servers, and if any clients are reliant on the server including all overlapping features then as far as I am concerned they are either a) targeting specific servers rather than DAS-wide and thus unaffected, or b) already broken :) >>> So you agree feature-by_id should be changed if we have the stomach for it? - good and Gustavo too. Well done Andy - You have just agreed to write Spec 1.7 or 3??? ;) Your argument above can be used for leaving the spec as it is then as well - but ideally I agree and guess we can call it spec 1.61 assuming other people agree. >> >> I already have a small list of changes for DAS 1.7 or whatever and think it's fine for that context. In any case, let's keep these two issues separate as Thomas says. 
> > I was really hoping not to do another major spec revision for at least 3 years and to focus on extensions giving new capabilities- otherwise for the core capabilities everyone is always playing catch up! This maybe something to discuss at some point soon. >> >>>> >>>> I have to admit that the feature-by-id capability is one of the (many) things I loathe having to explain and would love to change it. Doing so would be consistent with what we were trying to do with 1.6 (i.e. rationalise existing use of the spec) but I chickened out really. >>>> >>>> Cheers, >>>> Andy

From andy.jenkinson at ebi.ac.uk Mon Mar 7 10:21:59 2011 From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson) Date: Mon, 7 Mar 2011 15:21:59 +0000 Subject: [DAS] Locations between bases In-Reply-To: References: Message-ID:

Oh Thomas, why do you do it to us?

May I ask what the specific application is? Depending on what you need, an insertion can be modelled with a feature covering both bases with a specific glyph (or type).
I think what you propose is a common approach, but it is very likely going to break things for some clients (or at least give undefined results). Some additional element would be more compatible. On 7 Mar 2011, at 14:41, Thomas Down wrote: > The day has come when I find I need to be able to distinguish between a > feature which actually *covers* one or a small number of bases (e.g. a SNP) > and a feature which refers to a point between two bases (e.g. an > insertion). Have any other DAS folk dealt with this before, and if so how? > One possible approach would be something like: > > 30000000 > 29999999 > > ...as a way to reference the position between the 29999999th and 30000000th > bases in a sequence... but is this going to break lots of existing client > code? > > (I also rather dislike it because there's currently sanity-checking code in > Dazzle to stop you doing stuff like this...) > > Any better ideas? > > Thomas. > > PS. Yes, I'm aware this is a solved problem in DAS/2. Afraid I need a DAS/1 > solution, though. > _______________________________________________ > DAS mailing list > DAS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das From birney at ebi.ac.uk Mon Mar 7 10:26:59 2011 From: birney at ebi.ac.uk (Ewan Birney) Date: Mon, 7 Mar 2011 15:26:59 +0000 Subject: [DAS] Locations between bases In-Reply-To: References:

Message-ID: I also note that this is stretching DAS's mindset away from "DAS is to support visualisation of data on clients for human users to read" to "DAS semantically represents things accurately for programmatic decisions" I think one is better off keeping these things separate. On 7 Mar 2011, at 15:21, Andy Jenkinson wrote: > Oh Thomas, why do you do it to us? > > May I ask what the specific application is? Depending on what you > need, an insertion can be modelled with a feature covering both > bases with a specific glyph (or type). > > I think what you propose is a common approach, but it is very likely > going to break things for some clients (or at least give undefined > results). Some additional element would be more compatible. > > On 7 Mar 2011, at 14:41, Thomas Down wrote: > >> The day has come when I find I need to be able to distinguish >> between a >> feature which actually *covers* one or a small number of bases >> (e.g. a SNP) >> and a feature which refers to a point between two bases (e.g. an >> insertion). Have any other DAS folk dealt with this before, and if >> so how? >> One possible approach would be something like: >> >> 30000000 >> 29999999 >> >> ...as a way to reference the position between the 29999999th and >> 30000000th >> bases in a sequence... but is this going to break lots of existing >> client >> code? >> >> (I also rather dislike it because there's currently sanity-checking >> code in >> Dazzle to stop you doing stuff like this...) >> >> Any better ideas? >> >> Thomas. >> >> PS. Yes, I'm aware this is a solved problem in DAS/2. Afraid I >> need a DAS/1 >> solution, though. 
From jw12 at sanger.ac.uk Mon Mar 7 10:49:12 2011 From: jw12 at sanger.ac.uk (Jonathan Warren) Date: Mon, 7 Mar 2011 15:49:12 +0000 Subject: [DAS] Adjacent feature extension In-Reply-To: References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>

<6E8B83BF-8B65-41F8-B697-696431D5176E@sanger.ac.uk> Message-ID: <2869357C-159E-4D2D-A169-B91D63E31680@sanger.ac.uk> On 7 Mar 2011, at 15:04, Andy Jenkinson wrote: > On 7 Mar 2011, at 12:16, Jonathan Warren wrote: > >> I'd say if we don't have any more objections in the next couple of >> days then go with your proposal as is? I'll then put support into >> the registry this week if that is the case. If you could also then >> copy the proposal from here https://github.com/dasmoth/dalliance/wiki/AdjacentFeatures >> to the extensions page here: >> http://www.biodas.org/wiki/DAS1.6E#Adjacent_Feature_filter noting >> in large letters that it was agreed by the community on such a such >> a date? > > I think there is a lot left to be clarified so adopting it "as is" > is a no go for me. In particular, take a look at this diagram and > see if you can work out what will be returned with "adjacent" > queries for either side of the viewing area, and do they make sense > for what the client is trying to achieve? > > > The client has "seen" gene 2 and all its parts. > > If the client asks for features adjacent to the left/right sides of > the viewing area, what should the server return? I don't think it makes sense to ask for a next right in this case as there are features here already. This is for sparse data sources so it's ok just to return whats there if someone specifically wants to hit the next feature button or a client can blank the next right button out. It's up to the client. Next left should return SNP1 if asked for an adjacent request.... or genes and constituents if filtered on gene. If you take the intention of this as finding features where data is sparse then I don't think there are big issues. Part of the point of the extensions phase is to try these things out with examples and refine the specs. To leave acceptance of this will be a big mistake in my view. > To the left: gene 1, transcript 1, exon 2, or SNP 1? Does it matter? 
> Should gene 1 and all its parts be returned because that is what > happens in a segment query? If not, is this confusing for the spec? > To the right: gene 2, transcript 2, exon 4, transcript 3, exon 5, or > SNP 2? > > If no special arrangements are made for excluding overlapping > features (thus either gene 2, transcript 2 or exon 4 are returned > above), what position should the client submit instead in its > overlap query to get SNP 2 and SNP 3? (Hint: it's impossible to get > SNP 2). > > What if the genes were nonpositional features? > > > None of the above are unresolvable problems, the simplest way it to > say that overlapping features should be returned, and that clients > should not try to jump beyond them. It means you don't really get a > "next feature" capability in the same way a user probably intends > (i.e. "next gene"), but does make it impossible to miss transcript 3 > and SNP 2 in the above diagram. It still needs to be defined which > feature will be returned if multiple have the same position that > case (ideally we want the one that extends the furthers in the > adjacent direction), but that can be complicated by things like > nonpositional parent features etc. > > The alternative is to exclude overlapping features, but that raises > the question of whether parts are considered overlapping if their > parents are. Implementing "next gene" means excluding all of gene > 2's transcripts and exons (thus returning SNP 2), but is this even > what the user meant? Maybe they meant "next transcript". > > So not simple. But the main questions are: > 1. should overlapping features be excluded in overlap requests > 2. if so, should non-overlapping features with overlapping parents/ > parts be excluded? > 3. separate from 1 and 2, should the nearest feature's parents and > parts also be returned? 
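The questions quoted above can be made concrete with a small sketch. It assumes features are plain records with 1-based inclusive coordinates; `Feature`, `adjacent_right`, `adjacent_right_post_filter` and the `exclude_overlapping` flag are hypothetical names, not part of the DAS specification or the AdjacentFeatures proposal, and the sample features are invented for illustration. It applies the type filter before choosing the nearest feature (an "active" filter) and contrasts that with the passive post-filter that the thread warns against.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Feature:
    id: str
    type: str
    start: int  # 1-based, inclusive
    end: int


def adjacent_right(features: List[Feature], pos: int,
                   type_filter: Optional[str] = None,
                   exclude_overlapping: bool = True) -> Optional[Feature]:
    """Return the nearest feature to the right of `pos`, or None.

    The type filter is applied *before* the nearest feature is chosen.
    With exclude_overlapping=False, features spanning `pos` also qualify.
    """
    candidates = [f for f in features
                  if type_filter is None or f.type == type_filter]
    if exclude_overlapping:
        candidates = [f for f in candidates if f.start > pos]
    else:
        candidates = [f for f in candidates if f.end >= pos]
    # Nearest start first; ties go to the feature extending furthest right.
    return min(candidates, key=lambda f: (f.start, -f.end), default=None)


def adjacent_right_post_filter(features: List[Feature], pos: int,
                               type_filter: str) -> Optional[Feature]:
    """The passive approach: find the nearest feature, then filter by type."""
    nearest = adjacent_right(features, pos)
    if nearest is not None and nearest.type != type_filter:
        return None  # the nearest feature was of the wrong type
    return nearest


# Invented sample data: a SNP sits between the query position and a gene.
features = [
    Feature("snp2", "SNP", 120, 120),
    Feature("gene3", "gene", 200, 400),
]
print(adjacent_right(features, 100, type_filter="gene"))
print(adjacent_right_post_filter(features, 100, "gene"))
```

With the active filter the gene is found; with the post-filter the nearer SNP is selected first and then discarded, so nothing comes back. Questions 2 and 3 (parents and parts) are deliberately left out of this sketch because the thread has not settled them.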
Jonathan Warren Senior Developer and DAS coordinator blog: http://biodasman.wordpress.com/ jw12 at sanger.ac.uk Ext: 2314 Telephone: 01223 492314

From thomas.a.down at gmail.com Mon Mar 7 11:03:50 2011 From: thomas.a.down at gmail.com (Thomas Down) Date: Mon, 7 Mar 2011 16:03:50 +0000 Subject: [DAS] Locations between bases In-Reply-To: References:

Message-ID: Well, that's a separate debate, and I don't really want to open it at the moment. However, right now I *am* thinking quite specifically of visualization issues. Once you're zoomed in to base-pair resolution, there is a meaningful distinction between "on a base" and "between a base". Andy: what I'm trying to model is insertion sites. I guess that's the main use for this kind of thing. Thomas. On Mon, Mar 7, 2011 at 3:26 PM, Ewan Birney wrote: > > I also note that this is stretching DAS's mindset away from > > "DAS is to support visualisation of data on clients for human users to > read" > > to > > "DAS semantically represents things accurately for programmatic > decisions" > > > I think one is better off keeping these things separate. > > > > On 7 Mar 2011, at 15:21, Andy Jenkinson wrote: > > Oh Thomas, why do you do it to us? >> >> May I ask what the specific application is? Depending on what you need, an >> insertion can be modelled with a feature covering both bases with a specific >> glyph (or type). >> >> I think what you propose is a common approach, but it is very likely going >> to break things for some clients (or at least give undefined results). Some >> additional element would be more compatible. >> >> On 7 Mar 2011, at 14:41, Thomas Down wrote: >> >> The day has come when I find I need to be able to distinguish between a >>> feature which actually *covers* one or a small number of bases (e.g. a >>> SNP) >>> and a feature which refers to a point between two bases (e.g. an >>> insertion). Have any other DAS folk dealt with this before, and if so >>> how? >>> One possible approach would be something like: >>> >>> 30000000 >>> 29999999 >>> >>> ...as a way to reference the position between the 29999999th and >>> 30000000th >>> bases in a sequence... but is this going to break lots of existing client >>> code? >>> >>> (I also rather dislike it because there's currently sanity-checking code >>> in >>> Dazzle to stop you doing stuff like this...) 
>>> >>> Any better ideas? >>> >>> Thomas. >>> >>> PS. Yes, I'm aware this is a solved problem in DAS/2. Afraid I need a >>> DAS/1 >>> solution, though.

From birney at ebi.ac.uk Mon Mar 7 11:07:15 2011 From: birney at ebi.ac.uk (Ewan Birney) Date: Mon, 7 Mar 2011 16:07:15 +0000 Subject: [DAS] Locations between bases In-Reply-To: References:

Message-ID: <3D11AC15-23B4-4AF1-AF90-136AFAF1540A@ebi.ac.uk>

I'd say this is start=end, and a convention that this indicates the point to the left or the right with a glyph. With this convention, for the edge case one either has to allow the 0 base (right-based insertion convention, insertion at the very start) or the end+1 position (left-based insertion convention, insertion at the very end).

I know, this triggers doing things on inter-base coordinates, but that's just a big, big change.

On 7 Mar 2011, at 16:03, Thomas Down wrote: > Well, that's a separate debate, and I don't really want to open it > at the moment. > > However, right now I am thinking quite specifically of visualization > issues. Once you're zoomed in to base-pair resolution, there is a > meaningful distinction between "on a base" and "between a base". > > Andy: what I'm trying to model is insertion sites. I guess that's > the main use for this kind of thing. > > Thomas. > > > > On Mon, Mar 7, 2011 at 3:26 PM, Ewan Birney wrote: > > I also note that this is stretching DAS's mindset away from > > "DAS is to support visualisation of data on clients for human > users to read" > > to > > "DAS semantically represents things accurately for programmatic > decisions" > > > I think one is better off keeping these things separate. > > > > On 7 Mar 2011, at 15:21, Andy Jenkinson wrote: > > Oh Thomas, why do you do it to us? > > May I ask what the specific application is? Depending on what you > need, an insertion can be modelled with a feature covering both > bases with a specific glyph (or type). > > I think what you propose is a common approach, but it is very likely > going to break things for some clients (or at least give undefined > results). Some additional element would be more compatible. > > On 7 Mar 2011, at 14:41, Thomas Down wrote: > > The day has come when I find I need to be able to distinguish > between a > feature which actually *covers* one or a small number of bases (e.g.
> a SNP) > and a feature which refers to a point between two bases (e.g. an > insertion). Have any other DAS folk dealt with this before, and if > so how? > One possible approach would be something like: > > 30000000 > 29999999 > > ...as a way to reference the position between the 29999999th and > 30000000th > bases in a sequence... but is this going to break lots of existing > client > code? > > (I also rather dislike it because there's currently sanity-checking > code in > Dazzle to stop you doing stuff like this...) > > Any better ideas? > > Thomas. > > PS. Yes, I'm aware this is a solved problem in DAS/2. Afraid I need > a DAS/1 > solution, though. > _______________________________________________ > DAS mailing list > DAS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das > > > _______________________________________________ > DAS mailing list > DAS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das > > From thomas.a.down at gmail.com Mon Mar 7 11:10:35 2011 From: thomas.a.down at gmail.com (Thomas Down) Date: Mon, 7 Mar 2011 16:10:35 +0000 Subject: [DAS] Locations between bases In-Reply-To: <3D11AC15-23B4-4AF1-AF90-136AFAF1540A@ebi.ac.uk> References:

<3D11AC15-23B4-4AF1-AF90-136AFAF1540A@ebi.ac.uk> Message-ID: On Mon, Mar 7, 2011 at 4:07 PM, Ewan Birney wrote: > > I'd say this is start=end, and a convention that this indicates the point > to the left or the right with a glyph. With this converntion for the edge > case one either > has to allow the 0 base (right based insertion convention, insertion at the > very > start) or end+1 position (left based insertion convention, insertion at the > every end). > Thanks Ewan (+Ed and Andy), that sounds best. I'm going to need a non-standard stylesheet glyph to do what I'm after as well, so perhaps it's best to attach the "interbaseness" to that, rather than the feature itself. > I know, this triggers doing things on inter-base coordinates, but that's > just > a big, big change. > Yep, that's not really an option. Thomas. From andy.jenkinson at ebi.ac.uk Mon Mar 7 11:19:03 2011 From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson) Date: Mon, 7 Mar 2011 16:19:03 +0000 Subject: [DAS] Adjacent feature extension In-Reply-To: <2869357C-159E-4D2D-A169-B91D63E31680@sanger.ac.uk> References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>

<6E8B83BF-8B65-41F8-B697-696431D5176E@sanger.ac.uk> <2869357C-159E-4D2D-A169-B91D63E31680@sanger.ac.uk> Message-ID: <3F20D5FA-A369-4045-A865-728D2D7312ED@ebi.ac.uk> On 7 Mar 2011, at 15:49, Jonathan Warren wrote: > > On 7 Mar 2011, at 15:04, Andy Jenkinson wrote: > >> On 7 Mar 2011, at 12:16, Jonathan Warren wrote: >> >>> I'd say if we don't have any more objections in the next couple of days then go with your proposal as is? I'll then put support into the registry this week if that is the case. If you could also then copy the proposal from here https://github.com/dasmoth/dalliance/wiki/AdjacentFeatures to the extensions page here: >>> http://www.biodas.org/wiki/DAS1.6E#Adjacent_Feature_filter noting in large letters that it was agreed by the community on such a such a date? >> >> I think there is a lot left to be clarified so adopting it "as is" is a no go for me. In particular, take a look at this diagram and see if you can work out what will be returned with "adjacent" queries for either side of the viewing area, and do they make sense for what the client is trying to achieve? >> >> >> The client has "seen" gene 2 and all its parts. >> >> If the client asks for features adjacent to the left/right sides of the viewing area, what should the server return? > I don't think it makes sense to ask for a next right in this case as there are features here already. This is for sparse data sources so it's ok just to return whats there if someone specifically wants to hit the next feature button or a client can blank the next right button out. It's up to the client. Agree. But what I don't want to see is clients implementing some weird hybrid where they offer a "next right" button that bypasses SNP 2. If we expect clients to behave in a certain way, we should say so. > Next left should return SNP1 if asked for an adjacent request.... or genes and constituents if filtered on gene. Why SNP 1, and not any of the others at the same position? 
How is the server supposed to decide? Does it matter? How would this be worded in the spec? > If you take the intention of this as finding features where data is sparse then I don't think there are big issues. These aren't big issues (taken in that context), but I absolutely want to make sure we don't make the mistakes of the past by leaving ambiguity in the spec - whether it's an extension or otherwise. It's all very well us knowing what we designed it for, but if it isn't written down then it's going to cause problems. For the avoidance of doubt, I am very keen to get this done, but I see no sense in doing it in a way that we're going to regret later (of which there are already countless examples). > Part of the point of the extensions phase is to try these things out with examples and refine the specs. To leave acceptance of this will be a big mistake in my view. I'm not sure what you mean by "leave acceptance". I'm trying to work through these things, not put blocks in the way. I am trying to implement it right now, and these are the things I have immediately come up against, so I need input now on how to do it. Or to put it another way, I can't create my example without refining the specs. From jw12 at sanger.ac.uk Mon Mar 7 11:31:13 2011 From: jw12 at sanger.ac.uk (Jonathan Warren) Date: Mon, 7 Mar 2011 16:31:13 +0000 Subject: [DAS] Adjacent feature extension In-Reply-To: <3F20D5FA-A369-4045-A865-728D2D7312ED@ebi.ac.uk> References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>

<6E8B83BF-8B65-41F8-B697-696431D5176E@sanger.ac.uk> <2869357C-159E-4D2D-A169-B91D63E31680@sanger.ac.uk> <3F20D5FA-A369-4045-A865-728D2D7312ED@ebi.ac.uk> Message-ID: <15C814D8-14CD-487B-9AD9-2D5ECAABE6E0@sanger.ac.uk> On 7 Mar 2011, at 16:19, Andy Jenkinson wrote: > On 7 Mar 2011, at 15:49, Jonathan Warren wrote: > >> >> On 7 Mar 2011, at 15:04, Andy Jenkinson wrote: >> >>> On 7 Mar 2011, at 12:16, Jonathan Warren wrote: >>> >>>> I'd say if we don't have any more objections in the next couple >>>> of days then go with your proposal as is? I'll then put support >>>> into the registry this week if that is the case. If you could >>>> also then copy the proposal from here https://github.com/dasmoth/dalliance/wiki/AdjacentFeatures >>>> to the extensions page here: >>>> http://www.biodas.org/wiki/DAS1.6E#Adjacent_Feature_filter noting >>>> in large letters that it was agreed by the community on such a >>>> such a date? >>> >>> I think there is a lot left to be clarified so adopting it "as is" >>> is a no go for me. In particular, take a look at this diagram and >>> see if you can work out what will be returned with "adjacent" >>> queries for either side of the viewing area, and do they make >>> sense for what the client is trying to achieve? >>> >>> >>> The client has "seen" gene 2 and all its parts. >>> >>> If the client asks for features adjacent to the left/right sides >>> of the viewing area, what should the server return? >> I don't think it makes sense to ask for a next right in this case >> as there are features here already. This is for sparse data sources >> so it's ok just to return whats there if someone specifically wants >> to hit the next feature button or a client can blank the next right >> button out. It's up to the client. > > Agree. But what I don't want to see is clients implementing some > weird hybrid where they offer a "next right" button that bypasses > SNP 2. If we expect clients to behave in a certain way, we should > say so. 
> >> Next left should return SNP1 if asked for an adjacent request.... >> or genes and constituents if filtered on gene. > > Why SNP 1, and not any of the others at the same position? How is > the server supposed to decide? Does it matter? How would this be > worded in the spec? It doesn't matter unless filtered. > >> If you take the intention of this as finding features where data is >> sparse then I don't think there are big issues. > > These aren't big issues (taken in that context), but I absolutely > want to make sure we don't make the mistakes of the past by leaving > ambiguity in the spec - whether it's an extension or otherwise. It's > all very well us knowing what we designed it for, but if it isn't > written down then it's going to cause problems. For the avoidance of > doubt, I am very keen to get this done, but I see no sense in doing > it in a way that we're not going to regret later (of which there are > already countless examples). > >> Part of the point of the extensions phase is to try these things >> out with examples and refine the specs. To leave acceptance of this >> will be a big mistake in my view. > > I'm not sure what you mean by "leave acceptance". I'm trying to work > through these things, not put blocks in the way. I am trying right > now to implement it and these are the things I have immediately come > up against so I need to get input right now on how to do it. Cool! I'll shut up then. > Or to put another way, I can't create my example without refining > the specs. Jonathan Warren Senior Developer and DAS coordinator blog: http://biodasman.wordpress.com/ jw12 at sanger.ac.uk Ext: 2314 Telephone: 01223 492314 -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
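[Editorial sketch, not part of the original thread.] The "next left" / "next right" behaviour discussed above can be illustrated with a small Python sketch. It is only one possible reading of the discussion at this point: it assumes that features overlapping the query position are excluded (a question the thread has not yet settled), that ties at the same position are resolved arbitrarily ("it doesn't matter unless filtered"), and that an optional type filter narrows the candidates first. The names Feature and adjacent_feature are hypothetical and do not come from the DAS specification or from any DAS server.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Feature:
    """A positional feature. Field names are illustrative, not DAS XML names."""
    id: str
    type: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def adjacent_feature(
    features: Iterable[Feature],
    position: int,
    direction: str,
    type_filter: Optional[str] = None,
) -> Optional[Feature]:
    """Return one feature lying strictly beyond `position` in `direction`.

    Assumption: features overlapping `position` are skipped. When several
    features are equally close, whichever comes first in the input wins,
    which stands in for "the server picks".
    """
    if direction not in ("left", "right"):
        raise ValueError("direction must be 'left' or 'right'")

    # Apply the optional type filter before looking for the nearest feature.
    candidates = [f for f in features if type_filter is None or f.type == type_filter]

    if direction == "right":
        beyond = [f for f in candidates if f.start > position]
        return min(beyond, key=lambda f: f.start, default=None)

    beyond = [f for f in candidates if f.end < position]
    return max(beyond, key=lambda f: f.end, default=None)
```

Under these assumptions a client could call adjacent_feature(features, viewport_start, "left") for a "previous feature" button, and pass type_filter="gene" when the user has filtered on genes. Returning a feature's parents and parts alongside it is deliberately left out of the sketch.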
From thomas.a.down at gmail.com Mon Mar 7 11:37:06 2011 From: thomas.a.down at gmail.com (Thomas Down) Date: Mon, 7 Mar 2011 16:37:06 +0000 Subject: [DAS] Adjacent feature extension In-Reply-To: References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>

<6E8B83BF-8B65-41F8-B697-696431D5176E@sanger.ac.uk> Message-ID: On Mon, Mar 7, 2011 at 3:04 PM, Andy Jenkinson wrote: > What if the genes were nonpositional features? > Addressing this one separately: I don't believe non-positional features are adjacent to anything, and => they should never be included as the result of an adjacent= filter. Happy to clarify this. Anyone want to argue this the other way? Thomas. From andy.jenkinson at ebi.ac.uk Mon Mar 7 11:44:16 2011 From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson) Date: Mon, 7 Mar 2011 16:44:16 +0000 Subject: [DAS] Adjacent feature extension In-Reply-To: <15C814D8-14CD-487B-9AD9-2D5ECAABE6E0@sanger.ac.uk> References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>

<6E8B83BF-8B65-41F8-B697-696431D5176E@sanger.ac.uk> <2869357C-159E-4D2D-A169-B91D63E31680@sanger.ac.uk> <3F20D5FA-A369-4045-A865-728D2D7312ED@ebi.ac.uk> <15C814D8-14CD-487B-9AD9-2D5ECAABE6E0@sanger.ac.uk> Message-ID: On 7 Mar 2011, at 16:31, Jonathan Warren wrote: > > On 7 Mar 2011, at 16:19, Andy Jenkinson wrote: > >> On 7 Mar 2011, at 15:49, Jonathan Warren wrote: >> >>> >>> On 7 Mar 2011, at 15:04, Andy Jenkinson wrote: >>> >>>> On 7 Mar 2011, at 12:16, Jonathan Warren wrote: >>>> >>>>> I'd say if we don't have any more objections in the next couple of days then go with your proposal as is? I'll then put support into the registry this week if that is the case. If you could also then copy the proposal from here https://github.com/dasmoth/dalliance/wiki/AdjacentFeatures to the extensions page here: >>>>> http://www.biodas.org/wiki/DAS1.6E#Adjacent_Feature_filter noting in large letters that it was agreed by the community on such a such a date? >>>> >>>> I think there is a lot left to be clarified so adopting it "as is" is a no go for me. In particular, take a look at this diagram and see if you can work out what will be returned with "adjacent" queries for either side of the viewing area, and do they make sense for what the client is trying to achieve? >>>> >>>> >>>> The client has "seen" gene 2 and all its parts. >>>> >>>> If the client asks for features adjacent to the left/right sides of the viewing area, what should the server return? >>> I don't think it makes sense to ask for a next right in this case as there are features here already. This is for sparse data sources so it's ok just to return whats there if someone specifically wants to hit the next feature button or a client can blank the next right button out. It's up to the client. >> >> Agree. But what I don't want to see is clients implementing some weird hybrid where they offer a "next right" button that bypasses SNP 2. If we expect clients to behave in a certain way, we should say so. 
>> >>> Next left should return SNP1 if asked for an adjacent request.... or genes and constituents if filtered on gene. >> >> Why SNP 1, and not any of the others at the same position? How is the server supposed to decide? Does it matter? How would this be worded in the spec? > It doesn't matter unless filtered. OK. Unless there are any objections, Thomas can you add this to the wiki page? Something like "If there is a choice of features at the same position, the server may return any one of them."? As for the overlap question, let's say that features overlapping the "adjacent" parameter can be included, as it's too complicated otherwise (I just thought of another edge case and it's not pretty!). We should also include a "design note" for client developers. For returning parents/parts, let's explicitly limit it to one feature. It does make it behave unlike the "normal" segment-based request, but it makes lots of the other issues, like nonpositional parents, moot. Everyone happy? From edgrif at sanger.ac.uk Mon Mar 7 11:02:29 2011 From: edgrif at sanger.ac.uk (Ed Griffiths) Date: Mon, 7 Mar 2011 16:02:29 +0000 (GMT) Subject: [DAS] Locations between bases In-Reply-To: References: Message-ID: Hi Thomas, A while ago I spent some time trying to persuade the GFF version 3 people not to use base positions to code this difference as it leads to ambiguity. I failed but I would urge you not to use positions to do it but instead have some other data that indicates that what you want is the position between two bases. Ed On Mon, 7 Mar 2011, Thomas Down wrote: > The day has come when I find I need to be able to distinguish between a > feature which actually *covers* one or a small number of bases (e.g. a SNP) > and a feature which refers to a point between two bases (e.g. an > insertion). Have any other DAS folk dealt with this before, and if so how? 
> One possible approach would be something like: > > 30000000 > 29999999 > > ...as a way to reference the position between the 29999999th and 30000000th > bases in a sequence... but is this going to break lots of existing client > code? > > (I also rather dislike it because there's currently sanity-checking code in > Dazzle to stop you doing stuff like this...) > > Any better ideas? > > Thomas. > > PS. Yes, I'm aware this is a solved problem in DAS/2. Afraid I need a DAS/1 > solution, though. > _______________________________________________ > DAS mailing list > DAS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das > -- ------------------------------------------------------------------------ | Ed Griffiths, Acedb/ZMap development, Informatics Group, | | The Morgan Building, Sanger Institute, Wellcome Trust Genome Campus | | Hinxton, Cambridge CB10 1HH | | | | email: edgrif at sanger.ac.uk Tel: +44-1223-496844 Fax: +44-1223-494919 | ------------------------------------------------------------------------ -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From thomas.a.down at gmail.com Mon Mar 7 11:51:10 2011 From: thomas.a.down at gmail.com (Thomas Down) Date: Mon, 7 Mar 2011 16:51:10 +0000 Subject: [DAS] Adjacent feature extension In-Reply-To: References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>

<6E8B83BF-8B65-41F8-B697-696431D5176E@sanger.ac.uk> Message-ID: There are several more-or-less separate issues tied up here that I see: 1. Handling of PART/PARENT. I neglected this completely in my original proposal. I can think of a couple of solutions: a) Treat a whole PART/PARENT graph as a single "complex" feature. This means that you'll always get full graphs back from any kind of feature query (modulo type filtering). If any part of the complex feature counts as "adjacent", then you'll get the whole thing back. b) Specify the adjacent= filter as ignoring any features with a PARENT attribute set. Jonathan/Andy, do either of you have an opinion on either of these two. I guess a) is probably the least likely to cause surprise? 2. My idea that an overlapping feature can count as adjacent. This initially seemed nice and simple but Andy's example of "SNP 2" has shown why it's broken when you have overlapping features. I'm quite happy to have the adjacent filter only select features that don't overlap the query position. 3. Features with matching start/end positions (therefore "equally adjacent"). I'm going to say "server picks" in this case. The client at least knows there's something there! I actually think this situation will be pretty rare in practice (for truly separate features, rather than having a transcript and exon starting in the same position). Does that tighten things up? Andy, thanks for thrashing this out. As you can probably work out, the use cases I've been working to involve rather sparsely-distributed features, but it's good to sort out the corner cases that arise as the density increases. Thomas. On Mon, Mar 7, 2011 at 3:04 PM, Andy Jenkinson wrote: > On 7 Mar 2011, at 12:16, Jonathan Warren wrote: > > > I'd say if we don't have any more objections in the next couple of days > then go with your proposal as is? I'll then put support into the registry > this week if that is the case. 
If you could also then copy the proposal from > here https://github.com/dasmoth/dalliance/wiki/AdjacentFeatures to the > extensions page here: > > http://www.biodas.org/wiki/DAS1.6E#Adjacent_Feature_filter noting in > large letters that it was agreed by the community on such a such a date? > > I think there is a lot left to be clarified so adopting it "as is" is a no > go for me. In particular, take a look at this diagram and see if you can > work out what will be returned with "adjacent" queries for either side of > the viewing area, and do they make sense for what the client is trying to > achieve? > > > > The client has "seen" gene 2 and all its parts. > > If the client asks for features adjacent to the left/right sides of the > viewing area, what should the server return? > To the left: gene 1, transcript 1, exon 2, or SNP 1? Does it matter? Should > gene 1 and all its parts be returned because that is what happens in a > segment query? If not, is this confusing for the spec? > To the right: gene 2, transcript 2, exon 4, transcript 3, exon 5, or SNP 2? > > If no special arrangements are made for excluding overlapping features > (thus either gene 2, transcript 2 or exon 4 are returned above), what > position should the client submit instead in its overlap query to get SNP 2 > and SNP 3? (Hint: it's impossible to get SNP 2). > > What if the genes were nonpositional features? > > > None of the above are unresolvable problems, the simplest way it to say > that overlapping features should be returned, and that clients should not > try to jump beyond them. It means you don't really get a "next feature" > capability in the same way a user probably intends (i.e. "next gene"), but > does make it impossible to miss transcript 3 and SNP 2 in the above diagram. 
> It still needs to be defined which feature will be returned if multiple have > the same position in that case (ideally we want the one that extends the > furthest in the adjacent direction), but that can be complicated by things > like nonpositional parent features etc. > > The alternative is to exclude overlapping features, but that raises the > question of whether parts are considered overlapping if their parents are. > Implementing "next gene" means excluding all of gene 2's transcripts and > exons (thus returning SNP 2), but is this even what the user meant? Maybe > they meant "next transcript". > > So not simple. But the main questions are: > 1. should overlapping features be excluded in overlap requests > 2. if so, should non-overlapping features with overlapping parents/parts be > excluded? > 3. separate from 1 and 2, should the nearest feature's parents and parts > also be returned? > From rmb32 at cornell.edu Mon Mar 7 11:38:13 2011 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 07 Mar 2011 11:38:13 -0500 Subject: [DAS] Google Summer of Code project ideas Message-ID: <4D7509F5.6010703@cornell.edu> Hi all, I'm going to be OBF project admin again this year for Google Summer of Code. OBF's application is due later this week, and we need to update our project ideas on the OBF wiki page and on each project's individual wiki pages. So, for each of the OBF projects that wants to do GSoC again this year, please: a.) Update the list of project ideas on your project's GSoC page (BioPython, BioPerl, BioRuby, etc). Add new ones, remove ones that have already been done or are no longer relevant, etc. b.) Update the list of project ideas on the main OBF GSoC page (http://www.open-bio.org/wiki/Google_Summer_of_Code) to match. c.) Let me know via email that you have done so and it's ready for Google to peruse. Please have the updates done, if possible, by this Friday (March 11). 
The number and quality of the project ideas are part of the evaluation process for whether OBF is accepted as a Summer of Code organization again this year, so let's come up with some good ones. :-) Rob ---- Robert Buels (prospective) 2011 OBF GSoC Organization Admin From lincoln.stein at gmail.com Mon Mar 7 12:29:16 2011 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Mon, 7 Mar 2011 09:29:16 -0800 Subject: [DAS] Locations between bases In-Reply-To: References: Message-ID: Hi All, The GFF3 convention is to make start==end and to have the flagged insertion occur to the right of the designated base. The elegant way to do this is to use Jim Kent's convention of half-open spans, in which case the coordinates apply to the position BETWEEN bases, but this causes too much confusion to satisfy a relatively rare requirement. Lincoln On Mon, Mar 7, 2011 at 8:02 AM, Ed Griffiths wrote: > Hi Thomas, > > A while ago I spent some time trying to persuade the GFF version 3 people > not to use base positions to code this difference as it leads to ambiguity. > I failed but I would urge you not to use positions to do it but instead have > some other data that indicates that what you want is the position between > two bases. > > Ed > > > > On Mon, 7 Mar 2011, Thomas Down wrote: > > The day has come when I find I need to be able to distinguish between a >> feature which actually *covers* one or a small number of bases (e.g. a >> SNP) >> and a feature which refers to a point between two bases (e.g. an >> insertion). Have any other DAS folk dealt with this before, and if so >> how? >> One possible approach would be something like: >> >> 30000000 >> 29999999 >> >> ...as a way to reference the position between the 29999999th and >> 30000000th >> bases in a sequence... but is this going to break lots of existing client >> code? >> >> (I also rather dislike it because there's currently sanity-checking code >> in >> Dazzle to stop you doing stuff like this...) >> >> Any better ideas? 
>> >> Thomas. >> >> PS. Yes, I'm aware this is a solved problem in DAS/2. Afraid I need a >> DAS/1 >> solution, though. >> > -- > ------------------------------------------------------------------------ > | Ed Griffiths, Acedb/ZMap development, Informatics Group, | > | The Morgan Building, Sanger Institute, Wellcome Trust Genome Campus | > | Hinxton, Cambridge CB10 1HH | > | | > | email: edgrif at sanger.ac.uk Tel: +44-1223-496844 Fax: +44-1223-494919 | > ------------------------------------------------------------------------ > -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From David.Nix at hci.utah.edu Mon Mar 7 11:38:10 2011 From: David.Nix at hci.utah.edu (David Nix) Date: Mon, 7 Mar 2011 09:38:10 -0700 Subject: [DAS] Locations between bases In-Reply-To: Message-ID: My two cents... I'm coming late to this thread, so pardon me if I'm missing something, but I'd strongly recommend adopting interbase coordinates for DAS/1 (maybe this is already the case?). This is explicitly specified in the DAS/2 spec. It's pretty critical when it comes to calling SNPs and INDELs. 
-cheers, D On 3/7/11 9:10 AM, "Thomas Down" wrote: On Mon, Mar 7, 2011 at 4:07 PM, Ewan Birney wrote: > > I'd say this is start=end, and a convention that this indicates the point > to the left or the right with a glyph. With this convention, for the edge > case one either > has to allow the 0 base (right-based insertion convention, insertion at the > very > start) or the end+1 position (left-based insertion convention, insertion at the > very end). > Thanks Ewan (+Ed and Andy), that sounds best. I'm going to need a non-standard stylesheet glyph to do what I'm after as well, so perhaps it's best to attach the "interbaseness" to that, rather than the feature itself. > I know, this triggers doing things on inter-base coordinates, but that's > just > a big, big change. > Yep, that's not really an option. Thomas. From rmb32 at cornell.edu Thu Mar 10 12:18:34 2011 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 10 Mar 2011 12:18:34 -0500 Subject: [DAS] update Google Summer of Code project ideas Message-ID: <4D7907EA.202@cornell.edu> Hi all, Please make sure the BioDAS information is up to date for 2011 on both the OBF and BioDAS wikis. The current page is not fully updated for 2011. You might want to consider making the page name non-year-specific and updating the link to it. Please get this done in the next few days; I'm sending the OBF's application in to Google today, and they will be evaluating us to see that we're fully prepared. 
OBF wiki page: http://www.open-bio.org/wiki/Google_Summer_of_Code BioDAS wiki: http://www.biodas.org/wiki/GSoC2010 Rob ---- Robert Buels (prospective) 2011 OBF GSoC Organization Admin From andy.jenkinson at ebi.ac.uk Fri Mar 11 05:15:14 2011 From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson) Date: Fri, 11 Mar 2011 10:15:14 +0000 Subject: [DAS] update Google Summer of Code project ideas In-Reply-To: <4D7907EA.202@cornell.edu> References: <4D7907EA.202@cornell.edu> Message-ID: <94994A04-3521-41C6-9BE3-417903453FB9@ebi.ac.uk> Thanks Robert, I've created a 2011 version (with the same projects!). I think they're both still doable. Is it too late to add more? On 10 Mar 2011, at 17:18, Robert Buels wrote: > Hi all, > > Please make sure the BioDAS information is up to date for 2011 on both the OBF and BioDAS wikis. The current page is not fully updated for 2011. > > You might want to consider making the page name non-year-specific, and update the link to it. > > Please get this done in the next few days, I'm sending the OBF's application in to Google today, and they will be evaluating us to see that we're fully prepared. > > OBF wiki page: http://www.open-bio.org/wiki/Google_Summer_of_Code > BioDAS wiki: http://www.biodas.org/wiki/GSoC2010 > > Rob > > ---- > Robert Buels > (prospective) 2011 OBF GSoC Organization Admin > _______________________________________________ > DAS mailing list > DAS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das From rmb32 at cornell.edu Fri Mar 11 11:02:47 2011 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 11 Mar 2011 11:02:47 -0500 Subject: [DAS] update Google Summer of Code project ideas In-Reply-To: <94994A04-3521-41C6-9BE3-417903453FB9@ebi.ac.uk> References: <4D7907EA.202@cornell.edu> <94994A04-3521-41C6-9BE3-417903453FB9@ebi.ac.uk> Message-ID: <4D7A47A7.2010105@cornell.edu> On 03/11/2011 05:15 AM, Andy Jenkinson wrote: > Thanks Robert, I've created a 2011 version (with the same projects!). 
I think they're both still doable. Is it too late to add more? No, not too late at all. You can edit it continuously; I just want to make sure that the pages look presentable by the time Google looks at them. They're not frozen or anything. Rob From suzi at berkeleybop.org Fri Mar 11 14:00:09 2011 From: suzi at berkeleybop.org (Suzanna Lewis) Date: Fri, 11 Mar 2011 11:00:09 -0800 Subject: [DAS] Locations between bases In-Reply-To: References: Message-ID: +1 On Mar 7, 2011, at 8:38 AM, David Nix wrote: > My two cents... > > I'm coming late to this thread, so pardon me if I'm missing something, but I'd strongly recommend adopting interbase coordinates for DAS/1 (maybe this is already the case?). This is explicitly specified in the DAS/2 spec. It's pretty critical when it comes to calling SNPs and INDELs. > > -cheers, D > > > On 3/7/11 9:10 AM, "Thomas Down" wrote: > > On Mon, Mar 7, 2011 at 4:07 PM, Ewan Birney wrote: > >> >> I'd say this is start=end, and a convention that this indicates the point >> to the left or the right with a glyph. With this convention, for the edge >> case one either >> has to allow the 0 base (right-based insertion convention, insertion at the >> very >> start) or the end+1 position (left-based insertion convention, insertion at the >> very end). >> > > Thanks Ewan (+Ed and Andy), that sounds best. I'm going to need a > non-standard stylesheet glyph to do what I'm after as well, so perhaps it's > best to attach the "interbaseness" to that, rather than the feature itself. > > >> I know, this triggers doing things on inter-base coordinates, but that's >> just >> a big, big change. >> > > Yep, that's not really an option. > > Thomas. 
From jw12 at sanger.ac.uk Mon Mar 14 13:36:18 2011 From: jw12 at sanger.ac.uk (Jonathan Warren) Date: Mon, 14 Mar 2011 17:36:18 +0000 Subject: [DAS] DAS Registry update Message-ID:

The registry has been updated to include:
- data source response times per capability (ms) - hopefully this information will be useful to clients?
- adjacent-feature capability support - new extension support to find the next feature in sparsely populated DAS sources.
- sources queried by coordinate system URI or id (the interface also has links), rather than having to construct a query based on all components of a coordinate system, i.e. authority, type, version, taxonomy.

For more info read the history notes or just ask:

Registry V3.0.6 14 03 2011
* Response times for each capability for each source are now optionally returned in the sources document. Add the parameter times=true and these are listed, e.g. http://www.dasregistry.org/das/sources?times=true or, for a single source, http://www.dasregistry.org/das/DS_409?times=true
* All sources that have a particular coordinate system are returned if you query using the coordinateid parameter with either just the CS_DS identifier, e.g. http://www.dasregistry.org/das/sources?rows=1-10;coordinateid=CS_DS10 or the full coordinate system URI, e.g. http://www.dasregistry.org/das/sources?rows=1-10;coordinateid=http://www.dasregistry.org/dasregistry/coordsys/CS_DS10
* Added adjacent-feature capability validation.
* Capabilities now have the correct names, with "-" instead of "_".

Jonathan Warren Senior Developer and DAS coordinator blog: http://biodasman.wordpress.com/ jw12 at sanger.ac.uk Ext: 2314 Telephone: 01223 492314

From jw12 at sanger.ac.uk Thu Mar 17 06:01:08 2011 From: jw12 at sanger.ac.uk (Jonathan Warren) Date: Thu, 17 Mar 2011 10:01:08 +0000 Subject: [DAS] change to registry ids Message-ID: <0541B8A0-5501-4C3C-B4B9-B48CB7B66D44@sanger.ac.uk> Very soon I'd like to change the data source ids in the registry that have a prefix other than DS_, such as ENSEMBL_ or GRAMENE_, to a normal DS_ id. Over the last 4-6 months new Ensembl and Gramene sources have been registered with normal DS_ ids anyway, so this change only affects sources registered a long time ago and I'd be surprised if it broke anything. If anyone has a client that relies on these ids, or thinks that this would break anything, please let me know in the next couple of days. Thanks Jonathan. 
Jonathan Warren Senior Developer and DAS coordinator blog: http://biodasman.wordpress.com/ jw12 at sanger.ac.uk Ext: 2314 Telephone: 01223 492314 -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From David.Nix at hci.utah.edu Thu Mar 17 11:13:43 2011 From: David.Nix at hci.utah.edu (David Nix) Date: Thu, 17 Mar 2011 09:13:43 -0600 Subject: [DAS] change to registry ids In-Reply-To: <0541B8A0-5501-4C3C-B4B9-B48CB7B66D44@sanger.ac.uk> Message-ID: Any chance of getting the registry to work with DAS/2 servers? Even a stripped down entry would help. -cheers, D On 3/17/11 4:01 AM, "Jonathan Warren" wrote: Very soon I'd like to change the data source Ids in the registry that have a prefix other than DS_ such as ENSEMBL_ or GRAMENE_ to a normal DS_ id. Over the last 4-6 months new ensembl and gramene sources been registered with normal DS_ ids anyway so I'd be surprised if this change would effect anything, so this change just affects sources registered a long time ago. If anyone has a client that relies on these or thinks that this would break anything please let me know in the next couple of days. Thanks Jonathan. Jonathan Warren Senior Developer and DAS coordinator blog: http://biodasman.wordpress.com/ jw12 at sanger.ac.uk Ext: 2314 Telephone: 01223 492314 -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
_______________________________________________ DAS mailing list DAS at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/das From rmb32 at cornell.edu Fri Mar 18 15:24:57 2011 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 18 Mar 2011 15:24:57 -0400 Subject: [DAS] Google Summer of Code is *ON* for OBF projects! Message-ID: <4D83B189.7040701@cornell.edu> Hi all, Great news: Google announced today that the Open Bioinformatics Foundation has been accepted as a mentoring organization for this summer's Google Summer of Code! GSoC is a Google-sponsored student internship program for open-source projects, open to students from around the world (not just US residents). Students are paid a $5000 USD stipend to work as a developer on an open-source project for the summer. For more on GSoC, see GSoC 2011 FAQ at http://bit.ly/hpoz8W Student applications are due April 8, 2011 at 19:00 UTC. Students who are interested in participating should look at the OBF's GSoC page at http://open-bio.org/wiki/Google_Summer_of_Code, which lists project ideas, and whom to contact about applying. For current developers on OBF projects, please consider volunteering to be a mentor if you have not already, and contribute project ideas. Just list your name and project ideas on OBF wiki and on the relevant project's GSoC wiki page. Thanks to all who helped make OBF's application to GSoC a success, and let's have a great, productive summer of code! Rob Buels OBF GSoC 2011 Administrator From jprocter at compbio.dundee.ac.uk Sat Mar 19 15:48:37 2011 From: jprocter at compbio.dundee.ac.uk (Jim Procter) Date: Sat, 19 Mar 2011 19:48:37 +0000 Subject: [DAS] Google Summer of Code is *ON* for OBF projects! 
In-Reply-To: <4D83B189.7040701@cornell.edu>
References: <4D83B189.7040701@cornell.edu>
Message-ID: <4D850895.1070706@compbio.dundee.ac.uk>

On 18/03/2011 19:24, Robert Buels wrote:
> Hi all,
>
> Great news: Google announced today that the Open Bioinformatics
> Foundation has been accepted as a mentoring organization for this
> summer's Google Summer of Code!

W000t!

You might all be interested to see the other Bio-related orgs who got in:

Genome Informatics
http://gmod.org/wiki/GSoC google/gsoc2011/genomeinformatics

National Evolutionary Synthesis Center (NESCent)
http://phyloinformatics.net/Phyloinformatics_Summer_of_Code_2011

National Resource for Network Biology (GenMAPP, Cytoscape and WikiPathways)
http://nrnb.org/gsoc

Orange: Data Mining Fruitful & Fun
http://orange.biolab.si/trac/wiki/GSoC/Ideas

Theoretical Biophysics @ Humboldt University
http://rumo.biologie.hu-berlin.de/gsoc

It's going to be another busy summer :)

Jim.

--
-------------------------------------------------------------------
J. B. Procter (JALVIEW/ENFIN) Barton Bioinformatics Research Group
Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk
The University of Dundee is a Scottish Registered Charity, No. SC015096.

From gsalazar at cs.uct.ac.za Wed Mar 23 08:08:14 2011
From: gsalazar at cs.uct.ac.za (Gustavo Salazar)
Date: Wed, 23 Mar 2011 12:08:14 +0000
Subject: [DAS] Pagination Proposal
Message-ID: <6EF91EC3-3A7F-4E9B-AD26-043BCC6D04A3@cs.uct.ac.za>

Hello all,

Following the momentum that the DAS workshop left us, I started tackling one of the many projects that we defined during the 3rd day: pagination for the features command.
I added the proposal for the extension in the wiki:
http://www.biodas.org/wiki/DAS1.6E#Pagination_for_DAS
I have implemented it in MyDas, and it is included in the snapshot version of the repository in case anyone wants to play with it:
http://mydas.googlecode.com/svn/snapshot-repository/uk/ac/ebi/mydas/mydas/1.6.5-SNAPSHOT/
As this is a nightly version, it may still change before we do a release, which will follow some proper testing.

Looking for your feedback about it!

Cheers,

Gustavo.

From andy.jenkinson at ebi.ac.uk Wed Mar 23 11:01:47 2011
From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson)
Date: Wed, 23 Mar 2011 15:01:47 +0000
Subject: [DAS] Pagination Proposal
In-Reply-To: <6EF91EC3-3A7F-4E9B-AD26-043BCC6D04A3@cs.uct.ac.za>
References: <6EF91EC3-3A7F-4E9B-AD26-043BCC6D04A3@cs.uct.ac.za>
Message-ID: <79671B06-2235-4D7C-84B1-DD0CD7CA6569@ebi.ac.uk>

Hi Gustavo,

Thanks for this, looks fine to me mostly.

One question: should segments that contain no features (due to the pagination limit) be included in the response?

/das/foo/features?search=a*;rows=1-2
...

Also, I can't remember if we decided at the workshop whether servers would be allowed to overrule the client's requested range and return (for example) a smaller number of rows. This is how entry_points works, which is why it has "start" and "end" attributes in the response in addition to the "total" attribute.

On 23 Mar 2011, at 12:08, Gustavo Salazar wrote:

> Hello all,
>
> Following the momentum that the DAS workshop left us, I started tackling one of the many projects that we defined during the 3rd day: pagination for the features command.
> I added the proposal for the extension in the wiki:
> http://www.biodas.org/wiki/DAS1.6E#Pagination_for_DAS
> I have implemented it in MyDas and is included in the snapshot version of the repository in case anyone wants to play with it
> http://mydas.googlecode.com/svn/snapshot-repository/uk/ac/ebi/mydas/mydas/1.6.5-SNAPSHOT/
> As a nightly version it may have changes whenever we do a release after some proper testing of it
>
> Looking for your feedback about it!
>
> Cheers,
>
> Gustavo.

From gsalazar at cs.uct.ac.za Wed Mar 23 11:43:20 2011
From: gsalazar at cs.uct.ac.za (Gustavo Salazar)
Date: Wed, 23 Mar 2011 15:43:20 +0000
Subject: [DAS] Pagination Proposal
In-Reply-To: <79671B06-2235-4D7C-84B1-DD0CD7CA6569@ebi.ac.uk>
References: <6EF91EC3-3A7F-4E9B-AD26-043BCC6D04A3@cs.uct.ac.za> <79671B06-2235-4D7C-84B1-DD0CD7CA6569@ebi.ac.uk>
Message-ID: <262F92F2-AB74-43B8-88E6-824B87F1B139@cs.uct.ac.za>

Hey Andy,

I don't think we discussed whether empty segments should be included. In my implementation I am not including them; I don't think it is necessary.

I think the agreement was to report the error as an HTTP error, but that was thought of more from the perspective of clients that do not support this capability, to avoid servers responding with paginated segments when the rows attribute was not included.

Cheers,
Gustavo.

On 23 Mar 2011, at 15:01, Andy Jenkinson wrote:

> Hi Gustavo,
>
> Thanks for this, looks fine to me mostly.
>
> One question: should segments that contain no features (due to the pagination limit) be included in the response?
>
> /das/foo/features?search=a*;rows=1-2
>
> ...
>
> Also, I can't remember if we decided at the workshop whether servers would be allowed to overrule the client's requested range and return (for example) a smaller number of rows.
This is how entry_points works, which is why it has "start" and "end" attributes in the response in addition to the "total" attribute. > > On 23 Mar 2011, at 12:08, Gustavo Salazar wrote: > >> Hello all, >> >> Following the momentum that the DAS workshop let us I started tackling one of the many projects that we defined during the 3rd day: The pagination for the features command. >> I added the proposal for the extension in the wiki: >> http://www.biodas.org/wiki/DAS1.6E#Pagination_for_DAS >> I have implemented it in MyDas and is included in the snapshot version of the repository in case anyone wants to play with it >> http://mydas.googlecode.com/svn/snapshot-repository/uk/ac/ebi/mydas/mydas/1.6.5-SNAPSHOT/ >> As a nightly version it may have changes whenever we do a release after some proper testing of it >> >> Looking for your feedback about it! >> >> Cheers, >> >> Gustavo. >> _______________________________________________ >> DAS mailing list >> DAS at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/das > > > _______________________________________________ > DAS mailing list > DAS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das From andy.jenkinson at ebi.ac.uk Wed Mar 23 12:23:07 2011 From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson) Date: Wed, 23 Mar 2011 16:23:07 +0000 Subject: [DAS] Pagination Proposal In-Reply-To: <262F92F2-AB74-43B8-88E6-824B87F1B139@cs.uct.ac.za> References: <6EF91EC3-3A7F-4E9B-AD26-043BCC6D04A3@cs.uct.ac.za> <79671B06-2235-4D7C-84B1-DD0CD7CA6569@ebi.ac.uk> <262F92F2-AB74-43B8-88E6-824B87F1B139@cs.uct.ac.za> Message-ID: On 23 Mar 2011, at 15:43, Gustavo Salazar wrote: > Hey Andy, > I think we didn't discussed if empty segments should be included. In the implementation I got I am not including those, I think is not necessary. That's fine, it makes the implementation easier too. 
> I think the agreement was to report the error as an HTTP error, but that was though more in the perspective of clients that are not supporting this capability, trying to avoid servers responding paginated segments when the rows attribute was not included. Absolutely, that's good. I've updated the wiki with clarifications of these points, and an example response illustrating how the server should choose which rows to return. > Cheers, > Gustavo. > > > On 23 Mar 2011, at 15:01, Andy Jenkinson wrote: > >> Hi Gustavo, >> >> Thanks for this, looks fine to me mostly. >> >> One question: should segments that contain no features (due to the pagination limit) be included in the response? >> >> /das/foo/features?search=a*;rows=1-2 >> >> >> >> >> >> >> >> >> ... >> >> >> Also, I can't remember if we decided at the workshop whether servers would be allowed to overrule the client's requested range and return (for example) a smaller number of rows. This is how entry_points works, which is why it has "start" and "end" attributes in the response in addition to the "total" attribute. >> >> On 23 Mar 2011, at 12:08, Gustavo Salazar wrote: >> >>> Hello all, >>> >>> Following the momentum that the DAS workshop let us I started tackling one of the many projects that we defined during the 3rd day: The pagination for the features command. >>> I added the proposal for the extension in the wiki: >>> http://www.biodas.org/wiki/DAS1.6E#Pagination_for_DAS >>> I have implemented it in MyDas and is included in the snapshot version of the repository in case anyone wants to play with it >>> http://mydas.googlecode.com/svn/snapshot-repository/uk/ac/ebi/mydas/mydas/1.6.5-SNAPSHOT/ >>> As a nightly version it may have changes whenever we do a release after some proper testing of it >>> >>> Looking for your feedback about it! >>> >>> Cheers, >>> >>> Gustavo. 
From ljgarcia at ebi.ac.uk Mon Mar 28 05:36:09 2011
From: ljgarcia at ebi.ac.uk (Leyla Garcia)
Date: Mon, 28 Mar 2011 10:36:09 +0100
Subject: [DAS] Pagination Proposal
In-Reply-To:
References: <6EF91EC3-3A7F-4E9B-AD26-043BCC6D04A3@cs.uct.ac.za> <79671B06-2235-4D7C-84B1-DD0CD7CA6569@ebi.ac.uk> <262F92F2-AB74-43B8-88E6-824B87F1B139@cs.uct.ac.za>
Message-ID: <4D905689.6030302@ebi.ac.uk>

On 23/03/2011 16:23, Andy Jenkinson wrote:
> On 23 Mar 2011, at 15:43, Gustavo Salazar wrote:
>
>> Hey Andy,
>> I don't think we discussed whether empty segments should be included. In my implementation I am not including them; I don't think it is necessary.
> That's fine, it makes the implementation easier too.
>
>> I think the agreement was to report the error as an HTTP error, but that was thought of more from the perspective of clients that do not support this capability, to avoid servers responding with paginated segments when the rows attribute was not included.

Gustavo, do you mean segment= with nothing after the =? For features and segment, those are reported as an ERRORSEGMENT in myDAS. We will keep it this way, right?

Leyla

> Absolutely, that's good.
>
> I've updated the wiki with clarifications of these points, and an example response illustrating how the server should choose which rows to return.
>
>> Cheers,
>> Gustavo.
>>
>>
>> On 23 Mar 2011, at 15:01, Andy Jenkinson wrote:
>>
>>> Hi Gustavo,
>>>
>>> Thanks for this, looks fine to me mostly.
>>>
>>> One question: should segments that contain no features (due to the pagination limit) be included in the response?
>>>
>>> /das/foo/features?search=a*;rows=1-2
>>>
>>> ...
>>> >>> >>> Also, I can't remember if we decided at the workshop whether servers would be allowed to overrule the client's requested range and return (for example) a smaller number of rows. This is how entry_points works, which is why it has "start" and "end" attributes in the response in addition to the "total" attribute. >>> >>> On 23 Mar 2011, at 12:08, Gustavo Salazar wrote: >>> >>>> Hello all, >>>> >>>> Following the momentum that the DAS workshop let us I started tackling one of the many projects that we defined during the 3rd day: The pagination for the features command. >>>> I added the proposal for the extension in the wiki: >>>> http://www.biodas.org/wiki/DAS1.6E#Pagination_for_DAS >>>> I have implemented it in MyDas and is included in the snapshot version of the repository in case anyone wants to play with it >>>> http://mydas.googlecode.com/svn/snapshot-repository/uk/ac/ebi/mydas/mydas/1.6.5-SNAPSHOT/ >>>> As a nightly version it may have changes whenever we do a release after some proper testing of it >>>> >>>> Looking for your feedback about it! >>>> >>>> Cheers, >>>> >>>> Gustavo. 
From gsalazar at cs.uct.ac.za Mon Mar 28 06:26:58 2011
From: gsalazar at cs.uct.ac.za (Gustavo Salazar)
Date: Mon, 28 Mar 2011 11:26:58 +0100
Subject: [DAS] Pagination Proposal
In-Reply-To: <4D905689.6030302@ebi.ac.uk>
References: <6EF91EC3-3A7F-4E9B-AD26-043BCC6D04A3@cs.uct.ac.za> <79671B06-2235-4D7C-84B1-DD0CD7CA6569@ebi.ac.uk> <262F92F2-AB74-43B8-88E6-824B87F1B139@cs.uct.ac.za> <4D905689.6030302@ebi.ac.uk>
Message-ID: <1A1EB553-60CD-427E-9BEB-6B202C33148A@cs.uct.ac.za>

On 28 Mar 2011, at 10:36, Leyla Garcia wrote:

> On 23/03/2011 16:23, Andy Jenkinson wrote:
>> On 23 Mar 2011, at 15:43, Gustavo Salazar wrote:
>>
>>> Hey Andy,
>>> I don't think we discussed whether empty segments should be included. In my implementation I am not including them; I don't think it is necessary.
>> That's fine, it makes the implementation easier too.
>>
>>> I think the agreement was to report the error as an HTTP error, but that was thought of more from the perspective of clients that do not support this capability, to avoid servers responding with paginated segments when the rows attribute was not included.
> Gustavo, do you mean segment= with nothing after the =? For features and segment, those are reported as an ERRORSEGMENT in myDAS. We will keep it this way, right?

No, I'm talking about something different here: this is for cases where the response is so big that the server won't be able to respond.
We considered having the server force pagination even when it is not requested; however, that might confuse clients that do not support this capability, which may think they are getting all the results. That's why we decided to go with an HTTP error plus X-DAS-Status approach.

The "segment=" case was not discussed here, but Andy told me a while ago that he will include some descriptions of this case in a future amendment of the spec 1.6.1.

> Leyla
>> Absolutely, that's good.
>>
>> I've updated the wiki with clarifications of these points, and an example response illustrating how the server should choose which rows to return.
>>
>>> Cheers,
>>> Gustavo.
>>>
>>>
>>> On 23 Mar 2011, at 15:01, Andy Jenkinson wrote:
>>>
>>>> Hi Gustavo,
>>>>
>>>> Thanks for this, looks fine to me mostly.
>>>>
>>>> One question: should segments that contain no features (due to the pagination limit) be included in the response?
>>>>
>>>> /das/foo/features?search=a*;rows=1-2
>>>>
>>>> ...
>>>>
>>>> Also, I can't remember if we decided at the workshop whether servers would be allowed to overrule the client's requested range and return (for example) a smaller number of rows. This is how entry_points works, which is why it has "start" and "end" attributes in the response in addition to the "total" attribute.
>>>>
>>>> On 23 Mar 2011, at 12:08, Gustavo Salazar wrote:
>>>>
>>>>> Hello all,
>>>>>
>>>>> Following the momentum that the DAS workshop let us I started tackling one of the many projects that we defined during the 3rd day: The pagination for the features command.
>>>>> I added the proposal for the extension in the wiki:
>>>>> http://www.biodas.org/wiki/DAS1.6E#Pagination_for_DAS
>>>>> I have implemented it in MyDas and is included in the snapshot version of the repository in case anyone wants to play with it
>>>>> http://mydas.googlecode.com/svn/snapshot-repository/uk/ac/ebi/mydas/mydas/1.6.5-SNAPSHOT/
>>>>> As a nightly version it may have changes whenever we do a release after some proper testing of it
>>>>>
>>>>> Looking for your feedback about it!
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Gustavo.

From jw12 at sanger.ac.uk Thu Mar 31 05:37:49 2011
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Thu, 31 Mar 2011 10:37:49 +0100
Subject: [DAS] Videos from DAS workshop now available
Message-ID: <17BEA02E-A5B5-4553-A600-5FD772A74FC9@sanger.ac.uk>

Hi,

Thanks again to all those who presented at this year's DAS workshop earlier this month. Videos of the presentations from day 2 are now available here:
http://www.scivee.tv/browse
Type das in the search box and they come up. Links will soon be added to the workshop wiki page.

I'm pleased to say the quality of the video and the audio is quite reasonable. Many of the presentations have already had a number of views, which is very encouraging.
Thanks Jonathan Jonathan Warren Senior Developer and DAS coordinator blog: http://biodasman.wordpress.com/ jw12 at sanger.ac.uk Ext: 2314 Telephone: 01223 492314 -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From jw12 at sanger.ac.uk Tue Mar 1 10:09:28 2011 From: jw12 at sanger.ac.uk (Jonathan Warren) Date: Tue, 1 Mar 2011 10:09:28 +0000 Subject: [DAS] RELAX NG schema for "sequence" command In-Reply-To: <0F015AE7-09D7-4F4D-B547-353830FD82A2@broadinstitute.org> References: <0F015AE7-09D7-4F4D-B547-353830FD82A2@broadinstitute.org> Message-ID: <563D9C80-555A-4866-9E74-F04FF5458E27@sanger.ac.uk> This is fixed now. On 28 Feb 2011, at 18:34, Andrew Roberts wrote: > Hello, > > I'm working on some validations, and I think I may have found an > error in sequence.rng (www.dasregistry.org/validation1.6/ > sequence.rng). It appears that this schema makes no provision for > the actual text data that belongs to the sequence element (i.e. the > bases themselves). Is this a known issue? > > If not, I think the solution is to add a element inside the > sequence element. Line 10 should read, > > attribute> attribute> attribute> text> > > Cheers, > > Andrew > _______________________________________________ > DAS mailing list > DAS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das Jonathan Warren Senior Developer and DAS coordinator blog: http://biodasman.wordpress.com/ jw12 at sanger.ac.uk Ext: 2314 Telephone: 01223 492314 -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
From aroberts at broadinstitute.org Tue Mar 1 15:19:13 2011 From: aroberts at broadinstitute.org (Andrew Roberts) Date: Tue, 1 Mar 2011 10:19:13 -0500 Subject: [DAS] RELAX NG schema for "sequence" command In-Reply-To: <563D9C80-555A-4866-9E74-F04FF5458E27@sanger.ac.uk> References: <0F015AE7-09D7-4F4D-B547-353830FD82A2@broadinstitute.org> <563D9C80-555A-4866-9E74-F04FF5458E27@sanger.ac.uk> Message-ID: <1FD158A0-4032-48A2-AA86-383F9487BB52@broadinstitute.org> Alright, now I'm hitting the lack of support for ERRORSEGMENT within the sequence command response. I think there needs to be a choice inside the oneOrMore, so that elements can be SEQUENCE or ERRORSEGMENT. I attached what I'm using as a testing schema in the meantime, hopefully it's helpful. Andrew -------------- next part -------------- A non-text attachment was scrubbed... Name: sequence.rng Type: application/octet-stream Size: 1267 bytes Desc: not available URL: -------------- next part -------------- On Mar 1, 2011, at 5:09 AM, Jonathan Warren wrote: > This is fixed now. > > On 28 Feb 2011, at 18:34, Andrew Roberts wrote: > >> Hello, >> >> I'm working on some validations, and I think I may have found an >> error in sequence.rng (www.dasregistry.org/validation1.6/sequence.rng) >> . It appears that this schema makes no provision for the actual >> text data that belongs to the sequence element (i.e. the bases >> themselves). Is this a known issue? >> >> If not, I think the solution is to add a element inside >> the sequence element. 
Line 10 should read,
>>
>> > attribute>> attribute>> attribute>> text>
>>
>> Cheers,
>>
>> Andrew

From jw12 at sanger.ac.uk Tue Mar 1 15:54:05 2011
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Tue, 1 Mar 2011 15:54:05 +0000
Subject: [DAS] RELAX NG schema for "sequence" command
In-Reply-To: <1FD158A0-4032-48A2-AA86-383F9487BB52@broadinstitute.org>
References: <0F015AE7-09D7-4F4D-B547-353830FD82A2@broadinstitute.org> <563D9C80-555A-4866-9E74-F04FF5458E27@sanger.ac.uk> <1FD158A0-4032-48A2-AA86-383F9487BB52@broadinstitute.org>
Message-ID: <2516E6D2-3584-49AB-8648-14DECF48205D@sanger.ac.uk>

Hi Andrew,

I guess this depends on what you are using the RNG for. What are you using it for? The registry needs the result of a sequence request to conform to sequence.rng as it currently stands on the registry: when a sequence is requested, the test region put into the registry should return a sequence, and if it returns an ERRORSEGMENT the registry wants that reported as invalid. If I add the errorsegment to that document as you suggest, the registry won't pick it up as invalid.

On 1 Mar 2011, at 15:19, Andrew Roberts wrote:

> Alright, now I'm hitting the lack of support for ERRORSEGMENT
> within the sequence command response. I think there needs to be a
> choice inside the oneOrMore, so that elements can be SEQUENCE or
> ERRORSEGMENT.
I attached what I'm using as a testing schema in the > meantime, hopefully it's helpful. > > Andrew > > > > On Mar 1, 2011, at 5:09 AM, Jonathan Warren wrote: > >> This is fixed now. >> >> On 28 Feb 2011, at 18:34, Andrew Roberts wrote: >> >>> Hello, >>> >>> I'm working on some validations, and I think I may have found an >>> error in sequence.rng (www.dasregistry.org/validation1.6/sequence.rng) >>> . It appears that this schema makes no provision for the actual >>> text data that belongs to the sequence element (i.e. the bases >>> themselves). Is this a known issue? >>> >>> If not, I think the solution is to add a element inside >>> the sequence element. Line 10 should read, >>> >>> >> attribute>>> attribute>>> attribute>>> text> >>> >>> Cheers, >>> >>> Andrew >>> _______________________________________________ >>> DAS mailing list >>> DAS at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/das >> >> Jonathan Warren >> Senior Developer and DAS coordinator >> blog: http://biodasman.wordpress.com/ >> jw12 at sanger.ac.uk >> Ext: 2314 >> Telephone: 01223 492314 >> >> >> >> >> >> >> >> >> >> -- >> The Wellcome Trust Sanger Institute is operated by Genome >> ResearchLimited, a charity registered in England with number >> 1021457 and acompany registered in England with number 2742969, >> whose registeredoffice is 215 Euston Road, London, NW1 2BE. > Jonathan Warren Senior Developer and DAS coordinator blog: http://biodasman.wordpress.com/ jw12 at sanger.ac.uk Ext: 2314 Telephone: 01223 492314 -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
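
The point made in the message above, that an ERRORSEGMENT reply for the registry's test region should count as a failure whether or not a schema allows the element, can also be checked outside the RNG. What follows is only a minimal Python sketch of such a check, not registry code. It assumes a DAS 1.6 style response whose root is DASSEQUENCE with SEQUENCE and/or ERRORSEGMENT children; the function name and the two sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET


def classify_sequence_response(xml_text):
    """Return 'ok', 'error-segment' or 'empty' for a sequence response.

    Hypothetical helper: the element names follow the thread, the
    three labels are an assumption of this sketch.
    """
    root = ET.fromstring(xml_text)
    # Any ERRORSEGMENT below the root means the server rejected a region.
    if root.find(".//ERRORSEGMENT") is not None:
        return "error-segment"
    # A usable response has at least one SEQUENCE element carrying bases.
    for seq in root.iter("SEQUENCE"):
        if (seq.text or "").strip():
            return "ok"
    return "empty"


# Illustrative documents only, not captured from a real server.
GOOD = ('<DASSEQUENCE><SEQUENCE id="22" start="1" stop="4" version="1">'
        'acgt</SEQUENCE></DASSEQUENCE>')
BAD = '<DASSEQUENCE><ERRORSEGMENT id="99" start="1" stop="4"/></DASSEQUENCE>'

print(classify_sequence_response(GOOD))
print(classify_sequence_response(BAD))
```

Keeping this test separate from schema validation would let one schema describe every legal response while a caller such as the registry still rejects an error reply for a region it expects to exist.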
From andy.jenkinson at ebi.ac.uk Tue Mar 1 16:19:12 2011
From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson)
Date: Tue, 1 Mar 2011 16:19:12 +0000
Subject: [DAS] RELAX NG schema for "sequence" command
In-Reply-To: <2516E6D2-3584-49AB-8648-14DECF48205D@sanger.ac.uk>
References: <0F015AE7-09D7-4F4D-B547-353830FD82A2@broadinstitute.org> <563D9C80-555A-4866-9E74-F04FF5458E27@sanger.ac.uk> <1FD158A0-4032-48A2-AA86-383F9487BB52@broadinstitute.org> <2516E6D2-3584-49AB-8648-14DECF48205D@sanger.ac.uk>
Message-ID: <45146209-B676-4307-B020-C87AFFFD0235@ebi.ac.uk>

To put it another way, the ERRORSEGMENT element is only valid if the server supports the error-segment capability.

I guess the issue is that, up to now, the only use of the RNG schemas for validation has been by the registry. Personally, from a technical point of view, I think including the errorsegment element in the RNG schemas (this will apply to many commands, not just sequence) is the "correct" interpretation. However, that means the registry would need to apply its errorsegment test separately rather than relying on the RNG to flag it as invalid. It already does the reverse of this test separately (i.e. if I ask for a bad segment, do you give me an errorsegment element?), so this makes sense to me.

Cheers,
Andy

On 1 Mar 2011, at 15:54, Jonathan Warren wrote:

> Hi Andrew
>
> I guess this depends on what you are using the RNG for. What are you using it for? The registry needs the result from a sequence request to conform to the sequence.rng as it is on the registry as if a sequence is requested the test region put into the registry should return a sequence, if it returns an ERRORSEGMENT the the registry wants that reported as invalid. If I add the errorsegment to that document as you suggest the registry won't pick it up as invalid.
>
> On 1 Mar 2011, at 15:19, Andrew Roberts wrote:
>
>> Alright, now I'm hitting the lack of support for ERRORSEGMENT within the sequence command response.
I think there needs to be a choice inside the oneOrMore, so that elements can be SEQUENCE or ERRORSEGMENT. I attached what I'm using as a testing schema in the meantime, hopefully it's helpful. >> >> Andrew >> >> >> >> On Mar 1, 2011, at 5:09 AM, Jonathan Warren wrote: >> >>> This is fixed now. >>> >>> On 28 Feb 2011, at 18:34, Andrew Roberts wrote: >>> >>>> Hello, >>>> >>>> I'm working on some validations, and I think I may have found an error in sequence.rng (www.dasregistry.org/validation1.6/sequence.rng). It appears that this schema makes no provision for the actual text data that belongs to the sequence element (i.e. the bases themselves). Is this a known issue? >>>> >>>> If not, I think the solution is to add a element inside the sequence element. Line 10 should read, >>>> >>>> >>>> >>>> Cheers, >>>> >>>> Andrew >>>> _______________________________________________ >>>> DAS mailing list >>>> DAS at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/das >>> >>> Jonathan Warren >>> Senior Developer and DAS coordinator >>> blog: http://biodasman.wordpress.com/ >>> jw12 at sanger.ac.uk >>> Ext: 2314 >>> Telephone: 01223 492314 >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> -- >>> The Wellcome Trust Sanger Institute is operated by Genome ResearchLimited, a charity registered in England with number 1021457 and acompany registered in England with number 2742969, whose registeredoffice is 215 Euston Road, London, NW1 2BE. 
>> > > Jonathan Warren > Senior Developer and DAS coordinator > blog: http://biodasman.wordpress.com/ > jw12 at sanger.ac.uk > Ext: 2314 > Telephone: 01223 492314 > > > > > > > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome ResearchLimited, a charity registered in England with number 1021457 and acompany registered in England with number 2742969, whose registeredoffice is 215 Euston Road, London, NW1 2BE._______________________________________________ > DAS mailing list > DAS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das From aroberts at broadinstitute.org Tue Mar 1 16:20:38 2011 From: aroberts at broadinstitute.org (Andrew Roberts) Date: Tue, 1 Mar 2011 11:20:38 -0500 Subject: [DAS] RELAX NG schema for "sequence" command In-Reply-To: <2516E6D2-3584-49AB-8648-14DECF48205D@sanger.ac.uk> References: <0F015AE7-09D7-4F4D-B547-353830FD82A2@broadinstitute.org> <563D9C80-555A-4866-9E74-F04FF5458E27@sanger.ac.uk> <1FD158A0-4032-48A2-AA86-383F9487BB52@broadinstitute.org> <2516E6D2-3584-49AB-8648-14DECF48205D@sanger.ac.uk> Message-ID: <3E08E18F-CE0A-4B3F-A3FA-7A0EF5269E95@broadinstitute.org> Ah, interesting. I'm exercising my implementations of the das commands, and then validating the XML against the rng as a correctness test. I had been assuming that the rng was a description of all possible server responses, including error cases, but I now see what you guys are using them for. Thanks! On Mar 1, 2011, at 10:54 AM, Jonathan Warren wrote: > Hi Andrew > > I guess this depends on what you are using the RNG for. What are you using it for? The registry needs the result from a sequence request to conform to the sequence.rng as it is on the registry as if a sequence is requested the test region put into the registry should return a sequence, if it returns an ERRORSEGMENT the the registry wants that reported as invalid. If I add the errorsegment to that document as you suggest the registry won't pick it up as invalid. 
> > On 1 Mar 2011, at 15:19, Andrew Roberts wrote: > >> Alright, now I'm hitting the lack of support for ERRORSEGMENT within the sequence command response. I think there needs to be a choice inside the oneOrMore, so that elements can be SEQUENCE or ERRORSEGMENT. I attached what I'm using as a testing schema in the meantime, hopefully it's helpful. >> >> Andrew >> >> >> >> On Mar 1, 2011, at 5:09 AM, Jonathan Warren wrote: >> >>> This is fixed now. >>> >>> On 28 Feb 2011, at 18:34, Andrew Roberts wrote: >>> >>>> Hello, >>>> >>>> I'm working on some validations, and I think I may have found an error in sequence.rng (www.dasregistry.org/validation1.6/sequence.rng). It appears that this schema makes no provision for the actual text data that belongs to the sequence element (i.e. the bases themselves). Is this a known issue? >>>> >>>> If not, I think the solution is to add a element inside the sequence element. Line 10 should read, >>>> >>>> >>>> >>>> Cheers, >>>> >>>> Andrew >>>> _______________________________________________ >>>> DAS mailing list >>>> DAS at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/das >>> >>> Jonathan Warren >>> Senior Developer and DAS coordinator >>> blog: http://biodasman.wordpress.com/ >>> jw12 at sanger.ac.uk >>> Ext: 2314 >>> Telephone: 01223 492314 >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> -- >>> The Wellcome Trust Sanger Institute is operated by Genome ResearchLimited, a charity registered in England with number 1021457 and acompany registered in England with number 2742969, whose registeredoffice is 215 Euston Road, London, NW1 2BE. 
From jw12 at sanger.ac.uk Tue Mar 1 16:33:51 2011
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Tue, 1 Mar 2011 16:33:51 +0000
Subject: [DAS] RELAX NG schema for "sequence" command
In-Reply-To: <3E08E18F-CE0A-4B3F-A3FA-7A0EF5269E95@broadinstitute.org>
References: <0F015AE7-09D7-4F4D-B547-353830FD82A2@broadinstitute.org> <563D9C80-555A-4866-9E74-F04FF5458E27@sanger.ac.uk> <1FD158A0-4032-48A2-AA86-383F9487BB52@broadinstitute.org> <2516E6D2-3584-49AB-8648-14DECF48205D@sanger.ac.uk> <3E08E18F-CE0A-4B3F-A3FA-7A0EF5269E95@broadinstitute.org>
Message-ID: <2CA04BEC-F6A4-4AC4-B30C-06F0D647F235@sanger.ac.uk>

If your data source is publicly available you can test it by making a request like this:
http://www.dasregistry.org/validate?url=http://das.sanger.ac.uk/das/otter_das/&testcode=22:19173435,19372173
which returns properties for the status of each command. Or you can use the validation page on the registry interface to do the same thing.

Some people here are working on some XSD documents in order to use JAXB, which will probably be of use to you as they should be structured more as you suggest.

We may need to make it clearer in the documentation how we use these RNG documents. Thanks for the feedback!

On 1 Mar 2011, at 16:20, Andrew Roberts wrote:

> Ah, interesting. I'm exercising my implementations of the das
> commands, and then validating the XML against the rng as a
> correctness test.
I had been assuming that the rng was a description > of all possible server responses, including error cases, but I now > see what you guys are using them for. > > Thanks! > > > > On Mar 1, 2011, at 10:54 AM, Jonathan Warren > wrote: > >> Hi Andrew >> >> I guess this depends on what you are using the RNG for. What are >> you using it for? The registry needs the result from a sequence >> request to conform to the sequence.rng as it is on the registry as >> if a sequence is requested the test region put into the registry >> should return a sequence, if it returns an ERRORSEGMENT the the >> registry wants that reported as invalid. If I add the errorsegment >> to that document as you suggest the registry won't pick it up as >> invalid. >> >> On 1 Mar 2011, at 15:19, Andrew Roberts wrote: >> >>> Alright, now I'm hitting the lack of support for ERRORSEGMENT >>> within the sequence command response. I think there needs to be a >>> choice inside the oneOrMore, so that elements can be SEQUENCE or >>> ERRORSEGMENT. I attached what I'm using as a testing schema in >>> the meantime, hopefully it's helpful. >>> >>> Andrew >>> >>> >>> >>> On Mar 1, 2011, at 5:09 AM, Jonathan Warren wrote: >>> >>>> This is fixed now. >>>> >>>> On 28 Feb 2011, at 18:34, Andrew Roberts wrote: >>>> >>>>> Hello, >>>>> >>>>> I'm working on some validations, and I think I may have found >>>>> an error in sequence.rng (www.dasregistry.org/validation1.6/sequence.rng) >>>>> . It appears that this schema makes no provision for the actual >>>>> text data that belongs to the sequence element (i.e. the bases >>>>> themselves). Is this a known issue? >>>>> >>>>> If not, I think the solution is to add a element >>>>> inside the sequence element. 
Line 10 should read, >>>>> >>>>> >>>> attribute>>>>> attribute>>>>> attribute>>>>> text> >>>>> >>>>> Cheers, >>>>> >>>>> Andrew >>>>> _______________________________________________ >>>>> DAS mailing list >>>>> DAS at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/das >>>> >>>> Jonathan Warren >>>> Senior Developer and DAS coordinator >>>> blog: http://biodasman.wordpress.com/ >>>> jw12 at sanger.ac.uk >>>> Ext: 2314 >>>> Telephone: 01223 492314 >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> -- >>>> The Wellcome Trust Sanger Institute is operated by Genome >>>> ResearchLimited, a charity registered in England with number >>>> 1021457 and acompany registered in England with number 2742969, >>>> whose registeredoffice is 215 Euston Road, London, NW1 2BE. >>> >> >> Jonathan Warren >> Senior Developer and DAS coordinator >> blog: http://biodasman.wordpress.com/ >> jw12 at sanger.ac.uk >> Ext: 2314 >> Telephone: 01223 492314 >> >> >> >> >> >> >> >> >> >> -- >> The Wellcome Trust Sanger Institute is operated by Genome >> ResearchLimited, a charity registered in England with number >> 1021457 and acompany registered in England with number 2742969, >> whose registeredoffice is 215 Euston Road, London, NW1 2BE. Jonathan Warren Senior Developer and DAS coordinator blog: http://biodasman.wordpress.com/ jw12 at sanger.ac.uk Ext: 2314 Telephone: 01223 492314 -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
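[Editorial sketch, not part of the original thread.] The two uses of sequence.rng discussed above (the registry treating an ERRORSEGMENT as a failed test, versus a schema that describes every legal server response) can be illustrated with a small structural check. This is only a rough sketch in Python under stated assumptions: the root element name DASSEQUENCE and the attributes in the sample documents are assumed here, the function name check_sequence_response is hypothetical, and the check is far weaker than real RELAX NG validation.

```python
import xml.etree.ElementTree as ET


def check_sequence_response(xml_text, allow_error_segment=False):
    """Return True if the children of the root element fit the chosen policy.

    allow_error_segment=False mimics the strict, registry-style reading:
    only SEQUENCE children are acceptable.
    allow_error_segment=True mimics the permissive reading, i.e. a choice
    of SEQUENCE or ERRORSEGMENT inside the oneOrMore.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "DASSEQUENCE":  # assumed root element name
        return False
    allowed = {"SEQUENCE"}
    if allow_error_segment:
        allowed.add("ERRORSEGMENT")
    children = list(root)
    # "oneOrMore": at least one child, and every child must be allowed.
    return bool(children) and all(child.tag in allowed for child in children)


# Hypothetical sample responses, for illustration only.
GOOD = ('<DASSEQUENCE><SEQUENCE id="22" start="1" stop="4" version="1">'
        'acgt</SEQUENCE></DASSEQUENCE>')
ERROR = '<DASSEQUENCE><ERRORSEGMENT id="22" start="1" stop="4"/></DASSEQUENCE>'

print(check_sequence_response(GOOD))
print(check_sequence_response(ERROR))
print(check_sequence_response(ERROR, allow_error_segment=True))
```

With the strict policy an error response fails the check, which is what the registry wants for its test region; with the permissive policy the same document passes, which is what a server author testing all response types would expect.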
From thomas.a.down at gmail.com Sat Mar 5 15:01:42 2011 From: thomas.a.down at gmail.com (Thomas Down) Date: Sat, 5 Mar 2011 15:01:42 +0000 Subject: [DAS] Adjacent feature extension Message-ID: Following on from a discussion at the DAS Workshop: DAS doesn't offer any specific support for clients that provide mechanisms for skipping from the current position to the "next" or "previous" feature. I'd like to propose a small extension (one extra filter option on the existing "features" command) to facilitate this. Full details are here: https://github.com/dasmoth/dalliance/wiki/AdjacentFeatures ...but briefly, a request like: /das/features?adjacent=chr21:30000000 ...would be expected to return a standard DASGFF document containing the two features either side of the specified point. All comments are welcome! One open question is whether the query should *just* return the adjacent features, or should also return other features overlapping the adjacent feature. My preference is for the former, but the latter does have the merit of being quirk-wards compatible with the existing feature_id filter. Does anyone else have strong feelings one way or another? There isn't currently a full implementation of this, but if nobody comes up with major objections, I'm hoping to try implementations in Dazzle (server) and Dalliance (client) within the next couple of weeks. Thomas. From andy.jenkinson at ebi.ac.uk Mon Mar 7 10:04:08 2011 From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson) Date: Mon, 7 Mar 2011 10:04:08 +0000 Subject: [DAS] Adjacent feature extension In-Reply-To: References: Message-ID: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk> Hi Thomas, Thanks for this. 
Regarding the option of whether to return just one feature per side or all overlapping features, the only other advantage that immediately springs to mind for the latter (in addition to some measure of consistency, as you mention) is that it allows the client to immediately render the exact region of that feature without triggering another request. It would generally mean changing zoom level. I can't say if clients are likely to follow this mechanism as opposed to, say, pan and centre on the feature, but if they wanted to, it would be more efficient (and possibly a little bit more efficient anyway depending on how your client does its requests). Disadvantages I can think of: - "adjacent" request takes marginally longer - not quite as obvious what clients should put in their UI controls - need to pick a feature to be able to do "jump to BRCA1" - risk of servers not implementing it correctly and only returning one feature anyway (although I don't think this is likely as the concept is different to "feature-by-id") Some things to further define: - servers can't return a fake feature - should servers return features on different reference sequences if there are none on the current one? - how should servers treat features that overlap the adjacent range? Treat them as the adjacent feature to return, or only include features completely outside the query range? What if the next feature completely outside the query range is part of the same feature hierarchy (e.g. an exon outside the current window)? Any thoughts from anyone on these? Cheers, Andy On 5 Mar 2011, at 15:01, Thomas Down wrote: > Following on from a discussion at the DAS Workshop: > > DAS doesn't have offer any specific support for clients that provide > mechanisms for skipping from the current position to the "next" or > "previous" feature. I'd like to propose a small extension (one extra filter > option on the existing "features" command) to facilitate this. 
Full details > are here: > > https://github.com/dasmoth/dalliance/wiki/AdjacentFeatures > > ...but briefly, a request like: > > /das/features?adjacent=chr21:30000000 > > ...would be expected to return a standard DASGFF document containing the two > features either side of the specified point. > > All comments are welcome! One open question is whether the query should * > just* return the adjacent features, or should also return other features > overlapping the adjacent feature. My preference is for them former, but the > latter does have the merit of being quirk-wards compatible with the existing > feature_id filter. Does anyone else have strong feelings one way or > another. > > There isn't currently a full implementation of this, but if nobody comes up > with major objections, I'm hoping to try implementations in Dazzle (server) > and Dalliance (client) within the next couple of weeks. > > Thomas. > _______________________________________________ > DAS mailing list > DAS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das From thomas.a.down at gmail.com Mon Mar 7 10:35:02 2011 From: thomas.a.down at gmail.com (Thomas Down) Date: Mon, 7 Mar 2011 10:35:02 +0000 Subject: [DAS] Adjacent feature extension In-Reply-To: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk> References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk> Message-ID: On Mon, Mar 7, 2011 at 10:04 AM, Andy Jenkinson wrote: > Hi Thomas, > > Thanks for this. Regarding the option of whether to return just one feature > per side or all overlapping features, the only other advantage that > immediately springs to mind for the latter (in addition to some measure of > consistency, as you mention) is that it allows the client to immediately > render the exact region of that feature without triggering another request. > It would generally mean changing zoom level. 
I'm can't say if clients are > likely to follow this mechanism as opposed to, say, pan and centre on the > feature, but if they wanted to it would be more efficient (and possibly a > little bit more efficient anyway depending on how your client does its > requests). > Yep, I agree. I'd be interested to learn whether there are any clients that would seriously consider taking advantage of this. My own thinking is that even if we do adjust zoom level (as Dalliance sometimes does, e.g. in the "jump to gene..." navigation op), clients are much more likely to zoom to a view that contains the target feature plus a "sensible" amount of flanking sequence, rather than a view where the target feature is perfectly framed. Furthermore, this rather seems like optimizing for the case where only one annotation source is active. Surely we're talking about the *distributed* annotation system, and clients will still have to go off and query all the other annotation sources, even if they are able to skip the one which responded to the "adjacent" query. So long as there's some kind of query parallelization in place, this probably isn't a performance issue. Do any other client developers feel differently? > Disadvantages I can think of: > - "adjacent" request takes marginally longer > - not quite as obvious what clients should put in their UI controls - need > to pick a feature to be able to do "jump to BRCA1" > - risk of servers not implementing it correctly and only returning one > feature anyway (although I don't think this is likely as the concept is > different to "feature-by-id") > > Some things to further define: > - servers can't return a fake feature > Yep, will clarify this. > - should servers return features on different reference sequences if there > are none one the current one? > In my opinion, absolutely yes. 
Otherwise the "10 features in the genome" case remains a massive pain (and potentially a disaster, for inhomogeneously distributed data; won't someone think of the MHC tiling arrays? :-). And even worse for the "10 features in UniProt" case (where I can also see this feature being quite interesting). I've tried to be explicit about this in my proposal (see the penultimate paragraph + example 3), but any suggestions for further clarifications are welcome. > - how should servers treat features that overlap the adjacent range? Treat > them as the adjacent feature to return, or only include features completely > outside the query range? What if the next feature completely outside the > query range is part of the same feature hierarchy (e.g. an exon outside the > current window). > It's a point rather than a range, but yes I agree this is still an open question. I'd actually written the spec such that overlapping features do get returned (on the assumption that clients will do "trivial" cases of next/previous feature in-memory without a network round trip), but again if other client developers do things differently, I'd like to know. I think "include overlapping" will have fewer special cases to worry about, though, e.g. the PART/PARENT issue you allude to. Let clients deal with that ("dumb servers, smart clients"). Thomas. From ljgarcia at ebi.ac.uk Mon Mar 7 10:49:09 2011 From: ljgarcia at ebi.ac.uk (Leyla Garcia) Date: Mon, 07 Mar 2011 10:49:09 +0000 Subject: [DAS] Adjacent feature extension In-Reply-To: References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk> Message-ID: <4D74B825.6080509@ebi.ac.uk> On 07/03/2011 10:35, Thomas Down wrote: > On Mon, Mar 7, 2011 at 10:04 AM, Andy Jenkinson wrote: > >> Hi Thomas, >> >> Thanks for this. 
Regarding the option of whether to return just one feature >> per side or all overlapping features, the only other advantage that >> immediately springs to mind for the latter (in addition to some measure of >> consistency, as you mention) is that it allows the client to immediately >> render the exact region of that feature without triggering another request. >> It would generally mean changing zoom level. I'm can't say if clients are >> likely to follow this mechanism as opposed to, say, pan and centre on the >> feature, but if they wanted to it would be more efficient (and possibly a >> little bit more efficient anyway depending on how your client does its >> requests). >> > Yep, I agree. I'd be interested to learn whether there are any clients that > would seriously consider taking advantage of this. My own thinking is that > even if we do adjust zoom level (as Dalliance sometimes does, e.g. in the > "jump to gene..." navigation op), clients are much more likely to zoom to a > view that contains the target feature plus a "sensible" amount of flanking > sequence, rather than a view where the target feature is perfectly framed. > > Furthermore, this rather seems like optimizing for the case where only one > annotation source is active. Surely we're talking about the > *distributed*annotation system, and clients will still have to go off > and query all the > other annotation sources, even if they are able to skip the one which > responded to the "adjacent" query. So long as there's some kind of query > parallelization in place, this probably isn't a performance issue. > > Do any other client developers feel differently? So we can include or exclude the overlapping features. What about adding another filter? "adjacent" would retrieve only those around the feature requested and "overlapping" would return the overlaps? Using both would be interpreted as OR. Does it make sense? Would that be interesting for people using this type of filter? 
>> Disadvantages I can think of: >> - "adjacent" request takes marginally longer >> - not quite as obvious what clients should put in their UI controls - need >> to pick a feature to be able to do "jump to BRCA1" >> - risk of servers not implementing it correctly and only returning one >> feature anyway (although I don't think this is likely as the concept is >> different to "feature-by-id") >> >> Some things to further define: >> - servers can't return a fake feature > Yep, will clarify this. > >> - should servers return features on different reference sequences if there >> are none one the current one? > In my opinion, absolutely yes. Otherwise the "10 features in the genome" > case remains a massive pain (and potentially a disaster, for > inhomogeneous-dstributed data; won't someone think of the MHC tiling arrays? > :-). And even worse for the "10 features in UniProt" case (where I can also > see this feature being quite interesting). Mmm, I do not understand: what are the "10 features in UniProt" and "10 features in the genome" cases? Could you please give some more information about them? > I've tried to be explicit about this in my proposal (see the penultimate > paragraph + example 3), but any suggestions for further clarifications are > welcome. > > >> - how should servers treat features that overlap the adjacent range? Treat >> them as the adjacent feature to return, or only include features completely >> outside the query range? What if the next feature completely outside the >> query range is part of the same feature hierarchy (e.g. an exon outside the >> current window). >> > It's a point rather than a range, but yes I agree this is still an open > question. I'd actually written the spec such that overlapping features do > get returned (on the assumption that clients will do "trivial" cases of > next/previous feature in-memory without a network round trip), but again if > other client developers do things differently, I'd like to know. 
> > I think "include overlapping" will have less special-cases to worry about, > though. e.g. the PART/PARENT issue you allude to. Let clients deal with > that ("dumb servers, smart clients"). > > Thomas. > _______________________________________________ > DAS mailing list > DAS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das From jw12 at sanger.ac.uk Mon Mar 7 10:57:48 2011 From: jw12 at sanger.ac.uk (Jonathan Warren) Date: Mon, 7 Mar 2011 10:57:48 +0000 Subject: [DAS] Adjacent feature extension In-Reply-To: References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk> Message-ID: On 7 Mar 2011, at 10:35, Thomas Down wrote: > On Mon, Mar 7, 2011 at 10:04 AM, Andy Jenkinson >wrote: > >> Hi Thomas, >> >> Thanks for this. Regarding the option of whether to return just one >> feature >> per side or all overlapping features, the only other advantage that >> immediately springs to mind for the latter (in addition to some >> measure of >> consistency, as you mention) is that it allows the client to >> immediately >> render the exact region of that feature without triggering another >> request. >> It would generally mean changing zoom level. I'm can't say if >> clients are >> likely to follow this mechanism as opposed to, say, pan and centre >> on the >> feature, but if they wanted to it would be more efficient (and >> possibly a >> little bit more efficient anyway depending on how your client does >> its >> requests). >> > > Yep, I agree. I'd be interested to learn whether there are any > clients that > would seriously consider taking advantage of this. My own thinking > is that > even if we do adjust zoom level (as Dalliance sometimes does, e.g. > in the > "jump to gene..." navigation op), clients are much more likely to > zoom to a > view that contains the target feature plus a "sensible" amount of > flanking > sequence, rather than a view where the target feature is perfectly > framed. 
> > Furthermore, this rather seems like optimizing for the case where > only one > annotation source is active. Surely we're talking about the > *distributed*annotation system, and clients will still have to go off > and query all the > other annotation sources, even if they are able to skip the one which > responded to the "adjacent" query. So long as there's some kind of > query > parallelization in place, this probably isn't a performance issue. My vote would ideally be to change feature_by_id to return one feature and to have adjacent_feature return one feature as well. In my opinion this would mean these capabilities on servers do "exactly as they say on the tin", would be easier for data providers to implement, and would thus be more likely to be implemented. If the feature_id capability as it stands is needed, it could be renamed to something closer to what it actually means, like feature_id_region, but I would bet no one would bother to change it/use it. However, the reality is that we are too late to change the old feature_by_id, but I don't think we need to make the same mistake twice by repeating it for adjacent_features. > > Do any other client developers feel differently? > > >> Disadvantages I can think of: >> - "adjacent" request takes marginally longer >> - not quite as obvious what clients should put in their UI controls >> - need >> to pick a feature to be able to do "jump to BRCA1" >> - risk of servers not implementing it correctly and only returning >> one >> feature anyway (although I don't think this is likely as the >> concept is >> different to "feature-by-id") >> >> Some things to further define: >> - servers can't return a fake feature >> > > Yep, will clarify this. > > >> - should servers return features on different reference sequences >> if there >> are none one the current one? >> > > In my opinion, absolutely yes. 
Otherwise the "10 features in the > genome" > case remains a massive pain (and potentially a disaster, for > inhomogeneous-dstributed data; won't someone think of the MHC tiling > arrays? > :-). And even worse for the "10 features in UniProt" case (where I > can also > see this feature being quite interesting). > > I've tried to be explicit about this in my proposal (see the > penultimate > paragraph + example 3), but any suggestions for further > clarifications are > welcome. > > >> - how should servers treat features that overlap the adjacent >> range? Treat >> them as the adjacent feature to return, or only include features >> completely >> outside the query range? What if the next feature completely >> outside the >> query range is part of the same feature hierarchy (e.g. an exon >> outside the >> current window). >> > > It's a point rather than a range, but yes I agree this is still an > open > question. I'd actually written the spec such that overlapping > features do > get returned (on the assumption that clients will do "trivial" cases > of > next/previous feature in-memory without a network round trip), but > again if > other client developers do things differently, I'd like to know. > > I think "include overlapping" will have less special-cases to worry > about, > though. e.g. the PART/PARENT issue you allude to. Let clients deal > with > that ("dumb servers, smart clients"). > > Thomas. > _______________________________________________ > DAS mailing list > DAS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das Jonathan Warren Senior Developer and DAS coordinator blog: http://biodasman.wordpress.com/ jw12 at sanger.ac.uk Ext: 2314 Telephone: 01223 492314 -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
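[Editorial sketch, not part of the original thread.] The two readings of "adjacent" debated above can be made concrete with a short Python sketch. It is illustrative only: the Feature class and the adjacent_features function are hypothetical names, coordinates are assumed to be inclusive, and only a single reference sequence is handled (the fall-through to other reference sequences discussed in the thread is left out). Features that overlap the query point itself are treated as neither "previous" nor "next", which is just one of the options raised.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    id: str
    start: int
    end: int  # inclusive


def adjacent_features(features, point, include_overlapping=False):
    """Pick the nearest feature on each side of `point`.

    With include_overlapping=True, also return every other feature that
    overlaps one of the two picked features (the feature_id-like reading).
    """
    before = [f for f in features if f.end < point]
    after = [f for f in features if f.start > point]
    picked = []
    if before:
        picked.append(max(before, key=lambda f: f.end))
    if after:
        picked.append(min(after, key=lambda f: f.start))
    if include_overlapping:
        for hit in list(picked):  # iterate over a snapshot of the two hits
            for f in features:
                if f not in picked and f.start <= hit.end and f.end >= hit.start:
                    picked.append(f)
    return picked


# Hypothetical track with two small clusters and one isolated feature.
TRACK = [
    Feature("a", 100, 200),
    Feature("b", 150, 260),
    Feature("c", 400, 500),
    Feature("d", 450, 600),
    Feature("e", 900, 950),
]

print([f.id for f in adjacent_features(TRACK, 300)])
print(sorted(f.id for f in adjacent_features(TRACK, 300, include_overlapping=True)))
```

In the "one feature per side" reading the server returns at most two features; in the "include overlapping" reading the client also receives the neighbours of those two features and has to decide for itself which one to jump to.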
From andy.jenkinson at ebi.ac.uk Mon Mar 7 11:19:22 2011 From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson) Date: Mon, 7 Mar 2011 11:19:22 +0000 Subject: [DAS] Adjacent feature extension In-Reply-To: References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>

Message-ID: On 7 Mar 2011, at 10:57, Jonathan Warren wrote: > > My vote would ideally to change feature_by_id to return one feature and have the adjacent_feature as returning one feature. This in my opinion would mean these capabilities on servers do "exactly as they say on the tin" and would be easier to implement for data providers and are thus more likely to be implemented? > If the feature_id capability as it stands is needed it could be changed to something more akin to what it means like feature_id_region but I would bet no one would bother to change it/use it? > > However the reality is that we are too late to change the old feature_by_id, but I don't think we need to make the same mistake twice by repeating it for adjacent_features? I disagree. I think the problems with feature-by-id are that a) the name of the capability implies singular, and b) the concept itself (i.e. getting a feature by its ID) is such a common operation that is otherwise missing in DAS. I don't think either of those applies to an "adjacent" capability unless you specifically choose to call it "adjacent-feature" as opposed to "adjacent-features". I honestly don't think a capability called "adjacent-features" with a query structure like "/das/features?adjacent=foo:1" implies singular, rather the opposite in fact. To me that query suggests "get me the features adjacent to foo:1". True, 2 features is plural, which still leaves a "one feature either side" interpretation possible, but IMO certainly not implicit enough to stop anyone implementing it from actually reading the specification/documentation. Add to that the fact that this is an entirely new behaviour that we have the chance to properly document and make it clear exactly what the server must do. So IMO we have a clear choice. As to feature-by-id, I know changing behaviour is potentially a very disruptive change, but I think we can potentially do this purely because servers don't tend to implement it correctly anyway. 
Clients can happily filter out any additional features returned by old servers, and if any clients are reliant on the server including all overlapping features then as far as I am concerned they are either a) targeting specific servers rather than DAS-wide and thus unaffected, or b) already broken :) I have to admit that the feature-by-id capability is one of the (many) things I loathe having to explain, and I would love to change it. Doing so would be consistent with what we were trying to do with 1.6 (i.e. rationalise existing use of the spec) but I chickened out really. Cheers, Andy From thomas.a.down at gmail.com Mon Mar 7 11:31:06 2011 From: thomas.a.down at gmail.com (Thomas Down) Date: Mon, 7 Mar 2011 11:31:06 +0000 Subject: [DAS] Adjacent feature extension In-Reply-To: <4D74B825.6080509@ebi.ac.uk> References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk> <4D74B825.6080509@ebi.ac.uk> Message-ID: On Mon, Mar 7, 2011 at 10:49 AM, Leyla Garcia wrote: > > So we can include or exclude the overlapping features. What about adding > another filter? "adjacent" would retrieve only those around the feature > requested and "overlapping" would returns the overlaps? Using both would be > interpreted as OR. Does is make sense? > Would that be interesting for people using this type of filters? > How does the "overlapping" filter differ from the existing "segment" filter? The proposal in its current form states that segment and adjacent filters may be combined, and servers should return the union of features selected by the individual filters (i.e. a logical OR, just as you suggest). > In my opinion, absolutely yes. Otherwise the "10 features in the genome" >> case remains a massive pain (and potentially a disaster, for >> inhomogeneous-dstributed data; won't someone think of the MHC tiling >> arrays? >> :-). And even worse for the "10 features in UniProt" case (where I can >> also >> see this feature being quite interesting). 
>> > Mmm, I do not understand it, "10 features in UniProt" case? "10 features in > the genome" case? Could you please some more information about it? > Yes, think of a track that is very sparsely, and potentially inhomogeneously, populated. Things like "imprinted regions in the human genome" (okay, there are more than 10 of those, but maybe no more than 100, and they're clustered). Or "my favorite rare protein domain". Scanning through such tracks by scrolling isn't going to be terribly helpful! One of the use cases I always have in the back of my mind when discussing stuff like this is the biologist who has done some kind of analysis, has 50 "interesting" regions of the genome (or proteome), and wants to flip through them quickly to eyeball, sanity check, and see if they can see any contextual patterns. I work all the time with people who want this, and sometimes do it myself. Up until now, I've tended to solve it by creating an HTML frameset with a list of regions in one frame and a visualization tool (either Ensembl or Dalliance) in the other. It's far from an optimal solution. Allowing people to flick from region to region *within the browser* is far smoother. Thomas. From gsalazar at cs.uct.ac.za Mon Mar 7 11:16:01 2011 From: gsalazar at cs.uct.ac.za (Gustavo Salazar) Date: Mon, 7 Mar 2011 11:16:01 +0000 Subject: [DAS] Adjacent feature extension In-Reply-To: References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>

Message-ID: On 7 Mar 2011, at 10:57, Jonathan Warren wrote: > > On 7 Mar 2011, at 10:35, Thomas Down wrote: > >> On Mon, Mar 7, 2011 at 10:04 AM, Andy Jenkinson wrote: >> >>> Hi Thomas, >>> >>> Thanks for this. Regarding the option of whether to return just one feature >>> per side or all overlapping features, the only other advantage that >>> immediately springs to mind for the latter (in addition to some measure of >>> consistency, as you mention) is that it allows the client to immediately >>> render the exact region of that feature without triggering another request. >>> It would generally mean changing zoom level. I'm can't say if clients are >>> likely to follow this mechanism as opposed to, say, pan and centre on the >>> feature, but if they wanted to it would be more efficient (and possibly a >>> little bit more efficient anyway depending on how your client does its >>> requests). >>> >> >> Yep, I agree. I'd be interested to learn whether there are any clients that >> would seriously consider taking advantage of this. My own thinking is that >> even if we do adjust zoom level (as Dalliance sometimes does, e.g. in the >> "jump to gene..." navigation op), clients are much more likely to zoom to a >> view that contains the target feature plus a "sensible" amount of flanking >> sequence, rather than a view where the target feature is perfectly framed. >> >> Furthermore, this rather seems like optimizing for the case where only one >> annotation source is active. Surely we're talking about the >> *distributed*annotation system, and clients will still have to go off >> and query all the >> other annotation sources, even if they are able to skip the one which >> responded to the "adjacent" query. So long as there's some kind of query >> parallelization in place, this probably isn't a performance issue. > > My vote would ideally to change feature_by_id to return one feature and have the adjacent_feature as returning one feature. 
This in my opinion would mean these capabilities on servers do "exactly as they say on the tin" and would be easier to implement for data providers and are thus more likely to be implemented? > If the feature_id capability as it stands is needed it could be changed to something more akin to what it means like feature_id_region but I would bet no one would bother to change it/use it? > > However the reality is that we are too late to change the old feature_by_id, but I don't think we need to make the same mistake twice by repeating it for adjacent_features? I agree with Jonathan: feature_by_id sounds like it gets the feature with the requested ID, and to be honest that is the way I have implemented it before, so if you ask me, I would say the adjacent capability should just return one feature. I don't think we are too late to change the old feature_by_id behaviour, and we can take this as the opportunity to make such a change. > > >> >> Do any other client developers feel differently? >> >> >>> Disadvantages I can think of: >>> - "adjacent" request takes marginally longer >>> - not quite as obvious what clients should put in their UI controls - need >>> to pick a feature to be able to do "jump to BRCA1" >>> - risk of servers not implementing it correctly and only returning one >>> feature anyway (although I don't think this is likely as the concept is >>> different to "feature-by-id") >>> >>> Some things to further define: >>> - servers can't return a fake feature >>> >> >> Yep, will clarify this. >> >> >>> - should servers return features on different reference sequences if there >>> are none one the current one? >>> >> >> In my opinion, absolutely yes. Otherwise the "10 features in the genome" >> case remains a massive pain (and potentially a disaster, for >> inhomogeneous-dstributed data; won't someone think of the MHC tiling arrays? >> :-). And even worse for the "10 features in UniProt" case (where I can also >> see this feature being quite interesting). 
>> >> I've tried to be explicit about this in my proposal (see the penultimate >> paragraph + example 3), but any suggestions for further clarifications are >> welcome. >> >> >>> - how should servers treat features that overlap the adjacent range? Treat >>> them as the adjacent feature to return, or only include features completely >>> outside the query range? What if the next feature completely outside the >>> query range is part of the same feature hierarchy (e.g. an exon outside the >>> current window). >>> >> >> It's a point rather than a range, but yes I agree this is still an open >> question. I'd actually written the spec such that overlapping features do >> get returned (on the assumption that clients will do "trivial" cases of >> next/previous feature in-memory without a network round trip), but again if >> other client developers do things differently, I'd like to know. >> >> I think "include overlapping" will have less special-cases to worry about, >> though. e.g. the PART/PARENT issue you allude to. Let clients deal with >> that ("dumb servers, smart clients"). >> >> Thomas. 
From jw12 at sanger.ac.uk Mon Mar 7 11:51:56 2011 From: jw12 at sanger.ac.uk (Jonathan Warren) Date: Mon, 7 Mar 2011 11:51:56 +0000 Subject: [DAS] Adjacent feature extension In-Reply-To: References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>

Message-ID: On 7 Mar 2011, at 11:19, Andy Jenkinson wrote: > On 7 Mar 2011, at 10:57, Jonathan Warren wrote: > >> >> My vote would ideally to change feature_by_id to return one feature >> and have the adjacent_feature as returning one feature. This in my >> opinion would mean these capabilities on servers do "exactly as >> they say on the tin" and would be easier to implement for data >> providers and are thus more likely to be implemented? >> If the feature_id capability as it stands is needed it could be >> changed to something more akin to what it means like >> feature_id_region but I would bet no one would bother to change it/ >> use it? >> >> However the reality is that we are too late to change the old >> feature_by_id, but I don't think we need to make the same mistake >> twice by repeating it for adjacent_features? > > I disagree. I think the problems with feature-by-id are that a) the > name of the capability implies singular, and b) the concept itself > (i.e. getting a feature by its ID) is such a common operation that > is otherwise missing in DAS. I don't think either of those apply to > an "adjacent" capability unless you specifically choose to call it > "adjacent-feature" as opposed to "adjacent-features". I honestly > don't think a capability called "adjacent-features" with a query > structure like "/das/features?adjacent=foo:1" implies singular, > rather the opposite in fact. To me that query suggests "get me the > features adjacent to foo:1". True that 2 features is plural which > still leaves a "one feature either side" interpretation possible, > but IMO certainly not implicit enough to stop anyone implementing it > to actually read the specification/documentation. Add to that the > fact that this is an entirely new behaviour that we have the chance > to properly document and make it clear exactly what the server must > do. > > So IMO we have a clear choice. 
I still think it's simpler to implement it for one feature either side and keep the complexity in the client. Generally, how many people stay awake after line 10 when reading the spec? :) Let's see if there are more votes... > > As to feature-by-id, I know changing behaviour is potentially a very > disruptive change, but I think we can potentially do this purely > because servers don't tend to implement it correctly anyway. Clients > can happily filter out any additional features returned by old > servers, and if any clients are reliant on the server including all > overlapping features then as far as I am concerned they are either > a) targeting specific servers rather than DAS-wide and thus > unaffected, or b) already broken :) So you agree feature-by-id should be changed if we have the stomach for it? Good, and Gustavo too. Well done Andy, you have just agreed to write Spec 1.7 or 3??? ;) Your argument above could equally be used for leaving the spec as it is, but ideally I agree, and I guess we can call it spec 1.61 assuming other people agree. > > I have to admit that the feature-by-id capability is one of the > (many) things I loathe having to explain and would love to change > it. Doing so would be consistent with what we were trying to do with > 1.6 (i.e. rationalise existing use of the spec) but I chickened out > really. > > Cheers, > Andy Jonathan Warren
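[Editorial sketch] The client-side filtering mentioned in the quoted text above (dropping the extra overlapping features that an old-style feature-by-id server returns) could look roughly like the following. This is only a minimal sketch: the dictionary layout and the name `only_requested` are hypothetical and stand in for whatever parsed representation of a DAS FEATURE element a real client uses.

```python
from typing import Dict, List

# Hypothetical, simplified stand-in for a parsed DAS FEATURE element.
Feature = Dict[str, object]


def only_requested(features: List[Feature], feature_id: str) -> List[Feature]:
    """Keep only the features whose id matches the one that was asked for."""
    return [f for f in features if f.get("id") == feature_id]


# An old-style server returns the requested feature plus everything
# overlapping it; the ids and coordinates below are made up for illustration.
response = [
    {"id": "BRCA1", "start": 100, "end": 900},
    {"id": "snp42", "start": 250, "end": 250},
    {"id": "exon7", "start": 300, "end": 420},
]
filtered = only_requested(response, "BRCA1")
```

A client written this way behaves the same whether the server returns one feature or the whole overlapping region, which is the compatibility argument made in the quoted text.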
From thomas.a.down at gmail.com Mon Mar 7 12:01:15 2011 From: thomas.a.down at gmail.com (Thomas Down) Date: Mon, 7 Mar 2011 12:01:15 +0000 Subject: [DAS] Adjacent feature extension In-Reply-To: References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>

Message-ID: On Mon, Mar 7, 2011 at 11:51 AM, Jonathan Warren wrote: > As to feature-by-id, I know changing behaviour is potentially a very >> disruptive change, but I think we can potentially do this purely because >> servers don't tend to implement it correctly anyway. Clients can happily >> filter out any additional features returned by old servers, and if any >> clients are reliant on the server including all overlapping features then as >> far as I am concerned they are either a) targeting specific servers rather >> than DAS-wide and thus unaffected, or b) already broken :) >> > So you agree feature-by_id should be changed if we have the stomach for it? > - good and Gustavo too. Well done Andy - You have just agreed to write Spec > 1.7 or 3??? ;) Your argument above can be used for leaving the spec as it is > then as well - but ideally I agree and guess we can call it spec 1.61 > assuming other people agree. > I don't know quite how productive this particular worm-can will be, but FWIW I'd be happy to see it change. The current spec has caused significant hassle (with both my client- and server-dev hats on!). Thomas. PS. One thing I would ask is: can we discuss the merits of the feature-adjacent extension independently of any possible changes to the core spec. It's addressing some specific needs, and if we can get the last few wrinkles sorted out I'm very eager to get it finished and deployed soon. From jw12 at sanger.ac.uk Mon Mar 7 12:16:05 2011 From: jw12 at sanger.ac.uk (Jonathan Warren) Date: Mon, 7 Mar 2011 12:16:05 +0000 Subject: [DAS] Adjacent feature extension In-Reply-To: References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>

Message-ID: <6E8B83BF-8B65-41F8-B697-696431D5176E@sanger.ac.uk> On 7 Mar 2011, at 12:01, Thomas Down wrote: > > > On Mon, Mar 7, 2011 at 11:51 AM, Jonathan Warren > wrote: > As to feature-by-id, I know changing behaviour is potentially a very > disruptive change, but I think we can potentially do this purely > because servers don't tend to implement it correctly anyway. Clients > can happily filter out any additional features returned by old > servers, and if any clients are reliant on the server including all > overlapping features then as far as I am concerned they are either > a) targeting specific servers rather than DAS-wide and thus > unaffected, or b) already broken :) > So you agree feature-by_id should be changed if we have the stomach > for it? - good and Gustavo too. Well done Andy - You have just > agreed to write Spec 1.7 or 3??? ;) Your argument above can be used > for leaving the spec as it is then as well - but ideally I agree and > guess we can call it spec 1.61 assuming other people agree. > > I don't know quite how productive this particular worm-can will be, > but FWIW I'd be happy to see it change. The current spec has caused > significant hassle (with both my client- and server-dev hats on!). > > Thomas. > > PS. One thing I would ask is: can we discuss the merits of the > feature-adjacent extension independently of any possible changes to > the core spec. It's addressing some specific needs, and if we can > get the last few wrinkles sorted out I'm very eager to get it > finished and deployed soon. Fair enough, adjacent features would be an extension and so not affect the core spec. As you know a similar adjacent_feature proposal has been posted before to the list and yes we should definitely come up with a solution now. I'd say if we don't have any more objections in the next couple of days then go with your proposal as is? I'll then put support into the registry this week if that is the case. 
Could you then also copy the proposal from https://github.com/dasmoth/dalliance/wiki/AdjacentFeatures to the extensions page at http://www.biodas.org/wiki/DAS1.6E#Adjacent_Feature_filter and note in large letters that it was agreed by the community on such and such a date? Jonathan Warren From andy.jenkinson at ebi.ac.uk Mon Mar 7 12:43:32 2011 From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson) Date: Mon, 7 Mar 2011 12:43:32 +0000 Subject: [DAS] Adjacent feature extension In-Reply-To: References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>

Message-ID: <4796FEEA-3D6A-43A4-A76D-2212E2F690B6@ebi.ac.uk> On 7 Mar 2011, at 11:51, Jonathan Warren wrote: > On 7 Mar 2011, at 11:19, Andy Jenkinson wrote: > >> On 7 Mar 2011, at 10:57, Jonathan Warren wrote: >> >>> >>> My vote would ideally to change feature_by_id to return one feature and have the adjacent_feature as returning one feature. This in my opinion would mean these capabilities on servers do "exactly as they say on the tin" and would be easier to implement for data providers and are thus more likely to be implemented? >>> If the feature_id capability as it stands is needed it could be changed to something more akin to what it means like feature_id_region but I would bet no one would bother to change it/use it? >>> >>> However the reality is that we are too late to change the old feature_by_id, but I don't think we need to make the same mistake twice by repeating it for adjacent_features? >> >> I disagree. I think the problems with feature-by-id are that a) the name of the capability implies singular, and b) the concept itself (i.e. getting a feature by its ID) is such a common operation that is otherwise missing in DAS. I don't think either of those apply to an "adjacent" capability unless you specifically choose to call it "adjacent-feature" as opposed to "adjacent-features". I honestly don't think a capability called "adjacent-features" with a query structure like "/das/features?adjacent=foo:1" implies singular, rather the opposite in fact. To me that query suggests "get me the features adjacent to foo:1". True that 2 features is plural which still leaves a "one feature either side" interpretation possible, but IMO certainly not implicit enough to stop anyone implementing it to actually read the specification/documentation. Add to that the fact that this is an entirely new behaviour that we have the chance to properly document and make it clear exactly what the server must do. >> >> So IMO we have a clear choice. 
> I still think it's simpler to implement it for one feature either side and keep complexity in the client. Generally how many people stay wake after line 10 when reading the spec? :) Lets see if there are more votes... It probably is simpler to implement (well, to implement with maximum efficiency) and I am not advocating one over the other, but IMO the implementation considerations are a separate part of our choice and are orthogonal to whether it's confusing for those implementing it and consequently whether we see divergence from the spec like we do with feature-by-id. As Gustavo says, he'd implement feature-by-id as one feature because that's what he thinks it means, not because it's difficult. I'd posit that it'd be a one line change for any server maintainer to fix theirs to implement it correctly (i.e. use the feature's start/end to resubmit the query), it's just that it'd be more complicated to do it in a single step from the beginning. We should be under no illusions though that people are going to be able to implement this easily without reading the documentation carefully, no matter which option is chosen. In particular, I can foresee servers not interpreting the "type" filter appropriately, being likely to process the adjacent query then apply the type filter, which would be wrong. I have a feeling most sources implement the type filter as a passive "post filter" rather than an active one. I can tell you right now that it is going to be really quite difficult for me to implement "adjacent" correctly for the ASTD gene/transcript/exon sources, and I suspect the same will be true for retrofitting lots of other sources. >> >> As to feature-by-id, I know changing behaviour is potentially a very disruptive change, but I think we can potentially do this purely because servers don't tend to implement it correctly anyway. 
Clients can happily filter out any additional features returned by old servers, and if any clients are reliant on the server including all overlapping features then as far as I am concerned they are either a) targeting specific servers rather than DAS-wide and thus unaffected, or b) already broken :) > So you agree feature-by_id should be changed if we have the stomach for it? - good and Gustavo too. Well done Andy - You have just agreed to write Spec 1.7 or 3??? ;) Your argument above can be used for leaving the spec as it is then as well - but ideally I agree and guess we can call it spec 1.61 assuming other people agree. I already have a small list of changes for DAS 1.7 or whatever and think it's fine for that context. In any case, let's keep these two issues separate as Thomas says. >> >> I have to admit that the feature-by-id capability is one of the (many) things I loathe having to explain and would love to change it. Doing so would be consistent with what we were trying to do with 1.6 (i.e. rationalise existing use of the spec) but I chickened out really. >> >> Cheers, >> Andy From thomas.a.down at gmail.com Mon Mar 7 12:57:37 2011 From: thomas.a.down at gmail.com (Thomas Down) Date: Mon, 7 Mar 2011 12:57:37 +0000 Subject: [DAS] Adjacent feature extension In-Reply-To: <4796FEEA-3D6A-43A4-A76D-2212E2F690B6@ebi.ac.uk> References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>

<4796FEEA-3D6A-43A4-A76D-2212E2F690B6@ebi.ac.uk> Message-ID: On Mon, Mar 7, 2011 at 12:43 PM, Andy Jenkinson wrote: > We should be under no illusions though that people are going to be able to > implement this easily without reading the documentation carefully, no matter > which option is chosen. In particular, I can foresee servers not > interpreting the "type" filter appropriately, being likely to process the > adjacent query then apply the type filter, which would be wrong. I have a > feeling most sources implement the type filter as a passive "post filter" > rather than an active one. I can tell you right now that it is going to be > really quite difficult for me to implement "adjacent" correctly for the ASTD > gene/transcript/exon sources, and I suspect the same will be true for > retrofitting lots of other sources. > I appreciate this is tough. However, given the existence of UCSC-like "super-sources" containing basically unrelated types, with potentially quite different densities and dispositions, I find it hard to see how this is going to be useful without some kind of interaction with the types filter along the lines of what I've proposed. Thomas. From andy.jenkinson at ebi.ac.uk Mon Mar 7 13:11:46 2011 From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson) Date: Mon, 7 Mar 2011 13:11:46 +0000 Subject: [DAS] Adjacent feature extension In-Reply-To: References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>

<4796FEEA-3D6A-43A4-A76D-2212E2F690B6@ebi.ac.uk> Message-ID: <31F4DDA4-EFE4-467F-A2BE-A65CACD8BDD4@ebi.ac.uk> On 7 Mar 2011, at 12:57, Thomas Down wrote: > On Mon, Mar 7, 2011 at 12:43 PM, Andy Jenkinson wrote: > We should be under no illusions though that people are going to be able to implement this easily without reading the documentation carefully, no matter which option is chosen. In particular, I can foresee servers not interpreting the "type" filter appropriately, being likely to process the adjacent query then apply the type filter, which would be wrong. I have a feeling most sources implement the type filter as a passive "post filter" rather than an active one. I can tell you right now that it is going to be really quite difficult for me to implement "adjacent" correctly for the ASTD gene/transcript/exon sources, and I suspect the same will be true for retrofitting lots of other sources. > > I appreciate this is tough. However, given the existence of UCSC-like "super-sources" containing basically unrelated types, with potentially quite different densities and dispositions, I find it hard to see how this is going to be useful without some kind of interaction with the types filter along the lines of what I've proposed. Absolutely. From jw12 at sanger.ac.uk Mon Mar 7 14:11:20 2011 From: jw12 at sanger.ac.uk (Jonathan Warren) Date: Mon, 7 Mar 2011 14:11:20 +0000 Subject: [DAS] Adjacent feature extension In-Reply-To: <4796FEEA-3D6A-43A4-A76D-2212E2F690B6@ebi.ac.uk> References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>

<4796FEEA-3D6A-43A4-A76D-2212E2F690B6@ebi.ac.uk> Message-ID: On 7 Mar 2011, at 12:43, Andy Jenkinson wrote: > On 7 Mar 2011, at 11:51, Jonathan Warren wrote: > >> On 7 Mar 2011, at 11:19, Andy Jenkinson wrote: >> >>> On 7 Mar 2011, at 10:57, Jonathan Warren wrote: >>> >>>> >>>> My vote would ideally to change feature_by_id to return one >>>> feature and have the adjacent_feature as returning one feature. >>>> This in my opinion would mean these capabilities on servers do >>>> "exactly as they say on the tin" and would be easier to implement >>>> for data providers and are thus more likely to be implemented? >>>> If the feature_id capability as it stands is needed it could be >>>> changed to something more akin to what it means like >>>> feature_id_region but I would bet no one would bother to change >>>> it/use it? >>>> >>>> However the reality is that we are too late to change the old >>>> feature_by_id, but I don't think we need to make the same mistake >>>> twice by repeating it for adjacent_features? >>> >>> I disagree. I think the problems with feature-by-id are that a) >>> the name of the capability implies singular, and b) the concept >>> itself (i.e. getting a feature by its ID) is such a common >>> operation that is otherwise missing in DAS. I don't think either >>> of those apply to an "adjacent" capability unless you specifically >>> choose to call it "adjacent-feature" as opposed to "adjacent- >>> features". I honestly don't think a capability called "adjacent- >>> features" with a query structure like "/das/features?adjacent=foo: >>> 1" implies singular, rather the opposite in fact. To me that query >>> suggests "get me the features adjacent to foo:1". True that 2 >>> features is plural which still leaves a "one feature either side" >>> interpretation possible, but IMO certainly not implicit enough to >>> stop anyone implementing it to actually read the specification/ >>> documentation. 
Add to that the fact that this is an entirely new >>> behaviour that we have the chance to properly document and make it >>> clear exactly what the server must do. >>> >>> So IMO we have a clear choice. >> I still think it's simpler to implement it for one feature either >> side and keep complexity in the client. Generally how many people >> stay wake after line 10 when reading the spec? :) Lets see if there >> are more votes... > > It probably is simpler to implement (well, to implement with maximum > efficiency) and I am not advocating one over the other, but IMO the > implementation considerations are a separate part of our choice and > are orthogonal to whether it's confusing for those implementing it > and consequently whether we see divergence from the spec like we do > with feature-by-id. As Gustavo says, he'd implement feature-by-id as > one feature because that's what he thinks it means, not because it's > difficult. I'd posit that it'd be a one line change for any server > maintainer to fix theirs to implement it correctly (i.e. use the > feature's start/end to resubmit the query), it's just that it'd be > more complicated to do it in a single step from the beginning. > > We should be under no illusions though that people are going to be > able to implement this easily without reading the documentation > carefully, no matter which option is chosen. Good template methods and or examples in tutorials examples will encourage use of this command. > In particular, I can foresee servers not interpreting the "type" > filter appropriately, being likely to process the adjacent query > then apply the type filter, which would be wrong. I have a feeling > most sources implement the type filter as a passive "post filter" > rather than an active one. 
I can tell you right now that it is going > to be really quite difficult for me to implement "adjacent" > correctly for the ASTD gene/transcript/exon sources, and I suspect > the same will be true for retrofitting lots of other sources. This is an optional capability though right? > >>> >>> As to feature-by-id, I know changing behaviour is potentially a >>> very disruptive change, but I think we can potentially do this >>> purely because servers don't tend to implement it correctly >>> anyway. Clients can happily filter out any additional features >>> returned by old servers, and if any clients are reliant on the >>> server including all overlapping features then as far as I am >>> concerned they are either a) targeting specific servers rather >>> than DAS-wide and thus unaffected, or b) already broken :) >> So you agree feature-by_id should be changed if we have the stomach >> for it? - good and Gustavo too. Well done Andy - You have just >> agreed to write Spec 1.7 or 3??? ;) Your argument above can be used >> for leaving the spec as it is then as well - but ideally I agree >> and guess we can call it spec 1.61 assuming other people agree. > > I already have a small list of changes for DAS 1.7 or whatever and > think it's fine for that context. In any case, let's keep these two > issues separate as Thomas says. I was really hoping not to do another major spec revision for at least 3 years and to focus on extensions giving new capabilities- otherwise for the core capabilities everyone is always playing catch up! This maybe something to discuss at some point soon. > >>> >>> I have to admit that the feature-by-id capability is one of the >>> (many) things I loathe having to explain and would love to change >>> it. Doing so would be consistent with what we were trying to do >>> with 1.6 (i.e. rationalise existing use of the spec) but I >>> chickened out really. 
>>> >>> Cheers, >>> Andy Jonathan Warren From thomas.a.down at gmail.com Mon Mar 7 14:27:12 2011 From: thomas.a.down at gmail.com (Thomas Down) Date: Mon, 7 Mar 2011 14:27:12 +0000 Subject: [DAS] Adjacent feature extension In-Reply-To: References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>

<4796FEEA-3D6A-43A4-A76D-2212E2F690B6@ebi.ac.uk> Message-ID: On Mon, Mar 7, 2011 at 2:11 PM, Jonathan Warren wrote: > In particular, I can foresee servers not interpreting the "type" filter >> appropriately, being likely to process the adjacent query then apply the >> type filter, which would be wrong. I have a feeling most sources implement >> the type filter as a passive "post filter" rather than an active one. I can >> tell you right now that it is going to be really quite difficult for me to >> implement "adjacent" correctly for the ASTD gene/transcript/exon sources, >> and I suspect the same will be true for retrofitting lots of other sources. >> > This is an optional capability though right? > That is definitely the intent of my proposal. Even if this eventually gets rolled into the spec, my suggestion would be to keep it optional (negotiated via a CAPABILITY). Thomas. From thomas.a.down at gmail.com Mon Mar 7 14:41:31 2011 From: thomas.a.down at gmail.com (Thomas Down) Date: Mon, 7 Mar 2011 14:41:31 +0000 Subject: [DAS] Locations between bases Message-ID: The day has come when I find I need to be able to distinguish between a feature which actually *covers* one or a small number of bases (e.g. a SNP) and a feature which refers to a point between two bases (e.g. an insertion). Have any other DAS folk dealt with this before, and if so how? One possible approach would be something like: <START>30000000</START> <END>29999999</END> ...as a way to reference the position between the 29999999th and 30000000th bases in a sequence... but is this going to break lots of existing client code? (I also rather dislike it because there's currently sanity-checking code in Dazzle to stop you doing stuff like this...) Any better ideas? Thomas. PS. Yes, I'm aware this is a solved problem in DAS/2. Afraid I need a DAS/1 solution, though.
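[Editorial sketch] To make the convention floated in the message above concrete: a start that is exactly one greater than the end would denote the point between those two bases, while start <= end would keep its usual "covers these bases" meaning. The snippet below is only a sketch of how a client might interpret such coordinates, assuming that convention were adopted. It is not part of the DAS/1 specification, and `describe_location` is a hypothetical helper name.

```python
def describe_location(start: int, end: int) -> str:
    """Interpret a start/end pair under the proposed between-bases convention.

    Assumption (not in the DAS/1 spec): start == end + 1 marks the point
    between base `end` and base `start`.
    """
    if start <= end:
        # Ordinary feature covering one or more bases, inclusive coordinates.
        count = end - start + 1
        return f"covers {count} base(s): {start}..{end}"
    if start == end + 1:
        # Zero-width location, e.g. an insertion site.
        return f"between bases {end} and {start}"
    # Anything else is what existing sanity checks would reject.
    raise ValueError(f"invalid location: start={start}, end={end}")


insertion = describe_location(30000000, 29999999)
snp = describe_location(30000000, 30000000)
```

The explicit `ValueError` branch reflects the compatibility worry raised in the message: code that already rejects start > end would need to relax that check only for the single start == end + 1 case.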
From andy.jenkinson at ebi.ac.uk Mon Mar 7 15:04:32 2011 From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson) Date: Mon, 7 Mar 2011 15:04:32 +0000 Subject: [DAS] Adjacent feature extension In-Reply-To: <6E8B83BF-8B65-41F8-B697-696431D5176E@sanger.ac.uk> References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>

<6E8B83BF-8B65-41F8-B697-696431D5176E@sanger.ac.uk> Message-ID: On 7 Mar 2011, at 12:16, Jonathan Warren wrote: > I'd say if we don't have any more objections in the next couple of days then go with your proposal as is? I'll then put support into the registry this week if that is the case. If you could also then copy the proposal from here https://github.com/dasmoth/dalliance/wiki/AdjacentFeatures to the extensions page here: > http://www.biodas.org/wiki/DAS1.6E#Adjacent_Feature_filter noting in large letters that it was agreed by the community on such a such a date? I think there is a lot left to be clarified so adopting it "as is" is a no go for me. In particular, take a look at this diagram and see if you can work out what will be returned with "adjacent" queries for either side of the viewing area, and do they make sense for what the client is trying to achieve? -------------- next part -------------- A non-text attachment was scrubbed... Name: DAS-Adjacent.png Type: image/png Size: 40385 bytes Desc: not available URL: -------------- next part -------------- The client has "seen" gene 2 and all its parts. If the client asks for features adjacent to the left/right sides of the viewing area, what should the server return? To the left: gene 1, transcript 1, exon 2, or SNP 1? Does it matter? Should gene 1 and all its parts be returned because that is what happens in a segment query? If not, is this confusing for the spec? To the right: gene 2, transcript 2, exon 4, transcript 3, exon 5, or SNP 2? If no special arrangements are made for excluding overlapping features (thus either gene 2, transcript 2 or exon 4 are returned above), what position should the client submit instead in its overlap query to get SNP 2 and SNP 3? (Hint: it's impossible to get SNP 2). What if the genes were nonpositional features? 
None of the above are unresolvable problems. The simplest way is to say that overlapping features should be returned, and that clients should not try to jump beyond them. It means you don't really get a "next feature" capability in the way a user probably intends (i.e. "next gene"), but it does make it impossible to miss transcript 3 and SNP 2 in the above diagram. It still needs to be defined which feature will be returned if multiple features have the same position in that case (ideally we want the one that extends the furthest in the adjacent direction), but that can be complicated by things like nonpositional parent features etc. The alternative is to exclude overlapping features, but that raises the question of whether parts are considered overlapping if their parents are. Implementing "next gene" means excluding all of gene 2's transcripts and exons (thus returning SNP 2), but is this even what the user meant? Maybe they meant "next transcript". So not simple. But the main questions are: 1. should overlapping features be excluded in overlap requests? 2. if so, should non-overlapping features with overlapping parents/parts be excluded? 3. separate from 1 and 2, should the nearest feature's parents and parts also be returned? From andy.jenkinson at ebi.ac.uk Mon Mar 7 15:12:32 2011 From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson) Date: Mon, 7 Mar 2011 15:12:32 +0000 Subject: [DAS] Adjacent feature extension In-Reply-To: References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>

<4796FEEA-3D6A-43A4-A76D-2212E2F690B6@ebi.ac.uk> Message-ID: On 7 Mar 2011, at 14:11, Jonathan Warren wrote: > > On 7 Mar 2011, at 12:43, Andy Jenkinson wrote: > >> In particular, I can foresee servers not interpreting the "type" filter appropriately, being likely to process the adjacent query then apply the type filter, which would be wrong. I have a feeling most sources implement the type filter as a passive "post filter" rather than an active one. I can tell you right now that it is going to be really quite difficult for me to implement "adjacent" correctly for the ASTD gene/transcript/exon sources, and I suspect the same will be true for retrofitting lots of other sources. > This is an optional capability though right? Of course, I'm just saying it's difficult for servers to implement so we should expect to see bad implementations exactly as we do with feature-by-id. >> >>>> >>>> As to feature-by-id, I know changing behaviour is potentially a very disruptive change, but I think we can potentially do this purely because servers don't tend to implement it correctly anyway. Clients can happily filter out any additional features returned by old servers, and if any clients are reliant on the server including all overlapping features then as far as I am concerned they are either a) targeting specific servers rather than DAS-wide and thus unaffected, or b) already broken :) >>> So you agree feature-by_id should be changed if we have the stomach for it? - good and Gustavo too. Well done Andy - You have just agreed to write Spec 1.7 or 3??? ;) Your argument above can be used for leaving the spec as it is then as well - but ideally I agree and guess we can call it spec 1.61 assuming other people agree. >> >> I already have a small list of changes for DAS 1.7 or whatever and think it's fine for that context. In any case, let's keep these two issues separate as Thomas says. 
> > I was really hoping not to do another major spec revision for at least 3 years and to focus on extensions giving new capabilities- otherwise for the core capabilities everyone is always playing catch up! This maybe something to discuss at some point soon. >> >>>> >>>> I have to admit that the feature-by-id capability is one of the (many) things I loathe having to explain and would love to change it. Doing so would be consistent with what we were trying to do with 1.6 (i.e. rationalise existing use of the spec) but I chickened out really. >>>> >>>> Cheers, >>>> Andy >>> >>> Jonathan Warren >> > > Jonathan Warren From andy.jenkinson at ebi.ac.uk Mon Mar 7 15:21:59 2011 From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson) Date: Mon, 7 Mar 2011 15:21:59 +0000 Subject: [DAS] Locations between bases In-Reply-To: References: Message-ID: Oh Thomas, why do you do it to us? May I ask what the specific application is? Depending on what you need, an insertion can be modelled with a feature covering both bases with a specific glyph (or type). 
I think what you propose is a common approach, but it is very likely going to break things for some clients (or at least give undefined results). Some additional element would be more compatible. On 7 Mar 2011, at 14:41, Thomas Down wrote: > The day has come when I find I need to be able to distinguish between a > feature which actually *covers* one or a small number of bases (e.g. a SNP) > and a feature which refers to a point between two bases (e.g. an > insertion). Have any other DAS folk dealt with this before, and if so how? > One possible approach would be something like: > > <START>30000000</START> > <END>29999999</END> > > ...as a way to reference the position between the 29999999th and 30000000th > bases in a sequence... but is this going to break lots of existing client > code? > > (I also rather dislike it because there's currently sanity-checking code in > Dazzle to stop you doing stuff like this...) > > Any better ideas? > > Thomas. > > PS. Yes, I'm aware this is a solved problem in DAS/2. Afraid I need a DAS/1 > solution, though. > _______________________________________________ > DAS mailing list > DAS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das From birney at ebi.ac.uk Mon Mar 7 15:26:59 2011 From: birney at ebi.ac.uk (Ewan Birney) Date: Mon, 7 Mar 2011 15:26:59 +0000 Subject: [DAS] Locations between bases In-Reply-To: References:

Message-ID: I also note that this is stretching DAS's mindset away from "DAS is to support visualisation of data on clients for human users to read" to "DAS semantically represents things accurately for programmatic decisions" I think one is better off keeping these things separate. On 7 Mar 2011, at 15:21, Andy Jenkinson wrote: > Oh Thomas, why do you do it to us? > > May I ask what the specific application is? Depending on what you > need, an insertion can be modelled with a feature covering both > bases with a specific glyph (or type). > > I think what you propose is a common approach, but it is very likely > going to break things for some clients (or at least give undefined > results). Some additional element would be more compatible. > > On 7 Mar 2011, at 14:41, Thomas Down wrote: > >> The day has come when I find I need to be able to distinguish >> between a >> feature which actually *covers* one or a small number of bases >> (e.g. a SNP) >> and a feature which refers to a point between two bases (e.g. an >> insertion). Have any other DAS folk dealt with this before, and if >> so how? >> One possible approach would be something like: >> >> 30000000 >> 29999999 >> >> ...as a way to reference the position between the 29999999th and >> 30000000th >> bases in a sequence... but is this going to break lots of existing >> client >> code? >> >> (I also rather dislike it because there's currently sanity-checking >> code in >> Dazzle to stop you doing stuff like this...) >> >> Any better ideas? >> >> Thomas. >> >> PS. Yes, I'm aware this is a solved problem in DAS/2. Afraid I >> need a DAS/1 >> solution, though. 
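
[Editorial sketch, not part of the original thread.] The two encodings discussed above for a point between two bases can be compared in a few lines of Python. This is only an illustration under stated assumptions: the names Span, between_as_inverted, between_as_flanking and insertion_point are hypothetical and do not come from any DAS specification or library, and coordinates are assumed to be 1-based and inclusive.

```python
from typing import NamedTuple, Optional


class Span(NamedTuple):
    """Hypothetical start/end pair, 1-based and inclusive."""
    start: int
    end: int


def between_as_inverted(left_base: int) -> Span:
    """Encoding proposed by Thomas: start is one greater than end."""
    return Span(start=left_base + 1, end=left_base)


def between_as_flanking(left_base: int) -> Span:
    """Encoding suggested by Andy: cover both flanking bases.

    The feature type or glyph, not the coordinates, would have to
    mark it as an insertion site.
    """
    return Span(start=left_base, end=left_base + 1)


def insertion_point(span: Span) -> Optional[int]:
    """Return the base left of the insertion point for an inverted span.

    Returns None for ordinary spans (start <= end), which is roughly the
    check a client would need before treating a feature as inter-base.
    """
    if span.start == span.end + 1:
        return span.end
    return None
```

With these definitions, between_as_inverted(29999999) gives start 30000000 and end 29999999, matching the example in the quoted message. The flanking form stays valid for clients that reject start > end, but cannot be told apart from a two-base feature by coordinates alone.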
From jw12 at sanger.ac.uk Mon Mar 7 15:49:12 2011 From: jw12 at sanger.ac.uk (Jonathan Warren) Date: Mon, 7 Mar 2011 15:49:12 +0000 Subject: [DAS] Adjacent feature extension In-Reply-To: References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>

<6E8B83BF-8B65-41F8-B697-696431D5176E@sanger.ac.uk> Message-ID: <2869357C-159E-4D2D-A169-B91D63E31680@sanger.ac.uk> On 7 Mar 2011, at 15:04, Andy Jenkinson wrote: > On 7 Mar 2011, at 12:16, Jonathan Warren wrote: > >> I'd say if we don't have any more objections in the next couple of >> days then go with your proposal as is? I'll then put support into >> the registry this week if that is the case. If you could also then >> copy the proposal from here https://github.com/dasmoth/dalliance/wiki/AdjacentFeatures >> to the extensions page here: >> http://www.biodas.org/wiki/DAS1.6E#Adjacent_Feature_filter noting >> in large letters that it was agreed by the community on such a such >> a date? > > I think there is a lot left to be clarified so adopting it "as is" > is a no go for me. In particular, take a look at this diagram and > see if you can work out what will be returned with "adjacent" > queries for either side of the viewing area, and do they make sense > for what the client is trying to achieve? > > > The client has "seen" gene 2 and all its parts. > > If the client asks for features adjacent to the left/right sides of > the viewing area, what should the server return? I don't think it makes sense to ask for a next right in this case as there are features here already. This is for sparse data sources so it's ok just to return whats there if someone specifically wants to hit the next feature button or a client can blank the next right button out. It's up to the client. Next left should return SNP1 if asked for an adjacent request.... or genes and constituents if filtered on gene. If you take the intention of this as finding features where data is sparse then I don't think there are big issues. Part of the point of the extensions phase is to try these things out with examples and refine the specs. To leave acceptance of this will be a big mistake in my view. > To the left: gene 1, transcript 1, exon 2, or SNP 1? Does it matter? 
> Should gene 1 and all its parts be returned because that is what > happens in a segment query? If not, is this confusing for the spec? > To the right: gene 2, transcript 2, exon 4, transcript 3, exon 5, or > SNP 2? > > If no special arrangements are made for excluding overlapping > features (thus either gene 2, transcript 2 or exon 4 are returned > above), what position should the client submit instead in its > overlap query to get SNP 2 and SNP 3? (Hint: it's impossible to get > SNP 2). > > What if the genes were nonpositional features? > > > None of the above are unresolvable problems, the simplest way is to > say that overlapping features should be returned, and that clients > should not try to jump beyond them. It means you don't really get a > "next feature" capability in the same way a user probably intends > (i.e. "next gene"), but does make it impossible to miss transcript 3 > and SNP 2 in the above diagram. It still needs to be defined which > feature will be returned if multiple have the same position in that > case (ideally we want the one that extends the furthest in the > adjacent direction), but that can be complicated by things like > nonpositional parent features etc. > > The alternative is to exclude overlapping features, but that raises > the question of whether parts are considered overlapping if their > parents are. Implementing "next gene" means excluding all of gene > 2's transcripts and exons (thus returning SNP 2), but is this even > what the user meant? Maybe they meant "next transcript". > > So not simple. But the main questions are: > 1. should overlapping features be excluded in overlap requests > 2. if so, should non-overlapping features with overlapping parents/ > parts be excluded? > 3. separate from 1 and 2, should the nearest feature's parents and > parts also be returned? 
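
[Editorial sketch, not part of the original thread.] The first of the quoted questions (whether overlapping features are returned or excluded) and the suggested tie-break (prefer the feature that extends furthest in the adjacent direction) can be made concrete with a small Python model. Everything here is an assumption for illustration: the Feature class, the adjacent function, its parameters and the coordinates are invented, parent/part relationships are not modelled, and this is not the behaviour of any DAS server or of the AdjacentFeatures proposal itself.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Feature:
    id: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def adjacent(features: Iterable[Feature], pos: int, direction: str,
             exclude_overlapping: bool = False) -> Optional[Feature]:
    """Return the nearest feature to the left or right of pos, or None."""
    if direction not in ("left", "right"):
        raise ValueError("direction must be 'left' or 'right'")
    best, best_key = None, None
    for f in features:
        if direction == "right":
            if f.end <= pos:          # nothing of this feature lies right of pos
                continue
            overlaps = f.start <= pos
            gap = max(0, f.start - pos)
            reach = f.end             # larger end extends further right
        else:
            if f.start >= pos:        # nothing of this feature lies left of pos
                continue
            overlaps = f.end >= pos
            gap = max(0, pos - f.end)
            reach = -f.start          # smaller start extends further left
        if overlaps and exclude_overlapping:
            continue
        # Smallest gap wins; ties go to the feature with the furthest reach.
        key = (gap, -reach)
        if best_key is None or key < best_key:
            best, best_key = f, key
    return best


# Made-up coordinates, loosely named after the features in the discussion.
FEATURES = [
    Feature("gene1", 5, 30), Feature("snp1", 35, 35),
    Feature("gene2", 50, 300), Feature("transcript2", 50, 200),
    Feature("exon4", 150, 200), Feature("transcript3", 120, 300),
    Feature("snp2", 250, 250), Feature("snp3", 400, 400),
]
```

With these invented coordinates and a query to the right of position 100, the default policy returns gene2 (it overlaps the position and reaches furthest right), while exclude_overlapping=True returns transcript3, even though its parent gene2 overlaps the position. That second result is exactly the situation question 2 asks about.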
Jonathan Warren Senior Developer and DAS coordinator blog: http://biodasman.wordpress.com/ jw12 at sanger.ac.uk Ext: 2314 Telephone: 01223 492314 From thomas.a.down at gmail.com Mon Mar 7 16:03:50 2011 From: thomas.a.down at gmail.com (Thomas Down) Date: Mon, 7 Mar 2011 16:03:50 +0000 Subject: [DAS] Locations between bases In-Reply-To: References:

Message-ID: Well, that's a separate debate, and I don't really want to open it at the moment. However, right now I *am* thinking quite specifically of visualization issues. Once you're zoomed in to base-pair resolution, there is a meaningful distinction between "on a base" and "between a base". Andy: what I'm trying to model is insertion sites. I guess that's the main use for this kind of thing. Thomas. On Mon, Mar 7, 2011 at 3:26 PM, Ewan Birney wrote: > > I also note that this is stretching DAS's mindset away from > > "DAS is to support visualisation of data on clients for human users to > read" > > to > > "DAS semantically represents things accurately for programmatic > decisions" > > > I think one is better off keeping these things separate. > > > > On 7 Mar 2011, at 15:21, Andy Jenkinson wrote: > > Oh Thomas, why do you do it to us? >> >> May I ask what the specific application is? Depending on what you need, an >> insertion can be modelled with a feature covering both bases with a specific >> glyph (or type). >> >> I think what you propose is a common approach, but it is very likely going >> to break things for some clients (or at least give undefined results). Some >> additional element would be more compatible. >> >> On 7 Mar 2011, at 14:41, Thomas Down wrote: >> >> The day has come when I find I need to be able to distinguish between a >>> feature which actually *covers* one or a small number of bases (e.g. a >>> SNP) >>> and a feature which refers to a point between two bases (e.g. an >>> insertion). Have any other DAS folk dealt with this before, and if so >>> how? >>> One possible approach would be something like: >>> >>> 30000000 >>> 29999999 >>> >>> ...as a way to reference the position between the 29999999th and >>> 30000000th >>> bases in a sequence... but is this going to break lots of existing client >>> code? >>> >>> (I also rather dislike it because there's currently sanity-checking code >>> in >>> Dazzle to stop you doing stuff like this...) 
>>> >>> Any better ideas? >>> >>> Thomas. >>> >>> PS. Yes, I'm aware this is a solved problem in DAS/2. Afraid I need a >>> DAS/1 >>> solution, though. >>> _______________________________________________ >>> DAS mailing list >>> DAS at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/das >>> >> >> >> _______________________________________________ >> DAS mailing list >> DAS at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/das >> > > From birney at ebi.ac.uk Mon Mar 7 16:07:15 2011 From: birney at ebi.ac.uk (Ewan Birney) Date: Mon, 7 Mar 2011 16:07:15 +0000 Subject: [DAS] Locations between bases In-Reply-To: References:

Message-ID: <3D11AC15-23B4-4AF1-AF90-136AFAF1540A@ebi.ac.uk> I'd say this is start=end, and a convention that this indicates the point to the left or the right with a glyph. With this convention, for the edge case one either has to allow the 0 base (right based insertion convention, insertion at the very start) or the end+1 position (left based insertion convention, insertion at the very end). I know, this triggers doing things on inter-base coordinates, but that's just a big, big change. On 7 Mar 2011, at 16:03, Thomas Down wrote: > Well, that's a separate debate, and I don't really want to open it > at the moment. > > However, right now I am thinking quite specifically of visualization > issues. Once you're zoomed in to base-pair resolution, there is a > meaningful distinction between "on a base" and "between a base". > > Andy: what I'm trying to model is insertion sites. I guess that's > the main use for this kind of thing. > > Thomas. > > > > On Mon, Mar 7, 2011 at 3:26 PM, Ewan Birney wrote: > > I also note that this is stretching DAS's mindset away from > > "DAS is to support visualisation of data on clients for human > users to read" > > to > > "DAS semantically represents things accurately for programmatic > decisions" > > > I think one is better off keeping these things separate. > > > > On 7 Mar 2011, at 15:21, Andy Jenkinson wrote: > > Oh Thomas, why do you do it to us? > > May I ask what the specific application is? Depending on what you > need, an insertion can be modelled with a feature covering both > bases with a specific glyph (or type). > > I think what you propose is a common approach, but it is very likely > going to break things for some clients (or at least give undefined > results). Some additional element would be more compatible. > > On 7 Mar 2011, at 14:41, Thomas Down wrote: > > The day has come when I find I need to be able to distinguish > between a > feature which actually *covers* one or a small number of bases (e.g. 
> a SNP) > and a feature which refers to a point between two bases (e.g. an > insertion). Have any other DAS folk dealt with this before, and if > so how? > One possible approach would be something like: > > 30000000 > 29999999 > > ...as a way to reference the position between the 29999999th and > 30000000th > bases in a sequence... but is this going to break lots of existing > client > code? > > (I also rather dislike it because there's currently sanity-checking > code in > Dazzle to stop you doing stuff like this...) > > Any better ideas? > > Thomas. > > PS. Yes, I'm aware this is a solved problem in DAS/2. Afraid I need > a DAS/1 > solution, though. > _______________________________________________ > DAS mailing list > DAS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das > > > _______________________________________________ > DAS mailing list > DAS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das > > From thomas.a.down at gmail.com Mon Mar 7 16:10:35 2011 From: thomas.a.down at gmail.com (Thomas Down) Date: Mon, 7 Mar 2011 16:10:35 +0000 Subject: [DAS] Locations between bases In-Reply-To: <3D11AC15-23B4-4AF1-AF90-136AFAF1540A@ebi.ac.uk> References:

<3D11AC15-23B4-4AF1-AF90-136AFAF1540A@ebi.ac.uk> Message-ID: