From jw12 at sanger.ac.uk Tue Mar 1 05:09:28 2011
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Tue, 1 Mar 2011 10:09:28 +0000
Subject: [DAS] RELAX NG schema for "sequence" command
In-Reply-To: <0F015AE7-09D7-4F4D-B547-353830FD82A2@broadinstitute.org>
References: <0F015AE7-09D7-4F4D-B547-353830FD82A2@broadinstitute.org>
Message-ID: <563D9C80-555A-4866-9E74-F04FF5458E27@sanger.ac.uk>
This is fixed now.
On 28 Feb 2011, at 18:34, Andrew Roberts wrote:
> Hello,
>
> I'm working on some validations, and I think I may have found an
> error in sequence.rng
> (www.dasregistry.org/validation1.6/sequence.rng). It appears that
> this schema makes no provision for the actual text data that belongs
> to the sequence element (i.e. the bases themselves). Is this a known
> issue?
>
> If not, I think the solution is to add a <text/> element inside the
> sequence element. Line 10 should read,
>
>
> attribute>
> attribute>
> attribute>
> text>
>
> Cheers,
>
> Andrew
> _______________________________________________
> DAS mailing list
> DAS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/das
Jonathan Warren
Senior Developer and DAS coordinator
blog: http://biodasman.wordpress.com/
jw12 at sanger.ac.uk
Ext: 2314
Telephone: 01223 492314
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
From aroberts at broadinstitute.org Tue Mar 1 10:19:13 2011
From: aroberts at broadinstitute.org (Andrew Roberts)
Date: Tue, 1 Mar 2011 10:19:13 -0500
Subject: [DAS] RELAX NG schema for "sequence" command
In-Reply-To: <563D9C80-555A-4866-9E74-F04FF5458E27@sanger.ac.uk>
References: <0F015AE7-09D7-4F4D-B547-353830FD82A2@broadinstitute.org>
<563D9C80-555A-4866-9E74-F04FF5458E27@sanger.ac.uk>
Message-ID: <1FD158A0-4032-48A2-AA86-383F9487BB52@broadinstitute.org>
Alright, now I'm hitting the lack of support for ERRORSEGMENT within
the sequence command response. I think there needs to be a choice
inside the oneOrMore, so that elements can be SEQUENCE or
ERRORSEGMENT. I attached what I'm using as a testing schema in the
meantime, hopefully it's helpful.
Andrew
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sequence.rng
Type: application/octet-stream
Size: 1267 bytes
Desc: not available
URL:
-------------- next part --------------
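The "choice inside the oneOrMore" idea can be illustrated outside RELAX NG as well. The following is a minimal Python sketch, not the scrubbed attachment: it accepts a response whose root contains one or more children, each of which is either SEQUENCE or ERRORSEGMENT. The DASSEQUENCE root name, the attribute values and the function name are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Element names SEQUENCE and ERRORSEGMENT come from the thread; the
# DASSEQUENCE root and all attribute values below are illustrative only.
ALLOWED_CHILDREN = {"SEQUENCE", "ERRORSEGMENT"}


def check_sequence_response(xml_text):
    """Return a list of problems; an empty list means the document passed."""
    root = ET.fromstring(xml_text)
    children = list(root)
    problems = []
    if not children:
        # Mirrors oneOrMore: at least one child is required.
        problems.append("expected at least one SEQUENCE or ERRORSEGMENT")
    for child in children:
        # Mirrors choice: each child must be one of the two alternatives.
        if child.tag not in ALLOWED_CHILDREN:
            problems.append("unexpected element: " + child.tag)
    return problems


SAMPLE = """<DASSEQUENCE>
  <SEQUENCE id="22" start="1" stop="8">acgtacgt</SEQUENCE>
  <ERRORSEGMENT id="99" start="1" stop="8"/>
</DASSEQUENCE>"""

print(check_sequence_response(SAMPLE))
```

A check like this describes every response a server may legitimately produce, which is a different goal from confirming that one particular test region returned sequence data.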
From jw12 at sanger.ac.uk Tue Mar 1 10:54:05 2011
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Tue, 1 Mar 2011 15:54:05 +0000
Subject: [DAS] RELAX NG schema for "sequence" command
In-Reply-To: <1FD158A0-4032-48A2-AA86-383F9487BB52@broadinstitute.org>
References: <0F015AE7-09D7-4F4D-B547-353830FD82A2@broadinstitute.org>
<563D9C80-555A-4866-9E74-F04FF5458E27@sanger.ac.uk>
<1FD158A0-4032-48A2-AA86-383F9487BB52@broadinstitute.org>
Message-ID: <2516E6D2-3584-49AB-8648-14DECF48205D@sanger.ac.uk>
Hi Andrew
I guess this depends on what you are using the RNG for. What are you
using it for? The registry needs the result of a sequence request to
conform to sequence.rng as it currently stands on the registry: when a
sequence is requested for the test region entered in the registry, the
server should return a sequence. If it returns an ERRORSEGMENT, then the
registry wants that reported as invalid. If I add the errorsegment to
that document as you suggest, the registry won't pick it up as invalid.
Jonathan Warren
From andy.jenkinson at ebi.ac.uk Tue Mar 1 11:19:12 2011
From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson)
Date: Tue, 1 Mar 2011 16:19:12 +0000
Subject: [DAS] RELAX NG schema for "sequence" command
In-Reply-To: <2516E6D2-3584-49AB-8648-14DECF48205D@sanger.ac.uk>
References: <0F015AE7-09D7-4F4D-B547-353830FD82A2@broadinstitute.org>
<563D9C80-555A-4866-9E74-F04FF5458E27@sanger.ac.uk>
<1FD158A0-4032-48A2-AA86-383F9487BB52@broadinstitute.org>
<2516E6D2-3584-49AB-8648-14DECF48205D@sanger.ac.uk>
Message-ID: <45146209-B676-4307-B020-C87AFFFD0235@ebi.ac.uk>
To put it another way, the ERRORSEGMENT element is only valid if the server supports the error-segment capability. I guess the issue is that the only use of the RNG schemas up to now for validation has been by the registry.
Personally, from a technical point of view, I think including the errorsegment element in the RNG schemas (this will apply to many commands, not just sequence) is the "correct" interpretation. However, that means the registry would need to apply its errorsegment test separately rather than relying on the RNG to flag it as invalid. It already does the reverse of this test separately, so this makes sense to me (i.e. if I ask for a bad segment, do you give me an errorsegment element?).
Cheers,
Andy
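The two separate tests described above could be sketched as follows. This is a hedged illustration in Python rather than the registry's actual code: the function names, the DASSEQUENCE root and the sample documents are assumptions, and a real check would run alongside schema validation rather than replace it.

```python
import xml.etree.ElementTree as ET


def child_tags(xml_text):
    """Return the tag names of the root element's direct children, in order."""
    return [child.tag for child in ET.fromstring(xml_text)]


def valid_region_test(xml_text):
    """A known-good test region should yield only SEQUENCE elements."""
    tags = child_tags(xml_text)
    return bool(tags) and all(tag == "SEQUENCE" for tag in tags)


def bad_segment_test(xml_text):
    """A deliberately bad segment should yield an ERRORSEGMENT element."""
    return "ERRORSEGMENT" in child_tags(xml_text)


# Hypothetical responses, for illustration only.
GOOD = '<DASSEQUENCE><SEQUENCE id="22" start="1" stop="4">acgt</SEQUENCE></DASSEQUENCE>'
BAD = '<DASSEQUENCE><ERRORSEGMENT id="nope" start="1" stop="4"/></DASSEQUENCE>'

print(valid_region_test(GOOD), bad_segment_test(BAD))
```

Keeping these checks outside the schema lets the schema describe all permitted responses while the registry still flags a test region that comes back as an error.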
From aroberts at broadinstitute.org Tue Mar 1 11:20:38 2011
From: aroberts at broadinstitute.org (Andrew Roberts)
Date: Tue, 1 Mar 2011 11:20:38 -0500
Subject: [DAS] RELAX NG schema for "sequence" command
In-Reply-To: <2516E6D2-3584-49AB-8648-14DECF48205D@sanger.ac.uk>
References: <0F015AE7-09D7-4F4D-B547-353830FD82A2@broadinstitute.org>
<563D9C80-555A-4866-9E74-F04FF5458E27@sanger.ac.uk>
<1FD158A0-4032-48A2-AA86-383F9487BB52@broadinstitute.org>
<2516E6D2-3584-49AB-8648-14DECF48205D@sanger.ac.uk>
Message-ID: <3E08E18F-CE0A-4B3F-A3FA-7A0EF5269E95@broadinstitute.org>
Ah, interesting. I'm exercising my implementations of the das commands, and then validating the XML against the rng as a correctness test. I had been assuming that the rng was a description of all possible server responses, including error cases, but I now see what you guys are using them for.
Thanks!
From jw12 at sanger.ac.uk Tue Mar 1 11:33:51 2011
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Tue, 1 Mar 2011 16:33:51 +0000
Subject: [DAS] RELAX NG schema for "sequence" command
In-Reply-To: <3E08E18F-CE0A-4B3F-A3FA-7A0EF5269E95@broadinstitute.org>
References: <0F015AE7-09D7-4F4D-B547-353830FD82A2@broadinstitute.org>
<563D9C80-555A-4866-9E74-F04FF5458E27@sanger.ac.uk>
<1FD158A0-4032-48A2-AA86-383F9487BB52@broadinstitute.org>
<2516E6D2-3584-49AB-8648-14DECF48205D@sanger.ac.uk>
<3E08E18F-CE0A-4B3F-A3FA-7A0EF5269E95@broadinstitute.org>
Message-ID: <2CA04BEC-F6A4-4AC4-B30C-06F0D647F235@sanger.ac.uk>
If your data source is publicly available you can test it by making a
request like this:
http://www.dasregistry.org/validate?url=http://das.sanger.ac.uk/das/otter_das/&testcode=22:19173435,19372173
which returns properties for the status of each command. Or you can
use the validation page on the registry interface to do the same thing.
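For scripted use, a validation URL of the shape shown above can be assembled programmatically. This is a small sketch that only builds the string and does not contact the registry; the function name is hypothetical, and the parameter names url and testcode are taken from the example request.

```python
from urllib.parse import urlencode

# Endpoint copied from the example request in this message.
VALIDATE_ENDPOINT = "http://www.dasregistry.org/validate"


def build_validation_url(source_url, test_region):
    """Build a registry validation URL for a DAS source and a test region."""
    # Keep ':', '/' and ',' unescaped so the result matches the example form.
    query = urlencode({"url": source_url, "testcode": test_region}, safe=":/,")
    return VALIDATE_ENDPOINT + "?" + query


print(build_validation_url("http://das.sanger.ac.uk/das/otter_das/",
                           "22:19173435,19372173"))
```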
Some people here are working on some XSD documents in order to use
JAXB which will probably be of use to you as they should be structured
more as you suggest.
We may need to make it clearer in the documentation how we use these
RNG documents.
Thanks for the feedback!
Jonathan Warren
From thomas.a.down at gmail.com Sat Mar 5 10:01:42 2011
From: thomas.a.down at gmail.com (Thomas Down)
Date: Sat, 5 Mar 2011 15:01:42 +0000
Subject: [DAS] Adjacent feature extension
Message-ID:
Following on from a discussion at the DAS Workshop:
DAS doesn't offer any specific support for clients that provide
mechanisms for skipping from the current position to the "next" or
"previous" feature. I'd like to propose a small extension (one extra filter
option on the existing "features" command) to facilitate this. Full details
are here:
https://github.com/dasmoth/dalliance/wiki/AdjacentFeatures
...but briefly, a request like:
/das/features?adjacent=chr21:30000000
...would be expected to return a standard DASGFF document containing the two
features either side of the specified point.
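As a rough illustration of what a server might do with such a request, here is a minimal Python sketch under stated assumptions: features are held in memory as simple records, and features that overlap the query point are ignored, which is only one possible reading of the proposal. The Feature type and both function names are hypothetical.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    id: str
    start: int
    end: int


def parse_adjacent(value):
    """Split an 'adjacent' parameter such as 'chr21:30000000' into its parts."""
    segment, _, position = value.rpartition(":")
    return segment, int(position)


def adjacent_features(features, point):
    """Return (previous, next): the nearest feature ending before the point
    and the nearest feature starting after it. Either may be None."""
    before = [f for f in features if f.end < point]
    after = [f for f in features if f.start > point]
    previous = max(before, key=lambda f: f.end, default=None)
    following = min(after, key=lambda f: f.start, default=None)
    return previous, following


# Hypothetical features on a single reference sequence.
FEATURES = [Feature("a", 100, 200), Feature("b", 300, 400), Feature("c", 500, 600)]

print(parse_adjacent("chr21:30000000"))
print(adjacent_features(FEATURES, 250))
```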
All comments are welcome! One open question is whether the query should
*just* return the adjacent features, or should also return other features
overlapping the adjacent feature. My preference is for the former, but the
latter does have the merit of being quirk-wards compatible with the existing
feature_id filter. Does anyone else have strong feelings one way or
another?
There isn't currently a full implementation of this, but if nobody comes up
with major objections, I'm hoping to try implementations in Dazzle (server)
and Dalliance (client) within the next couple of weeks.
Thomas.
From andy.jenkinson at ebi.ac.uk Mon Mar 7 05:04:08 2011
From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson)
Date: Mon, 7 Mar 2011 10:04:08 +0000
Subject: [DAS] Adjacent feature extension
In-Reply-To:
References:
Message-ID: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>
Hi Thomas,
Thanks for this. Regarding the option of whether to return just one feature per side or all overlapping features, the only other advantage that immediately springs to mind for the latter (in addition to some measure of consistency, as you mention) is that it allows the client to immediately render the exact region of that feature without triggering another request. It would generally mean changing zoom level. I can't say if clients are likely to follow this mechanism as opposed to, say, pan and centre on the feature, but if they wanted to it would be more efficient (and possibly a little bit more efficient anyway depending on how your client does its requests).
Disadvantages I can think of:
- "adjacent" request takes marginally longer
- not quite as obvious what clients should put in their UI controls - need to pick a feature to be able to do "jump to BRCA1"
- risk of servers not implementing it correctly and only returning one feature anyway (although I don't think this is likely as the concept is different to "feature-by-id")
Some things to further define:
- servers can't return a fake feature
- should servers return features on different reference sequences if there are none on the current one?
- how should servers treat features that overlap the adjacent range? Treat them as the adjacent feature to return, or only include features completely outside the query range? What if the next feature completely outside the query range is part of the same feature hierarchy (e.g. an exon outside the current window)?
Any thoughts from anyone on these?
Cheers,
Andy
From thomas.a.down at gmail.com Mon Mar 7 05:35:02 2011
From: thomas.a.down at gmail.com (Thomas Down)
Date: Mon, 7 Mar 2011 10:35:02 +0000
Subject: [DAS] Adjacent feature extension
In-Reply-To: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>
References:
<4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>
Message-ID:
On Mon, Mar 7, 2011 at 10:04 AM, Andy Jenkinson wrote:
> Hi Thomas,
>
> Thanks for this. Regarding the option of whether to return just one feature
> per side or all overlapping features, the only other advantage that
> immediately springs to mind for the latter (in addition to some measure of
> consistency, as you mention) is that it allows the client to immediately
> render the exact region of that feature without triggering another request.
> It would generally mean changing zoom level. I can't say if clients are
> likely to follow this mechanism as opposed to, say, pan and centre on the
> feature, but if they wanted to it would be more efficient (and possibly a
> little bit more efficient anyway depending on how your client does its
> requests).
>
Yep, I agree. I'd be interested to learn whether there are any clients that
would seriously consider taking advantage of this. My own thinking is that
even if we do adjust zoom level (as Dalliance sometimes does, e.g. in the
"jump to gene..." navigation op), clients are much more likely to zoom to a
view that contains the target feature plus a "sensible" amount of flanking
sequence, rather than a view where the target feature is perfectly framed.
Furthermore, this rather seems like optimizing for the case where only one
annotation source is active. Surely we're talking about the *distributed*
annotation system, and clients will still have to go off and query all the
other annotation sources, even if they are able to skip the one which
responded to the "adjacent" query. So long as there's some kind of query
parallelization in place, this probably isn't a performance issue.
Do any other client developers feel differently?
> Disadvantages I can think of:
> - "adjacent" request takes marginally longer
> - not quite as obvious what clients should put in their UI controls - need
> to pick a feature to be able to do "jump to BRCA1"
> - risk of servers not implementing it correctly and only returning one
> feature anyway (although I don't think this is likely as the concept is
> different to "feature-by-id")
>
> Some things to further define:
> - servers can't return a fake feature
>
Yep, will clarify this.
> - should servers return features on different reference sequences if there
> are none on the current one?
>
In my opinion, absolutely yes. Otherwise the "10 features in the genome"
case remains a massive pain (and potentially a disaster for
inhomogeneously distributed data; won't someone think of the MHC tiling
arrays? :-). And even worse for the "10 features in UniProt" case (where I
can also see this feature being quite interesting).
I've tried to be explicit about this in my proposal (see the penultimate
paragraph + example 3), but any suggestions for further clarifications are
welcome.
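One way to picture the cross-sequence behaviour is a search that falls through to later reference sequences when the current one has nothing further along. The Python sketch below is only an illustration: the ordering of reference sequences, the data layout and the function name are assumptions and are not taken from the proposal text.

```python
def next_feature(features_by_segment, segment_order, segment, point):
    """Find the first feature starting after `point`, continuing on to later
    reference sequences if the current one has none.

    features_by_segment maps a segment name to a list of (start, end, id)
    tuples sorted by start. Returns (segment, id) or None.
    """
    first = segment_order.index(segment)
    for offset, name in enumerate(segment_order[first:]):
        # On later segments every feature qualifies (coordinates start at 1).
        threshold = point if offset == 0 else 0
        for start, _end, feature_id in features_by_segment.get(name, []):
            if start > threshold:
                return name, feature_id
    return None


# Hypothetical sparse data set: very few features across the whole genome.
DATA = {"chr1": [(100, 200, "f1")], "chr2": [], "chr3": [(50, 60, "f2")]}
ORDER = ["chr1", "chr2", "chr3"]

print(next_feature(DATA, ORDER, "chr1", 150))
```

With sparse data like this, a client that stayed on one reference sequence would find nothing, which is the "10 features in the genome" problem described above.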
> - how should servers treat features that overlap the adjacent range? Treat
> them as the adjacent feature to return, or only include features completely
> outside the query range? What if the next feature completely outside the
> query range is part of the same feature hierarchy (e.g. an exon outside the
> current window).
>
It's a point rather than a range, but yes I agree this is still an open
question. I'd actually written the spec such that overlapping features do
get returned (on the assumption that clients will do "trivial" cases of
next/previous feature in-memory without a network round trip), but again if
other client developers do things differently, I'd like to know.
I think "include overlapping" will have fewer special cases to worry about,
though, e.g. the PART/PARENT issue you allude to. Let clients deal with
that ("dumb servers, smart clients").
Thomas.
From ljgarcia at ebi.ac.uk Mon Mar 7 05:49:09 2011
From: ljgarcia at ebi.ac.uk (Leyla Garcia)
Date: Mon, 07 Mar 2011 10:49:09 +0000
Subject: [DAS] Adjacent feature extension
In-Reply-To:
References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>
Message-ID: <4D74B825.6080509@ebi.ac.uk>
On 07/03/2011 10:35, Thomas Down wrote:
> On Mon, Mar 7, 2011 at 10:04 AM, Andy Jenkinson wrote:
>
>> Hi Thomas,
>>
>> Thanks for this. Regarding the option of whether to return just one feature
>> per side or all overlapping features, the only other advantage that
>> immediately springs to mind for the latter (in addition to some measure of
>> consistency, as you mention) is that it allows the client to immediately
>> render the exact region of that feature without triggering another request.
>> It would generally mean changing zoom level. I can't say if clients are
>> likely to follow this mechanism as opposed to, say, pan and centre on the
>> feature, but if they wanted to it would be more efficient (and possibly a
>> little bit more efficient anyway depending on how your client does its
>> requests).
>>
> Yep, I agree. I'd be interested to learn whether there are any clients that
> would seriously consider taking advantage of this. My own thinking is that
> even if we do adjust zoom level (as Dalliance sometimes does, e.g. in the
> "jump to gene..." navigation op), clients are much more likely to zoom to a
> view that contains the target feature plus a "sensible" amount of flanking
> sequence, rather than a view where the target feature is perfectly framed.
>
> Furthermore, this rather seems like optimizing for the case where only one
> annotation source is active. Surely we're talking about the *distributed*
> annotation system, and clients will still have to go off and query all the
> other annotation sources, even if they are able to skip the one which
> responded to the "adjacent" query. So long as there's some kind of query
> parallelization in place, this probably isn't a performance issue.
>
> Do any other client developers feel differently?
So we can include or exclude the overlapping features. What about adding
another filter? "adjacent" would retrieve only those around the feature
requested and "overlapping" would return the overlaps. Using both would
be interpreted as OR. Does it make sense?
Would that be interesting for people using this type of filter?
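The two-filter idea can be sketched as below. This is a hypothetical Python illustration, not part of any DAS specification: the function name, the tuple layout and the treatment of the query as a single point are all assumptions.

```python
def select_features(features, point, adjacent=False, overlapping=False):
    """Select features around a point using two independent filters.

    features is an iterable of (start, end, id) tuples. When both filters
    are set, the result is the union (OR) of the two selections.
    """
    chosen = set()
    if overlapping:
        # Features that contain the query point.
        chosen.update(f for f in features if f[0] <= point <= f[1])
    if adjacent:
        # Nearest feature entirely before and entirely after the point.
        before = [f for f in features if f[1] < point]
        after = [f for f in features if f[0] > point]
        if before:
            chosen.add(max(before, key=lambda f: f[1]))
        if after:
            chosen.add(min(after, key=lambda f: f[0]))
    return sorted(chosen)


# Hypothetical features on one reference sequence.
FEATS = [(100, 200, "a"), (300, 400, "b"), (500, 600, "c")]

print(select_features(FEATS, 350, adjacent=True, overlapping=True))
```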
>> Disadvantages I can think of:
>> - "adjacent" request takes marginally longer
>> - not quite as obvious what clients should put in their UI controls - need
>> to pick a feature to be able to do "jump to BRCA1"
>> - risk of servers not implementing it correctly and only returning one
>> feature anyway (although I don't think this is likely as the concept is
>> different to "feature-by-id")
>>
>> Some things to further define:
>> - servers can't return a fake feature
> Yep, will clarify this.
>
>> - should servers return features on different reference sequences if there
>> are none on the current one?
> In my opinion, absolutely yes. Otherwise the "10 features in the genome"
> case remains a massive pain (and potentially a disaster for
> inhomogeneously distributed data; won't someone think of the MHC tiling
> arrays? :-). And even worse for the "10 features in UniProt" case (where I
> can also see this feature being quite interesting).
Mmm, I do not understand the "10 features in UniProt" case or the "10
features in the genome" case. Could you please give some more information
about them?
> I've tried to be explicit about this in my proposal (see the penultimate
> paragraph + example 3), but any suggestions for further clarifications are
> welcome.
>
>
>> - how should servers treat features that overlap the adjacent range? Treat
>> them as the adjacent feature to return, or only include features completely
>> outside the query range? What if the next feature completely outside the
>> query range is part of the same feature hierarchy (e.g. an exon outside the
>> current window).
>>
> It's a point rather than a range, but yes I agree this is still an open
> question. I'd actually written the spec such that overlapping features do
> get returned (on the assumption that clients will do "trivial" cases of
> next/previous feature in-memory without a network round trip), but again if
> other client developers do things differently, I'd like to know.
>
> I think "include overlapping" will have less special-cases to worry about,
> though. e.g. the PART/PARENT issue you allude to. Let clients deal with
> that ("dumb servers, smart clients").
>
> Thomas.
From jw12 at sanger.ac.uk Mon Mar 7 05:57:48 2011
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Mon, 7 Mar 2011 10:57:48 +0000
Subject: [DAS] Adjacent feature extension
In-Reply-To:
References:
<4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>
Message-ID:
On 7 Mar 2011, at 10:35, Thomas Down wrote:
> On Mon, Mar 7, 2011 at 10:04 AM, Andy Jenkinson wrote:
>
>> Hi Thomas,
>>
>> Thanks for this. Regarding the option of whether to return just one
>> feature
>> per side or all overlapping features, the only other advantage that
>> immediately springs to mind for the latter (in addition to some
>> measure of
>> consistency, as you mention) is that it allows the client to
>> immediately
>> render the exact region of that feature without triggering another
>> request.
>> It would generally mean changing zoom level. I'm can't say if
>> clients are
>> likely to follow this mechanism as opposed to, say, pan and centre
>> on the
>> feature, but if they wanted to it would be more efficient (and
>> possibly a
>> little bit more efficient anyway depending on how your client does
>> its
>> requests).
>>
>
> Yep, I agree. I'd be interested to learn whether there are any
> clients that
> would seriously consider taking advantage of this. My own thinking
> is that
> even if we do adjust zoom level (as Dalliance sometimes does, e.g.
> in the
> "jump to gene..." navigation op), clients are much more likely to
> zoom to a
> view that contains the target feature plus a "sensible" amount of
> flanking
> sequence, rather than a view where the target feature is perfectly
> framed.
>
> Furthermore, this rather seems like optimizing for the case where
> only one
> annotation source is active. Surely we're talking about the
> *distributed* annotation system, and clients will still have to go off
> and query all the
> other annotation sources, even if they are able to skip the one which
> responded to the "adjacent" query. So long as there's some kind of
> query
> parallelization in place, this probably isn't a performance issue.
My vote would ideally be to change feature_by_id to return one feature
and to have adjacent_feature return one feature as well. In my
opinion this would mean these capabilities on servers do "exactly as they
say on the tin", and they would be easier for data providers to implement
and thus more likely to be implemented?
If the feature_id capability as it stands is needed, it could be
renamed to something closer to what it means, like feature_id_region,
but I would bet no one would bother to change it or use it?
However, the reality is that we are too late to change the old
feature_by_id, but I don't think we need to make the same mistake
twice by repeating it for adjacent_features?
>
> Do any other client developers feel differently?
>
>
>> Disadvantages I can think of:
>> - "adjacent" request takes marginally longer
>> - not quite as obvious what clients should put in their UI controls
>> - need
>> to pick a feature to be able to do "jump to BRCA1"
>> - risk of servers not implementing it correctly and only returning
>> one
>> feature anyway (although I don't think this is likely as the
>> concept is
>> different to "feature-by-id")
>>
>> Some things to further define:
>> - servers can't return a fake feature
>>
>
> Yep, will clarify this.
>
>
>> - should servers return features on different reference sequences
>> if there
>> are none one the current one?
>>
>
> In my opinion, absolutely yes. Otherwise the "10 features in the
> genome"
> case remains a massive pain (and potentially a disaster, for
> inhomogeneous-dstributed data; won't someone think of the MHC tiling
> arrays?
> :-). And even worse for the "10 features in UniProt" case (where I
> can also
> see this feature being quite interesting).
>
> I've tried to be explicit about this in my proposal (see the
> penultimate
> paragraph + example 3), but any suggestions for further
> clarifications are
> welcome.
>
>
>> - how should servers treat features that overlap the adjacent
>> range? Treat
>> them as the adjacent feature to return, or only include features
>> completely
>> outside the query range? What if the next feature completely
>> outside the
>> query range is part of the same feature hierarchy (e.g. an exon
>> outside the
>> current window).
>>
>
> It's a point rather than a range, but yes I agree this is still an
> open
> question. I'd actually written the spec such that overlapping
> features do
> get returned (on the assumption that clients will do "trivial" cases
> of
> next/previous feature in-memory without a network round trip), but
> again if
> other client developers do things differently, I'd like to know.
>
> I think "include overlapping" will have less special-cases to worry
> about,
> though. e.g. the PART/PARENT issue you allude to. Let clients deal
> with
> that ("dumb servers, smart clients").
>
> Thomas.
Jonathan Warren
Senior Developer and DAS coordinator
blog: http://biodasman.wordpress.com/
jw12 at sanger.ac.uk
Ext: 2314
Telephone: 01223 492314
From andy.jenkinson at ebi.ac.uk Mon Mar 7 06:19:22 2011
From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson)
Date: Mon, 7 Mar 2011 11:19:22 +0000
Subject: [DAS] Adjacent feature extension
In-Reply-To:
References:
<4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>
Message-ID:
On 7 Mar 2011, at 10:57, Jonathan Warren wrote:
>
> My vote would ideally to change feature_by_id to return one feature and have the adjacent_feature as returning one feature. This in my opinion would mean these capabilities on servers do "exactly as they say on the tin" and would be easier to implement for data providers and are thus more likely to be implemented?
> If the feature_id capability as it stands is needed it could be changed to something more akin to what it means like feature_id_region but I would bet no one would bother to change it/use it?
>
> However the reality is that we are too late to change the old feature_by_id, but I don't think we need to make the same mistake twice by repeating it for adjacent_features?
I disagree. I think the problems with feature-by-id are that a) the name of the capability implies singular, and b) the concept itself (i.e. getting a feature by its ID) is such a common operation that is otherwise missing in DAS. I don't think either of those applies to an "adjacent" capability unless you specifically choose to call it "adjacent-feature" as opposed to "adjacent-features". I honestly don't think a capability called "adjacent-features" with a query structure like "/das/features?adjacent=foo:1" implies singular; rather the opposite, in fact. To me that query suggests "get me the features adjacent to foo:1". True, 2 features is plural, which still leaves a "one feature either side" interpretation possible, but IMO it is certainly not so implicit that anyone implementing it would skip reading the specification/documentation. Add to that the fact that this is an entirely new behaviour that we have the chance to document properly, making it clear exactly what the server must do.
So IMO we have a clear choice.
As to feature-by-id, I know changing behaviour is potentially a very disruptive change, but I think we can potentially do this purely because servers don't tend to implement it correctly anyway. Clients can happily filter out any additional features returned by old servers, and if any clients are reliant on the server including all overlapping features then as far as I am concerned they are either a) targeting specific servers rather than DAS-wide and thus unaffected, or b) already broken :)
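The client-side filtering mentioned here is cheap to do. A minimal Python sketch, assuming the features have already been parsed into dictionaries with an "id" key (the dictionary layout and the name only_requested are illustrative, not part of DAS):

```python
def only_requested(features, feature_id):
    """Keep only the feature(s) whose id matches the one that was asked for.

    `features` is assumed to be a list of dicts parsed from a features
    response; an old server may have included overlapping features too.
    """
    return [f for f in features if f["id"] == feature_id]


# Hypothetical response from an old server that also returned overlaps.
response = [
    {"id": "BRCA1", "start": 100, "end": 900},
    {"id": "exon7", "start": 150, "end": 200},
]
print(only_requested(response, "BRCA1"))
```

A client written this way behaves the same against old and new servers, which is the point being made about the change not breaking DAS-wide clients.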
I have to admit that the feature-by-id capability is one of the (many) things I loathe having to explain and would love to change it. Doing so would be consistent with what we were trying to do with 1.6 (i.e. rationalise existing use of the spec) but I chickened out really.
Cheers,
Andy
From thomas.a.down at gmail.com Mon Mar 7 06:31:06 2011
From: thomas.a.down at gmail.com (Thomas Down)
Date: Mon, 7 Mar 2011 11:31:06 +0000
Subject: [DAS] Adjacent feature extension
In-Reply-To: <4D74B825.6080509@ebi.ac.uk>
References:
<4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>
<4D74B825.6080509@ebi.ac.uk>
Message-ID:
On Mon, Mar 7, 2011 at 10:49 AM, Leyla Garcia wrote:
>
> So we can include or exclude the overlapping features. What about adding
> another filter? "adjacent" would retrieve only those around the feature
> requested and "overlapping" would returns the overlaps? Using both would be
> interpreted as OR. Does is make sense?
> Would that be interesting for people using this type of filters?
>
How does the "overlapping" filter differ from the existing "segment" filter?
The proposal in its current form states that segment and adjacent filters
may be combined, and servers should return the union of features selected by
the individual filters (e.g. a logical OR, just as you suggest).
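
As a rough illustration of the union rule, here is a Python sketch. The Feature tuple, the helper names, and the pre-computed adjacent_hits list are assumptions made for the example; how a server actually selects the adjacent features is defined by the proposal, not by this code. It also assumes feature IDs are unique within a source.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    id: str
    segment: str
    start: int
    end: int


def select_segment(features, segment, start, end):
    """Features overlapping segment:start,end (inclusive coordinates)."""
    return [f for f in features
            if f.segment == segment and f.start <= end and f.end >= start]


def union(*selections):
    """Logical OR of several filter results, without duplicates."""
    merged = {}
    for selection in selections:
        for f in selection:
            # First occurrence wins; later duplicates are ignored.
            merged.setdefault(f.id, f)
    return list(merged.values())


features = [
    Feature("a", "chr1", 100, 200),
    Feature("b", "chr1", 150, 400),
    Feature("c", "chr1", 5000, 5100),
]
segment_hits = select_segment(features, "chr1", 1, 300)    # a and b
adjacent_hits = [features[1], features[2]]                  # stand-in result
print([f.id for f in union(segment_hits, adjacent_hits)])   # ['a', 'b', 'c']
```

Feature "b" is selected by both filters but appears only once in the combined result.
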
> In my opinion, absolutely yes. Otherwise the "10 features in the genome"
>> case remains a massive pain (and potentially a disaster, for
>> inhomogeneous-dstributed data; won't someone think of the MHC tiling
>> arrays?
>> :-). And even worse for the "10 features in UniProt" case (where I can
>> also
>> see this feature being quite interesting).
>>
> Mmm, I do not understand it, "10 features in UniProt" case? "10 features in
> the genome" case? Could you please some more information about it?
>
Yes, think of a track that is very sparsely, and potentially inhomogeneously,
populated. Things like "imprinted regions in the human genome" (okay, there
are more than 10 of those, but maybe no more than 100, and they're
clustered). Or "my favorite rare protein domain". Scanning through such
tracks by scrolling isn't going to be terribly helpful!
One of the use cases I always have in the back of my mind when discussing
stuff like this is the biologist who has done some kind of analysis, has 50
"interesting" regions of the genome (or proteome), and wants to flip through
them quickly to eyeball, sanity check, and see if they can see any
contextual patterns. I work all the time with people who want this, and
sometimes do it myself. Up until now, I've tended to solve it by creating
an HTML frameset with a list of regions in one frame and a visualization
tool (either Ensembl or Dalliance) in the other. It's far from an optimal
solution. Allowing people to flick from region to region *within the
browser* is far smoother.
Thomas.
From gsalazar at cs.uct.ac.za Mon Mar 7 06:16:01 2011
From: gsalazar at cs.uct.ac.za (Gustavo Salazar)
Date: Mon, 7 Mar 2011 11:16:01 +0000
Subject: [DAS] Adjacent feature extension
In-Reply-To:
References:
<4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>
Message-ID:
On 7 Mar 2011, at 10:57, Jonathan Warren wrote:
>
> On 7 Mar 2011, at 10:35, Thomas Down wrote:
>
>> On Mon, Mar 7, 2011 at 10:04 AM, Andy Jenkinson wrote:
>>
>>> Hi Thomas,
>>>
>>> Thanks for this. Regarding the option of whether to return just one feature
>>> per side or all overlapping features, the only other advantage that
>>> immediately springs to mind for the latter (in addition to some measure of
>>> consistency, as you mention) is that it allows the client to immediately
>>> render the exact region of that feature without triggering another request.
>>> It would generally mean changing zoom level. I'm can't say if clients are
>>> likely to follow this mechanism as opposed to, say, pan and centre on the
>>> feature, but if they wanted to it would be more efficient (and possibly a
>>> little bit more efficient anyway depending on how your client does its
>>> requests).
>>>
>>
>> Yep, I agree. I'd be interested to learn whether there are any clients that
>> would seriously consider taking advantage of this. My own thinking is that
>> even if we do adjust zoom level (as Dalliance sometimes does, e.g. in the
>> "jump to gene..." navigation op), clients are much more likely to zoom to a
>> view that contains the target feature plus a "sensible" amount of flanking
>> sequence, rather than a view where the target feature is perfectly framed.
>>
>> Furthermore, this rather seems like optimizing for the case where only one
>> annotation source is active. Surely we're talking about the
>> *distributed* annotation system, and clients will still have to go off
>> and query all the
>> other annotation sources, even if they are able to skip the one which
>> responded to the "adjacent" query. So long as there's some kind of query
>> parallelization in place, this probably isn't a performance issue.
>
> My vote would ideally to change feature_by_id to return one feature and have the adjacent_feature as returning one feature. This in my opinion would mean these capabilities on servers do "exactly as they say on the tin" and would be easier to implement for data providers and are thus more likely to be implemented?
> If the feature_id capability as it stands is needed it could be changed to something more akin to what it means like feature_id_region but I would bet no one would bother to change it/use it?
>
> However the reality is that we are too late to change the old feature_by_id, but I don't think we need to make the same mistake twice by repeating it for adjacent_features?
I agree with Jonathan: feature_by_id sounds like it gets the feature with the requested ID, and to be honest that is the way I have implemented it before, so if you ask me, the adjacent capability should just return one feature. I don't think we are too late to change the old feature_by_id behaviour, and we can take this as the opportunity to make such a change.
>
>
>>
>> Do any other client developers feel differently?
>>
>>
>>> Disadvantages I can think of:
>>> - "adjacent" request takes marginally longer
>>> - not quite as obvious what clients should put in their UI controls - need
>>> to pick a feature to be able to do "jump to BRCA1"
>>> - risk of servers not implementing it correctly and only returning one
>>> feature anyway (although I don't think this is likely as the concept is
>>> different to "feature-by-id")
>>>
>>> Some things to further define:
>>> - servers can't return a fake feature
>>>
>>
>> Yep, will clarify this.
>>
>>
>>> - should servers return features on different reference sequences if there
>>> are none one the current one?
>>>
>>
>> In my opinion, absolutely yes. Otherwise the "10 features in the genome"
>> case remains a massive pain (and potentially a disaster, for
>> inhomogeneous-dstributed data; won't someone think of the MHC tiling arrays?
>> :-). And even worse for the "10 features in UniProt" case (where I can also
>> see this feature being quite interesting).
>>
>> I've tried to be explicit about this in my proposal (see the penultimate
>> paragraph + example 3), but any suggestions for further clarifications are
>> welcome.
>>
>>
>>> - how should servers treat features that overlap the adjacent range? Treat
>>> them as the adjacent feature to return, or only include features completely
>>> outside the query range? What if the next feature completely outside the
>>> query range is part of the same feature hierarchy (e.g. an exon outside the
>>> current window).
>>>
>>
>> It's a point rather than a range, but yes I agree this is still an open
>> question. I'd actually written the spec such that overlapping features do
>> get returned (on the assumption that clients will do "trivial" cases of
>> next/previous feature in-memory without a network round trip), but again if
>> other client developers do things differently, I'd like to know.
>>
>> I think "include overlapping" will have less special-cases to worry about,
>> though. e.g. the PART/PARENT issue you allude to. Let clients deal with
>> that ("dumb servers, smart clients").
>>
>> Thomas.
From jw12 at sanger.ac.uk Mon Mar 7 06:51:56 2011
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Mon, 7 Mar 2011 11:51:56 +0000
Subject: [DAS] Adjacent feature extension
In-Reply-To:
References:
<4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>
Message-ID:
On 7 Mar 2011, at 11:19, Andy Jenkinson wrote:
> On 7 Mar 2011, at 10:57, Jonathan Warren wrote:
>
>>
>> My vote would ideally to change feature_by_id to return one feature
>> and have the adjacent_feature as returning one feature. This in my
>> opinion would mean these capabilities on servers do "exactly as
>> they say on the tin" and would be easier to implement for data
>> providers and are thus more likely to be implemented?
>> If the feature_id capability as it stands is needed it could be
>> changed to something more akin to what it means like
>> feature_id_region but I would bet no one would bother to change it/
>> use it?
>>
>> However the reality is that we are too late to change the old
>> feature_by_id, but I don't think we need to make the same mistake
>> twice by repeating it for adjacent_features?
>
> I disagree. I think the problems with feature-by-id are that a) the
> name of the capability implies singular, and b) the concept itself
> (i.e. getting a feature by its ID) is such a common operation that
> is otherwise missing in DAS. I don't think either of those apply to
> an "adjacent" capability unless you specifically choose to call it
> "adjacent-feature" as opposed to "adjacent-features". I honestly
> don't think a capability called "adjacent-features" with a query
> structure like "/das/features?adjacent=foo:1" implies singular,
> rather the opposite in fact. To me that query suggests "get me the
> features adjacent to foo:1". True that 2 features is plural which
> still leaves a "one feature either side" interpretation possible,
> but IMO certainly not implicit enough to stop anyone implementing it
> to actually read the specification/documentation. Add to that the
> fact that this is an entirely new behaviour that we have the chance
> to properly document and make it clear exactly what the server must
> do.
>
> So IMO we have a clear choice.
I still think it's simpler to implement it for one feature either side
and keep complexity in the client. Generally, how many people stay awake
after line 10 when reading the spec? :) Let's see if there are more
votes...
>
> As to feature-by-id, I know changing behaviour is potentially a very
> disruptive change, but I think we can potentially do this purely
> because servers don't tend to implement it correctly anyway. Clients
> can happily filter out any additional features returned by old
> servers, and if any clients are reliant on the server including all
> overlapping features then as far as I am concerned they are either
> a) targeting specific servers rather than DAS-wide and thus
> unaffected, or b) already broken :)
So you agree feature-by-id should be changed if we have the stomach
for it? Good, and Gustavo too. Well done Andy - you have just agreed
to write Spec 1.7 or 3??? ;) Your argument above could equally be used
for leaving the spec as it is, but ideally I agree, and I guess
we can call it spec 1.61 assuming other people agree.
>
> I have to admit that the feature-by-id capability is one of the
> (many) things I loathe having to explain and would love to change
> it. Doing so would be consistent with what we were trying to do with
> 1.6 (i.e. rationalise existing use of the spec) but I chickened out
> really.
>
> Cheers,
> Andy
Jonathan Warren
Senior Developer and DAS coordinator
blog: http://biodasman.wordpress.com/
jw12 at sanger.ac.uk
Ext: 2314
Telephone: 01223 492314
From thomas.a.down at gmail.com Mon Mar 7 07:01:15 2011
From: thomas.a.down at gmail.com (Thomas Down)
Date: Mon, 7 Mar 2011 12:01:15 +0000
Subject: [DAS] Adjacent feature extension
In-Reply-To:
References:
<4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>
Message-ID:
On Mon, Mar 7, 2011 at 11:51 AM, Jonathan Warren wrote:
> As to feature-by-id, I know changing behaviour is potentially a very
>> disruptive change, but I think we can potentially do this purely because
>> servers don't tend to implement it correctly anyway. Clients can happily
>> filter out any additional features returned by old servers, and if any
>> clients are reliant on the server including all overlapping features then as
>> far as I am concerned they are either a) targeting specific servers rather
>> than DAS-wide and thus unaffected, or b) already broken :)
>>
> So you agree feature-by_id should be changed if we have the stomach for it?
> - good and Gustavo too. Well done Andy - You have just agreed to write Spec
> 1.7 or 3??? ;) Your argument above can be used for leaving the spec as it is
> then as well - but ideally I agree and guess we can call it spec 1.61
> assuming other people agree.
>
I don't know quite how productive this particular worm-can will be, but FWIW
I'd be happy to see it change. The current spec has caused significant
hassle (with both my client- and server-dev hats on!).
Thomas.
PS. One thing I would ask is: can we discuss the merits of the
feature-adjacent extension independently of any possible changes to the core
spec. It's addressing some specific needs, and if we can get the last few
wrinkles sorted out I'm very eager to get it finished and deployed soon.
From jw12 at sanger.ac.uk Mon Mar 7 07:16:05 2011
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Mon, 7 Mar 2011 12:16:05 +0000
Subject: [DAS] Adjacent feature extension
In-Reply-To:
References:
<4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>
Message-ID: <6E8B83BF-8B65-41F8-B697-696431D5176E@sanger.ac.uk>
On 7 Mar 2011, at 12:01, Thomas Down wrote:
>
>
> On Mon, Mar 7, 2011 at 11:51 AM, Jonathan Warren
> wrote:
> As to feature-by-id, I know changing behaviour is potentially a very
> disruptive change, but I think we can potentially do this purely
> because servers don't tend to implement it correctly anyway. Clients
> can happily filter out any additional features returned by old
> servers, and if any clients are reliant on the server including all
> overlapping features then as far as I am concerned they are either
> a) targeting specific servers rather than DAS-wide and thus
> unaffected, or b) already broken :)
> So you agree feature-by_id should be changed if we have the stomach
> for it? - good and Gustavo too. Well done Andy - You have just
> agreed to write Spec 1.7 or 3??? ;) Your argument above can be used
> for leaving the spec as it is then as well - but ideally I agree and
> guess we can call it spec 1.61 assuming other people agree.
>
> I don't know quite how productive this particular worm-can will be,
> but FWIW I'd be happy to see it change. The current spec has caused
> significant hassle (with both my client- and server-dev hats on!).
>
> Thomas.
>
> PS. One thing I would ask is: can we discuss the merits of the
> feature-adjacent extension independently of any possible changes to
> the core spec. It's addressing some specific needs, and if we can
> get the last few wrinkles sorted out I'm very eager to get it
> finished and deployed soon.
Fair enough, adjacent features would be an extension and so would not
affect the core spec. As you know, a similar adjacent_feature proposal
has been posted to the list before, and yes, we should definitely come up
with a solution now.
I'd say if we don't have any more objections in the next couple of
days, then we go with your proposal as is? I'll then put support into the
registry this week if that is the case. Could you then also copy
the proposal from https://github.com/dasmoth/dalliance/wiki/AdjacentFeatures
to the extensions page at
http://www.biodas.org/wiki/DAS1.6E#Adjacent_Feature_filter, noting in
large letters that it was agreed by the community on such and such a date?
Jonathan Warren
Senior Developer and DAS coordinator
blog: http://biodasman.wordpress.com/
jw12 at sanger.ac.uk
Ext: 2314
Telephone: 01223 492314
From andy.jenkinson at ebi.ac.uk Mon Mar 7 07:43:32 2011
From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson)
Date: Mon, 7 Mar 2011 12:43:32 +0000
Subject: [DAS] Adjacent feature extension
In-Reply-To:
References:
<4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>
Message-ID: <4796FEEA-3D6A-43A4-A76D-2212E2F690B6@ebi.ac.uk>
On 7 Mar 2011, at 11:51, Jonathan Warren wrote:
> On 7 Mar 2011, at 11:19, Andy Jenkinson wrote:
>
>> On 7 Mar 2011, at 10:57, Jonathan Warren wrote:
>>
>>>
>>> My vote would ideally to change feature_by_id to return one feature and have the adjacent_feature as returning one feature. This in my opinion would mean these capabilities on servers do "exactly as they say on the tin" and would be easier to implement for data providers and are thus more likely to be implemented?
>>> If the feature_id capability as it stands is needed it could be changed to something more akin to what it means like feature_id_region but I would bet no one would bother to change it/use it?
>>>
>>> However the reality is that we are too late to change the old feature_by_id, but I don't think we need to make the same mistake twice by repeating it for adjacent_features?
>>
>> I disagree. I think the problems with feature-by-id are that a) the name of the capability implies singular, and b) the concept itself (i.e. getting a feature by its ID) is such a common operation that is otherwise missing in DAS. I don't think either of those apply to an "adjacent" capability unless you specifically choose to call it "adjacent-feature" as opposed to "adjacent-features". I honestly don't think a capability called "adjacent-features" with a query structure like "/das/features?adjacent=foo:1" implies singular, rather the opposite in fact. To me that query suggests "get me the features adjacent to foo:1". True that 2 features is plural which still leaves a "one feature either side" interpretation possible, but IMO certainly not implicit enough to stop anyone implementing it to actually read the specification/documentation. Add to that the fact that this is an entirely new behaviour that we have the chance to properly document and make it clear exactly what the server must do.
>>
>> So IMO we have a clear choice.
> I still think it's simpler to implement it for one feature either side and keep complexity in the client. Generally how many people stay wake after line 10 when reading the spec? :) Lets see if there are more votes...
It probably is simpler to implement (well, to implement with maximum efficiency) and I am not advocating one over the other, but IMO the implementation considerations are a separate part of our choice and are orthogonal to whether it's confusing for those implementing it and consequently whether we see divergence from the spec like we do with feature-by-id. As Gustavo says, he'd implement feature-by-id as one feature because that's what he thinks it means, not because it's difficult. I'd posit that it'd be a one line change for any server maintainer to fix theirs to implement it correctly (i.e. use the feature's start/end to resubmit the query), it's just that it'd be more complicated to do it in a single step from the beginning.
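The two-step approach described here (look the feature up, then reuse its start/end as a region query) might look like this in Python. The in-memory list and the names features_by_region and features_by_id are hypothetical; a real server would run the equivalent queries against its own data store.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    id: str
    segment: str
    start: int
    end: int


def features_by_region(features, segment, start, end):
    """Ordinary segment query: everything overlapping segment:start,end."""
    return [f for f in features
            if f.segment == segment and f.start <= end and f.end >= start]


def features_by_id(features, feature_id):
    """Find the named feature, then resubmit its extent as a region query."""
    result = {}
    for target in features:
        if target.id == feature_id:
            hits = features_by_region(
                features, target.segment, target.start, target.end)
            for f in hits:
                result.setdefault(f.id, f)
    return list(result.values())


features = [
    Feature("gene1", "chr1", 100, 900),
    Feature("exon1", "chr1", 150, 200),
    Feature("far", "chr1", 5000, 5100),
]
print([f.id for f in features_by_id(features, "gene1")])  # ['gene1', 'exon1']
```

A server that already supports segment queries only needs the lookup step in front of it, which is why the fix is small even though a single-step implementation would be harder.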
We should be under no illusions though that people are going to be able to implement this easily without reading the documentation carefully, no matter which option is chosen. In particular, I can foresee servers not interpreting the "type" filter appropriately, being likely to process the adjacent query then apply the type filter, which would be wrong. I have a feeling most sources implement the type filter as a passive "post filter" rather than an active one. I can tell you right now that it is going to be really quite difficult for me to implement "adjacent" correctly for the ASTD gene/transcript/exon sources, and I suspect the same will be true for retrofitting lots of other sources.
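To make the "post filter" problem concrete, here is a small Python sketch. It assumes, purely for illustration, that "adjacent" means the nearest feature starting after a point on a single reference sequence; the Feature tuple and function names are invented for the example and are not taken from the proposal.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    id: str
    type: str
    start: int
    end: int


def next_feature(features, pos):
    """Nearest feature starting after pos, or None if there is none."""
    after = [f for f in features if f.start > pos]
    return min(after, key=lambda f: f.start) if after else None


def adjacent_post_filter(features, pos, wanted_type):
    """Wrong: find the neighbour first, then discard it if the type differs."""
    hit = next_feature(features, pos)
    return hit if hit is not None and hit.type == wanted_type else None


def adjacent_active_filter(features, pos, wanted_type):
    """Right: restrict to the wanted type first, then find the neighbour."""
    candidates = [f for f in features if f.type == wanted_type]
    return next_feature(candidates, pos)


features = [
    Feature("e1", "exon", 100, 200),
    Feature("g1", "gene", 500, 900),
]
print(adjacent_post_filter(features, 50, "gene"))    # None
print(adjacent_active_filter(features, 50, "gene"))  # the gene g1
```

With the post filter the nearer exon hides the gene and the client gets nothing back, even though a matching feature exists further along.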
>>
>> As to feature-by-id, I know changing behaviour is potentially a very disruptive change, but I think we can potentially do this purely because servers don't tend to implement it correctly anyway. Clients can happily filter out any additional features returned by old servers, and if any clients are reliant on the server including all overlapping features then as far as I am concerned they are either a) targeting specific servers rather than DAS-wide and thus unaffected, or b) already broken :)
> So you agree feature-by_id should be changed if we have the stomach for it? - good and Gustavo too. Well done Andy - You have just agreed to write Spec 1.7 or 3??? ;) Your argument above can be used for leaving the spec as it is then as well - but ideally I agree and guess we can call it spec 1.61 assuming other people agree.
I already have a small list of changes for DAS 1.7 or whatever and think it's fine for that context. In any case, let's keep these two issues separate as Thomas says.
>>
>> I have to admit that the feature-by-id capability is one of the (many) things I loathe having to explain and would love to change it. Doing so would be consistent with what we were trying to do with 1.6 (i.e. rationalise existing use of the spec) but I chickened out really.
>>
>> Cheers,
>> Andy
From thomas.a.down at gmail.com Mon Mar 7 07:57:37 2011
From: thomas.a.down at gmail.com (Thomas Down)
Date: Mon, 7 Mar 2011 12:57:37 +0000
Subject: [DAS] Adjacent feature extension
In-Reply-To: <4796FEEA-3D6A-43A4-A76D-2212E2F690B6@ebi.ac.uk>
References:
<4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>
<4796FEEA-3D6A-43A4-A76D-2212E2F690B6@ebi.ac.uk>
Message-ID:
On Mon, Mar 7, 2011 at 12:43 PM, Andy Jenkinson wrote:
> We should be under no illusions though that people are going to be able to
> implement this easily without reading the documentation carefully, no matter
> which option is chosen. In particular, I can foresee servers not
> interpreting the "type" filter appropriately, being likely to process the
> adjacent query then apply the type filter, which would be wrong. I have a
> feeling most sources implement the type filter as a passive "post filter"
> rather than an active one. I can tell you right now that it is going to be
> really quite difficult for me to implement "adjacent" correctly for the ASTD
> gene/transcript/exon sources, and I suspect the same will be true for
> retrofitting lots of other sources.
>
I appreciate this is tough. However, given the existence of UCSC-like
"super-sources" containing basically unrelated types, with potentially quite
different densities and dispositions, I find it hard to see how this is
going to be useful without some kind of interaction with the types filter
along the lines of what I've proposed.
Thomas.
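
To make the ordering issue concrete, here is a minimal Python sketch. It is not taken from any DAS server: the Feature record and both function names are invented for illustration, and "adjacent" is simplified to "nearest feature starting to the right of a position". It contrasts a passive post filter (find the nearest feature, then check its type) with an active filter (restrict to the type first, then find the nearest).

```python
from typing import NamedTuple, Optional, Sequence


class Feature(NamedTuple):
    # Hypothetical in-memory feature record, not a DAS library type.
    id: str
    type: str
    start: int
    end: int


def adjacent_right_post_filter(
    features: Sequence[Feature], pos: int, wanted_type: str
) -> Optional[Feature]:
    """Find the nearest feature to the right, THEN apply the type filter."""
    candidates = [f for f in features if f.start > pos]
    if not candidates:
        return None
    nearest = min(candidates, key=lambda f: f.start)
    # The type filter only sees the single nearest feature.
    return nearest if nearest.type == wanted_type else None


def adjacent_right_active_filter(
    features: Sequence[Feature], pos: int, wanted_type: str
) -> Optional[Feature]:
    """Apply the type filter first, THEN find the nearest feature to the right."""
    candidates = [f for f in features if f.type == wanted_type and f.start > pos]
    return min(candidates, key=lambda f: f.start, default=None)


# Made-up data: a SNP sits between the query position and the next gene.
FEATURES = [
    Feature("snp1", "SNP", 120, 120),
    Feature("gene1", "gene", 5000, 9000),
]

print(adjacent_right_post_filter(FEATURES, 100, "gene"))
print(adjacent_right_active_filter(FEATURES, 100, "gene"))
```

With this data the post-filter version finds the SNP first and then discards it, so a client asking for the next gene gets nothing back, while the active version returns gene1. On a "super-source" with mixed types that difference is what decides whether the query is useful.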
From andy.jenkinson at ebi.ac.uk Mon Mar 7 08:11:46 2011
From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson)
Date: Mon, 7 Mar 2011 13:11:46 +0000
Subject: [DAS] Adjacent feature extension
In-Reply-To:
References:
<4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>
<4796FEEA-3D6A-43A4-A76D-2212E2F690B6@ebi.ac.uk>
Message-ID: <31F4DDA4-EFE4-467F-A2BE-A65CACD8BDD4@ebi.ac.uk>
On 7 Mar 2011, at 12:57, Thomas Down wrote:
> On Mon, Mar 7, 2011 at 12:43 PM, Andy Jenkinson wrote:
> We should be under no illusions though that people are going to be able to implement this easily without reading the documentation carefully, no matter which option is chosen. In particular, I can foresee servers not interpreting the "type" filter appropriately, being likely to process the adjacent query then apply the type filter, which would be wrong. I have a feeling most sources implement the type filter as a passive "post filter" rather than an active one. I can tell you right now that it is going to be really quite difficult for me to implement "adjacent" correctly for the ASTD gene/transcript/exon sources, and I suspect the same will be true for retrofitting lots of other sources.
>
> I appreciate this is tough. However, given the existence of UCSC-like "super-sources" containing basically unrelated types, with potentially quite different densities and dispositions, I find it hard to see how this is going to be useful without some kind of interaction with the types filter along the lines of what I've proposed.
Absolutely.
From jw12 at sanger.ac.uk Mon Mar 7 09:11:20 2011
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Mon, 7 Mar 2011 14:11:20 +0000
Subject: [DAS] Adjacent feature extension
In-Reply-To: <4796FEEA-3D6A-43A4-A76D-2212E2F690B6@ebi.ac.uk>
References:
<4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>
<4796FEEA-3D6A-43A4-A76D-2212E2F690B6@ebi.ac.uk>
Message-ID:
On 7 Mar 2011, at 12:43, Andy Jenkinson wrote:
> On 7 Mar 2011, at 11:51, Jonathan Warren wrote:
>
>> On 7 Mar 2011, at 11:19, Andy Jenkinson wrote:
>>
>>> On 7 Mar 2011, at 10:57, Jonathan Warren wrote:
>>>
>>>>
>>>> My vote would ideally be to change feature_by_id to return one
>>>> feature and have the adjacent_feature as returning one feature.
>>>> This in my opinion would mean these capabilities on servers do
>>>> "exactly as they say on the tin" and would be easier to implement
>>>> for data providers and are thus more likely to be implemented?
>>>> If the feature_id capability as it stands is needed it could be
>>>> changed to something more akin to what it means like
>>>> feature_id_region but I would bet no one would bother to change
>>>> it/use it?
>>>>
>>>> However the reality is that we are too late to change the old
>>>> feature_by_id, but I don't think we need to make the same mistake
>>>> twice by repeating it for adjacent_features?
>>>
>>> I disagree. I think the problems with feature-by-id are that a)
>>> the name of the capability implies singular, and b) the concept
>>> itself (i.e. getting a feature by its ID) is such a common
>>> operation that is otherwise missing in DAS. I don't think either
>>> of those apply to an "adjacent" capability unless you specifically
>>> choose to call it "adjacent-feature" as opposed to "adjacent-
>>> features". I honestly don't think a capability called "adjacent-
>>> features" with a query structure like "/das/features?adjacent=foo:
>>> 1" implies singular, rather the opposite in fact. To me that query
>>> suggests "get me the features adjacent to foo:1". True that 2
>>> features is plural which still leaves a "one feature either side"
>>> interpretation possible, but IMO certainly not implicit enough to
>>> stop anyone implementing it to actually read the specification/
>>> documentation. Add to that the fact that this is an entirely new
>>> behaviour that we have the chance to properly document and make it
>>> clear exactly what the server must do.
>>>
>>> So IMO we have a clear choice.
>> I still think it's simpler to implement it for one feature either
>> side and keep complexity in the client. Generally how many people
>> stay awake after line 10 when reading the spec? :) Let's see if there
>> are more votes...
>
> It probably is simpler to implement (well, to implement with maximum
> efficiency) and I am not advocating one over the other, but IMO the
> implementation considerations are a separate part of our choice and
> are orthogonal to whether it's confusing for those implementing it
> and consequently whether we see divergence from the spec like we do
> with feature-by-id. As Gustavo says, he'd implement feature-by-id as
> one feature because that's what he thinks it means, not because it's
> difficult. I'd posit that it'd be a one line change for any server
> maintainer to fix theirs to implement it correctly (i.e. use the
> feature's start/end to resubmit the query), it's just that it'd be
> more complicated to do it in a single step from the beginning.
>
> We should be under no illusions though that people are going to be
> able to implement this easily without reading the documentation
> carefully, no matter which option is chosen.
Good template methods and/or examples in tutorials will
encourage use of this command.
> In particular, I can foresee servers not interpreting the "type"
> filter appropriately, being likely to process the adjacent query
> then apply the type filter, which would be wrong. I have a feeling
> most sources implement the type filter as a passive "post filter"
> rather than an active one. I can tell you right now that it is going
> to be really quite difficult for me to implement "adjacent"
> correctly for the ASTD gene/transcript/exon sources, and I suspect
> the same will be true for retrofitting lots of other sources.
This is an optional capability though right?
>
>>>
>>> As to feature-by-id, I know changing behaviour is potentially a
>>> very disruptive change, but I think we can potentially do this
>>> purely because servers don't tend to implement it correctly
>>> anyway. Clients can happily filter out any additional features
>>> returned by old servers, and if any clients are reliant on the
>>> server including all overlapping features then as far as I am
>>> concerned they are either a) targeting specific servers rather
>>> than DAS-wide and thus unaffected, or b) already broken :)
>> So you agree feature-by_id should be changed if we have the stomach
>> for it? - good and Gustavo too. Well done Andy - You have just
>> agreed to write Spec 1.7 or 3??? ;) Your argument above can be used
>> for leaving the spec as it is then as well - but ideally I agree
>> and guess we can call it spec 1.61 assuming other people agree.
>
> I already have a small list of changes for DAS 1.7 or whatever and
> think it's fine for that context. In any case, let's keep these two
> issues separate as Thomas says.
I was really hoping not to do another major spec revision for at least
3 years and to focus on extensions giving new capabilities. Otherwise,
for the core capabilities, everyone is always playing catch-up! This
may be something to discuss at some point soon.
>
>>>
>>> I have to admit that the feature-by-id capability is one of the
>>> (many) things I loathe having to explain and would love to change
>>> it. Doing so would be consistent with what we were trying to do
>>> with 1.6 (i.e. rationalise existing use of the spec) but I
>>> chickened out really.
>>>
>>> Cheers,
>>> Andy
>>
>
Jonathan Warren
Senior Developer and DAS coordinator
blog: http://biodasman.wordpress.com/
jw12 at sanger.ac.uk
Ext: 2314
Telephone: 01223 492314
From thomas.a.down at gmail.com Mon Mar 7 09:27:12 2011
From: thomas.a.down at gmail.com (Thomas Down)
Date: Mon, 7 Mar 2011 14:27:12 +0000
Subject: [DAS] Adjacent feature extension
In-Reply-To:
References:
<4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>
<4796FEEA-3D6A-43A4-A76D-2212E2F690B6@ebi.ac.uk>
Message-ID:
On Mon, Mar 7, 2011 at 2:11 PM, Jonathan Warren wrote:
> In particular, I can foresee servers not interpreting the "type" filter
>> appropriately, being likely to process the adjacent query then apply the
>> type filter, which would be wrong. I have a feeling most sources implement
>> the type filter as a passive "post filter" rather than an active one. I can
>> tell you right now that it is going to be really quite difficult for me to
>> implement "adjacent" correctly for the ASTD gene/transcript/exon sources,
>> and I suspect the same will be true for retrofitting lots of other sources.
>>
> This is an optional capability though right?
>
That is definitely the intent of my proposal.
Even if this eventually gets rolled into the spec., my suggestion would be
to keep it optional (negotiated via a CAPABILITY).
Thomas.
From thomas.a.down at gmail.com Mon Mar 7 09:41:31 2011
From: thomas.a.down at gmail.com (Thomas Down)
Date: Mon, 7 Mar 2011 14:41:31 +0000
Subject: [DAS] Locations between bases
Message-ID:
The day has come when I find I need to be able to distinguish between a
feature which actually *covers* one or a small number of bases (e.g. a SNP)
and a feature which refers to a point between two bases (e.g. an
insertion). Have any other DAS folk dealt with this before, and if so how?
One possible approach would be something like:
<START>30000000</START>
<END>29999999</END>
...as a way to reference the position between the 29999999th and 30000000th
bases in a sequence... but is this going to break lots of existing client
code?
(I also rather dislike it because there's currently sanity-checking code in
Dazzle to stop you doing stuff like this...)
Any better ideas?
Thomas.
PS. Yes, I'm aware this is a solved problem in DAS/2. Afraid I need a DAS/1
solution, though.
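
As a rough illustration of the encoding floated above (start one greater than end), here is a small Python sketch. The function names are invented for this example and nothing here is part of DAS or Dazzle; it only shows how a client could tell "covers these bases" apart from "sits between two bases", and where a start <= end sanity check would reject such a feature.

```python
from typing import Tuple


def encode_interbase(left_base: int) -> Tuple[int, int]:
    """Encode the point between left_base and left_base + 1.

    Returns (start, end) with start == end + 1, i.e. a deliberately
    "inverted" range, as in the proposal.
    """
    return left_base + 1, left_base


def classify(start: int, end: int) -> Tuple[str, int, int]:
    """Interpret a (start, end) pair under the proposed convention."""
    if start <= end:
        # Ordinary feature covering bases start..end inclusive.
        return ("covers", start, end)
    if start == end + 1:
        # Zero-width location between base `end` and base `start`.
        return ("between", end, start)
    # Anything else is treated as an error, which is roughly what a
    # start <= end sanity check would already do for all inverted ranges.
    raise ValueError(f"inverted range {start}..{end} is not a valid location")


print(encode_interbase(29999999))
print(classify(30000000, 29999999))
print(classify(30000000, 30000000))
```

A client that has not been updated would see start > end and either reject the feature or draw something undefined, which is the compatibility worry raised in the message.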
From andy.jenkinson at ebi.ac.uk Mon Mar 7 10:04:32 2011
From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson)
Date: Mon, 7 Mar 2011 15:04:32 +0000
Subject: [DAS] Adjacent feature extension
In-Reply-To: <6E8B83BF-8B65-41F8-B697-696431D5176E@sanger.ac.uk>
References:
<4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>
<6E8B83BF-8B65-41F8-B697-696431D5176E@sanger.ac.uk>
Message-ID:
On 7 Mar 2011, at 12:16, Jonathan Warren wrote:
> I'd say if we don't have any more objections in the next couple of days then go with your proposal as is? I'll then put support into the registry this week if that is the case. If you could also then copy the proposal from here https://github.com/dasmoth/dalliance/wiki/AdjacentFeatures to the extensions page here:
> http://www.biodas.org/wiki/DAS1.6E#Adjacent_Feature_filter noting in large letters that it was agreed by the community on such a such a date?
I think there is a lot left to be clarified so adopting it "as is" is a no go for me. In particular, take a look at this diagram and see if you can work out what will be returned with "adjacent" queries for either side of the viewing area, and do they make sense for what the client is trying to achieve?
[Attachment scrubbed by the list archive: DAS-Adjacent.png (image/png, 40385 bytes)]
The client has "seen" gene 2 and all its parts.
If the client asks for features adjacent to the left/right sides of the viewing area, what should the server return?
To the left: gene 1, transcript 1, exon 2, or SNP 1? Does it matter? Should gene 1 and all its parts be returned because that is what happens in a segment query? If not, is this confusing for the spec?
To the right: gene 2, transcript 2, exon 4, transcript 3, exon 5, or SNP 2?
If no special arrangements are made for excluding overlapping features (so that gene 2, transcript 2 or exon 4 is returned above), what position should the client submit instead in its overlap query to get SNP 2 and SNP 3? (Hint: it's impossible to get SNP 2).
What if the genes were nonpositional features?
None of the above are unresolvable problems. The simplest way is to say that overlapping features should be returned, and that clients should not try to jump beyond them. It means you don't really get a "next feature" capability in the way a user probably intends (i.e. "next gene"), but it does make it impossible to miss transcript 3 and SNP 2 in the above diagram. It still needs to be defined which feature will be returned if multiple have the same position in that case (ideally we want the one that extends the furthest in the adjacent direction), but that can be complicated by things like nonpositional parent features etc.
The alternative is to exclude overlapping features, but that raises the question of whether parts are considered overlapping if their parents are. Implementing "next gene" means excluding all of gene 2's transcripts and exons (thus returning SNP 2), but is this even what the user meant? Maybe they meant "next transcript".
So not simple. But the main questions are:
1. Should overlapping features be excluded in overlap requests?
2. If so, should non-overlapping features with overlapping parents/parts be excluded?
3. Separately from 1 and 2, should the nearest feature's parents and parts also be returned?
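
For anyone who wants to experiment with question 1, here is a small Python sketch of the two behaviours. The layout is made up (it is not the attached diagram), the Feature record and function name are hypothetical, and parent/part relationships are ignored entirely, so it says nothing about questions 2 and 3.

```python
from typing import NamedTuple, Optional, Sequence


class Feature(NamedTuple):
    # Hypothetical record; parent/part links are deliberately left out.
    id: str
    start: int
    end: int


def adjacent_right(
    features: Sequence[Feature], pos: int, include_overlapping: bool
) -> Optional[Feature]:
    """Return the nearest feature to the right of `pos`.

    include_overlapping=True: a feature spanning `pos` counts, at distance 0.
    include_overlapping=False: only features starting beyond `pos` count.
    Ties go to the feature that extends furthest to the right.
    """
    if include_overlapping:
        candidates = [f for f in features if f.end > pos]
    else:
        candidates = [f for f in features if f.start > pos]
    return min(
        candidates,
        key=lambda f: (max(f.start - pos, 0), -f.end),
        default=None,
    )


# Made-up layout: the view ends at 3000, in the middle of gene2.
LAYOUT = [
    Feature("gene2", 1000, 5000),
    Feature("snp2", 4000, 4000),  # inside gene2, beyond the view edge
    Feature("snp3", 7000, 7000),
]

print(adjacent_right(LAYOUT, 3000, include_overlapping=True))
print(adjacent_right(LAYOUT, 3000, include_overlapping=False))
# A client that receives gene2 and then "jumps" to its end at 5000:
print(adjacent_right(LAYOUT, 5000, include_overlapping=False))
```

In this toy layout, including overlapping features returns gene2, and a client that then jumps past gene2's end lands on snp3 without ever seeing snp2. Excluding overlapping features returns snp2 directly, but then the server has to decide whether things inside an overlapping parent count as overlapping.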
From andy.jenkinson at ebi.ac.uk Mon Mar 7 10:12:32 2011
From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson)
Date: Mon, 7 Mar 2011 15:12:32 +0000
Subject: [DAS] Adjacent feature extension
In-Reply-To:
References:
<4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>
<4796FEEA-3D6A-43A4-A76D-2212E2F690B6@ebi.ac.uk>
Message-ID:
On 7 Mar 2011, at 14:11, Jonathan Warren wrote:
>
> On 7 Mar 2011, at 12:43, Andy Jenkinson wrote:
>
>> In particular, I can foresee servers not interpreting the "type" filter appropriately, being likely to process the adjacent query then apply the type filter, which would be wrong. I have a feeling most sources implement the type filter as a passive "post filter" rather than an active one. I can tell you right now that it is going to be really quite difficult for me to implement "adjacent" correctly for the ASTD gene/transcript/exon sources, and I suspect the same will be true for retrofitting lots of other sources.
> This is an optional capability though right?
Of course, I'm just saying it's difficult for servers to implement so we should expect to see bad implementations exactly as we do with feature-by-id.
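
For reference, the two feature-by-id behaviours discussed in this thread can be sketched in a few lines of Python. This is only an illustration of the idea described here (look the feature up, then re-run the query over its start/end), with invented names and data; it is not code from any DAS server or client.

```python
from typing import List, NamedTuple, Sequence


class Feature(NamedTuple):
    # Hypothetical record for illustration only.
    id: str
    start: int
    end: int


def features_in_region(
    features: Sequence[Feature], start: int, end: int
) -> List[Feature]:
    """Ordinary region query: everything overlapping start..end."""
    return [f for f in features if f.start <= end and f.end >= start]


def feature_by_id_region_style(
    features: Sequence[Feature], feature_id: str
) -> List[Feature]:
    """Look the feature up, then resubmit its start/end as a region query."""
    result: List[Feature] = []
    for hit in (f for f in features if f.id == feature_id):
        for f in features_in_region(features, hit.start, hit.end):
            if f not in result:
                result.append(f)
    return result


def client_side_filter(
    response: Sequence[Feature], feature_id: str
) -> List[Feature]:
    """What a client can do if it only wants the one feature it asked for."""
    return [f for f in response if f.id == feature_id]


DATA = [
    Feature("gene1", 100, 900),
    Feature("exon1", 100, 200),
    Feature("snp9", 5000, 5000),
]

response = feature_by_id_region_style(DATA, "exon1")
print([f.id for f in response])
print([f.id for f in client_side_filter(response, "exon1")])
```

The client-side filter gives the same answer whether the server returned only the requested feature or everything overlapping it, which is why changing the server behaviour should not hurt clients that already filter.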
>>
>>>>
>>>> As to feature-by-id, I know changing behaviour is potentially a very disruptive change, but I think we can potentially do this purely because servers don't tend to implement it correctly anyway. Clients can happily filter out any additional features returned by old servers, and if any clients are reliant on the server including all overlapping features then as far as I am concerned they are either a) targeting specific servers rather than DAS-wide and thus unaffected, or b) already broken :)
>>> So you agree feature-by_id should be changed if we have the stomach for it? - good and Gustavo too. Well done Andy - You have just agreed to write Spec 1.7 or 3??? ;) Your argument above can be used for leaving the spec as it is then as well - but ideally I agree and guess we can call it spec 1.61 assuming other people agree.
>>
>> I already have a small list of changes for DAS 1.7 or whatever and think it's fine for that context. In any case, let's keep these two issues separate as Thomas says.
>
> I was really hoping not to do another major spec revision for at least 3 years and to focus on extensions giving new capabilities- otherwise for the core capabilities everyone is always playing catch up! This maybe something to discuss at some point soon.
>>
>>>>
>>>> I have to admit that the feature-by-id capability is one of the (many) things I loathe having to explain and would love to change it. Doing so would be consistent with what we were trying to do with 1.6 (i.e. rationalise existing use of the spec) but I chickened out really.
>>>>
>>>> Cheers,
>>>> Andy
>>>
>>
>
From andy.jenkinson at ebi.ac.uk Mon Mar 7 10:21:59 2011
From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson)
Date: Mon, 7 Mar 2011 15:21:59 +0000
Subject: [DAS] Locations between bases
In-Reply-To:
References:
Message-ID:
Oh Thomas, why do you do it to us?
May I ask what the specific application is? Depending on what you need, an insertion can be modelled with a feature covering both bases with a specific glyph (or type).
I think what you propose is a common approach, but it is very likely going to break things for some clients (or at least give undefined results). Some additional element would be more compatible.
On 7 Mar 2011, at 14:41, Thomas Down wrote:
> The day has come when I find I need to be able to distinguish between a
> feature which actually *covers* one or a small number of bases (e.g. a SNP)
> and a feature which refers to a point between two bases (e.g. an
> insertion). Have any other DAS folk dealt with this before, and if so how?
> One possible approach would be something like:
>
> 30000000
> 29999999
>
> ...as a way to reference the position between the 29999999th and 30000000th
> bases in a sequence... but is this going to break lots of existing client
> code?
>
> (I also rather dislike it because there's currently sanity-checking code in
> Dazzle to stop you doing stuff like this...)
>
> Any better ideas?
>
> Thomas.
>
> PS. Yes, I'm aware this is a solved problem in DAS/2. Afraid I need a DAS/1
> solution, though.
From birney at ebi.ac.uk Mon Mar 7 10:26:59 2011
From: birney at ebi.ac.uk (Ewan Birney)
Date: Mon, 7 Mar 2011 15:26:59 +0000
Subject: [DAS] Locations between bases
In-Reply-To:
References:
Message-ID:
I also note that this is stretching DAS's mindset away from
"DAS is to support visualisation of data on clients for human
users to read"
to
"DAS semantically represents things accurately for programmatic
decisions"
I think one is better off keeping these things separate.
On 7 Mar 2011, at 15:21, Andy Jenkinson wrote:
> Oh Thomas, why do you do it to us?
>
> May I ask what the specific application is? Depending on what you
> need, an insertion can be modelled with a feature covering both
> bases with a specific glyph (or type).
>
> I think what you propose is a common approach, but it is very likely
> going to break things for some clients (or at least give undefined
> results). Some additional element would be more compatible.
>
> On 7 Mar 2011, at 14:41, Thomas Down wrote:
>
>> The day has come when I find I need to be able to distinguish
>> between a
>> feature which actually *covers* one or a small number of bases
>> (e.g. a SNP)
>> and a feature which refers to a point between two bases (e.g. an
>> insertion). Have any other DAS folk dealt with this before, and if
>> so how?
>> One possible approach would be something like:
>>
>> 30000000
>> 29999999
>>
>> ...as a way to reference the position between the 29999999th and
>> 30000000th
>> bases in a sequence... but is this going to break lots of existing
>> client
>> code?
>>
>> (I also rather dislike it because there's currently sanity-checking
>> code in
>> Dazzle to stop you doing stuff like this...)
>>
>> Any better ideas?
>>
>> Thomas.
>>
>> PS. Yes, I'm aware this is a solved problem in DAS/2. Afraid I
>> need a DAS/1
>> solution, though.
From jw12 at sanger.ac.uk Mon Mar 7 10:49:12 2011
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Mon, 7 Mar 2011 15:49:12 +0000
Subject: [DAS] Adjacent feature extension
In-Reply-To:
References:
<4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>
<6E8B83BF-8B65-41F8-B697-696431D5176E@sanger.ac.uk>
Message-ID: <2869357C-159E-4D2D-A169-B91D63E31680@sanger.ac.uk>
On 7 Mar 2011, at 15:04, Andy Jenkinson wrote:
> On 7 Mar 2011, at 12:16, Jonathan Warren wrote:
>
>> I'd say if we don't have any more objections in the next couple of
>> days then go with your proposal as is? I'll then put support into
>> the registry this week if that is the case. If you could also then
>> copy the proposal from here https://github.com/dasmoth/dalliance/wiki/AdjacentFeatures
>> to the extensions page here:
>> http://www.biodas.org/wiki/DAS1.6E#Adjacent_Feature_filter noting
>> in large letters that it was agreed by the community on such a such
>> a date?
>
> I think there is a lot left to be clarified so adopting it "as is"
> is a no go for me. In particular, take a look at this diagram and
> see if you can work out what will be returned with "adjacent"
> queries for either side of the viewing area, and do they make sense
> for what the client is trying to achieve?
>
>
> The client has "seen" gene 2 and all its parts.
>
> If the client asks for features adjacent to the left/right sides of
> the viewing area, what should the server return?
I don't think it makes sense to ask for a next right in this case, as
there are features here already. This is for sparse data sources, so
it's OK just to return what's there if someone specifically wants to
hit the next feature button, or a client can blank the next right
button out. It's up to the client.
Next left should return SNP 1 if asked for an adjacent request... or
genes and their constituents if filtered on gene.
If you take the intention of this as finding features where data is
sparse then I don't think there are big issues.
Part of the point of the extensions phase is to try these things out
with examples and refine the specs. To put off accepting this would
be a big mistake in my view.
> To the left: gene 1, transcript 1, exon 2, or SNP 1? Does it matter?
> Should gene 1 and all its parts be returned because that is what
> happens in a segment query? If not, is this confusing for the spec?
> To the right: gene 2, transcript 2, exon 4, transcript 3, exon 5, or
> SNP 2?
>
> If no special arrangements are made for excluding overlapping
> features (thus either gene 2, transcript 2 or exon 4 are returned
> above), what position should the client submit instead in its
> overlap query to get SNP 2 and SNP 3? (Hint: it's impossible to get
> SNP 2).
>
> What if the genes were nonpositional features?
>
>
> None of the above are unresolvable problems, the simplest way it to
> say that overlapping features should be returned, and that clients
> should not try to jump beyond them. It means you don't really get a
> "next feature" capability in the same way a user probably intends
> (i.e. "next gene"), but does make it impossible to miss transcript 3
> and SNP 2 in the above diagram. It still needs to be defined which
> feature will be returned if multiple have the same position that
> case (ideally we want the one that extends the furthers in the
> adjacent direction), but that can be complicated by things like
> nonpositional parent features etc.
>
> The alternative is to exclude overlapping features, but that raises
> the question of whether parts are considered overlapping if their
> parents are. Implementing "next gene" means excluding all of gene
> 2's transcripts and exons (thus returning SNP 2), but is this even
> what the user meant? Maybe they meant "next transcript".
>
> So not simple. But the main questions are:
> 1. should overlapping features be excluded in overlap requests
> 2. if so, should non-overlapping features with overlapping parents/
> parts be excluded?
> 3. separate from 1 and 2, should the nearest feature's parents and
> parts also be returned?
Jonathan Warren
Senior Developer and DAS coordinator
blog: http://biodasman.wordpress.com/
jw12 at sanger.ac.uk
Ext: 2314
Telephone: 01223 492314
From thomas.a.down at gmail.com Mon Mar 7 11:03:50 2011
From: thomas.a.down at gmail.com (Thomas Down)
Date: Mon, 7 Mar 2011 16:03:50 +0000
Subject: [DAS] Locations between bases
In-Reply-To:
References:
Message-ID:
Well, that's a separate debate, and I don't really want to open it at the
moment.
However, right now I *am* thinking quite specifically of visualization
issues. Once you're zoomed in to base-pair resolution, there is a
meaningful distinction between "on a base" and "between bases".
Andy: what I'm trying to model is insertion sites. I guess that's the main
use for this kind of thing.
Thomas.
On Mon, Mar 7, 2011 at 3:26 PM, Ewan Birney wrote:
>
> I also note that this is stretching DAS's mindset away from
>
> "DAS is to support visualisation of data on clients for human users to
> read"
>
> to
>
> "DAS semantically represents things accurately for programmatic
> decisions"
>
>
> I think one is better off keeping these things separate.
>
>
>
> On 7 Mar 2011, at 15:21, Andy Jenkinson wrote:
>
> Oh Thomas, why do you do it to us?
>>
>> May I ask what the specific application is? Depending on what you need, an
>> insertion can be modelled with a feature covering both bases with a specific
>> glyph (or type).
>>
>> I think what you propose is a common approach, but it is very likely going
>> to break things for some clients (or at least give undefined results). Some
>> additional element would be more compatible.
>>
>> On 7 Mar 2011, at 14:41, Thomas Down wrote:
>>
>> The day has come when I find I need to be able to distinguish between a
>>> feature which actually *covers* one or a small number of bases (e.g. a
>>> SNP)
>>> and a feature which refers to a point between two bases (e.g. an
>>> insertion). Have any other DAS folk dealt with this before, and if so
>>> how?
>>> One possible approach would be something like:
>>>
>>> 30000000
>>> 29999999
>>>
>>> ...as a way to reference the position between the 29999999th and
>>> 30000000th
>>> bases in a sequence... but is this going to break lots of existing client
>>> code?
>>>
>>> (I also rather dislike it because there's currently sanity-checking code
>>> in
>>> Dazzle to stop you doing stuff like this...)
>>>
>>> Any better ideas?
>>>
>>> Thomas.
>>>
>>> PS. Yes, I'm aware this is a solved problem in DAS/2. Afraid I need a
>>> DAS/1
>>> solution, though.
From birney at ebi.ac.uk Mon Mar 7 11:07:15 2011
From: birney at ebi.ac.uk (Ewan Birney)
Date: Mon, 7 Mar 2011 16:07:15 +0000
Subject: [DAS] Locations between bases
In-Reply-To:
References:
Message-ID: <3D11AC15-23B4-4AF1-AF90-136AFAF1540A@ebi.ac.uk>
I'd say this is start=end, plus a convention that this indicates the
point to the left or the right, shown with a glyph. With this convention,
for the edge case one either has to allow the 0 base (right-based
insertion convention, insertion at the very start) or the end+1 position
(left-based insertion convention, insertion at the very end).
I know, this triggers doing things on inter-base coordinates, but
that's just a big, big change.
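
A quick Python sketch of how I read the two conventions, in case it helps. The function name and the "right"/"left" labels are my own shorthand, and the interpretation is an assumption: "right-based" anchors the feature on the base to the left of the insertion point (the insertion sits to its right), and "left-based" anchors it on the base to the right.

```python
def anchor_base(left_base: int, seq_length: int, convention: str) -> int:
    """Return the base used for start=end when marking an insertion point.

    The insertion point lies between `left_base` and `left_base + 1`, where
    left_base may range from 0 (before the first base) to seq_length (after
    the last base). Bases themselves are numbered 1..seq_length.
    """
    if not 0 <= left_base <= seq_length:
        raise ValueError("insertion point lies outside the sequence")
    if convention == "right":
        # Insertion is drawn to the right of the anchor base.
        # An insertion at the very start needs the non-existent base 0.
        return left_base
    if convention == "left":
        # Insertion is drawn to the left of the anchor base.
        # An insertion at the very end needs position seq_length + 1.
        return left_base + 1
    raise ValueError(f"unknown convention: {convention!r}")


LENGTH = 100  # made-up sequence length
print(anchor_base(0, LENGTH, "right"))     # very start, right-based
print(anchor_base(LENGTH, LENGTH, "left"))  # very end, left-based
print(anchor_base(42, LENGTH, "right"), anchor_base(42, LENGTH, "left"))
```

Whichever convention is picked, exactly one of the two ends of the sequence needs an out-of-range coordinate, which is the edge case mentioned above.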
On 7 Mar 2011, at 16:03, Thomas Down wrote:
> Well, that's a separate debate, and I don't really want to open it
> at the moment.
>
> However, right now I am thinking quite specifically of visualization
> issues. Once you're zoomed in to base-pair resolution, there is a
> meaningful distinction between "on a base" and "between a base".
>
> Andy: what I'm trying to model is insertion sites. I guess that's
> the main use for this kind of thing.
>
> Thomas.
>
>
>
> On Mon, Mar 7, 2011 at 3:26 PM, Ewan Birney wrote:
>
> I also note that this is stretching DAS's mindset away from
>
> "DAS is to support visualisation of data on clients for human
> users to read"
>
> to
>
> "DAS semantically represents things accurately for programmatic
> decisions"
>
>
> I think one is better off keeping these things separate.
>
>
>
> On 7 Mar 2011, at 15:21, Andy Jenkinson wrote:
>
> Oh Thomas, why do you do it to us?
>
> May I ask what the specific application is? Depending on what you
> need, an insertion can be modelled with a feature covering both
> bases with a specific glyph (or type).
>
> I think what you propose is a common approach, but it is very likely
> going to break things for some clients (or at least give undefined
> results). Some additional element would be more compatible.
>
> On 7 Mar 2011, at 14:41, Thomas Down wrote:
>
> The day has come when I find I need to be able to distinguish
> between a
> feature which actually *covers* one or a small number of bases (e.g.
> a SNP)
> and a feature which refers to a point between two bases (e.g. an
> insertion). Have any other DAS folk dealt with this before, and if
> so how?
> One possible approach would be something like:
>
> 30000000
> 29999999
>
> ...as a way to reference the position between the 29999999th and
> 30000000th
> bases in a sequence... but is this going to break lots of existing
> client
> code?
>
> (I also rather dislike it because there's currently sanity-checking
> code in
> Dazzle to stop you doing stuff like this...)
>
> Any better ideas?
>
> Thomas.
>
> PS. Yes, I'm aware this is a solved problem in DAS/2. Afraid I need
> a DAS/1
> solution, though.
From thomas.a.down at gmail.com Mon Mar 7 11:10:35 2011
From: thomas.a.down at gmail.com (Thomas Down)
Date: Mon, 7 Mar 2011 16:10:35 +0000
Subject: [DAS] Locations between bases
In-Reply-To: <3D11AC15-23B4-4AF1-AF90-136AFAF1540A@ebi.ac.uk>
References:
<3D11AC15-23B4-4AF1-AF90-136AFAF1540A@ebi.ac.uk>
Message-ID:
On Mon, Mar 7, 2011 at 4:07 PM, Ewan Birney wrote:
>
> I'd say this is start=end, and a convention that this indicates the point
> to the left or the right with a glyph. With this convention, for the edge
> case one either has to allow the 0 base (right-based insertion convention,
> insertion at the very start) or the end+1 position (left-based insertion
> convention, insertion at the very end).
>
Thanks Ewan (+Ed and Andy), that sounds best. I'm going to need a
non-standard stylesheet glyph to do what I'm after as well, so perhaps it's
best to attach the "interbaseness" to that, rather than the feature itself.
> I know, this triggers doing things on inter-base coordinates, but that's
> just
> a big, big change.
>
Yep, that's not really an option.
Thomas.
From andy.jenkinson at ebi.ac.uk Mon Mar 7 11:19:03 2011
From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson)
Date: Mon, 7 Mar 2011 16:19:03 +0000
Subject: [DAS] Adjacent feature extension
In-Reply-To: <2869357C-159E-4D2D-A169-B91D63E31680@sanger.ac.uk>
References:
<4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>
<6E8B83BF-8B65-41F8-B697-696431D5176E@sanger.ac.uk>
<2869357C-159E-4D2D-A169-B91D63E31680@sanger.ac.uk>
Message-ID: <3F20D5FA-A369-4045-A865-728D2D7312ED@ebi.ac.uk>
On 7 Mar 2011, at 15:49, Jonathan Warren wrote:
>
> On 7 Mar 2011, at 15:04, Andy Jenkinson wrote:
>
>> On 7 Mar 2011, at 12:16, Jonathan Warren wrote:
>>
>>> I'd say if we don't have any more objections in the next couple of days then go with your proposal as is? I'll then put support into the registry this week if that is the case. If you could also then copy the proposal from here https://github.com/dasmoth/dalliance/wiki/AdjacentFeatures to the extensions page here:
>>> http://www.biodas.org/wiki/DAS1.6E#Adjacent_Feature_filter noting in large letters that it was agreed by the community on such and such a date?
>>
>> I think there is a lot left to be clarified so adopting it "as is" is a no go for me. In particular, take a look at this diagram and see if you can work out what will be returned with "adjacent" queries for either side of the viewing area, and do they make sense for what the client is trying to achieve?
>>
>>
>> The client has "seen" gene 2 and all its parts.
>>
>> If the client asks for features adjacent to the left/right sides of the viewing area, what should the server return?
> I don't think it makes sense to ask for a next right in this case as there are features here already. This is for sparse data sources, so it's ok just to return what's there if someone specifically wants to hit the next feature button, or a client can blank the next right button out. It's up to the client.
Agree. But what I don't want to see is clients implementing some weird hybrid where they offer a "next right" button that bypasses SNP 2. If we expect clients to behave in a certain way, we should say so.
> Next left should return SNP1 if asked for an adjacent request.... or genes and constituents if filtered on gene.
Why SNP 1, and not any of the others at the same position? How is the server supposed to decide? Does it matter? How would this be worded in the spec?
> If you take the intention of this as finding features where data is sparse then I don't think there are big issues.
These aren't big issues (taken in that context), but I absolutely want to make sure we don't make the mistakes of the past by leaving ambiguity in the spec - whether it's an extension or otherwise. It's all very well us knowing what we designed it for, but if it isn't written down then it's going to cause problems. For the avoidance of doubt, I am very keen to get this done, but I see no sense in doing it in a way that we're going to regret later (of which there are already countless examples).
> Part of the point of the extensions phase is to try these things out with examples and refine the specs. To leave acceptance of this will be a big mistake in my view.
I'm not sure what you mean by "leave acceptance". I'm trying to work through these things, not put blocks in the way. I am trying right now to implement it and these are the things I have immediately come up against so I need to get input right now on how to do it. Or to put another way, I can't create my example without refining the specs.
From jw12 at sanger.ac.uk Mon Mar 7 11:31:13 2011
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Mon, 7 Mar 2011 16:31:13 +0000
Subject: [DAS] Adjacent feature extension
In-Reply-To: <3F20D5FA-A369-4045-A865-728D2D7312ED@ebi.ac.uk>
References:
<4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>
<6E8B83BF-8B65-41F8-B697-696431D5176E@sanger.ac.uk>
<2869357C-159E-4D2D-A169-B91D63E31680@sanger.ac.uk>
<3F20D5FA-A369-4045-A865-728D2D7312ED@ebi.ac.uk>
Message-ID: <15C814D8-14CD-487B-9AD9-2D5ECAABE6E0@sanger.ac.uk>
On 7 Mar 2011, at 16:19, Andy Jenkinson wrote:
> On 7 Mar 2011, at 15:49, Jonathan Warren wrote:
>
>>
>> On 7 Mar 2011, at 15:04, Andy Jenkinson wrote:
>>
>>> On 7 Mar 2011, at 12:16, Jonathan Warren wrote:
>>>
>>>> I'd say if we don't have any more objections in the next couple
>>>> of days then go with your proposal as is? I'll then put support
>>>> into the registry this week if that is the case. If you could
>>>> also then copy the proposal from here https://github.com/dasmoth/dalliance/wiki/AdjacentFeatures
>>>> to the extensions page here:
>>>> http://www.biodas.org/wiki/DAS1.6E#Adjacent_Feature_filter noting
>>>> in large letters that it was agreed by the community on such and
>>>> such a date?
>>>
>>> I think there is a lot left to be clarified so adopting it "as is"
>>> is a no go for me. In particular, take a look at this diagram and
>>> see if you can work out what will be returned with "adjacent"
>>> queries for either side of the viewing area, and do they make
>>> sense for what the client is trying to achieve?
>>>
>>>
>>> The client has "seen" gene 2 and all its parts.
>>>
>>> If the client asks for features adjacent to the left/right sides
>>> of the viewing area, what should the server return?
>> I don't think it makes sense to ask for a next right in this case
>> as there are features here already. This is for sparse data sources,
>> so it's ok just to return what's there if someone specifically wants
>> to hit the next feature button, or a client can blank the next right
>> button out. It's up to the client.
>
> Agree. But what I don't want to see is clients implementing some
> weird hybrid where they offer a "next right" button that bypasses
> SNP 2. If we expect clients to behave in a certain way, we should
> say so.
>
>> Next left should return SNP1 if asked for an adjacent request....
>> or genes and constituents if filtered on gene.
>
> Why SNP 1, and not any of the others at the same position? How is
> the server supposed to decide? Does it matter? How would this be
> worded in the spec?
It doesn't matter unless filtered.
>
>> If you take the intention of this as finding features where data is
>> sparse then I don't think there are big issues.
>
> These aren't big issues (taken in that context), but I absolutely
> want to make sure we don't make the mistakes of the past by leaving
> ambiguity in the spec - whether it's an extension or otherwise. It's
> all very well us knowing what we designed it for, but if it isn't
> written down then it's going to cause problems. For the avoidance of
> doubt, I am very keen to get this done, but I see no sense in doing
> it in a way that we're going to regret later (of which there are
> already countless examples).
>
>> Part of the point of the extensions phase is to try these things
>> out with examples and refine the specs. To leave acceptance of this
>> will be a big mistake in my view.
>
> I'm not sure what you mean by "leave acceptance". I'm trying to work
> through these things, not put blocks in the way. I am trying right
> now to implement it and these are the things I have immediately come
> up against so I need to get input right now on how to do it.
Cool! I'll shut up then.
> Or to put another way, I can't create my example without refining
> the specs.
Jonathan Warren
Senior Developer and DAS coordinator
blog: http://biodasman.wordpress.com/
jw12 at sanger.ac.uk
Ext: 2314
Telephone: 01223 492314
From thomas.a.down at gmail.com Mon Mar 7 11:37:06 2011
From: thomas.a.down at gmail.com (Thomas Down)
Date: Mon, 7 Mar 2011 16:37:06 +0000
Subject: [DAS] Adjacent feature extension
In-Reply-To:
References:
<4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>
<6E8B83BF-8B65-41F8-B697-696431D5176E@sanger.ac.uk>
Message-ID:
On Mon, Mar 7, 2011 at 3:04 PM, Andy Jenkinson wrote:
> What if the genes were nonpositional features?
>
Addressing this one separately: I don't believe non-positional features are
adjacent to anything, and therefore they should never be included in the
result of an adjacent= filter. Happy to clarify this.
Anyone want to argue this the other way?
Thomas.
From andy.jenkinson at ebi.ac.uk Mon Mar 7 11:44:16 2011
From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson)
Date: Mon, 7 Mar 2011 16:44:16 +0000
Subject: [DAS] Adjacent feature extension
In-Reply-To: <15C814D8-14CD-487B-9AD9-2D5ECAABE6E0@sanger.ac.uk>
References:
<4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>
<6E8B83BF-8B65-41F8-B697-696431D5176E@sanger.ac.uk>
<2869357C-159E-4D2D-A169-B91D63E31680@sanger.ac.uk>
<3F20D5FA-A369-4045-A865-728D2D7312ED@ebi.ac.uk>
<15C814D8-14CD-487B-9AD9-2D5ECAABE6E0@sanger.ac.uk>
Message-ID:
On 7 Mar 2011, at 16:31, Jonathan Warren wrote:
>
> On 7 Mar 2011, at 16:19, Andy Jenkinson wrote:
>
>> On 7 Mar 2011, at 15:49, Jonathan Warren wrote:
>>
>>>
>>> On 7 Mar 2011, at 15:04, Andy Jenkinson wrote:
>>>
>>>> On 7 Mar 2011, at 12:16, Jonathan Warren wrote:
>>>>
>>>>> I'd say if we don't have any more objections in the next couple of days then go with your proposal as is? I'll then put support into the registry this week if that is the case. If you could also then copy the proposal from here https://github.com/dasmoth/dalliance/wiki/AdjacentFeatures to the extensions page here:
>>>>> http://www.biodas.org/wiki/DAS1.6E#Adjacent_Feature_filter noting in large letters that it was agreed by the community on such and such a date?
>>>>
>>>> I think there is a lot left to be clarified so adopting it "as is" is a no go for me. In particular, take a look at this diagram and see if you can work out what will be returned with "adjacent" queries for either side of the viewing area, and do they make sense for what the client is trying to achieve?
>>>>
>>>>
>>>> The client has "seen" gene 2 and all its parts.
>>>>
>>>> If the client asks for features adjacent to the left/right sides of the viewing area, what should the server return?
>>> I don't think it makes sense to ask for a next right in this case as there are features here already. This is for sparse data sources, so it's ok just to return what's there if someone specifically wants to hit the next feature button, or a client can blank the next right button out. It's up to the client.
>>
>> Agree. But what I don't want to see is clients implementing some weird hybrid where they offer a "next right" button that bypasses SNP 2. If we expect clients to behave in a certain way, we should say so.
>>
>>> Next left should return SNP1 if asked for an adjacent request.... or genes and constituents if filtered on gene.
>>
>> Why SNP 1, and not any of the others at the same position? How is the server supposed to decide? Does it matter? How would this be worded in the spec?
> It doesn't matter unless filtered.
OK. Unless there are any objections, Thomas can you add this to the wiki page? Something like "If there is a choice of features at the same position, the server may return any one of them."?
As for the overlap question, let's say that features overlapping the "adjacent" parameter can be included, as it's too complicated otherwise (I just thought of another edge case and it's not pretty!). We should also include a "design note" for client developers.
For returning parents/parts, let's explicitly limit it to one feature. It does make it behave unlike the "normal" segment-based request, but it makes lots of the other issues, like nonpositional parents, moot.
Everyone happy?
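
As a rough illustration only, the rules proposed in this thread (one feature per response, overlapping features allowed, server picks on a tie, non-positional features never adjacent) could be sketched like this. The Feature class, the adjacent_feature function and the start == end == 0 encoding of non-positional features are assumptions made for the example, not wording from the extension proposal.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Feature:
    # Hypothetical minimal model of a DAS/1 feature; 1-based, inclusive coordinates.
    id: str
    start: int
    end: int


def adjacent_feature(features: Iterable[Feature], pos: int, forward: bool) -> Optional[Feature]:
    """Return the single feature nearest to pos in the requested direction, or None."""
    best = None
    best_dist = None
    for f in features:
        # Assumption: non-positional features are encoded with start == end == 0.
        if f.start == 0 and f.end == 0:
            continue
        if f.start <= pos <= f.end:
            dist = 0  # a feature overlapping the query position counts as adjacent
        elif forward and f.start > pos:
            dist = f.start - pos
        elif not forward and f.end < pos:
            dist = pos - f.end
        else:
            continue  # feature lies entirely on the wrong side of pos
        # Strict '<' keeps the first feature seen on a tie ("server may return any one").
        if best_dist is None or dist < best_dist:
            best, best_dist = f, dist
    return best
```

A client that wants to step past an overlapping feature would have to move the query position itself, which is the kind of behaviour the "design note" for client developers would need to spell out.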
From edgrif at sanger.ac.uk Mon Mar 7 11:02:29 2011
From: edgrif at sanger.ac.uk (Ed Griffiths)
Date: Mon, 7 Mar 2011 16:02:29 +0000 (GMT)
Subject: [DAS] Locations between bases
In-Reply-To:
References:
Message-ID:
Hi Thomas,
A while ago I spent some time trying to persuade the GFF version 3 people not to
use base positions to code this difference as it leads to ambiguity. I failed
but I would urge you not to use positions to do it but instead have some other
data that indicates that what you want is the position between two bases.
Ed
On Mon, 7 Mar 2011, Thomas Down wrote:
> The day has come when I find I need to be able to distinguish between a
> feature which actually *covers* one or a small number of bases (e.g. a SNP)
> and a feature which refers to a point between two bases (e.g. an
> insertion). Have any other DAS folk dealt with this before, and if so how?
> One possible approach would be something like:
>
> <START>30000000</START>
> <END>29999999</END>
>
> ...as a way to reference the position between the 29999999th and 30000000th
> bases in a sequence... but is this going to break lots of existing client
> code?
>
> (I also rather dislike it because there's currently sanity-checking code in
> Dazzle to stop you doing stuff like this...)
>
> Any better ideas?
>
> Thomas.
>
> PS. Yes, I'm aware this is a solved problem in DAS/2. Afraid I need a DAS/1
> solution, though.
--
------------------------------------------------------------------------
| Ed Griffiths, Acedb/ZMap development, Informatics Group, |
| The Morgan Building, Sanger Institute, Wellcome Trust Genome Campus |
| Hinxton, Cambridge CB10 1HH |
| |
| email: edgrif at sanger.ac.uk Tel: +44-1223-496844 Fax: +44-1223-494919 |
------------------------------------------------------------------------
From thomas.a.down at gmail.com Mon Mar 7 11:51:10 2011
From: thomas.a.down at gmail.com (Thomas Down)
Date: Mon, 7 Mar 2011 16:51:10 +0000
Subject: [DAS] Adjacent feature extension
In-Reply-To:
References:
<4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>
<6E8B83BF-8B65-41F8-B697-696431D5176E@sanger.ac.uk>
Message-ID:
There are several more-or-less separate issues tied up here that I see:
1. Handling of PART/PARENT. I neglected this completely in my
original proposal. I can think of a couple of solutions:
a) Treat a whole PART/PARENT graph as a single
"complex" feature. This means that you'll always get full graphs back from
any kind of feature query (modulo type filtering). If any part of the
complex feature counts as "adjacent", then you'll get the whole thing back.
b) Specify the adjacent= filter as ignoring any
features with a PARENT attribute set.
Jonathan/Andy, does either of you have an opinion on these two options?
I guess a) is probably the least likely to cause surprise?
2. My idea that an overlapping feature can count as adjacent.
This initially seemed nice and simple but Andy's example of "SNP 2" has
shown why it's broken when you have overlapping features. I'm quite happy
to have the adjacent filter only select features that don't overlap the
query position.
3. Features with matching start/end positions (therefore "equally
adjacent"). I'm going to say "server picks" in this case. The client at
least knows there's something there! I actually think this situation will
be pretty rare in practice (for truly separate features, rather than having
a transcript and exon starting in the same position).
Does that tighten things up?
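
To make option 1a concrete, here is a minimal sketch of treating each PART/PARENT graph as one "complex" feature. It assumes the links are available as (child id, parent id) pairs; the function name and that representation are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def complex_features(ids: Iterable[str], parent_links: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group feature ids into connected PART/PARENT graphs.

    parent_links holds hypothetical (child_id, parent_id) pairs.
    """
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for child, parent in parent_links:
        neighbours[child].add(parent)
        neighbours[parent].add(child)

    seen: Set[str] = set()
    groups: List[Set[str]] = []
    for fid in ids:
        if fid in seen:
            continue
        # Depth-first walk collects everything reachable through PART/PARENT links.
        group: Set[str] = set()
        stack = [fid]
        while stack:
            cur = stack.pop()
            if cur in group:
                continue
            group.add(cur)
            stack.extend(neighbours[cur] - group)
        seen |= group
        groups.append(group)
    return groups
```

Under option 1a, if any member of a group counts as adjacent, the whole group would be returned; a feature with no links forms a group of its own.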
Andy, thanks for thrashing this out. As you can probably work out, the use
cases I've been working to involve rather sparsely-distributed features, but
it's good to sort out the corner cases that arise as the density increases.
Thomas.
On Mon, Mar 7, 2011 at 3:04 PM, Andy Jenkinson wrote:
> On 7 Mar 2011, at 12:16, Jonathan Warren wrote:
>
> > I'd say if we don't have any more objections in the next couple of days
> then go with your proposal as is? I'll then put support into the registry
> this week if that is the case. If you could also then copy the proposal from
> here https://github.com/dasmoth/dalliance/wiki/AdjacentFeatures to the
> extensions page here:
> > http://www.biodas.org/wiki/DAS1.6E#Adjacent_Feature_filter noting in
> large letters that it was agreed by the community on such and such a date?
>
> I think there is a lot left to be clarified so adopting it "as is" is a no
> go for me. In particular, take a look at this diagram and see if you can
> work out what will be returned with "adjacent" queries for either side of
> the viewing area, and do they make sense for what the client is trying to
> achieve?
>
>
>
> The client has "seen" gene 2 and all its parts.
>
> If the client asks for features adjacent to the left/right sides of the
> viewing area, what should the server return?
> To the left: gene 1, transcript 1, exon 2, or SNP 1? Does it matter? Should
> gene 1 and all its parts be returned because that is what happens in a
> segment query? If not, is this confusing for the spec?
> To the right: gene 2, transcript 2, exon 4, transcript 3, exon 5, or SNP 2?
>
> If no special arrangements are made for excluding overlapping features
> (thus either gene 2, transcript 2 or exon 4 are returned above), what
> position should the client submit instead in its overlap query to get SNP 2
> and SNP 3? (Hint: it's impossible to get SNP 2).
>
> What if the genes were nonpositional features?
>
>
> None of the above are unresolvable problems. The simplest way is to say
> that overlapping features should be returned, and that clients should not
> try to jump beyond them. It means you don't really get a "next feature"
> capability in the same way a user probably intends (i.e. "next gene"), but
> does make it impossible to miss transcript 3 and SNP 2 in the above diagram.
> It still needs to be defined which feature will be returned if multiple have
> the same position in that case (ideally we want the one that extends the
> furthest in the adjacent direction), but that can be complicated by things
> like nonpositional parent features etc.
>
> The alternative is to exclude overlapping features, but that raises the
> question of whether parts are considered overlapping if their parents are.
> Implementing "next gene" means excluding all of gene 2's transcripts and
> exons (thus returning SNP 2), but is this even what the user meant? Maybe
> they meant "next transcript".
>
> So not simple. But the main questions are:
> 1. should overlapping features be excluded in overlap requests
> 2. if so, should non-overlapping features with overlapping parents/parts be
> excluded?
> 3. separate from 1 and 2, should the nearest feature's parents and parts
> also be returned?
>
From rmb32 at cornell.edu Mon Mar 7 11:38:13 2011
From: rmb32 at cornell.edu (Robert Buels)
Date: Mon, 07 Mar 2011 11:38:13 -0500
Subject: [DAS] Google Summer of Code project ideas
Message-ID: <4D7509F5.6010703@cornell.edu>
Hi all,
I'm going to be OBF project admin again this year for Google Summer of
code. OBF's application is due later this week, and we need to update
our project ideas on the OBF wiki page and on each project's individual
wiki pages.
So, for each of the OBF projects that wants to do GSoC again this year,
please:
a.) Update the list of project ideas on your project's GSoC page
(BioPython, BioPerl, BioRuby, etc). Add new ones, remove ones that have
already been done or no longer relevant, etc.
b.) Update the list of project ideas on the main OBF GSoC page
(http://www.open-bio.org/wiki/Google_Summer_of_Code) to match.
c.) Let me know via email that you have done so and it's ready for
Google to peruse.
Please have the updates done, if possible, by this Friday (March 11).
The number and quality of the project ideas are part of the evaluation
process for whether OBF is accepted as a Summer of Code organization
again this year, so let's come up with some good ones. :-)
Rob
----
Robert Buels
(prospective) 2011 OBF GSoC Organization Admin
From lincoln.stein at gmail.com Mon Mar 7 12:29:16 2011
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Mon, 7 Mar 2011 09:29:16 -0800
Subject: [DAS] Locations between bases
In-Reply-To:
References:
Message-ID:
Hi All,
The GFF3 convention is to make start==end and to have the flagged insertion
occur to the right of the designated base.
The elegant way to do this is to use Jim Kent's convention of half-open
spans, in which case the coordinates apply to the position BETWEEN bases,
but this causes too much confusion to satisfy a relatively rare requirement.
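
For anyone unfamiliar with the two schemes, a small sketch of how they relate, assuming 1-based inclusive coordinates on the DAS/1 side and 0-based half-open (interbase) spans on the other. The function names are invented for the example.

```python
from typing import Tuple


def closed_to_interbase(start: int, end: int) -> Tuple[int, int]:
    """Convert a 1-based, inclusive span to a 0-based, half-open (interbase) span."""
    return start - 1, end


def insertion_to_interbase(base: int) -> Tuple[int, int]:
    """Zero-length site to the right of 1-based `base` (the start==end convention)."""
    return base, base


def interbase_length(span: Tuple[int, int]) -> int:
    """Number of bases covered by a half-open span; 0 for a point between bases."""
    return span[1] - span[0]
```

With start==end in the closed scheme, a SNP on base 30000000 and an insertion next to it look alike unless something else flags the difference; in the half-open scheme the SNP has length 1 and the insertion site has length 0.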
Lincoln
On Mon, Mar 7, 2011 at 8:02 AM, Ed Griffiths wrote:
> Hi Thomas,
>
> A while ago I spent some time trying to persuade the GFF version 3 people
> not to use base positions to code this difference as it leads to ambiguity.
> I failed but I would urge you not to use positions to do it but instead have
> some other data that indicates that what you want is the position between
> two bases.
>
> Ed
>
>
>
> On Mon, 7 Mar 2011, Thomas Down wrote:
>
> The day has come when I find I need to be able to distinguish between a
>> feature which actually *covers* one or a small number of bases (e.g. a
>> SNP)
>> and a feature which refers to a point between two bases (e.g. an
>> insertion). Have any other DAS folk dealt with this before, and if so
>> how?
>> One possible approach would be something like:
>>
>> <START>30000000</START>
>> <END>29999999</END>
>>
>> ...as a way to reference the position between the 29999999th and
>> 30000000th
>> bases in a sequence... but is this going to break lots of existing client
>> code?
>>
>> (I also rather dislike it because there's currently sanity-checking code
>> in
>> Dazzle to stop you doing stuff like this...)
>>
>> Any better ideas?
>>
>> Thomas.
>>
>> PS. Yes, I'm aware this is a solved problem in DAS/2. Afraid I need a
>> DAS/1
>> solution, though.
> --
> ------------------------------------------------------------------------
> | Ed Griffiths, Acedb/ZMap development, Informatics Group, |
> | The Morgan Building, Sanger Institute, Wellcome Trust Genome Campus |
> | Hinxton, Cambridge CB10 1HH |
> | |
> | email: edgrif at sanger.ac.uk Tel: +44-1223-496844 Fax: +44-1223-494919 |
> ------------------------------------------------------------------------
>
--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa
From David.Nix at hci.utah.edu Mon Mar 7 11:38:10 2011
From: David.Nix at hci.utah.edu (David Nix)
Date: Mon, 7 Mar 2011 09:38:10 -0700
Subject: [DAS] Locations between bases
In-Reply-To:
Message-ID:
My two cents...
I'm coming late to this thread, so pardon me if I'm missing something, but I'd strongly recommend adopting interbase coordinates for DAS/1 (maybe this is already the case?). This is explicitly specified in the DAS/2 spec. It's pretty critical when it comes to calling SNPs and INDELs.
-cheers, D
On 3/7/11 9:10 AM, "Thomas Down" wrote:
On Mon, Mar 7, 2011 at 4:07 PM, Ewan Birney wrote:
>
> I'd say this is start=end, and a convention that this indicates the point
> to the left or the right with a glyph. With this convention, for the edge
> case one either has to allow the 0 base (right-based insertion convention,
> insertion at the very start) or the end+1 position (left-based insertion
> convention, insertion at the very end).
>
Thanks Ewan (+Ed and Andy), that sounds best. I'm going to need a
non-standard stylesheet glyph to do what I'm after as well, so perhaps it's
best to attach the "interbaseness" to that, rather than the feature itself.
> I know, this triggers doing things on inter-base coordinates, but that's
> just
> a big, big change.
>
Yep, that's not really an option.
Thomas.
From rmb32 at cornell.edu Thu Mar 10 12:18:34 2011
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 10 Mar 2011 12:18:34 -0500
Subject: [DAS] update Google Summer of Code project ideas
Message-ID: <4D7907EA.202@cornell.edu>
Hi all,
Please make sure the BioDAS information is up to date for 2011 on both
the OBF and BioDAS wikis. The current page is not fully updated for 2011.
You might want to consider making the page name non-year-specific, and
update the link to it.
Please get this done in the next few days, I'm sending the OBF's
application in to Google today, and they will be evaluating us to see
that we're fully prepared.
OBF wiki page: http://www.open-bio.org/wiki/Google_Summer_of_Code
BioDAS wiki: http://www.biodas.org/wiki/GSoC2010
Rob
----
Robert Buels
(prospective) 2011 OBF GSoC Organization Admin
From andy.jenkinson at ebi.ac.uk Fri Mar 11 05:15:14 2011
From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson)
Date: Fri, 11 Mar 2011 10:15:14 +0000
Subject: [DAS] update Google Summer of Code project ideas
In-Reply-To: <4D7907EA.202@cornell.edu>
References: <4D7907EA.202@cornell.edu>
Message-ID: <94994A04-3521-41C6-9BE3-417903453FB9@ebi.ac.uk>
Thanks Robert, I've created a 2011 version (with the same projects!). I think they're both still doable. Is it too late to add more?
On 10 Mar 2011, at 17:18, Robert Buels wrote:
> Hi all,
>
> Please make sure the BioDAS information is up to date for 2011 on both the OBF and BioDAS wikis. The current page is not fully updated for 2011.
>
> You might want to consider making the page name non-year-specific, and update the link to it.
>
> Please get this done in the next few days, I'm sending the OBF's application in to Google today, and they will be evaluating us to see that we're fully prepared.
>
> OBF wiki page: http://www.open-bio.org/wiki/Google_Summer_of_Code
> BioDAS wiki: http://www.biodas.org/wiki/GSoC2010
>
> Rob
>
> ----
> Robert Buels
> (prospective) 2011 OBF GSoC Organization Admin
From rmb32 at cornell.edu Fri Mar 11 11:02:47 2011
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 11 Mar 2011 11:02:47 -0500
Subject: [DAS] update Google Summer of Code project ideas
In-Reply-To: <94994A04-3521-41C6-9BE3-417903453FB9@ebi.ac.uk>
References: <4D7907EA.202@cornell.edu>
<94994A04-3521-41C6-9BE3-417903453FB9@ebi.ac.uk>
Message-ID: <4D7A47A7.2010105@cornell.edu>
On 03/11/2011 05:15 AM, Andy Jenkinson wrote:
> Thanks Robert, I've created a 2011 version (with the same projects!). I think they're both still doable. Is it too late to add more?
No, not too late at all. You can edit it continuously, I just want to
make sure that the pages look presentable by the time Google looks at
them. They're not frozen or anything.
Rob
From suzi at berkeleybop.org Fri Mar 11 14:00:09 2011
From: suzi at berkeleybop.org (Suzanna Lewis)
Date: Fri, 11 Mar 2011 11:00:09 -0800
Subject: [DAS] Locations between bases
In-Reply-To:
References:
Message-ID:
+1
On Mar 7, 2011, at 8:38 AM, David Nix wrote:
> My two cents...
>
> I'm coming late to this thread, so pardon me if I'm missing something, but I'd strongly recommend adopting interbase coordinates for DAS/1 (maybe this is already the case?). This is explicitly specified in the DAS/2 spec. It's pretty critical when it comes to calling SNPs and INDELs.
>
> -cheers, D
>
>
> On 3/7/11 9:10 AM, "Thomas Down" wrote:
>
> On Mon, Mar 7, 2011 at 4:07 PM, Ewan Birney wrote:
>
>>
>> I'd say this is start=end, and a convention that this indicates the point
>> to the left or the right with a glyph. With this convention, for the edge
>> case one either has to allow the 0 base (right-based insertion convention,
>> insertion at the very start) or the end+1 position (left-based insertion
>> convention, insertion at the very end).
>>
>
> Thanks Ewan (+Ed and Andy), that sounds best. I'm going to need a
> non-standard stylesheet glyph to do what I'm after as well, so perhaps it's
> best to attach the "interbaseness" to that, rather than the feature itself.
>
>
>> I know, this triggers doing things on inter-base coordinates, but that's
>> just
>> a big, big change.
>>
>
> Yep, that's not really an option.
>
> Thomas.
From jw12 at sanger.ac.uk Mon Mar 14 13:36:18 2011
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Mon, 14 Mar 2011 17:36:18 +0000
Subject: [DAS] DAS Registry update
Message-ID:
The registry has been updated to include:
-data source response times per capability (ms) - hopefully this
information will be useful to clients?
-adjacent-feature capability support - new extension support to find
next feature in sparsely populated das sources.
-sources queried by coordinate system uri or id (interface also has
links) rather than having to construct a query based on all components
of coordinate systems i.e. authority, type, version, taxonomy.
For more info read the history notes or just ask:
Registry V3.0.6 14 03 2011
* response times for each capability for each source are now
optionally returned in the sources document. Add the parameter
times=true and these are listed, e.g. http://www.dasregistry.org/das/sources?times=true
or for a single source http://www.dasregistry.org/das/DS_409?times=true
* all sources that have a particular coordinate system will be
returned if you query using the coordinateid param with either just the
CS_DS identifier: http://www.dasregistry.org/das/sources?rows=1-10;coordinateid=CS_DS10
or the full coordinate system uri: http://www.dasregistry.org/das/sources?rows=1-10;coordinateid=http://www.dasregistry.org/dasregistry/coordsys/CS_DS10
* added adjacent-feature capability validation
* Capabilities now have correct name with "-" instead of "_"
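
For client authors, a small sketch of building these query URLs. It only assembles strings in the shape of the examples above (parameters joined with ";") and makes no request; the helper name and its arguments are hypothetical, not part of the registry.

```python
REGISTRY = "http://www.dasregistry.org/das"


def sources_url(source_id=None, times=False, rows=None, coordinate_id=None):
    """Build a sources query URL; parameters are joined with ';' as in the examples above."""
    path = f"{REGISTRY}/{source_id}" if source_id else f"{REGISTRY}/sources"
    params = []
    if rows:
        params.append(f"rows={rows}")
    if coordinate_id:
        params.append(f"coordinateid={coordinate_id}")
    if times:
        params.append("times=true")
    return path + ("?" + ";".join(params) if params else "")
```

For example, sources_url(rows="1-10", coordinate_id="CS_DS10") gives the coordinate system query shown in the notes, and sources_url(source_id="DS_409", times=True) gives the single-source timing query.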
Jonathan Warren
Senior Developer and DAS coordinator
blog: http://biodasman.wordpress.com/
jw12 at sanger.ac.uk
Ext: 2314
Telephone: 01223 492314
From jw12 at sanger.ac.uk Thu Mar 17 06:01:08 2011
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Thu, 17 Mar 2011 10:01:08 +0000
Subject: [DAS] change to registry ids
Message-ID: <0541B8A0-5501-4C3C-B4B9-B48CB7B66D44@sanger.ac.uk>
Very soon I'd like to change the data source IDs in the registry that
have a prefix other than DS_, such as ENSEMBL_ or GRAMENE_, to a normal
DS_ id. Over the last 4-6 months new Ensembl and Gramene sources have
been registered with normal DS_ ids anyway, so I'd be surprised if this
change affected anything; it only affects sources registered a long
time ago.
If anyone has a client that relies on these or thinks that this would
break anything please let me know in the next couple of days.
Thanks
Jonathan.
Jonathan Warren
Senior Developer and DAS coordinator
blog: http://biodasman.wordpress.com/
jw12 at sanger.ac.uk
Ext: 2314
Telephone: 01223 492314
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
From David.Nix at hci.utah.edu Thu Mar 17 11:13:43 2011
From: David.Nix at hci.utah.edu (David Nix)
Date: Thu, 17 Mar 2011 09:13:43 -0600
Subject: [DAS] change to registry ids
In-Reply-To: <0541B8A0-5501-4C3C-B4B9-B48CB7B66D44@sanger.ac.uk>
Message-ID:
Any chance of getting the registry to work with DAS/2 servers? Even a stripped down entry would help. -cheers, D
On 3/17/11 4:01 AM, "Jonathan Warren" wrote:
Very soon I'd like to change the data source IDs in the registry that
have a prefix other than DS_, such as ENSEMBL_ or GRAMENE_, to a normal
DS_ id. Over the last 4-6 months new Ensembl and Gramene sources have
been registered with normal DS_ ids anyway, so I'd be surprised if this
change affected anything; it only affects sources registered a long
time ago.
If anyone has a client that relies on these or thinks that this would
break anything please let me know in the next couple of days.
Thanks
Jonathan.
From rmb32 at cornell.edu Fri Mar 18 15:24:57 2011
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 18 Mar 2011 15:24:57 -0400
Subject: [DAS] Google Summer of Code is *ON* for OBF projects!
Message-ID: <4D83B189.7040701@cornell.edu>
Hi all,
Great news: Google announced today that the Open Bioinformatics
Foundation has been accepted as a mentoring organization for this
summer's Google Summer of Code!
GSoC is a Google-sponsored student internship program for open-source
projects, open to students from around the world (not just US
residents). Students are paid a $5000 USD stipend to work as a
developer on an open-source project for the summer. For more on GSoC,
see GSoC 2011 FAQ at http://bit.ly/hpoz8W
Student applications are due April 8, 2011 at 19:00 UTC. Students who
are interested in participating should look at the OBF's GSoC page at
http://open-bio.org/wiki/Google_Summer_of_Code, which lists project
ideas, and whom to contact about applying.
For current developers on OBF projects, please consider volunteering to
be a mentor if you have not already, and contribute project ideas. Just
list your name and project ideas on the OBF wiki and on the relevant
project's GSoC wiki page.
Thanks to all who helped make OBF's application to GSoC a success, and
let's have a great, productive summer of code!
Rob Buels
OBF GSoC 2011 Administrator
From jprocter at compbio.dundee.ac.uk Sat Mar 19 15:48:37 2011
From: jprocter at compbio.dundee.ac.uk (Jim Procter)
Date: Sat, 19 Mar 2011 19:48:37 +0000
Subject: [DAS] Google Summer of Code is *ON* for OBF projects!
In-Reply-To: <4D83B189.7040701@cornell.edu>
References: <4D83B189.7040701@cornell.edu>
Message-ID: <4D850895.1070706@compbio.dundee.ac.uk>
On 18/03/2011 19:24, Robert Buels wrote:
> Hi all,
>
> Great news: Google announced today that the Open Bioinformatics
> Foundation has been accepted as a mentoring organization for this
> summer's Google Summer of Code!
W000t!
You might all be interested to see the other Bio-related orgs who got in:
Genome Informatics
http://gmod.org/wiki/GSoC google/gsoc2011/genomeinformatics
National Evolutionary Synthesis Center (NESCent)
http://phyloinformatics.net/Phyloinformatics_Summer_of_Code_2011
National Resource for Network Biology (GenMAPP, Cytoscape and WikiPathways)
http://nrnb.org/gsoc
Orange: Data Mining Fruitful & Fun
http://orange.biolab.si/trac/wiki/GSoC/Ideas
Theoretical Biophysics @ Humboldt University
http://rumo.biologie.hu-berlin.de/gsoc
It's going to be another busy summer :)
Jim.
--
-------------------------------------------------------------------
J. B. Procter (JALVIEW/ENFIN) Barton Bioinformatics Research Group
Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk
The University of Dundee is a Scottish Registered Charity, No. SC015096.
From gsalazar at cs.uct.ac.za Wed Mar 23 08:08:14 2011
From: gsalazar at cs.uct.ac.za (Gustavo Salazar)
Date: Wed, 23 Mar 2011 12:08:14 +0000
Subject: [DAS] Pagination Proposal
Message-ID: <6EF91EC3-3A7F-4E9B-AD26-043BCC6D04A3@cs.uct.ac.za>
Hello all,
Following the momentum that the DAS workshop left us with, I started tackling one of the many projects that we defined during the 3rd day: pagination for the features command.
I added the proposal for the extension to the wiki:
http://www.biodas.org/wiki/DAS1.6E#Pagination_for_DAS
I have implemented it in MyDas, and it is included in the snapshot version in the repository in case anyone wants to play with it:
http://mydas.googlecode.com/svn/snapshot-repository/uk/ac/ebi/mydas/mydas/1.6.5-SNAPSHOT/
As this is a nightly version, it may change before we do a release, which will follow once it has been properly tested.
Looking forward to your feedback!
Cheers,
Gustavo.
From andy.jenkinson at ebi.ac.uk Wed Mar 23 11:01:47 2011
From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson)
Date: Wed, 23 Mar 2011 15:01:47 +0000
Subject: [DAS] Pagination Proposal
In-Reply-To: <6EF91EC3-3A7F-4E9B-AD26-043BCC6D04A3@cs.uct.ac.za>
References: <6EF91EC3-3A7F-4E9B-AD26-043BCC6D04A3@cs.uct.ac.za>
Message-ID: <79671B06-2235-4D7C-84B1-DD0CD7CA6569@ebi.ac.uk>
Hi Gustavo,
Thanks for this, looks fine to me mostly.
One question: should segments that contain no features (due to the pagination limit) be included in the response?
/das/foo/features?search=a*;rows=1-2
...
Also, I can't remember if we decided at the workshop whether servers would be allowed to overrule the client's requested range and return (for example) a smaller number of rows. This is how entry_points works, which is why it has "start" and "end" attributes in the response in addition to the "total" attribute.
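To make the second question concrete, here is a rough sketch of a server that is allowed to overrule the requested range and reports what it actually returned. It assumes rows=START-END is 1-based and inclusive (as the rows=1-2 example suggests); the function name and the server_max setting are hypothetical and not taken from the proposal.

```python
def resolve_rows(rows_param, total, server_max=None):
    """Return (start, end, total) for a 'START-END' rows request.

    Assumes 1-based inclusive rows; server_max is a hypothetical
    per-server cap on the number of rows returned in one response.
    """
    start_text, end_text = rows_param.split("-", 1)
    start, end = int(start_text), int(end_text)
    if start < 1 or end < start:
        raise ValueError(f"invalid rows range: {rows_param!r}")
    if start > total:
        raise ValueError("range starts beyond the last row")
    # Never report rows past the end of the result set.
    end = min(end, total)
    # The server may return fewer rows than the client asked for.
    if server_max is not None:
        end = min(end, start + server_max - 1)
    return start, end, total


print(resolve_rows("1-100", total=10, server_max=5))
```

Reporting start, end and total back to the client, as entry_points does, is what lets the client notice that its range was shortened.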
On 23 Mar 2011, at 12:08, Gustavo Salazar wrote:
> Hello all,
>
> Following the momentum that the DAS workshop let us I started tackling one of the many projects that we defined during the 3rd day: The pagination for the features command.
> I added the proposal for the extension in the wiki:
> http://www.biodas.org/wiki/DAS1.6E#Pagination_for_DAS
> I have implemented it in MyDas and is included in the snapshot version of the repository in case anyone wants to play with it
> http://mydas.googlecode.com/svn/snapshot-repository/uk/ac/ebi/mydas/mydas/1.6.5-SNAPSHOT/
> As a nightly version it may have changes whenever we do a release after some proper testing of it
>
> Looking for your feedback about it!
>
> Cheers,
>
> Gustavo.
From gsalazar at cs.uct.ac.za Wed Mar 23 11:43:20 2011
From: gsalazar at cs.uct.ac.za (Gustavo Salazar)
Date: Wed, 23 Mar 2011 15:43:20 +0000
Subject: [DAS] Pagination Proposal
In-Reply-To: <79671B06-2235-4D7C-84B1-DD0CD7CA6569@ebi.ac.uk>
References: <6EF91EC3-3A7F-4E9B-AD26-043BCC6D04A3@cs.uct.ac.za>
<79671B06-2235-4D7C-84B1-DD0CD7CA6569@ebi.ac.uk>
Message-ID: <262F92F2-AB74-43B8-88E6-824B87F1B139@cs.uct.ac.za>
Hey Andy,
I don't think we discussed whether empty segments should be included. In my implementation I am not including them; I don't think it is necessary.
I think the agreement was to report the error as an HTTP error, but that was thought of more from the perspective of clients that do not support this capability, to avoid servers responding with paginated segments when the rows parameter was not included.
Cheers,
Gustavo.
On 23 Mar 2011, at 15:01, Andy Jenkinson wrote:
> Hi Gustavo,
>
> Thanks for this, looks fine to me mostly.
>
> One question: should segments that contain no features (due to the pagination limit) be included in the response?
>
> /das/foo/features?search=a*;rows=1-2
>
> ...
>
>
> Also, I can't remember if we decided at the workshop whether servers would be allowed to overrule the client's requested range and return (for example) a smaller number of rows. This is how entry_points works, which is why it has "start" and "end" attributes in the response in addition to the "total" attribute.
>
> On 23 Mar 2011, at 12:08, Gustavo Salazar wrote:
>
>> Hello all,
>>
>> Following the momentum that the DAS workshop let us I started tackling one of the many projects that we defined during the 3rd day: The pagination for the features command.
>> I added the proposal for the extension in the wiki:
>> http://www.biodas.org/wiki/DAS1.6E#Pagination_for_DAS
>> I have implemented it in MyDas and is included in the snapshot version of the repository in case anyone wants to play with it
>> http://mydas.googlecode.com/svn/snapshot-repository/uk/ac/ebi/mydas/mydas/1.6.5-SNAPSHOT/
>> As a nightly version it may have changes whenever we do a release after some proper testing of it
>>
>> Looking for your feedback about it!
>>
>> Cheers,
>>
>> Gustavo.
From andy.jenkinson at ebi.ac.uk Wed Mar 23 12:23:07 2011
From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson)
Date: Wed, 23 Mar 2011 16:23:07 +0000
Subject: [DAS] Pagination Proposal
In-Reply-To: <262F92F2-AB74-43B8-88E6-824B87F1B139@cs.uct.ac.za>
References: <6EF91EC3-3A7F-4E9B-AD26-043BCC6D04A3@cs.uct.ac.za>
<79671B06-2235-4D7C-84B1-DD0CD7CA6569@ebi.ac.uk>
<262F92F2-AB74-43B8-88E6-824B87F1B139@cs.uct.ac.za>
Message-ID:
On 23 Mar 2011, at 15:43, Gustavo Salazar wrote:
> Hey Andy,
> I think we didn't discussed if empty segments should be included. In the implementation I got I am not including those, I think is not necessary.
That's fine, it makes the implementation easier too.
> I think the agreement was to report the error as an HTTP error, but that was though more in the perspective of clients that are not supporting this capability, trying to avoid servers responding paginated segments when the rows attribute was not included.
Absolutely, that's good.
I've updated the wiki with clarifications of these points, and an example response illustrating how the server should choose which rows to return.
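As a rough illustration of the two points agreed in this exchange (rows select features, and segments left empty by the page limits are dropped), something like the following would do. It is a sketch under assumptions: rows are counted 1-based across all segments in document order, and the data layout and function name are invented for the example rather than taken from the wiki or from MyDas.

```python
def paginate_segments(segments, start, end):
    """Select rows start..end (1-based, inclusive) across all segments.

    segments is a list of (segment_id, [features]) pairs in document
    order. Returns (page, total), where page has the same shape but
    omits any segment that has no features inside the requested rows.
    """
    page = []
    seen = 0  # number of features counted so far, across segments
    for segment_id, features in segments:
        kept = []
        for feature in features:
            seen += 1
            if start <= seen <= end:
                kept.append(feature)
        if kept:  # empty segments are not included in the response
            page.append((segment_id, kept))
    return page, seen


segments = [("chr1", ["a1", "a2", "a3"]), ("chr2", ["a4"])]
print(paginate_segments(segments, 1, 2))
```

With rows=1-2 the second segment drops out of the response entirely, which is the behaviour discussed above.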
> Cheers,
> Gustavo.
>
>
> On 23 Mar 2011, at 15:01, Andy Jenkinson wrote:
>
>> Hi Gustavo,
>>
>> Thanks for this, looks fine to me mostly.
>>
>> One question: should segments that contain no features (due to the pagination limit) be included in the response?
>>
>> /das/foo/features?search=a*;rows=1-2
>>
>> ...
>>
>>
>> Also, I can't remember if we decided at the workshop whether servers would be allowed to overrule the client's requested range and return (for example) a smaller number of rows. This is how entry_points works, which is why it has "start" and "end" attributes in the response in addition to the "total" attribute.
>>
>> On 23 Mar 2011, at 12:08, Gustavo Salazar wrote:
>>
>>> Hello all,
>>>
>>> Following the momentum that the DAS workshop let us I started tackling one of the many projects that we defined during the 3rd day: The pagination for the features command.
>>> I added the proposal for the extension in the wiki:
>>> http://www.biodas.org/wiki/DAS1.6E#Pagination_for_DAS
>>> I have implemented it in MyDas and is included in the snapshot version of the repository in case anyone wants to play with it
>>> http://mydas.googlecode.com/svn/snapshot-repository/uk/ac/ebi/mydas/mydas/1.6.5-SNAPSHOT/
>>> As a nightly version it may have changes whenever we do a release after some proper testing of it
>>>
>>> Looking for your feedback about it!
>>>
>>> Cheers,
>>>
>>> Gustavo.
From ljgarcia at ebi.ac.uk Mon Mar 28 05:36:09 2011
From: ljgarcia at ebi.ac.uk (Leyla Garcia)
Date: Mon, 28 Mar 2011 10:36:09 +0100
Subject: [DAS] Pagination Proposal
In-Reply-To:
References: <6EF91EC3-3A7F-4E9B-AD26-043BCC6D04A3@cs.uct.ac.za> <79671B06-2235-4D7C-84B1-DD0CD7CA6569@ebi.ac.uk> <262F92F2-AB74-43B8-88E6-824B87F1B139@cs.uct.ac.za>
Message-ID: <4D905689.6030302@ebi.ac.uk>
On 23/03/2011 16:23, Andy Jenkinson wrote:
> On 23 Mar 2011, at 15:43, Gustavo Salazar wrote:
>
>> Hey Andy,
>> I think we didn't discussed if empty segments should be included. In the implementation I got I am not including those, I think is not necessary.
> That's fine, it makes the implementation easier too.
>
>> I think the agreement was to report the error as an HTTP error, but that was though more in the perspective of clients that are not supporting this capability, trying to avoid servers responding paginated segments when the rows attribute was not included.
Gustavo, do you mean segment= with nothing after the =? For features and
segment those are reported as an ERRORSEGMENT in myDAS. We will keep
it this way, right?
Leyla
> Absolutely, that's good.
>
> I've updated the wiki with clarifications of these points, and an example response illustrating how the server should choose which rows to return.
>
>> Cheers,
>> Gustavo.
>>
>>
>> On 23 Mar 2011, at 15:01, Andy Jenkinson wrote:
>>
>>> Hi Gustavo,
>>>
>>> Thanks for this, looks fine to me mostly.
>>>
>>> One question: should segments that contain no features (due to the pagination limit) be included in the response?
>>>
>>> /das/foo/features?search=a*;rows=1-2
>>>
>>> ...
>>>
>>>
>>> Also, I can't remember if we decided at the workshop whether servers would be allowed to overrule the client's requested range and return (for example) a smaller number of rows. This is how entry_points works, which is why it has "start" and "end" attributes in the response in addition to the "total" attribute.
>>>
>>> On 23 Mar 2011, at 12:08, Gustavo Salazar wrote:
>>>
>>>> Hello all,
>>>>
>>>> Following the momentum that the DAS workshop let us I started tackling one of the many projects that we defined during the 3rd day: The pagination for the features command.
>>>> I added the proposal for the extension in the wiki:
>>>> http://www.biodas.org/wiki/DAS1.6E#Pagination_for_DAS
>>>> I have implemented it in MyDas and is included in the snapshot version of the repository in case anyone wants to play with it
>>>> http://mydas.googlecode.com/svn/snapshot-repository/uk/ac/ebi/mydas/mydas/1.6.5-SNAPSHOT/
>>>> As a nightly version it may have changes whenever we do a release after some proper testing of it
>>>>
>>>> Looking for your feedback about it!
>>>>
>>>> Cheers,
>>>>
>>>> Gustavo.
From gsalazar at cs.uct.ac.za Mon Mar 28 06:26:58 2011
From: gsalazar at cs.uct.ac.za (Gustavo Salazar)
Date: Mon, 28 Mar 2011 11:26:58 +0100
Subject: [DAS] Pagination Proposal
In-Reply-To: <4D905689.6030302@ebi.ac.uk>
References: <6EF91EC3-3A7F-4E9B-AD26-043BCC6D04A3@cs.uct.ac.za> <79671B06-2235-4D7C-84B1-DD0CD7CA6569@ebi.ac.uk> <262F92F2-AB74-43B8-88E6-824B87F1B139@cs.uct.ac.za>
<4D905689.6030302@ebi.ac.uk>
Message-ID: <1A1EB553-60CD-427E-9BEB-6B202C33148A@cs.uct.ac.za>
On 28 Mar 2011, at 10:36, Leyla Garcia wrote:
> On 23/03/2011 16:23, Andy Jenkinson wrote:
>> On 23 Mar 2011, at 15:43, Gustavo Salazar wrote:
>>
>>> Hey Andy,
>>> I think we didn't discussed if empty segments should be included. In the implementation I got I am not including those, I think is not necessary.
>> That's fine, it makes the implementation easier too.
>>
>>> I think the agreement was to report the error as an HTTP error, but that was though more in the perspective of clients that are not supporting this capability, trying to avoid servers responding paginated segments when the rows attribute was not included.
> Gustavo, do you mean segment= and nothing after the =? For features and segment those are reported as an ERRORSEGMENT for myDAS. We will keep it in this way, right?
No, I'm talking about something different here: this is for cases where the response is so big that the server won't be able to respond. We considered having the server force pagination even when it is not requested; however, that might confuse clients that do not support this capability, which may think they are getting all the results. That's why we decided to go with an HTTP error + X-DAS-Status approach.
The "segment=" case was not discussed here, but Andy told me a while ago that he will include a description of this case in a future amendment of the spec, 1.6.1.
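The decision described above could be sketched roughly as follows. Note the assumptions: the thread does not say which HTTP status or X-DAS-Status value is used, so both constants below are placeholders, and the function name and server_limit setting are hypothetical.

```python
ASSUMED_HTTP_STATUS = 413     # placeholder: the thread does not name the HTTP code
ASSUMED_X_DAS_STATUS = "500"  # placeholder: the thread does not name the DAS code


def plan_response(feature_count, rows_requested, server_limit):
    """Return (http_status, headers, paginate) for a features request."""
    if rows_requested:
        # The client understands pagination, so a partial answer is safe.
        return 200, {"X-DAS-Status": "200"}, True
    if feature_count > server_limit:
        # Never paginate silently: a client that does not know the
        # extension would take the partial result for the full one.
        return ASSUMED_HTTP_STATUS, {"X-DAS-Status": ASSUMED_X_DAS_STATUS}, False
    return 200, {"X-DAS-Status": "200"}, False


print(plan_response(feature_count=10, rows_requested=False, server_limit=5))
```

The point of the design is that pagination is only ever applied when the request carries rows; an oversized unpaginated request fails loudly instead.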
> Leyla
>> Absolutely, that's good.
>>
>> I've updated the wiki with clarifications of these points, and an example response illustrating how the server should choose which rows to return.
>>
>>> Cheers,
>>> Gustavo.
>>>
>>>
>>> On 23 Mar 2011, at 15:01, Andy Jenkinson wrote:
>>>
>>>> Hi Gustavo,
>>>>
>>>> Thanks for this, looks fine to me mostly.
>>>>
>>>> One question: should segments that contain no features (due to the pagination limit) be included in the response?
>>>>
>>>> /das/foo/features?search=a*;rows=1-2
>>>>
>>>> ...
>>>>
>>>>
>>>> Also, I can't remember if we decided at the workshop whether servers would be allowed to overrule the client's requested range and return (for example) a smaller number of rows. This is how entry_points works, which is why it has "start" and "end" attributes in the response in addition to the "total" attribute.
>>>>
>>>> On 23 Mar 2011, at 12:08, Gustavo Salazar wrote:
>>>>
>>>>> Hello all,
>>>>>
>>>>> Following the momentum that the DAS workshop let us I started tackling one of the many projects that we defined during the 3rd day: The pagination for the features command.
>>>>> I added the proposal for the extension in the wiki:
>>>>> http://www.biodas.org/wiki/DAS1.6E#Pagination_for_DAS
>>>>> I have implemented it in MyDas and is included in the snapshot version of the repository in case anyone wants to play with it
>>>>> http://mydas.googlecode.com/svn/snapshot-repository/uk/ac/ebi/mydas/mydas/1.6.5-SNAPSHOT/
>>>>> As a nightly version it may have changes whenever we do a release after some proper testing of it
>>>>>
>>>>> Looking for your feedback about it!
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Gustavo.
From jw12 at sanger.ac.uk Thu Mar 31 05:37:49 2011
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Thu, 31 Mar 2011 10:37:49 +0100
Subject: [DAS] Videos from DAS workshop now available
Message-ID: <17BEA02E-A5B5-4553-A600-5FD772A74FC9@sanger.ac.uk>
Hi
Thanks again to all those who presented at this year's DAS workshop
earlier this month.
Videos of the presentations from day 2 are now available here: http://www.scivee.tv/browse
Type das in the search box and they come up. Links will soon be added
to the workshop wiki page.
I'm pleased to say the quality of the video and the audio is quite
reasonable.
Many of the presentations have already had a number of views which is
very encouraging.
Thanks
Jonathan
Jonathan Warren
Senior Developer and DAS coordinator
blog: http://biodasman.wordpress.com/
jw12 at sanger.ac.uk
Ext: 2314
Telephone: 01223 492314
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
From aroberts at broadinstitute.org Tue Mar 1 15:19:13 2011
From: aroberts at broadinstitute.org (Andrew Roberts)
Date: Tue, 1 Mar 2011 10:19:13 -0500
Subject: [DAS] RELAX NG schema for "sequence" command
In-Reply-To: <563D9C80-555A-4866-9E74-F04FF5458E27@sanger.ac.uk>
References: <0F015AE7-09D7-4F4D-B547-353830FD82A2@broadinstitute.org>
<563D9C80-555A-4866-9E74-F04FF5458E27@sanger.ac.uk>
Message-ID: <1FD158A0-4032-48A2-AA86-383F9487BB52@broadinstitute.org>
Alright, now I'm hitting the lack of support for ERRORSEGMENT within
the sequence command response. I think there needs to be a choice
inside the oneOrMore, so that elements can be SEQUENCE or
ERRORSEGMENT. I attached what I'm using as a testing schema in the
meantime, hopefully it's helpful.
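For readers without the attachment, the rule being proposed (one or more children, each either SEQUENCE or ERRORSEGMENT) can be mimicked with a small structural check. This is only a sketch of the idea and not the attached schema: it does not use RELAX NG at all, the DASSEQUENCE root name is taken from the DAS 1.6 sequence response as I understand it, and the function name is made up.

```python
import xml.etree.ElementTree as ET

ALLOWED_CHILDREN = {"SEQUENCE", "ERRORSEGMENT"}  # the proposed choice


def sequence_children_ok(xml_text):
    """True if the root is DASSEQUENCE with one or more allowed children."""
    root = ET.fromstring(xml_text)
    if root.tag != "DASSEQUENCE":  # assumed root element name
        return False
    children = list(root)
    # oneOrMore: at least one child; choice: each child is one of the two.
    return bool(children) and all(c.tag in ALLOWED_CHILDREN for c in children)


response = """<DASSEQUENCE>
  <SEQUENCE id="1" start="1" stop="4">acgt</SEQUENCE>
  <ERRORSEGMENT id="nope" start="1" stop="4"/>
</DASSEQUENCE>"""
print(sequence_children_ok(response))
```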
Andrew
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sequence.rng
Type: application/octet-stream
Size: 1267 bytes
Desc: not available
URL:
-------------- next part --------------
On Mar 1, 2011, at 5:09 AM, Jonathan Warren wrote:
> This is fixed now.
>
> On 28 Feb 2011, at 18:34, Andrew Roberts wrote:
>
>> Hello,
>>
>> I'm working on some validations, and I think I may have found an
>> error in sequence.rng (www.dasregistry.org/validation1.6/sequence.rng)
>> . It appears that this schema makes no provision for the actual
>> text data that belongs to the sequence element (i.e. the bases
>> themselves). Is this a known issue?
>>
>> If not, I think the solution is to add a element inside
>> the sequence element. Line 10 should read,
>>
>>
>> attribute>
>> attribute>
>> attribute>
>> text>
>>
>> Cheers,
>>
>> Andrew
From jw12 at sanger.ac.uk Tue Mar 1 15:54:05 2011
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Tue, 1 Mar 2011 15:54:05 +0000
Subject: [DAS] RELAX NG schema for "sequence" command
In-Reply-To: <1FD158A0-4032-48A2-AA86-383F9487BB52@broadinstitute.org>
References: <0F015AE7-09D7-4F4D-B547-353830FD82A2@broadinstitute.org>
<563D9C80-555A-4866-9E74-F04FF5458E27@sanger.ac.uk>
<1FD158A0-4032-48A2-AA86-383F9487BB52@broadinstitute.org>
Message-ID: <2516E6D2-3584-49AB-8648-14DECF48205D@sanger.ac.uk>
Hi Andrew
I guess this depends on what you are using the RNG for. What are you
using it for? The registry needs the result of a sequence request to
conform to sequence.rng as it currently stands on the registry: when a
sequence is requested, the test region entered in the registry should
return a sequence, and if it returns an ERRORSEGMENT the registry
wants that reported as invalid. If I add the errorsegment to that
document as you suggest, the registry won't pick it up as invalid.
On 1 Mar 2011, at 15:19, Andrew Roberts wrote:
> Alright, now I'm hitting the lack of support for ERRORSEGMENT
> within the sequence command response. I think there needs to be a
> choice inside the oneOrMore, so that elements can be SEQUENCE or
> ERRORSEGMENT. I attached what I'm using as a testing schema in the
> meantime, hopefully it's helpful.
>
> Andrew
>
>
>
> On Mar 1, 2011, at 5:09 AM, Jonathan Warren wrote:
>
>> This is fixed now.
>>
>> On 28 Feb 2011, at 18:34, Andrew Roberts wrote:
>>
>>> Hello,
>>>
>>> I'm working on some validations, and I think I may have found an
>>> error in sequence.rng (www.dasregistry.org/validation1.6/sequence.rng)
>>> . It appears that this schema makes no provision for the actual
>>> text data that belongs to the sequence element (i.e. the bases
>>> themselves). Is this a known issue?
>>>
>>> If not, I think the solution is to add a element inside
>>> the sequence element. Line 10 should read,
>>>
>>>
>>> attribute>
>>> attribute>
>>> attribute>
>>> text>
>>>
>>> Cheers,
>>>
>>> Andrew
Jonathan Warren
Senior Developer and DAS coordinator
blog: http://biodasman.wordpress.com/
jw12 at sanger.ac.uk
Ext: 2314
Telephone: 01223 492314
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
From andy.jenkinson at ebi.ac.uk Tue Mar 1 16:19:12 2011
From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson)
Date: Tue, 1 Mar 2011 16:19:12 +0000
Subject: [DAS] RELAX NG schema for "sequence" command
In-Reply-To: <2516E6D2-3584-49AB-8648-14DECF48205D@sanger.ac.uk>
References: <0F015AE7-09D7-4F4D-B547-353830FD82A2@broadinstitute.org>
<563D9C80-555A-4866-9E74-F04FF5458E27@sanger.ac.uk>
<1FD158A0-4032-48A2-AA86-383F9487BB52@broadinstitute.org>
<2516E6D2-3584-49AB-8648-14DECF48205D@sanger.ac.uk>
Message-ID: <45146209-B676-4307-B020-C87AFFFD0235@ebi.ac.uk>
To put it another way, the ERRORSEGMENT element is only valid if the server supports the error-segment capability. I guess the issue is that, up to now, the only use of the RNG schemas for validation has been by the registry.
Personally, from a technical point of view, I think including the errorsegment element in the RNG schemas (this will apply to many commands, not just sequence) is the "correct" interpretation. However, that means the registry would need to apply its errorsegment test separately rather than relying on the RNG to flag it as invalid. It already does the reverse of this test separately (i.e. if I ask for a bad segment, do you give me an errorsegment element?), so this makes sense to me.
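A rough sketch of what that separation might look like on the registry side: a permissive structural check first, then the registry's own rule that the test region must not come back as an ERRORSEGMENT. Everything here is illustrative; the function name, the declares_error_segment flag and the problem messages are assumptions, and the structural step merely stands in for real RNG validation.

```python
import xml.etree.ElementTree as ET


def registry_verdict(xml_text, declares_error_segment):
    """Return a list of problems with the response for the test region."""
    root = ET.fromstring(xml_text)
    tags = [child.tag for child in root]
    problems = []
    # Step 1: stand-in for the schema, with ERRORSEGMENT allowed.
    if not tags or any(t not in ("SEQUENCE", "ERRORSEGMENT") for t in tags):
        problems.append("response does not match the schema")
    # Step 2: separate registry test, since the schema no longer
    # rejects ERRORSEGMENT on its own.
    if "ERRORSEGMENT" in tags:
        problems.append("test region returned ERRORSEGMENT instead of a sequence")
        if not declares_error_segment:
            problems.append("ERRORSEGMENT used without the error-segment capability")
    return problems


bad = '<DASSEQUENCE><ERRORSEGMENT id="1" start="1" stop="4"/></DASSEQUENCE>'
print(registry_verdict(bad, declares_error_segment=True))
```

This keeps the schema usable by people validating arbitrary server responses while the registry still fails a source whose test region does not return data.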
Cheers,
Andy
On 1 Mar 2011, at 15:54, Jonathan Warren wrote:
> Hi Andrew
>
> I guess this depends on what you are using the RNG for. What are you using it for? The registry needs the result from a sequence request to conform to the sequence.rng as it is on the registry as if a sequence is requested the test region put into the registry should return a sequence, if it returns an ERRORSEGMENT the the registry wants that reported as invalid. If I add the errorsegment to that document as you suggest the registry won't pick it up as invalid.
>
> On 1 Mar 2011, at 15:19, Andrew Roberts wrote:
>
>> Alright, now I'm hitting the lack of support for ERRORSEGMENT within the sequence command response. I think there needs to be a choice inside the oneOrMore, so that elements can be SEQUENCE or ERRORSEGMENT. I attached what I'm using as a testing schema in the meantime, hopefully it's helpful.
>>
>> Andrew
>>
>>
>>
>> On Mar 1, 2011, at 5:09 AM, Jonathan Warren wrote:
>>
>>> This is fixed now.
>>>
>>> On 28 Feb 2011, at 18:34, Andrew Roberts wrote:
>>>
>>>> Hello,
>>>>
>>>> I'm working on some validations, and I think I may have found an error in sequence.rng (www.dasregistry.org/validation1.6/sequence.rng). It appears that this schema makes no provision for the actual text data that belongs to the sequence element (i.e. the bases themselves). Is this a known issue?
>>>>
>>>> If not, I think the solution is to add a element inside the sequence element. Line 10 should read,
>>>>
>>>>
>>>>
>>>> Cheers,
>>>>
>>>> Andrew
From aroberts at broadinstitute.org Tue Mar 1 16:20:38 2011
From: aroberts at broadinstitute.org (Andrew Roberts)
Date: Tue, 1 Mar 2011 11:20:38 -0500
Subject: [DAS] RELAX NG schema for "sequence" command
In-Reply-To: <2516E6D2-3584-49AB-8648-14DECF48205D@sanger.ac.uk>
References: <0F015AE7-09D7-4F4D-B547-353830FD82A2@broadinstitute.org>
<563D9C80-555A-4866-9E74-F04FF5458E27@sanger.ac.uk>
<1FD158A0-4032-48A2-AA86-383F9487BB52@broadinstitute.org>
<2516E6D2-3584-49AB-8648-14DECF48205D@sanger.ac.uk>
Message-ID: <3E08E18F-CE0A-4B3F-A3FA-7A0EF5269E95@broadinstitute.org>
Ah, interesting. I'm exercising my implementations of the DAS commands, and then validating the XML against the RNG as a correctness test. I had been assuming that the RNG was a description of all possible server responses, including error cases, but I now see what you guys are using them for.
Thanks!
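A minimal sketch of the distinction discussed here, for anyone testing their own server. It assumes a DAS 1.6 style sequence response whose root element is DASSEQUENCE; the root element name and the function name are assumptions for illustration and are not taken from sequence.rng. The idea is to tell a well-formed error response apart from a genuine sequence response, so a test harness can accept both while the registry treats only the latter as valid:

```python
import xml.etree.ElementTree as ET


def classify_sequence_response(xml_text):
    """Return 'sequence', 'error' or 'invalid' for a sequence response.

    'sequence' -> only SEQUENCE children (what the registry expects)
    'error'    -> at least one ERRORSEGMENT child (well-formed, but the
                  registry would want this reported as invalid)
    'invalid'  -> anything else
    """
    root = ET.fromstring(xml_text)
    if root.tag != "DASSEQUENCE":  # assumed root element name
        return "invalid"
    tags = {child.tag for child in root}
    if not tags:
        return "invalid"
    if tags == {"SEQUENCE"}:
        return "sequence"
    if tags <= {"SEQUENCE", "ERRORSEGMENT"}:
        return "error"
    return "invalid"
```

A stricter schema (SEQUENCE only) and a looser one (SEQUENCE or ERRORSEGMENT) then correspond to accepting only 'sequence', or accepting both 'sequence' and 'error'.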
On Mar 1, 2011, at 10:54 AM, Jonathan Warren wrote:
> Hi Andrew
>
> I guess this depends on what you are using the RNG for. What are you using it for? The registry needs the result of a sequence request to conform to sequence.rng as it stands on the registry: when a sequence is requested, the test region entered into the registry should return a sequence, and if it returns an ERRORSEGMENT the registry wants that reported as invalid. If I add ERRORSEGMENT to that document as you suggest, the registry won't pick it up as invalid.
From jw12 at sanger.ac.uk Tue Mar 1 16:33:51 2011
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Tue, 1 Mar 2011 16:33:51 +0000
Subject: [DAS] RELAX NG schema for "sequence" command
In-Reply-To: <3E08E18F-CE0A-4B3F-A3FA-7A0EF5269E95@broadinstitute.org>
References: <0F015AE7-09D7-4F4D-B547-353830FD82A2@broadinstitute.org>
<563D9C80-555A-4866-9E74-F04FF5458E27@sanger.ac.uk>
<1FD158A0-4032-48A2-AA86-383F9487BB52@broadinstitute.org>
<2516E6D2-3584-49AB-8648-14DECF48205D@sanger.ac.uk>
<3E08E18F-CE0A-4B3F-A3FA-7A0EF5269E95@broadinstitute.org>
Message-ID: <2CA04BEC-F6A4-4AC4-B30C-06F0D647F235@sanger.ac.uk>
If your data source is publicly available you can test it by making a
request like this:
http://www.dasregistry.org/validate?url=http://das.sanger.ac.uk/das/otter_das/&testcode=22:19173435,19372173
which returns properties for the status of each command. Or you can
use the validation page on the registry interface to do the same thing.
Some people here are working on some XSD documents in order to use
JAXB, which will probably be of use to you as they should be structured
more as you suggest.
We may need to make it clearer in the documentation how we use these
RNG documents.
Thanks for the feedback!
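As a rough sketch, a validation request of the form shown above could be assembled like this. The parameter names url and testcode come from the example URL; the function name is hypothetical, and nothing here is checked against the registry itself:

```python
from urllib.parse import urlencode

REGISTRY_VALIDATE = "http://www.dasregistry.org/validate"


def validation_url(source_url, testcode):
    """Build a registry validation URL for a DAS source and a test region.

    ':', '/' and ',' are left unescaped so the result matches the
    hand-written form of the request.
    """
    query = urlencode({"url": source_url, "testcode": testcode}, safe=":/,")
    return f"{REGISTRY_VALIDATE}?{query}"


print(validation_url("http://das.sanger.ac.uk/das/otter_das/",
                     "22:19173435,19372173"))
```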
On 1 Mar 2011, at 16:20, Andrew Roberts wrote:
> Ah, interesting. I'm exercising my implementations of the das
> commands, and then validating the XML against the rng as a
> correctness test. I had been assuming that the rng was a description
> of all possible server responses, including error cases, but I now
> see what you guys are using them for.
>
> Thanks!
>
>
>
> On Mar 1, 2011, at 10:54 AM, Jonathan Warren
> wrote:
>
>> Hi Andrew
>>
>> I guess this depends on what you are using the RNG for. What are
>> you using it for? The registry needs the result of a sequence
>> request to conform to sequence.rng as it stands on the registry:
>> when a sequence is requested, the test region entered into the
>> registry should return a sequence, and if it returns an
>> ERRORSEGMENT the registry wants that reported as invalid. If I add
>> ERRORSEGMENT to that document as you suggest, the registry won't
>> pick it up as invalid.
Jonathan Warren
Senior Developer and DAS coordinator
blog: http://biodasman.wordpress.com/
jw12 at sanger.ac.uk
Ext: 2314
Telephone: 01223 492314
From thomas.a.down at gmail.com Sat Mar 5 15:01:42 2011
From: thomas.a.down at gmail.com (Thomas Down)
Date: Sat, 5 Mar 2011 15:01:42 +0000
Subject: [DAS] Adjacent feature extension
Message-ID:
Following on from a discussion at the DAS Workshop:
DAS doesn't offer any specific support for clients that provide
mechanisms for skipping from the current position to the "next" or
"previous" feature. I'd like to propose a small extension (one extra filter
option on the existing "features" command) to facilitate this. Full details
are here:
https://github.com/dasmoth/dalliance/wiki/AdjacentFeatures
...but briefly, a request like:
/das/features?adjacent=chr21:30000000
...would be expected to return a standard DASGFF document containing the two
features either side of the specified point.
All comments are welcome! One open question is whether the query should
*just* return the adjacent features, or should also return other features
overlapping the adjacent feature. My preference is for the former, but the
latter does have the merit of being quirk-wards compatible with the existing
feature_id filter. Does anyone else have strong feelings one way or
another?
There isn't currently a full implementation of this, but if nobody comes up
with major objections, I'm hoping to try implementations in Dazzle (server)
and Dalliance (client) within the next couple of weeks.
Thomas.
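A minimal client-side sketch of the request described above, under stated assumptions: the adjacent parameter and its segment:position form come from the example request, while the function names, the example source URL, and the DASGFF element layout used for parsing (SEGMENT elements containing FEATURE elements with START and END children) are illustrative assumptions rather than part of the proposal:

```python
import xml.etree.ElementTree as ET


def adjacent_features_url(source_url, segment, position):
    """Build a features request using the proposed 'adjacent' filter."""
    return f"{source_url.rstrip('/')}/features?adjacent={segment}:{position}"


def parse_features(dasgff_xml):
    """Extract (segment id, feature id, start, end) from a DASGFF document."""
    root = ET.fromstring(dasgff_xml)
    features = []
    for segment in root.iter("SEGMENT"):
        for feature in segment.iter("FEATURE"):
            features.append((
                segment.get("id"),
                feature.get("id"),
                int(feature.findtext("START")),
                int(feature.findtext("END")),
            ))
    return features


# Hypothetical source URL, for illustration only.
print(adjacent_features_url("http://example.org/das/mysource", "chr21", 30000000))
```

Reading the segment id from each SEGMENT, rather than assuming it matches the query, leaves room for a server that answers with features on a different reference sequence.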
From andy.jenkinson at ebi.ac.uk Mon Mar 7 10:04:08 2011
From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson)
Date: Mon, 7 Mar 2011 10:04:08 +0000
Subject: [DAS] Adjacent feature extension
In-Reply-To:
References:
Message-ID: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>
Hi Thomas,
Thanks for this. Regarding the option of whether to return just one feature per side or all overlapping features, the only other advantage that immediately springs to mind for the latter (in addition to some measure of consistency, as you mention) is that it allows the client to immediately render the exact region of that feature without triggering another request. It would generally mean changing zoom level. I can't say whether clients are likely to follow this mechanism as opposed to, say, pan and centre on the feature, but if they wanted to it would be more efficient (and possibly a little bit more efficient anyway depending on how your client does its requests).
Disadvantages I can think of:
- "adjacent" request takes marginally longer
- not quite as obvious what clients should put in their UI controls - need to pick a feature to be able to do "jump to BRCA1"
- risk of servers not implementing it correctly and only returning one feature anyway (although I don't think this is likely as the concept is different to "feature-by-id")
Some things to further define:
- servers can't return a fake feature
- should servers return features on different reference sequences if there are none on the current one?
- how should servers treat features that overlap the adjacent range? Treat them as the adjacent feature to return, or only include features completely outside the query range? What if the next feature completely outside the query range is part of the same feature hierarchy (e.g. an exon outside the current window)?
Any thoughts from anyone on these?
Cheers,
Andy
From thomas.a.down at gmail.com Mon Mar 7 10:35:02 2011
From: thomas.a.down at gmail.com (Thomas Down)
Date: Mon, 7 Mar 2011 10:35:02 +0000
Subject: [DAS] Adjacent feature extension
In-Reply-To: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>
References:
<4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>
Message-ID:
On Mon, Mar 7, 2011 at 10:04 AM, Andy Jenkinson wrote:
> Hi Thomas,
>
> Thanks for this. Regarding the option of whether to return just one feature
> per side or all overlapping features, the only other advantage that
> immediately springs to mind for the latter (in addition to some measure of
> consistency, as you mention) is that it allows the client to immediately
> render the exact region of that feature without triggering another request.
> It would generally mean changing zoom level. I can't say if clients are
> likely to follow this mechanism as opposed to, say, pan and centre on the
> feature, but if they wanted to it would be more efficient (and possibly a
> little bit more efficient anyway depending on how your client does its
> requests).
>
Yep, I agree. I'd be interested to learn whether there are any clients that
would seriously consider taking advantage of this. My own thinking is that
even if we do adjust zoom level (as Dalliance sometimes does, e.g. in the
"jump to gene..." navigation op), clients are much more likely to zoom to a
view that contains the target feature plus a "sensible" amount of flanking
sequence, rather than a view where the target feature is perfectly framed.
Furthermore, this rather seems like optimizing for the case where only one
annotation source is active. Surely we're talking about the *distributed*
annotation system, and clients will still have to go off and query all the
other annotation sources, even if they are able to skip the one which
responded to the "adjacent" query. So long as there's some kind of query
parallelization in place, this probably isn't a performance issue.
Do any other client developers feel differently?
> Disadvantages I can think of:
> - "adjacent" request takes marginally longer
> - not quite as obvious what clients should put in their UI controls - need
> to pick a feature to be able to do "jump to BRCA1"
> - risk of servers not implementing it correctly and only returning one
> feature anyway (although I don't think this is likely as the concept is
> different to "feature-by-id")
>
> Some things to further define:
> - servers can't return a fake feature
>
Yep, will clarify this.
> - should servers return features on different reference sequences if there
> are none on the current one?
>
In my opinion, absolutely yes. Otherwise the "10 features in the genome"
case remains a massive pain (and potentially a disaster for
inhomogeneously distributed data; won't someone think of the MHC tiling arrays?
:-). And even worse for the "10 features in UniProt" case (where I can also
see this feature being quite interesting).
I've tried to be explicit about this in my proposal (see the penultimate
paragraph + example 3), but any suggestions for further clarifications are
welcome.
> - how should servers treat features that overlap the adjacent range? Treat
> them as the adjacent feature to return, or only include features completely
> outside the query range? What if the next feature completely outside the
> query range is part of the same feature hierarchy (e.g. an exon outside the
> current window).
>
It's a point rather than a range, but yes I agree this is still an open
question. I'd actually written the spec such that overlapping features do
get returned (on the assumption that clients will do "trivial" cases of
next/previous feature in-memory without a network round trip), but again if
other client developers do things differently, I'd like to know.
I think "include overlapping" will have fewer special cases to worry about,
though, e.g. the PART/PARENT issue you allude to. Let clients deal with
that ("dumb servers, smart clients").
Thomas.
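To make the "include overlapping" reading concrete, here is a small sketch of how the features around a query point might be selected. The tuple layout, the function name, and the exact semantics (nearest feature ending before the point, nearest feature starting after it, plus everything overlapping the point) are assumptions for illustration, not the wording of the AdjacentFeatures page:

```python
def adjacent_features(features, point):
    """Split features on one reference sequence around a query point.

    features: iterable of (start, end, feature_id) with inclusive coordinates.
    Returns (previous, overlapping, following), where previous and following
    are single features or None, and overlapping is a list.
    """
    previous = None
    following = None
    overlapping = []
    for feature in features:
        start, end, _ = feature
        if start <= point <= end:
            overlapping.append(feature)
        elif end < point:
            # Keep the feature whose end is closest to the point.
            if previous is None or end > previous[1]:
                previous = feature
        else:
            # start > point: keep the feature whose start is closest.
            if following is None or start < following[0]:
                following = feature
    return previous, overlapping, following
```

A server taking the "just the adjacent features" reading would return only previous and following; one taking the "include overlapping" reading would return all three parts and leave it to the client to decide what counts as next or previous.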
From ljgarcia at ebi.ac.uk Mon Mar 7 10:49:09 2011
From: ljgarcia at ebi.ac.uk (Leyla Garcia)
Date: Mon, 07 Mar 2011 10:49:09 +0000
Subject: [DAS] Adjacent feature extension
In-Reply-To:
References: <4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>
Message-ID: <4D74B825.6080509@ebi.ac.uk>
On 07/03/2011 10:35, Thomas Down wrote:
> On Mon, Mar 7, 2011 at 10:04 AM, Andy Jenkinson wrote:
>
>> Hi Thomas,
>>
>> Thanks for this. Regarding the option of whether to return just one feature
>> per side or all overlapping features, the only other advantage that
>> immediately springs to mind for the latter (in addition to some measure of
>> consistency, as you mention) is that it allows the client to immediately
>> render the exact region of that feature without triggering another request.
>> It would generally mean changing zoom level. I can't say if clients are
>> likely to follow this mechanism as opposed to, say, pan and centre on the
>> feature, but if they wanted to it would be more efficient (and possibly a
>> little bit more efficient anyway depending on how your client does its
>> requests).
>>
> Yep, I agree. I'd be interested to learn whether there are any clients that
> would seriously consider taking advantage of this. My own thinking is that
> even if we do adjust zoom level (as Dalliance sometimes does, e.g. in the
> "jump to gene..." navigation op), clients are much more likely to zoom to a
> view that contains the target feature plus a "sensible" amount of flanking
> sequence, rather than a view where the target feature is perfectly framed.
>
> Furthermore, this rather seems like optimizing for the case where only one
> annotation source is active. Surely we're talking about the *distributed*
> annotation system, and clients will still have to go off and query all the
> other annotation sources, even if they are able to skip the one which
> responded to the "adjacent" query. So long as there's some kind of query
> parallelization in place, this probably isn't a performance issue.
>
> Do any other client developers feel differently?
So we can include or exclude the overlapping features. What about adding
another filter? "adjacent" would retrieve only those around the feature
requested and "overlapping" would return the overlaps. Using both would
be interpreted as OR. Does it make sense?
Would that be interesting for people using this type of filter?
>> Disadvantages I can think of:
>> - "adjacent" request takes marginally longer
>> - not quite as obvious what clients should put in their UI controls - need
>> to pick a feature to be able to do "jump to BRCA1"
>> - risk of servers not implementing it correctly and only returning one
>> feature anyway (although I don't think this is likely as the concept is
>> different to "feature-by-id")
>>
>> Some things to further define:
>> - servers can't return a fake feature
> Yep, will clarify this.
>
>> - should servers return features on different reference sequences if there
>> are none on the current one?
> In my opinion, absolutely yes. Otherwise the "10 features in the genome"
> case remains a massive pain (and potentially a disaster for
> inhomogeneously distributed data; won't someone think of the MHC tiling arrays?
> :-). And even worse for the "10 features in UniProt" case (where I can also
> see this feature being quite interesting).
Mmm, I do not understand the "10 features in UniProt" case or the "10 features
in the genome" case. Could you please give some more information about them?
> I've tried to be explicit about this in my proposal (see the penultimate
> paragraph + example 3), but any suggestions for further clarifications are
> welcome.
>
>
>> - how should servers treat features that overlap the adjacent range? Treat
>> them as the adjacent feature to return, or only include features completely
>> outside the query range? What if the next feature completely outside the
>> query range is part of the same feature hierarchy (e.g. an exon outside the
>> current window).
>>
> It's a point rather than a range, but yes I agree this is still an open
> question. I'd actually written the spec such that overlapping features do
> get returned (on the assumption that clients will do "trivial" cases of
> next/previous feature in-memory without a network round trip), but again if
> other client developers do things differently, I'd like to know.
>
> I think "include overlapping" will have fewer special cases to worry about,
> though, e.g. the PART/PARENT issue you allude to. Let clients deal with
> that ("dumb servers, smart clients").
>
> Thomas.
From jw12 at sanger.ac.uk Mon Mar 7 10:57:48 2011
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Mon, 7 Mar 2011 10:57:48 +0000
Subject: [DAS] Adjacent feature extension
In-Reply-To:
References:
<4CB1186B-D5F0-4750-A297-88538BB196B5@ebi.ac.uk>
Message-ID:
On 7 Mar 2011, at 10:35, Thomas Down wrote:
> On Mon, Mar 7, 2011 at 10:04 AM, Andy Jenkinson wrote:
>
>> Hi Thomas,
>>
>> Thanks for this. Regarding the option of whether to return just one feature
>> per side or all overlapping features, the only other advantage that
>> immediately springs to mind for the latter (in addition to some measure of
>> consistency, as you mention) is that it allows the client to immediately
>> render the exact region of that feature without triggering another request.
>> It would generally mean changing zoom level. I can't say if clients are
>> likely to follow this mechanism as opposed to, say, pan and centre on the
>> feature, but if they wanted to it would be more efficient (and possibly a
>> little bit more efficient anyway depending on how your client does its
>> requests).
>>
>
> Yep, I agree. I'd be interested to learn whether there are any clients that
> would seriously consider taking advantage of this. My own thinking is that
> even if we do adjust zoom level (as Dalliance sometimes does, e.g. in the
> "jump to gene..." navigation op), clients are much more likely to zoom to a
> view that contains the target feature plus a "sensible" amount of flanking
> sequence, rather than a view where the target feature is perfectly framed.
>
> Furthermore, this rather seems like optimizing for the case where only one
> annotation source is active. Surely we're talking about the *distributed*
> annotation system, and clients will still have to go off and query all the
> other annotation sources, even if they are able to skip the one which
> responded to the "adjacent" query. So long as there's some kind of query
> parallelization in place, this probably isn't a performance issue.
My vote would ideally be to change feature_by_id to return one feature
and have adjacent_feature return one feature as well. In my opinion
this would mean these capabilities on servers do "exactly as they
say on the tin", would be easier for data providers to implement,
and are thus more likely to be implemented.
If the feature_id capability as it stands is needed it could be
changed to something more akin to what it means, like feature_id_region,
but I would bet no one would bother to change it or use it.
However, the reality is that we are too late to change the old
feature_by_id, but I don't think we need to make the same mistake
twice by repeating it for adjacent_features.
Jonathan Warren
Senior Developer and DAS coordinator
blog: http://biodasman.wordpress.com/
jw12 at sanger.ac.uk
Ext: 2314
Telephone: 01223 492314
From andy.jenkinson at ebi.ac.uk Mon Mar 7 11:19:22 2011
From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson)
Date: Mon, 7 Mar 2011 11:19:22 +0000
Subject: [DAS] Adjacent feature extension
In-Reply-To: