From rafael at ebi.ac.uk Thu Jul 1 06:19:16 2010
From: rafael at ebi.ac.uk (Rafael Jimenez)
Date: Thu, 1 Jul 2010 11:19:16 +0100
Subject: [DAS] PDB-Uniprot alignment sources
Message-ID:

Hi everybody,

For those of you interested in using a PDB-Uniprot alignment/mapping DAS source:

As far as I remember, the registry has referenced three PDB-Uniprot alignment sources in total. "biojavapdbuniprot" stopped working around October 2009 and "alig_pdb_sp" stopped working in June 2010. Andreas set up a new PDB-Uniprot alignment source in November 2009 named "pdb_uniprot_sifts_mapping", which is a replacement for the previous ones.

You can find more details about this mapping server in the DAS registry:
http://www.dasregistry.org/showdetails.jsp?auto_id=DS_861

Thanks to Andreas for maintaining this useful server.

Best regards,

Rafael.

From jw12 at sanger.ac.uk Tue Jul 20 11:59:36 2010
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Tue, 20 Jul 2010 16:59:36 +0100
Subject: [DAS] Registry and Sanger new features/developments
Message-ID:

The registry has some new features:

Cross Origin Resource Sharing (CORS) headers are now tested by the registry:
The http://www.dasregistry.org/das/sources response has or for data sources that have the correct header ("Access-Control-Allow-Origin:*"). These headers should make it easier for JavaScript-based DAS clients to make requests from DAS sources using Ajax and alleviate the need for proxy use for cross client requests. The registry web service responses also have CORS headers.

16 of the many UCSC sources are now available in the registry (created via a sources document at www.dasregistry.org/ucsc/das/sources). Note that currently clients will want to filter these sources using types filters, as the UCSC sources work slightly differently to most of the sources available in the registry. Each UCSC source effectively contains many tracks (we are currently working with the UCSC to change this).
However, if the DAS specification is adhered to, both clients and servers should support types filtering anyway.

New web service requests available for clients to interact with the registry:

* added rows capability for the sources cmd
  http://www.dasregistry.org/das/sources?rows=1-2
  returns the first two sources, with a number for the total sources available and the start and end rows/numbers requested
* extra command for the registry, dasregistry/das/knownCapabilities, shows what capabilities are known to the registry
* added keywords parameter to sources cmds, e.g.
  http://www.dasregistry.org/das/sources?keywords=zebrafish
  will filter by sources that contain the keyword zebrafish
* added keywords parameter to coordinatesystem command
  http://www.dasregistry.org/das/coordinatesystem?keywords=gorilla
* added total, start, end attributes to the coordinatesystem response if rows is specified
  http://www.dasregistry.org/das/coordinatesystem?keywords=gene&rows=1-10

A sources document exists for the Sanger DAS sources here:
http://das.sanger.ac.uk/das/sources
It contains all the recent sources that have valid coordinate systems and should make it easier for clients to interact with the many Sanger DAS sources.

For any more information please don't hesitate to contact me or read my blog at http://biodasman.wordpress.com

Thanks

Jonathan.

Jonathan Warren
Senior Developer and DAS coordinator
blog: http://biodasman.wordpress.com/
jonathan.warren at sanger.ac.uk

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

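The registry requests listed in the announcement above can be sketched as a small URL builder. This is only an illustration in Python: the function names `registry_url` and `allows_any_origin` are hypothetical, the base URL is taken from the example links in the message, and it is an assumption that `knownCapabilities` lives under the same base. No request is actually sent.

```python
from urllib.parse import urlencode

# Base URL as it appears in the example links of the announcement.
REGISTRY_BASE = "http://www.dasregistry.org/das"


def registry_url(command, keywords=None, rows=None):
    """Build a registry request URL (hypothetical helper).

    command  -- e.g. "sources", "coordinatesystem", "knownCapabilities"
    keywords -- optional keyword filter, e.g. "zebrafish"
    rows     -- optional inclusive (start, end) pair, rendered as "start-end"
    """
    params = {}
    if keywords:
        params["keywords"] = keywords
    if rows:
        start, end = rows
        params["rows"] = f"{start}-{end}"
    query = urlencode(params)
    return f"{REGISTRY_BASE}/{command}" + (f"?{query}" if query else "")


def allows_any_origin(headers):
    """Return True if the headers include 'Access-Control-Allow-Origin: *'.

    headers is a plain dict; header names are compared case-insensitively.
    """
    for name, value in headers.items():
        if name.lower() == "access-control-allow-origin":
            return value.strip() == "*"
    return False


if __name__ == "__main__":
    print(registry_url("sources", rows=(1, 2)))
    print(registry_url("sources", keywords="zebrafish"))
    print(registry_url("coordinatesystem", keywords="gene", rows=(1, 10)))
```

A JavaScript client would apply the same header check to decide whether a source can be queried directly or still needs a proxy.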
From andy.jenkinson at ebi.ac.uk Fri Jul 30 11:00:44 2010
From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson)
Date: Fri, 30 Jul 2010 16:00:44 +0100
Subject: [DAS] querying for nonpositional annotations
Message-ID:

Hi list,

I'd like to canvass for opinion on what is "expected" behaviour concerning how servers respond to features queries in some circumstances. So, bear with me...

Since protein DAS came along, the format of the segment parameter no longer requires the start and end position. Personally, my take on this has always been that this means that these two queries are effectively identical:
/das/source/features?segment=P99999
/das/source/features?segment=P99999:1,150 [i.e. the full length of the protein]
This interpretation is reflected in the 1.6 spec, with the advantage that it is quite easy to explain the behaviour (other parts of the spec also use an "if omitted, the default is XYZ" paradigm). If you follow this interpretation, nonpositional features (those without start/end positions) will always be returned no matter what the start and end are, or indeed whether they are included (since behaviour in both cases is identical). For me this is fine because nonpositional features are annotations that apply to the whole sequence/object, which includes all parts of it.

Leyla came to a different view from me, reasoning that only the former of these queries should return nonpositional features because they are really outside the range of the second request. She has suggested that queries like these could be possible ways to further control whether nonpositional and/or positional features are returned:
/das/source/features?segment=P99999:0,0 [only nonpositional features]
/das/source/features?segment=P99999:0,150 [all positional and nonpositional features]

I guess this would be additional functionality, as I do not know of any clients or servers that use zero as a meaningful query range, but I recognise the logic so would like to know: what do others think? How do your existing servers function?

Sorry for the long email (as usual!)

Cheers,
Andy

From ek at ebi.ac.uk Fri Jul 30 11:17:46 2010
From: ek at ebi.ac.uk (Eugene Kulesha)
Date: Fri, 30 Jul 2010 16:17:46 +0100
Subject: [DAS] querying for nonpositional annotations
In-Reply-To:
References:
Message-ID: <4C52ED1A.9050900@ebi.ac.uk>

IMHO zero should not be used as a way to filter out positional/non-positional features.

'type' can be used to filter out what is not needed, i.e.

/das/source/features?segment=P99999
returns all features

/das/source/features?segment=P99999:1,150
returns all features within the 1..150 region (i.e. the positional features)

/das/source/features?segment=P99999;type=annotation
returns all features that have type annotation (or non-positional or whatever)

cheers

E

On 30/07/2010 16:00, Andy Jenkinson wrote:
> Hi list,
>
> I'd like to canvass for opinion on what is "expected" behaviour concerning how servers respond to features queries in some circumstances. So, bear with me...
>
> Since protein DAS came along, the format of the segment parameter no longer requires the start and end position. Personally, my take on this has always been that this means that these two queries are effectively identical:
> /das/source/features?segment=P99999
> /das/source/features?segment=P99999:1,150 [i.e. the full length of the protein]
> This interpretation is reflected in the 1.6 spec, with the advantage that it is quite easy to explain the behaviour (other parts of the spec also use an "if omitted, the default is XYZ" paradigm). If you follow this interpretation, nonpositional features (those without start/end positions) will always be returned no matter what the start and end are, or indeed whether they are included (since behaviour in both cases is identical). For me this is fine because nonpositional features are annotations that apply to the whole sequence/object, which includes all parts of it.
>
> Leyla came to a different view from me, reasoning that only the former of these queries should return nonpositional features because they are really outside the range of the second request. She has suggested that queries like these could be possible ways to further control whether nonpositional and/or positional features are returned:
> /das/source/features?segment=P99999:0,0 [only nonpositional features]
> /das/source/features?segment=P99999:0,150 [all positional and nonpositional features]
>
> I guess this would be additional functionality, as I do not know of any clients or servers that use zero as a meaningful query range, but I recognise the logic so would like to know: what do others think? How do your existing servers function?
>
> Sorry for the long email (as usual!)
>
> Cheers,
> Andy
> _______________________________________________
> DAS mailing list
> DAS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/das
>

From andy.jenkinson at ebi.ac.uk Fri Jul 30 11:57:10 2010
From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson)
Date: Fri, 30 Jul 2010 16:57:10 +0100
Subject: [DAS] querying for nonpositional annotations
In-Reply-To:
References:
Message-ID: <9FD87196-C80D-4C59-BE54-374A6CD71D68@ebi.ac.uk>

On 30 Jul 2010, at 16:35, Thomas Down wrote:

> I'm going to go with Eugene, at least to the extent that treating '0' as a magic index is ugly (and would seem to explicitly exclude the possibility of ever applying DAS to coordinate systems which aren't 1..n).

A fair point.

> Some of this comes down to whether positional and non-positional features are likely to co-exist on a single DSN. My initial expectation is that they don't -- but I'm not all that familiar with the conventions of protein DAS.
>
> If a given DSN is always either positional or non-positional, this seems like a non-issue to me.
>
> If they do coexist then:
>
> a) I'd be quite interested to know more about the use-cases for this so I can take them into account in my code.

For some coordinate systems/servers it is either/or, but for others (specifically UniProt) the same source can provide both. For example, the location of SNPs (positional) and a publication citation (nonpositional).

> b) I think there should be some way to distinguish between positional and non-positional (in the TYPES response?) so Eugene's suggestion of filtering on types is workable.
>
> I'm agnostic about whether there should be a difference between P99999 and P99999:1,150... but if it helps, it would never occur to me to specify the start and end when I'm expecting non-positional annotations.

I guess what is most important is the reverse of this: i.e., would you be surprised to see nonpositional annotations if you specified P99999:1,150? I just checked UniProt's server and it does not return them if you do this. So either the spec as written needs to be changed to state that nonpositional features will not be returned if you specify start/end, or the UniProt server must be changed to return them.

Either way, there is currently no way to get nonpositional features by themselves. Eugene's suggestion works if they can be identified in the types response as you suggest, but to be honest I can't say it's ever been a problem for me. I just request the whole segment; they're small enough that getting all the positional features too isn't an issue. Is anyone else bothered enough to add this?

> Thomas.
>
>
>
> On Fri, Jul 30, 2010 at 4:00 PM, Andy Jenkinson wrote:
> Hi list,
>
> I'd like to canvass for opinion on what is "expected" behaviour concerning how servers respond to features queries in some circumstances. So, bear with me...
>
> Since protein DAS came along, the format of the segment parameter no longer requires the start and end position. Personally, my take on this has always been that this means that these two queries are effectively identical:
> /das/source/features?segment=P99999
> /das/source/features?segment=P99999:1,150 [i.e. the full length of the protein]
> This interpretation is reflected in the 1.6 spec, with the advantage that it is quite easy to explain the behaviour (other parts of the spec also use an "if omitted, the default is XYZ" paradigm). If you follow this interpretation, nonpositional features (those without start/end positions) will always be returned no matter what the start and end are, or indeed whether they are included (since behaviour in both cases is identical). For me this is fine because nonpositional features are annotations that apply to the whole sequence/object, which includes all parts of it.
>
> Leyla came to a different view from me, reasoning that only the former of these queries should return nonpositional features because they are really outside the range of the second request. She has suggested that queries like these could be possible ways to further control whether nonpositional and/or positional features are returned:
> /das/source/features?segment=P99999:0,0 [only nonpositional features]
> /das/source/features?segment=P99999:0,150 [all positional and nonpositional features]
>
> I guess this would be additional functionality, as I do not know of any clients or servers that use zero as a meaningful query range, but I recognise the logic so would like to know: what do others think? How do your existing servers function?
>
> Sorry for the long email (as usual!)
>
> Cheers,
> Andy
>

From David.Messina at sbc.su.se Fri Jul 30 15:33:40 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 30 Jul 2010 15:33:40 -0400
Subject: [DAS] querying for nonpositional annotations
In-Reply-To: <9FD87196-C80D-4C59-BE54-374A6CD71D68@ebi.ac.uk>
References: <9FD87196-C80D-4C59-BE54-374A6CD71D68@ebi.ac.uk>
Message-ID:

I too agree with Eugene. No magic numbers.
Types can be used for filtering, and actually you get more fine-grained control than simply positional or non-positional. (I use this technique now in DASher.) *

In my opinion, the current spec as written is correct. That is, non-positional features don't just apply to the whole sequence, they apply to any part of the sequence.

As an example, consider a journal reference: a particular protein was isolated by a lab, they wrote a paper about it, and deposited the protein sequence in a database. If you look at a subsequence of the protein sequence, that subsequence still derives from the paper, right? So the feature containing that journal reference should still be attached to the subsequence.

On that basis, I think the UniProt server is technically doing it wrong and should be changed, although I have to say that in practice it hasn't been an issue for me.

Dave

* It might be nice, though, to add 'positional' and 'non-positional' types, which would be a way to grab all of the existing positional or non-positional types in one go. (Currently it's necessary to specify multiple types to get the same functionality.)
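
The two ideas the thread converges on, type-based filtering and returning nonpositional features regardless of the requested range, can be sketched in a few lines of Python. This is a toy model under stated assumptions, not any server's implementation: `features_query`, `Feature` and `select` are hypothetical names, a nonpositional feature is represented here simply by start and end being `None`, and the type names "SNP" and "citation" are made up for the demo.

```python
from dataclasses import dataclass
from typing import Optional


def features_query(source, segment, start=None, end=None, types=()):
    """Build a features request path in the form used in the thread.

    The segment alone requests the whole sequence; start/end narrow the range.
    Each entry in types appends a ';type=' filter.
    """
    seg = segment if start is None or end is None else f"{segment}:{start},{end}"
    path = f"/das/{source}/features?segment={seg}"
    for t in types:
        path += f";type={t}"
    return path


@dataclass
class Feature:
    id: str
    type: str
    start: Optional[int] = None  # None marks a nonpositional feature (illustrative choice)
    end: Optional[int] = None

    @property
    def positional(self):
        return self.start is not None and self.end is not None


def select(features, start=None, end=None, types=()):
    """Model the reading that nonpositional features apply to any part of
    the sequence: they are kept whatever the range, while positional
    features must overlap the requested range. types filters both kinds."""
    kept = []
    for f in features:
        if types and f.type not in types:
            continue
        if f.positional and start is not None and end is not None:
            if f.end < start or f.start > end:
                continue  # positional feature lies outside the range
        kept.append(f)
    return kept


if __name__ == "__main__":
    feats = [
        Feature("snp1", "SNP", 42, 42),
        Feature("ref1", "citation"),
        Feature("snp2", "SNP", 140, 140),
    ]
    print(features_query("source", "P99999", 1, 100))
    print([f.id for f in select(feats, 1, 100)])
    print([f.id for f in select(feats, types=("citation",))])
```

Under this model no magic zero index is needed: asking for nonpositional annotations alone is a matter of listing their types, which is the approach Eugene and Dave describe.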