From dhoworth at mrc-lmb.cam.ac.uk Fri Oct 1 06:12:42 2004
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Fri Oct 1 06:12:30 2004
Subject: [DAS] Re: das/2 proposal status
In-Reply-To: <43E0DE26-0C36-11D9-90EE-000A956826C8@dalkescientific.com>
References: <200409151528.16053.lstein@cshl.edu> <43E0DE26-0C36-11D9-90EE-000A956826C8@dalkescientific.com>
Message-ID: <415D2D9A.5070301@mrc-lmb.cam.ac.uk>

Andrew Dalke wrote:
> Lincoln:
> > Dates should follow the HTTP date specification.
>
> RFC 2068 (HTTP/1.1) allows three different formats
> I would prefer the DAS spec be more specific about which
> of those is allowed. I think it's okay to say "RFC 1123 with
> 4 digit years". We can pin this down later.

I would like to make an alternative suggestion. The DAS documents in which
these dates appear are XML documents so it seems to me more natural to
require that dates follow XML standards rather than HTTP ones.

XML dates are defined in , broadly as 1999-10-26 for a date or
2000-03-04T20:00:00Z for a dateTime. I would suggest mandating the
canonical representations.

This format has several advantages over the earlier complex textual ones:
* Dates can be compared directly as strings with no need for parsing,
* Dates are easier to parse when it is necessary,
* They don't require non-English speakers to learn abbreviations,
* XML defines rules for interpretation and comparison.

> That should probably be 'name' instead of 'id'. For consistency's
> sake since 'id' seems otherwise always used for resolvable URIs.

In the context of an XML document, I think the use of 'id' attributes for
values that are not of ID type is very misleading. In the case of
resolvable URIs, why not use the tag 'url' instead? And use 'name' as
Andrew suggests in other cases.

Cheers, Dave
--
Dave Howorth
MRC Centre for Protein Engineering
Hills Road, Cambridge, CB2 2QH
01223 252960

From sicotteh at mail.nih.gov Fri Oct 1 08:30:54 2004
From: sicotteh at mail.nih.gov (Sicotte, Hugues (NIH/NCI))
Date: Fri Oct 1 08:30:21 2004
Subject: [DAS] Re: das/2 proposal status
Message-ID: <27C204BD76CBC142BA1AE46D62A8548E0F4E54F5@nihexchange9.nih.gov>

I second Dave's comment.
Furthermore if we use XML date types we can now
take time zones into account which is useful now
that we are doing worldwide computing.

e.g. 1999-05-31T13:20:00-05:00
would represent a time in Eastern Standard Time, which is 5 hours behind
UTC (Coordinated Universal Time),
as per ISO 8601 for time, which is mostly followed by the W3C XML Schema
http://www.w3.org/TR/xmlschema-2/

Hugues Sicotte

-----Original Message-----
From: Dave Howorth [mailto:dhoworth@mrc-lmb.cam.ac.uk]
Sent: Friday, October 01, 2004 6:13 AM
To: das@biodas.org
Subject: Re: [DAS] Re: das/2 proposal status

Andrew Dalke wrote:
> Lincoln:
> > Dates should follow the HTTP date specification.
>
> RFC 2068 (HTTP/1.1) allows three different formats
> I would prefer the DAS spec be more specific about which
> of those is allowed. I think it's okay to say "RFC 1123 with
> 4 digit years". We can pin this down later.

I would like to make an alternative suggestion. The DAS documents in which
these dates appear are XML documents so it seems to me more natural to
require that dates follow XML standards rather than HTTP ones.

XML dates are defined in , broadly as 1999-10-26 for a date or
2000-03-04T20:00:00Z for a dateTime. I would suggest mandating the
canonical representations.

This format has several advantages over the earlier complex textual ones:
* Dates can be compared directly as strings with no need for parsing,
* Dates are easier to parse when it is necessary,
* They don't require non-English speakers to learn abbreviations,
* XML defines rules for interpretation and comparison.

> That should probably be 'name' instead of 'id'. For consistency's
> sake since 'id' seems otherwise always used for resolvable URIs.

In the context of an XML document, I think the use of 'id' attributes for
values that are not of ID type is very misleading. In the case of
resolvable URIs, why not use the tag 'url' instead? And use 'name' as
Andrew suggests in other cases.

Cheers, Dave
--
Dave Howorth
MRC Centre for Protein Engineering
Hills Road, Cambridge, CB2 2QH
01223 252960

_______________________________________________
DAS mailing list
DAS@biodas.org
http://biodas.org/mailman/listinfo/das

From lstein at cshl.edu Fri Oct 1 11:45:29 2004
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri Oct 1 11:44:59 2004
Subject: [DAS] Re: das/2 proposal status
In-Reply-To: <27C204BD76CBC142BA1AE46D62A8548E0F4E54F5@nihexchange9.nih.gov>
References: <27C204BD76CBC142BA1AE46D62A8548E0F4E54F5@nihexchange9.nih.gov>
Message-ID: <200410011145.29489.lstein@cshl.edu>

> In the context of an XML document, I think the use of 'id'
> attributes for values that are not of ID type is very misleading.
> In the case of resolvable URIs, why not use the tag 'url' instead?
> And use 'name' as Andrew suggests in other cases.

What if the value of the "id" tag is changed into a URI?

Lincoln

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724

From dalke at dalkescientific.com Fri Oct 1 12:30:08 2004
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Fri Oct 1 12:29:34 2004
Subject: [DAS] Re: das/2 proposal status
In-Reply-To: <415D2D9A.5070301@mrc-lmb.cam.ac.uk>
References: <200409151528.16053.lstein@cshl.edu> <43E0DE26-0C36-11D9-90EE-000A956826C8@dalkescientific.com> <415D2D9A.5070301@mrc-lmb.cam.ac.uk>
Message-ID: <2855E33A-13C7-11D9-90EE-000A956826C8@dalkescientific.com>

Dave Howorth wrote:
> XML dates are defined in , broadly
> as 1999-10-26 for a date or 2000-03-04T20:00:00Z for a dateTime.
> I would suggest mandating the canonical representations.

I know the ISO date formats less well than I do the RFC date formats,
in part because all ISO specs cost money, so aren't freely available
on the web. Thus when reading the XML schema documentation, where
it says

   [Definition:] date represents a calendar date. The *value space* of
   date is the set of Gregorian calendar dates as defined in § 5.2.1 of
   [ISO 8601]. Specifically, it is a set of one-day long, non-periodic
   instances e.g. lexical 1999-10-26 to represent the calendar date
   1999-10-26, independent of how many hours this day has.

it's hard for me to know what that means. Is 1999-1-3 allowed for
the 3rd of January? Elsewhere in the documentation it suggests
the answer is no, that all months and days are two digits in
length. But there are comments like

   See ISO 8601 Date and Time Formats (§D) for details about legal
   values in the various fields.

which defer final say to that spec.

I'm not saying that we shouldn't use ISO 8601. I'm complaining
because I don't have enough knowledge of it to make a judgment
and the information needed for clarification isn't available.

> This format has several advantages over the earlier complex textual
> ones:
> * Dates can be compared directly as strings with no need for parsing,
> * Dates are easier to parse when it is necessary,
> * They don't require non-English speakers to learn abbreviations,
> * XML defines rules for interpretation and comparison.

Date comparisons do require parsing. The XML schema spec modifies
the ISO spec to allow years past 9999. (See "3.2.9.1 Lexical
representation".) This is done by allowing digits to the left of the
ISO year spec. Thus to support the XML Schema datetime type, an
implementation must allow the date 16293-03-02, which is
lexicographically before 2000-01-01.

Python's standard datetime only handles proleptic Gregorian dates
from 0001-01-01 to 9999-12-31.
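
A minimal Python sketch of the comparison problem described above. It is an illustration only, not code from the thread: the helper name `split_xsd_date` is hypothetical, and it assumes zone-less, non-negative dates with no validation.

```python
from datetime import date


def split_xsd_date(lexical):
    """Split a zone-less, non-negative XML Schema date into integers.

    Hypothetical helper for illustration; it performs no validation.
    """
    year, month, day = lexical.split("-")
    return int(year), int(month), int(day)


far_future = "16293-03-02"   # legal in XML Schema, beyond four-digit years
y2k = "2000-01-01"

# Plain string comparison sorts the year 16293 before the year 2000 ...
string_says_earlier = far_future < y2k
# ... while comparing the parsed fields gives the chronological answer.
fields_say_earlier = split_xsd_date(far_future) < split_xsd_date(y2k)

print(string_says_earlier, fields_say_earlier)   # True False

# The range that Python's standard date type can represent.
print(date.min.isoformat(), date.max.isoformat())   # 0001-01-01 9999-12-31
```

With four-digit years and fixed-width fields, the string order and the chronological order of plain dates agree, which is one practical argument for the 0001 to 9999 range.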
I would prefer staying within that range, as I suspect more libraries
can handle ISO dates than can handle the XML Schema extensions to them.

Were we to go this route I would insist on restricting the
allowed dates to the common subset of ISO 8601 and XML Schema. That is,
no dates outside 0001-9999 (8601 allows 0000-9999 while XML Schema
allows -Inf to +Inf, *except* 0000). Let the future figure
out how to extend DAS to make it Y10K compliant. :)

It's also true that 8601 is easier to parse and generate than
RFC dates. The latter requires, e.g., getting the day-of-week
correct. But since almost every language has code for doing that,
how is either one harder than the other, practically speaking?

Actually,
  http://www.mcs.vuw.ac.nz/technical/software/SGML/doc/iso8601/ISO8601.html
claims

   The following complete, abbreviated or truncated formats are
   permissible:

     "19930214" or "1993-02-14" (complete representation)
     "1993-02" (reduced precision)
     "1993"
     "19"
     "930214" or "93-02-14" (truncated, current century assumed)
     "-9302" or "-93-02"
     "-93"
     "--0214" or "--02-14"
     "--02"
     "---14"

so the ISO datetime can be quite complicated to parse. The XML
Schema restricts it to "the extended format CCYY-MM-DDThh:mm:ss"
where only the last field, the seconds, may have decimals.

That New Zealand URL says the ISO spec says

   Decimal fractions may be included with an hour, minute or second.
   The decimal sign should be either a comma (preferred) or a full
   stop. If the value is less than one then the decimal sign should
   be preceded by a zero. The number of decimal places is set
   depending on the application.

The XML Schema spec says it's "s.sss" for that case and uses
the phrase "decimal point" instead of "decimal sign". That suggests
some libraries might generate ISO extended format and end up
with a "," for the seconds decimal instead of ".". I have no
experience to judge. Python uses a "." but that's because the
implementer is from the US.

There are some abbreviations.
There's "Z" for the time zone ;)

Less glibly, the datetime is not going to be read by humans
who aren't comfortable with English. It will be read by
people who like looking at the low-level format, and by
software. The rest of the format (element tag, spec) requires
a good knowledge of English so that first group isn't going to
be left out. As for software for reading the date, well, is
the ISO spec for the dates, or the XML schema spec for the dates,
available in different languages? The ISO spec might perhaps be
in French too.

In other words, I don't think that's a major objection.

I do like that if we use some subset of ISO and XML Schema
as our datetime then the XML schema language will be able to
validate it for us, and automatically convert it into the native
date type for a language using the dates. That's good enough
reason for me. But I would want to test it out first.

>> That should probably be 'name' instead of 'id'. For consistency's
>> sake since 'id' seems otherwise always used for resolvable URIs.
>
> In the context of an XML document, I think the use of 'id' attributes
> for values that are not of ID type is very misleading. In the case
> of resolvable URIs, why not use the tag 'url' instead? And use 'name'
> as Andrew suggests in other cases.

I don't think 'url' is right for this case, but I don't know
the precise expected semantics for an ID type. In general
we use id for the case exemplified by the following

To get to the first child, use
  http://..../abc/xyz

Is that correct? If that's not the correct meaning for id
(I suspect it's supposed to be usable for targets, as with
the ....#xyz syntax used in HTML) then we can easily change
it to 'url'.

Andrew
dalke@dalkescientific.com

From dalke at dalkescientific.com Fri Oct 1 12:51:33 2004
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Fri Oct 1 12:50:58 2004
Subject: [DAS] Re: das/2 proposal status
In-Reply-To: <27C204BD76CBC142BA1AE46D62A8548E0F4E54F5@nihexchange9.nih.gov>
References: <27C204BD76CBC142BA1AE46D62A8548E0F4E54F5@nihexchange9.nih.gov>
Message-ID: <263D64CA-13CA-11D9-90EE-000A956826C8@dalkescientific.com>

Sicotte, Hugues (NIH/NCI) wrote:
> Furthermore if we use XML date types we can now
> take time zones into account which is useful now
> that we are doing worldwide computing.

The spec was using RFC 2068 dates. I suggested using
a clarification of RFC 1123, to require 4 digit years.
That RFC is a clarification of RFC 822.

RFC 822 is for email, which is almost by definition a
form of worldwide computing. It has allowed timezones
for the last 22 years

     zone  =  "UT"  / "GMT"            ; Universal Time
                                       ; North American : UT
           /  "EST" / "EDT"            ;  Eastern:  - 5/ - 4
           /  "CST" / "CDT"            ;  Central:  - 6/ - 5
           /  "MST" / "MDT"            ;  Mountain: - 7/ - 6
           /  "PST" / "PDT"            ;  Pacific:  - 8/ - 7
           /  1ALPHA                   ; Military: Z = UT;
                                       ;  A:-1; (J not used)
                                       ;  M:-12; N:+1; Y:+12
           / ( ("+" / "-") 4DIGIT )    ; Local differential
                                       ;  hours+min. (HHMM)

RFC 2822 does change this to obsolete the military
and North American qualifications, so that the preferred
use is the ( ("+" / "-") 4DIGIT ) local differential.

> e.g. 1999-05-31T13:20:00-05:00
> would represent a time in Eastern Standard Time, which is 5 hours behind
> UTC (Coordinated Universal Time),
> as per ISO 8601 for time, which is mostly followed by the W3C XML Schema
> http://www.w3.org/TR/xmlschema-2/

Under the normal RFC dates that's written, with timezone, as
  Mon, 31 May 1999 13:20:00 -0500

BTW, it appears that the proper spec for RFC-style
dates is now RFC 2822.
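
As a hedged illustration of the two notations being compared, the sketch below renders one instant both ways using only Python's standard library. The variable names are arbitrary, and the fixed -05:00 offset is taken from the example in the thread rather than from any time zone database.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Fixed offset of five hours behind UTC, as in the example above.
minus_five = timezone(timedelta(hours=-5))
moment = datetime(1999, 5, 31, 13, 20, 0, tzinfo=minus_five)

# RFC 2822 style, with a numeric "local differential" zone.
rfc_text = format_datetime(moment)
# ISO 8601 / XML Schema extended format.
iso_text = moment.isoformat()

print(rfc_text)   # Mon, 31 May 1999 13:20:00 -0500
print(iso_text)   # 1999-05-31T13:20:00-05:00

# Parsing the RFC form back yields the same instant.
round_trip = parsedate_to_datetime(rfc_text)
print(round_trip == moment)   # True
```

Both forms carry the same information, including the offset; the RFC form additionally needs the day of week and an English month abbreviation.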

Andrew
dalke@dalkescientific.com

From lstein at cshl.edu Fri Oct 8 11:38:39 2004
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri Oct 8 11:38:20 2004
Subject: [DAS] Re: das/2 proposal status
In-Reply-To: <263D64CA-13CA-11D9-90EE-000A956826C8@dalkescientific.com>
References: <27C204BD76CBC142BA1AE46D62A8548E0F4E54F5@nihexchange9.nih.gov> <263D64CA-13CA-11D9-90EE-000A956826C8@dalkescientific.com>
Message-ID: <200410081138.39553.lstein@cshl.edu>

Andrew,

What is the final decision on this? Will it be compatible with the
HTTP style, which is

  Fri, 08 Oct 2004 15:37:46 GMT

??

Lincoln

On Friday 01 October 2004 12:51 pm, Andrew Dalke wrote:
> Sicotte, Hugues (NIH/NCI) wrote:
> > Furthermore if we use XML date types we can now
> > take time zones into account which is useful now
> > that we are doing worldwide computing.
>
> The spec was using RFC 2068 dates. I suggested using
> a clarification of RFC 1123, to require 4 digit years.
> That RFC is a clarification of RFC 822.
>
> RFC 822 is for email, which is almost by definition a
> form of worldwide computing. It has allowed timezones
> for the last 22 years
>
>      zone  =  "UT"  / "GMT"            ; Universal Time
>                                        ; North American : UT
>            /  "EST" / "EDT"            ;  Eastern:  - 5/ - 4
>            /  "CST" / "CDT"            ;  Central:  - 6/ - 5
>            /  "MST" / "MDT"            ;  Mountain: - 7/ - 6
>            /  "PST" / "PDT"            ;  Pacific:  - 8/ - 7
>            /  1ALPHA                   ; Military: Z = UT;
>                                        ;  A:-1; (J not used)
>                                        ;  M:-12; N:+1; Y:+12
>            / ( ("+" / "-") 4DIGIT )    ; Local differential
>                                        ;  hours+min. (HHMM)
>
> RFC 2822 does change this to obsolete the military
> and North American qualifications, so that the preferred
> use is the ( ("+" / "-") 4DIGIT ) local differential.
>
> > e.g.
> > 1999-05-31T13:20:00-05:00
> > would represent a time in Eastern Standard Time, which is 5 hours
> > behind UTC (Coordinated Universal Time),
> > as per ISO 8601 for time, which is mostly followed by the W3C XML
> > Schema http://www.w3.org/TR/xmlschema-2/
>
> Under the normal RFC dates that's written, with timezone, as
>   Mon, 31 May 1999 13:20:00 -0500
>
> BTW, it appears that the proper spec for RFC-style
> dates is now RFC 2822.
>
> Andrew
> dalke@dalkescientific.com

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724

From sicotteh at mail.nih.gov Fri Oct 8 14:17:50 2004
From: sicotteh at mail.nih.gov (Sicotte, Hugues (NIH/NCI))
Date: Fri Oct 8 14:17:16 2004
Subject: [DAS] Re: das/2 proposal status
Message-ID: <27C204BD76CBC142BA1AE46D62A8548E0F4E5525@nihexchange9.nih.gov>

I'm satisfied.
Andrew?

-----Original Message-----
From: Lincoln Stein [mailto:lstein@cshl.edu]
Sent: Friday, October 08, 2004 11:39 AM
To: Andrew Dalke; das@biodas.org
Subject: Re: [DAS] Re: das/2 proposal status

Andrew,

What is the final decision on this? Will it be compatible with the
HTTP style, which is

  Fri, 08 Oct 2004 15:37:46 GMT

??

Lincoln

On Friday 01 October 2004 12:51 pm, Andrew Dalke wrote:
> Sicotte, Hugues (NIH/NCI) wrote:
> > Furthermore if we use XML date types we can now
> > take time zones into account which is useful now
> > that we are doing worldwide computing.
>
> The spec was using RFC 2068 dates. I suggested using
> a clarification of RFC 1123, to require 4 digit years.
> That RFC is a clarification of RFC 822.
>
> RFC 822 is for email, which is almost by definition a
> form of worldwide computing.
> It has allowed timezones
> for the last 22 years
>
>      zone  =  "UT"  / "GMT"            ; Universal Time
>                                        ; North American : UT
>            /  "EST" / "EDT"            ;  Eastern:  - 5/ - 4
>            /  "CST" / "CDT"            ;  Central:  - 6/ - 5
>            /  "MST" / "MDT"            ;  Mountain: - 7/ - 6
>            /  "PST" / "PDT"            ;  Pacific:  - 8/ - 7
>            /  1ALPHA                   ; Military: Z = UT;
>                                        ;  A:-1; (J not used)
>                                        ;  M:-12; N:+1; Y:+12
>            / ( ("+" / "-") 4DIGIT )    ; Local differential
>                                        ;  hours+min. (HHMM)
>
> RFC 2822 does change this to obsolete the military
> and North American qualifications, so that the preferred
> use is the ( ("+" / "-") 4DIGIT ) local differential.
>
> > e.g. 1999-05-31T13:20:00-05:00
> > would represent a time in Eastern Standard Time, which is 5 hours
> > behind UTC (Coordinated Universal Time),
> > as per ISO 8601 for time, which is mostly followed by the W3C XML
> > Schema http://www.w3.org/TR/xmlschema-2/
>
> Under the normal RFC dates that's written, with timezone, as
>   Mon, 31 May 1999 13:20:00 -0500
>
> BTW, it appears that the proper spec for RFC-style
> dates is now RFC 2822.
>
> Andrew
> dalke@dalkescientific.com

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724

From dalke at dalkescientific.com Fri Oct 8 18:22:02 2004
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Fri Oct 8 18:21:09 2004
Subject: [DAS] Re: das/2 proposal status
In-Reply-To: <200410081138.39553.lstein@cshl.edu>
References: <27C204BD76CBC142BA1AE46D62A8548E0F4E54F5@nihexchange9.nih.gov> <263D64CA-13CA-11D9-90EE-000A956826C8@dalkescientific.com> <200410081138.39553.lstein@cshl.edu>
Message-ID: <7A385D26-1978-11D9-ADD2-000A956826C8@dalkescientific.com>

Lincoln:
> What is the final decision on this?
> Will it be compatible with the
> HTTP style, which is
>
>   Fri, 08 Oct 2004 15:37:46 GMT

The ISO style for this is
  2004-10-08T15:37:46Z
or
  2004-10-08T15:37:46+00:00

It is not compatible with the HTTP style.

I'm now sympathetic to the idea that ISO dates are
more appropriate for XML documents and HTTP dates are more
appropriate for HTTP headers.

While the ISO spec is proprietary, costs money, and is complicated,
it has a minimal basic requirement which is of the form

  YYYY-MM-DDTHH:MM:SS(.ss*)?(Z|[+-]hh:mm)?

where YYYY is the 4 digit year, MM the two digit month (01 to 12),
DD the two digit day, 'T' is the required separator between date
and time. HH is the 24 hour time (00 to 24), MM the minute (00 to 59),
SS the second (00 to 60), .ss* the optional decimal seconds,
'Z' means UTC/GMT and +hh:mm / -hh:mm the timezone shift.

Some caveats. The XML Schema dates are different from the ISO dates.
ISO allows 0000 to 9999 in the years while XSD doesn't allow 0000.
ISO allows "24:00" as a time while XSD does not.

SS = 60 only happens with leap seconds.

Python's datetime supports neither leap seconds nor the year 0000.
Perl's default uses seconds since the epoch, so doesn't go before 1970.
But DateTime.pm handles the range 0000-9999 just fine (as well as
a much larger range). I did not look into Java but I'm pretty sure
there is standard library support for this. I don't know about C
or C++. There is a 'time.rb' Ruby module for XSD dates.

Were we to go this route I would say we define that

  all dates be given as
      YYYY-MM-DD
  all datetimes be given as
      YYYY-MM-DDTHH:MM:SS(.ss*)?(Z|[+-]hh:mm)
  (timezone required, fractions of a second optional),
  0001 <= YYYY <= 9999, 00<=HH<=23 and leap second support is
  implementation dependent.

This is compatible with ISO 8601, compatible with XML Schema,
supportable by the likely DAS/2 clients and servers, and not
dependent on any external specification.
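
One way the proposed datetime profile could be checked is sketched below in Python. This is an assumption-laden illustration, not part of the proposal: the names `DATETIME_RE` and `is_valid_das_datetime` are hypothetical, a seconds value of 60 is accepted (leap second support being implementation dependent), and the timezone offset is only checked for shape, not range.

```python
import re
from datetime import date

# YYYY-MM-DDTHH:MM:SS(.ss*)?(Z|[+-]hh:mm), with the timezone required.
DATETIME_RE = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})"                   # YYYY-MM-DD
    r"T([01]\d|2[0-3]):([0-5]\d):([0-5]\d|60)"   # HH 00-23, MM 00-59, SS 00-60
    r"(\.\d+)?"                                  # optional fractional seconds
    r"(Z|[+-]\d{2}:[0-5]\d)"                     # required zone: Z or +/-hh:mm
)


def is_valid_das_datetime(text):
    """Return True if text fits the restricted datetime form sketched above."""
    match = DATETIME_RE.fullmatch(text)
    if match is None:
        return False
    year, month, day = (int(part) for part in match.groups()[:3])
    try:
        # date() rejects year 0000, month 13, 30 February and so on.
        date(year, month, day)
    except ValueError:
        return False
    return True


print(is_valid_das_datetime("2004-10-08T15:37:46Z"))        # True
print(is_valid_das_datetime("2004-10-08T15:37:46+00:00"))   # True
print(is_valid_das_datetime("2004-10-08T15:37:46"))         # False (no zone)
print(is_valid_das_datetime("0000-01-01T00:00:00Z"))        # False (year 0000)
```

Leaning on `date()` for the calendar check keeps the pattern itself simple; a stricter checker might also bound the offset hours.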

Because this is supported by the XML Schema definition, I propose
we use this format instead, when inside XML documents.

Andrew
dalke@dalkescientific.com

From David at GeerCom.com Thu Oct 28 10:51:26 2004
From: David at GeerCom.com (David Geer)
Date: Fri Oct 29 22:15:03 2004
Subject: [DAS] Biggest News in DAS
Message-ID: <001101c4bcfd$9ad7d3c0$01fea8c0@domain.invalid>

Hello List Members,

I'd like to know what the biggest story in DAS is today. Is it the
development of DAS/2, the growing acceptance of DAS, or something else?

Best Regards,

David Geer
Writer/Owner
Geer Communications
http://www.GeerCom.com
440-964-9832
David@GeerCom.com