BLAST DTD (was RE: [Biojava-l] SeqSimilaritySearchSubHit - Strand information)

Tue Dec 2 17:57:38 EST 2003

Interesting read.  There are two sections worthy of comment:

  >NCBI is not proposing a new data model, but is simply transliterating
  >the data model we have used for the last decade into a different language
  >for the convenience of our users. ASN.1 has a number of specific data types
  >such as INTEGER or REAL numbers while XML has only strings, so our DTD
  >automatically adds some ENTITY definitions at the top which maps these
  >numbers to strings. This mapping only allows humans that read the DTD to
  >see where numbers are expected; an XML validator will not care what is
  >there.

Using an XML Schema instead of a DTD would let a validator enforce these
data types.
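
To illustrate the point, here is a minimal sketch using the standard
javax.xml.validation API. The one-element schema and the element name
"score" are made up for the example and are not taken from the NCBI DTD;
the class name SchemaTypeCheck is likewise hypothetical. It only shows that
a schema-aware validator rejects a non-numeric value where xs:integer is
declared, which a DTD validator cannot do.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class SchemaTypeCheck {
    // Hypothetical schema (an assumption for illustration, not NCBI's):
    // a single <score> element whose content must be an xs:integer.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"score\" type=\"xs:integer\"/>"
      + "</xs:schema>";

    // Returns true if the document validates against XSD, false otherwise.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema =
                factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, a validation error is thrown
            // as a SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            // Not expected when reading from in-memory strings.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<score>42</score>"));
        System.out.println(isValid("<score>forty-two</score>"));
    }
}
```

With a DTD, both documents would be accepted, since the element content is
just character data as far as the validator is concerned.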

  >Summary:
  >While the effect of Roles, Scope, and Alternate Forms results in extensive
  >tags in the XML, it does accurately reflect the structure and use of the
  >data. It allows XML programs to capture as little or as much of the full
  >data structure as they wish.

I guess I fail to see the point of all this.  How would a structure
resulting from the suggestions I proposed be "lossy" in any way?

Stephen Bobick

-----Original Message-----
From: Michael E. Smoot [mailto:mes5k at cs.virginia.edu] 
Sent: Tuesday, December 02, 2003 2:37 PM
To: Bobick, Stephen
Cc: biojava-l at biojava.org
Subject: Re: BLAST DTD (was RE: [Biojava-l] SeqSimilaritySearchSubHit -
Strand information)

This page explains how the DTDs were created:

	http://www.ncbi.nlm.nih.gov/IEB/ToolBox/XML/ncbixml.txt

The short version is that the DTDs are transliterations of their ASN.1
data models.

Mike