BLAST DTD (was RE: [Biojava-l] SeqSimilaritySearchSubHit -
Strand information)
Bobick, Stephen
Stephen_Bobick at rosettabio.com
Tue Dec 2 17:57:38 EST 2003
Interesting read. There are two sections worthy of comment:
>NCBI is not proposing a new data model, but is simply transliterating
>the data model we have used for the last decade into a different language for the
>convenience of our users. ASN.1 has a number of specific data types such as INTEGER
>or REAL numbers while XML has only strings, so our DTD automatically adds some
>ENTITY definitions at the top which maps these numbers to strings. This mapping only
>allows humans that read the DTD to see where numbers are expected; an XML validator
>will not care what is there.
Use of an XML Schema would allow the enforcement of data types.
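As a rough illustration of that point, here is a minimal sketch using the JDK's javax.xml.validation package. The element name "hit-length" and the class name are made up for the example and are not taken from the NCBI DTD; the only claim is that an element declared as xs:integer is rejected by a schema validator when it holds non-numeric text, which a DTD cannot express.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class TypedElementCheck {
    // Hypothetical one-element schema; "hit-length" is illustrative only.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"hit-length\" type=\"xs:integer\"/>"
      + "</xs:schema>";

    // Returns true if the document validates against XSD, false otherwise.
    static boolean isValid(String xml) throws Exception {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // With no ErrorHandler set, validation errors are thrown.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<hit-length>42</hit-length>"));
        System.out.println(isValid("<hit-length>forty-two</hit-length>"));
    }
}
```

With a DTD, both documents would be accepted, since the element content can only be declared as #PCDATA.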
>Summary:
>While the effect of Roles, Scope, and Alternate Forms results in extensive
>tags in the XML, it does accurately reflect the structure and use of the data. It allows
>XML programs to capture as little or as much of the full data structure as they wish.
I guess I fail to see the point of all this. How would a structure
resulting from the suggestions that I propose be "lossy" in any way?
Stephen Bobick
-----Original Message-----
From: Michael E. Smoot [mailto:mes5k at cs.virginia.edu]
Sent: Tuesday, December 02, 2003 2:37 PM
To: Bobick, Stephen
Cc: biojava-l at biojava.org
Subject: Re: BLAST DTD (was RE: [Biojava-l] SeqSimilaritySearchSubHit -
Strand information)
This page explains how the DTDs were created:
http://www.ncbi.nlm.nih.gov/IEB/ToolBox/XML/ncbixml.txt
The short version is that the DTDs are transliterations of their ASN.1
data models.
Mike