BLAST DTD (was RE: [Biojava-l] SeqSimilaritySearchSubHit - Strand information)
Michael E. Smoot
mes5k at cs.virginia.edu
Tue Dec 2 17:37:02 EST 2003
This page explains how the DTDs were created:
http://www.ncbi.nlm.nih.gov/IEB/ToolBox/XML/ncbixml.txt
The short version is that the DTDs are transliterations of NCBI's ASN.1
data models.
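Judging from the element names quoted in this thread (Hsp_query-from, Hsp_hit-frame, ...), each name appears to join an ASN.1 type name (Hsp) and one of its member names (query-from) with an underscore. That would account for both the capitalization and the hyphens discussed below. The Java sketch here assumes that pattern; the helper elementName is hypothetical and is not taken from NCBI's tools.

```java
import java.util.List;

public class AsnElementNames {

    // Hypothetical helper: builds an XML element name from an ASN.1 type name
    // and one of its member names, following the "Type_member-name" pattern
    // seen in the BLAST XML quoted in this thread.
    static String elementName(String asnType, String member) {
        return asnType + "_" + member;
    }

    public static void main(String[] args) {
        // Member names copied from the Hsp elements quoted in this thread.
        List<String> members = List.of("query-from", "query-to", "query-frame",
                                       "hit-from", "hit-to", "hit-frame");
        for (String m : members) {
            System.out.println("<" + elementName("Hsp", m) + ">");
        }
    }
}
```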
Mike
On Tue, 2 Dec 2003, Bobick, Stephen wrote:
>
> Greetings,
>
> I'm afraid I will not be answering the poster here, but the message caught
> my curiosity and prompted me to take a peek at the BLAST DTD, and
> subsequently to post this commentary. My question is: how was the BLAST DTD
> designed, and to what standards? I find the choice of element names
> unfortunate. Judging by standard XML naming and DTD design practice, I would
> expect something like:
>
> <hsp_query from="576" to="229" frame="1"/>
>
> Rather than the following:
>
> <Hsp_query-from>576</Hsp_query-from>
> <Hsp_query-to>229</Hsp_query-to>
> <Hsp_query-frame>1</Hsp_query-frame>
>
> The two primary differences are capitalization and the choice of attributes
> rather than a separate element for each datum in this excerpt. As a
> consequence, the "expected" form is more succinct. Looking through the DTD, I
> see the latter naming and element-per-datum pattern repeated many times.
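To make the comparison concrete, here is a minimal Java sketch, using only the JDK's DOM and JAXP classes, that reads the element-per-datum query fields of an Hsp fragment and writes them out in the attribute form proposed above. The hsp_query target element is the hypothetical form from the message, not anything defined by NCBI, and the class and method names are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class HspAttributeForm {

    // Returns the trimmed text of the first descendant element with this name
    // (throws NullPointerException if the element is missing).
    static String text(Element parent, String name) {
        return parent.getElementsByTagName(name).item(0).getTextContent().trim();
    }

    // Reads the NCBI-style query elements of an <Hsp> fragment and writes them
    // as a single hypothetical <hsp_query from=".." to=".." frame=".."/> element.
    static String toAttributeForm(String hspXml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Element hsp = builder.parse(new ByteArrayInputStream(
                    hspXml.getBytes(StandardCharsets.UTF_8))).getDocumentElement();

            Document out = builder.newDocument();
            Element query = out.createElement("hsp_query");
            query.setAttribute("from", text(hsp, "Hsp_query-from"));
            query.setAttribute("to", text(hsp, "Hsp_query-to"));
            query.setAttribute("frame", text(hsp, "Hsp_query-frame"));
            out.appendChild(query);

            // Serialize the new document without an XML declaration.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter sw = new StringWriter();
            t.transform(new DOMSource(out), new StreamResult(sw));
            return sw.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not convert HSP fragment", e);
        }
    }

    public static void main(String[] args) {
        String hsp = "<Hsp>"
                + "<Hsp_query-from>576</Hsp_query-from>"
                + "<Hsp_query-to>229</Hsp_query-to>"
                + "<Hsp_query-frame>1</Hsp_query-frame>"
                + "</Hsp>";
        System.out.println(toAttributeForm(hsp));
    }
}
```

Both forms carry the same three values; the difference lies purely in layout.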
>
> I should admit that I have not worked with BLAST results in several
> years, as my focus has been on data management software (LIMS) and, more
> recently, analysis software. Still, as a professional in the greater
> bioinformatics community who works daily with XML, I do like to see good
> practices from the "pure" software development community incorporated.
>
> Comments?
>
> Stephen Bobick
>
>
> -----Original Message-----
> From: biojava-l-bounces at portal.open-bio.org
> [mailto:biojava-l-bounces at portal.open-bio.org] On Behalf Of Jan Würthner
> Sent: Tuesday, December 02, 2003 12:52 AM
> To: biojava-l at biojava.org
> Subject: [Biojava-l] SeqSimilaritySearchSubHit - Strand information
>
> Hi folks,
>
> I'm constructing SeqSimilaritySearchSubHit instances from XML-formatted NCBI
> BLAST results, and I'm getting increasingly confused by the query's and
> subject's from and to information on the one hand, and the query's and
> subject's strand on the other.
>
> The NCBI returns for example:
>
> <Hsp_query-from>576</Hsp_query-from>
> <Hsp_query-to>229</Hsp_query-to>
> <Hsp_query-frame>1</Hsp_query-frame>
>
> <Hsp_hit-from>12374053</Hsp_hit-from>
> <Hsp_hit-to>12374401</Hsp_hit-to>
> <Hsp_hit-frame> -1</Hsp_hit-frame>
>
> I'd have thought that being able to give the from- and to-values in
> either order (descending, as for the query here) already carries the
> information about the direction (POSITIVE/NEGATIVE). Why is there an
> additional "frame" value, and why is the query's frame value set to +1
> and the subject's (= hit's) value set to -1? I expected it to be the
> other way around.
>
> My question is: how should I set the SeqSimilaritySearchSubHit instance's
> query/subject values from these data?
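One way to make the two signals described above explicit, before filling in a SeqSimilaritySearchSubHit, is to normalise each side's coordinates and then compare the strand implied by the coordinate order with the sign of the frame. The Java sketch below does only that, assuming a descending from/to pair means the reverse strand and that frame 0 means "no frame given". The Strand enum, Side class and both helper methods are hypothetical, and the sketch does not decide which signal the BioJava object should receive.

```java
public class HspCoordinates {

    // Hypothetical strand type; BioJava defines its own, but this sketch avoids
    // depending on any particular BioJava class.
    enum Strand { POSITIVE, NEGATIVE }

    // One side (query or hit) of an HSP with start <= end.
    static final class Side {
        final int start;
        final int end;
        final Strand strand;
        final int frame;

        Side(int start, int end, Strand strand, int frame) {
            this.start = start;
            this.end = end;
            this.strand = strand;
            this.frame = frame;
        }

        @Override
        public String toString() {
            return start + ".." + end + " " + strand + " (frame " + frame + ")";
        }
    }

    // Assumption: a descending from/to pair means the alignment runs on the
    // reverse strand; the coordinates are reordered so that start <= end.
    static Side normalise(int from, int to, int frame) {
        Strand strand = from <= to ? Strand.POSITIVE : Strand.NEGATIVE;
        return new Side(Math.min(from, to), Math.max(from, to), strand, frame);
    }

    // True when the frame sign points the same way as the coordinate order.
    // Assumption: frame 0 means "no frame given" and is never flagged.
    static boolean frameAgrees(Side s) {
        if (s.frame == 0) {
            return true;
        }
        return (s.frame > 0) == (s.strand == Strand.POSITIVE);
    }

    public static void main(String[] args) {
        // Values copied from the BLAST output quoted in this message.
        Side query = normalise(576, 229, 1);
        Side hit = normalise(12374053, 12374401, -1);
        System.out.println("query " + query + ", frame agrees: " + frameAgrees(query));
        System.out.println("hit   " + hit + ", frame agrees: " + frameAgrees(hit));
    }
}
```

For the values quoted above, both sides are flagged: the descending query coordinates come with frame +1 and the ascending hit coordinates with frame -1, which is exactly the mismatch being asked about.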
>
> An answer to this would be a great help!
>
> Thank you
> Jan
>
> --
> Jan Würthner
> Institute for Medical Microbiology
> Building 22.21
> Heinrich-Heine-University
> Universitätsstraße 1
> 40225 Duesseldorf
>
> Tel. +49 (0) 211 81 12461
> URL: www.medmikro.uni-duesseldorf.de
>
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at biojava.org
> http://biojava.org/mailman/listinfo/biojava-l