[Biopython-dev] Fwd: [Utilities-announce] PubMed E-Utility 2011 DTD updates - please read!

Peter biopython at maubp.freeserve.co.uk
Tue Sep 14 17:59:29 EDT 2010


Hi all,

It looks like there are two more DTD files (available now)
to add to Biopython for the Bio.Entrez parser.

Peter

---------- Forwarded message ----------
From: <utilities-announce at ncbi.nlm.nih.gov>
Date: Tue, Sep 14, 2010 at 9:24 PM
Subject: [Utilities-announce] PubMed E-Utility 2011 DTD updates - please read!
To: NLM/NCBI List utilities-announce <utilities-announce at ncbi.nlm.nih.gov>


 Dear NCBI PubMed E-Utility Users,



We anticipate updating the PubMed E-Utility DTDs for 2011 in mid-December,
approximately on December 13, 2010.



The forthcoming DTDs are available from:

http://eutils.ncbi.nlm.nih.gov/corehtml/query/DTD/pubmed_110101.dtd

http://eutils.ncbi.nlm.nih.gov/corehtml/query/DTD/nlmmedlinecitationset_110101.dtd



   1. DTD AND XML CHANGES FOR 2011
      1. Changes to NLMMedlineCitationSet DTD AND PubMed XML
      The DTD changes for the 2011 production year are itemized in the
      Revision Notes section near the top of the DTD. The following describes
      the substantive changes to the NLMMedlineCitationSet DTD and PubMed XML:
         1. Accommodating Structured Abstracts
         <http://www.nlm.nih.gov/bsd/policy/structured_abstracts.html>
         Two new attributes, Label and NlmCategory, are added to the
         AbstractText element, which is used with both the Abstract and
         OtherAbstract elements. A valid label name found in published
         structured abstracts (e.g., Introduction, Goals, Study Design,
         Findings, Discussion) will be identified in the XML as an Abstract
         Text Label, and each ‘parent’ concept to which the published Label
         name is mapped at NLM will be identified as an Abstract Text
         NlmCategory. Five NLM-assigned mapped-to categories are possible:
         Background, Objective, Methods, Results, and Conclusions. In general,
         the lack of Label and NlmCategory attributes in AbstractText means
         the published abstract is unstructured.

Note that the content of structured abstracts will be exported in separate
segments that need to be joined for display of the complete abstract text.

DTD:

<!ELEMENT       AbstractText (#PCDATA)>

<!ATTLIST       AbstractText

     Label CDATA #IMPLIED

     NlmCategory (UNLABELLED | BACKGROUND | OBJECTIVE | METHODS |
RESULTS | CONCLUSIONS) #IMPLIED>

In the following example, the published label names are INTRODUCTION; AIMS;
DESIGN, SETTING AND PARTICIPANTS; RESULTS; and DISCUSSION which
correspondingly map to the five NLM-assigned categories.

Sample XML:

<Abstract>

<AbstractText Label = "INTRODUCTION" NlmCategory = "BACKGROUND">Physicians
are often reluctant to prescribe strong opioids for chronic non cancer pain
(CNCP). No study has qualitatively examined physicians' beliefs about
…</AbstractText>

<AbstractText Label = "AIMS" NlmCategory = "OBJECTIVE">To describe
physicians' attitudes and experience of prescribing opioids for CNCP to
PWHSA.</AbstractText>

<AbstractText Label = "DESIGN, SETTING AND PARTICIPANTS" NlmCategory =
"METHODS">Nineteen individual interviews and two focus groups were conducted
with GPs, Addiction Specialists, Pain Specialists and
Rheumatologists.</AbstractText>

<AbstractText Label = "RESULTS" NlmCategory = "RESULTS">Physicians were
"reluctant" to prescribe opioids to PWHSA experiencing CNCP for fear of
addiction, misuse or diversion of medications. Many exhibited
"distrust"…</AbstractText>

<AbstractText Label = "DISCUSSION" NlmCategory = "CONCLUSIONS">Applying the
chronic disease model to comorbid addiction and CNCP would ensure a health
and social care system that makes it difficult to stigmatise
patients…</AbstractText>

</Abstract>
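
As a rough sketch of the joining step mentioned above (not part of the NLM
announcement), the segments could be reassembled with the Python standard
library as follows. The function name join_abstract and the "LABEL: text"
display format are my own assumptions, and the sample text is shortened
placeholder content.

```python
import xml.etree.ElementTree as ET

# Shortened placeholder abstract; real records carry the full segment text.
SAMPLE = """<Abstract>
<AbstractText Label="INTRODUCTION" NlmCategory="BACKGROUND">Physicians are often reluctant.</AbstractText>
<AbstractText Label="AIMS" NlmCategory="OBJECTIVE">To describe physicians' attitudes.</AbstractText>
</Abstract>"""


def join_abstract(abstract):
    """Join AbstractText segments into one display string (hypothetical helper)."""
    parts = []
    for segment in abstract.findall("AbstractText"):
        text = "".join(segment.itertext()).strip()
        # Label is #IMPLIED, so unstructured abstracts simply have none.
        label = (segment.get("Label") or "").strip()
        parts.append(f"{label}: {text}" if label else text)
    return "\n".join(parts)


joined = join_abstract(ET.fromstring(SAMPLE))
print(joined)
```

An unstructured abstract (a single AbstractText with no attributes) passes
through the same function unchanged.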


    1. Implementing Protocol Class 2 Supplementary Concept Record (SCR) and
         Rare Disease Class 3 SCR terms
         A new element, SupplMeshList, is added to the MedlineCitation
         element, and another new element, SupplMeshName with its attribute
         Type, is added to SupplMeshList.

DTD:

<!ELEMENT       MedlineCitation (PMID, DateCreated, DateCompleted?,
DateRevised?, Article, MedlineJournalInfo, ChemicalList?,

SupplMeshList?, CitationSubset*, CommentsCorrectionsList?,
GeneSymbolList?, MeshHeadingList?, NumberOfReferences?,

PersonalNameSubjectList?, OtherID*, OtherAbstract*, KeywordList*,
SpaceFlightMission*, InvestigatorList?, GeneralNote*)>



<!ELEMENT       SupplMeshList (SupplMeshName+)>

<!ELEMENT       SupplMeshName (#PCDATA)>

<!ATTLIST       SupplMeshName Type (Disease | Protocol) #REQUIRED>



Sample XML:

<SupplMeshList>

<SupplMeshName Type="Disease">disease term</SupplMeshName>

<SupplMeshName Type="Protocol">protocol term</SupplMeshName>

</SupplMeshList>
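
For anyone consuming this outside Bio.Entrez, here is a minimal sketch of
reading the new element with the standard library. The helper name
group_suppl_mesh is hypothetical; the sample is the one from the
announcement.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SAMPLE = """<SupplMeshList>
<SupplMeshName Type="Disease">disease term</SupplMeshName>
<SupplMeshName Type="Protocol">protocol term</SupplMeshName>
</SupplMeshList>"""


def group_suppl_mesh(suppl_mesh_list):
    """Group SupplMeshName values by their required Type attribute."""
    grouped = defaultdict(list)
    for name in suppl_mesh_list.findall("SupplMeshName"):
        grouped[name.get("Type")].append((name.text or "").strip())
    return dict(grouped)


terms = group_suppl_mesh(ET.fromstring(SAMPLE))
print(terms)
```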


    1. Separating MeSH Geographic Descriptor Names from other MeSH
         Descriptors
         The Type attribute is added to the DescriptorName element.

         DTD:

<!ELEMENT       DescriptorName (#PCDATA)>

            <!ATTLIST       DescriptorName

            MajorTopicYN (Y | N) "N"

            Type (Geographic) #IMPLIED>



Sample XML:

<MeshHeadingList>

     <MeshHeading>

     <DescriptorName MajorTopicYN="N" Type="Geographic">New York</DescriptorName>

     </MeshHeading>

</MeshHeadingList>
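
A small sketch of how the new Type attribute might be used to separate
geographic descriptors from the rest. The second MeshHeading ("Pain") is a
made-up example added for contrast, and split_descriptors is a hypothetical
helper name.

```python
import xml.etree.ElementTree as ET

# The second heading is invented for illustration; only "New York" is from
# the announcement.
SAMPLE = """<MeshHeadingList>
<MeshHeading>
<DescriptorName MajorTopicYN="N" Type="Geographic">New York</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="Y">Pain</DescriptorName>
</MeshHeading>
</MeshHeadingList>"""


def split_descriptors(mesh_heading_list):
    """Return (geographic, other) descriptor names."""
    geographic, other = [], []
    for descriptor in mesh_heading_list.iter("DescriptorName"):
        # Type is #IMPLIED: absent on non-geographic descriptors.
        target = geographic if descriptor.get("Type") == "Geographic" else other
        target.append((descriptor.text or "").strip())
    return geographic, other


geographic, other = split_descriptors(ET.fromstring(SAMPLE))
print(geographic, other)
```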


    1. Accommodating Versioned Articles; Corresponding Change to PMID
         There is a new model of publishing referred to as ‘versioning’
         whereby multiple versions of the same online article are released,
         sometimes in quick succession and sometimes almost as soon as the
         original article has been published. Beginning in the 2011
         production year, NLM will create an individual citation for each
         article’s version and link the versions via new attributes for the
         MedlineCitation and PMID elements.

         The new attributes for the MedlineCitation element are VersionID
         and VersionDate.

         DTD:

<!ELEMENT       MedlineCitation (PMID, DateCreated, DateCompleted?,
DateRevised?, Article, MedlineJournalInfo, ChemicalList?,

SupplMeshList?, CitationSubset*, CommentsCorrectionsList?,
GeneSymbolList?, MeshHeadingList?, NumberOfReferences?,

PersonalNameSubjectList?, OtherID*, OtherAbstract*, KeywordList*,
SpaceFlightMission*, InvestigatorList?, GeneralNote*)>

<!ATTLIST     MedlineCitation

     Owner (NLM | NASA | PIP | KIE | HSR | HMD | NOTNLM) "NLM"

     Status (Completed | In-Process | PubMed-not-MEDLINE | In-Data-Review |
     Publisher | MEDLINE | OLDMEDLINE) #REQUIRED

     VersionID CDATA #IMPLIED

     VersionDate CDATA #IMPLIED>

The new attribute for the PMID element is Version.

DTD:

<!ELEMENT       PMID (#PCDATA)>

<!ATTLIST       PMID

     Version CDATA #REQUIRED>

Sample XML:

<MedlineCitation Status = "MEDLINE" Owner = "NLM" VersionID =
"PMC2781303.2" VersionDate = "20091207">

<PMID Version = "2">20029669</PMID>

Search and display implementation in PubMed is under consideration at this
time; all implementation decisions will be documented in a forthcoming NLM
Technical Bulletin <http://www.nlm.nih.gov/pubs/techbull/tb.html> article.
Details are:

     - The PMID combined with its Version attribute value (e.g., 1, 2, 3)
       becomes the citation’s new unique identifier, represented as
       <PMID Version="2">12345678</PMID>.
     - The PMID Version attribute value ‘1’ will be assigned to all existing
       records at the time the 2011 baseline files are produced and exported.
     - The PubDate value on citations of versions of the same article will be
       identical. The MedlineCitation VersionDate and VersionID attribute
       values supplied by the publisher will identify the specific version.
     - If a citation is not for PMID Version 1, it must contain
       MedlineCitation VersionDate and VersionID attribute values, and the
       original publication date for Version 1 as the PubDate.
     - A PMID Version attribute value higher than 1 indicates that there is a
       citation for at least one prior version (although it might happen,
       rarely, that a prior version subsequently gets deleted). Although the
       MedlineCitation VersionDate value may be different from the PubDate,
       it might be the same as PubDate if the new version was released later
       the same day.
     - In the future, when non-PubMed Central journals are included, the
       publisher-supplied VersionID value will be whatever the publisher
       decides it to be; e.g., the 2nd publisher-supplied VersionID may be
       ‘b’ or ‘2b’ and the PMID Version attribute value assigned by NLM will
       still be 2.
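
To make the versioning rules above concrete, here is a hedged sketch that
builds the (PMID, Version) identifier and checks the rule that citations
beyond Version 1 carry VersionID and VersionDate. Both helper names are
hypothetical, the closing tag is added so the sample parses, and the code
assumes the Version attribute is present, as the 2011 DTD requires.

```python
import xml.etree.ElementTree as ET

# Sample from the announcement, with a closing tag added so it parses.
SAMPLE = """<MedlineCitation Status="MEDLINE" Owner="NLM" VersionID="PMC2781303.2" VersionDate="20091207">
<PMID Version="2">20029669</PMID>
</MedlineCitation>"""


def citation_key(citation):
    """Return (pmid, version) as the unique identifier of a citation."""
    pmid = citation.find("PMID")
    return (pmid.text.strip(), int(pmid.get("Version")))


def has_required_version_attributes(citation):
    """Check that citations beyond Version 1 carry VersionID and VersionDate."""
    _, version = citation_key(citation)
    if version == 1:
        return True
    return (citation.get("VersionID") is not None
            and citation.get("VersionDate") is not None)


citation = ET.fromstring(SAMPLE)
key = citation_key(citation)
print(key, has_required_version_attributes(citation))
```

Using a tuple rather than a formatted string avoids inventing a textual
identifier format that NLM has not specified.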
    1. Eliminating Pre-defined Source Attribute Values for NameID Element
         The 2010 DTD specifies the values that may be used for the Source
         attribute of the NameID element; the 2011 DTD declares Source as
         CDATA, so no values are pre-defined. Please note that the NameID
         element has not yet been used; it is expected to be implemented at
         some point during 2011.

         DTD:

<!ELEMENT       NameID (#PCDATA)>

<!ATTLIST       NameID Source CDATA #REQUIRED>

Sample XML:

<NameID Source = "NCBI">


    1. Simplifying Author Element Structure
         The NameID element has been repositioned in the Author element to
         simplify the DTD structure. The XML is not affected.

         DTD:

<!ELEMENT       Author (((LastName, ForeName?, Initials?, Suffix? ) |
CollectiveName),NameID*)>


    1. Accommodating Identification of Machine-generated Keywords
         A new valid value, NLM-AUTO, is added to the Owner attribute of the
         KeywordList element.

         DTD:

<!ELEMENT       KeywordList (Keyword+)>

<!ATTLIST       KeywordList Owner (NLM | NLM-AUTO | NASA | PIP | KIE |
NOTNLM) "NLM">

Sample XML:

<KeywordList Owner ="NLM-AUTO">
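
A minimal sketch of telling machine-generated keywords apart from the
others. The stripped-down MedlineCitation wrapper and both keywords are
invented for illustration (the announcement only shows the opening tag), and
keywords_by_owner is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Invented fragment: not a complete MedlineCitation, just enough to parse.
SAMPLE = """<MedlineCitation>
<KeywordList Owner="NLM-AUTO"><Keyword>example keyword</Keyword></KeywordList>
<KeywordList Owner="NOTNLM"><Keyword>author keyword</Keyword></KeywordList>
</MedlineCitation>"""


def keywords_by_owner(citation):
    """Map each KeywordList Owner value to its list of keywords."""
    result = {}
    for keyword_list in citation.findall("KeywordList"):
        # ElementTree does not apply DTD defaults, so supply "NLM" by hand.
        owner = keyword_list.get("Owner", "NLM")
        result.setdefault(owner, []).extend(
            (k.text or "").strip() for k in keyword_list.findall("Keyword")
        )
    return result


keywords = keywords_by_owner(ET.fromstring(SAMPLE))
print(keywords.get("NLM-AUTO", []))
```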


   1. ENHANCED CHARACTER SET
      <http://www.nlm.nih.gov/databases/dtd/medline_characters.html>

A subset of UTF-8 characters is currently supported for PubMed data. PubMed
data now supports the full UTF-8 Character Set.

Exceptions:
All instances that represent a Double Quote will be translated to the
straight double quote (Unicode 0022).
All instances that represent a Single Quote (including the prime and
apostrophe) will be translated to the straight single quote (Unicode 0027).
Em Dash, En Dash, Hyphen, or Minus will be translated to the single dash
(Unicode 002D).
Those three Unicode values are part of the current Character Set.
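
If you need to apply the same folding to text from other sources (for
example, to compare it against PubMed data), something like the sketch below
may help. The list of source characters is my own illustrative guess; the
announcement only names the three target code points, so check the NLM
character page for the authoritative set.

```python
# Illustrative mapping only: the source characters are assumptions, the
# targets (U+0022, U+0027, U+002D) are the ones named in the announcement.
QUOTE_DASH_MAP = {
    "\u201c": "\u0022",  # left double quotation mark
    "\u201d": "\u0022",  # right double quotation mark
    "\u2018": "\u0027",  # left single quotation mark
    "\u2019": "\u0027",  # right single quotation mark / apostrophe
    "\u2032": "\u0027",  # prime
    "\u2014": "\u002d",  # em dash
    "\u2013": "\u002d",  # en dash
    "\u2010": "\u002d",  # hyphen
    "\u2212": "\u002d",  # minus sign
}
TABLE = str.maketrans(QUOTE_DASH_MAP)


def normalize_punctuation(text):
    """Fold typographic quotes and dashes to their straight ASCII forms."""
    return text.translate(TABLE)


normalized = normalize_punctuation("\u201cpain\u201d \u2013 it\u2019s")
print(normalized)
```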



_______________________________________________
Utilities-announce mailing list
http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce


