[Biopython-dev] Fwd: [Utilities-announce] PubMed E-Utility 2011 DTD updates - please read!
Peter
biopython at maubp.freeserve.co.uk
Tue Sep 14 21:59:29 UTC 2010
Hi all,
It looks like there are two more DTD files (available now)
to add to Biopython for the Bio.Entrez parser.
Peter
---------- Forwarded message ----------
From: <utilities-announce at ncbi.nlm.nih.gov>
Date: Tue, Sep 14, 2010 at 9:24 PM
Subject: [Utilities-announce] PubMed E-Utility 2011 DTD updates - please read!
To: NLM/NCBI List utilities-announce <utilities-announce at ncbi.nlm.nih.gov>
Dear NCBI PubMed E-Utility Users,
We anticipate updating the PubMed E-Utility DTDs for 2011 in mid-December,
approximately on December 13, 2010.
The forthcoming DTDs are available from:
http://eutils.ncbi.nlm.nih.gov/corehtml/query/DTD/pubmed_110101.dtd
http://eutils.ncbi.nlm.nih.gov/corehtml/query/DTD/nlmmedlinecitationset_110101.dtd
1. DTD AND XML CHANGES FOR 2011
1. Changes to NLMMedlineCitationSet DTD AND PubMed XML
The DTD changes for the 2011 production year are itemized in the Revision
Notes section near the top of the DTD. The following describes the
substantive changes to the NLMMedlineCitationSet DTD and PubMed XML:
1. Accommodating Structured Abstracts<http://www.nlm.nih.gov/bsd/policy/structured_abstracts.html>
Two new attributes, Label and NlmCategory, are added to the AbstractText
element which is used with both the Abstract and OtherAbstract elements.
A valid label name found in published structured abstracts (e.g.,
Introduction, Goals, Study Design, Findings, Discussion) will be identified
in the XML as an Abstract Text Label and each ‘parent’ concept to which the
published Label name is mapped at NLM will be identified as an Abstract Text
NlmCategory. Five NLM-assigned mapped-to categories are possible:
Background, Objective, Methods, Results, and Conclusions. In general, the
lack of Label and NlmCategory attributes in AbstractText means the
published abstract is unstructured.
Note that the content of structured abstracts will be exported in separate
segments that need to be joined for display of the complete abstract text.
DTD:
<!ELEMENT AbstractText (#PCDATA)>
<!ATTLIST AbstractText
Label CDATA #IMPLIED
NlmCategory (UNLABELLED | BACKGROUND | OBJECTIVE | METHODS |
RESULTS | CONCLUSIONS) #IMPLIED>
In the following example, the published label names are INTRODUCTION; AIMS;
DESIGN, SETTING AND PARTICIPANTS; RESULTS; and DISCUSSION which
correspondingly map to the five NLM-assigned categories.
Sample XML:
<Abstract>
<AbstractText Label = "INTRODUCTION" NlmCategory = "BACKGROUND">
Physicians are often reluctant to prescribe strong opioids for
chronic non cancer pain (CNCP). No study has qualitatively examined
physicians' beliefs about …</AbstractText>
<AbstractText Label = "AIMS" NlmCategory = "OBJECTIVE">To describe
physicians' attitudes and experience of prescribing opioids for
CNCP to PWHSA.</AbstractText>
<AbstractText Label = "DESIGN, SETTING AND PARTICIPANTS" NlmCategory =
"METHODS">Nineteen individual interviews and two focus
groups were conducted with GPs, Addiction Specialists, Pain
Specialists and Rheumatologists.</AbstractText>
<AbstractText Label = "RESULTS" NlmCategory = "RESULTS">Physicians
were "reluctant" to prescribe opioids to PWHSA experiencing
CNCP for fear of addiction, misuse or diversion of medications. Many
exhibited "distrust"…</AbstractText>
<AbstractText Label = "DISCUSSION" NlmCategory = "CONCLUSIONS">
Applying the chronic disease model to comorbid addiction and CNCP
would ensure a health and social care system that makes it difficult
to stigmatise patients…</AbstractText>
</Abstract>
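For anyone updating a parser, here is a minimal sketch of joining the separately exported segments back into one display string. It uses only the Python standard library; the helper name join_abstract and the "Label: text" layout are illustrative assumptions, not part of Bio.Entrez or any NCBI tool.

```python
import xml.etree.ElementTree as ET

# Two segments taken from the sample XML above.
SAMPLE = (
    '<Abstract>'
    '<AbstractText Label="AIMS" NlmCategory="OBJECTIVE">To describe '
    "physicians' attitudes and experience of prescribing opioids for "
    'CNCP to PWHSA.</AbstractText>'
    '<AbstractText Label="DESIGN, SETTING AND PARTICIPANTS" '
    'NlmCategory="METHODS">Nineteen individual interviews and two focus '
    'groups were conducted with GPs, Addiction Specialists, Pain '
    'Specialists and Rheumatologists.</AbstractText>'
    '</Abstract>'
)


def join_abstract(abstract_elem):
    """Join AbstractText segments into one string (hypothetical helper)."""
    parts = []
    for segment in abstract_elem.findall("AbstractText"):
        text = "".join(segment.itertext()).strip()
        label = (segment.get("Label") or "").strip()
        # Unstructured abstracts carry no Label; emit the text alone.
        parts.append(f"{label}: {text}" if label else text)
    return "\n".join(parts)


joined = join_abstract(ET.fromstring(SAMPLE))
print(joined)
```

An abstract without Label attributes passes through unchanged, so the same helper can serve both structured and unstructured records.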
1. Implementing Protocol Class 2 Supplementary Concept Record (SCR) and
Rare Disease Class 3 SCR terms
A new element, SupplMeshList, is added to the MedlineCitation element and
another new element, SupplMeshName with its attribute Type, is added to
SupplMeshList.
DTD:
<!ELEMENT MedlineCitation (PMID, DateCreated, DateCompleted?,
DateRevised?, Article, MedlineJournalInfo, ChemicalList?,
SupplMeshList?, CitationSubset*, CommentsCorrectionsList?,
GeneSymbolList?, MeshHeadingList?, NumberOfReferences?,
PersonalNameSubjectList?, OtherID*, OtherAbstract*, KeywordList*,
SpaceFlightMission*, InvestigatorList?, GeneralNote*)>
<!ELEMENT SupplMeshList (SupplMeshName+)>
<!ELEMENT SupplMeshName (#PCDATA)>
<!ATTLIST SupplMeshName Type (Disease | Protocol) #REQUIRED>
Sample XML:
<SupplMeshList>
<SupplMeshName Type="Disease">disease term</SupplMeshName>
<SupplMeshName Type="Protocol">protocol term</SupplMeshName>
</SupplMeshList>
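A short sketch of how a client might read the new list, grouping SupplMeshName values by their required Type attribute. The function name group_suppl_mesh is hypothetical, and the code assumes the element names shown in the DTD fragment above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SAMPLE = (
    '<SupplMeshList>'
    '<SupplMeshName Type="Disease">disease term</SupplMeshName>'
    '<SupplMeshName Type="Protocol">protocol term</SupplMeshName>'
    '</SupplMeshList>'
)


def group_suppl_mesh(list_elem):
    """Map each Type value to its list of names (hypothetical helper)."""
    grouped = defaultdict(list)
    for name in list_elem.findall("SupplMeshName"):
        # Type is #REQUIRED in the DTD, so get() should not return None
        # for valid records.
        grouped[name.get("Type")].append((name.text or "").strip())
    return dict(grouped)


groups = group_suppl_mesh(ET.fromstring(SAMPLE))
print(groups)
```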
1. Separating MeSH Geographic Descriptor Names from other MeSH
Descriptors
The Type attribute is added to the DescriptorName element.
DTD:
<!ELEMENT DescriptorName (#PCDATA)>
<!ATTLIST DescriptorName
MajorTopicYN (Y | N) "N"
Type (Geographic) #IMPLIED>
Sample XML:
<MeshHeadingList>
<MeshHeading>
<DescriptorName MajorTopicYN="N" Type="Geographic">New
York</DescriptorName>
</MeshHeading>
</MeshHeadingList>
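As a rough illustration, geographic descriptors can be separated from the rest by checking the new Type attribute. The second MeshHeading in the input ("Pain") is made up for the example, and split_descriptors is a hypothetical name.

```python
import xml.etree.ElementTree as ET

SAMPLE = (
    '<MeshHeadingList>'
    '<MeshHeading>'
    '<DescriptorName MajorTopicYN="N" Type="Geographic">New York</DescriptorName>'
    '</MeshHeading>'
    '<MeshHeading>'
    # Invented non-geographic descriptor, for illustration only.
    '<DescriptorName MajorTopicYN="Y">Pain</DescriptorName>'
    '</MeshHeading>'
    '</MeshHeadingList>'
)


def split_descriptors(mesh_list):
    """Return (geographic, other) descriptor names (hypothetical helper)."""
    geographic, other = [], []
    for desc in mesh_list.iter("DescriptorName"):
        # Type is #IMPLIED, so it is simply absent on other descriptors.
        target = geographic if desc.get("Type") == "Geographic" else other
        target.append((desc.text or "").strip())
    return geographic, other


geo, rest = split_descriptors(ET.fromstring(SAMPLE))
print(geo, rest)
```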
1. Accommodating Versioned Articles; Corresponding Change to PMID
There is a new model of publishing referred to as ‘versioning’ whereby
multiple versions of the same online article are released, sometimes in
quick succession and sometimes almost as soon as the original article has
been published. Beginning in the 2011 production year, NLM will create an
individual citation for each article’s version and link the versions via
new attributes for the MedlineCitation and PMID elements.
The new attributes for the MedlineCitation element are VersionID
and VersionDate.
DTD:
<!ELEMENT MedlineCitation (PMID, DateCreated, DateCompleted?,
DateRevised?, Article, MedlineJournalInfo, ChemicalList?,
SupplMeshList?, CitationSubset*, CommentsCorrectionsList?,
GeneSymbolList?, MeshHeadingList?, NumberOfReferences?,
PersonalNameSubjectList?, OtherID*, OtherAbstract*, KeywordList*,
SpaceFlightMission*, InvestigatorList?, GeneralNote*)>
<!ATTLIST MedlineCitation
Owner (NLM | NASA | PIP | KIE | HSR | HMD | NOTNLM) "NLM"
Status (Completed | In-Process | PubMed-not-MEDLINE | In-Data-Review
|Publisher | MEDLINE | OLDMEDLINE) #REQUIRED
VersionID CDATA #IMPLIED
VersionDate CDATA #IMPLIED>
The new attribute for the PMID element is Version.
DTD:
<!ELEMENT PMID (#PCDATA)>
<!ATTLIST PMID
Version CDATA #REQUIRED>
Sample XML:
<MedlineCitation Status = "MEDLINE" Owner ="NLM" VersionID =
"PMC2781303.2" VersionDate = "20091207">
<PMID Version = "2">20029669</PMID>
Search and display implementation in PubMed is under consideration at this
time; all implementation decisions will be documented in a forthcoming NLM
Technical Bulletin <http://www.nlm.nih.gov/pubs/techbull/tb.html> article.
Details are:
- The PMID combined with its Version attribute value (e.g., 1, 2, 3)
becomes the citation’s new unique identifier, represented as
<PMID Version="2">12345678</PMID>.
- The PMID Version attribute value ‘1’ will be assigned to all existing
records at the time the 2011 baseline files are produced and exported.
- The PubDate value on citations of versions of the same article will be
identical. The MedlineCitation VersionDate and VersionID attribute values
supplied by the publisher will identify the specific version.
- If a citation is not for PMID Version 1, it must contain MedlineCitation
VersionDate and VersionID attribute values, and the original publication
date for Version 1 as the PubDate.
- A PMID Version attribute value higher than 1 indicates that there is a
citation for at least one prior version (although it might happen, rarely,
that a prior version subsequently gets deleted). Although the
MedlineCitation VersionDate value may be different from the PubDate, it
might be the same as PubDate if the new version was released later the
same day.
- In the future, when non-PubMed Central journals are included, the
publisher-supplied VersionID value will be whatever the publisher decides
it to be; e.g. the 2nd publisher-supplied VersionID may be ‘b’ or ‘2b’ and
the PMID Version attribute value assigned by NLM will still be 2.
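Since the PMID and its Version together become the unique identifier, code that keys records on PMID alone may need adjusting. The sketch below builds a (PMID, version) key; citation_key is a hypothetical helper, and falling back to version 1 when the attribute is missing is an assumption made here for pre-2011 XML, not something the announcement specifies.

```python
import xml.etree.ElementTree as ET

SAMPLE = (
    '<MedlineCitation Status="MEDLINE" Owner="NLM" '
    'VersionID="PMC2781303.2" VersionDate="20091207">'
    '<PMID Version="2">20029669</PMID>'
    '</MedlineCitation>'
)


def citation_key(citation_elem):
    """Return (pmid, version) as the record key (hypothetical helper)."""
    pmid = citation_elem.find("PMID")
    # Assumption: treat a missing Version attribute as version 1.
    version = int(pmid.get("Version", "1"))
    return (pmid.text.strip(), version)


key = citation_key(ET.fromstring(SAMPLE))
print(key)
```

The version is kept as an integer because the announcement says NLM assigns it numerically (1, 2, 3) regardless of what the publisher puts in VersionID.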
1. Eliminating Pre-defined Source Attribute Values for NameID Element
The 2010 DTD specifies the values that may be used for the Source
attribute for the NameID element; the 2011 DTD drops that pre-defined list
and declares Source as CDATA. Please note that the NameID element has not
yet been used; it is expected to be implemented at some point during 2011.
DTD:
<!ELEMENT NameID (#PCDATA)>
<!ATTLIST NameID Source CDATA #REQUIRED>
Sample XML:
<NameID Source = "NCBI">
1. Simplifying Author Element Structure
The NameID element has been repositioned in the Author element to
simplify the DTD structure. The XML is not affected.
DTD:
<!ELEMENT Author (((LastName, ForeName?, Initials?, Suffix? ) |
CollectiveName),NameID*)>
1. Accommodating Identification of Machine-generated Keywords. A new
valid value, NLM-AUTO, is added to the Owner attribute of the element
KeywordList.
DTD:
<!ELEMENT KeywordList (Keyword+)>
<!ATTLIST KeywordList Owner (NLM | NLM-AUTO | NASA | PIP | KIE |
NOTNLM) "NLM">
Sample XML:
<KeywordList Owner ="NLM-AUTO">
1. ENHANCED CHARACTER SET<http://www.nlm.nih.gov/databases/dtd/medline_characters.html>
Until now, only a subset of UTF-8 characters has been supported for PubMed
data. PubMed data now supports the full UTF-8 Character Set.
Exceptions:
All instances that represent a Double Quote will be translated to the
straight double quote (Unicode 0022).
All instances that represent a Single Quote (including the prime and
apostrophe) will be translated to the straight single quote (Unicode 0027).
Em Dash, En Dash, Hyphen, or Minus will be translated to the single dash
(Unicode 002D).
Those three Unicode values are part of the current Character Set.
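To compare locally held text against PubMed data, a client could apply the same kind of folding. This is only a sketch: the code points listed are an illustrative subset chosen here, not NLM's actual mapping table, and normalize_punctuation is a hypothetical name.

```python
# Assumption: a small, illustrative set of code points; NLM's real list of
# characters treated as quotes or dashes is not given in the announcement.
TRANSLATION = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark / apostrophe
    "\u2032": "'",  # prime
    "\u2014": "-",  # em dash
    "\u2013": "-",  # en dash
    "\u2010": "-",  # hyphen
    "\u2212": "-",  # minus sign
})


def normalize_punctuation(text):
    """Fold quote and dash variants to U+0022, U+0027 and U+002D."""
    return text.translate(TRANSLATION)


normalized = normalize_punctuation("\u2018versioning\u2019 \u2013 \u201cnew\u201d")
print(normalized)
```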
_______________________________________________
Utilities-announce mailing list
http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce