[Biopython] Entrez.parse error

Konrad Koehler konrad.koehler at mac.com
Tue Dec 20 04:43:38 UTC 2016


Then how does one parse the output? Entrez.parse used to work, but no longer does. Apparently NCBI has made changes to their XML that have broken Entrez.parse. Entrez.read returns a complex data structure that is difficult to parse.

If one adds "['PubmedArticle']" to line 302 of /Bio/Entrez/Parser.py so that it reads:

records = self.stack[0]['PubmedArticle']

this suppresses the error message, but it mysteriously returns only the strings "PubmedArticle" and "PubmedBookArticle" and not the citation. Any ideas?

Konrad
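
One possible reading of the two strings, offered as an assumption rather than a confirmed diagnosis: with the new DTD, Entrez.read appears to return a dictionary keyed by "PubmedArticle" and "PubmedBookArticle" instead of a list, and iterating over a dictionary yields only its keys. Below is a minimal sketch of pulling the citations out of that structure. The helper names iter_citations and fetch_records are hypothetical, and the sample dictionary is a hand-built stand-in with placeholder titles, not real Entrez output.

```python
def iter_citations(records):
    """Yield (pmid, title) pairs from a structure shaped like Entrez.read output.

    Assumes `records` is dict-like with a "PubmedArticle" list, which is how
    the PubMed efetch XML seems to be represented under the January 2017 DTD.
    """
    for article in records.get("PubmedArticle", []):
        citation = article["MedlineCitation"]
        yield str(citation["PMID"]), str(citation["Article"]["ArticleTitle"])


def fetch_records(pmids, email):
    """Fetch PubMed records over the network (needs Biopython and internet)."""
    from Bio import Entrez  # imported here so iter_citations works without Biopython

    Entrez.email = email
    handle = Entrez.efetch(db="pubmed", id=",".join(pmids), retmode="xml")
    try:
        return Entrez.read(handle)
    finally:
        handle.close()


# Hand-built stand-in for an Entrez.read result; the titles are placeholders.
sample = {
    "PubmedArticle": [
        {"MedlineCitation": {"PMID": "19304878",
                             "Article": {"ArticleTitle": "placeholder title A"}}},
        {"MedlineCitation": {"PMID": "14630660",
                             "Article": {"ArticleTitle": "placeholder title B"}}},
    ],
    "PubmedBookArticle": [],
}

# Iterating the top level gives only the key names, not the citations.
print(list(sample))

# Indexing by key first gives the actual records.
for pmid, title in iter_citations(sample):
    print(pmid, title)
```

Against the live server, something like iter_citations(fetch_records(["19304878", "14630660"], "you@example.org")) should behave the same way, with the e-mail address replaced by your own.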

> On 20 Dec 2016, at 05:16, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> 
> Entrez.read works for me for the example shown.
> 
> Best,
> -Michiel
> 
> 
> On Sunday, December 18, 2016 11:57 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> 
> 
> On Sun, Dec 18, 2016 at 2:50 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> > On Thu, Dec 15, 2016 at 7:37 PM, Konrad Koehler <konrad.koehler at mac.com> wrote:
> >> Hello everyone,
> >>
> >> I have been using Entrez.parse for years without any errors.  However just
> >> in the last day or two, it stopped working.  I have been able to reproduce
> >> the error using the following example from the biopython Package Entrez
> >> documentation:
> >>
> >
> > I can reproduce this. The XML looks sensible, two <PubmedArticle>
> > tags:
> >
> > <?xml version="1.0" ?>
> > <!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st January 2017//EN"
> > "https://dtd.nlm.nih.gov/ncbi/pubmed/out/pubmed_170101.dtd">
> > <PubmedArticleSet>
> > <PubmedArticle>
> >    <MedlineCitation Status="MEDLINE" Owner="NLM">
> >        <PMID Version="1">19304878</PMID>
> >        ...
> >    </MedlineCitation>
> >    <PubmedData>
> >        ...
> >    </PubmedData>
> > </PubmedArticle>
> > <PubmedArticle>
> >    <MedlineCitation Status="MEDLINE" Owner="NLM">
> >        <PMID Version="1">14630660</PMID>
> >        ...
> >    </MedlineCitation>
> >    <PubmedData>
> >        ...
> >    </PubmedData>
> > </PubmedArticle>
> > </PubmedArticleSet>
> >
> > Note however it is using a new DTD file for Jan 2017,
> >
> > https://dtd.nlm.nih.gov/ncbi/pubmed/out/pubmed_170101.dtd
> >
> >
> >> Does anyone have any suggestions on how to get Entrez.parse working again? I
> >> am also curious why this stopped working.  Has the NCBI server changed?
> >>
> >
> > I would guess that the NCBI changed something subtly. Michiel?
> >
> > Peter
> 
> Logged on GitHub,
> 
> https://github.com/biopython/biopython/issues/1027
> 
> 
> Peter
> 
> 
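
As a stopgap that does not depend on Entrez.parse at all, the XML quoted in this thread can be streamed one <PubmedArticle> at a time with the standard library. This is a swapped-in technique (xml.etree.ElementTree.iterparse), not Biopython's parser, and it ignores the DTD entirely. The function name iter_pmids is hypothetical, and the sample is a trimmed copy of the XML quoted in the thread with the elided parts and the DOCTYPE line left out.

```python
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the XML quoted in the thread (elided content omitted).
SAMPLE_XML = b"""<?xml version="1.0" ?>
<PubmedArticleSet>
<PubmedArticle>
    <MedlineCitation Status="MEDLINE" Owner="NLM">
        <PMID Version="1">19304878</PMID>
    </MedlineCitation>
</PubmedArticle>
<PubmedArticle>
    <MedlineCitation Status="MEDLINE" Owner="NLM">
        <PMID Version="1">14630660</PMID>
    </MedlineCitation>
</PubmedArticle>
</PubmedArticleSet>
"""


def iter_pmids(handle):
    """Yield the PMID of each <PubmedArticle>, one record at a time."""
    for _event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag == "PubmedArticle":
            yield elem.findtext("MedlineCitation/PMID")
            elem.clear()  # drop the finished record's children to save memory


pmids = list(iter_pmids(io.BytesIO(SAMPLE_XML)))
print(pmids)  # ['19304878', '14630660']
```

The same function should accept the handle returned by Entrez.efetch(..., retmode="xml"), though that has not been checked here.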
