[Biopython] Entrez.parse error
Michiel de Hoon
mjldehoon at yahoo.com
Wed Dec 21 07:47:52 UTC 2016
In what sense is the current result from Entrez.read more difficult to parse than the previous result from Entrez.parse?
As far as I can tell, Entrez.read and Entrez.parse are both working correctly.
Best,
-Michiel
On Tuesday, December 20, 2016 1:43 PM, Konrad Koehler <konrad.koehler at mac.com> wrote:
Then how does one parse the output? Entrez.parse used to work, but no longer does. Apparently NCBI has made changes to their XML that have broken Entrez.parse. Entrez.read returns a complex data structure that is difficult to parse.
If one adds "['PubmedArticle']" to line 302 of /Bio/Entrez/Parse.py so that it reads:
    records = self.stack[0]['PubmedArticle']
this suppresses the error message, but it mysteriously returns only the strings "PubmedArticle" and "PubmedBookArticle" and not the citation. Any ideas?
Konrad
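
The strings "PubmedArticle" and "PubmedBookArticle" described above are what one would see when iterating over a dictionary keyed by those two names, since iterating a Python dict yields its keys. A minimal sketch of walking such a structure follows. It assumes the result of Entrez.read is dict-like with a "PubmedArticle" list whose entries expose MedlineCitation, PMID and Article/ArticleTitle. The `records` value below is a hand-built stand-in (not real NCBI output), the titles are placeholders, and `iter_pubmed_articles` is a hypothetical helper, not part of Biopython.

```python
def iter_pubmed_articles(records):
    """Yield (pmid, title) for each entry under the 'PubmedArticle' key."""
    for article in records["PubmedArticle"]:
        citation = article["MedlineCitation"]
        yield str(citation["PMID"]), str(citation["Article"]["ArticleTitle"])


# Stand-in for the value returned by Entrez.read(handle); the layout is an
# assumption and the titles are placeholders, not the real article titles.
records = {
    "PubmedArticle": [
        {"MedlineCitation": {"PMID": "19304878",
                             "Article": {"ArticleTitle": "placeholder title A"}}},
        {"MedlineCitation": {"PMID": "14630660",
                             "Article": {"ArticleTitle": "placeholder title B"}}},
    ],
    "PubmedBookArticle": [],
}

# Iterating the top level gives only the dictionary keys.
top_level = list(records)
print(top_level)

# Indexing by key first reaches the individual citations.
for pmid, title in iter_pubmed_articles(records):
    print(pmid, title)
```

With a live query, the same helper would be applied to the object returned by Entrez.read rather than to the stand-in dictionary.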
On 20 Dec 2016, at 05:16, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
Entrez.read works for me for the example shown.
Best,
-Michiel
On Sunday, December 18, 2016 11:57 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
On Sun, Dec 18, 2016 at 2:50 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Thu, Dec 15, 2016 at 7:37 PM, Konrad Koehler <konrad.koehler at mac.com> wrote:
>> Hello everyone,
>>
>> I have been using Entrez.parse for years without any errors. However just
>> in the last day or two, it stopped working. I have been able to reproduce
>> the error using the following example from the biopython Package Entrez
>> documentation:
>>
>
> I can reproduce this. The XML looks sensible, two <PubmedArticle>
> tags:
>
> <?xml version="1.0" ?>
> <!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st
> January 2017//EN"
> "https://dtd.nlm.nih.gov/ncbi/pubmed/out/pubmed_170101.dtd">
> <PubmedArticleSet>
> <PubmedArticle>
> <MedlineCitation Status="MEDLINE" Owner="NLM">
> <PMID Version="1">19304878</PMID>
> ...
> </MedlineCitation>
> <PubmedData>
> ...
> </PubmedData>
> </PubmedArticle>
> <PubmedArticle>
> <MedlineCitation Status="MEDLINE" Owner="NLM">
> <PMID Version="1">14630660</PMID>
> ...
> </MedlineCitation>
> <PubmedData>
> ...
> </PubmedData>
> </PubmedArticle>
> </PubmedArticleSet>
>
> Note however it is using a new DTD file for Jan 2017,
>
> https://dtd.nlm.nih.gov/ncbi/pubmed/out/pubmed_170101.dtd
>
>
>> Does anyone have any suggestions on how to get Entrez.parse working again? I
>> am also curious why this stopped working. Has the NCBI server changed?
>>
>
> I would guess that the NCBI changed something subtly. Michiel?
>
> Peter
Logged on GitHub,
https://github.com/biopython/biopython/issues/1027
Peter