[Biopython] Entrez.parse error

Peter Cock p.j.a.cock at googlemail.com
Thu Dec 22 03:47:55 UTC 2016


On Wed, Dec 21, 2016 at 7:47 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> In what sense is the current result from Entrez.read more difficult to parse
> than the previous result from Entrez.parse?
> As far as I can tell, Entrez.read and Entrez.parse are both working
> correctly.
> Best,
> -Michiel

In this example we expected a list-like structure with an
entry for each record requested (here two), allowing
iteration over these records with Entrez.parse as in the
original example:

from Bio import Entrez
Entrez.email = "Your.Name.Here@example.org"
handle = Entrez.efetch("pubmed", id="19304878,14630660", retmode="xml")
records = Entrez.parse(handle)
for record in records:
    print(record['MedlineCitation']['Article']['ArticleTitle'])

That no longer works. It seems the Entrez parsing code no
longer treats what the NCBI returns as list-like, so
Entrez.parse rejects it and says to use Entrez.read to load
everything at once.
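
As a possible stopgap, here is a minimal sketch of a helper that
accepts either result shape. It assumes (not confirmed in this
thread) that Entrez.read now returns a dictionary-like object keyed
by element name, such as 'PubmedArticle'. The helper name
iter_pubmed_articles and the sample data are hypothetical:

```python
def iter_pubmed_articles(parsed):
    """Yield article records from either assumed result shape."""
    if isinstance(parsed, dict):
        # Assumed newer shape: a mapping keyed by element name.
        for key in ("PubmedArticle", "PubmedBookArticle"):
            yield from parsed.get(key, [])
    else:
        # Older shape: a list-like object with one entry per record.
        yield from parsed


# Plain Python stand-ins for the two shapes (made-up title, not NCBI data).
old_style = [{"MedlineCitation": {"Article": {"ArticleTitle": "First"}}}]
new_style = {"PubmedArticle": old_style, "PubmedBookArticle": []}

titles_old = [r["MedlineCitation"]["Article"]["ArticleTitle"]
              for r in iter_pubmed_articles(old_style)]
titles_new = [r["MedlineCitation"]["Article"]["ArticleTitle"]
              for r in iter_pubmed_articles(new_style)]
print(titles_old, titles_new)
```

With a live query this would be used as
iter_pubmed_articles(Entrez.read(handle)) in place of
Entrez.parse(handle), at the cost of loading everything at once.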

The same code works perfectly with our Tests/Entrez/pubmed2.xml
example file (also two PubMed articles), and at first glance
the XML structure is the same (other than the DTD update).

The DTD definition of the top-level XML element has changed
slightly. It used to be:

<!ELEMENT PubmedArticleSet (PubmedArticle | PubmedBookArticle)+>

Now with pubmed_170101.dtd the set can also end with a deletion:

<!ELEMENT PubmedArticleSet ((PubmedArticle | PubmedBookArticle)+,
DeleteCitation?) >
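
To illustrate the new content model, here is a small sketch using
only the standard library's xml.etree.ElementTree rather than the
Bio.Entrez parser. The sample XML is made up and heavily trimmed
(it is not real NCBI output), and split_article_set is a
hypothetical helper name:

```python
import xml.etree.ElementTree as ET

# Hypothetical document shaped like the new content model: one or more
# articles, followed by an optional DeleteCitation element.
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle><MedlineCitation><PMID>19304878</PMID></MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><PMID>14630660</PMID></MedlineCitation></PubmedArticle>
  <DeleteCitation><PMID>1</PMID></DeleteCitation>
</PubmedArticleSet>"""


def split_article_set(xml_text):
    """Return (article elements, list of deleted PMIDs as strings)."""
    root = ET.fromstring(xml_text)
    # Direct children that are articles of either kind.
    articles = [child for child in root
                if child.tag in ("PubmedArticle", "PubmedBookArticle")]
    # PMIDs listed under the optional trailing DeleteCitation element.
    deleted = [pmid.text for pmid in root.findall("DeleteCitation/PMID")]
    return articles, deleted


articles, deleted = split_article_set(SAMPLE)
print(len(articles), deleted)
```

The point is only that the children of PubmedArticleSet are no
longer guaranteed to be all of one repeating kind, which may be why
the result is no longer treated as a simple list.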

I remain puzzled about what exactly has changed here.

Peter


