[Biopython] [eFetch] doesn't work with NLMcatalog

c.buhtz at posteo.jp
Mon Dec 7 18:14:14 UTC 2015


Entrez.read fails while parsing the XML that efetch returns for the
nlmcatalog database. I don't know why, and I don't know how to diagnose
the problem further (e.g. by inspecting the handle).

The request URL is shown at the end of this message. Opening it directly
in a browser gives a sensible result.

>>> h = Entrez.efetch(db='nlmcatalog', id='7508686', retmode='xml')
>>> r = Entrez.read(h)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.4/dist-packages/Bio/Entrez/__init__.py", line 421, in read
    record = handler.read(handle)
  File "/usr/local/lib/python3.4/dist-packages/Bio/Entrez/Parser.py", line 215, in read
    self.parser.ParseFile(handle)
  File "../Modules/pyexpat.c", line 405, in StartElement
  File "/usr/local/lib/python3.4/dist-packages/Bio/Entrez/Parser.py", line 350, in startElementHandler
    raise ValidationError(name)
Bio.Entrez.Parser.ValidationError: Failed to find tag
'NLMCatalogRecordSet' in the DTD. To skip all tags that are not
represented in the DTD, please call Bio.Entrez.read or Bio.Entrez.parse
with validate=False.

>>> r = Entrez.read(h, validate=False)

Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/Bio/Entrez/Parser.py", line 215, in read
    self.parser.ParseFile(handle)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1,
column 5

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.4/dist-packages/Bio/Entrez/__init__.py", line 421, in read
    record = handler.read(handle)
  File "/usr/local/lib/python3.4/dist-packages/Bio/Entrez/Parser.py", line 225, in read
    raise NotXMLError(e)
Bio.Entrez.Parser.NotXMLError: Failed to parse the XML data (not
well-formed (invalid token): line 1, column 5). Please make sure that
the input data are in XML format.


>>> h.url
'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?email=x&tool=biopython&id=7508686&db=nlmcatalog&retmode=xml'
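
A note on the second traceback: the same handle `h` was passed to
Entrez.read twice. A handle is a stream, so the first (failed) call had
already read part of it, and the second call did not start at the
beginning of the document. That may be why the parser reported input that
is not well-formed, although this is not confirmed here. The sketch below
reproduces the effect with the standard library only. The sample XML and
the helper name `try_parse` are made up for illustration; only the root
tag is taken from the traceback above.

```python
import io
from xml.parsers import expat

# Tiny stand-in document (hypothetical content; only the root tag name
# 'NLMCatalogRecordSet' comes from the traceback in this message).
SAMPLE = (
    b'<?xml version="1.0"?>\n'
    b"<NLMCatalogRecordSet><NLMCatalogRecord/></NLMCatalogRecordSet>"
)


def try_parse(handle):
    """Parse a binary file-like object with expat.

    Returns None on success, or the expat error message on failure.
    """
    parser = expat.ParserCreate()
    try:
        parser.ParseFile(handle)
    except expat.ExpatError as err:
        return str(err)
    return None


# Simulate an earlier parse attempt that consumed the start of the stream.
stream = io.BytesIO(SAMPLE)
stream.read(10)
reused_result = try_parse(stream)  # parser now starts mid-document

# A fresh stream over the same bytes parses without error.
fresh_result = try_parse(io.BytesIO(SAMPLE))

print("reused handle:", reused_result)
print("fresh handle: ", fresh_result)
```

For diagnosis it can therefore help to call `h.read()` once, keep the
bytes, look at the first few hundred of them, and wrap them in a new
`io.BytesIO` for every parse attempt.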

If this is a bug, or something that will take a while to fix, is there a
quick workaround?
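
One possible stopgap, sketched here under assumptions rather than as a
confirmed fix, is to skip Bio.Entrez's DTD-based parser and read the raw
efetch response with xml.etree.ElementTree. The query parameters (db, id,
retmode, email, tool) are the ones visible in the URL above; the https
endpoint, the helper names `fetch_raw_xml` and `summarize`, and the
sample document are assumptions for illustration. The result is a plain
element tree, not the dictionary-like records that Entrez.read returns.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: same efetch endpoint as in the post, but over https.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def fetch_raw_xml(db, uid, email):
    """Download the efetch response as bytes (hypothetical helper)."""
    query = urllib.parse.urlencode(
        {"db": db, "id": uid, "retmode": "xml", "email": email, "tool": "biopython"}
    )
    with urllib.request.urlopen(EFETCH_URL + "?" + query) as response:
        return response.read()


def summarize(xml_bytes):
    """Return the root tag and the tags of its direct children."""
    root = ET.fromstring(xml_bytes)
    return root.tag, [child.tag for child in root]


# Offline stand-in so the parsing step can be tried without network access.
# Only the root tag is taken from the traceback; the rest is invented.
SAMPLE = (
    b'<?xml version="1.0"?>\n'
    b"<NLMCatalogRecordSet><NLMCatalogRecord/></NLMCatalogRecordSet>"
)

root_tag, child_tags = summarize(SAMPLE)
print(root_tag, child_tags)

# With network access, the record from this message could be fetched as:
#   raw = fetch_raw_xml("nlmcatalog", "7508686", "you@example.org")
#   print(summarize(raw))
```

From there, `root.iter()` or `root.find()` can be used to pull out
individual fields once the element names in the real response are known.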
-- 
GnuPGP-Key ID 0751A8EC

