[Biopython-dev] [Biopython (old issues only) - Bug #3430] (Resolved) Error parsing PubMedCentral XML files

redmine at redmine.open-bio.org redmine at redmine.open-bio.org
Fri Jul 22 19:51:10 UTC 2016

Issue #3430 has been updated by Travis Wrightsman.

Status changed from New to Resolved
% Done changed from 0 to 100

NXML file was improperly formatted.

Bug #3430: Error parsing PubMedCentral XML files

* Author: Paulo Nuin
* Status: Resolved
* Priority: Normal
* Assignee: Biopython Dev Mailing List
* Category: Main Distribution
* Target version: 
* URL: 

There appears to be an error parsing locally downloaded PubMedCentral XML files (extension .nxml). Using the code

from Bio import Entrez
handle = open('nihms83342.nxml')
records = Entrez.parse(handle)
for record in records:
    print record

the following error (copied from IPython) occurs, even though the file's header contains an XML declaration:

NotXMLError                               Traceback (most recent call last)
<ipython-input-5-e78d8d3c3888> in <module>()
      2 handle = open('nihms83342.nxml')
      3 records = Entrez.parse(handle)
----> 4 for record in records:
      5     print record

/Library/Python/2.7/site-packages/Bio/Entrez/Parser.pyc in parse(self, handle)
    229                         # We did not see the initial <!xml declaration, so
    230                         # probably the input data is not in XML format.
--> 231                         raise NotXMLError("XML declaration not found")
    232                 self.parser.Parse("", True)
    233                 self.parser = None

NotXMLError: Failed to parse the XML data (XML declaration not found). Please make sure that the input data are in XML format.

The XML file in question is attached.

nihms83342.nxml (74.9 KB)
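
Given the resolution note that the NXML file was improperly formatted, one quick check before calling Entrez.parse is whether the file really begins with an XML declaration. The XML specification requires the declaration, when present, to sit at the very start of the document, so leading whitespace or stray bytes ahead of it are one possible cause of an "XML declaration not found" error. The sketch below is only an illustration: the helper names are hypothetical (they are not part of Biopython), it assumes a UTF-8 or ASCII-compatible encoding, and the issue does not say what exactly was wrong with the attached file.

```python
import codecs


def starts_with_xml_declaration(data: bytes) -> bool:
    """Return True if the bytes begin with '<?xml', optionally after a UTF-8 BOM.

    Assumption: the file is UTF-8 or ASCII-compatible; UTF-16 input is not handled.
    """
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.startswith(b"<?xml")


def file_starts_with_xml_declaration(path: str) -> bool:
    """Hypothetical helper: read the first few bytes of a file and check them."""
    with open(path, "rb") as handle:
        return starts_with_xml_declaration(handle.read(16))
```

For example, file_starts_with_xml_declaration('nihms83342.nxml') could be called before handing the file to the parser. A False result would suggest inspecting the first bytes of the file by hand.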
