[Biopython-dev] [Biopython - Bug #3430] (New) Error parsing PubMedCentral XML files

redmine at redmine.open-bio.org
Sat Apr 27 19:46:51 UTC 2013

Issue #3430 has been reported by Paulo Nuin.

Bug #3430: Error parsing PubMedCentral XML files

Author: Paulo Nuin
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 

There appears to be an error when parsing locally downloaded PubMedCentral XML files (.nxml extension). Running the following code

from Bio import Entrez
handle = open('nihms83342.nxml')
records = Entrez.parse(handle)
for record in records:
    print record

produces the following error (copied from IPython), even though the file's header contains the XML declaration:

NotXMLError                               Traceback (most recent call last)
<ipython-input-5-e78d8d3c3888> in <module>()
      2 handle = open('nihms83342.nxml')
      3 records = Entrez.parse(handle)
----> 4 for record in records:
      5     print record

/Library/Python/2.7/site-packages/Bio/Entrez/Parser.pyc in parse(self, handle)
    229                         # We did not see the initial <!xml declaration, so
    230                         # probably the input data is not in XML format.
--> 231                         raise NotXMLError("XML declaration not found")
    232                 self.parser.Parse("", True)
    233                 self.parser = None

NotXMLError: Failed to parse the XML data (XML declaration not found). Please make sure that the input data are in XML format.

The XML file in question is attached.
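
To double-check the claim that the file really starts with an XML declaration, the first bytes can be inspected directly, independently of Bio.Entrez. The following is only a minimal sketch: has_xml_declaration and its probe_size parameter are hypothetical names made up for this illustration, not part of Biopython, and the check is deliberately strict (apart from an optional UTF-8 byte order mark, nothing may precede the declaration).

```python
import codecs


def has_xml_declaration(path, probe_size=64):
    """Return True if the file at `path` starts with '<?xml'.

    Hypothetical diagnostic helper, not part of Biopython.
    A leading UTF-8 byte order mark is skipped; any other leading
    bytes (including whitespace) make the check fail.
    """
    # Read in binary mode so no decoding or newline translation happens.
    with open(path, "rb") as handle:
        head = handle.read(probe_size)
    # Skip a UTF-8 BOM if one is present.
    if head.startswith(codecs.BOM_UTF8):
        head = head[len(codecs.BOM_UTF8):]
    return head.startswith(b"<?xml")
```

Calling has_xml_declaration('nihms83342.nxml') on the attached file would show whether the declaration is present at the very start of the file. A True result would suggest the problem lies in how the parser detects the declaration rather than in the file itself.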

