[Biopython-dev] [Biopython (old issues only) - Bug #3430] (Resolved) Error parsing PubMedCentral XML files
redmine at redmine.open-bio.org
Fri Jul 22 19:51:10 UTC 2016
Issue #3430 has been updated by Travis Wrightsman.
Status changed from New to Resolved
% Done changed from 0 to 100
NXML file was improperly formatted.
----------------------------------------
Bug #3430: Error parsing PubMedCentral XML files
https://redmine.open-bio.org/issues/3430#change-15300
* Author: Paulo Nuin
* Status: Resolved
* Priority: Normal
* Assignee: Biopython Dev Mailing List
* Category: Main Distribution
* Target version:
* URL:
----------------------------------------
Parsing locally downloaded PubMedCentral XML files (extension .nxml) with Bio.Entrez appears to fail. Using the code
@
from Bio import Entrez
handle = open('nihms83342.nxml')
records = Entrez.parse(handle)
for record in records:
    print record
@
the following error occurs (copied from IPython), even though the XML header contains the declaration
---------------------------------------------------------------------------
NotXMLError                               Traceback (most recent call last)
<ipython-input-5-e78d8d3c3888> in <module>()
      2 handle = open('nihms83342.nxml')
      3 records = Entrez.parse(handle)
----> 4 for record in records:
      5     print record

/Library/Python/2.7/site-packages/Bio/Entrez/Parser.pyc in parse(self, handle)
    229     # We did not see the initial <!xml declaration, so
    230     # probably the input data is not in XML format.
--> 231     raise NotXMLError("XML declaration not found")
    232 self.parser.Parse("", True)
    233 self.parser = None

NotXMLError: Failed to parse the XML data (XML declaration not found). Please make sure that the input data are in XML format.
The XML file in question is attached.
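Since the error message points at the XML declaration, a quick pre-check of the file's first bytes may help narrow things down before calling Entrez.parse. The sketch below is only a rough illustration: the helper names has_xml_declaration and file_has_xml_declaration are hypothetical, it is not how Bio.Entrez itself detects the declaration, and it assumes a UTF-8 (or ASCII-compatible) encoded file.

```python
import codecs
import re

# An XML declaration must be the very first thing in the document:
# "<?xml" followed by whitespace (so "<?xml-stylesheet" does not count).
_XML_DECL = re.compile(br"<\?xml\s")


def has_xml_declaration(data):
    """Return True if the byte string starts with an XML declaration.

    Hypothetical helper; assumes UTF-8 or ASCII-compatible input.
    UTF-16 encoded files are not handled here.
    """
    # Skip a UTF-8 byte order mark, if present.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return _XML_DECL.match(data) is not None


def file_has_xml_declaration(path):
    """Check only the first bytes of the file at the given path."""
    with open(path, "rb") as handle:
        return has_xml_declaration(handle.read(64))
```

For the file in this report, the check would be called as file_has_xml_declaration('nihms83342.nxml'). Leading whitespace or any other content before the declaration makes the check return False, which matches the XML rule that the declaration, when present, must come first.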
---Files--------------------------------
nihms83342.nxml (74.9 KB)