[Biopython] PubmedCentral XML parsing

Paulo Nuin nuin at genedrift.org
Thu Apr 25 18:42:07 UTC 2013


Hi

What would be the most direct way to parse XML files downloaded from the PubMed Central FTP site using Biopython? These files use archivearticle.dtd, and when they are parsed with code that is not DTD-aware, the paragraphs in the document body come out broken because of the < > between <p> items of the body.
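
A minimal sketch of the symptom, assuming the breakage comes from inline elements nested inside <p>. The sample fragment and both function names are hypothetical, the element names (italic, xref) are only illustrative, and this uses the standard library's xml.etree.ElementTree rather than any Biopython parser:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment shaped like an article body (not from a real PMC file).
SAMPLE = (
    "<article><body>"
    "<p>Expression of <italic>BRCA1</italic> was reduced "
    "<xref ref-type=\"bibr\" rid=\"B1\">[1]</xref> in all samples.</p>"
    "</body></article>"
)


def naive_paragraph_texts(xml_string):
    """Read only p.text, which stops at the first nested element."""
    root = ET.fromstring(xml_string)
    return [p.text or "" for p in root.iter("p")]


def paragraph_texts(xml_string):
    """Join all text nodes under each <p>, including nested elements and tails."""
    root = ET.fromstring(xml_string)
    return ["".join(p.itertext()) for p in root.iter("p")]


if __name__ == "__main__":
    print(naive_paragraph_texts(SAMPLE))  # truncated paragraph
    print(paragraph_texts(SAMPLE))        # full paragraph text
```

Reading only the text up to the first child element truncates the paragraph, while walking every text node under <p> keeps it whole. Whether this matches the actual files would still need to be checked against a real download.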

Thanks in advance

Paulo 


