[Biopython] PubmedCentral XML parsing

Peter Cock p.j.a.cock at googlemail.com
Thu Apr 25 19:05:32 UTC 2013


On Thu, Apr 25, 2013 at 7:42 PM, Paulo Nuin <nuin at genedrift.org> wrote:
> Hi
>
> What would be the most direct way of parsing XML files downloaded from
> PubmedCentral ftp using BioPython?  These are files that use the
> archivearticle.dtd and when parsed using non-DTD based code generate broken
> paragraphs on the body of the document due to < > between <p> items of the
> body.
>
> Thanks in advance
>
> Paulo

The Bio.Entrez parser is DTD-based and might suit your needs.
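
Something along these lines might be a starting point. It is only a minimal sketch: the function name read_pmc_article and the file name in the usage note are made up for illustration, and I have not checked that the parser copes with every archivearticle.dtd file or keeps mixed text inside <p> elements the way you need.

```python
def read_pmc_article(path):
    """Parse one downloaded PMC XML file with the DTD-driven Bio.Entrez parser.

    Hypothetical helper for illustration; returns whatever Entrez.read builds.
    """
    # Imported inside the function so this sketch loads without Biopython.
    from Bio import Entrez

    # Recent Biopython releases expect the handle to be opened in binary mode.
    with open(path, "rb") as handle:
        return Entrez.read(handle)
```

You would call it as record = read_pmc_article("article.nxml") and then inspect the nested result. If the DTD named in the file is not among those shipped with Biopython, the parser may try to fetch it from the web, so expect that to need network access.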

Peter


