[Biopython] need help! how to retrieve full text from PubMed Central?

Michiel de Hoon mjldehoon at yahoo.com
Mon Jan 4 15:15:57 UTC 2010



--- On Mon, 1/4/10, Brad Chapman <chapmanb at 50mail.com> wrote:

> Following your example, doing:
> 
> from Bio import Entrez
> Entrez.email = 'yours at blah.com'
> handle = Entrez.efetch(db='pmc', id=2747014,
> rettype='full', retmode='xml')
> handle.read()
> 
> gives back the full XML text, as you wanted. Your next
> step, calling
> Entrez.read, asks Biopython to parse this into a record
> object.
> There isn't support in Biopython for this currently, 

This *is* supported by Biopython. In principle, Bio.Entrez can parse any XML generated by NCBI Entrez as long as the corresponding DTDs are available. In this case, the DTD included in Biopython 1.53 is corrupted, which causes the error. Unfortunately, the correct DTD depends on a large number of other DTDs, so replacing just that one DTD is not sufficient.
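
Until the DTD problem is sorted out, one possible workaround is to skip Entrez.read and hand the raw XML to a generic XML parser. The following is only a rough sketch, not something that ships with Biopython: the helper names fetch_pmc_xml and article_title are made up for illustration, and it assumes the PMC record contains an <article-title> element and no entities that need the DTD to resolve.

```python
import xml.etree.ElementTree as ET


def fetch_pmc_xml(pmc_id, email):
    """Return the raw XML of one PMC article (hypothetical helper)."""
    # Imported here so the parsing helper below also works without Biopython.
    from Bio import Entrez

    Entrez.email = email
    handle = Entrez.efetch(db="pmc", id=str(pmc_id),
                           rettype="full", retmode="xml")
    try:
        return handle.read()
    finally:
        handle.close()


def article_title(xml_text):
    """Return the text of the first <article-title>, or None if absent.

    Assumes the element name used in the PMC XML; no DTD is consulted.
    """
    root = ET.fromstring(xml_text)
    node = root.find(".//article-title")
    if node is None:
        return None
    # itertext() also collects text nested in inline markup such as <italic>.
    return "".join(node.itertext()).strip()
```

With network access, article_title(fetch_pmc_xml(2747014, 'yours at blah.com')) would be the intended usage. This bypasses the DTD-driven record structure that Entrez.read builds, so you get a plain element tree rather than a Biopython record object.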

Hmm... maybe we should think of a more robust way of getting the DTDs without relying on their inclusion in the Biopython distribution ...
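
To make that idea a bit more concrete, here is a minimal sketch of fetching a DTD on demand and caching it locally. This is not how Bio.Entrez works; the function name local_dtd_path, the cache directory, and the injectable fetch argument are all assumptions for illustration, and a real solution would also have to deal with the DTDs that a given DTD pulls in.

```python
import os
import urllib.request


def local_dtd_path(url, cache_dir, fetch=None):
    """Return a local file path for the DTD at `url`.

    The DTD is downloaded only if it is not already in `cache_dir`.
    `fetch` is a hypothetical hook (url -> bytes) so the download step
    can be replaced, e.g. in tests or behind a proxy.
    """
    if fetch is None:
        def fetch(u):
            with urllib.request.urlopen(u) as response:
                return response.read()

    os.makedirs(cache_dir, exist_ok=True)
    # Use the last path component of the URL as the cached file name.
    path = os.path.join(cache_dir, os.path.basename(url))
    if not os.path.exists(path):
        data = fetch(url)
        with open(path, "wb") as out:
            out.write(data)
    return path
```

The parser would then look for a DTD in the distribution first and fall back to something like this when the file is missing, so a new or corrected DTD at NCBI would not require a new Biopython release.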

--Michiel.


      


