[Biopython] Pubmeddata XML parsing with Entrez .fetch and .read

Peter biopython at maubp.freeserve.co.uk
Wed Jul 14 21:18:34 UTC 2010


On Wed, Jul 14, 2010 at 9:48 PM, Guy Eakin <guyeakin at gmail.com> wrote:
> I am using Bio.Entrez.read to parse XML returned from pubmed.
>
> This results in a dictionary in which one of the keys is ArticleIdList,
> e.g.
> 'PubmedData': {u'ArticleIdList': ['S0735-6757(09)00464-1',
> '10.1016/j.ajem.2009.09.013', '20579576'], blah: blah, etc.}
>
> In the original XML, each <ArticleId> in <ArticleIdList> contains an IdType
> attribute that names the ID, for example
> <ArticleId IdType="doi">10.1016/j.ajem.2009.09.013</ArticleId>
>
> The IdType is useful, but I can't find it in the Bio.Entrez.read output, so
> I have no *easy* way of determining whether the ID is a
> pii, pmc, pmid, etc.
>
> Is there a better way to get the IdType attribute, or other XML tag
> attributes, from the Entrez.read output?
>

Hi,

This information is in the tutorial, but could perhaps be clearer.
The IDs might look like plain strings, but each one is in fact an
instance of a string subclass with an attributes property (a dictionary
holding the XML tag attributes), e.g.

from Bio import Entrez

handle = Entrez.efetch(db="pubmed", id="19304878", rettype="medline",
                       retmode="xml")
r = Entrez.read(handle)
handle.close()
# Each ID prints like a plain string...
print(r[0]['PubmedData']['ArticleIdList'][1])
# ...but it also carries the XML tag attributes as a dictionary.
print(r[0]['PubmedData']['ArticleIdList'][1].attributes)
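
To answer the original question more directly, here is a rough sketch of
collecting the IDs by type. So that it runs offline, it uses a small
stand-in class (TaggedString, a hypothetical name) rather than a live
Entrez query; the only thing it relies on is the attributes dictionary
described above. The helper name ids_by_type is also made up, and the
"pii" and "pubmed" labels are assumed for illustration (only the "doi"
one appears in the quoted XML).

```python
class TaggedString(str):
    """Hypothetical stand-in for the string subclass returned by Entrez.read."""

    def __new__(cls, value, attributes=None):
        obj = super().__new__(cls, value)
        # Dictionary of XML tag attributes, e.g. {"IdType": "doi"}
        obj.attributes = dict(attributes or {})
        return obj


def ids_by_type(article_ids):
    """Map each IdType attribute value to the plain ID string."""
    return {
        str(item.attributes.get("IdType", "unknown")): str(item)
        for item in article_ids
    }


# IDs taken from the question; the pii/pubmed type labels are assumptions.
article_id_list = [
    TaggedString("S0735-6757(09)00464-1", {"IdType": "pii"}),
    TaggedString("10.1016/j.ajem.2009.09.013", {"IdType": "doi"}),
    TaggedString("20579576", {"IdType": "pubmed"}),
]

ids = ids_by_type(article_id_list)
print(ids["doi"])
```

With a real record, the same helper should work on
r[0]['PubmedData']['ArticleIdList'], assuming each element has the
attributes dictionary shown above.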

Michiel - maybe we need to override the __repr__ method so it shows
this information?
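
For the sake of discussion, such an override could look roughly like
this. It is only a sketch on a stand-in class (AttrString, a
hypothetical name), not the actual Bio.Entrez code, and the output
format is just one possibility.

```python
class AttrString(str):
    """Hypothetical string subclass carrying an attributes dictionary."""

    def __new__(cls, value, attributes=None):
        obj = super().__new__(cls, value)
        obj.attributes = dict(attributes or {})
        return obj

    def __repr__(self):
        text = str.__repr__(self)
        # Fall back to the ordinary string repr when there is nothing to show.
        if not self.attributes:
            return text
        return "%s(%s, attributes=%r)" % (
            type(self).__name__, text, self.attributes)


doi = AttrString("10.1016/j.ajem.2009.09.013", {"IdType": "doi"})
print(repr(doi))
print(doi)  # str() and print still give the bare ID
```

That way the attributes would be visible at the interactive prompt,
while code that treats the value as a plain string is unaffected.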

Peter


