[Biopython] Pubmeddata XML parsing with Entrez .fetch and .read

Guy Eakin guyeakin at gmail.com
Wed Jul 14 20:48:41 UTC 2010


I am using Bio.Entrez.read to parse XML returned from pubmed.

This results in a dictionary in which the PubmedData entry contains an
ArticleIdList key, for example:

'PubmedData': {u'ArticleIdList': ['S0735-6757(09)00464-1',
'10.1016/j.ajem.2009.09.013', '20579576'], ...}

In the original XML, each <ArticleId> in <ArticleIdList> has an IdType
attribute that names the kind of ID, for example:
<ArticleId IdType="doi">10.1016/j.ajem.2009.09.013</ArticleId>
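Because IdType is stored on the XML element itself, one workaround is to parse the raw efetch XML with the Python standard library instead of Bio.Entrez.read. Below is a minimal sketch using xml.etree.ElementTree. It assumes the record XML is already available as a string. The sample is a hand-trimmed fragment built from the IDs above, not real NCBI output, and the "pii" and "pubmed" IdType values in it are assumed.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment shaped like PubMed efetch XML (not real NCBI output);
# the IdType values for the pii and pubmed IDs are assumed.
SAMPLE_XML = """
<PubmedArticleSet>
  <PubmedArticle>
    <PubmedData>
      <ArticleIdList>
        <ArticleId IdType="pii">S0735-6757(09)00464-1</ArticleId>
        <ArticleId IdType="doi">10.1016/j.ajem.2009.09.013</ArticleId>
        <ArticleId IdType="pubmed">20579576</ArticleId>
      </ArticleIdList>
    </PubmedData>
  </PubmedArticle>
</PubmedArticleSet>
"""


def article_ids_from_xml(xml_text):
    """Return one {IdType: id} dict per PubmedArticle in the XML text."""
    root = ET.fromstring(xml_text.strip())
    results = []
    for article in root.iter("PubmedArticle"):
        ids = {}
        # Each ArticleId carries its kind in the IdType attribute
        for article_id in article.iterfind("PubmedData/ArticleIdList/ArticleId"):
            ids[article_id.get("IdType")] = article_id.text
        results.append(ids)
    return results


for ids in article_ids_from_xml(SAMPLE_XML):
    print(ids)
```

The path passed to iterfind only matches the article's own ArticleIdList directly under PubmedData, not any ID lists nested deeper in the record.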

The IdType is useful, but I can't find it in the Bio.Entrez.read output, so
I have no *easy* way of telling whether a given ID is a PII, a PMC ID, a
PMID, etc.

Is there a better way to get the IdType attribute, or other XML tag
attributes, from the Entrez.read output?
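The values that Bio.Entrez.read returns are not plain strings. Each one is a string subclass (StringElement in Bio.Entrez.Parser) that keeps the element's XML attributes in an `.attributes` dictionary, so the type should be reachable as `article_id.attributes["IdType"]`. Below is a minimal sketch. The `ids_by_type` helper is hypothetical. So is the `FakeStringElement` stand-in, which exists only so the example runs without Biopython or network access.

```python
def ids_by_type(article_id_list):
    """Map IdType -> ID string for one ArticleIdList from Entrez.read.

    Relies on each item exposing its XML attributes through an
    `.attributes` dict, as Bio.Entrez's StringElement does.
    """
    return {str(item.attributes["IdType"]): str(item) for item in article_id_list}


# Hypothetical stand-in for Bio.Entrez.Parser.StringElement so this runs offline
class FakeStringElement(str):
    def __new__(cls, value, attributes):
        obj = super().__new__(cls, value)
        obj.attributes = attributes
        return obj


article_ids = [
    FakeStringElement("S0735-6757(09)00464-1", {"IdType": "pii"}),
    FakeStringElement("10.1016/j.ajem.2009.09.013", {"IdType": "doi"}),
    FakeStringElement("20579576", {"IdType": "pubmed"}),
]
print(ids_by_type(article_ids))
```

With real Entrez.read output, the call would look like `ids_by_type(record["PubmedData"]["ArticleIdList"])`, where `record` is one parsed PubMed article.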

Thanks.
Guy

Code below:

---------------
from Bio import Medline
from Bio import Entrez
import routine_pubmed_query_terms as pubmedterms  # separate .py file holding query terms, email address, etc.

# `program` is assumed to be defined earlier (e.g. passed in as a function argument)
s = Entrez.read(Entrez.esearch(db="pubmed",
                               term=pubmedterms.entrezquery(program),
                               retmax=pubmedterms.maxlimit,
                               usehistory="y",
                               reldate=pubmedterms.datelimit,
                               datetype="edat"))
print("found %s records, returning %s" % (int(s["Count"]),
                                          len(s["IdList"])))

# Fetch the full records as XML, reusing the search stored on the NCBI history server
r = Entrez.read(Entrez.efetch(db="pubmed", retmode="xml",
                              rettype="medline", webenv=s["WebEnv"],
                              query_key=s["QueryKey"]))
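Applied to the `r` returned by the efetch call, the same `.attributes` lookup can pull out the DOI of each record. This sketch assumes the top level of `r` is either a list of PubmedArticle records (older Biopython releases) or a dict with a "PubmedArticle" key (newer ones). `dois_from_efetch` is a hypothetical helper, and the `_Tagged` class and the nested sample data only mimic the shape of Entrez.read output.

```python
def dois_from_efetch(parsed):
    """Return {PMID: DOI or None} for a parsed PubMed efetch result."""
    # Newer Biopython wraps the articles in a dict; older versions return a list
    articles = parsed["PubmedArticle"] if isinstance(parsed, dict) else parsed
    dois = {}
    for article in articles:
        pmid = str(article["MedlineCitation"]["PMID"])
        doi = None
        for article_id in article["PubmedData"]["ArticleIdList"]:
            # Each ID keeps its XML attributes, including IdType
            if article_id.attributes.get("IdType") == "doi":
                doi = str(article_id)
                break
        dois[pmid] = doi
    return dois


class _Tagged(str):
    """Hypothetical stand-in for Bio.Entrez's StringElement (str + .attributes)."""

    def __new__(cls, value, **attributes):
        obj = super().__new__(cls, value)
        obj.attributes = attributes
        return obj


sample = {"PubmedArticle": [{
    "MedlineCitation": {"PMID": _Tagged("20579576")},
    "PubmedData": {"ArticleIdList": [
        _Tagged("S0735-6757(09)00464-1", IdType="pii"),
        _Tagged("10.1016/j.ajem.2009.09.013", IdType="doi"),
        _Tagged("20579576", IdType="pubmed"),
    ]},
}]}
print(dois_from_efetch(sample))
```

With live data, the call would be `dois_from_efetch(r)`. Records without a DOI map to None, so they are not silently dropped.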


