[Biopython] Pubmeddata XML parsing with Entrez .fetch and .read
Guy Eakin
guyeakin at gmail.com
Wed Jul 14 20:48:41 UTC 2010
I am using Bio.Entrez.read to parse XML returned from PubMed.
This results in a dictionary in which one of the keys is ArticleIdList,
e.g.,
'PubmedData': {u'ArticleIdList': ['S0735-6757(09)00464-1',
'10.1016/j.ajem.2009.09.013', '20579576'], blah: blah, etc.}
In the original XML, each <ArticleId> in <ArticleIdList> carries an IdType
attribute that names the kind of ID, for example:
<ArticleId IdType="doi">10.1016/j.ajem.2009.09.013</ArticleId>
The IdType is useful, but I can't find it in the Bio.Entrez.read output, so
I have no *easy* way of determining whether a given ID is a
pii, pmc, pmid, etc.
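To illustrate what I mean by the attribute, here is a minimal sketch that reads it straight from the raw XML with Python's standard library rather than with Bio.Entrez. The XML fragment is hand-written to resemble the record above; only the "doi" IdType is taken from my actual data, and the "pii" and "pubmed" values (as well as the helper name article_ids_by_type) are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment modelled on the PubMed XML described above.
# Only the "doi" IdType comes from the real record; "pii" and "pubmed"
# are assumed values for the other two IDs.
SAMPLE_XML = """
<PubmedData>
  <ArticleIdList>
    <ArticleId IdType="pii">S0735-6757(09)00464-1</ArticleId>
    <ArticleId IdType="doi">10.1016/j.ajem.2009.09.013</ArticleId>
    <ArticleId IdType="pubmed">20579576</ArticleId>
  </ArticleIdList>
</PubmedData>
"""


def article_ids_by_type(xml_text):
    """Return a dict mapping each IdType attribute value to its ID text."""
    root = ET.fromstring(xml_text)
    ids = {}
    for node in root.iter("ArticleId"):
        # node.get() reads an XML attribute; node.text is the element body.
        ids[node.get("IdType")] = node.text
    return ids


ids = article_ids_by_type(SAMPLE_XML)
print(ids["doi"])
```

This works, but it means parsing the XML a second time outside of Entrez.read, which is what I would like to avoid.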
Is there a better way to get the IdType attribute, or other XML tag
attributes, from the Entrez.read output?
Thanks.
Guy
Code below
---------------
from Bio import Medline
from Bio import Entrez
# routine_pubmed_query_terms is a separate .py file that I use to hold
# query terms, email address, etc.
import routine_pubmed_query_terms as pubmedterms

# Search PubMed and keep the result set on the NCBI history server.
# ("program" is not shown in this excerpt.)
s = Entrez.read(Entrez.esearch(db="pubmed",
                               term=pubmedterms.entrezquery(program),
                               retmax=pubmedterms.maxlimit,
                               usehistory="y",
                               reldate=pubmedterms.datelimit,
                               datetype="edat"))
print("found %s records, returning %s" % (int(s["Count"]),
                                          len(s["IdList"])))

# Fetch the matching records as XML via the stored history and parse them.
r = Entrez.read(Entrez.efetch(db="pubmed", retmode="xml",
                              rettype="medline", webenv=s["WebEnv"],
                              query_key=s["QueryKey"]))