[BioPython] Bio.EUtils, MultiDict: getting all the authors?

Peter biopython at maubp.freeserve.co.uk
Tue Aug 12 22:19:52 UTC 2008


> I would suggest you try Bio.Entrez.efetch() to get the data as XML,
> and the Bio.Entrez.read() function to parse the XML.  You'll get a
> nested structure of python dictionaries and lists.  See "Chapter 7" of
> the Tutorial,
> http://www.biopython.org/DIST/docs/tutorial/Tutorial.html
>
> Was there any particular piece of information you wanted to extract?

Assuming it was just the author list, something like this might suit you:

from Bio import Entrez

# Comma-separated PubMed IDs to fetch in a single request
PMIDs = "17447753,17447754"
handle = Entrez.efetch(db="pubmed", id=PMIDs, retmode="XML")
# Parse the XML into nested Python dictionaries and lists
records = Entrez.read(handle)
handle.close()
for record in records:
    article = record['MedlineCitation']['Article']
    print(article['ArticleTitle'])
    # Each author is a dictionary keyed by the XML element names
    for author_dict in article['AuthorList']:
        print(" - %(ForeName)s %(LastName)s" % author_dict)

And the output:

Analysis of HIV wild-type and mutant structures via in silico docking
against diverse ligand libraries.
 - Max W Chang
 - William Lindstrom
 - Arthur J Olson
 - Richard K Belew
Synthesis and spectroscopic characterization of copper(II)-nitrito
complexes with hydrotris(pyrazolyl)borate and related coligands.
 - Nicolai Lehnert
 - Ursula Cornelissen
 - Frank Neese
 - Tetsuya Ono
 - Yuki Noguchi
 - Ken-Ichi Okamoto
 - Kiyoshi Fujisawa

And done.  The authors' initials are also included in the dictionary
(but not printed).
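
If you would rather print the initials, something along these lines
might do.  This is only a sketch: it assumes each author dictionary has
an 'Initials' key next to 'ForeName' and 'LastName', the sample
dictionaries are hand-written stand-ins rather than real parser output,
and format_author is a hypothetical helper name of my own.

```python
# Hand-written stand-ins for the author dictionaries; real ones would come
# from record['MedlineCitation']['Article']['AuthorList'].
authors = [
    {"LastName": "Chang", "ForeName": "Max W", "Initials": "MW"},
    {"LastName": "Lindstrom", "ForeName": "William", "Initials": "W"},
]

def format_author(author_dict):
    """Return 'LastName Initials', or just the last name if no initials."""
    # 'Initials' is assumed to be optional, so fall back to an empty string.
    initials = author_dict.get("Initials", "")
    return ("%s %s" % (author_dict["LastName"], initials)).strip()

for author_dict in authors:
    print(" - " + format_author(author_dict))
```

Using .get() for the initials means an entry without them still prints
rather than raising a KeyError.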

If you are familiar with the XML DTD, working out where the data you
want lives is much easier!  As you desired, the Bio.Entrez parser does stay
close to the DTDs - both a blessing and a curse.
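
To illustrate the "close to the DTD" point: the chain of element names
becomes a chain of dictionary keys and list indexes.  Here is a rough
sketch using a plain hand-written dictionary as a stand-in for a parsed
record (so it runs without a network connection).  The dig helper and
the sample values are my own invention for illustration, not part of
Bio.Entrez.

```python
# Hand-written stand-in mirroring the nesting used in the example above:
# MedlineCitation -> Article -> ArticleTitle / AuthorList.
record = {
    "MedlineCitation": {
        "Article": {
            "ArticleTitle": "Example title",
            "AuthorList": [
                {"LastName": "Chang", "ForeName": "Max W"},
            ],
        }
    }
}

def dig(node, path, default=None):
    """Follow a sequence of keys/indexes; return default if a step is missing."""
    for key in path:
        try:
            node = node[key]
        except (KeyError, IndexError, TypeError):
            return default
    return node

# Each path reads like the element nesting in the XML.
print(dig(record, ["MedlineCitation", "Article", "ArticleTitle"]))
print(dig(record, ["MedlineCitation", "Article", "AuthorList", 0, "LastName"]))
# Optional elements may be absent, hence the default.
print(dig(record, ["MedlineCitation", "Article", "Abstract"], default="(none)"))
```

The curse half is that optional elements are simply missing from the
dictionaries, so some guard like the default above is worth having.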

Peter


