[Biopython] Parsing Pubmed-Entrez searches into a normalized relational resource

Christopher Walentas cwalentas at gmail.com
Thu Sep 16 04:36:13 UTC 2010


Apologies in advance: all of this is very new to me, and I hope that 
this is the proper forum for this query.

What I would like to do is parse the results of an Entrez PubMed search 
into their smallest unique, useful pieces and build a relational database 
from them (sqlite, dee?).  Ideally this would cover not only the returned 
fields but also finer detail within them, such as affiliations, 
addresses, etc.
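
To make the goal concrete, here is a rough sketch of the kind of 
normalized layout I have in mind, using only the sqlite3 module from the 
Python standard library. The table names, column names, the 
get_or_create and store_citation helpers, and the sample rows are all 
made up for illustration; nothing here comes from Biopython or from real 
PubMed data.

```python
import sqlite3

# Hypothetical normalized schema: each author and affiliation is stored once
# and linked to articles through the authorship table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS article (
    pmid  TEXT PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS author (
    id        INTEGER PRIMARY KEY,
    last_name TEXT NOT NULL,
    fore_name TEXT NOT NULL,
    UNIQUE (last_name, fore_name)
);
CREATE TABLE IF NOT EXISTS affiliation (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS authorship (
    pmid           TEXT    NOT NULL REFERENCES article(pmid),
    author_id      INTEGER NOT NULL REFERENCES author(id),
    affiliation_id INTEGER REFERENCES affiliation(id),
    position       INTEGER NOT NULL,
    PRIMARY KEY (pmid, position)
);
"""


def get_or_create(conn, table, columns, values):
    """Return the id of the matching row, inserting it first if absent.

    table and columns are fixed strings from this script, never user input.
    """
    where = " AND ".join(f"{c} = ?" for c in columns)
    row = conn.execute(f"SELECT id FROM {table} WHERE {where}", values).fetchone()
    if row:
        return row[0]
    marks = ", ".join("?" for _ in columns)
    cur = conn.execute(
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({marks})", values
    )
    return cur.lastrowid


def store_citation(conn, pmid, title, authors):
    """authors is a list of (last_name, fore_name, affiliation_or_None)."""
    conn.execute(
        "INSERT OR IGNORE INTO article (pmid, title) VALUES (?, ?)", (pmid, title)
    )
    for position, (last, fore, affil) in enumerate(authors, start=1):
        author_id = get_or_create(
            conn, "author", ("last_name", "fore_name"), (last, fore)
        )
        affil_id = (
            get_or_create(conn, "affiliation", ("name",), (affil,)) if affil else None
        )
        conn.execute(
            "INSERT OR REPLACE INTO authorship VALUES (?, ?, ?, ?)",
            (pmid, author_id, affil_id, position),
        )
    conn.commit()


# Made-up example rows; a file path instead of ":memory:" makes it persistent.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
store_citation(conn, "1", "First made-up title",
               [("Doe", "Jane", "Example University"),
                ("Roe", "Rich", "Example University")])
store_citation(conn, "2", "Second made-up title",
               [("Doe", "Jane", "Example University")])
```

The point of the split is that "Jane Doe" and "Example University" each 
end up as a single row, however many citations mention them.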

I believe that I've mastered the search and download functions, and the 
individual citations now exist as a nested dictionary built from the XML 
output.
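
One way I could imagine discovering the structure of such a dictionary 
is to walk it recursively and list every leaf together with its path. 
The sketch below assumes the parsed record behaves like ordinary Python 
dicts and lists; the walk function is my own invention, and the sample 
record is made up, with key names that only imitate PubMed XML element 
names.

```python
def walk(node, path=()):
    """Yield (path, value) for every leaf in nested dicts and lists."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from walk(child, path + (key,))
    elif isinstance(node, (list, tuple)):
        for index, child in enumerate(node):
            yield from walk(child, path + (index,))
    else:
        yield path, node


# Made-up record; the real parsed output will have many more fields.
sample = {
    "MedlineCitation": {
        "PMID": "1",
        "Article": {
            "ArticleTitle": "A made-up title",
            "AuthorList": [
                {"LastName": "Doe", "ForeName": "Jane"},
                {"LastName": "Roe", "ForeName": "Rich"},
            ],
        },
    }
}

leaves = list(walk(sample))
for path, value in leaves:
    print("/".join(str(part) for part in path), "=", value)
```

Each distinct path (ignoring the list indexes) would then be a candidate 
column, and each list a candidate for its own table.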

Where I am falling down is in understanding how to extract the structure 
of these outputs and create a persistent, normalized relational resource, 
so that its fields can be mapped to, and used to "correct", values in an 
uncurated dataset with highly analogous fields.
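
By "correct" I mean something like the following sketch, which maps a 
messy value onto the closest curated one using difflib from the standard 
library. The normalise and build_corrector helpers, the 0.8 cutoff, and 
the institution names are all assumptions made up for this example; a 
real dataset would probably need a more careful matching rule.

```python
import difflib


def normalise(text):
    """Lower-case, drop commas and full stops, and collapse whitespace."""
    return " ".join(text.lower().replace(",", " ").replace(".", " ").split())


def build_corrector(canonical_values, cutoff=0.8):
    """Return a function mapping a raw string to its closest canonical value.

    The cutoff is an arbitrary starting point, not a recommended setting.
    """
    lookup = {normalise(value): value for value in canonical_values}
    keys = list(lookup)

    def correct(raw):
        matches = difflib.get_close_matches(
            normalise(raw), keys, n=1, cutoff=cutoff
        )
        # None signals "no confident match"; such values need manual review.
        return lookup[matches[0]] if matches else None

    return correct


# Made-up curated values, standing in for a column of the normalized database.
correct = build_corrector(["Example University", "Sample Institute of Technology"])
print(correct("example univercity"))
print(correct("Unrelated Hospital"))
```

Returning None rather than a best guess keeps uncertain values out of 
the corrected dataset until someone has looked at them.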

I've been struggling to bridge the gap between Python and sqlite/dee, 
but have recently been told that it might be possible to do everything 
within Python itself.  Again, apologies for any naivety; it is indeed 
sincere.  I'm well aware that a little knowledge can be dangerous, hence 
reaching out.

From what I've already read, it would seem that all of this is ideally 
suited to Python and Biopython, and I am looking forward to learning.  
I'm just looking for that swift shove in the right direction, and to 
benefit from your collective informed guidance.

Cheers in advance,
christopher


