[Biopython] Entrez module and obtaining organism metadata

Kirk Vander Meulen kvander11 at gmail.com
Tue Jul 11 14:56:02 UTC 2017


I am trying to extract metadata, specifically temperature range and oxygen
requirement, for a large list of organisms. I am having trouble doing this
by querying the genome database with either efetch or esummary: whichever
route I take, I seem to end up with the "docsum" version of the data. As an
example, take genome ID 13296. If I search the genome database on NCBI's
website, I get a results page whose URL ends with ".../genome/?term=13296",
which often contains the metadata I want. But when I use the Entrez module,
I get results corresponding to ".../genome/?term=13296&report=docsum",
which contains only bare-bones information.

Here is an example of what I have tried and the result:

>>> from Bio import Entrez
>>> record = Entrez.esummary(db="genome", id="13296")
>>> Entrez.read(record)
[{'Number_of_Organelles': '0', 'Number_of_Plasmids': '0', 'Create_Date':
'2012/03/29 00:00', 'ProjectID': '157423', 'Assembly_Accession':
'GCA_000284295.1', 'Number_of_Chromosomes': '1', 'Options': '', 'DefLine':
'Actinoplanes missouriensis overview', u'Item': [], 'Organism_Kingdom':
'Bacteria', 'Assembly_Name': 'ASM28429v1', 'AssemblyID': '408138',
'Organism_Name': 'Actinoplanes missouriensis', u'Id': '13296'}]
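
For reference, the attempt above could be written as a self-contained
script along the following lines. This is only a sketch: the function
names fetch_genome_docsum and summarize_docsum are hypothetical helpers,
the email address is a placeholder, and the sample record is copied from
the output shown above rather than fetched live. It illustrates that the
docsum record carries assembly and organism identifiers but none of the
temperature or oxygen fields being asked about.

```python
def fetch_genome_docsum(genome_id, email):
    """Fetch the esummary (docsum) records for one genome ID.

    Requires Biopython and network access. The import is done here so the
    offline helper below can be used without either.
    """
    from Bio import Entrez

    Entrez.email = email  # placeholder address supplied by the caller
    handle = Entrez.esummary(db="genome", id=str(genome_id))
    try:
        return Entrez.read(handle)
    finally:
        handle.close()


def summarize_docsum(docsum):
    """Pick a few identifying fields out of one docsum record (a dict)."""
    wanted = ("Organism_Name", "Organism_Kingdom",
              "Assembly_Accession", "ProjectID")
    return {key: docsum.get(key) for key in wanted}


# Sample record copied from the output quoted in this message.
sample = {
    "Number_of_Organelles": "0",
    "Number_of_Plasmids": "0",
    "Create_Date": "2012/03/29 00:00",
    "ProjectID": "157423",
    "Assembly_Accession": "GCA_000284295.1",
    "Number_of_Chromosomes": "1",
    "Options": "",
    "DefLine": "Actinoplanes missouriensis overview",
    "Item": [],
    "Organism_Kingdom": "Bacteria",
    "Assembly_Name": "ASM28429v1",
    "AssemblyID": "408138",
    "Organism_Name": "Actinoplanes missouriensis",
    "Id": "13296",
}

if __name__ == "__main__":
    # Offline: summarize the quoted record. For a live query, call
    # fetch_genome_docsum("13296", "you@example.org") instead.
    print(summarize_docsum(sample))
```

The identifiers returned here (ProjectID, Assembly_Accession) are the
only obvious handles the docsum offers for looking further; whether any
of them leads to the desired metadata is exactly the open question.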


I have tried setting rettype="full", retmode="text" and the like, and also
using efetch instead of esummary, but either way I get only this limited
set of data. Is there a different approach I should be taking here? Thank
you very much for any pointers.

Kirk