[Biopython] Entrez module and obtaining organism metadata

Peter Cock p.j.a.cock at googlemail.com
Tue Jul 11 15:52:14 UTC 2017


Hi Kirk,

Unfortunately, based on the NCBI documentation, it appears
that efetch on the genome database supports only the docsum
and uilist return types:

https://www.ncbi.nlm.nih.gov/books/NBK25499/table/chapter4.T._valid_values_of__retmode_and/
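
To make that concrete, here is a minimal sketch of the efetch request this implies, using only the Python standard library to build the URL. The helper name genome_efetch_url and the GENOME_RETTYPES tuple are made up for illustration, and the allowed values are simply my reading of the table linked above, so treat them as assumptions. With Biopython, the same parameters would be passed as keyword arguments to Entrez.efetch.

```python
from urllib.parse import urlencode

# Public base URL of the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

# Assumption: the only rettype values the NCBI table lists for the
# genome database, as read in this reply.
GENOME_RETTYPES = ("docsum", "uilist")


def genome_efetch_url(genome_id, rettype="docsum"):
    """Build an efetch URL for one genome ID (hypothetical helper)."""
    if rettype not in GENOME_RETTYPES:
        # This guard belongs to the sketch; it is not NCBI's own behaviour.
        raise ValueError(
            "rettype %r is not listed for the genome database; "
            "expected one of %r" % (rettype, GENOME_RETTYPES)
        )
    params = {
        "db": "genome",
        "id": str(genome_id),
        "rettype": rettype,
        "retmode": "xml",
    }
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


# The genome ID from the quoted message.
print(genome_efetch_url(13296))
```

Note that the ValueError for something like rettype="full" is only a check in this sketch. Going by your report, the NCBI server itself appears to fall back to the docsum record rather than returning an error.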

Unless someone in the Biopython community has tried something
similar, you might be best off contacting the NCBI Entrez team
directly to ask. Please do let us know if you solve it.

Good luck,

Peter

On Tue, Jul 11, 2017 at 3:56 PM, Kirk Vander Meulen <kvander11 at gmail.com> wrote:
> I am trying to extract metadata, specifically things like temperature range
> and oxygen requirement status, for a large list of organisms. I am having
> trouble accomplishing this by searching the genome database using either
> efetch or esummary. No matter the route I take, it appears I end up with the
> "docsum" version of data. As an example, I am searching for the genome id#
> 13296. If I search the genome database on NCBI's website, I get a results
> page corresponding to a page ending with ".../genome/?term=13296", which
> often contains my desired metadata. But when using the Entrez module, I get
> results corresponding to ".../genome/?term=13296&report=docsum", which
> contains a pretty bare-bones level of information.
>
> Here is an example of what I have tried and the result:
>
> >>> from Bio import Entrez
> >>> record = Entrez.esummary(db="genome", id="13296")
> >>> Entrez.read(record)
> [{'Number_of_Organelles': '0', 'Number_of_Plasmids': '0', 'Create_Date':
> '2012/03/29 00:00', 'ProjectID': '157423', 'Assembly_Accession':
> 'GCA_000284295.1', 'Number_of_Chromosomes': '1', 'Options': '', 'DefLine':
> 'Actinoplanes missouriensis overview', u'Item': [], 'Organism_Kingdom':
> 'Bacteria', 'Assembly_Name': 'ASM28429v1', 'AssemblyID': '408138',
> 'Organism_Name': 'Actinoplanes missouriensis', u'Id': '13296'}]
>
>
> I have tried playing with rettype="full", retmode="text" and the like, as
> well as using efetch instead of esummary, but regardless I only get this
> more limited set of data. Is there a different approach I should be taking
> here? Thank you very much for any pointers.
>
> Kirk