[Biopython] Entrez module and obtaining organism metadata

Kirk Vander Meulen kvander11 at gmail.com
Wed Jul 12 16:07:41 UTC 2017


Thank you, Peter. I did contact NCBI, and their preferred work-around is to
download the rather large bioproject.xml file from the FTP site and work
with that.
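
For anyone who finds this thread later, here is a rough sketch of how a file
of that size might be streamed rather than loaded whole. It uses only the
standard library. The element and attribute names (Package, ArchiveID with an
"id" attribute, OrganismName, OxygenReq, TemperatureRange) are my assumptions
about the bioproject.xml layout and should be checked against the real file.
The helper name iter_project_metadata and the sample data are made up for
illustration.

```python
import io
import xml.etree.ElementTree as ET

# Tag and attribute names are assumptions about the bioproject.xml layout;
# verify them against the actual download before relying on this.
RECORD_TAG = "Package"
TEXT_TAGS = ("OrganismName", "OxygenReq", "TemperatureRange")


def iter_project_metadata(source, record_tag=RECORD_TAG, text_tags=TEXT_TAGS):
    """Stream an XML source and yield one dict of metadata per record."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != record_tag:
            continue
        row = {}
        for node in elem.iter():
            if node.tag == "ArchiveID":
                # Assumed to carry the numeric project ID as an attribute.
                row.setdefault("ProjectID", node.get("id"))
            elif node.tag in text_tags:
                row.setdefault(node.tag, (node.text or "").strip())
        yield row
        # Drop the finished record's children so memory use stays modest.
        elem.clear()


# Tiny made-up sample standing in for the real download.
SAMPLE = b"""<PackageSet>
  <Package>
    <Project>
      <ProjectID><ArchiveID accession="PRJNA1" id="1"/></ProjectID>
      <ProjectType>
        <Target>
          <Organism>
            <OrganismName>Example organism</OrganismName>
            <BiologicalProperties>
              <Environment>
                <OxygenReq>eAerobic</OxygenReq>
                <TemperatureRange>eMesophilic</TemperatureRange>
              </Environment>
            </BiologicalProperties>
          </Organism>
        </Target>
      </ProjectType>
    </Project>
  </Package>
</PackageSet>"""

rows = list(iter_project_metadata(io.BytesIO(SAMPLE)))
print(rows)
```

For the real file, a path or open file handle can be passed in place of the
BytesIO object. If the layout assumptions hold, the ProjectID value could then
be matched against the 'ProjectID' field that esummary returns for each genome
record.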

On Tue, Jul 11, 2017 at 10:52 AM, Peter Cock <p.j.a.cock at googlemail.com>
wrote:

> Hi Kirk,
>
> Unfortunately based on the NCBI documentation, with efetch
> and the genome database it appears only docsum and uilist
> are supported:
>
> https://www.ncbi.nlm.nih.gov/books/NBK25499/table/chapter4.T._valid_values_of__retmode_and/
>
> Unless one of the Biopython community has tried something
> similar, you might be best contacting the NCBI Entrez team
> directly to ask. Please do let us know if you solve it.
>
> Good luck,
>
> Peter
>
> On Tue, Jul 11, 2017 at 3:56 PM, Kirk Vander Meulen <kvander11 at gmail.com>
> wrote:
> > I am trying to extract metadata, specifically things like temperature range
> > and oxygen requirement status, for a large list of organisms. I am having
> > trouble accomplishing this by searching the genome database using either
> > efetch or esummary. No matter the route I take, it appears I end up with the
> > "docsum" version of the data. As an example, I am searching for the genome
> > id# 13296. If I search the genome database on NCBI's website, I get a results
> > page corresponding to a page ending with ".../genome/?term=13296", which
> > often contains my desired metadata. But when using the Entrez module, I get
> > results corresponding to ".../genome/?term=13296&report=docsum", which
> > contains a pretty bare-bones level of information.
> >
> > Here is an example of what I have tried and the result:
> >
> > >>> from Bio import Entrez
> > >>> record = Entrez.esummary(db="genome", id="13296")
> > >>> Entrez.read(record)
> > [{'Number_of_Organelles': '0', 'Number_of_Plasmids': '0', 'Create_Date':
> > '2012/03/29 00:00', 'ProjectID': '157423', 'Assembly_Accession':
> > 'GCA_000284295.1', 'Number_of_Chromosomes': '1', 'Options': '', 'DefLine':
> > 'Actinoplanes missouriensis overview', u'Item': [], 'Organism_Kingdom':
> > 'Bacteria', 'Assembly_Name': 'ASM28429v1', 'AssemblyID': '408138',
> > 'Organism_Name': 'Actinoplanes missouriensis', u'Id': '13296'}]
> >
> >
> > I have tried playing with rettype="full", retmode="text" and the like, as
> > well as using efetch instead of esummary, but regardless I only get this
> > more limited set of data. Is there a different approach I should be taking
> > here? Thank you very much for any pointers.
> >
> > Kirk
> >
> > _______________________________________________
> > Biopython mailing list  -  Biopython at mailman.open-bio.org
> > http://mailman.open-bio.org/mailman/listinfo/biopython
>