[Biopython] Entrez module and obtaining organism metadata

Peter Cock p.j.a.cock at googlemail.com
Wed Jul 12 16:54:58 UTC 2017


Thank you for letting us know.

I'm sure the NCBI Entrez team will take into account these sorts
of requests as-and-when they have resources to expand the
web API.

Peter

On Wed, Jul 12, 2017 at 5:07 PM, Kirk Vander Meulen <kvander11 at gmail.com> wrote:
> Thank you, Peter. I did contact NCBI and the preferred work-around is to
> download the rather large bioproject.xml file from the ftp site and work
> with that.
>
> On Tue, Jul 11, 2017 at 10:52 AM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
>>
>> Hi Kirk,
>>
>> Unfortunately, based on the NCBI documentation, it appears that
>> efetch on the genome database supports only docsum and uilist:
>>
>>
>> https://www.ncbi.nlm.nih.gov/books/NBK25499/table/chapter4.T._valid_values_of__retmode_and/
>>
>> Unless one of the Biopython community has tried something
>> similar, you might be best contacting the NCBI Entrez team
>> directly to ask. Please do let us know if you solve it.
>>
>> Good luck,
>>
>> Peter
>>
>> On Tue, Jul 11, 2017 at 3:56 PM, Kirk Vander Meulen <kvander11 at gmail.com>
>> wrote:
>> > I am trying to extract metadata, specifically things like temperature
>> > range and oxygen requirement status, for a large list of organisms. I
>> > am having trouble accomplishing this by searching the genome database
>> > using either efetch or esummary. No matter the route I take, it appears
>> > I end up with the "docsum" version of the data. As an example, I am
>> > searching for the genome id# 13296. If I search the genome database on
>> > NCBI's website, I get a results page corresponding to a page ending
>> > with ".../genome/?term=13296", which often contains my desired
>> > metadata. But when using the Entrez module, I get results corresponding
>> > to ".../genome/?term=13296&report=docsum", which contains a pretty
>> > bare-bones level of information.
>> >
>> > Here is an example of what I have tried and the result:
>> >
>> > >>> from Bio import Entrez
>> > >>> record = Entrez.esummary(db="genome", id="13296")
>> > >>> Entrez.read(record)
>> > [{'Number_of_Organelles': '0', 'Number_of_Plasmids': '0',
>> > 'Create_Date': '2012/03/29 00:00', 'ProjectID': '157423',
>> > 'Assembly_Accession': 'GCA_000284295.1', 'Number_of_Chromosomes': '1',
>> > 'Options': '', 'DefLine': 'Actinoplanes missouriensis overview',
>> > u'Item': [], 'Organism_Kingdom': 'Bacteria',
>> > 'Assembly_Name': 'ASM28429v1', 'AssemblyID': '408138',
>> > 'Organism_Name': 'Actinoplanes missouriensis', u'Id': '13296'}]
>> >
>> >
>> > I have tried playing with rettype="full", retmode="text" and the like,
>> > as well as using efetch instead of esummary, but regardless I only get
>> > this more limited set of data. Is there a different approach I should
>> > be taking here? Thank you very much for any pointers.
>> >
>> > Kirk
>> >
>> > _______________________________________________
>> > Biopython mailing list  -  Biopython at mailman.open-bio.org
>> > http://mailman.open-bio.org/mailman/listinfo/biopython

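The work-around mentioned in this thread (downloading the large
bioproject.xml file from the NCBI FTP site and working with it locally)
could be approached by streaming the file rather than loading it whole.
The following is only a rough sketch under stated assumptions: the element
names Package, OrganismName, TemperatureRange and OxygenReq are guesses
that are not confirmed anywhere in this thread and should be checked
against the real file, the function name iter_organism_metadata is
hypothetical, and the sample XML is invented purely for illustration.

```python
import io
import xml.etree.ElementTree as ET

# ASSUMPTION: these element names are guesses at the bioproject.xml layout;
# verify them against the actual file before relying on this.
RECORD_TAG = "Package"
FIELD_TAGS = ("OrganismName", "TemperatureRange", "OxygenReq")


def iter_organism_metadata(source, record_tag=RECORD_TAG, field_tags=FIELD_TAGS):
    """Yield one dict per record element, parsing the XML incrementally."""
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != record_tag:
            continue
        row = {}
        for tag in field_tags:
            # First descendant with this tag, or None if the record lacks it.
            found = elem.find(".//" + tag)
            has_text = found is not None and found.text
            row[tag] = found.text.strip() if has_text else None
        yield row
        # Release the children of the finished record to limit memory use.
        elem.clear()


# Invented sample data, shaped according to the assumptions above.
SAMPLE = b"""<PackageSet>
  <Package>
    <OrganismName>Example organism A</OrganismName>
    <Environment>
      <OxygenReq>eAerobic</OxygenReq>
      <TemperatureRange>eMesophilic</TemperatureRange>
    </Environment>
  </Package>
  <Package>
    <OrganismName>Example organism B</OrganismName>
  </Package>
</PackageSet>"""

rows = list(iter_organism_metadata(io.BytesIO(SAMPLE)))
for row in rows:
    print(row)
```

For the real file, a path or an open binary file handle can be passed
in place of the io.BytesIO object. Records that lack a field get None for
it, so organisms without the desired metadata are easy to filter out
afterwards.
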
