[Biopython] Iterate Entrez.esearch in Biopython

Peter Cock p.j.a.cock at googlemail.com
Tue Dec 8 09:36:11 UTC 2015


Hi Jarrod

The simplest solution is two nested for loops, i.e.

# do imports
# load lists, or set them like this
list_of_species = ["E. coli", "H. sapiens", "M. tardes"]
list_of_genes = ["yyy", "zzz", "aaa"]
for species in list_of_species:
    for gene in list_of_genes:
        # do the search for this species, gene combination
        pass  # placeholder so the loop is valid Python as written

Depending on what you want to do with the results, you
might record the counts in a dictionary in memory, or
maybe write them to a file.
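
To make that concrete, here is a rough sketch of the dictionary-plus-file
approach. The function names (count_hits, collect_counts, write_counts), the
placeholder email address and the tab-separated layout are all my assumptions
for illustration; only Entrez.esearch and Entrez.read are actual Biopython
calls. The search function is passed in as an argument, so the looping code
can be tried out without contacting NCBI.

```python
def count_hits(species, gene):
    """Return the NCBI nucleotide hit count for one species/gene pair."""
    # Imported here so the helpers below can be used without Biopython.
    from Bio import Entrez
    Entrez.email = "your.name@example.com"  # placeholder, use your own address
    term = '"{0}"[ORGN] AND {1}[GENE]'.format(species, gene)
    handle = Entrez.esearch(db="nucleotide", term=term)
    record = Entrez.read(handle)
    handle.close()
    return int(record["Count"])  # Count comes back as a string


def collect_counts(list_of_species, list_of_genes, search=count_hits):
    """Run search() for every combination; keys are (species, gene) tuples."""
    counts = {}
    for species in list_of_species:
        for gene in list_of_genes:
            counts[(species, gene)] = search(species, gene)
    return counts


def write_counts(counts, filename):
    """Write the counts as a tab-separated table, one combination per line."""
    with open(filename, "w") as out:
        out.write("species\tgene\tcount\n")
        for (species, gene), count in sorted(counts.items()):
            out.write("{0}\t{1}\t{2}\n".format(species, gene, count))
```

You would then call collect_counts(list_of_species, list_of_genes) and pass
the resulting dictionary to write_counts along with a filename of your choice.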

Is that enough to make progress or do you need a bit more
guidance?

Also, when you build the Entrez query string, the species
name should (I think) be given in full and in double quotes,
rather than with an abbreviated genus, e.g.

"Homo sapiens"[ORGN] AND yyy[GENE]

To do that in Python, the easiest way to get the double
quotes is to use single quotes for the Python string:

'"{0}"[ORGN] AND {1}[GENE]'.format(species, gene)

That is: single quote, double quote, open brace, zero, ...
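
As a small self-contained sketch of that formatting step (build_query is
just a name I made up for illustration, not part of Biopython):

```python
def build_query(species, gene):
    """Return an Entrez term with the organism name in double quotes."""
    # Single quotes delimit the Python string, so the double quotes
    # around {0} end up in the query itself.
    return '"{0}"[ORGN] AND {1}[GENE]'.format(species, gene)


print(build_query("Homo sapiens", "yyy"))
# prints: "Homo sapiens"[ORGN] AND yyy[GENE]
```

The returned string can be passed directly as the term argument of
Entrez.esearch.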

Peter


On Tue, Dec 8, 2015 at 3:04 AM, Jarrod Scott <jjscott at uwalumni.com> wrote:
> Greetings all.
>
> I have 1) a list of species and 2) a list of genes. I am trying to use
> Entrez.esearch within Biopython to get a list of accession numbers from NCBI
> for each gene from each species. I wrote a short script that can do it for
> one gene and one species but have been unsuccessful at writing code to
> iterate through the lists. Here is an example of the code that works. This
> returns '11' hits, which matches a simple GenBank search. Any help on how to
> iterate through two lists would be most appreciated.
>
> Jarrod
>
> import sys
> import time
> from Bio import Entrez
> Entrez.email = "jjscott at uwalumni.com"
> gene = 'tufA'
> species = 'Codium decorticatum'
> terms = "{0}[orgn] AND {1}[Gene]".format(species, gene)
> handle = Entrez.esearch(db = "nucleotide", term = terms)
> record = Entrez.read(handle)
> record["Count"]
> record["IdList"]
>
>
>
> Example files:
>
> Species:
>
> E. coli
> H. sapiens
> M. tardes
>
> Genes:
> yyy
> zzz
> aaa
>
>
> biopython-1.66
> Python 2.7.9 :: Anaconda 2.2.0 (x86_64)
> OS X Yosemite 10.10.2
>