[Biopython] help! entrez esearch popset issue

Kevin Murray kevin at kdmurray.id.au
Tue Feb 4 12:34:56 UTC 2014


Bartha,

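ESearch only returns the first 20 IDs by default (its retmax parameter
defaults to 20), which is why you get exactly one page's worth. I believe
the retstart keyword argument is your friend.
Something like [Completely contrived and untested]: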

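from Bio import Entrez

Entrez.email = "your.name@example.com"  # placeholder: NCBI wants a real contact address
db = "popset"
qry = "Homo sapiens[Organism] NOT mitochondrion[All Fields]"

request = Entrez.read(Entrez.esearch(db=db, term=qry, retstart=0))
answers = request["IdList"]
expected = int(request["Count"])
while len(answers) < expected:
    # Ask for the next page, starting after the IDs already collected
    request = Entrez.read(Entrez.esearch(db=db, term=qry,
                                         retstart=len(answers)))
    if not request["IdList"]:
        break  # stop rather than loop forever on an empty page
    answers.extend(request["IdList"])
print(answers)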

This is documented here:
http://www.ncbi.nlm.nih.gov/books/NBK25499/#_chapter4_EFetch_
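Paging 20 IDs at a time costs one request per 20 records. ESearch also
accepts retmax, so asking for bigger pages cuts the number of round trips.
NCBI caps retmax per request, so check the ESearch documentation for the
current limit. The sketch below is untested against the live service;
collect_ids and entrez_page_fetcher are hypothetical helper names, and the
page fetcher is passed in as a callable so the paging loop can be checked
without network access.

```python
def collect_ids(fetch_page, page_size=500):
    """Gather every ID for a query by paging with retstart/retmax.

    fetch_page(retstart, retmax) is a hypothetical callable that must return
    a mapping with "Count" and "IdList", like Entrez.read(Entrez.esearch(...)).
    """
    first = fetch_page(0, page_size)
    expected = int(first["Count"])
    ids = list(first["IdList"])
    while len(ids) < expected:
        page = fetch_page(len(ids), page_size)
        if not page["IdList"]:
            break  # stop rather than loop forever if the server returns nothing
        ids.extend(page["IdList"])
    return ids


def entrez_page_fetcher(db, term):
    """Wrap Biopython's Entrez.esearch as a fetch_page callable (assumes Biopython)."""
    from Bio import Entrez  # imported lazily so collect_ids works without Biopython

    def fetch(retstart, retmax):
        handle = Entrez.esearch(db=db, term=term, retstart=retstart, retmax=retmax)
        try:
            return Entrez.read(handle)
        finally:
            handle.close()

    return fetch
```

With Biopython installed and Entrez.email set, something like
collect_ids(entrez_page_fetcher("popset", query), page_size=500) should
return the full ID list for your query in far fewer requests.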

Others may have more intelligent/complete solutions.
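One such alternative, described in the E-utilities documentation and the
Biopython tutorial, is the History server: run ESearch once with
usehistory="y", then hand the returned WebEnv and QueryKey to EFetch and
pull the records down in batches. A minimal, untested sketch;
fetch_all_records and batch_starts are hypothetical names, and the FASTA
rettype and batch size of 500 are illustrative assumptions.

```python
def batch_starts(count, batch_size):
    """Return the retstart offsets needed to cover `count` records."""
    return list(range(0, count, batch_size))


def fetch_all_records(db, term, email, batch_size=500, rettype="fasta"):
    """Yield raw text batches for every record matching `term` (assumes Biopython)."""
    from Bio import Entrez  # imported lazily so batch_starts works without Biopython

    Entrez.email = email  # NCBI asks clients to identify themselves
    handle = Entrez.esearch(db=db, term=term, usehistory="y")
    result = Entrez.read(handle)
    handle.close()
    count = int(result["Count"])
    for start in batch_starts(count, batch_size):
        # Each EFetch call reuses the stored search instead of resending the IDs
        fetch_handle = Entrez.efetch(db=db, rettype=rettype, retmode="text",
                                     retstart=start, retmax=batch_size,
                                     webenv=result["WebEnv"],
                                     query_key=result["QueryKey"])
        try:
            yield fetch_handle.read()
        finally:
            fetch_handle.close()
```

This skips collecting the IdList yourself, which helps if what you really
want is the records rather than the IDs.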

Cheers,
Kevin

On Tue, 4 Feb 2014 11:38:46 +0100
Bartha Dániel <bartha.daniel at agrar.mta.hu> wrote:

>Hi People,
>
>I have an issue with Biopython's esearch/efetch, and it is driving me
>crazy.
>
>If I search for something in PopSet like this (the query itself is
>arbitrary):
>
>query = "Homo sapiens[Organism] NOT mitochondrion[All Fields]"
>
>esearch_handle = Entrez.esearch(db="popset", term=query)
>search_results = Entrez.read(esearch_handle)
>accnos = search_results['IdList']
>
>I always get only 20 results in my IdList, but with the same term the
>website shows many thousands. Is this a bug?
>
>By default the website shows 20 results per page and, surprise, my 20
>results are identical to the first page. The Biopython documentation
>says little about the PopSet DB, so I ask you: how do I solve this
>problem elegantly ("Python only")?
>
>Since the same setup doesn't cause any issues when searching the
>protein or other sequence DBs, either the PopSet DB has some tricks
>I don't know about or this is a BUG(?).
>
>
>Regards:
>
>Daniel