[Biopython] help! entrez esearch popset issue
Kevin Murray
kevin at kdmurray.id.au
Tue Feb 4 12:34:56 UTC 2014
Bartha,
I believe that the retstart keyword argument is your friend.
Something like [Completely contrived and untested]:
from Bio import Entrez

# db and qry stand for your database name and search term
request = Entrez.read(Entrez.esearch(db=db, term=qry, retstart=0))
answers = list(request["IdList"])
expected = int(request["Count"])
while len(answers) < expected:
    request = Entrez.read(Entrez.esearch(db=db, term=qry,
                                         retstart=len(answers)))
    if not request["IdList"]:
        break  # nothing more came back; avoid looping forever
    answers.extend(request["IdList"])
print(answers)
This is documented here:
http://www.ncbi.nlm.nih.gov/books/NBK25499/#_chapter4_EFetch_
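A slightly more defensive take on the same idea, offered as a sketch only. The paging loop is kept separate from the network call so it can be tried out without touching NCBI. The names collect_all_ids, make_entrez_searcher and page_size are made up for illustration, and the page size of 500 is an arbitrary choice. retstart and retmax are the E-utilities paging parameters; retmax defaults to 20, which is why only 20 IDs come back when it is not set.

```python
def collect_all_ids(search_page, page_size=500):
    """Collect every ID for a search by requesting one page at a time.

    search_page is any callable that accepts retstart and retmax keyword
    arguments and returns a mapping with "Count" and "IdList" keys, like a
    parsed esearch result.
    """
    ids = []
    total = None
    while total is None or len(ids) < total:
        page = search_page(retstart=len(ids), retmax=page_size)
        total = int(page["Count"])
        if not page["IdList"]:
            # The server returned no further IDs; stop rather than loop forever.
            break
        ids.extend(page["IdList"])
    return ids


def make_entrez_searcher(db, term, email):
    """Hypothetical helper: bind db and term to a Biopython esearch call."""
    # Imported here so the paging loop above has no Biopython dependency.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks callers to identify themselves

    def search_page(retstart, retmax):
        handle = Entrez.esearch(db=db, term=term,
                                retstart=retstart, retmax=retmax)
        try:
            return Entrez.read(handle)
        finally:
            handle.close()

    return search_page
```

For the PopSet query in the original message, this would be called roughly as collect_all_ids(make_entrez_searcher("popset", query, "you@example.org")), with the address replaced by your own.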
Others may have more intelligent/complete solutions.
Cheers,
Kevin
On Tue, 4 Feb 2014 11:38:46 +0100
Bartha Dániel <bartha.daniel at agrar.mta.hu> wrote:
>Hi People,
>
>I have an issue with Biopython's esearch/efetch, and it is driving me
>crazy.
>
>If I search for something in PopSet like this (the query itself is
>arbitrary):
>
>query = "Homo sapiens[Organism] NOT mitochondrion[All Fields]"
>
>esearch_handle = Entrez.esearch(db="popset", term=query)
>search_results = Entrez.read(esearch_handle)
>accnos = search_results['IdList']
>
>I always get only 20 results in my IdList, but the same term returns
>many thousands on the website. Is this a bug?
>
>By default the website shows 20 results per page, and, surprise, my 20
>results match the first page. The Biopython documentation says little
>about the PopSet DB, so I ask you: how do I solve this problem
>elegantly ("Python only")?
>
>Since the same setup doesn't cause any issues when searching the
>protein or other sequence DBs, either the PopSet DB has some tricks I
>don't know about or this is a bug(?).
>
>
>Regards:
>
>Daniel
>
>
>