[Biopython] Question about using Entrez.epost

Thu May 7 04:59:44 EDT 2009

On Wed, May 6, 2009 at 6:07 PM, Walter Scheper <scheper at email.unc.edu> wrote:
>
> On May 6, 2009, at 12:43 PM, Peter wrote:
>
>> Where is the 700 ID limit from?  I don't recall any limits on the EPost
>> documentation.  Was this found by experimentation?
>> http://www.ncbi.nlm.nih.gov/entrez/query/static/epost_help.html
>>
> Yes, that was found by experimentation with epost. I don't think the limit
> is really tied to the number of Ids, but to the length of the string.
> However, I thought the purpose of epost was that you didn't run into url
> length problems, or is it more a way to queue up a request?

I think I see what is going on now.  Using an HTTP POST would avoid the
long URL problems you get with an HTTP GET (something the NCBI
documentation doesn't explain - at first reading the EPost command seems
redundant).  However, from looking over our Bio.Entrez code more
carefully, it seems we are not actually using a POST after all - just a
plain HTTP GET, so the whole ID list ends up in the URL.
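
To illustrate the difference, here is a minimal sketch using only the
Python 3 standard library (not the actual Bio.Entrez code). The function
name build_epost_requests and the EPOST_URL value are my own assumptions
for illustration. It only builds the two request objects and sends
nothing to the NCBI:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed EPost endpoint, used here for illustration only.
EPOST_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi"


def build_epost_requests(ids, db="pubmed"):
    """Build (but do not send) a GET and a POST request for the same IDs.

    Hypothetical helper, not part of Bio.Entrez.
    """
    params = urlencode({"db": db, "id": ",".join(ids)})
    # GET: the parameters are appended to the URL, so the URL grows
    # with every extra ID.
    get_request = Request(EPOST_URL + "?" + params)
    # POST: the parameters travel in the request body, so the URL stays
    # short however many IDs there are.
    post_request = Request(EPOST_URL, data=params.encode("ascii"))
    return get_request, post_request
```

With a few hundred IDs the GET URL runs to several thousand characters,
while the POST URL is just the bare endpoint. That would fit with the
limit depending on string length rather than on the number of IDs.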

This would explain the limit you are seeing - you are either the first
person to try such a long ID list with Bio.Entrez (personally I have
only ever downloaded much smaller datasets in one go), or at least
you are the first person to actually report that it is broken.  So
thank you!

I've filed Bug 2824 on this issue:
http://bugzilla.open-bio.org/show_bug.cgi?id=2824

>> Note that with such large numbers, once you have got EPost to work, you
>> should probably be using EFetch to download the results in batches.
>
> Sure. I suppose I'll just have to break the whole thing, search and
> retrieval, into chunks.

In the short term yes, that is a practical solution.  If you would be happy to
update your installation (or at least, update the Bio/Entrez/__init__.py file)
then you can help test any fix.
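
For the chunking workaround, something along these lines might do as a
rough sketch. The helper name chunked and the batch size of 500 are my
own assumptions (500 is simply a value below the roughly 700 IDs you
found by experimentation), so adjust to taste:

```python
def chunked(ids, size=500):
    """Yield successive slices of at most `size` IDs.

    Hypothetical helper; the default of 500 is an assumed safe value,
    not a documented NCBI limit.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]
```

Each batch could then be joined with commas and passed to Entrez.epost
in turn, with the results for each batch fetched separately.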

Peter