[Biopython] Question about using Entrez.epost

Peter biopython at maubp.freeserve.co.uk
Wed May 6 12:43:15 EDT 2009


On Wed, May 6, 2009 at 4:33 PM, Walter Scheper <scheper at email.unc.edu> wrote:
> Hey folks,
>
> I'm currently working on a project where we need to download large numbers
> (1000s) of SNPs from NCBI's database. The documentation for BioPython tells
> me I should be using Entrez.epost for this, and then using the resulting
> search history to pull down my SNPs. However, I find that epost itself has a
> maximum limit on the number of rs IDs I can use in a single search, which
> roughly translates into about 700 rs IDs. Is this as intended, or am I not
> using epost correctly? If I am using epost correctly, what's the best way to
> break this up so that (a) I get my data and (b) I don't overburden NCBI's
> system?
>
> Here's how I'm calling epost, mostly this is straight out of the tutorial:
>    search_results = Entrez.read(Entrez.epost(db='snp', id=id_string))
>
> Thanks for any help,
> Walter Scheper

Where does the 700 ID limit come from?  I don't recall any limit in the EPost
documentation.  Did you find it by experimentation?
http://www.ncbi.nlm.nih.gov/entrez/query/static/epost_help.html

Note that with such large numbers, once you have got EPost to work, you
should probably be using EFetch to download the results in batches.

This doesn't really answer your question, but it should be fairly simple to
use EPost and EFetch in batches of (say) 100 records, which sidesteps the
whole issue.  The best batch size will depend on how big the individual
records are.
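As a rough illustration, a batched EPost + EFetch loop might look like the
sketch below.  This is untested against the live service; the batch size of
100, the placeholder email address, and the plain-text retmode for the snp
database are all assumptions you should adjust for your own records:

```python
from Bio import Entrez

Entrez.email = "your.name@example.com"  # NCBI asks for a contact address

def in_batches(ids, batch_size):
    """Yield successive slices of a list of IDs."""
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]

def fetch_snps(rs_ids, batch_size=100):
    """EPost each batch of rs IDs, then EFetch it via the history server."""
    results = []
    for batch in in_batches(rs_ids, batch_size):
        # Post just this batch, so each EPost stays well under the limit
        posted = Entrez.read(Entrez.epost(db="snp", id=",".join(batch)))
        # Fetch the posted batch back using the WebEnv/QueryKey history
        handle = Entrez.efetch(db="snp",
                               webenv=posted["WebEnv"],
                               query_key=posted["QueryKey"],
                               retmode="text")
        results.append(handle.read())
        handle.close()
    return results
```

With a few thousand rs IDs, fetch_snps would issue one EPost and one EFetch
per 100 IDs, which should avoid the ~700 ID failure you are seeing.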

As far as I know the NCBI don't give any guidance on batch sizes.
However, do make sure you run this script at the weekend or outside
normal USA working hours:
http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html#UserSystemRequirements

Peter
