[Biopython-dev] NCBI E-utility 100 requests rule in Bio.Entrez?
Michiel de Hoon
mjldehoon at yahoo.com
Thu Mar 25 21:37:26 EDT 2010
I have no objections, but basically I think that this can be left to the responsibility of the end user.
--Michiel.
--- On Thu, 3/25/10, Peter <biopython at maubp.freeserve.co.uk> wrote:
> From: Peter <biopython at maubp.freeserve.co.uk>
> Subject: NCBI E-utility 100 requests rule in Bio.Entrez?
> To: "Biopython-Dev Mailing List" <biopython-dev at biopython.org>, "Michiel de Hoon" <mjldehoon at yahoo.com>
> Date: Thursday, March 25, 2010, 12:25 PM
> Hi all,
>
> The NCBI recently announced revised guidlines for the
> Entrez
> utilities, which we've started discussing on the OBF
> mailing list:
> http://lists.open-bio.org/pipermail/biopython-dev/2010-March/007499.html
> http://lists.open-bio.org/pipermail/open-bio-l/2010-March/000623.html
>
> As part of this I decided to look at the peak hour rules:
> http://lists.open-bio.org/pipermail/open-bio-l/2010-March/000644.html
>
> The old guideline was:
>
> http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html#UserSystemRequirements
> "Run retrieval scripts on weekends or between 9 pm and 5
> am
> Eastern Time weekdays for any series of more than 100
> requests."
>
> This doesn't define a series - for example, would it be OK
> to run
> a script making 75 requests every two hours? This could be
> regarded
> as multiple separate series each under 100 requests, but
> the
> cumulative count over the 8 peak hours is 600 requests.
>
> Sadly the new guidelines are even more vague:
>
> http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=helpeutils∂=chapter2#chapter2.Usage_Guidelines_and_Requiremen
> "... and limit large jobs to either weekends or between
> 9:00 PM
> and 5:00 AM Eastern time during weekdays."
>
> Not very helpful.
>
> Also neither version raises the issue of summer/winter
> time
> (daylight savings times) but simply gives Eastern Time
> (EST).
>
> While we may get clarification from the NCBI, the following
> patch
> to Bio.Entrez may be worth considering. It simply counts
> the
> number of Entrez requestes during peak hours, and issues a
> warning if this exceeds 100 (based on a strict
> interpretation of
> the older guidelines).
>
> Does this seem worth checking in, or should we try to get
> some
> clarification from the NCBI first?
>
> Peter
>
> diff --git a/Bio/Entrez/__init__.py
> b/Bio/Entrez/__init__.py
> index 33d8d14..f670354 100644
> --- a/Bio/Entrez/__init__.py
> +++ b/Bio/Entrez/__init__.py
> @@ -285,6 +285,26 @@ def _open(cgi, params={},
> post=False):
> _open.previous = current + wait
> else:
> _open.previous = current
> +
> + # Max 100 requests from 09:00 to 17:00 Eastern Time
> (EST), which is
> + # 5 hours behind Coordinated Universal Time (UTC)
> aka Greenwich
> + # Mean Time (GMT), thus 14:00 to 22:00 UTC/GMT. The
> NCBI don't
> + # mention summer/winter time (daylight saving time),
> so ignore that.
> + if 14 <= time.gmtime(current).tm_hour < 22 \
> + and time.gmtime(current).tm_wday <= 5:
> + # Peak time (Monday = 0, Friday = 5)
> + _open.peak_requests += 1
> + if _open.peak_requests > 100:
> + import warnings
> + warnings.warn("The NCBI request you make
> at most 100 Entrez "
> + "requests during
> the peak time 9AM to 5PM EST "
> + "(which is 14:00 to
> 22:00 UTC/GMT). "
> + "You have exceeded
> this limit.")
> + else:
> + # Off peak
> + # Reset the counter (in case this is a long
> running script)
> + _open.peak_requests = 0
> +
> # Remove None values from the parameters
> for key, value in params.items():
> if value is None:
> @@ -368,3 +388,4 @@ E-utilities.""", UserWarning)
> return uhandle
>
> _open.previous = 0
> +_open.peak_requests = 0
>
More information about the Biopython-dev
mailing list