[Biopython-dev] NCBI E-utility 100 requests rule in Bio.Entrez?

Michiel de Hoon mjldehoon at yahoo.com
Thu Mar 25 21:37:26 EDT 2010


I have no objections, but basically I think that this can be left to the responsibility of the end user.

--Michiel.

--- On Thu, 3/25/10, Peter <biopython at maubp.freeserve.co.uk> wrote:

> From: Peter <biopython at maubp.freeserve.co.uk>
> Subject: NCBI E-utility 100 requests rule in Bio.Entrez?
> To: "Biopython-Dev Mailing List" <biopython-dev at biopython.org>, "Michiel de Hoon" <mjldehoon at yahoo.com>
> Date: Thursday, March 25, 2010, 12:25 PM
> Hi all,
> 
> The NCBI recently announced revised guidelines for the Entrez
> utilities, which we've started discussing on the OBF mailing list:
> http://lists.open-bio.org/pipermail/biopython-dev/2010-March/007499.html
> http://lists.open-bio.org/pipermail/open-bio-l/2010-March/000623.html
> 
> As part of this I decided to look at the peak hour rules:
> http://lists.open-bio.org/pipermail/open-bio-l/2010-March/000644.html
> 
> The old guideline was:
> 
> http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html#UserSystemRequirements
> "Run retrieval scripts on weekends or between 9 pm and 5 am
> Eastern Time weekdays for any series of more than 100 requests."
> 
> This doesn't define a series - for example, would it be OK to run
> a script making 75 requests every two hours? This could be regarded
> as multiple separate series each under 100 requests, but the
> cumulative count over the 8 peak hours is 600 requests.
> 
> Sadly the new guidelines are even more vague:
> 
> http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=helpeutils&part=chapter2#chapter2.Usage_Guidelines_and_Requiremen
> "... and limit large jobs to either weekends or between 9:00 PM
> and 5:00 AM Eastern time during weekdays."
> 
> Not very helpful.
> 
> Also neither version raises the issue of summer/winter time
> (daylight saving time) but simply gives Eastern Time (EST).
> 
> While we may get clarification from the NCBI, the following patch
> to Bio.Entrez may be worth considering. It simply counts the
> number of Entrez requests during peak hours, and issues a
> warning if this exceeds 100 (based on a strict interpretation of
> the older guidelines).
> 
> Does this seem worth checking in, or should we try to get some
> clarification from the NCBI first?
> 
> Peter
> 
> diff --git a/Bio/Entrez/__init__.py b/Bio/Entrez/__init__.py
> index 33d8d14..f670354 100644
> --- a/Bio/Entrez/__init__.py
> +++ b/Bio/Entrez/__init__.py
> @@ -285,6 +285,26 @@ def _open(cgi, params={}, post=False):
>          _open.previous = current + wait
>      else:
>          _open.previous = current
> +
> +    # Max 100 requests from 09:00 to 17:00 Eastern Time (EST), which is
> +    # 5 hours behind Coordinated Universal Time (UTC) aka Greenwich
> +    # Mean Time (GMT), thus 14:00 to 22:00 UTC/GMT. The NCBI don't
> +    # mention summer/winter time (daylight saving time), so ignore that.
> +    if 14 <= time.gmtime(current).tm_hour < 22 \
> +    and time.gmtime(current).tm_wday <= 4:
> +        # Peak time (Monday = 0, Friday = 4)
> +        _open.peak_requests += 1
> +        if _open.peak_requests > 100:
> +            import warnings
> +            warnings.warn("The NCBI request you make at most 100 Entrez "
> +                          "requests during the peak time 9AM to 5PM EST "
> +                          "(which is 14:00 to 22:00 UTC/GMT). "
> +                          "You have exceeded this limit.")
> +    else:
> +        # Off peak
> +        # Reset the counter (in case this is a long running script)
> +        _open.peak_requests = 0
> +
>      # Remove None values from the parameters
>      for key, value in params.items():
>          if value is None:
> @@ -368,3 +388,4 @@ E-utilities.""", UserWarning)
>      return uhandle
> 
>  _open.previous = 0
> +_open.peak_requests = 0
> 
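
For reference, here is a rough standalone sketch of the same counting idea that also accounts for daylight saving time, which the quoted patch deliberately ignores. It is not part of Bio.Entrez: the names eastern_utc_offset, is_peak and PeakRequestCounter are hypothetical, and the DST rule is an assumption (the US rule in force since 2007: second Sunday of March to first Sunday of November). The peak window defaults to the patch's 09:00 to 17:00 Eastern; the guidelines quoted above only name 9 pm to 5 am as off-peak, so the start_hour and end_hour arguments may need adjusting.

```python
import warnings
from datetime import datetime, timedelta, timezone


def _nth_sunday(year, month, n):
    """Day of the month of the n-th Sunday in the given month."""
    first_weekday = datetime(year, month, 1).weekday()  # Monday = 0, Sunday = 6
    return 1 + (6 - first_weekday) % 7 + 7 * (n - 1)


def eastern_utc_offset(utc_dt):
    """UTC offset in hours for US Eastern Time (assumes post-2007 US DST rules).

    utc_dt must be a timezone-aware datetime in UTC.
    """
    year = utc_dt.year
    # DST begins at 02:00 EST (07:00 UTC) on the second Sunday of March ...
    dst_start = datetime(year, 3, _nth_sunday(year, 3, 2), 7, tzinfo=timezone.utc)
    # ... and ends at 02:00 EDT (06:00 UTC) on the first Sunday of November.
    dst_end = datetime(year, 11, _nth_sunday(year, 11, 1), 6, tzinfo=timezone.utc)
    return -4 if dst_start <= utc_dt < dst_end else -5


def is_peak(utc_dt, start_hour=9, end_hour=17):
    """True on Monday to Friday between start_hour and end_hour Eastern Time."""
    local = utc_dt + timedelta(hours=eastern_utc_offset(utc_dt))
    return local.weekday() <= 4 and start_hour <= local.hour < end_hour


class PeakRequestCounter:
    """Counts requests made during peak hours and warns past a limit."""

    def __init__(self, limit=100):
        self.limit = limit
        self.peak_requests = 0

    def record(self, utc_dt=None):
        """Register one request and return the current peak-hour count."""
        if utc_dt is None:
            utc_dt = datetime.now(timezone.utc)
        if is_peak(utc_dt):
            self.peak_requests += 1
            if self.peak_requests > self.limit:
                warnings.warn("More than %i Entrez requests made during "
                              "NCBI peak hours." % self.limit, UserWarning)
        else:
            # Off peak: reset the counter (in case of a long running script)
            self.peak_requests = 0
        return self.peak_requests
```

A script would call record() once per request. As in the patch, the counter resets only when a request is seen off peak, so it still does not settle what the NCBI mean by a "series".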


      


