[Biopython-dev] NCBI E-utility 100 requests rule in Bio.Entrez?

Peter biopython at maubp.freeserve.co.uk
Thu Mar 25 12:25:01 EDT 2010


Hi all,

The NCBI recently announced revised guidlines for the Entrez
utilities, which we've started discussing on the OBF mailing list:
http://lists.open-bio.org/pipermail/biopython-dev/2010-March/007499.html
http://lists.open-bio.org/pipermail/open-bio-l/2010-March/000623.html

As part of this I decided to look at the peak hour rules:
http://lists.open-bio.org/pipermail/open-bio-l/2010-March/000644.html

The old guideline was:

http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html#UserSystemRequirements
"Run retrieval scripts on weekends or between 9 pm and 5 am
Eastern Time weekdays for any series of more than 100 requests."

This doesn't define a series - for example, would it be OK to run
a script making 75 requests every two hours? This could be regarded
as multiple separate series each under 100 requests, but the
cumulative count over the 8 peak hours is 600 requests.

Sadly the new guidelines are even more vague:

http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=helpeutils&part=chapter2#chapter2.Usage_Guidelines_and_Requiremen
"... and limit large jobs to either weekends or between 9:00 PM
and 5:00 AM Eastern time during weekdays."

Not very helpful.

Also neither version raises the issue of summer/winter time
(daylight savings times) but simply gives Eastern Time (EST).

While we may get clarification from the NCBI, the following patch
to Bio.Entrez may be worth considering. It simply counts the
number of Entrez requestes during peak hours, and issues a
warning if this exceeds 100 (based on a strict interpretation of
the older guidelines).

Does this seem worth checking in, or should we try to get some
clarification from the NCBI first?

Peter

diff --git a/Bio/Entrez/__init__.py b/Bio/Entrez/__init__.py
index 33d8d14..f670354 100644
--- a/Bio/Entrez/__init__.py
+++ b/Bio/Entrez/__init__.py
@@ -285,6 +285,26 @@ def _open(cgi, params={}, post=False):
        _open.previous = current + wait
    else:
        _open.previous = current
+
+    # Max 100 requests from 09:00 to 17:00 Eastern Time (EST), which is
+    # 5 hours behind Coordinated Universal Time (UTC) aka Greenwich
+    # Mean Time (GMT), thus 14:00 to 22:00 UTC/GMT. The NCBI don't
+    # mention summer/winter time (daylight saving time), so ignore that.
+    if 14 <= time.gmtime(current).tm_hour < 22 \
+    and time.gmtime(current).tm_wday <= 5:
+        # Peak time (Monday = 0, Friday = 5)
+        _open.peak_requests += 1
+        if _open.peak_requests > 100:
+            import warnings
+            warnings.warn("The NCBI request you make at most 100 Entrez "
+                          "requests during the peak time 9AM to 5PM EST "
+                          "(which is 14:00 to 22:00 UTC/GMT). "
+                          "You have exceeded this limit.")
+    else:
+        # Off peak
+        # Reset the counter (in case this is a long running script)
+        _open.peak_requests = 0
+
    # Remove None values from the parameters
    for key, value in params.items():
        if value is None:
@@ -368,3 +388,4 @@ E-utilities.""", UserWarning)
    return uhandle

_open.previous = 0
+_open.peak_requests = 0



More information about the Biopython-dev mailing list