[BioPython] Fwd: NCBI Abuse Activity with BioPython

Chris Fields cjfields at uiuc.edu
Wed Jun 25 15:34:34 UTC 2008


Just as a note from the BioPerl side, BioPerl modules which access  
eutils use the 3 second sleep rule, and we specify in the documentation
the NCBI rules.  The modules also identify the tool/agent used as  
'bioperl', I believe.
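
For anyone adding the same behaviour to their own scripts, here is a minimal Python sketch of the idea, not a description of what either BioPerl or BioPython actually does. The Throttle class, the build_url helper and the 'mytool' value are hypothetical names made up for illustration. The base URL is an assumption that combines the eutils host NCBI asks for with the /entrez/eutils/ path from their sample URL, and 'tool' is the E-utilities parameter for identifying the calling software.

```python
import time
from urllib.parse import urlencode

# Assumed base: the eutils host NCBI asks for plus the path from their sample URL.
EUTILS_BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"
MIN_INTERVAL = 3.0  # seconds between requests, per the NCBI guideline


class Throttle:
    """Make sure successive calls to wait() are at least `interval` seconds apart."""

    def __init__(self, interval=MIN_INTERVAL, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                # Too soon since the last request: sleep off the difference.
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def build_url(utility, **params):
    """Build an E-utilities URL on the eutils host, tagged with a tool name."""
    params.setdefault("tool", "mytool")  # hypothetical tool name
    return EUTILS_BASE + utility + ".fcgi?" + urlencode(params)
```

A script would create one Throttle, call its wait() method immediately before each request, and only then open the URL returned by build_url.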

chris

On Jun 25, 2008, at 10:08 AM, Chris Dagdigian wrote:

>
> Can someone from the biopython dev team respond officially to Scott  
> please?
>
> Regards,
> Chris
>
>
> Begin forwarded message:
>
>> From: "Mcginnis, Scott (NIH/NLM/NCBI) [E]"  
>> <mcginnis at ncbi.nlm.nih.gov>
>> Date: June 25, 2008 10:54:28 AM EDT
>> To: <biopython-owner at lists.open-bio.org>
>> Subject: NCBI Abuse Activity with BioPython
>>
>> Dear Colleague:
>>
>>
>>
>> My name is Scott McGinnis and I am responsible for monitoring the web
>> page at NCBI and blocking users with excessive access.
>>
>>
>>
>> I am seeing more and more activity with BioPython and it is of
>> concern.
>> Mainly the BioPython suite does not appear to be written to the
>> recommendations made on the main NCBI E-utilities web page
>> (http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html).
>> Principally, the following are not being done by BioPython tools.
>>
>>
>>
>> *  Send E-utilities requests to http://eutils.ncbi.nlm.nih.gov,
>> not the standard NCBI Web address.
>>
>> *  Make no more than one request every 3 seconds.
>>
>>
>>
>> In fact I recently cc'd you on an event when a user was coming in at
>> over 18 requests per second. We really wish that you would alter your
>> scripts to run with some sort of sleep in them in order to not send
>> requests more than once per 3 seconds and to not send these to the
>> main www web servers but use http://eutils.ncbi.nlm.nih.gov.
>>
>>
>>
>> Also, there is the problem of huge searches in order to build local
>> databases. With your package it seems that if one were so inclined you
>> would send a search for all human sequences (over 10,000,000
>> sequences) and your program would then retrieve these one ID at a
>> time. Regardless of the fact that this is an extreme example, we would
>> much prefer if your program could take the WebEnv from the ESearch
>> and use the search
>> history
>> and webenv to retrieve sets of sequences at 200 - 200 at a time.
>>
>>
>>
>> History: Requests utility to maintain results in user's environment.
>> Used in conjunction with WebEnv.
>>
>> usehistory=y
>>
>> Web Environment: Value previously returned in XML results from  
>> ESearch
>> or EPost. This value may change with each utility call. If WebEnv is
>> used, History search numbers can be included in an ESummary URL,  
>> e.g.,
>> term=cancer+AND+%23X (where %23 replaces # and X is the History  
>> search
>> number).
>>
>> Note: WebEnv is similar to the cookie that is set on a user's
>> computer
>> when accessing PubMed on the web.  If the parameter usehistory=y is
>> included in an ESearch URL both a WebEnv (cookie string) and  
>> query_key
>> (history number) values will be returned in the results. Rather than
>> using the retrieved PMIDs in an ESummary or EFetch URL you may simply
>> use the WebEnv and query_key values to retrieve the records. WebEnv  
>> will
>> change for each ESearch query, but a sample URL would be as follows:
>>
>> http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed
>> &WebEnv=%3D%5DzU%5D%3FIJIj%3CC%5E%5DA%3CT%5DEACgdn%3DF%5E%3Eh
>> GFA%5D%3CIFKGCbQkA%5E_hDFiFd%5C%3D
>> &query_key=6&retmode=html&rettype=medline&retmax=15
>>
>> WebEnv=WgHmIcDG]B etc.
>>
>> Display Numbers:
>>
>> retstart=x  (x= sequential number of the first record retrieved -
>> default=0 which will retrieve the first record)
>> retmax=y  (y= number of items retrieved)
>>
>>
>>
>> Otherwise we will end up blocking more of your users which we are
>> unfortunately already doing in some cases.
>>
>>
>>
>> Sincerely,
>> Scott D. McGinnis, M.S.
>> DHHS/NIH/NLM/NCBI
>> www.ncbi.nlm.nih.gov
>>
>>
>>
>
> _______________________________________________
> BioPython mailing list  -  BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
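
To make the history-based retrieval Scott describes concrete, here is a rough Python sketch that only builds the URLs: one ESearch with usehistory=y, then EFetch calls that page through the stored result set with retstart and retmax. The function names are hypothetical, the base URL is an assumption combining the eutils host with the path from the sample URL above, and the batch size of 200 is taken from the figure in his message. The count argument is the total number of hits, which the caller is assumed to have read from the ESearch result along with WebEnv and query_key.

```python
from urllib.parse import urlencode

# Assumed base: the eutils host NCBI asks for plus the path from the sample URL.
EUTILS_BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db, term):
    """ESearch URL that asks NCBI to keep the results in the history server."""
    query = {"db": db, "term": term, "usehistory": "y"}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(query)


def efetch_batch_urls(db, webenv, query_key, count, batch_size=200, **extra):
    """Yield one EFetch URL per batch instead of one request per ID.

    webenv and query_key are the values returned by the ESearch call;
    count is the total number of records in that result set.
    """
    for start in range(0, count, batch_size):
        query = {
            "db": db,
            "WebEnv": webenv,
            "query_key": query_key,
            "retstart": start,      # index of the first record in this batch
            "retmax": batch_size,   # number of records to return
        }
        query.update(extra)         # e.g. rettype/retmode chosen by the caller
        yield EUTILS_BASE + "efetch.fcgi?" + urlencode(query)
```

With count=450 this yields three URLs with retstart of 0, 200 and 400. Each one still counts as a request, so the 3 second pause applies between fetches.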

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign






