[BioPython] Fwd: NCBI Abuse Activity with BioPython

Chris Fields cjfields at uiuc.edu
Wed Jun 25 15:00:34 EDT 2008


Yes, my bad (was in a hurry).

I have heard of instances where specific users/IPs were blocked
temporarily by NCBI based on spamming, so it's best to be proactive.

chris

On Jun 25, 2008, at 11:16 AM, Renato Alves wrote:

> you mean 3 seconds no?
>
> Quoting Chris Fields on 06/25/2008 04:34 PM:
>> Just as a note from the BioPerl side, BioPerl modules which access
>> eutils use the 3 min sleep rule, and we specify in the
>> documentation the NCBI rules.  The modules also identify the
>> tool/agent used as 'bioperl', I believe.
>>
>> chris
>>
>> On Jun 25, 2008, at 10:08 AM, Chris Dagdigian wrote:
>>
>>>
>>> Can someone from the biopython dev team respond officially to  
>>> Scott please?
>>>
>>> Regards,
>>> Chris
>>>
>>>
>>> Begin forwarded message:
>>>
>>>> From: "Mcginnis, Scott (NIH/NLM/NCBI) [E]" <mcginnis at ncbi.nlm.nih.gov>
>>>> Date: June 25, 2008 10:54:28 AM EDT
>>>> To: <biopython-owner at lists.open-bio.org>
>>>> Subject: NCBI Abuse Activity with BioPython
>>>>
>>>> Dear Colleague:
>>>>
>>>>
>>>>
>>>> My name is Scott McGinnis and I am responsible for monitoring the
>>>> web page at NCBI and blocking users with excessive access.
>>>>
>>>>
>>>>
>>>> I am seeing more and more activity with BioPython and it is of
>>>> concern. Mainly, the BioPython suite does not appear to be written
>>>> to the recommendations made on the main NCBI E-utilities web page
>>>> (http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html).
>>>> Principally, the following are not being done by BioPython tools.
>>>>
>>>>
>>>>
>>>> *  Send E-utilities requests to http://eutils.ncbi.nlm.nih.gov,
>>>>    not the standard NCBI Web address.
>>>>
>>>> *  Make no more than one request every 3 seconds.
>>>>
>>>>
>>>>
>>>> In fact, I recently cc'd you on an event when a user was coming in
>>>> at over 18 requests per second. We really wish that you would alter
>>>> your scripts to run with some sort of sleep in them, in order to not
>>>> send requests more than once per 3 seconds and to not send these to
>>>> the main www web servers but use http://eutils.ncbi.nlm.nih.gov
>>>> instead.
>>>>
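
[Illustration of the two points above: a minimal sketch of a client-side
throttle, assuming a hypothetical `Throttle` helper that is not part of
BioPython. The 3 second interval and the eutils host come from Scott's
message; the `/entrez/eutils/` path and the injectable `clock`/`sleep`
arguments are assumptions made so the sketch can be tried without
touching the network.]

```python
import time

# Host named in the message; the path after it is an assumption.
EUTILS_BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


class Throttle:
    """Hypothetical helper: enforce a minimum gap between requests."""

    def __init__(self, min_interval=3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the logic can be tested
        self._sleep = sleep
        self._last = None        # time of the previous request, if any

    def wait(self):
        """Sleep if needed so calls are at least min_interval apart.

        Returns the number of seconds slept (0.0 if none).
        """
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last = self._clock()
        return delay
```

[A script would call `throttle.wait()` immediately before every request
it sends to a URL under `EUTILS_BASE`, sharing one `Throttle` instance
across all requests.]
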
>>>>
>>>>
>>>> Also, there is the problem of huge searches in order to build local
>>>> databases. With your package it seems that, if one were so inclined,
>>>> one could send a search for all human sequences (over 10,000,000
>>>> sequences) and your program would then retrieve these one ID at a
>>>> time. Regardless of the fact that this is an extreme example, we
>>>> would much prefer if your program could take the WebEnv from ESearch
>>>> and use the search history and WebEnv to retrieve sets of sequences
>>>> at 200 - 200 at a time.
>>>>
>>>>
>>>>
>>>> History: Requests utility to maintain results in user's environment.
>>>> Used in conjunction with WebEnv.
>>>>
>>>> usehistory=y
>>>>
>>>> Web Environment: Value previously returned in XML results from
>>>> ESearch or EPost. This value may change with each utility call. If
>>>> WebEnv is used, History search numbers can be included in an
>>>> ESummary URL, e.g., term=cancer+AND+%23X (where %23 replaces # and X
>>>> is the History search number).
>>>>
>>>> Note: WebEnv is similar to the cookie that is set on a user's
>>>> computer when accessing PubMed on the web. If the parameter
>>>> usehistory=y is included in an ESearch URL, both WebEnv (cookie
>>>> string) and query_key (history number) values will be returned in
>>>> the results. Rather than using the retrieved PMIDs in an ESummary or
>>>> EFetch URL, you may simply use the WebEnv and query_key values to
>>>> retrieve the records. WebEnv will change for each ESearch query, but
>>>> a sample URL would be as follows:
>>>>
>>>> http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed
>>>> &WebEnv=%3D%5DzU%5D%3FIJIj%3CC%5E%5DA%3CT%5DEACgdn%3DF%5E%3Eh
>>>> GFA%5D%3CIFKGCbQkA%5E_hDFiFd%5C%3D
>>>> &query_key=6&retmode=html&rettype=medline&retmax=15
>>>>
>>>> WebEnv=WgHmIcDG]B etc.
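
[Illustration of the WebEnv/query_key mechanism described above: a
minimal sketch using only the Python standard library. The helper names
`history_from_esearch` and `efetch_url` are hypothetical, the efetch
address on the eutils host is an assumption, and this is not a claim
about how BioPython does it. It reads WebEnv and QueryKey out of an
ESearch XML reply and builds an EFetch URL from them, letting
`urlencode` take care of the percent-escaping seen in the sample URL.]

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed efetch address on the eutils host named in the message.
EFETCH_URL = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def history_from_esearch(xml_text):
    """Return (WebEnv, QueryKey) from an ESearch reply made with usehistory=y.

    Either value is None if the element is missing from the reply.
    """
    root = ET.fromstring(xml_text)
    return root.findtext("WebEnv"), root.findtext("QueryKey")


def efetch_url(db, webenv, query_key, **extra):
    """Build an EFetch URL that refers to a stored search, not to ID lists."""
    params = {"db": db, "WebEnv": webenv, "query_key": query_key}
    params.update(extra)  # e.g. retmode, rettype, retstart, retmax
    return EFETCH_URL + "?" + urlencode(params)
```

[For example, `efetch_url("pubmed", webenv, key, retmode="html",
rettype="medline", retmax=15)` produces a URL with the same parameters
as the sample above, with the WebEnv string escaped automatically.]
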
>>>>
>>>> Display Numbers:
>>>>
>>>> retstart=x  (x= sequential number of the first record retrieved -
>>>> default=0 which will retrieve the first record)
>>>> retmax=y  (y= number of items retrieved)
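
[Illustration of the retstart/retmax parameters above: a minimal sketch
of how a large stored result set could be walked in fixed-size batches
instead of one ID at a time. `batch_windows` is a hypothetical helper,
and the default batch size of 200 is simply the figure mentioned
earlier in the message.]

```python
def batch_windows(total, batch_size=200):
    """Yield (retstart, retmax) pairs that cover `total` records.

    retstart is zero-based, matching the default of 0 for the first record.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for retstart in range(0, total, batch_size):
        # The final window may be smaller than batch_size.
        yield retstart, min(batch_size, total - retstart)
```

[Each pair would go into one EFetch request together with the WebEnv
and query_key from the search, with a pause of at least 3 seconds
between requests.]
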
>>>>
>>>>
>>>>
>>>> Otherwise we will end up blocking more of your users which we are
>>>> unfortunately already doing in some cases.
>>>>
>>>>
>>>>
>>>> Sincerely,
>>>> Scott D. McGinnis, M.S.
>>>> DHHS/NIH/NLM/NCBI
>>>> www.ncbi.nlm.nih.gov
>>>>
>>>>
>>>>
>>>
>>> _______________________________________________
>>> BioPython mailing list  -  BioPython at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biopython
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Marie-Claude Hofmann
>> College of Veterinary Medicine
>> University of Illinois Urbana-Champaign
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign





