[BioPython] Fwd: NCBI Abuse Activity with BioPython

Renato Alves rjalves at igc.gulbenkian.pt
Wed Jun 25 16:16:49 UTC 2008


You mean 3 seconds, not 3 minutes, no?

Quoting Chris Fields on 06/25/2008 04:34 PM:
> Just as a note from the BioPerl side, BioPerl modules which access 
> eutils use the 3 min sleep rule, and we specify in the documentation 
> the NCBI rules.  The modules also identify the tool/agent used as 
> 'bioperl', I believe.
>
> chris
>
> On Jun 25, 2008, at 10:08 AM, Chris Dagdigian wrote:
>
>>
>> Can someone from the biopython dev team respond officially to Scott 
>> please?
>>
>> Regards,
>> Chris
>>
>>
>> Begin forwarded message:
>>
>>> From: "Mcginnis, Scott (NIH/NLM/NCBI) [E]" <mcginnis at ncbi.nlm.nih.gov>
>>> Date: June 25, 2008 10:54:28 AM EDT
>>> To: <biopython-owner at lists.open-bio.org>
>>> Subject: NCBI Abuse Activity with BioPython
>>>
>>> Dear Colleague:
>>>
>>>
>>>
>>> My name is Scott McGinnis and I am responsible for monitoring the web
>>> page at NCBI and blocking users with excessive access.
>>>
>>>
>>>
>>> I am seeing more and more activity with BioPython and it is of concern.
>>> Mainly the BioPython suite does not appear to be written to the
>>> recommendations made on the main NCBI E-utilities web page
>>> (http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html).
>>> Principally, the following are not being done by BioPython tools.
>>>
>>>
>>>
>>> *  Send E-utilities requests to http://eutils.ncbi.nlm.nih.gov, not the
>>> standard NCBI Web address.
>>>
>>> *  Make no more than one request every 3 seconds.
>>>
>>>
>>>
>>> In fact I recently cc'd you on an event when a user was coming in at
>>> over 18 requests per second. We really wish that you would alter your
>>> scripts to run with some sort of sleep in them, in order to send
>>> requests no more than once per 3 seconds, and to send them not to the
>>> main www web servers but to http://eutils.ncbi.nlm.nih.gov.
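For what it's worth, the "sleep" Scott asks for can be quite small. Below is
a minimal sketch in Python, assuming a hypothetical helper class called
RateLimiter (this is not an existing Biopython API, and the names are mine).
It only enforces a minimum gap between calls; the 3 second default comes
from Scott's message.

```python
import time


class RateLimiter:
    """Keep successive calls at least `min_interval` seconds apart.

    Hypothetical helper for illustration, not part of Biopython.
    """

    def __init__(self, min_interval=3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock   # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None     # time of the previous request, None before the first

    def wait(self):
        """Sleep just long enough to honour the interval; return the delay used."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        # Record when the request is actually allowed to go out.
        self._last = now + delay
        return delay
```

A script would create one RateLimiter and call wait() immediately before
every request it sends to http://eutils.ncbi.nlm.nih.gov.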
>>>
>>>
>>>
>>> Also, there is the problem of huge searches in order to build local
>>> databases. With your package it seems that, if one were so inclined, one
>>> could send a search for all human sequences (over 10,000,000 sequences)
>>> and your program would then retrieve these one ID at a time. Regardless
>>> of the fact that this is an extreme example, we would much prefer if
>>> your program could take the WebEnv from the ESearch and use the search
>>> history and WebEnv to retrieve sets of sequences at 200 - 200 at a time.
>>>
>>>
>>>
>>> History: Requests utility to maintain results in user's environment.
>>> Used in conjunction with WebEnv.
>>>
>>> usehistory=y
>>>
>>> Web Environment: Value previously returned in XML results from ESearch
>>> or EPost. This value may change with each utility call. If WebEnv is
>>> used, History search numbers can be included in an ESummary URL, e.g.,
>>> term=cancer+AND+%23X (where %23 replaces # and X is the History search
>>> number).
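As an illustration of the %23 escaping described above, here is a small
sketch using only the Python standard library. The helper name history_term
and the history number 6 are assumptions for the example (6 is borrowed from
the query_key in the sample URL further down); urlencode does the escaping.

```python
from urllib.parse import urlencode


def history_term(base_term, history_number):
    """Combine a search term with a History search number, e.g. 'cancer AND #6'."""
    return f"{base_term} AND #{history_number}"


# urlencode turns spaces into '+' and '#' into '%23', as the E-utilities help describes.
query = urlencode({"db": "pubmed", "term": history_term("cancer", 6)})
print(query)  # db=pubmed&term=cancer+AND+%236
```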
>>>
>>> Note: WebEnv is similar to the cookie that is set on a user's computer
>>> when accessing PubMed on the web.  If the parameter usehistory=y is
>>> included in an ESearch URL both a WebEnv (cookie string) and query_key
>>> (history number) values will be returned in the results. Rather than
>>> using the retrieved PMIDs in an ESummary or EFetch URL you may simply
>>> use the WebEnv and query_key values to retrieve the records. WebEnv will
>>> change for each ESearch query, but a sample URL would be as follows:
>>>
>>> http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed
>>> &WebEnv=%3D%5DzU%5D%3FIJIj%3CC%5E%5DA%3CT%5DEACgdn%3DF%5E%3Eh
>>> GFA%5D%3CIFKGCbQkA%5E_hDFiFd%5C%3D
>>> &query_key=6&retmode=html&rettype=medline&retmax=15
>>>
>>> WebEnv=WgHmIcDG]B etc.
>>>
>>> Display Numbers:
>>>
>>> retstart=x  (x= sequential number of the first record retrieved -
>>> default=0 which will retrieve the first record)
>>> retmax=y  (y= number of items retrieved)
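To make the batching concrete, here is a rough sketch of how the WebEnv,
query_key, retstart and retmax parameters could be combined so that records
come back in sets rather than one ID at a time. The function batch_urls and
the constant EFETCH_URL are hypothetical names, not Biopython code. The URL
is an assumption too: it joins the eutils host Scott asks for with the
efetch.fcgi path from his sample URL. The batch size of 200 is taken from
his message. The sketch only builds URLs and does no network access.

```python
from urllib.parse import urlencode

# Assumption: eutils host from Scott's request plus the path from his sample URL.
EFETCH_URL = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def batch_urls(db, webenv, query_key, total, batch_size=200, **extra):
    """Yield one EFetch URL per batch of records from a stored search.

    `webenv` and `query_key` are the values returned by an ESearch made
    with usehistory=y; `total` is the number of hits that search reported.
    """
    for start in range(0, total, batch_size):
        params = {
            "db": db,
            "WebEnv": webenv,
            "query_key": query_key,
            "retstart": start,     # index of the first record in this batch
            "retmax": batch_size,  # upper bound; the last batch may return fewer
        }
        params.update(extra)       # e.g. rettype / retmode
        yield EFETCH_URL + "?" + urlencode(params)
```

For a search with 450 hits this yields three URLs, starting at records 0,
200 and 400. Each one would still need to be fetched no more often than once
every 3 seconds.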
>>>
>>>
>>>
>>> Otherwise we will end up blocking more of your users which we are
>>> unfortunately already doing in some cases.
>>>
>>>
>>>
>>> Sincerely,
>>> Scott D. McGinnis, M.S.
>>> DHHS/NIH/NLM/NCBI
>>> www.ncbi.nlm.nih.gov
>>>
>>>
>>>
>>
>> _______________________________________________
>> BioPython mailing list  -  BioPython at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biopython
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Marie-Claude Hofmann
> College of Veterinary Medicine
> University of Illinois Urbana-Champaign
>
