[BioPython] Fwd: NCBI Abuse Activity with BioPython
Chris Fields
cjfields at uiuc.edu
Wed Jun 25 15:00:34 EDT 2008
Yes, my bad (I was in a hurry).
I have heard of instances where specific users/IPs were blocked
temporarily by NCBI based on spamming, so it's best to be proactive.
chris
On Jun 25, 2008, at 11:16 AM, Renato Alves wrote:
> You mean 3 seconds, no?
>
> Quoting Chris Fields on 06/25/2008 04:34 PM:
>> Just as a note from the BioPerl side, BioPerl modules that access
>> eutils use the 3 min sleep rule, and the documentation spells out
>> the NCBI rules. The modules also identify the tool/agent used as
>> 'bioperl', I believe.
>>
>> chris
>>
>> On Jun 25, 2008, at 10:08 AM, Chris Dagdigian wrote:
>>
>>>
>>> Can someone from the biopython dev team respond officially to
>>> Scott please?
>>>
>>> Regards,
>>> Chris
>>>
>>>
>>> Begin forwarded message:
>>>
>>>> From: "Mcginnis, Scott (NIH/NLM/NCBI) [E]" <mcginnis at ncbi.nlm.nih.gov>
>>>> Date: June 25, 2008 10:54:28 AM EDT
>>>> To: <biopython-owner at lists.open-bio.org>
>>>> Subject: NCBI Abuse Activity with BioPython
>>>>
>>>> Dear Colleague:
>>>>
>>>> My name is Scott McGinnis, and I am responsible for monitoring the
>>>> web pages at NCBI and blocking users with excessive access.
>>>>
>>>> I am seeing more and more activity from BioPython, and it is of
>>>> concern to us. Mainly, the BioPython suite does not appear to follow
>>>> the recommendations made on the main NCBI E-utilities web page
>>>> (http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html).
>>>> In particular, BioPython tools are not doing the following:
>>>>
>>>> * Send E-utilities requests to http://eutils.ncbi.nlm.nih.gov, not
>>>> the standard NCBI Web address.
>>>>
>>>> * Make no more than one request every 3 seconds.
>>>>
>>>> In fact, I recently cc'd you on an event in which a user was coming
>>>> in at over 18 requests per second. We would really like you to alter
>>>> your scripts to include some sort of sleep, so that they send no more
>>>> than one request every 3 seconds, and to send these requests to
>>>> http://eutils.ncbi.nlm.nih.gov rather than to the main www web
>>>> servers.
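A minimal Python sketch of the "one request every 3 seconds" rule, assuming a single-threaded client that calls `wait()` immediately before each E-utilities request. The `Throttle` class and its `clock`/`sleep` parameters are hypothetical names chosen for illustration (they are injectable so the timing can be tested without real delays); this is not BioPython's own implementation.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Space out calls so that at most one request starts every min_interval seconds.

    Hypothetical helper; assumes a single thread of requests.
    """

    def __init__(self, min_interval: float = 3.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock  # injectable so tests need not really wait
        self._sleep = sleep
        self._last: Optional[float] = None  # start time of the previous request

    def wait(self) -> float:
        """Sleep until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            # Time still owed before min_interval has passed since the last request.
            delay = max(0.0, self._last + self.min_interval - now)
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay
```

Usage: create one `Throttle()` per process and call `throttle.wait()` before every request (for example, before each `urllib.request.urlopen(...)`). Sharing a single instance is what makes the limit apply across all requests rather than per call site.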
>>>>
>>>> Also, there is the problem of huge searches used to build local
>>>> databases. With your package, it seems that, if one were so inclined,
>>>> one could send a search for all human sequences (over 10,000,000
>>>> sequences), and your program would then retrieve them one ID at a
>>>> time. Even though this is an extreme example, we would much prefer
>>>> that your program take the WebEnv returned by ESearch and use the
>>>> search history and WebEnv to retrieve sequences in sets of 200 at a
>>>> time.
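To see why batching matters, here is a rough worked estimate in Python. It assumes the 3-second rule is the only cost (network and processing time are ignored) and treats the "over 10,000,000" human sequences as exactly 10,000,000; the helper names are hypothetical.

```python
SECONDS_PER_REQUEST = 3  # NCBI guideline: no more than one request every 3 seconds


def request_count(n_records: int, batch_size: int) -> int:
    """Requests needed to fetch n_records when each request returns batch_size records."""
    # Integer ceiling division: a final partial batch still costs one request.
    return (n_records + batch_size - 1) // batch_size


def minimum_hours(n_records: int, batch_size: int,
                  seconds_per_request: int = SECONDS_PER_REQUEST) -> float:
    """Approximate wall-clock hours if every request costs seconds_per_request."""
    return request_count(n_records, batch_size) * seconds_per_request / 3600


if __name__ == "__main__":
    n = 10_000_000
    print(f"one ID per request: {minimum_hours(n, 1):.0f} h")     # → one ID per request: 8333 h
    print(f"200 IDs per request: {minimum_hours(n, 200):.1f} h")  # → 200 IDs per request: 41.7 h
```

Under these assumptions, fetching one ID at a time takes about 347 days of polite requests, while batches of 200 bring it under two days (50,000 requests instead of 10,000,000).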
>>>>
>>>> History: Requests utility to maintain results in the user's
>>>> environment. Used in conjunction with WebEnv.
>>>>
>>>> usehistory=y
>>>>
>>>> Web Environment: Value previously returned in XML results from
>>>> ESearch or EPost. This value may change with each utility call. If
>>>> WebEnv is used, History search numbers can be included in an
>>>> ESummary URL, e.g., term=cancer+AND+%23X (where %23 replaces # and X
>>>> is the History search number).
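A minimal Python sketch of reading the WebEnv and query_key values from an ESearch result and building a history-based term like the `cancer+AND+%23X` example. It assumes the ESearch response is an `<eSearchResult>` element with `Count`, `QueryKey` and `WebEnv` children; the sample XML is abbreviated, its WebEnv value is a made-up placeholder, and `parse_history`/`history_term` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET
from typing import Tuple
from urllib.parse import quote_plus

# Abbreviated, hypothetical ESearch response (placeholder WebEnv value).
SAMPLE_ESEARCH_XML = """<eSearchResult>
  <Count>255</Count>
  <RetMax>20</RetMax>
  <RetStart>0</RetStart>
  <QueryKey>6</QueryKey>
  <WebEnv>PLACEHOLDER_WEBENV</WebEnv>
</eSearchResult>"""


def parse_history(xml_text: str) -> Tuple[str, int, int]:
    """Return (WebEnv, query_key, count) from an ESearch XML result."""
    root = ET.fromstring(xml_text)
    # findtext looks only at direct children of <eSearchResult> here.
    values = {tag: root.findtext(tag) for tag in ("WebEnv", "QueryKey", "Count")}
    missing = [tag for tag, value in values.items() if value is None]
    if missing:
        raise ValueError(f"ESearch result lacks {missing}; was usehistory=y set?")
    return values["WebEnv"], int(values["QueryKey"]), int(values["Count"])


def history_term(query: str, query_key: int) -> str:
    """Build a 'term' parameter that combines a query with history set #query_key."""
    # quote_plus turns spaces into '+' and '#' into '%23', as in the message's example.
    return "term=" + quote_plus(f"{query} AND #{query_key}")


if __name__ == "__main__":
    webenv, key, count = parse_history(SAMPLE_ESEARCH_XML)
    print(webenv, key, count)
    print(history_term("cancer", key))
```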
>>>>
>>>> Note: WebEnv is similar to the cookie that is set on a user's
>>>> computer when accessing PubMed on the web. If the parameter
>>>> usehistory=y is included in an ESearch URL, both WebEnv (cookie
>>>> string) and query_key (history number) values will be returned in
>>>> the results. Rather than using the retrieved PMIDs in an ESummary or
>>>> EFetch URL, you may simply use the WebEnv and query_key values to
>>>> retrieve the records. WebEnv will change for each ESearch query, but
>>>> a sample URL would be as follows:
>>>>
>>>> http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed
>>>> &WebEnv=%3D%5DzU%5D%3FIJIj%3CC%5E%5DA%3CT%5DEACgdn%3DF%5E%3EhGFA%5D%3CIFKGCbQkA%5E_hDFiFd%5C%3D
>>>> &query_key=6&retmode=html&rettype=medline&retmax=15
>>>>
>>>> WebEnv=WgHmIcDG]B etc.
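A minimal Python sketch of assembling an EFetch URL like the sample above from WebEnv and query_key. Assumptions: the host follows the message's request to use eutils.ncbi.nlm.nih.gov, while the `/entrez/eutils/efetch.fcgi` path is copied from the sample URL; `efetch_url` is a hypothetical helper. `urlencode` escapes the WebEnv string, which is why the sample shows `%3D`, `%5D` and so on.

```python
from urllib.parse import urlencode

# Assumption: eutils host as recommended in the message; path taken from the sample URL.
EFETCH_BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(db: str, webenv: str, query_key: int,
               retstart: int = 0, retmax: int = 200, **extra: str) -> str:
    """Build an EFetch URL that retrieves records from a stored history set."""
    params = {"db": db, "WebEnv": webenv, "query_key": query_key,
              "retstart": retstart, "retmax": retmax}
    params.update(extra)  # e.g. retmode="html", rettype="medline"
    # urlencode percent-escapes every value, including the raw WebEnv string.
    return EFETCH_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    print(efetch_url("pubmed", "PLACEHOLDER_WEBENV", 6, retmax=15,
                     retmode="html", rettype="medline"))
```

Passing the decoded WebEnv and letting `urlencode` escape it avoids hand-building query strings, where a single unescaped `=` or `?` in the WebEnv value would corrupt the request.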
>>>>
>>>> Display Numbers:
>>>>
>>>> retstart=x (x = sequential number of the first record retrieved;
>>>> default=0, which retrieves the first record)
>>>> retmax=y (y = number of items retrieved)
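A minimal Python sketch of paging through a history set with retstart/retmax, assuming retstart is zero-based as described above (default 0 retrieves the first record) and a batch size of 200, following the message's request. `batches` is a hypothetical helper; each pair it yields would become the retstart and retmax parameters of one EFetch request.

```python
from typing import Iterator, Tuple


def batches(count: int, batch_size: int = 200) -> Iterator[Tuple[int, int]]:
    """Yield (retstart, retmax) pairs that together cover records 0 .. count-1."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for retstart in range(0, count, batch_size):
        # The last batch may be smaller than batch_size.
        yield retstart, min(batch_size, count - retstart)


if __name__ == "__main__":
    for retstart, retmax in batches(450):
        print(f"retstart={retstart}&retmax={retmax}")
    # prints, one per line: retstart=0&retmax=200, retstart=200&retmax=200, retstart=400&retmax=50
```

Clipping retmax on the final batch just keeps each request's parameters exact; the number of requests is the same either way.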
>>>>
>>>> Otherwise, we will end up blocking more of your users, which,
>>>> unfortunately, we are already doing in some cases.
>>>>
>>>> Sincerely,
>>>> Scott D. McGinnis, M.S.
>>>> DHHS/NIH/NLM/NCBI
>>>> www.ncbi.nlm.nih.gov
>>>
>>> _______________________________________________
>>> BioPython mailing list - BioPython at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biopython
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Marie-Claude Hofmann
>> College of Veterinary Medicine
>> University of Illinois Urbana-Champaign
>>