[BioPython] Fwd: NCBI Abuse Activity with BioPython

Chris Dagdigian dag at sonsorol.org
Wed Jun 25 11:08:33 EDT 2008


Can someone from the biopython dev team respond officially to Scott please?

Regards,
Chris


Begin forwarded message:

> From: "Mcginnis, Scott (NIH/NLM/NCBI) [E]" <mcginnis at ncbi.nlm.nih.gov>
> Date: June 25, 2008 10:54:28 AM EDT
> To: <biopython-owner at lists.open-bio.org>
> Subject: NCBI Abuse Activity with BioPython
>
> Dear Colleague:
>
>
>
> My name is Scott McGinnis and I am responsible for monitoring the web
> page at NCBI and blocking users with excessive access.
>
>
>
> I am seeing more and more activity with BioPython and it is of concern.
> Mainly, the BioPython suite does not appear to be written to the
> recommendations made on the main NCBI E-utilities web page
> (http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html).
> Principally, the following are not being done by BioPython tools.
>
>
>
> *  Send E-utilities requests to http://eutils.ncbi.nlm.nih.gov
> <http://eutils.ncbi.nlm.nih.gov/> , not the standard NCBI Web address.
>
> *  Make no more than one request every 3 seconds.
>
>
>
> In fact I recently cc'd you on an event when a user was coming in at
> over 18 requests per second. We really wish that you would alter your
> scripts to run with some sort of sleep in them, in order to not send
> requests more than once per 3 seconds, and to not send these to the
> main www web servers but use http://eutils.ncbi.nlm.nih.gov
> <http://eutils.ncbi.nlm.nih.gov/> .
>
>
>
> Also, there is the problem of huge searches in order to build local
> databases. With your package it seems that, if one were so inclined, one
> could send a search for all human sequences (over 10,000,000 sequences)
> and your program would then retrieve these one ID at a time. Regardless
> of the fact that this is an extreme example, we would much prefer if
> your program could use the WebEnv from ESearch and use the search
> history and WebEnv to retrieve sets of sequences at 200 - 200 at a time.
>
>
>
> History: Requests utility to maintain results in user's environment.
> Used in conjunction with WebEnv.
>
> usehistory=y
>
> Web Environment: Value previously returned in XML results from ESearch
> or EPost. This value may change with each utility call. If WebEnv is
> used, History search numbers can be included in an ESummary URL, e.g.,
> term=cancer+AND+%23X (where %23 replaces # and X is the History search
> number).
>
> Note: WebEnv is similar to the cookie that is set on a user's computer
> when accessing PubMed on the web. If the parameter usehistory=y is
> included in an ESearch URL, both WebEnv (cookie string) and query_key
> (history number) values will be returned in the results. Rather than
> using the retrieved PMIDs in an ESummary or EFetch URL, you may simply
> use the WebEnv and query_key values to retrieve the records. WebEnv will
> change for each ESearch query, but a sample URL would be as follows:
>
> http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed
> &WebEnv=%3D%5DzU%5D%3FIJIj%3CC%5E%5DA%3CT%5DEACgdn%3DF%5E%3Eh
> GFA%5D%3CIFKGCbQkA%5E_hDFiFd%5C%3D
> &query_key=6&retmode=html&rettype=medline&retmax=15
>
> WebEnv=WgHmIcDG]B etc.
>
> Display Numbers:
>
> retstart=x  (x= sequential number of the first record retrieved -
> default=0 which will retrieve the first record)
> retmax=y  (y= number of items retrieved)
>
>
>
> Otherwise we will end up blocking more of your users which we are
> unfortunately already doing in some cases.
>
>
>
> Sincerely,
> Scott D. McGinnis, M.S.
> DHHS/NIH/NLM/NCBI
> www.ncbi.nlm.nih.gov
>
>
>
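
The "no more than one request every 3 seconds" recommendation in the
forwarded message could be sketched roughly as below. This is only an
illustration, not Biopython's actual code: the Throttle class and its
parameter names are hypothetical, and the clock and sleep functions are
injectable purely so the behaviour can be checked without waiting.

```python
import time


class Throttle:
    """Hypothetical helper: enforce a minimum gap between requests."""

    def __init__(self, min_interval=3.0, clock=time.monotonic, sleep=time.sleep):
        # 3.0 seconds is the interval requested in the NCBI message.
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until min_interval has passed since the last call.

        Returns the number of seconds actually slept.
        """
        waited = 0.0
        if self._last is not None:
            remaining = self.min_interval - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        self._last = self._clock()
        return waited
```

A script would call wait() on one shared Throttle instance immediately
before every E-utilities request, so that all requests from the same
process are spaced out regardless of which function issues them.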


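The history-based retrieval described in the forwarded message (run
ESearch with usehistory=y, then fetch by WebEnv and query_key in slices
using retstart and retmax) might look something like the sketch below.
It only builds the EFetch URLs and sends nothing. The function names are
hypothetical, the batch size of 200 follows the figure in the message,
and combining the eutils host with the /entrez/eutils/efetch.fcgi path
from the sample URL is an assumption.

```python
from urllib.parse import urlencode

# Assumption: eutils host from the message plus the path from its sample URL.
EFETCH_URL = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def history_batches(total, batch_size=200):
    """Yield (retstart, retmax) pairs that cover `total` records."""
    for start in range(0, total, batch_size):
        yield start, min(batch_size, total - start)


def efetch_url(db, webenv, query_key, retstart, retmax):
    """Build an EFetch URL that reads one slice from the search history.

    webenv and query_key are the values returned by an ESearch call made
    with usehistory=y; urlencode percent-escapes the WebEnv string.
    """
    params = {
        "db": db,
        "WebEnv": webenv,
        "query_key": query_key,
        "retstart": retstart,
        "retmax": retmax,
    }
    return EFETCH_URL + "?" + urlencode(params)
```

For a search that matched 450 records, history_batches(450) gives three
slices, so the whole result set is fetched in three requests instead of
one request per ID.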
