[Bioperl-l] NCBI eutils

Lincoln Stein lstein@cshl.org
Sun, 01 Dec 2002 15:19:49 -0500


Hi Jim,

We're very nearly ready to release a version of bioperl that uses the NCBI 
eutils (esearch, epost and efetch) and enforces the 3s delay that you folks 
recommend.  The only problem is that in recent days esearch/efetch have become 
very unreliable.  The symptom is this:

	1) formulate a query and submit it to esearch.
	2) recover the WebEnv and QueryKey fields
	3) use these fields in a request to efetch
	4) efetch returns an OK code, but empty content
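
A minimal sketch of the four steps above, written in Python purely for illustration; it is not the bioperl implementation. The function names, the EmptyFetchError class and the injected http_get callable (assumed to return a (status, body) pair) are hypothetical. Passing WebEnv through verbatim is also an assumption.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE = "http://www.ncbi.nih.gov/entrez/eutils"  # host as used in this message


class EmptyFetchError(Exception):
    """efetch answered with an OK status but no body (hypothetical name)."""


def esearch_url(term, db="nucleotide", tool="bioperl"):
    # Step 1: formulate the query; usehistory=y asks for WebEnv/QueryKey.
    params = {"db": db, "term": term, "usehistory": "y", "tool": tool}
    return BASE + "/esearch.fcgi?" + urlencode(params)


def history_keys(esearch_xml):
    # Step 2: recover the WebEnv and QueryKey fields from the response.
    root = ET.fromstring(esearch_xml)
    return root.findtext("WebEnv"), root.findtext("QueryKey")


def efetch_url(webenv, query_key, db="nucleotide", tool="bioperl"):
    # Step 3: reuse both fields in the efetch request.
    params = {"db": db, "query_key": query_key, "rettype": "gb",
              "retmode": "text", "tool": tool}
    # Assumption: WebEnv arrives already percent-escaped, so append it verbatim.
    return BASE + "/efetch.fcgi?" + urlencode(params) + "&WebEnv=" + webenv


def search_and_fetch(http_get, term):
    # http_get is a hypothetical callable: url -> (status_code, body_text).
    status, body = http_get(esearch_url(term))
    webenv, query_key = history_keys(body)
    status, body = http_get(efetch_url(webenv, query_key))
    # Step 4: an OK code with empty content is the failure described here.
    if status == 200 and not body.strip():
        raise EmptyFetchError("efetch returned OK but no content")
    return body
```

Injecting http_get keeps the sketch independent of any particular HTTP library and lets the empty-content case be exercised without touching the NCBI servers.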

Here is an example:

	GET http://www.ncbi.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&
		mindate=1%2F1%2F2002&maxdate=1%2F30%2F2002&datetype=mdat&
		usehistory=y&tool=bioperl&term=Onchocerca+volvulus[Organism]&retmax=73

The response looks OK and returns:

<?xml version="1.0"?>
<!DOCTYPE eSearchResult PUBLIC "-//NLM//DTD eSearchResult, 11 May 2002//EN" 
"http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSearch_020511.dtd">
<eSearchResult>
	<Count>731</Count>
	<RetMax>100</RetMax>
	<RetStart>0</RetStart>
	<QueryKey>1</QueryKey>
	<WebEnv>M%40uXFBO%5DBt%5E%5ECD_AN%3C%3C%3CcBD%3F%60%5C%3C%40%3Ch%40%5CHfC%3F%3Do%3C%3EE%3FjBAeI%3EKF%5E%3CD</WebEnv>
	<IdList>
		<Id>5835443</Id>
		<Id>12005979</Id>
		<Id>12005977</Id>
		...
	</IdList>
	<TranslationSet>
		<Translation>
			<From>Onchocerca+volvulus%5BOrganism%5D</From>
			<To>%22Onchocerca+volvulus%22%5BOrganism%5D</To>
		</Translation>
	</TranslationSet>
	<TranslationStack>
		<TermSet>
			<Term>"Onchocerca volvulus"[Organism]</Term>
			<Field>Organism</Field>
			<Count>15432</Count>
			<Explode>Y</Explode>
		</TermSet>
		<TermSet>
			<Term>1/1/2002[mdat]</Term>
			<Field>mdat</Field>
			<Count>-1</Count>
			<Explode>Y</Explode>
		</TermSet>
		<TermSet>
			<Term>1/30/2002[mdat]</Term>
			<Field>mdat</Field>
			<Count>-1</Count>
			<Explode>Y</Explode>
		</TermSet>
		<OP>RANGE</OP>
		<OP>AND</OP>
	</TranslationStack>
</eSearchResult>

Now when I run the fetch on the indicated WebEnv, I get:

	GET http://www.ncbi.nih.gov/entrez/eutils/efetch.fcgi?rettype=gb&db=nucleotide&
		query_key=1&tool=bioperl&retmode=text&
		WebEnv=Oc[%40A%5ECE%5C]B_EJ%3CIiF_B%5CF%40dGheCJGYkka%40A]Kj%3F%5ECJIkJ%3ELKAa%3C%3D&
		usehistory=y

Connection: close
Date: Sun, 01 Dec 2002 20:18:38 GMT
Via: 1.1 www.ncbi.nih.gov
Server: Apache
Content-Type: text/plain
Client-Date: Sun, 01 Dec 2002 20:18:38 GMT
Client-Response-Num: 1
Client-Transfer-Encoding: chunked
Set-Cookie: WebEnv=p=KP_FeGB>ffADfECcNbIIcYchi^FIzb@G?`FkDCAj=DIcD>KFd\E; 
domain=.nlm.nih.gov; path=/; expires=Sun, 01-Dec-2002 21:18:41 GMT
X-Cache: MISS from www.ncbi.nih.gov

There is supposed to be some content following the headers, but it looks to me 
as though the NCBI server crashed.

Note that this isn't always the case.  About one time out of five it works.  
Last week, this was working 100% of the time, but it started to get flaky 
over Thanksgiving.
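
Since roughly one request in five succeeds, a client could work around the problem by retrying. The following is only a sketch of that idea in Python, not something bioperl is known to do: fetch_with_retry, its parameters and the zero-argument fetch callable are hypothetical, and the 3-second pause simply reuses the delay mentioned at the top of this message.

```python
import time

DELAY_SECONDS = 3  # the delay between requests mentioned earlier in this message


def fetch_with_retry(fetch, attempts=5, delay=DELAY_SECONDS, sleep=time.sleep):
    """Call fetch() until it returns non-empty content or attempts run out.

    fetch is a hypothetical zero-argument callable returning the response body.
    """
    for attempt in range(attempts):
        if attempt:
            sleep(delay)  # wait before every request after the first
        body = fetch()
        if body and body.strip():
            return body
    raise RuntimeError("no content after %d attempts" % attempts)
```

The sleep argument is injectable so the retry logic can be checked without actually waiting.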

Lincoln

-- 
Lincoln Stein
lstein@cshl.org