[Bioperl-l] NCBI GenBank web retrieval

Jason Stajich jason@cgt.mc.duke.edu
Sat, 19 Jan 2002 17:48:53 -0500 (EST)


[jason having learned way too much about how to reverse engineer CGI]

I've restored the functionality from previous versions of DB::GenBank and
DB::GenPept now that we are using the new NCBI CGI /htbin-post/Entrez/query.
I was able to figure out that terms are now separated by '+' instead of
the previous ',', which had been causing only one sequence to be
retrieved.  Additionally, I fixed a bug that retrieved the last rather
than the first sequence when a get_Seq_by_(id|acc) request has multiple
hits.
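Here is a minimal Python sketch of the two changes described above. It is not the actual DB::GenBank Perl code. The post gives only the CGI path, so the host www.ncbi.nlm.nih.gov and the `db`/`uid` parameter names are hypothetical placeholders. `build_query_url` relies on urlencode() form-encoding spaces as '+', so ids joined with a space come out '+'-separated, while a ',' would have been escaped as %2C. `first_record` splits a GenBank flat-file response on its '//' terminator lines and keeps the first record rather than the last.

```python
from urllib.parse import urlencode

# CGI path from the post; the host and the parameter names used below
# are hypothetical placeholders, not NCBI's documented interface.
BASE_URL = "http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query"


def build_query_url(ids, db="n"):
    """Build a GET URL whose id terms are separated by '+'.

    urlencode() uses quote_plus, which turns spaces into '+', so joining
    the ids with a space yields 'id1+id2+...' in the query string.
    """
    params = {"db": db, "uid": " ".join(ids)}
    return BASE_URL + "?" + urlencode(params)


def first_record(flatfile_text):
    """Return the first '//'-terminated GenBank record, or None."""
    records, current = [], []
    for line in flatfile_text.splitlines():
        current.append(line)
        if line.strip() == "//":  # end-of-record marker
            records.append("\n".join(current))
            current = []
    # Keep the first hit, not the last one seen.
    return records[0] if records else None
```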

I was unable to reactivate access to Batch Entrez through
/entrez/batchentrez.cgi because it only seems to return an HTML table, and I
am trying to avoid a 2-step query process at this time.  I attempted to
mimic Lincoln's functionality in Boulder::Genbank here, but alas it
appears that the previous /cgi-bin/Entrez/qserver.cgi/result is disabled.
Lincoln - I believe this breaks Boulder 1.24 Entrez access as well.  I
guess we can switch to a 2-step retrieval that parses the HTML if people
are interested.

Are there limits on the size of URLs?  I suspect there may be, which could
be a problem since the requests are sent as GETs, not POSTs.  Otherwise we
basically have Batch Entrez functionality back.

(Roger, this is essentially the fix we talked about, as best as I can
solve it, so you can take it off your queue unless you have other ideas.)

-jason


-- 
Jason Stajich
Duke University
jason@cgt.mc.duke.edu