[Bioperl-l] NCBI GenBank web retrieval

Jason Stajich jason@cgt.mc.duke.edu
Sun, 20 Jan 2002 22:40:10 -0500 (EST)


Josiah -

I'm sure Lincoln will have a fix in there at some point.  In the meantime
you can use our (bioperl) Bio::DB::GenBank module, which achieves the same
end and uses the updated NCBI CGIs.  Try the bioperl 0.9.3 developer
release: ftp://bioperl.org/pub/DIST/bioperl-0.9.3.tar.gz

By 2-step I mean this: the newish Batch Entrez returns a list of hits for
the given accessions/GIs, so one has to parse this list and reissue
requests to Entrez (not through Batch Entrez, but rather through standard
Entrez).  So it appears that none of the advantages of Batch Entrez
[requesting a large list of sequences and getting the data back in one
request] are still there.  NCBI appears to support using standard Entrez
for all requests, so batch mode may end up being phased out unless we want
to put a layer in there that recognizes "too many requests" errors and
loops through the list of accessions.  We may run into URL size limits
with the GET requests, so the ability to request huge lists of accessions
may be moot.  What the NCBI servers will support is still under
investigation.
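
A minimal sketch of the looping layer mentioned above, assuming the
accessions are sent comma-joined in a GET request. The helper name
batch_accessions and the length limit are hypothetical; the real limit
depends on what the NCBI servers accept. It splits a long list into
batches that each fit within the limit.

```perl
use strict;
use warnings;

# Split a list of accessions into batches so that each comma-joined batch
# stays within $max_len characters (a stand-in for the URL size limit).
# An accession longer than $max_len still gets a batch of its own.
sub batch_accessions {
    my ($max_len, @accs) = @_;
    my (@batches, @current);
    my $len = 0;
    for my $acc (@accs) {
        # A comma is needed before every accession except the first.
        my $extra = length($acc) + (@current ? 1 : 0);
        if (@current && $len + $extra > $max_len) {
            push @batches, [@current];
            @current = ();
            $len     = 0;
            $extra   = length($acc);
        }
        push @current, $acc;
        $len += $extra;
    }
    push @batches, [@current] if @current;
    return @batches;
}
```

Each returned batch is an array ref that can be issued as its own request,
so one oversized list becomes several requests of acceptable size.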

The method get_Seq_by_acc() takes as input a single accession or an array
ref of accessions, so the basic functionality of batch requests is
supported through that one method.
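
A rough sketch of fetching a list one accession at a time, assuming only
that the database object provides get_Seq_by_acc() for a single accession.
The helper name fetch_each is hypothetical and the accessions in the usage
comment are just examples.

```perl
use strict;
use warnings;

# Fetch each accession individually from any object that provides
# get_Seq_by_acc(), collecting failures instead of dying on the first one.
sub fetch_each {
    my ($db, @accs) = @_;
    my (%seqs, @failed);
    for my $acc (@accs) {
        # eval traps an exception thrown by a failed lookup.
        my $seq = eval { $db->get_Seq_by_acc($acc) };
        if (defined $seq) { $seqs{$acc} = $seq }
        else              { push @failed, $acc }
    }
    return (\%seqs, \@failed);
}

# Usage, assuming bioperl is installed and NCBI is reachable:
#   use Bio::DB::GenBank;
#   my $db = Bio::DB::GenBank->new;
#   my ($seqs, $failed) = fetch_each($db, 'J00522', 'AF303112');
```

Collecting the failed accessions separately makes it easy to retry just
those later rather than repeating the whole list.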


My suggestion: either download nr/nt if you just want sequence and use
Bio::Index::Fasta, or look into downloading GenBank for the division you
need and index/retrieve with Bio::Index::GenBank.  If this is not possible
due to local disk space/bandwidth limitations, some delayed-request
strategies and caching will likely be your best bet.  Hopefully after the
hackathon next week we will have some nice sequence caching
implementations which can help.
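
One possible shape for the delayed-request and caching idea, as a sketch
only (this is not an existing bioperl implementation). The name
make_cached_fetcher and the $fetch code ref are hypothetical; $fetch is
assumed to take one accession and return the retrieved sequence.

```perl
use strict;
use warnings;

# Wrap a fetch routine so that repeated accessions are served from an
# in-memory cache and real requests are spaced at least $delay seconds apart.
sub make_cached_fetcher {
    my ($fetch, $delay) = @_;
    my %cache;
    my $last = 0;    # time of the most recent real request
    return sub {
        my ($acc) = @_;
        return $cache{$acc} if exists $cache{$acc};
        my $wait = $last + $delay - time;
        sleep $wait if $wait > 0;
        $last = time;
        # Nothing is cached if $fetch dies, so a failed lookup can be retried.
        return $cache{$acc} = $fetch->($acc);
    };
}
```

The cache here lives only for the lifetime of the program; keeping results
on disk between runs would need a persistent store instead of a hash.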

-jason

On Sun, 20 Jan 2002, Josiah Altschuler wrote:

> > Hi.  Beginner here.  I was using Boulder 1.25 and my program mysteriously
> > stopped being able to access Genbank through queries.  I guess this is
> > because /cgi-bin/Entrez/qserver.cgi/result is disabled, as Jason Stajich
> > is saying?
> > Any advice on what I can do to fix this, and what do you mean by a 2-step
> > query?
> > Thanks,
> > Josiah

-- 
Jason Stajich
Duke University
jason@cgt.mc.duke.edu