[Bioperl-l] limit on accessing NCBI/GenBank

Brian Osborne brian_osborne at cognia.com
Thu Feb 13 11:58:53 EST 2003


Jason,

> But if you instead loop through multiple get_Seq_by_acc each time with a
> single ID it probably won't use the sleep mechanism properly, I'm not
> sure.

It looks like the delay is attached to the object, so this:

use Bio::DB::GenBank;

my $db  = Bio::DB::GenBank->new(-delay => 10);   # 10 s between fetches
my $seq = $db->get_Seq_by_id(2);
$seq    = $db->get_Seq_by_id(3);
$seq    = $db->get_Seq_by_id(4);

takes about 22 seconds, versus about 7 seconds without the delay. Note that
sleep() is not very precise, according to perlfunc.
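A minimal sketch of how a per-object delay could produce that timing, assuming the delay is enforced as a minimum interval between the *starts* of consecutive requests rather than as a fixed pause added after each one. This is an assumption, not a description of the actual Bio::DB::WebDBSeqI code; the Throttle package, wait_turn() and fake_fetch() are hypothetical names used only for illustration. Under that assumption, three fetches of roughly 7/3 ≈ 2.3 s each with a 10 s delay would take about 10 + 10 + 2.3 ≈ 22 s rather than 7 + 20 = 27 s, which fits the numbers above. The sketch uses short fractional delays so it runs quickly.

```perl
use strict;
use warnings;

# Hypothetical throttle, NOT the BioPerl API: it enforces a minimum
# interval of 'delay' seconds between the starts of consecutive requests.
package Throttle;
use Time::HiRes qw(time sleep);    # fractional-second time() and sleep()

sub new {
    my ($class, %args) = @_;
    my $self = { delay => $args{-delay} // 3, last => undef };
    return bless $self, $class;
}

# Block until at least 'delay' seconds have passed since the previous
# request started, then record now as the start of this request.
sub wait_turn {
    my $self = shift;
    if (defined $self->{last}) {
        my $since = time - $self->{last};
        sleep($self->{delay} - $since) if $since < $self->{delay};
    }
    $self->{last} = time;
    return;
}

package main;
use Time::HiRes qw(time sleep);

# Hypothetical stand-in for a network fetch that takes $cost seconds.
sub fake_fetch {
    my ($throttle, $cost) = @_;
    $throttle->wait_turn;
    sleep($cost);
    return;
}

my $t     = Throttle->new(-delay => 0.5);
my $start = time;
fake_fetch($t, 0.1) for 1 .. 3;
my $elapsed = time - $start;

# Expect about 0.5 + 0.5 + 0.1 = 1.1 s, not 3 * 0.1 + 2 * 0.5 = 1.3 s.
printf "three fetches took %.2f s\n", $elapsed;
```

Because the interval is measured from the start of the previous request, time spent downloading counts toward the delay, so slow fetches are not penalised twice.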

Brian O.

-----Original Message-----
From: bioperl-l-bounces at bioperl.org [mailto:bioperl-l-bounces at bioperl.org]On
Behalf Of Jason Stajich
Sent: Wednesday, February 12, 2003 11:12 AM
To: Prachi Shah
Cc: bioperl
Subject: Re: [Bioperl-l] limit on accessing NCBI/GenBank

There is a built-in 3-second delay in the code for multiple IDs by default.

But if you instead loop through multiple get_Seq_by_acc calls, each time with
a single ID, it probably won't use the sleep mechanism properly; I'm not sure.

If you read the code for the Bio::DB::WebDBSeqI module, you can see where
Lincoln implemented this with a _sleep function.


 Title   : new
 Usage   : $gb = Bio::DB::GenBank->new(@options)
 Function: Creates a new genbank handle
 Returns : New genbank handle
 Args    : -delay   number of seconds to delay between fetches (3s)

NOTE:  There are other options that are used internally.  By NCBI policy,
this module introduces a 3s delay between fetches.  If you are fetching
multiple genbank ids, it is a good idea to use get


Still, you can clearly abuse these modules if you choose to, and NCBI can
cut off your access to their CGI scripts if they feel you are abusing the
servers. We always recommend that people download the data locally and
use the Bio::Index modules to index the sequences locally if they are
making a lot of fetch requests.

Another alternative to a local flatfile index is the BioSQL project, which
allows you to put the sequence data in your own RDBMS. Also check out
SeqHound (which may be integrated into Bioperl one day), which provides
additional access to local or remote sequence databases.

myGenBank is another way to keep a local copy of GenBank for your own use;
it uses a combination of an RDBMS and flatfile index files.

-jason


On Wed, 12 Feb 2003, Prachi Shah wrote:

> Hi all!
>
> I have this question related to BioPerl but not about
> its implementation, bugs or problems. I know NCBI has
> a limit of one request every 3 seconds to their
> server, that includes all of Entrez and all databases.
> I had once tried to use LWP user agents directly to
> access some data from the NCBI website, and that's when
> I realised that they are very strict about not letting
> scripts overload their servers. Even if you follow
> their 3-second rule and run your script at off-peak
> hours, it is not acceptable if it makes too many
> requests.
>
> Does that apply to access made through BioPerl? For
> example, Bio::DB::GenBank, Bio::DB::Query::GenBank,
> etc. query the NCBI servers directly.
>
> thanks,
> Prachi.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu