[Bioperl-l] Bio::DB::Query::GenBank

Marc Logghe Marc.Logghe at devgen.com
Mon Nov 29 09:17:55 EST 2004


Hi,
I think you will always bump into that limit; it is the limit NCBI applies to efetch.
I don't know how Bio::DB::Query::GenBank handles this internally, but it should be a two-step process:
1) you perform a query and get a WebEnv and query key back
2) you fetch your sequences by passing your WebEnv and query key and explicitly requesting your record numbers in chunks of 500.
I have also never succeeded in fetching more than 500 sequences with Bio::DB::Query::GenBank.
I am currently using a non-BioPerl script based on http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_example.pl.
NCBI also asks that you run these kinds of queries at night (EST) or on the weekend, with a sleep of at least 5 sec between every fetch of 500 records.
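
To make step 2 concrete, here is a minimal sketch of the chunking, not the code from eutils_example.pl and not what Bio::DB::Query::GenBank does internally. It is written in Python only for brevity; the function name efetch_urls, the efetch.fcgi base URL and the db/rettype defaults are my assumptions. The WebEnv, query_key, retstart and retmax parameters are the ones efetch takes for history-based retrieval.

```python
from urllib.parse import urlencode

# Assumed efetch endpoint; adjust if NCBI documents a different base URL.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_urls(count, webenv, query_key, db="nucleotide", batch=500):
    """Yield one efetch URL per chunk of at most `batch` records.

    `count`, `webenv` and `query_key` come from step 1 (the esearch call
    made with usehistory=y). Hypothetical helper, for illustration only.
    """
    for start in range(0, count, batch):
        params = {
            "db": db,
            "WebEnv": webenv,
            "query_key": query_key,
            "retstart": start,                      # zero-based offset of the chunk
            "retmax": min(batch, count - start),    # last chunk may be smaller
            "rettype": "gb",
            "retmode": "text",
        }
        yield EFETCH + "?" + urlencode(params)
```

You would then loop over the URLs, download each one, and sleep at least 5 sec between requests, as NCBI asks.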

HTH,
Marc

> -----Original Message-----
> From: Aaron J. Mackey [mailto:amackey at pcbi.upenn.edu]
> Sent: Monday, November 29, 2004 2:59 PM
> To: Wuming Gong
> Cc: Bioperl-l at portal.open-bio.org
> Subject: Re: [Bioperl-l] Bio::DB::Query::GenBank
> 
> 
> 
> If you try again late at night (meaning late at night EST), you may get
> all 5000 hits; NCBI seems to have implemented a limit of 500 entries in
> batch retrieval when network load is already high, but you may be
> successful during non-peak hours ...
> 
> -Aaron
> 
> On Nov 29, 2004, at 4:26 AM, Wuming Gong wrote:
> 
> > Hi Mona,
> >
> > I have run into the same kind of problem. If you pull down fewer than
> > 500 sequences at a time, it works.
> >
> > Wuming
> >
> >
> > On Thu, 04 Nov 2004 21:12:40 -0700, Ligia Mateiu <lmateiu at ualberta.ca> wrote:
> >> Hi all,
> >> I used a query for which there are >5000 hits in GenBank, but my code
> >> retrieved just the very first 500.
> >>
> >> Any idea why?
> >>
> >> Thanks a lot,
> >> Mona
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at portal.open-bio.org
> >> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >
> >
> --
> Aaron J. Mackey, Ph.D.
> Dept. of Biology, Goddard 212
> University of Pennsylvania       email:  amackey at pcbi.upenn.edu
> 415 S. University Avenue         office: 215-898-1205
> Philadelphia, PA  19104-6017     fax:    215-746-6697
> 
> 


