[Bioperl-l] Bio::DB::Query::GenBank

Marc Logghe Marc.Logghe at devgen.com
Mon Nov 29 14:43:16 EST 2004


> Lincoln did some fixes this summer which I think did this 2 step
> process for you in Bio::DB::GenBank (another reason we need to get
> 1.5.0 out there for people to use).  Any chance you can try the RC1
> or CVS live code as well to see if you are hitting the same problems?
No prob, Jason.
Everything works fine. The query in the test script returned 14565
records, as it should. The perl-live release 1.4.0 returned only 50 (!)
with the exact same script, shown below.
#!/usr/bin/perl
use strict;
use warnings;

use Bio::DB::Query::GenBank;
use Bio::DB::GenBank;

# Oryza ESTs modified between 2004/10/30 and 2004/11/29.
my $query_string =
  '"Oryza"[Organism] AND EST[Keyword] AND "2004/10/30 15.03"[MDAT] : "2004/11/29 15.03"[MDAT]';

# Build the Entrez query against the nucleotide database.
my $query = Bio::DB::Query::GenBank->new( -db    => 'nucleotide',
                                          -query => $query_string );

# Fetch the matching records as a sequence stream.
my $gb     = Bio::DB::GenBank->new;
my $stream = $gb->get_Stream_by_query($query);
while ( my $seq = $stream->next_seq ) {
    # do something with the sequence object
    print $seq->accession_number, "\n";
}
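
For the archive, here is a minimal sketch of the chunking in step 2 of the
two-step process described in the quoted message further down. It is not
how Bio::DB::GenBank does it internally. chunk_ranges is a hypothetical
helper, the WebEnv and query_key values are placeholders for whatever
step 1 would return, and the efetch URL is the current NCBI E-utilities
endpoint, assumed here rather than taken from this thread. The script
only prints the URLs it would request; it does not contact NCBI.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split $count records into
# [retstart, retmax] windows of at most $size records each.
sub chunk_ranges {
    my ( $count, $size ) = @_;
    die "chunk size must be positive\n" unless $size > 0;
    my @ranges;
    for ( my $start = 0 ; $start < $count ; $start += $size ) {
        my $left = $count - $start;
        push @ranges, [ $start, $left < $size ? $left : $size ];
    }
    return @ranges;
}

# Placeholder values: in a real run these come back from step 1
# (an esearch call with usehistory=y).
my %history = (
    WebEnv    => 'WEBENV_FROM_STEP_1',
    query_key => 1,
    count     => 14565,
);

# Assumed endpoint (current E-utilities efetch URL).
my $efetch = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

for my $range ( chunk_ranges( $history{count}, 500 ) ) {
    my ( $retstart, $retmax ) = @$range;
    my $url = "$efetch?db=nucleotide&rettype=gb&retmode=text"
      . "&WebEnv=$history{WebEnv}&query_key=$history{query_key}"
      . "&retstart=$retstart&retmax=$retmax";
    print "$url\n";

    # A real script would fetch $url here (e.g. with LWP::UserAgent)
    # and sleep between requests.
}
```

The pause of at least 5 sec between fetches that NCBI asks for would go
where the last comment sits, right after each request.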

HTH,
Marc





> 
> -jason
> On Nov 29, 2004, at 9:17 AM, Marc Logghe wrote:
> 
> > Hi,
> > I think you will always bump into that limit; it is the limit NCBI
> > is using with efetch.
> > I don't know how it is internally done by Bio::DB::Query::GenBank
> > but it should go via a 2 step process:
> > 1) you perform a query and you get a webenv and query key back
> > 2) you fetch your sequences by passing your webenv and query key
> >    and explicitly requesting your record numbers in chunks of 500.
> > I also never succeeded in fetching more than 500 sequences with
> > Bio::DB::Query::GenBank.
> > I am currently using a non bioperl script based on
> > http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_example.pl.
> > NCBI also asks to run these kinds of queries at night EST, on the
> > weekend, and with a sleep of at least 5 sec between every fetch of
> > 500 records.
> >
> > HTH,
> > Marc
> >
> >> -----Original Message-----
> >> From: Aaron J. Mackey [mailto:amackey at pcbi.upenn.edu]
> >> Sent: Monday, November 29, 2004 2:59 PM
> >> To: Wuming Gong
> >> Cc: Bioperl-l at portal.open-bio.org
> >> Subject: Re: [Bioperl-l] Bio::DB::Query::GenBank
> >>
> >>
> >>
> >> If you try again late at night (meaning late at night EST), you
> >> may get all 5000 hits; NCBI seems to have implemented a limit of
> >> 500 entries in batch retrieval when network load is already high,
> >> but you may be successful during non-peak hours ...
> >>
> >> -Aaron
> >>
> >> On Nov 29, 2004, at 4:26 AM, Wuming Gong wrote:
> >>
> >>> Hi Mona,
> >>>
> >>> I have met the same kind of problem. If you pull down the
> >>> sequences in batches of fewer than 500 at a time, it works.
> >>>
> >>> Wuming
> >>>
> >>>
> >>> On Thu, 04 Nov 2004 21:12:40 -0700, Ligia Mateiu
> >>> <lmateiu at ualberta.ca> wrote:
> >>>> Hi all,
> >>>> I used a query for which there are >5000 hits in GenBank, but my
> >>>> code retrieved just the very first 500.
> >>>>
> >>>> Any idea why?
> >>>>
> >>>> Thanks a lot,
> >>>> Mona
> >>>>
> >>>> _______________________________________________
> >>>> Bioperl-l mailing list
> >>>> Bioperl-l at portal.open-bio.org
> >>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >>>>
> >>>
> >>>
> >> --
> >> Aaron J. Mackey, Ph.D.
> >> Dept. of Biology, Goddard 212
> >> University of Pennsylvania       email:  amackey at pcbi.upenn.edu
> >> 415 S. University Avenue         office: 215-898-1205
> >> Philadelphia, PA  19104-6017     fax:    215-746-6697
> >>
> >>
> >
> >
> >
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
> 
> 


