[Bioperl-l] Fetching > 500 sequences

martin 9904982 at sms.ed.ac.uk
Wed Mar 3 08:39:12 EST 2004


Hi, 

I've run into the same 500-sequence limit and have not found a way
round it. My workaround has been to download the sequences I want in
GenBank format using SRS from the NCBI website, then open the file
thus:

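use Bio::SeqIO;

my $stream = Bio::SeqIO->new(-file   => 'filename.genbank',
                             -format => 'GenBank');

# Process each record in turn.
while (my $seq = $stream->next_seq()) {
    # Replace this line with whatever you need to do with each record.
    print $seq->accession_number, "\n";
}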
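
Since the original problem was a file that silently stopped at 500
records, it may be worth checking the downloaded file before processing
it. Here is a minimal sketch in plain Perl, assuming every record starts
with a LOCUS line (as GenBank flat files do). The sub name
count_genbank_records is just something I made up for illustration; it
is not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): count the records in a
# GenBank flat file by counting the lines that begin with "LOCUS".
sub count_genbank_records {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ /^LOCUS\s/;
    }
    close($fh);
    return $count;
}

# When a file name is given on the command line, report its record count.
if (@ARGV) {
    my $file = shift @ARGV;
    print "$file: ", count_genbank_records($file), " records\n";
}
```

Run it as, say, "perl count_records.pl filename.genbank" (the script
name is hypothetical) and compare the result with the number of hits the
search reported.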

Hope this helps.

Martin
On Mon, 2004-03-01 at 19:27, henrik nilsson wrote:
> Hi,
> 
> It seems that I have problems fetching more than 500 sequences from 
> GenBank using BioPerl. It looks like the script (attached below) fetches all 
> the 7000+ sequences, but only 500 make it to the output file. Is there any 
> way to get all these 7000+ sequences written to the file - that is, is it 
> possible to sidestep the 500-sequence limit?
> 
> Thanks for your time,
> 
> Rolf
> 
> 
> 
> 
> Please find the script below. When I run it, I get
> 
> Writing accession number AJ406491
> ... etc ...
> Writing accession number AJ406489
> Writing accession number AJ406471
> Writing accession number AJ406465
> Writing accession number AJ406461
> Total number of records found = 7053
> 
> but when I type 
> 
> [rolf at localhost dir]$ cat data.gb | grep 'BASE COUNT' | wc -l
>     500
> [rolf at localhost dir]$
> 
> It is clear that only 500 sequences were written to the file.
> 
> #!/usr/bin/perl -w
> use strict;
> use Bio::DB::GenBank;
> use Bio::DB::Query::GenBank;
> use IO::String;
> use Bio::SeqIO;
> use Bio::Seq::RichSeq;
> 
> 
> my $query_string = 'Boletales';
> 
> my $query = Bio::DB::Query::GenBank->new(-db=>'nucleotide',
>                                          -query=>$query_string);
> my $out = Bio::SeqIO->new(-file=>">data.gb", -format=>'genbank');
> 
> my $count = $query->count;
> 
> my $gb = Bio::DB::GenBank->new();
> 
> my $stream = $gb->get_Stream_by_query($query);
> 
> while (my $seq = $stream->next_seq) {
>     print "Writing accession number ", $seq->accession_number, "\n";
>     $out->write_seq($seq);
> }
> 
> print "Total number of records found = $count\n";
> 
> exit;
-- 
Martin Jones
Blaxter Nematode Genomics Lab
ICAPB
Ashworth Labs
Kings Buildings
University of Edinburgh
Edinburgh

0131 650 6761
9904982 at sms.ed.ac.uk


