[Bioperl-l] Bio::DB::Query::GenBank retrieves fewer sequences than Webbrowser query

Paulo Almeida paulo.david at netvisao.pt
Thu Mar 25 06:21:45 EST 2004



Yes, escaping the brackets doesn't seem to have anything to do with this
problem. I suggested it because it worked for me in a different
situation. However, if you call the query's count method to get the
number of returned sequences, instead of cycling through all the
results, you get 5066, as it should be:

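#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

# Database handle used to fetch the sequences themselves
my $gb = Bio::DB::GenBank->new;

# Same Entrez query as in the web browser, without escaped brackets
my $query = Bio::DB::Query::GenBank->new(
    -query => 'Mus[Organism] exon NOT mRNA NOT cDNA',
    -db    => 'Nucleotide',
);

my $seqio = $gb->get_Stream_by_query($query);

# Ask the query object for the hit count instead of iterating the stream
print "Num results: ", $query->count, "\n";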
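
To see the discrepancy directly, both numbers can be printed from the same query object. This is only a sketch, not a diagnosis: it assumes the count and ids methods behave as described in the Bio::DB::Query::GenBank documentation, it needs network access to NCBI, and draining a stream of several thousand records can take a long time.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

my $query = Bio::DB::Query::GenBank->new(
    -query => 'Mus[Organism] exon NOT mRNA NOT cDNA',
    -db    => 'Nucleotide',
);

# Number of hits Entrez reports for the query
my $reported = $query->count;

# IDs the query object actually holds (assumed to be what the stream uses)
my @ids = $query->ids;

# Number of records that really arrive when the stream is drained
my $gb       = Bio::DB::GenBank->new;
my $seqio    = $gb->get_Stream_by_query($query);
my $streamed = 0;
$streamed++ while $seqio->next_seq;

printf "count(): %d\nids():   %d\nstream:  %d\n",
    $reported, scalar(@ids), $streamed;
```

If the ids and stream numbers agree with each other but not with count, the records are being lost when the ID list is built; if only the stream number is low, they are being lost during retrieval.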

I'm looking into it further, but I don't know what the problem could be.
-Paulo Almeida

T.D. Houfek wrote:

 >Hmm...
 >
 >Just for kicks I tried to duplicate the problem. I get the same number
 >of sequences from NCBI's web sequin tool as Jürgen, but using the
 >Bio::DB::Query::GenBank method I get 644 sequences (not fewer than 100,
 >but not the 5000+ we are expecting). Placing an escape backslash before
 >the brackets does not seem to help me:
 >
 >--- my test script below ---
 >
 >#!/usr/bin/perl
 >use strict;
 >use Bio::DB::GenBank;
 >use Bio::DB::Query::GenBank;
 >
 >my $gb = Bio::DB::GenBank->new;
 >my $query = Bio::DB::Query::GenBank->new
 > (-query => 'Mus\[Organism] AND exon NOT mRNA NOT cDNA',
 > -db => 'Nucleotide');
 >my $seqio = $gb->get_Stream_by_query($query);
 >my $numresults=0;
 >while( my $seq = $seqio->next_seq ) { $numresults++; }
 >print "Num results: $numresults\n";
 >



