[Bioperl-l] Bio::DB::GenBank batch retrieval question

CHALFANT_CHRIS_M@Lilly.com CHALFANT_CHRIS_M@Lilly.com
Tue, 23 Apr 2002 13:57:23 -0500


When I use Bio::DB::GenBank::get_Stream_by_acc to retrieve a set of
accession numbers and the set includes one invalid accession number along
with several valid ones, I get no sequences back from Entrez.  For
example, the code below returns no output (though I get output if I remove
the "bogus" accession).

Is this the expected behavior, or am I using the code incorrectly?  If it
is expected, how would you suggest requesting a batch of GenBank records
for a list that may include invalid (or missing) accession numbers?  I am
considering a "divide-and-conquer" strategy: splitting the list in half
and recursively requesting each half until I find the offending ID.
However, I would really like to minimize the number of HTTP requests.
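
For reference, here is a rough sketch of the divide-and-conquer idea.  It
is only an illustration under some assumptions: fetch_bisecting and the
mock fetcher are hypothetical names, not part of BioPerl, and the mock
merely imitates the behavior described above (one invalid ID makes the
whole batch come back empty).  A real callback would presumably wrap
get_Stream_by_acc and collect the next_seq results into an array ref.  The
eval is there so that a back end which throws an exception is treated the
same way as one that returns nothing.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function).  $fetch is a code ref that
# takes an array ref of accessions and returns an array ref of records;
# an empty result or an exception is treated as "this batch failed".
sub fetch_bisecting {
    my ($fetch, $ids, $bad) = @_;
    return () unless @$ids;

    my $records = eval { $fetch->($ids) };
    $records = [] unless ref($records) eq 'ARRAY';
    return @$records if @$records;

    # A failing batch of one pins down a single bad accession.
    if (@$ids == 1) {
        push @$bad, $ids->[0];
        return ();
    }

    # Otherwise split the batch in half and retry each half.
    my $mid   = int(@$ids / 2);
    my @left  = @{$ids}[0 .. $mid - 1];
    my @right = @{$ids}[$mid .. $#$ids];
    return (fetch_bisecting($fetch, \@left,  $bad),
            fetch_bisecting($fetch, \@right, $bad));
}

# Stand-in for the real network call (an assumption for illustration):
# the whole batch comes back empty if any ID is outside the known set.
my %known = map { $_ => 1 } qw(AB000095 AB000220);
my $calls = 0;
my $mock_fetch = sub {
    my ($ids) = @_;
    $calls++;
    return [] if grep { !$known{$_} } @$ids;
    return [ @$ids ];
};

my @bad;
my @got = fetch_bisecting($mock_fetch, [qw(AB000095 AB000220 bogus)], \@bad);
print "retrieved: @got\n";
print "rejected:  @bad\n";
print "requests:  $calls\n";
```

With the mock above, the three-ID list costs five calls instead of one,
which is exactly the overhead I would like to avoid.  Note also that this
sketch cannot tell a batch that failed because of a bad ID from one that
came back empty for some other reason.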

As an alternative, I considered using Bio::DB::EMBL, but this module seems 
to throw an exception ("MSG: EMBL stream with no ID. Not embl in my book") 
if the list includes invalid accessions.

CODE:

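use strict;
use warnings;
use Bio::DB::GenBank;

# Two valid accessions plus one deliberately invalid one ("bogus").
my @accessions = qw(AB000095 AB000220 bogus);
my $gb = Bio::DB::GenBank->new;
my $seqio = $gb->get_Stream_by_acc(\@accessions);

# Prints nothing while "bogus" is in the list.
while (my $record = $seqio->next_seq) {
  print $record->primary_id, "\n";
}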


Chris