[Bioperl-l] GI identifier missing when using Bio::Index::GenBank?

Todd Richmond richmond.todd at gmail.com
Thu Apr 27 01:47:52 UTC 2006


I've got an application where I grab the daily updates from NCBI, pull
out just the plant sequences and store them in a separate flat file.
Then I use Bio::Index::GenBank to index the plant flat file so I can
pull out my sequences of interest. I'm in the midst of converting my
scripts to use bioperl-db/biosql so I can push those sequences into
the database. The problem is that the NCBI GI identifier isn't
returned when using the index file.

When I run the following test script:
***
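use strict;
use warnings;
use Bio::Index::GenBank;
use Bio::SeqIO;

# Open the existing index built over the plant flat file
my $Index_File_Name = 'nc0425.idx';
my $inx = Bio::Index::GenBank->new('-filename' => $Index_File_Name);

# Writer for GenBank-formatted output (no file given)
my $seqio = Bio::SeqIO->new('-format' => 'genbank');

# Fetch the record by accession and write it back out
my $seq = $inx->get_Seq_by_acc('CJ521890')
    or die "CJ521890 not found in $Index_File_Name\n";
$seqio->write_seq($seq);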
***

When I diff the output against the original GenBank record, the only
difference is the GI identifier:

diff CJ521890_orig.out CJ521890_seqio.out
5c5
< VERSION     CJ521890.1  GI:93266243
---
> VERSION     CJ521890.1

Is this expected behaviour? If so, is there a workaround that will
allow me to retrieve the GI from the index file so I can store it in
the bioentry table?

Thanks, Todd


--
Todd Richmond
richmond.todd at gmail.com



