[Bioperl-l] GI identifier missing when using Bio::Index::GenBank?

Todd Richmond richmond.todd at gmail.com
Thu Apr 27 17:56:31 UTC 2006


I could, but I don't want to store all that information. For instance,
in the past two weeks, 387000 plant sequences have been added to
GenBank. I'm interested in storing complete information for the ~600
sequences from that set that are related to the gene families I'm
interested in.

I can certainly come up with a workaround myself by implementing a
hash of accession/GI numbers or modifying the load script supplied by
bioperl to accept a list of accession numbers as a filter. I was just
wondering if I'm missing something obvious...
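
For reference, a minimal sketch of the first workaround (the
accession/GI hash), assuming the flat file keeps VERSION lines in the
form shown in the diff below. The name build_gi_map is made up for
illustration and is not part of bioperl; it reads the flat file
directly and does not touch the index:

```perl
use strict;
use warnings;

# Build an accession => GI lookup by scanning the VERSION lines of a
# GenBank flat file. Assumes lines of the form:
#   VERSION     CJ521890.1  GI:93266243
# build_gi_map is a hypothetical helper, not a bioperl function.
sub build_gi_map {
    my ($fh) = @_;
    my %gi_for;
    while ( my $line = <$fh> ) {
        # Capture the accession (version suffix dropped) and the GI number.
        if ( $line =~ /^VERSION\s+(\S+?)(?:\.\d+)?\s+GI:(\d+)/ ) {
            $gi_for{$1} = $2;
        }
    }
    return \%gi_for;
}

# Demo on an in-memory line; a real run would open the plant flat file.
my $sample = "VERSION     CJ521890.1  GI:93266243\n";
open( my $fh, '<', \$sample ) or die "Cannot open in-memory file: $!";
my $gi_for = build_gi_map($fh);
close $fh;

print "CJ521890 => $gi_for->{CJ521890}\n";
```

The GI for each accession of interest could then be looked up in the
hash after get_Seq_by_acc returns and stored alongside the sequence.
Records without a GI on the VERSION line are simply skipped.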

Todd


On 4/27/06, Brian Osborne <osborne1 at optonline.net> wrote:
> Todd,
>
> Can't you go directly from the daily update to the database?
>
> Brian O.
>
>
> On 4/26/06 9:47 PM, "Todd Richmond" <richmond.todd at gmail.com> wrote:
>
> > I've got an application where I grab the daily updates from NCBI, pull
> > out just the plant sequences and store them in a separate flat file.
> > Then I use Bio::Index::GenBank to index the plant flat file so I can
> > pull out my sequences of interest. I'm in the midst of converting my
> > scripts to using bioperl-db/biosql so I can push those sequences into
> > the database. The problem is that the NCBI GI identifier isn't
> > returned when using the index file.
> >
> > When I run the following test script:
> > ***
> > use strict;
> > use Bio::Index::GenBank;
> > use Bio::SeqIO;
> > my $Index_File_Name = 'nc0425.idx';
> > my $inx = Bio::Index::GenBank->new('-filename' => $Index_File_Name);
> >
> > # No -file or -fh given, so the record is written to STDOUT
> > my $seqio = Bio::SeqIO->new( '-format' => 'genbank' );
> > my $seq = $inx->get_Seq_by_acc('CJ521890');
> > $seqio->write_seq($seq);
> > ***
> >
> > Diffing to the original GenBank record, the only difference is the GI
> > identifier:
> >
> > diff CJ521890_orig.out CJ521890_seqio.out
> > 5c5
> > < VERSION     CJ521890.1  GI:93266243
> > ---
> >> VERSION     CJ521890.1
> >
> > Is this expected behaviour? If so, is there a workaround that will
> > allow me to retrieve the GI from the index file so I can store it in
> > the bioentry table?
> >
> > Thanks, Todd
> >
> >
> > --
> > Todd Richmond
> > richmond.todd at gmail.com
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>


--
Todd Richmond
richmond.todd at gmail.com



