[Bioperl-l] GI identifier missing when using Bio::Index::GenBank?
Brian Osborne
osborne1 at optonline.net
Thu Apr 27 14:04:18 EDT 2006
Todd,
No, I don't think so; I think this is a bug. Can you put this into Bugzilla
along with the GenBank record, CJ521890, that shows it? Then I'll take a
closer look...
Brian O.
On 4/27/06 1:56 PM, "Todd Richmond" <richmond.todd at gmail.com> wrote:
> I could, but I don't want to store all that information. For instance,
> in the past two weeks, 387000 plant sequences have been added to
> GenBank. I'm interested in storing complete information for the ~600
> sequences from that set that are related to the gene families I'm
> interested in.
>
> I can certainly come up with a workaround myself by implementing a
> hash of accession/GI numbers or modifying the load script supplied by
> bioperl to accept a list of accession numbers as a filter. I was just
> wondering if I'm missing something obvious...
>
> Todd
>
>
> On 4/27/06, Brian Osborne <osborne1 at optonline.net> wrote:
>> Todd,
>>
>> Can't you go directly from the daily update to the database?
>>
>> Brian O.
>>
>>
>> On 4/26/06 9:47 PM, "Todd Richmond" <richmond.todd at gmail.com> wrote:
>>
>>> I've got an application where I grab the daily updates from NCBI, pull
>>> out just the plant sequences and store them in a separate flat file.
>>> Then I use Bio::Index::GenBank to index the plant flat file so I can
>>> pull out my sequences of interest. I'm in the midst of converting my
>>> scripts to using bioperl-db/biosql so I can push those sequences into
>>> the database. The problem is that the NCBI GI identifier isn't
>>> returned when using the index file.
>>>
>>> When I run the following test script:
>>> ***
>>> use Bio::Index::GenBank;
>>> use Bio::SeqIO;
>>> use strict;
>>> my $Index_File_Name = 'nc0425.idx';
>>> my $inx = Bio::Index::GenBank->new('-filename' => $Index_File_Name);
>>>
>>> my $seqio = new Bio::SeqIO( '-format' => 'genbank' );
>>> my $seq = $inx->get_Seq_by_acc('CJ521890');
>>> $seqio->write_seq($seq);
>>> ***
>>>
>>> Diffing to the original GenBank record, the only difference is the GI
>>> identifier:
>>>
>>> diff CJ521890_orig.out CJ521890_seqio.out
>>> 5c5
>>> < VERSION CJ521890.1 GI:93266243
>>> ---
>>>> VERSION CJ521890.1
>>>
>>> Is this expected behaviour? If so, is there a workaround that will
>>> allow me to retrieve the GI from the index file so I can store it in
>>> the bioentry table?
>>>
>>> Thanks, Todd
>>>
>>>
>>> --
>>> Todd Richmond
>>> richmond.todd at gmail.com
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>>
>
>
> --
> Todd Richmond
> richmond.todd at gmail.com
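Todd's first proposed workaround, a hash of accession/GI numbers, could look roughly like the sketch below. It assumes the plant flat file keeps the NCBI `VERSION` line intact (as in the original record from the diff above), and it scans that line directly instead of going through Bio::Index::GenBank. The helper name `build_gi_map` is hypothetical, and the demo feeds it an in-memory copy of the single `VERSION` line from the diff rather than a real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read a GenBank flat file from a filehandle and
# map each accession (version suffix stripped) to the GI on its VERSION line.
sub build_gi_map {
    my ($fh) = @_;
    my %gi_for;
    while ( my $line = <$fh> ) {
        # Matches lines such as "VERSION     CJ521890.1  GI:93266243"
        next unless $line =~ /^VERSION\s+(\S+)\s+GI:(\d+)/;
        my ( $acc_ver, $gi ) = ( $1, $2 );
        ( my $acc = $acc_ver ) =~ s/\.\d+$//;    # drop the ".1" version suffix
        $gi_for{$acc} = $gi;
    }
    return \%gi_for;
}

# Demo: the VERSION line from the diff, read through an in-memory filehandle.
# With the real plant file, open it with '<' and pass that handle instead.
my $record = "VERSION     CJ521890.1  GI:93266243\n//\n";
open my $fh, '<', \$record or die "cannot open in-memory record: $!";
my $gi_for = build_gi_map($fh);
close $fh;

print "CJ521890 => $gi_for->{CJ521890}\n";
```

Before loading into BioSQL, one might then copy the GI back onto the sequence fetched from the index, e.g. `$seq->primary_id($gi_for->{'CJ521890'})`. This assumes, as is normally the case for BioPerl's GenBank parser, that the GI is carried as the sequence's `primary_id`; it is a stopgap until the indexing bug itself is fixed.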