[Bioperl-l] Extracting gi no from refseq record

Hilmar Lapp hlapp at gnf.org
Fri Apr 4 12:17:53 EST 2003


On Thursday, April 3, 2003, at 03:27  PM, Hilmar Lapp wrote:

> The problem is not in SeqIO. The offending statement is in the method  
> fetch() in Bio::Index::AbstractSeq, which is the method that does the  
> actual retrieval work. The chunk of input is handed off to  
> SeqIO::genbank, which I'm sure does the right job, only to be  
> overwritten after it returns the parsed sequence object to fetch():
>
>     # we essentially assumme that the primary_id for the database
>     # is the display_id
>     $seq->primary_id($seq->display_id()) if( defined $seq && ref($seq)  
> &&
> 					     $seq->isa('Bio::PrimarySeqI') );
>
> Can anyone explain the usefulness of this statement? Otherwise we just  
> delete it and the bug is fixed. (Siddharta, as an immediate fix you  
> might just want to do this.)
>

So is everybody OK with me removing this on both the main trunk and the  
stable branch? Siddhartha, did it fix the problem?

	-hilmar


> 	-hilmar
>
> On Thursday, April 3, 2003, at 03:13  PM, Siddhartha Basu wrote:
>
>> Hi hilmar,
>>
>> Hilmar Lapp wrote:
>>> RefSeq does feature this line, at least the last time I checked. We  
>>> also do test for this being parsed correctly, namely in t/SeqIO.t.  
>>> It's not always as bad as it sometimes seems. I ran an entire  
>>> download of RefSeq a couple weeks ago and the GI# was parsed out of  
>>> every single record.
>>> Siddharta, do you run at least bioperl 1.2?
>> Yes i am running bioperl 1.2
>>
>> If you do, can you run
>>>     $ make test_SeqIO
>>> from the root directory of the bioperl distribution? Can you please  
>>> mail the output?
>>
>> No tests failed it seems. Here is the output.
>> ====================================================================== 
>> ====
>> PERL_DL_NONLAZY=1 /usr/bin/perl -Iblib/arch -Iblib/lib  
>> -I/usr/lib/perl5/5.8.0/i386-linux-thread-multi -I/usr/lib/perl5/5.8.0  
>> -e 'use Test::Harness qw(&runtests $verbose); $verbose=0; runtests  
>> @ARGV;' t/SeqIO.t
>> t/SeqIO....ok
>>         3/146 skipped:
>> All tests successful, 3 subtests skipped.
>> Files=1, Tests=146,  1 wallclock secs ( 0.88 cusr +  0.03 csys =   
>> 0.91 CPU)
>> ====================================================================== 
>> =====
>>
>> If a test fails, could you please run
>>>     $ make test_SeqIO TEST_VERBOSE=1
>>> and mail the output for the failed test(s)?
>>> If no test fails, could you email the accession# of the sequence you  
>>> retrieved and for which there was no GI#?
>>
>> The accession no i have tried is NP_031416.
>> This is a mouse refseq NP no. which i have choosen randomly from  
>> mouse refseq flat files. I have downloaded the refseq flat files that  
>> ends with gbff and gnp and then used the following code to index all  
>> of them.
>>
>> ====================================================================== 
>> =======
>> #!/usr/bin/perl -w
>>
>> use strict;
>> use Bio::Index::GenBank;
>>
>>
>> my $IdFile = "$ENV{BIOPERL_INDEX}/genebankall";
>>
>>
>> my $Idx = Bio::Index::GenBank->new( -filename => $IdFile,  
>> -write_flag=> 'WRITE');
>>
>> my $Folder = "/home/basu/scriptanalysis/Genepept";
>>
>> opendir(FL,$Folder) || die "Can't open folder:$!";
>>
>> my @Files = grep (!/^\.\.?$/,readdir(FL));
>>
>>
>> my @FullPath = ();
>>
>> foreach (@Files) { push (@FullPath, "$Folder/$_"); }
>>
>> $Idx->make_index(@FullPath);
>>
>> print "Indexed successfully\n";
>>
>> closedir(FL);
>>
>> exit;
>> ====================================================================== 
>> ===========
>>
>> Then used a simple script for fetching ......
>>
>> ====================================================================== 
>> ============
>> #!/usr/bin/perl -w
>> #
>>
>> use Bio::Index::GenBank;
>>
>> use strict;
>>
>> my $GenIndName = "$ENV{BIOPERL_INDEX}/genebankall";
>>
>> my $Idx = Bio::Index::GenBank->new(-filename => $GenIndName);
>>
>> my $Seq = $Idx->get_Seq_by_acc('NP_031416');
>>
>>
>>
>> print $Seq->primary_id(),"\n";
>> exit;
>> ====================================================================== 
>> =============
>>
>> The output is Chrna7. My wish is to get the gi no that is 6671503.
>>
>> So how do i get that value.
>>
>>
>> -siddhartha
>>
>>
>>>     -hilmar
>>
>>
>>
> -- 
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------



More information about the Bioperl-l mailing list