[Bioperl-l] get_sequence - acc does not exist

Ewan Birney birney at ebi.ac.uk
Wed Aug 31 08:50:59 EDT 2005



Paul G Cantalupo wrote:
> Hello,
> 
> I discovered that Bio::Perl get_sequence does not handle Genbank GI 
> numbers properly due to the following code in get_sequence:
> 
>    if( $identifier =~ /^\w+\d+$/ ) {
>        $seq = $db->get_Seq_by_acc($identifier);
>    } else {
>        $seq = $db->get_Seq_by_id($identifier);
>    }
> 
> Genbank GI numbers (e.g. 51527264) match the regular expression 
> /^\w+\d+$/, so unsurprisingly the method get_Seq_by_acc fails (with 
> a warning like: MSG: acc (gb|51527264) does not exist). Instead, the 
> method get_Seq_by_id works when called with GI numbers:
> 
> 
>   use Bio::DB::GenBank;
>   my $genbank_db = Bio::DB::GenBank->new();
>   my $seq = $genbank_db->get_Seq_by_id(51527264);  # look up by GI number
>   print $seq->desc, "\n";
> 
> Shouldn't the regular expression in get_sequence be changed to look for 
> identifiers that are all digits and then call get_Seq_by_id? Or am I not 
> understanding something?
> 

Traditionally, "GI" numbers are _not_ accession numbers: they are internal
identifiers that NCBI assigns to sequences in-house. However, get_sequence is
only using a heuristic to guess the right lookup, so the right thing to do is
probably to try get_Seq_by_acc first and, if that returns undef, fall back to
get_Seq_by_id.
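
A minimal sketch of that fallback, under some assumptions: the helper name
fetch_by_acc_then_id and the MockDB stand-in are hypothetical and are not part
of Bio::Perl. The database object is only assumed to provide get_Seq_by_acc
and get_Seq_by_id, as Bio::DB::GenBank does. Each call is wrapped in eval in
case a lookup throws instead of returning undef, which may vary between
BioPerl versions. The strings stored in the mock are placeholders, not real
records.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Perl): try the accession lookup
# first, then fall back to the ID lookup if nothing comes back.
# $db is assumed to be any object with get_Seq_by_acc and get_Seq_by_id,
# for example a Bio::DB::GenBank instance.
sub fetch_by_acc_then_id {
    my ($db, $identifier) = @_;

    # eval guards against a lookup that throws rather than returning undef.
    my $seq = eval { $db->get_Seq_by_acc($identifier) };
    return $seq if defined $seq;

    $seq = eval { $db->get_Seq_by_id($identifier) };
    return $seq;    # still undef if both lookups miss
}

# Stand-in database so the sketch runs without network access.
{
    package MockDB;

    sub new {
        my ($class, %args) = @_;
        return bless { acc => $args{acc} || {}, id => $args{id} || {} }, $class;
    }
    sub get_Seq_by_acc { my ($self, $key) = @_; return $self->{acc}{$key} }
    sub get_Seq_by_id  { my ($self, $key) = @_; return $self->{id}{$key} }
}

# Placeholder values stand in for sequence objects.
my $db = MockDB->new(
    acc => { 'J00522'   => 'record found by accession' },
    id  => { '51527264' => 'record found by GI number' },
);

print fetch_by_acc_then_id($db, 'J00522'),   "\n";
print fetch_by_acc_then_id($db, '51527264'), "\n";
```

With a real Bio::DB::GenBank object in place of MockDB, the same helper would
let an all-digit GI number fall through to get_Seq_by_id without changing the
regular expression at all.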




> Thank you,
> 
> Paul
> 
> Paul Cantalupo
> Research Specialist/Systems Programmer
> 559 Crawford Hall
> Department of Biological Sciences
> University of Pittsburgh
> Pittsburgh, PA 15260
> Work: 412-624-4687
> Fax: 412-624-4759
> 
> Ask me about Toastmasters: www.toastmasters.org
> Midday Club Treasurer