[Bioperl-l] retrieving top_SeqFeatures for RefSeq proteins fails

Chris Fields cjfields at uiuc.edu
Sun Apr 9 17:22:49 UTC 2006


That's because Bio::DB::RefSeq retrieves the data from EBI, not from NCBI.
The EBI version has no seq. features but the NCBI version does (I believe
NCBI adds them provisionally).  Retrieving via Bio::DB::GenBank is the way
to go, but you must add the -no_redirect flag to prevent Bio::DB::GenBank
from redirecting to Bio::DB::RefSeq:

my $factory = Bio::DB::GenBank->new(-no_redirect => 1);
my $seq = $factory->get_Sec_by_acc(' NM_001008292');

As a warning, RefSeqs are nonstandard GenBank, but I believe I have parsed
them for seq features before w/o problems.  From perldoc for
Bio::DB::RefSeq:

DESCRIPTION
    Allows the dynamic retrieval of sequence objects Bio::Seq from the
    RefSeq database using the dbfetch script at EBI:
    <http://www.ebi.ac.uk/cgi-bin/dbfetch>.

    In order to make changes transparent we have host type (currently only
    ebi) and location (defaults to ebi) separated out. This allows later
    additions of more servers in different geographical locations.

    The functionality of this module is inherited from Bio::DB::DBFetch
    which implements Bio::DB::WebDBSeqI.

    This module retrieves entries from EBI although it retrives database
    entries produced at NCBI. When read into bioperl objects, the parser for
    GenBank format it used. RefSeq is a NONSTANDARD GenBank file so be ready
    for surprises.

Chris

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of
> e.rapsomaniki at mail.cryst.bbk.ac.uk
> Sent: Saturday, April 08, 2006 8:08 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] retrieving top_SeqFeatures for RefSeq proteins fails
> 
> 
> Hi
> 
> I am trying to retrieve coding sequences associated with RefSeq proteins.
> My
> code (below) works for non-refseq proteins (e.g BAB26271) but not for
> refseq
> (no sequence
> features are retrieved although I checked the web-page and a coded_by
> feature
> should be there). Any suggestions? I am using bioperl 1.4
> 
> Here's my code:
> use Bio::Seq;
> use Bio::DB::GenPept;
> use Bio::DB::GenBank;
> use Bio::DB::RefSeq;
> my $gb = new Bio::DB::GenBank;
> my $gp = new Bio::DB::RefSeq;
> my $prot_obj = $gp->get_Seq_by_acc("NP_001008293");
> return unless defined($prot_obj);
> 
> # factory to turn strings into Bio::Location objects
> my $loc_factory = new Bio::Factory::FTLocationFactory;
> my $orf;
> 
> my @f=$prot_obj->top_SeqFeatures();
> print "@f\n"; #returns nothing
> foreach my $feat ( $prot_obj->top_SeqFeatures ) {
> print  $feat->primary_tag, "\n";
> 	if ( $feat->primary_tag eq 'CDS' ) {
> 
> 	   my @coded_by = $feat->each_tag_value('coded_by');
> 	   print @coded_by, "\n";
> 	   my ($nuc_acc,$loc_str) = split /\:/, $coded_by[0];
> 	   #$nuc_acc=~ s/\..*//;
> 	   my $nuc_obj = $gb->get_Seq_by_acc($nuc_acc);
> 	   return unless defined($nuc_obj);
> 	   my $loc_object = $loc_factory->from_string($loc_str);
> 	   # create a Feature object by using a Location
> 	   my $feat_obj = new Bio::SeqFeature::Generic(-location
> =>$loc_object);
> 	   # associate the Feature object with the nucleotide Seq object
> 	   $nuc_obj->add_SeqFeature($feat_obj);
> 	   my $cds_obj = $feat_obj->spliced_seq;
> 	   $orf=$cds_obj->seq;
> 	}
> }
> print "$orf\n";
> 
> 
> ----------------------------------------------------------------
> This message was sent using IMP, the Internet Messaging Program.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l




More information about the Bioperl-l mailing list