[Bioperl-l] retrieving top_SeqFeatures for RefSeq proteins fails

Chris Fields cjfields at uiuc.edu
Sun Apr 9 17:43:32 UTC 2006


Forgot to add: I'm not sure when -no_redirect was added, but if this doesn't
work you might consider upgrading to 1.5.1 or CVS if possible; 1.5.1 is
stable considering it is a developer's release and CVS versions are also
pretty stable.  v. 1.4 is ~2 years old.  

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Sunday, April 09, 2006 12:23 PM
> To: e.rapsomaniki at mail.cryst.bbk.ac.uk; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] retrieving top_SeqFeatures for RefSeq proteins
> fails
> 
> That's because Bio::DB::RefSeq retrieves the data from EBI, not from NCBI.
> The EBI version has no seq. features but the NCBI version does (I believe
> NCBI adds them provisionally).  Retrieving via Bio::DB::GenBank is the way
> to go, but you must add the -no_redirect flag to prevent Bio::DB::GenBank
> from redirecting to Bio::DB::RefSeq:
> 
> my $factory = Bio::DB::GenBank->new(-no_redirect => 1);
> my $seq = $factory->get_Sec_by_acc(' NM_001008292');
> 
> As a warning, RefSeqs are nonstandard GenBank, but I believe I have parsed
> them for seq features before w/o problems.  From perldoc for
> Bio::DB::RefSeq:
> 
> DESCRIPTION
>     Allows the dynamic retrieval of sequence objects Bio::Seq from the
>     RefSeq database using the dbfetch script at EBI:
>     <http://www.ebi.ac.uk/cgi-bin/dbfetch>.
> 
>     In order to make changes transparent we have host type (currently only
>     ebi) and location (defaults to ebi) separated out. This allows later
>     additions of more servers in different geographical locations.
> 
>     The functionality of this module is inherited from Bio::DB::DBFetch
>     which implements Bio::DB::WebDBSeqI.
> 
>     This module retrieves entries from EBI although it retrives database
>     entries produced at NCBI. When read into bioperl objects, the parser
> for
>     GenBank format it used. RefSeq is a NONSTANDARD GenBank file so be
> ready
>     for surprises.
> 
> Chris
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of
> > e.rapsomaniki at mail.cryst.bbk.ac.uk
> > Sent: Saturday, April 08, 2006 8:08 AM
> > To: bioperl-l at lists.open-bio.org
> > Subject: [Bioperl-l] retrieving top_SeqFeatures for RefSeq proteins
> fails
> >
> >
> > Hi
> >
> > I am trying to retrieve coding sequences associated with RefSeq
> proteins.
> > My
> > code (below) works for non-refseq proteins (e.g BAB26271) but not for
> > refseq
> > (no sequence
> > features are retrieved although I checked the web-page and a coded_by
> > feature
> > should be there). Any suggestions? I am using bioperl 1.4
> >
> > Here's my code:
> > use Bio::Seq;
> > use Bio::DB::GenPept;
> > use Bio::DB::GenBank;
> > use Bio::DB::RefSeq;
> > my $gb = new Bio::DB::GenBank;
> > my $gp = new Bio::DB::RefSeq;
> > my $prot_obj = $gp->get_Seq_by_acc("NP_001008293");
> > return unless defined($prot_obj);
> >
> > # factory to turn strings into Bio::Location objects
> > my $loc_factory = new Bio::Factory::FTLocationFactory;
> > my $orf;
> >
> > my @f=$prot_obj->top_SeqFeatures();
> > print "@f\n"; #returns nothing
> > foreach my $feat ( $prot_obj->top_SeqFeatures ) {
> > print  $feat->primary_tag, "\n";
> > 	if ( $feat->primary_tag eq 'CDS' ) {
> >
> > 	   my @coded_by = $feat->each_tag_value('coded_by');
> > 	   print @coded_by, "\n";
> > 	   my ($nuc_acc,$loc_str) = split /\:/, $coded_by[0];
> > 	   #$nuc_acc=~ s/\..*//;
> > 	   my $nuc_obj = $gb->get_Seq_by_acc($nuc_acc);
> > 	   return unless defined($nuc_obj);
> > 	   my $loc_object = $loc_factory->from_string($loc_str);
> > 	   # create a Feature object by using a Location
> > 	   my $feat_obj = new Bio::SeqFeature::Generic(-location
> > =>$loc_object);
> > 	   # associate the Feature object with the nucleotide Seq object
> > 	   $nuc_obj->add_SeqFeature($feat_obj);
> > 	   my $cds_obj = $feat_obj->spliced_seq;
> > 	   $orf=$cds_obj->seq;
> > 	}
> > }
> > print "$orf\n";
> >
> >
> > ----------------------------------------------------------------
> > This message was sent using IMP, the Internet Messaging Program.
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l




More information about the Bioperl-l mailing list