<html><head><meta http-equiv="Content-Type" content="text/html; charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">Jason,<div><br></div><div>Attached is a minimal script that illustrates my problem - I expect it to print an UPDATE line containing a nucleotide sequence.</div><div><br></div><div>I must be missing some BioPerl subtlety, because this happens with every one of the hundred or so GI numbers that I have tried.</div><div><br></div><div>Thanks for looking at this - I am sure that I have a blind spot here somewhere.</div><div><br></div><div>Warren</div><div><br><div><div>On Apr 15, 2014, at 3:55 PM, Jason Stajich <<a href="mailto:jason@bioperl.org">jason@bioperl.org</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div dir="ltr">Warren -<div><br></div><div>Can you provide a specific accession as an example? The way this code runs, there shouldn't be any call to the translation function for the object, so I am guessing the accession number you are pointing to is a protein (though Bio::DB::GenBank would complain if that were so, so I'm a little confused about how this would be happening).<div>
<br></div><div>Jason</div></div></div><div class="gmail_extra"><br clear="all"><div><div dir="ltr">Jason Stajich<br><a href="mailto:jason@bioperl.org" target="_blank">jason@bioperl.org</a><br><a href="http://bioperl.org/wiki/User:Jason" target="_blank">http://bioperl.org/wiki/User:Jason</a><br>
<a href="http://twitter.com/hyphaltip" target="_blank">http://twitter.com/hyphaltip</a></div></div>
<br><br><div class="gmail_quote">On Tue, Apr 15, 2014 at 2:23 PM, Warren Gallin <span dir="ltr"><<a href="mailto:wgallin@ualberta.ca" target="_blank">wgallin@ualberta.ca</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Jason,<br>
<br>
Works almost perfectly, except I am getting back the protein sequence rather than the underlying nucleotide sequence.<br>
<br>
My specific code fragment is:<br>
<br>
<br>
<br>
my $gb_db = Bio::DB::GenBank->new();<br>
<br>
<...Bunch of code that retrieves a protein GenBank formatted file and walks through the features until...><br>
<br>
my $feature = $feature_object->primary_tag;<br>
<br>
if ( $feature ne "CDS" ) { next; }<br>
else {<br>
$spliced_cds = $feature_object->spliced_seq($gb_db);<br>
$na_seq = $spliced_cds->seq;<br>
<br>
}<br>
<br>
< More code, that leads to printing the value for $na_seq …><br>
<br>
So somehow the nucleotide sequence is being translated into protein sequence - is there some option that needs setting to prevent translation?<br>
<span class="HOEnZb"><font color="#888888"><br>
Warren<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
<br>
On Apr 15, 2014, at 1:11 PM, Jason Stajich <<a href="mailto:jason@bioperl.org">jason@bioperl.org</a>> wrote:<br>
<br>
> This is supported in BioPerl through the feature objects and the Bio::SeqFeatureI method spliced_seq.<br>
> You just create a Bio::DB::GenBank object and pass it to the method:<br>
><br>
> my $db = Bio::DB::GenBank->new();<br>
> my $spliced_cds = $feature_with_remote_locations->spliced_seq($db);<br>
><br>
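> A minimal sketch of how this call might sit in a complete script, assuming the starting point is a nucleotide record. The accession string is a hypothetical placeholder rather than a real record, and printing the alphabet is only there to show whether the returned sequence is DNA or protein:<br>
><br>
> use strict;<br>
> use warnings;<br>
> use Bio::DB::GenBank;<br>
><br>
> # Hypothetical placeholder: substitute a real nucleotide accession here<br>
> my $accession = 'NUCLEOTIDE_ACCESSION';<br>
><br>
> my $db = Bio::DB::GenBank->new();<br>
> my $seq = $db->get_Seq_by_acc($accession);<br>
> die "No record returned for $accession\n" unless defined $seq;<br>
><br>
> for my $feat ( $seq->get_SeqFeatures ) {<br>
> &nbsp;&nbsp;&nbsp;&nbsp;# Only CDS features are of interest<br>
> &nbsp;&nbsp;&nbsp;&nbsp;next unless $feat->primary_tag eq 'CDS';<br>
> &nbsp;&nbsp;&nbsp;&nbsp;# Passing the database handle lets spliced_seq fetch segments located on other records<br>
> &nbsp;&nbsp;&nbsp;&nbsp;my $cds = $feat->spliced_seq($db);<br>
> &nbsp;&nbsp;&nbsp;&nbsp;# Print the alphabet (dna or protein) next to the sequence<br>
> &nbsp;&nbsp;&nbsp;&nbsp;print $cds->alphabet, "\t", $cds->seq, "\n";<br>
> }<br>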
><br>
><br>
> On Tue, Apr 15, 2014 at 11:39 AM, Warren Gallin <<a href="mailto:wgallin@ualberta.ca">wgallin@ualberta.ca</a>> wrote:<br>
> I am having a problem finding a general method of recovering the nucleotide coding sequence for a protein sequence record.<br>
><br>
> Generally, tracking the CDS annotation back to the nucleotide sequence record using the accession number of the nucleotide sequence works.<br>
><br>
> One problem arises when the underlying coding sequence is spliced from multiple nucleotide records. Is there a general approach to automatically track down and join the different sequence fragments from different sequence entries? An example of the problem can be seen if you start from the protein record with GI number 7715882. It is annotated as coming from three different nucleotide records. Is there an approach in BioPerl that will detect and download these three records and splice together the appropriate parts to get the coding sequence?<br>
><br>
> The other problem that I am having is the ongoing issue of protein records annotated as highly redundant sequences, with WP-XXXXXX accession numbers. Has anyone found a way to retrieve the set of different nucleotide sequences that all encode a single WP-annotated protein sequence?<br>
><br>
> Any help would be appreciated,<br>
><br>
> Warren Gallin<br>
> _______________________________________________<br>
> Bioperl-l mailing list<br>
> <a href="mailto:Bioperl-l@lists.open-bio.org">Bioperl-l@lists.open-bio.org</a><br>
> <a href="http://lists.open-bio.org/mailman/listinfo/bioperl-l" target="_blank">http://lists.open-bio.org/mailman/listinfo/bioperl-l</a><br>
><br>
<br>
</div></div></blockquote></div><br></div>
</blockquote></div><br></div><div><br></div><div></div></body></html>