[Bioperl-l] Frame translation gets an extra aa?

Chris Fields cjfields at illinois.edu
Mon Jan 17 16:07:46 UTC 2011


Amir,

Completely missed the frame argument you passed. Yes, the behavior of PrimarySeqI::translate and CodonTable::translate seems inconsistent here, particularly with the '-complete' parameter (implying a complete CDS) defaulting to false. If the default assumption by PrimarySeqI::translate() is that any sequence to be translated isn't complete, why should CodonTable::translate() automatically 'complete' the translation for incomplete codons by default? I would consider this a bug.

However, as '-complete' also assumes a complete CDS, using it doesn't quite fit either, so we probably need some argument that allows for defining this more finely. '-strict'?

Anyway, that is easily fixed; just passing the flag through to the call to CodonTable::translate, then bypassing translation of partial codons if the flag is present, corrects the problem. Would just need to decide on the above.
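
To make the two behaviors concrete, here is a minimal standalone sketch in plain Perl. It is not the BioPerl code and does not use Bio::Tools::CodonTable; the helper name translate_frame and its third argument are hypothetical stand-ins for whatever flag ends up being chosen. It assumes the standard genetic code and unambiguous uppercase or lowercase DNA.

```perl
use strict;
use warnings;

# Standard genetic code, bases ordered T, C, A, G at each codon position.
my @bases = qw(T C A G);
my $aas   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %code;
my $i = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $code{"$b1$b2$b3"} = substr($aas, $i++, 1);
        }
    }
}

# Hypothetical helper (not a BioPerl method): translate $seq starting at
# offset $frame (0, 1 or 2). If $complete_partial is true, a trailing
# two-base codon is translated when every possible third base gives the
# same amino acid; if it is false, the partial codon is simply dropped.
sub translate_frame {
    my ($seq, $frame, $complete_partial) = @_;
    die "Valid values for frame are 0, 1, or 2, not $frame.\n"
        unless ($frame == 0 or $frame == 1 or $frame == 2);
    $seq = uc substr($seq, $frame);

    my $protein = '';
    while (length($seq) >= 3) {
        my $codon = substr($seq, 0, 3, '');   # remove and return the next codon
        $protein .= $code{$codon} // 'X';     # 'X' for anything not in the table
    }

    if ($complete_partial and length($seq) == 2) {
        my %seen;
        $seen{ $code{"$seq$_"} // 'X' } = 1 for @bases;
        my @aa = keys %seen;
        $protein .= $aa[0] if @aa == 1;       # unambiguous two-base codon only
    }
    return $protein;
}

print translate_frame('AAACCCTTTGGG', 1, 0), "\n";   # partial codon dropped
print translate_frame('AAACCCTTTGGG', 1, 1), "\n";   # partial codon completed
```

With Amir's 12 bp sequence and frame 1, the first call drops the trailing GG and the second completes it to G, which is the difference under discussion.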

chris

On Jan 16, 2011, at 1:00 AM, Karger, Amir wrote:

> Wait, what? Aaron, I'm not a biologist, so please give me a couple more sentences here.
> 
> Also, the docs (and code) don't seem to support your numbers. From http://www.bioperl.org/wiki/BioPerl_Tutorial: 
> 
>    You can also determine the frame of the translation. The default frame starts at the first nucleotide (frame 0). To get translation in the next frame we would write: 
>    $prot_obj = $my_seq_object->translate(-frame => 1);
> 
> From http://doc.bioperl.org/releases/bioperl-1.6.1/ PrimarySeqI documentation (and my 1.5 perldoc Bio::PrimarySeqI):
>    Args:...
>    -frame         - frame                           default is 0
> 
> From the code linked to at the doc.bioperl link above:
> 
> 	 ## use frame, error if frame is not 0, 1 or 2
> 		 $self->throw("Valid values for frame are 0, 1, or 2, not $frame.")
> 			unless ($frame == 0 or $frame == 1 or $frame == 2);
> 		 $seq = substr($seq,$frame);
> 
> What am I missing here? All the docs I see seem to use frame as "the number of bp we move to the right before we start translating codons 3 bp at a time". But if that code is being run when I do a translate() I should really be getting the answer I expect, and not four aas. And yet the Deobfuscator tells me that Bio::Seq::translate is inheriting from PrimarySeqI. And I get the same four-aa result if I create a PrimarySeq instead of a Seq.
> 
> Aha. Now I see that PrimarySeq::translate calls CodonTable::translate after taking the substr. CodonTable::translate() says:
> 
>  if the codon is two nucleotides long and if by adding
>               an [sic] a third character 'N', it codes for a single amino
>               acid (with exceptions above), return that, otherwise
>               return empty string.
> 
> Are you sure that's what every user of PrimarySeq::translate wants? If so, please put something in the docs about it. Also, is there an option that will let me say "translate 11 bp to only 3 aa"? From looking at the code, it looks like no. I guess I can do this on my own if frame is 1.
> 
> Slightly less confused,
> 
> -Amir
> 
> ________________________________________
> From: ajmackey at gmail.com [ajmackey at gmail.com] On Behalf Of Aaron Mackey [amackey at virginia.edu]
> Sent: Saturday, January 15, 2011 18:34
> To: Chris Fields
> Cc: Karger, Amir; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Frame translation gets an extra aa?
> 
> I'm guessing the confusion might be the differences in terminology between reading frame (taking a value of 1, 2 or 3) and leading intron phase (a value of 0, 1 or 2, which corresponds to a reading frame of 1, 3 or 2, respectively) ... ?
> 
> -Aaron
> 
> On Fri, Jan 14, 2011 at 1:25 PM, Chris Fields <cjfields at illinois.edu> wrote:
> Amir,
> 
> Um, the sequence you have has 4 codons:
> 
> AAA CCC TTT GGG
> 
> Taking off the final 'G' gives the correct response:
> 
> perl -l -MBio::Seq -e '$x=Bio::Seq->new(-display_id=>"foo",-seq=>"AAACCCTTTGG"); print $x->translate(-frame=>1)->seq'
> NPL
> 
> chris
> 
> On Jan 14, 2011, at 12:06 PM, Karger, Amir wrote:
> 
>> Apologies if this question has been asked before, or if it's so stupid that nobody was silly enough to ask it before.
>> 
>> (Using Bioperl 1.6.1)
>> 
>> perl -l -MBio::Seq -e '$x=Bio::Seq->new(-display_id=>"foo",-seq=>"AAACCCTTTGGG"); print $x->translate(-frame=>1)->seq'
>> NPLG
>> 
>> Um, why is GG being translated to G? Shouldn't you not translate if you only have 2 bp left? That is, even if you know that GGX translates to amino acid G for X in (A,C,G,T) you don't actually have that third bp right now. In real life, would an mRNA get translated even if it's missing the third base pair?
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l