[Bioperl-l] Frame translation gets an extra aa?

Karger, Amir akarger at CGR.Harvard.edu
Sun Jan 16 07:00:15 UTC 2011


Wait, what? Aaron, I'm not a biologist, so please give me a couple more sentences here.

Also, the docs (and code) don't seem to support your numbers. From http://www.bioperl.org/wiki/BioPerl_Tutorial: 

    You can also determine the frame of the translation. The default frame starts at the first nucleotide (frame 0). To get translation in the next frame we would write: 
    $prot_obj = $my_seq_object->translate(-frame => 1);

From http://doc.bioperl.org/releases/bioperl-1.6.1/ PrimarySeqI documentation (and my 1.5 perldoc Bio::PrimarySeqI):
    Args:...
    -frame         - frame                           default is 0

From the code linked to at the doc.bioperl link above:

    ## use frame, error if frame is not 0, 1 or 2
    $self->throw("Valid values for frame are 0, 1, or 2, not $frame.")
        unless ($frame == 0 or $frame == 1 or $frame == 2);
    $seq = substr($seq,$frame);

What am I missing here? All the docs I see seem to use frame as "the number of bp we move to the right before we start translating codons 3 bp at a time". But if that code is being run when I do a translate() I should really be getting the answer I expect, and not four aas. And yet the Deobfuscator tells me that Bio::Seq::translate is inheriting from PrimarySeqI. And I get the same four-aa result if I create a PrimarySeq instead of a Seq.

Aha. Now I see that PrimarySeq::translate calls CodonTable::translate after taking the substr. CodonTable::translate() says:

  if the codon is two nucleotides long and if by adding
               an [sic] a third character 'N', it codes for a single amino
               acid (with exceptions above), return that, otherwise
               return empty string.
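
To make that rule concrete, here is a minimal pure-Perl sketch of how I read it. It is not BioPerl's implementation: translate_partial is a hypothetical helper, and I am assuming the standard genetic code, uppercase DNA, and that "adding an N" amounts to checking that all four possible third bases give the same amino acid.

```perl
use strict;
use warnings;

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
my @bases = qw(T C A G);
my $aas   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %code;
my $i = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $code{"$b1$b2$b3"} = substr($aas, $i++, 1);
        }
    }
}

# Hypothetical helper (not a BioPerl function): translate a trailing
# two-base fragment only when every possible third base yields the same
# amino acid; otherwise return the empty string.
sub translate_partial {
    my ($dinuc) = @_;
    my %seen;
    $seen{ $code{"$dinuc$_"} } = 1 for @bases;
    my @aa = keys %seen;
    return @aa == 1 ? $aa[0] : '';
}

print translate_partial('GG'), "\n";   # G (GGT, GGC, GGA, GGG are all Gly)
print translate_partial('AA'), "\n";   # empty line (AAT/AAC = N, AAA/AAG = K)
```

Under that reading, the trailing GG in my example becomes G, which matches the NPLG I got.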

Are you sure that's what every user of PrimarySeq::translate wants? If so, please put something in the docs about it. Also, is there an option that will let me say "translate 11 bp to only 3 aa"? From looking at the code, it looks like no. I guess I can do this on my own if frame is 1.
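
Something like the following is what I have in mind for doing it on my own: a rough sketch that trims the string to whole codons before it ever reaches translate(). trim_to_whole_codons is a hypothetical helper of mine, not anything in BioPerl, and it only does string arithmetic.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): drop the first $frame bases,
# then drop any trailing 1 or 2 bases that do not form a complete codon.
sub trim_to_whole_codons {
    my ($seq, $frame) = @_;
    die "Valid values for frame are 0, 1, or 2, not $frame.\n"
        unless $frame == 0 || $frame == 1 || $frame == 2;
    my $shifted  = substr($seq, $frame);
    my $leftover = length($shifted) % 3;
    return substr($shifted, 0, length($shifted) - $leftover);
}

my $trimmed = trim_to_whole_codons("AAACCCTTTGGG", 1);
print "$trimmed\n";                 # AACCCTTTG
print length($trimmed) / 3, "\n";   # 3
```

The trimmed string could then go into Bio::Seq->new(-seq => $trimmed) and be translated with the default frame, which should give only the three complete codons.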

Slightly less confused,

-Amir

________________________________________
From: ajmackey at gmail.com [ajmackey at gmail.com] On Behalf Of Aaron Mackey [amackey at virginia.edu]
Sent: Saturday, January 15, 2011 18:34
To: Chris Fields
Cc: Karger, Amir; bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Frame translation gets an extra aa?

I'm guessing the confusion might be the differences in terminology between reading frame (taking a value of 1, 2 or 3) and leading intron phase (a value of 0, 1 or 2, which corresponds to a reading frame of 1, 3 or 2, respectively) ... ?
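
As a small illustration of that correspondence only (phase_to_frame is a hypothetical helper, not a BioPerl call), the mapping can be written as:

```perl
use strict;
use warnings;

# Hypothetical helper: convert a phase (0, 1, 2) to a 1-based reading frame
# using the correspondence 0 -> 1, 1 -> 3, 2 -> 2.
sub phase_to_frame {
    my ($phase) = @_;
    die "phase must be 0, 1, or 2\n" unless $phase =~ /^[012]$/;
    my $frame = (3 - $phase) % 3 + 1;
    return $frame;
}

print join(" ", map { phase_to_frame($_) } 0 .. 2), "\n";   # 1 3 2
```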

-Aaron

On Fri, Jan 14, 2011 at 1:25 PM, Chris Fields <cjfields at illinois.edu> wrote:
Amir,

Um, the sequence you have has 4 codons:

AAA CCC TTT GGG

Taking off the final 'G' gives the correct response:

perl -l -MBio::Seq -e '$x=Bio::Seq->new(-display_id=>"foo",-seq=>"AAACCCTTTGG"); print $x->translate(-frame=>1)->seq'
NPL

chris

On Jan 14, 2011, at 12:06 PM, Karger, Amir wrote:

> Apologies if this question has been asked before, or if it's so stupid that nobody was silly enough to ask it before.
>
> (Using Bioperl 1.6.1)
>
> perl -l -MBio::Seq -e '$x=Bio::Seq->new(-display_id=>"foo",-seq=>"AAACCCTTTGGG"); print $x->translate(-frame=>1)->seq'
> NPLG
>
> Um, why is GG being translated to G? Shouldn't you not translate if you only have 2 bp left? That is, even if you know that GGX translates to amino acid G for X in (A,C,G,T) you don't actually have that third bp right now. In real life, would an mRNA get translated even if it's missing the third base pair?



