[Bioperl-l] Frame translation gets an extra aa?
cjfields at illinois.edu
Mon Jan 17 16:07:46 UTC 2011
Completely missed the frame argument you passed. Yes, the behavior of PrimarySeqI::translate and CodonTable::translate seems inconsistent here, particularly with the '-complete' parameter (implying a complete CDS) defaulting to false. If the default assumption in PrimarySeqI::translate() is that any sequence to be translated isn't complete, why should CodonTable::translate() automatically 'complete' the translation for incomplete codons by default? I would consider this a bug.
However, as '-complete' also assumes a complete CDS, using it doesn't quite fit either, so we probably need some argument that allows for defining this more finely. '-strict' ?
Anyway, that is easily fixed; just passing the flag through to the call to CodonTable::translate, then bypassing translation of partial codons if it is present, corrects the problem. We would just need to decide on the above.
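In the meantime, trimming the sequence down to whole codons before calling translate() should sidestep the partial-codon behavior. A rough sketch of that workaround follows; whole_codons is a made-up helper name, not part of BioPerl, and it only does the string trimming (it mirrors the frame check and substr quoted below, then drops any trailing 1 or 2 bases):

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of BioPerl): drop the first $frame bases,
# then drop any trailing partial codon so only whole codons remain.
sub whole_codons {
    my ($seq, $frame) = @_;
    $frame = 0 unless defined $frame;
    die "Valid values for frame are 0, 1, or 2, not $frame.\n"
        unless $frame == 0 or $frame == 1 or $frame == 2;

    # Nothing left to translate if the frame offset consumes the sequence.
    return '' if $frame >= length $seq;

    my $sub    = substr($seq, $frame);
    my $usable = length($sub) - length($sub) % 3;   # largest multiple of 3
    return substr($sub, 0, $usable);
}

# 12 bases, frame 1: 11 bases remain, the last 2 are dropped.
print whole_codons("AAACCCTTTGGG", 1), "\n";   # AACCCTTTG
```

The trimmed string can then be passed to Bio::Seq->new(-seq => ...) and translated with the default frame of 0, so CodonTable::translate never sees a trailing partial codon.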
On Jan 16, 2011, at 1:00 AM, Karger, Amir wrote:
> Wait, what? Aaron, I'm not a biologist, so please give me a couple more sentences here.
> Also, the docs (and code) don't seem to support your numbers. From http://www.bioperl.org/wiki/BioPerl_Tutorial:
> You can also determine the frame of the translation. The default frame starts at the first nucleotide (frame 0). To get translation in the next frame we would write:
> $prot_obj = $my_seq_object->translate(-frame => 1);
> From http://doc.bioperl.org/releases/bioperl-1.6.1/ PrimarySeqI documentation (and my 1.5 perldoc Bio::PrimarySeqI):
> -frame - frame default is 0
> From the code linked to at the doc.bioperl link above:
> ## use frame, error if frame is not 0, 1 or 2
> $self->throw("Valid values for frame are 0, 1, or 2, not $frame.")
> unless ($frame == 0 or $frame == 1 or $frame == 2);
> $seq = substr($seq,$frame);
> What am I missing here? All the docs I see seem to use frame as "the number of bp we move to the right before we start translating codons 3 bp at a time". But if that code is being run when I do a translate() I should really be getting the answer I expect, and not four aas. And yet the Deobfuscator tells me that Bio::Seq::translate is inheriting from PrimarySeqI. And I get the same four-aa result if I create a PrimarySeq instead of a Seq.
> Aha. Now I see that PrimarySeq::translate calls CodonTable::translate after taking the substr. CodonTable::translate() says:
> if the codon is two nucleotides long and if by adding
> an [sic] a third character 'N', it codes for a single amino
> acid (with exceptions above), return that, otherwise
> return empty string.
> Are you sure that's what every user of PrimarySeq::translate wants? If so, please put something in the docs about it. Also, is there an option that will let me say "translate 11 bp to only 3 aa"? From looking at the code, it looks like no. I guess I can do this on my own if frame is 1.
> Slightly less confused,
> From: ajmackey at gmail.com [ajmackey at gmail.com] On Behalf Of Aaron Mackey [amackey at virginia.edu]
> Sent: Saturday, January 15, 2011 18:34
> To: Chris Fields
> Cc: Karger, Amir; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Frame translation gets an extra aa?
> I'm guessing the confusion might be the differences in terminology between reading frame (taking a value of 1, 2 or 3) and leading intron phase (a value of 0, 1 or 2, which corresponds to a reading frame of 1, 3 or 2, respectively) ... ?
> On Fri, Jan 14, 2011 at 1:25 PM, Chris Fields <cjfields at illinois.edu> wrote:
> Um, the sequence you have has 4 codons:
> AAA CCC TTT GGG
> Taking off the final 'G' gives the correct response:
> perl -l -MBio::Seq -e '$x=Bio::Seq->new(-display_id=>"foo",-seq=>"AAACCCTTTGG"); print $x->translate(-frame=>1)->seq'
> On Jan 14, 2011, at 12:06 PM, Karger, Amir wrote:
>> Apologies if this question has been asked before, or if it's so stupid that nobody was silly enough to ask it before.
>> (Using Bioperl 1.6.1)
>> perl -l -MBio::Seq -e '$x=Bio::Seq->new(-display_id=>"foo",-seq=>"AAACCCTTTGGG"); print $x->translate(-frame=>1)->seq'
>> Um, why is GG being translated to G? Shouldn't you not translate if you only have 2 bp left? That is, even if you know that GGX translates to amino acid G for X in (A,C,G,T) you don't actually have that third bp right now. In real life, would an mRNA get translated even if it's missing the third base pair?