[BioPython] Re: back-translation method for Seq object?

Bruce Southey bsouthey at gmail.com
Tue Oct 21 16:36:31 EDT 2008


Peter wrote:
> Hi everyone,
>
> I think we all agree that if we want a back-translation
> method/function to return a simple string or Seq object (given no
> additional information about the codon use), this cannot fully capture
> all the possible codons.
>   


For completeness, since the single-codon forms at the end of each row
match extra codons and so are not 100% correct:
Leu/L = (TTA|TTG|CTT|CTC|CTA|CTG) = (TTR|CTN) = YTN
Arg/R = (CGT|CGC|CGA|CGG|AGA|AGG) = (CGN|AGR) = MGN
Ser/S = (TCT|TCC|TCA|TCG|AGT|AGC) = (TCN|AGY) = WSN

The Ser case is bad enough that I would suggest issuing a strong warning 
and simply using NTN, NGN, and NNN for Leu, Arg and Ser, respectively.
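
To make the over-matching concrete, here is a minimal sketch in plain
Python (no Biopython calls) that expands an ambiguous codon into the
concrete codons it stands for and reports those that do not encode the
intended amino acid. The names expand and extra_codons are hypothetical
helpers made up for this illustration, not part of any library; the
ambiguity letters follow the standard IUPAC nucleotide codes.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# The three amino acids with six codons in the standard codon table.
SIX_FOLD = {
    "Leu": {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"},
    "Arg": {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"},
    "Ser": {"TCT", "TCC", "TCA", "TCG", "AGT", "AGC"},
}


def expand(pattern):
    """Return the set of concrete codons matched by an ambiguous codon."""
    return {"".join(p) for p in product(*(IUPAC[c] for c in pattern))}


def extra_codons(amino_acid, pattern):
    """Codons matched by `pattern` that do not encode `amino_acid`."""
    return expand(pattern) - SIX_FOLD[amino_acid]


if __name__ == "__main__":
    for aa, pattern in [("Leu", "YTN"), ("Ser", "WSN")]:
        print(aa, pattern, sorted(extra_codons(aa, pattern)))
```

For instance, extra_codons("Leu", "YTN") gives the two Phe codons TTT
and TTC, while WSN matches ten codons that are not Ser, which is why
the single-codon form is so much worse for Ser than for Leu.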


> If we want to provide a simple string or Seq object, we can either
> pick an arbitrary codon in each case (as in the first attachment on
> Bug 2618), or perhaps represent some of the possible codons using
> ambiguous nucleotides.
>
> e.g.
> back_translate("MR") = "ATGCGT" #arbitrary codon for R unambiguous nucleotides
>
> or,
> back_translate("MR") = "ATGCGN" #arbitrary codon for R using ambiguous
> nucleotides
>
> Note in either example, the following nice property holds:
> translate(back_translate("MR")) == "MR"
>
> Even if improved by typical codon usage figures to give a more
> biologically likely answer, neither of these simple approaches covers
> the full set of six possible codons for Arg in the standard codon
> table.
>
> It was something like this that I envisioned as a candidate for a Seq
> method (based on the behaviour of the existing Bio.Translate
> functionality), but only if such a simple back_translate
> method/function had any real uses.  And thus far, I haven't seen any.
>   
For you perhaps, but my reasons are very real to me!
> A back translation method/function which dealt with all the possible
> codon choices would have to use a more advanced representation
> (possibly as Bruce suggested using regular expressions or some sort of
> tree structure - ideally as a sub-class of the Seq object).  There is
> also the option of returning multiple simple strings or Seq objects
> (either as a list or preferably a generator) giving all possible back
> translations, but I don't think this would be useful, except perhaps
> on small examples, due to the potentially vast number of return
> values.
>
> Peter
>
>   
In any situation, we are left with ambiguous codons, a regular 
expression, or some combination of sequence types (e.g., strings or Seq 
objects). None of these options is fully compatible with the Seq 
object, so I do agree that back-translation cannot be part of the Seq 
object. I also agree that, while the first two could be return types for 
a Seq object method, the usage is probably too infrequent and too 
specialized for inclusion, especially if it has to handle codon usage 
frequencies.
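
As a rough illustration of the regular-expression and generator options
discussed in this thread, the sketch below uses plain Python and the
standard codon table only. All function names (back_translate_regex,
count_back_translations, iter_back_translations, translate) are
hypothetical and are not Biopython API; codon usage frequencies are
ignored entirely.

```python
import re
from itertools import product
from math import prod

# Standard codon table, laid out in the conventional TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# Reverse mapping: amino acid letter -> sorted list of its codons.
BACK_TABLE = {}
for codon, aa in CODON_TABLE.items():
    BACK_TABLE.setdefault(aa, []).append(codon)
for codons in BACK_TABLE.values():
    codons.sort()


def translate(dna):
    """Translate an unambiguous DNA string, three bases at a time."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def back_translate_regex(protein):
    """Regular expression matching every DNA string encoding `protein`."""
    return "".join("(?:" + "|".join(BACK_TABLE[aa]) + ")" for aa in protein)


def count_back_translations(protein):
    """Number of distinct DNA strings encoding `protein`."""
    return prod(len(BACK_TABLE[aa]) for aa in protein)


def iter_back_translations(protein):
    """Lazily yield every DNA string encoding `protein`."""
    for codons in product(*(BACK_TABLE[aa] for aa in protein)):
        yield "".join(codons)


if __name__ == "__main__":
    print(back_translate_regex("MR"))
    print(count_back_translations("MR"))
    print(all(translate(d) == "MR" for d in iter_back_translations("MR")))
```

The regular expression stays linear in the protein length, whereas the
count of explicit back-translations is the product of the per-residue
codon counts, so the generator is only practical for short peptides.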

Bruce



