[BioPython] Re: back-translation method for Seq object?

Peter biopython at maubp.freeserve.co.uk
Tue Oct 21 16:07:46 UTC 2008


Hi everyone,

I think we all agree that if we want a back-translation
method/function to return a simple string or Seq object (given no
additional information about the codon use), this cannot fully capture
all the possible codons.

If we want to provide a simple string or Seq object, we can either
pick an arbitrary codon in each case (as in the first attachment on
Bug 2618), or perhaps represent some of the possible codons using
ambiguous nucleotides.

e.g.
back_translate("MR") = "ATGCGT"  # arbitrary codon for R, unambiguous nucleotides

or,
back_translate("MR") = "ATGCGN"  # ambiguous nucleotides; CGN covers four of the six codons for R

Note that in either example, the following nice property holds:
translate(back_translate("MR")) == "MR"

Even if improved by typical codon usage figures to give a more
biologically likely answer, neither of these simple approaches covers
the full set of six possible codons for Arg in the standard codon
table.
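To make the two simple approaches concrete, here is a minimal sketch in plain
Python. It does not use the Biopython API: the names back_translate, translate,
ARBITRARY and AMBIGUOUS are hypothetical, the "arbitrary" codon is simply the
first one in table order, and only Arg is given an ambiguous codon.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ("*" is a stop).
FORWARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumption: the "arbitrary" codon is the first one seen in table order.
ARBITRARY = {}
for codon, aa in FORWARD.items():
    ARBITRARY.setdefault(aa, codon)

# Assumption: only Arg gets an ambiguous codon here, as in the "MR" example.
AMBIGUOUS = dict(ARBITRARY, R="CGN")


def back_translate(protein, table=ARBITRARY):
    """Return one nucleotide string, using a single codon per amino acid."""
    return "".join(table[aa] for aa in protein)


def translate(nuc):
    """Translate complete codons, treating N as any base.

    A codon whose expansions disagree on the amino acid becomes "X".
    """
    protein = []
    for i in range(0, len(nuc) - len(nuc) % 3, 3):
        codon = nuc[i:i + 3]
        expansions = product(*(BASES if base == "N" else base for base in codon))
        amino_acids = {FORWARD["".join(e)] for e in expansions}
        protein.append(amino_acids.pop() if len(amino_acids) == 1 else "X")
    return "".join(protein)


print(back_translate("MR"))             # ATGCGT
print(back_translate("MR", AMBIGUOUS))  # ATGCGN
print(translate(back_translate("MR", AMBIGUOUS)) == "MR")  # True
```

Both tables round-trip through translate, but neither output mentions the
AGA or AGG codons for Arg.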

It was something like this that I envisioned as a candidate for a Seq
method (based on the behaviour of the existing Bio.Translate
functionality), but only if such a simple back_translate
method/function had any real uses.  And thus far, I haven't seen any.

A back translation method/function which dealt with all the possible
codon choices would have to use a more advanced representation
(possibly as Bruce suggested using regular expressions or some sort of
tree structure - ideally as a sub-class of the Seq object).  There is
also the option of returning multiple simple strings or Seq objects
(either as a list or, preferably, a generator) giving all possible back
translations, but I don't think this would be useful, except perhaps
on small examples, due to the potentially vast number of return
values.
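As a rough illustration of the last two options (again plain Python, not an
existing Biopython function; all names below are hypothetical), a generator
over every back translation and a regular expression matching any of them
could look like this. The count shows how quickly the generator becomes
impractical.

```python
import re
from itertools import product
from math import prod

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, inverted: amino acid -> list of its codons.
CODONS = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO):
    CODONS.setdefault(aa, []).append("".join(codon))


def all_back_translations(protein):
    """Lazily yield every nucleotide string that encodes the protein."""
    for combo in product(*(CODONS[aa] for aa in protein)):
        yield "".join(combo)


def count_back_translations(protein):
    """Number of distinct back translations (product of codon counts)."""
    return prod(len(CODONS[aa]) for aa in protein)


def back_translation_regex(protein):
    """One regular expression that matches any back translation."""
    return "".join("(?:" + "|".join(CODONS[aa]) + ")" for aa in protein)


print(list(all_back_translations("MR")))
# ['ATGCGT', 'ATGCGC', 'ATGCGA', 'ATGCGG', 'ATGAGA', 'ATGAGG']
print(count_back_translations("MRLS"))  # 216
print(bool(re.fullmatch(back_translation_regex("MR"), "ATGAGA")))  # True
```

Even a 100 residue run of Arg alone would give 6**100 strings, so the regular
expression (or some other compact structure) scales where the list does not.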

Peter


