[BioPython] back-translation method for Seq object?

Peter biopython at maubp.freeserve.co.uk
Thu Oct 16 15:11:27 UTC 2008


Quoting from the recent thread about adding a translation method to
the Seq object, Bruce brought up back-translation:

Peter wrote:
> Bruce wrote:
>> Obviously reverse translation of a protein sequence to a DNA sequence is
>> complex if there are many solutions.
>
> Yes, back-translation is tricky because there is generally more than
> one codon for any amino acid.  Ambiguous nucleotides can be used to
> describe several possible codons giving that amino acid, but in
> general it is not possible to do this and describe all the possible
> codons which could have been used.  This topic is worthy of an entire
> thread... for the record, I would envisage a back_translate method for
> the Seq object (assuming we settle on translate as the name for the
> forward translation from nucleotide to protein).

Do we actually need a back_translate method?  Can anyone suggest an
actual use-case for this?  It seems difficult to imagine that any
simple version would please everyone.

Bio.Translate (a semi-obsolete module whose deprecation has been
suggested) provides a back_translate method which picks an essentially
arbitrary but unambiguous codon for each amino acid.  Crude but
simple.  A more meaningful choice would require supplying codon
frequencies for the organism under consideration.
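
To make the one-codon-per-amino-acid idea concrete, here is a rough
sketch in plain Python.  It is not the Bio.Translate code; the function
name back_translate_simple and its codon_usage argument are made up for
illustration, and the usage numbers in the example are invented rather
than taken from any real organism.  Without usage data it simply picks
the alphabetically first codon, which is just as arbitrary as any other
choice.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
FORWARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def back_translate_simple(protein, codon_usage=None):
    """Return one unambiguous DNA sequence coding for protein.

    codon_usage is a hypothetical optional dict mapping codon -> relative
    frequency.  Codons missing from it count as 0.0, and ties fall back
    to the alphabetically first codon.
    """
    usage = codon_usage or {}
    chosen = []
    for aa in protein:
        candidates = sorted(c for c, a in FORWARD.items() if a == aa)
        if not candidates:
            raise ValueError("no codon for %r in the standard table" % aa)
        # max() keeps the first of several equally good candidates.
        chosen.append(max(candidates, key=lambda c: usage.get(c, 0.0)))
    return "".join(chosen)


def translate(dna):
    """Forward translation, used here only to check the round trip."""
    return "".join(FORWARD[dna[i:i + 3]] for i in range(0, len(dna), 3))


print(back_translate_simple("MLW"))
# Invented usage values, for illustration only:
print(back_translate_simple("MLW", {"CTG": 0.4, "TTA": 0.1}))
```

Whatever codons are picked, translating the result should give back
the original protein, which is about the only guarantee a simple
version like this can offer.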

Other possibilities include using ambiguous nucleotides to try to
cover all the possibilities (e.g. "L" -> "CTN"), but even here in some
cases the choice is arbitrary.  For example, the three standard stop
codons ['TAA', 'TAG', 'TGA'] could be represented as ['TAR', 'TGA'] or
['TRA', 'TAG'] but not by a single ambiguous codon ('TRR' also covers
'TGG', which codes for 'W').
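
The stop codon problem can be checked by expanding ambiguous codons
into the concrete codons they stand for.  The sketch below is plain
Python using the standard IUPAC nucleotide codes; the helper names
expand and encoded are made up for this example.  It also shows that
leucine has the same issue: 'CTN' only gives leucine but misses 'TTA'
and 'TTG', while 'YTN' covers all six leucine codons and picks up
phenylalanine as well.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
FORWARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC ambiguity codes for DNA.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(codon):
    """List every unambiguous codon matched by an ambiguous codon."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in codon))]


def encoded(codon):
    """Set of amino acids ('*' for stop) the ambiguous codon can give."""
    return {FORWARD[c] for c in expand(codon)}


for codon in ("TAR", "TRA", "TRR", "CTN", "YTN"):
    print(codon, expand(codon), sorted(encoded(codon)))
```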

Potentially of use would be a generator function which returned all
possible back translations - but this would be complex and typically
overkill.
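
For the record, a naive version of such a generator might look like
the sketch below (plain Python, standard table only; the names
all_back_translations and count_back_translations are hypothetical).
The number of results is the product of the codon counts for each
residue, so even a three residue peptide like "LSR" has 6 * 6 * 6 = 216
back translations, which is why this is rarely practical for real
proteins.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
FORWARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Reverse mapping: amino acid -> list of all codons coding for it.
BACK = {}
for codon, aa in FORWARD.items():
    BACK.setdefault(aa, []).append(codon)


def all_back_translations(protein):
    """Lazily yield every DNA sequence that translates to protein."""
    for combo in product(*(BACK[aa] for aa in protein)):
        yield "".join(combo)


def count_back_translations(protein):
    """Number of possible back translations, without generating them."""
    total = 1
    for aa in protein:
        total *= len(BACK[aa])
    return total


print(list(all_back_translations("MLW")))
print(count_back_translations("LSR"))
```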

As a final point, a Seq object back-translation method could give RNA
or DNA.  From a biological point of view giving DNA by default would
make sense.  This choice is handled in Bio.Translate when creating the
translator object (part of what makes Bio.Translate relatively complex
to use).

Peter


