[EMBOSS] transeq and ambiguous codons

Peter Rice pmr at ebi.ac.uk
Thu Jul 9 09:08:37 UTC 2009


Peter C. wrote:
> However, consider the codon TRR. R means A or G, so this can mean TAA,
> TGA, TAG or TGG which translate to stop or W (both EMBOSS and the NCBI
> standard table agree here). Therefore the translation of TRR should be
> "* or W", which I would expect based on the above examples to result
> in "X". But instead EMBOSS transeq gives "*":

This is a side effect of the way backtranslation works.

EMBOSS calculates the "most ambiguous codon" for each amino acid and
stop, and uses this for back translation. Thus a '*' in a protein
sequence would be rendered as 'TRR' by backtranseq. To provide
consistent translation of the backtranseq results, TRR is assumed to be
a backtranslated stop. Similarly, MGN is translated as 'R' because it
could reasonably result from a backtranslation of 'R'.
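
One way to picture the "most ambiguous codon" idea is sketched below in Python. This is only an illustration, not the EMBOSS source: it assumes the ambiguous codon is formed by taking, position by position, the union of the bases seen across all codons for a residue, and the names CODON_TABLE, IUPAC and most_ambiguous_codon are made up for this example. Under that assumption the NCBI standard table gives TRR for stop and MGN for R, matching the codons mentioned above.

```python
from itertools import product

# NCBI standard genetic code (table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide code for each set of bases.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("AC"): "M",
    frozenset("GT"): "K", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def most_ambiguous_codon(residue):
    """Position-wise union of all codons for `residue` (assumed method)."""
    codons = [c for c, aa in CODON_TABLE.items() if aa == residue]
    # zip(*codons) yields the bases seen at positions 1, 2 and 3 in turn.
    return "".join(IUPAC[frozenset(column)] for column in zip(*codons))


print(most_ambiguous_codon("*"))  # TRR
print(most_ambiguous_codon("R"))  # MGN
```

Note that the position-wise union can cover more codons than it was built from: TRR includes TGG (W), and MGN includes AGT and AGC (S), which is exactly why a strict reading of these codons would be 'X'.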

I agree that it would also be reasonable to be strict about translation
in transeq and render TRR as 'X'.

It depends on your philosophy of where the ambiguity codes came from -
from backtranslation, or the curious mind of a bioinformatician :-)

So .... it's not a bug, it's a feature ... which means I can relax for
now and contemplate some extras in the next release.

In future, we will at least make sure TRA and other 'unambiguous
ambiguous codons' get understood as '*' etc. TRR I would prefer to leave
as it is by default, with either an option for rendering it as 'X' or an
alternative to transeq that enforces the strict translation rules.
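
The difference between the two behaviours can be sketched in a few lines of Python. This is a hypothetical illustration, not transeq itself: translate_codon, its strict parameter and the BACKTRANSLATED table are invented for the example, and the table holds only the two codons named in this message. In both modes a codon whose expansions all agree (such as TRA) collapses to that residue. In the default mode the backtranslation codons keep their residue, while in strict mode any disagreement gives 'X'.

```python
from itertools import product

# NCBI standard genetic code (table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Bases represented by each IUPAC nucleotide code.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Hypothetical table: only the two backtranslation codons named in the message.
BACKTRANSLATED = {"TRR": "*", "MGN": "R"}


def translate_codon(codon, strict=False):
    """Translate one (possibly ambiguous) DNA codon to a single character."""
    codon = codon.upper()
    if not strict and codon in BACKTRANSLATED:
        # Default mode: treat the codon as a backtranslated residue.
        return BACKTRANSLATED[codon]
    # Expand every ambiguity code and collect the distinct residues.
    expansions = product(*(IUPAC_BASES[base] for base in codon))
    residues = {CODON_TABLE["".join(bases)] for bases in expansions}
    return residues.pop() if len(residues) == 1 else "X"


for codon in ("TRA", "TRR", "MGN"):
    print(codon, translate_codon(codon), translate_codon(codon, strict=True))
```

With the standard table, TRA gives '*' in both modes, while TRR and MGN give '*' and 'R' by default and 'X' when strict=True.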

regards,

Peter Rice




