[BioPython] Translation of ambiguous codons like NNN and TAN

Peter biopython at maubp.freeserve.co.uk
Mon Jul 21 21:04:34 UTC 2008


> The relevant document here is the IUPAC 'Nomenclature for Incompletely
> Specified Bases in Nucleic Acid Sequences'
>  (http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html).
>
> Table 4 provides some examples such as showing the correspondence between
> the triplet  'NNN' and the amino acid 'Unknown' with one letter code 'X'.
> Consequently, there must also be a correspondence between triplet 'TAN' (or
> any other related triplet for that matter) and the amino acid 'X' because
> the triplet 'NNN' must also include the stop codons which have no
> corresponding amino acid (okay ignoring selenocysteine etc.).
>
> Bruce

I'd seen that table, but hadn't interpreted it in quite the same way
as you seem to have.  Table 4 shows a mapping from amino acids to
ambiguous codons - but it is not a bijection (it is NOT a reversible
mapping).  Or to put it another way, Table 4 just shows the narrowest
possible ambiguous codon which covers all possible codons for that
amino acid (or stop).  This does NOT mean that ambiguous codon will
always translate to the given amino acid.

Here are two explicit examples to try and explain myself (using a
little set notation).  In the standard genetic code, stop codons are
{TAA, TGA, TAG} = {TAR, TGA} = {TAG, TRA}.  If you have to use a
single codon, the best you can do is TRR = {TAA, TAG, TGA, TGG} =
{stop codons}U{TGG}.  Table 4 maps "*" to "TRR".  While TRR does cover
all the possible stop codons, it also includes a codon for Trp (W).  I
would NOT want TRR translated as a stop!  It codes for a stop or a W.
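
The set arithmetic above can be checked mechanically. This is only a
minimal sketch: the IUPAC dictionary and the expand() helper are names
made up for this illustration, not part of Biopython.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

# Stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def expand(codon):
    """Return the set of unambiguous codons covered by an ambiguous codon."""
    return {"".join(p) for p in product(*(IUPAC[base] for base in codon))}


# TRR covers every stop codon, plus one extra codon (TGG, which is Trp).
print(sorted(expand("TRR")))                # ['TAA', 'TAG', 'TGA', 'TGG']
print(sorted(expand("TRR") - STOP_CODONS))  # ['TGG']
```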

Similarly, for Leucine (L) the possible codons are {TTA, TTG, CTT,
CTC, CTA, CTG} = {TTR, CTN}.  If you have to use a single codon then
YTN covers this - but it also covers TTT and TTC which code for Phe
(F).  I would NOT want YTN translated as L - it codes for L or F, so
would have to be translated as X.
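
The rule I am arguing for (translate an ambiguous codon to a specific
letter only when every codon it covers agrees, otherwise give X) could
be sketched roughly as follows. The function names and the choice of X
for a mixture of stop and amino acid are assumptions for illustration;
this is not the Biopython implementation.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

# Standard genetic code, codons enumerated in TCAG order ("*" is stop).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
STANDARD_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def possible_translations(codon):
    """Return every amino acid (or '*') the ambiguous codon could encode."""
    concrete = ("".join(p) for p in product(*(IUPAC[b] for b in codon)))
    return {STANDARD_TABLE[c] for c in concrete}


def translate_ambiguous(codon):
    """Give a specific letter only if all covered codons agree, else 'X'."""
    outcomes = possible_translations(codon)
    return outcomes.pop() if len(outcomes) == 1 else "X"


for codon in ("CTN", "TTR", "YTN", "TAR", "TRR", "TAN", "NNN"):
    print(codon, translate_ambiguous(codon))
```

With this rule CTN and TTR both give L and TAR gives a stop, while YTN,
TRR, TAN and NNN all give X.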

So on the basis of Table 4, there is no reason to infer that IUPAC
explicitly suggests NNN should be translated as X.

Peter


