[EMBOSS] transeq and ambiguous codons
Peter
biopython at maubp.freeserve.co.uk
Fri Jul 10 05:14:42 EDT 2009
On Thu, Jul 9, 2009 at 10:08 AM, Peter Rice<pmr at ebi.ac.uk> wrote:
>
> Peter C. wrote:
>> However, consider the codon TRR. R means A or G, so this can mean TAA,
>> TGA, TAG or TGG which translate to stop or W (both EMBOSS and the NCBI
>> standard table agree here). Therefore the translation of TRR should be
>> "* or W", which I would expect based on the above examples to result
>> in "X". But instead EMBOSS transeq gives "*":
>
> This is a side effect of the way backtranslation works...
OK, leaving TRR aside for the moment (I'm not sure I'd have done it that
way, but I think I follow your logic), I have some more problem cases for
you to consider (all using the default standard NCBI table 1).
Most of these are 'unambiguous ambiguous codons' as you put it, and
I would agree that using X when a more specific letter is possible isn't
ideal but isn't actually wrong. The translations of "ATS" and related
codons (see below), however, are simply wrong.
--------------------------------------------------------------------------------------
TRA means TAA or TGA, which are both stop codons. Therefore TRA
should translate as a stop, not as an X:
$ transeq asis:TAATGATRA -stdout -auto -osformat raw
**X
--------------------------------------------------------------------------------------
Now look at YTA, which means CTA or TTA, both of which encode L, so
YTA should be L not X:
$ transeq asis:CTATTAYTA -stdout -auto -osformat raw
LLX
Likewise for YTG and YTR. (YTN is different: it also covers TTC and TTT,
which encode F, so X is right there.)
--------------------------------------------------------------------------------------
Another example, ATW means ATA or ATT, which both translate as I,
so ATW should translate as I not X:
$ transeq asis:ATAATTATW -stdout -auto -osformat raw
IIX
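
A minimal Python sketch of the behaviour expected in the three cases above
(this is not EMBOSS code). It expands each IUPAC letter to the bases it
stands for, translates every concrete codon with the standard table, and
only falls back to X when the results differ. The names IUPAC, TABLE,
translate_codon and translate are made up for illustration.

```python
from itertools import product

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# NCBI standard table 1, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_codon(codon):
    """Return the one residue shared by every expansion of codon, else 'X'."""
    choices = (IUPAC[base] for base in codon.upper())
    residues = {TABLE["".join(c)] for c in product(*choices)}
    return residues.pop() if len(residues) == 1 else "X"


def translate(seq):
    """Translate complete codons only; a trailing partial codon is ignored."""
    return "".join(translate_codon(seq[i:i + 3]) for i in range(0, len(seq) - 2, 3))


print(translate("TAATGATRA"))  # ***
print(translate("CTATTAYTA"))  # LLL
print(translate("ATAATTATW"))  # III
```

With this rule TRR still comes out as X, since its expansions give both a
stop and W.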
--------------------------------------------------------------------------------------
Conversely, ATS means ATC or ATG (remember S means G or C), which
translate as I and M. Therefore ATS should translate as X, and
not I:
$ transeq asis:ATCATGATS -stdout -auto -osformat raw
IMI
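Some other AT* codons that cover ATG should be X for the same reason, for
example ATV (V means A, C or G). ATH is not one of them: H means A, C or T,
so ATH means ATA, ATC or ATT, which all translate as I, and the final I
here is correct:
$ transeq asis:ATAATCATGATH -stdout -auto -osformat raw
IIMI
[*** The ATS result strikes me as a clear bug ***]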
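
The ATS case can be checked by listing which residues an ambiguous codon
actually covers. The following is a rough Python sketch under the same
assumptions as the standard table 1 (it does not reproduce what transeq
does internally); the helper name residues is hypothetical.

```python
from itertools import product

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# NCBI standard table 1, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def residues(codon):
    """Sorted list of residues encoded by the concrete codons that codon covers."""
    choices = (IUPAC[base] for base in codon.upper())
    return sorted({TABLE["".join(c)] for c in product(*choices)})


# Third-position codes that include G all pull in ATG (M) alongside I;
# ATW (A or T) is listed for contrast.
for third in "SRKVBDNW":
    codon = "AT" + third
    print(codon, "/".join(residues(codon)))
```

Any codon printed with more than one residue has no single correct letter,
so X is the only safe answer for it.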
--------------------------------------------------------------------------------------
Now for another debatable one: RAT means AAT or GAT, which code for
N and D, so you could use B (Asx) here rather than the broader X.
$ transeq asis:AATGATRAT -stdout -auto -osformat raw
NDX
Again, the same applies to others like RAC -> X not B, and RAY -> X not B.
Similarly, you don't use J to mean leucine (L) or isoleucine (I), and
opt for X instead (again, this is justifiable), e.g. WTA:
$ transeq asis:ATATTAWTA -stdout -auto -osformat raw
ILX
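
If the narrower letters were wanted, one possible approach (a sketch only,
not a description of how transeq works) is to map specific residue pairs to
B, Z and J before falling back to X. The names PAIR_CODES and
translate_codon are hypothetical.

```python
from itertools import product

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# NCBI standard table 1, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Ambiguous amino acid letters and the residue pairs they stand for.
PAIR_CODES = {
    frozenset("DN"): "B",  # Asx: aspartate or asparagine
    frozenset("EQ"): "Z",  # Glx: glutamate or glutamine
    frozenset("IL"): "J",  # Xle: isoleucine or leucine
}


def translate_codon(codon):
    """Most specific letter for codon: a residue, then B/Z/J, then X."""
    choices = (IUPAC[base] for base in codon.upper())
    residues = frozenset(TABLE["".join(c)] for c in product(*choices))
    if len(residues) == 1:
        return next(iter(residues))
    return PAIR_CODES.get(residues, "X")


for codon in ("RAT", "RAY", "WTA", "ATS"):
    print(codon, translate_codon(codon))
```

ATS still gives X here, because there is no letter for "I or M".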
--------------------------------------------------------------------------------------
This list is only partial, and only for the standard table.
Peter C.