[BioPython] Translation of ambiguous codons like NNN and TAN

Mon Jul 21 19:23:34 UTC 2008

Peter wrote:
> Dear all,
>
> I've recently filed Bug 2547 about changing the behaviour of
> Bio.Seq.translate() when given ambiguous codons like NNN or TAN which
> could be either an amino acid OR a stop codon.
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2547
>
> In my opinion it would be nice if there was an established convention
> for how to represent an ambiguous character meaning {X or stop}, for
> the TAN example even {Y or stop}.  However, as far as I am aware,
> people just use X to mean any amino acid OR a stop codon.  This is the
> behaviour in both the EMBOSS transeq tool and in BioPerl.
>
> I am proposing to change Bio.Seq.translate() so that it can translate
> codons like NNN and TAN as X (rather than throwing a translation error
> as happens now).
>
> Comments please?
>
> [Any implementation suggestions on the development mailing list please.]
>
> Peter
>
The relevant document here is the IUPAC 'Nomenclature for Incompletely 
Specified Bases in Nucleic Acid Sequences'  
(http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html).

Table 4 provides some examples, such as the correspondence between the
triplet 'NNN' and the amino acid 'Unknown' with one-letter code 'X'.
Consequently, there must also be a correspondence between the triplet
'TAN' (or any other related triplet, for that matter) and the amino
acid 'X', because the triplet 'NNN' itself includes the stop codons,
which have no corresponding amino acid (okay, ignoring selenocysteine
etc.).  In other words, 'X' already covers the "amino acid or stop"
case.
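
To make the argument concrete, here is a minimal sketch in plain Python
of the rule being discussed.  It is not the Bio.Seq.translate()
implementation; the names translate_codon and translate_ambiguous are
made up for illustration.  It assumes the standard genetic code (NCBI
table 1) and simply reports 'X' whenever the expansions of an ambiguous
codon disagree, including the case where some of them are stop codons.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_codon(codon):
    """Translate one (possibly ambiguous) codon.

    Hypothetical helper: returns the amino acid (or '*' for stop) if
    every expansion of the codon agrees, otherwise 'X'.
    """
    expansions = product(*(IUPAC[base] for base in codon.upper()))
    outcomes = {CODON_TABLE["".join(c)] for c in expansions}
    return outcomes.pop() if len(outcomes) == 1 else "X"


def translate_ambiguous(seq):
    """Translate a sequence whose length is a multiple of three."""
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(translate_codon(seq[i:i + 3])
                   for i in range(0, len(seq), 3))
```

With this rule 'GCN' still gives 'A' and 'TAR' still gives '*', since
all their expansions agree, while 'TAN' (Y or stop) and 'NNN' both
collapse to 'X'.  Whether mixed cases such as 'RAY' should instead give
the protein ambiguity code 'B' is left open here.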

Bruce