[BioPython] Translation of ambiguous codons like NNN and TAN
Bruce Southey
bsouthey at gmail.com
Mon Jul 21 19:23:34 UTC 2008
Peter wrote:
> Dear all,
>
> I've recently filed Bug 2547 about changing the behaviour of
> Bio.Seq.translate() when given ambiguous codons like NNN or TAN which
> could be either an amino acid OR a stop codon.
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2547
>
> In my opinion it would be nice if there was an established convention
> for how to represent an ambiguous character meaning {X or stop}, for
> the TAN example even {Y or stop}. However, as far as I am aware,
> people just use X to mean any amino acid OR a stop codon. This is the
> behaviour in both the EMBOSS transeq tool and in BioPerl.
>
> I am proposing to change Bio.Seq.translate() to be able to translate
> codons like NNN and TAN as X (rather than throwing a translation error
> as happens now).
>
> Comments please?
>
> [Any implementation suggestions on the development mailing list please.]
>
> Peter
The relevant document here is the IUPAC 'Nomenclature for Incompletely
Specified Bases in Nucleic Acid Sequences'
(http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html).
Table 4 gives some examples, including the correspondence between the
triplet 'NNN' and the amino acid 'Unknown' with one-letter code 'X'.
Consequently, there must also be a correspondence between the triplet
'TAN' (or any other related triplet, for that matter) and the amino
acid 'X', because the triplet 'NNN' already includes the stop codons,
which have no corresponding amino acid (okay, ignoring selenocysteine
etc.).
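
To make the argument concrete, here is a rough pure-Python sketch of the
rule under discussion. It is not the Bio.Seq.translate() implementation;
the names translate_codon and translate are made up for illustration, and
it assumes the standard genetic code and the IUPAC nucleotide ambiguity
codes. A codon is expanded into every unambiguous codon it could stand
for, and it becomes 'X' whenever those do not all agree, which covers the
"amino acid OR stop" cases such as NNN and TAN.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC nucleotide ambiguity codes and the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_codon(codon):
    """Translate one (possibly ambiguous) codon to a one-letter code.

    Hypothetical helper: returns the amino acid (or '*') if every
    expansion of the codon agrees, and 'X' otherwise.
    """
    options = [IUPAC[base] for base in codon.upper()]
    outcomes = {CODON_TABLE["".join(c)] for c in product(*options)}
    return outcomes.pop() if len(outcomes) == 1 else "X"


def translate(sequence):
    """Translate a nucleotide string codon by codon, ignoring any
    trailing partial codon."""
    usable = len(sequence) - len(sequence) % 3
    return "".join(
        translate_codon(sequence[i:i + 3]) for i in range(0, usable, 3)
    )


print(translate_codon("NNN"))  # any amino acid or stop
print(translate_codon("TAN"))  # Y or stop
print(translate_codon("TAR"))  # TAA or TAG, both stops
print(translate_codon("GCN"))  # four-fold degenerate, always A
```

With this rule TAN collapses to 'X' for the same reason NNN does, while
codons whose expansions all agree (TAR, GCN) keep a definite result.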
Bruce