[Biopython-dev] [Bug 2547] New: Translation of ambiguous codons like NNN and TAN
bugzilla-daemon at portal.open-bio.org
Sun Jul 20 10:46:23 EDT 2008
http://bugzilla.open-bio.org/show_bug.cgi?id=2547
Summary: Translation of ambiguous codons like NNN and TAN
Product: Biopython
Version: 1.47
Platform: PC
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
It is often useful to translate ambiguous nucleotide sequences (e.g. EST
sequences), and these may contain codons which could code for an amino acid
OR a stop codon (e.g. NNN, TNN or TAN).
See for example Bug 2530 comment 6 and comment 9.
Currently Bio.Seq.translate() will not translate such sequences and raises an
exception.
The following example shows correct translation of ambiguous codons which only
encode valid amino acid(s) OR valid stop codons (but not both):
from Bio.Seq import translate
assert translate("TAA") == "*"
assert translate("TAG") == "*"
assert translate("TAT") == "Y"
assert translate("TAC") == "Y"
#Recall ambiguous nucleotide Y means T or C (pYrimidine)
#so TAY = TAT or TAC which both code for Y (Tyr, Tyrosine)
assert translate("TAY") == "Y"
#Recall ambiguous nucleotide R means G or A (puRine)
#so TAR = TAG or TAA which both code for a stop codon
assert translate("TAR") == "*"
However, in Biopython 1.47 the following all raise an exception:
translate("TAN")
translate("TAM")
translate("TAK")
translate("TRR")
translate("TNN")
translate("NNN")
TAN, TAM, TAK, ... can code for Y or stop. Similarly, "TRR" (TAA, TAG, TGA or
TGG) can code for W or a stop codon, "TNN" can code for several amino acids or
a stop codon, and "NNN" can code for any amino acid or a stop codon.
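The expansion described above can be checked with a short standalone sketch. This is not Biopython code: the name possible_translations and the hard-coded tables are assumptions made for illustration, covering only DNA letters and the standard genetic code (NCBI table 1).

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC ambiguous DNA letters and the unambiguous bases each one stands for.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def possible_translations(codon):
    """Return the set of symbols (amino acids and/or '*') a codon could encode.

    Hypothetical helper for illustration only; not part of Biopython.
    """
    codon = codon.upper()
    if len(codon) != 3 or any(base not in IUPAC_DNA for base in codon):
        raise ValueError("not a valid (possibly ambiguous) DNA codon: %r" % codon)
    # Expand every ambiguous letter, then look up each concrete codon.
    choices = [IUPAC_DNA[base] for base in codon]
    return {CODON_TABLE["".join(concrete)] for concrete in product(*choices)}


if __name__ == "__main__":
    for codon in ["TAY", "TAR", "TAN", "TAM", "TAK", "TRR", "TNN", "NNN"]:
        print(codon, "".join(sorted(possible_translations(codon))))
```

A codon whose result set has exactly one member (TAY, TAR) is the unambiguous case already handled; any set containing "*" together with at least one amino acid is the case this bug is about.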
According to IUPAC, the single letter protein code X is an "unknown or 'other'
amino acid" (ignoring its historic and obsolete usage for selenocysteine, now
U).
http://www.chem.qmul.ac.uk/iupac/AminoAcid/A2021.html
This document does NOT cover the idea of stop codons, and I am not aware of any
additional symbol to mean "any amino acid OR a stop codon" which would be ideal
for this situation.
For comparison, the EMBOSS transeq tool will use X when given a codon which
could be either an amino acid OR a stop codon:
$ transeq -filter asis:NNNTANTARTAGTAYTAC
XX**YY
Therefore one solution would be to follow EMBOSS and return X for codons which
could be an amino acid OR a stop codon.
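A minimal sketch of that proposal, written without Biopython so it runs on its own. The function name translate_with_x is hypothetical, the tables assume DNA letters and the standard genetic code only, and returning X for every codon with more than one possible outcome (including codons that could be two different amino acids but never a stop) is an assumption of this sketch rather than documented transeq behaviour.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC ambiguous DNA letters and the unambiguous bases each one stands for.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_with_x(seq):
    """Translate DNA, emitting 'X' when a codon has more than one outcome.

    Hypothetical illustration of the proposed behaviour; not Biopython's API.
    """
    seq = seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    protein = []
    for start in range(0, len(seq), 3):
        codon = seq[start:start + 3]
        if any(base not in IUPAC_DNA for base in codon):
            # Invalid letters are an error here, rather than silently "*".
            raise ValueError("invalid codon: %r" % codon)
        choices = [IUPAC_DNA[base] for base in codon]
        outcomes = {CODON_TABLE["".join(c)] for c in product(*choices)}
        # One outcome: unambiguous amino acid or stop. Otherwise: unknown.
        protein.append(outcomes.pop() if len(outcomes) == 1 else "X")
    return "".join(protein)


if __name__ == "__main__":
    # Same input as the transeq example above.
    print(translate_with_x("NNNTANTARTAGTAYTAC"))  # XX**YY
```

For the six codons in the transeq example this gives the same string, XX**YY. Raising on invalid letters is a design choice of the sketch, made so that bad input is not confused with a stop codon.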
See also Bug 2530 on the related issue that Bio.Seq.translate() currently
translates invalid codons as "*" (presumably an accidental side effect of the
implementation).