[Biopython] unusual genetic code

Tue Sep 14 14:58:05 UTC 2010

On Tue, Sep 14, 2010 at 2:44 PM, Jessica Grant <jgrant at smith.edu> wrote:
> Hi Peter,
>
> Here is the codon table, in the format I found in CodonTable.py.
>
> I will look at the links you sent, but I don't know if I will be able to
> follow it all.  Thanks,
>
> Jessica
>
>                    table = {
>     'TTT': 'F', 'TTC': 'F', 'TTA': 'L', 'TTG': 'L', 'TCT': 'S',
>     'TCC': 'S', 'TCA': 'S', 'TCG': 'S', 'TAT': 'Y', 'TAC': 'Y',
>     'TGT': 'C', 'TGC': 'C', 'TGG': 'W', 'CTT': 'L', 'CTC': 'L',
>     'CTA': 'L', 'CTG': 'L', 'CCT': 'P', 'CCC': 'P', 'CCA': 'P',
>     'CCG': 'P', 'CAT': 'H', 'CAC': 'H', 'CAA': 'Q', 'CAG': 'Q',
>     'CGT': 'R', 'CGC': 'R', 'CGA': 'R', 'CGG': 'R', 'ATT': 'I',
>     'ATC': 'I', 'ATA': 'I', 'ATG': 'M', 'ACT': 'T', 'ACC': 'T',
>     'ACA': 'T', 'ACG': 'T', 'AAT': 'N', 'AAC': 'N', 'AAA': 'K',
>     'AAG': 'K', 'AGT': 'S', 'AGC': 'S', 'AGA': 'R', 'AGG': 'R',
>     'GTT': 'V', 'GTC': 'V', 'GTA': 'V', 'GTG': 'V', 'GCT': 'A',
>     'GCC': 'A', 'GCA': 'A', 'GCG': 'A', 'GAT': 'D', 'GAC': 'D',
>     'GAA': 'E', 'GAG': 'E', 'GGT': 'G', 'GGC': 'G', 'GGA': 'G',
>     'GGG': 'G', 'TAG': 'Q', 'TGA': 'W',},
>                    stop_codons = ['TAA' ],
>                    start_codons = [ 'ATG']
>                    )

OK, don't worry about the git branch stuff - I've just merged
this to the main repository. Are you happy with installing
Biopython from source? If so grab the latest source code
as described here:

http://www.biopython.org/wiki/SourceCode

Alternatively all you need to update is the Bio/Seq.py file
to the latest version:

http://github.com/biopython/biopython/raw/master/Bio/Seq.py

To use the new functionality, first you need to create a
CodonData object with your special table, and assuming
you are just working with unambiguous DNA that means:

from Bio.Data.CodonTable import CodonTable
c_uncinata_table = CodonTable(forward_table={
    'TTT': 'F', 'TTC': 'F', 'TTA': 'L', 'TTG': 'L',
    'TCT': 'S', 'TCC': 'S', 'TCA': 'S', 'TCG': 'S',
    'TAT': 'Y', 'TAC': 'Y',             'TAG': 'Q',
    'TGT': 'C', 'TGC': 'C', 'TGA': 'W', 'TGG': 'W',
    'CTT': 'L', 'CTC': 'L', 'CTA': 'L', 'CTG': 'L',
    'CCT': 'P', 'CCC': 'P', 'CCA': 'P', 'CCG': 'P',
    'CAT': 'H', 'CAC': 'H', 'CAA': 'Q', 'CAG': 'Q',
    'CGT': 'R', 'CGC': 'R', 'CGA': 'R', 'CGG': 'R',
    'ATT': 'I', 'ATC': 'I', 'ATA': 'I', 'ATG': 'M',
    'ACT': 'T', 'ACC': 'T', 'ACA': 'T', 'ACG': 'T',
    'AAT': 'N', 'AAC': 'N', 'AAA': 'K', 'AAG': 'K',
    'AGT': 'S', 'AGC': 'S', 'AGA': 'R', 'AGG': 'R',
    'GTT': 'V', 'GTC': 'V', 'GTA': 'V', 'GTG': 'V',
    'GCT': 'A', 'GCC': 'A', 'GCA': 'A', 'GCG': 'A',
    'GAT': 'D', 'GAC': 'D', 'GAA': 'E', 'GAG': 'E',
    'GGT': 'G', 'GGC': 'G', 'GGA': 'G', 'GGG': 'G'},
    start_codons = [ 'ATG'],
    stop_codons = ['TAA' ])

Note that order of the forward table dictionary entries
does not actually matter, however, I have moved the
TAG and TGA entries from the end to keep the whole
table in a standard order - I found this easier to check.

If you have the updated Bio.Seq module, then you
can do this:

>>> from Bio.Alphabet import generic_dna
>>> from Bio.Seq import Seq
>>> seq =  Seq("AAATAGTGATAA", generic_dna)
>>> print seq.translate()
K***
>>> print seq.translate(table=c_uncinata_table)
KQW*

Or using strings,

>>> from Bio.Seq import translate
>>> print translate("AAATAGTGATAA")
K***
>>> print translate("AAATAGTGATAA", table=c_uncinata_table)
KQW*

Does that make sense? Does it do what you expect?
Don't hesitate to ask for clarification.

Peter