[Biopython] unusual genetic code
Peter Cock
p.j.a.cock at googlemail.com
Tue Sep 14 10:58:05 EDT 2010
On Tue, Sep 14, 2010 at 2:44 PM, Jessica Grant <jgrant at smith.edu> wrote:
> Hi Peter,
>
> Here is the codon table, in the format I found in CodonTable.py.
>
> I will look at the links you sent, but I don't know if I will be able to
> follow it all. Thanks,
>
> Jessica
>
> table = {
> 'TTT': 'F', 'TTC': 'F', 'TTA': 'L', 'TTG': 'L', 'TCT': 'S',
> 'TCC': 'S', 'TCA': 'S', 'TCG': 'S', 'TAT': 'Y', 'TAC': 'Y',
> 'TGT': 'C', 'TGC': 'C', 'TGG': 'W', 'CTT': 'L', 'CTC': 'L',
> 'CTA': 'L', 'CTG': 'L', 'CCT': 'P', 'CCC': 'P', 'CCA': 'P',
> 'CCG': 'P', 'CAT': 'H', 'CAC': 'H', 'CAA': 'Q', 'CAG': 'Q',
> 'CGT': 'R', 'CGC': 'R', 'CGA': 'R', 'CGG': 'R', 'ATT': 'I',
> 'ATC': 'I', 'ATA': 'I', 'ATG': 'M', 'ACT': 'T', 'ACC': 'T',
> 'ACA': 'T', 'ACG': 'T', 'AAT': 'N', 'AAC': 'N', 'AAA': 'K',
> 'AAG': 'K', 'AGT': 'S', 'AGC': 'S', 'AGA': 'R', 'AGG': 'R',
> 'GTT': 'V', 'GTC': 'V', 'GTA': 'V', 'GTG': 'V', 'GCT': 'A',
> 'GCC': 'A', 'GCA': 'A', 'GCG': 'A', 'GAT': 'D', 'GAC': 'D',
> 'GAA': 'E', 'GAG': 'E', 'GGT': 'G', 'GGC': 'G', 'GGA': 'G',
> 'GGG': 'G', 'TAG': 'Q', 'TGA': 'W',},
> stop_codons = ['TAA' ],
> start_codons = [ 'ATG']
> )
OK, don't worry about the git branch stuff - I've just merged
this to the main repository. Are you happy with installing
Biopython from source? If so grab the latest source code
as described here:
http://www.biopython.org/wiki/SourceCode
Alternatively all you need to update is the Bio/Seq.py file
to the latest version:
http://github.com/biopython/biopython/raw/master/Bio/Seq.py
To use the new functionality, first you need to create a
CodonData object with your special table, and assuming
you are just working with unambiguous DNA that means:
from Bio.Data.CodonTable import CodonTable
c_uncinata_table = CodonTable(forward_table={
'TTT': 'F', 'TTC': 'F', 'TTA': 'L', 'TTG': 'L',
'TCT': 'S', 'TCC': 'S', 'TCA': 'S', 'TCG': 'S',
'TAT': 'Y', 'TAC': 'Y', 'TAG': 'Q',
'TGT': 'C', 'TGC': 'C', 'TGA': 'W', 'TGG': 'W',
'CTT': 'L', 'CTC': 'L', 'CTA': 'L', 'CTG': 'L',
'CCT': 'P', 'CCC': 'P', 'CCA': 'P', 'CCG': 'P',
'CAT': 'H', 'CAC': 'H', 'CAA': 'Q', 'CAG': 'Q',
'CGT': 'R', 'CGC': 'R', 'CGA': 'R', 'CGG': 'R',
'ATT': 'I', 'ATC': 'I', 'ATA': 'I', 'ATG': 'M',
'ACT': 'T', 'ACC': 'T', 'ACA': 'T', 'ACG': 'T',
'AAT': 'N', 'AAC': 'N', 'AAA': 'K', 'AAG': 'K',
'AGT': 'S', 'AGC': 'S', 'AGA': 'R', 'AGG': 'R',
'GTT': 'V', 'GTC': 'V', 'GTA': 'V', 'GTG': 'V',
'GCT': 'A', 'GCC': 'A', 'GCA': 'A', 'GCG': 'A',
'GAT': 'D', 'GAC': 'D', 'GAA': 'E', 'GAG': 'E',
'GGT': 'G', 'GGC': 'G', 'GGA': 'G', 'GGG': 'G'},
start_codons = [ 'ATG'],
stop_codons = ['TAA' ])
Note that order of the forward table dictionary entries
does not actually matter, however, I have moved the
TAG and TGA entries from the end to keep the whole
table in a standard order - I found this easier to check.
If you have the updated Bio.Seq module, then you
can do this:
>>> from Bio.Alphabet import generic_dna
>>> from Bio.Seq import Seq
>>> seq = Seq("AAATAGTGATAA", generic_dna)
>>> print seq.translate()
K***
>>> print seq.translate(table=c_uncinata_table)
KQW*
Or using strings,
>>> from Bio.Seq import translate
>>> print translate("AAATAGTGATAA")
K***
>>> print translate("AAATAGTGATAA", table=c_uncinata_table)
KQW*
Does that make sense? Does it do what you expect?
Don't hesitate to ask for clarification.
Peter
More information about the Biopython
mailing list