[Biopython-dev] [Bug 2381] translate and transcribe methods for the Seq object (in Bio.Seq)

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Sat Oct 11 12:37:49 UTC 2008


http://bugzilla.open-bio.org/show_bug.cgi?id=2381





------- Comment #17 from biopython-bugzilla at maubp.freeserve.co.uk  2008-10-11 08:37 EST -------
For the sake of discussion, here is a simple (i.e. minimal) translate method
for the Seq object (any checked in code should also simplify the current Seq
module's translate function to call this for Seq objects).

def translate(self, table = "Standard", stop_symbol = "*"):
    """Turns a nucleotide sequence into a protein sequence (amino acids).

    This method will translate DNA or RNA sequences, but for a protein
    sequence an exception is raised.

    table - Which codon table to use?  This can be either a name
            (string) or an NCBI identifier (integer).
    stop_symbol - String used to represent stop codons in the
            output (defaults to "*").

    NOTE - Ambiguous codons like "TAN" or "NNN" could be an amino acid
    or a stop codon.  These are translated as "X".  Any invalid codon
    (e.g. "TA?" or "T-A") will throw a TranslationError.

    NOTE - Does NOT support gapped sequences.

    NOTE - This does NOT behave like the python string's translate
    method.  For that use str(my_seq).translate(...) instead.
    """
    try:
        table_id = int(table)
    except ValueError:
        table_id = None
    if isinstance(self.alphabet, Alphabet.ProteinAlphabet) :
        raise ValueError("Proteins cannot be translated!")
    if self.alphabet==IUPAC.unambiguous_dna:
        if table_id is None:
            codon_table = CodonTable.unambiguous_dna_by_name[table]
        else:
            codon_table = CodonTable.unambiguous_dna_by_id[table_id]
    elif self.alphabet==IUPAC.ambiguous_dna:
        if table_id is None:
            codon_table = CodonTable.ambiguous_dna_by_name[table]
        else:
            codon_table = CodonTable.ambiguous_dna_by_id[table_id]
    elif self.alphabet==IUPAC.unambiguous_rna:
        if table_id is None:
            codon_table = CodonTable.unambiguous_rna_by_name[table]
        else:
            codon_table = CodonTable.unambiguous_rna_by_id[table_id]
    elif self.alphabet==IUPAC.ambiguous_rna:
        if table_id is None:
            codon_table = CodonTable.ambiguous_rna_by_name[table]
        else:
            codon_table = CodonTable.ambiguous_rna_by_id[table_id]
    else:
        if table_id is None:
            codon_table = CodonTable.ambiguous_generic_by_name[table]
        else:
            codon_table = CodonTable.ambiguous_generic_by_id[table_id]
    protein = _translate_str(str(self), codon_table, stop_symbol)
    if stop_symbol in protein :
        alphabet = Alphabet.HasStopCodon(codon_table.protein_alphabet,
                                         stop_symbol = stop_symbol)
    else :
        alphabet = codon_table.protein_alphabet
    return Seq(protein, alphabet)

Unlike in my earlier comment 11, I'm now leaning towards a single translation method
(perhaps with extra arguments).  You'll notice here I am suggesting using the
method name "translate" even though this clashes with the python string method
of the same name.  This could cause confusion if the Seq object is passed to
non-Biopython code which expects a string, but overall seems much simpler for
end users.  Other method names could be:

* translate_ (trailing underscore, see PEP8) which I think is ugly.
* translation (noun rather than verb), differs from established style.
* bio_translate, which I think is too long.
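
To illustrate the clash, here is a small sketch of the kind of generic string
code that could be handed a Seq object by mistake.  The helper name
complement_text is made up for this example, and it uses the current python 3
spelling (str.maketrans) rather than the string module functions.  The point
is only that the string method maps characters one for one, which has nothing
to do with codon tables.

```python
# Generic string code: str.translate maps single characters via a mapping
# table.  This is the behaviour non-Biopython code would expect.
COMPLEMENT_MAP = str.maketrans("ACGT", "TGCA")


def complement_text(text):
    """Hypothetical helper: complement each letter using str.translate."""
    # Calling str() first is what the docstring above recommends when the
    # string behaviour is wanted for a Seq-like object.
    return str(text).translate(COMPLEMENT_MAP)


print(complement_text("ATGC"))
```

If such code called text.translate(COMPLEMENT_MAP) directly on a Seq object
with the proposed method, the mapping would be taken as the codon table
argument, which is the confusion described above.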

I'm thinking we could also support "start" and "end" optional arguments (named
after those used in the python string methods, and behaving in the same way)
for specifying a sub-sequence to be translated.  Using start=0, 1 or 2 would
give the three forward reading frames.
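
To make the reading frame idea concrete, here is a rough standalone sketch
(not the proposed Seq method itself).  The function name translate_slice, the
hard coded standard table, and the choice to silently drop a trailing partial
codon are all assumptions made for illustration.  The start and end values
are applied exactly like a python string slice.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_slice(nucleotides, start=0, end=None, stop_symbol="*"):
    """Hypothetical helper: translate nucleotides[start:end] codon by codon.

    Assumption: a trailing partial codon is dropped rather than reported.
    Unambiguous letters only; anything else raises KeyError.
    """
    sub = nucleotides.upper().replace("U", "T")[start:end]
    usable = len(sub) - len(sub) % 3
    letters = []
    for i in range(0, usable, 3):
        amino = STANDARD_TABLE[sub[i:i + 3]]
        letters.append(stop_symbol if amino == "*" else amino)
    return "".join(letters)


seq = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
for frame in (0, 1, 2):
    print(frame, translate_slice(seq, start=frame))
```

Negative values and end=None fall out of the slice for free, which is the
main attraction of copying the string method conventions.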

An optional boolean argument could enable treating the sequence as a CDS -
verifying it starts with a start codon (which would always be translated as M)
and verifying it ends with a stop codon (with no other stop codons in frame),
which would not be translated.  Following BioPerl, this argument could be
called "complete".
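
As a sketch of what such a check might do (again standalone, with made up
names), the function below assumes the standard table, takes the start codons
to be those of NCBI table 1 (TTG, CTG and ATG), and additionally assumes the
sequence length must be a multiple of three.  None of this is settled
behaviour, it is just one way the checks described above could fit together.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
FORWARD_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
START_CODONS = ("TTG", "CTG", "ATG")  # start codons of NCBI table 1


def translate_cds(nucleotides):
    """Hypothetical helper: translate a complete CDS, checking it first."""
    seq = nucleotides.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if not codons or codons[0] not in START_CODONS:
        raise ValueError("CDS does not begin with a start codon")
    if FORWARD_TABLE[codons[-1]] != "*":
        raise ValueError("CDS does not end with a stop codon")
    body = "".join(FORWARD_TABLE[codon] for codon in codons[1:-1])
    if "*" in body:
        raise ValueError("Extra in-frame stop codon found")
    # The start codon is always reported as M; the final stop is omitted.
    return "M" + body


print(translate_cds("TTGGCCATTTAA"))
```

Note the first codon here is TTG, which would normally be L, but as a start
codon it comes out as M.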




