[BioPython] codons and complements

Andrew Dalke dalke@bioreason.com
Wed, 13 Oct 1999 11:34:16 -0600


Iddo Friedberg <idoerg@cc.huji.ac.il>:
> So: maybe just the simple "translate" (read past stop codons, use
> standard-or-passed-code-table, do _no_ check for modulu 3) should be a
> method.

I still have a problem with the "read past stop codons" since it
makes things less `alphabet safe', but as that seems to be the
desired interface, I'll move to have two translate functions, one
for each behaviour.  (Two functions rather than a boolean flag.)

Iddo:
> My proposal: translate is a method, special translates are user-written
> functions.

Jeff replied:
> Thus, having a translate method on a sequence object implies that
> the genetic code should be carried around as well.

My proposal does have that the genetic code is carried around as
well, but only if it is known.  If it isn't known, and you want
to translate and you don't give a code, then it uses the universal
code.

I think I understand the difference.  If translate() is a method
then you have the expectation that the sequence is translateable
and that the results will be appropriate.  If translate() is a
function (which takes a sequence) then the assumption of
appropriateness is weaker; it doesn't have to give the correct
result, meaning, the biologically appropriate output, it only
needs to give the computationally defined result.

So pushing the function from a method to a function makes it
easier to understand that it won't always give the right answer.
(And we've all decided that it cannot always give the right
answer!)

If that's what you are saying, then I think that's an interesting
viewpoint.  If it isn't, then, I still think it's an interesting
viewpoint :)

Anyway, I'm for having the translate function(s) be stand-alone/
not a method.

Oh, and as a case example, consider having one of these pieces
of DNA where UGA creates SEC in one point and a stop codon in
another.  If there is a translate() method, the special information
must be passed around when creating, eg, subsequences.

dna = get_selenocysteine_coding_dna()
sub_dna = dna[sec_start - 6: sec_start + 6]
protein = sub_dna.translate()

Then str(protein[2]) should be "X" (for IUPAC, or "U" for one of
the alphabets Jeff pointed out yesterday.)

It is entirely possible to pass this information around so that
the different subsequence translates will be correct.  I'll just
point out that it places a lot of onus on the implementor to be
very careful about the implementation.

So Jeff's statement, which is "if I don't care about translations
then I don't want to worry about them", is quite relevant.

BTW, there's another possibility for the standard "translate" function,
which is to check if a special method exists on the sequence, and
if so, use it.

def translate(seq, codon_table = None):
  func = getattr(seq, "_translate", None)
  if func is not None:
    return seq._translate(codon_table)
  if codon_table is None:
    codon_table = getattr(seq, "codon_table", UniversalCode)
  ...

This is similar to how Python implements "len" and "str" .

Hence, if it is needed that there be a single function which can
handle the different cases needed for translate, then there is a
more generic solution which is backward compatible with the function
solution.

But doing that now is, in my opinion, too general.  We need to
start writing some code that implements these to see their advantages
and pitfalls.


						Andrew Dalke
						dalke@acm.org