[BioPython] Re: back-translation method for Seq object?

Peter biopython at maubp.freeserve.co.uk
Wed Oct 22 09:17:23 UTC 2008


On Wed, Oct 22, 2008 at 9:31 AM, Leighton Pritchard <lpritc at scri.ac.uk> wrote:
> On 21/10/2008 21:36, "Bruce Southey" <bsouthey at gmail.com> wrote:
>
>> For completeness as these are not 100% correct,
>> Leu/L =(TTA|TTG|CTT|CTC|CTA|CTG) = (TTN|CTR) = YTN
>> Arg/R =(CGT|CGC|CGA|CGG|AGA|AGG) =(CGV | AGR) = MGV
>> Ser/S =(TCT|TCC|TCA|TCG|AGT|AGC) =(TCN|AGY) = WSN

I was going to jump up and down and disagree with you here, Bruce, but
Leighton has already made the same point: (CGV | AGR) != MGV, etc.

It is true that a single ambiguous codon can cover all the possible
Arg codons (MGN does; MGV as written misses CGT), but it also matches
more than that (AGT and AGC encode Ser). While this could be useful
for certain back-translation purposes, it breaks the expectation that
translate(back_translate(sequence)) == sequence [currently the
behaviour available in Bio.Translate].
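A rough sketch of why a single ambiguous codon breaks the round trip, assuming only the standard genetic code (NCBI table 1) written out in plain Python rather than Bio.Translate. `CODON_TABLE`, `IUPAC`, `expand` and `translations` are illustrative names, not Biopython API. Expanding an ambiguous codon and translating every concrete codon it matches shows that MGV also yields Ser (AGC) and misses the Arg codon CGT, while MGN covers every Arg codon at the cost of two Ser codons.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated with bases in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(ambiguous_codon):
    """Return every unambiguous codon matched by an IUPAC codon such as 'MGV'."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in ambiguous_codon))]


def translations(ambiguous_codon):
    """Map each concrete codon matched by ambiguous_codon to its amino acid."""
    return {codon: CODON_TABLE[codon] for codon in expand(ambiguous_codon)}


arg_codons = {codon for codon, aa in CODON_TABLE.items() if aa == "R"}

print(sorted(set(translations("MGV").values())))      # ['R', 'S']  (AGC is Ser)
print(sorted(arg_codons - set(translations("MGV"))))  # ['CGT']  (missed Arg codon)
print(arg_codons <= set(translations("MGN")))         # True, but MGN also matches AGT/AGC
print(sorted(set(translations("YTN").values())))      # ['F', 'L']  (TTT/TTC are Phe)
```

The same check applied to Bruce's Leu and Ser examples shows the collapsed codons always over-cover: translating them back gives a set of amino acids, not the single residue you started from.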

>>> If we want to provide a simple string or Seq object, we can either
>>> pick an arbitrary codon in each case (as in the first attachment on
>>> Bug 2618), or perhaps represent some of the possible codons using
>>> ambiguous nucleotides.
>>> ...
>>> It was something like this that I envisioned as a candidate for a Seq
>>> method (based on the behaviour of the existing Bio.Translate
>>> functionality), but only if such a simple back_translate
>>> method/function had any real uses.  And thus far, I haven't seen any.
>>>
>> For you perhaps but my reasons are very real to me!

I was saying I don't see the need for a *simple* back_translate
function (giving a Seq object or a string), and that such a simple
function didn't seem to help with your examples.
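For concreteness, here is roughly what such a *simple* back_translate looks like: a minimal sketch, assuming the standard genetic code and an arbitrary rule of taking the first codon (in TCAG order) for each amino acid. `back_translate`, `translate` and `FIRST_CODON` are hypothetical names for illustration, not the Bio.Translate API. The round trip holds, but the DNA it produces is just one of many possible encodings.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated with bases in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Arbitrary choice: keep the first codon seen for each amino acid (dicts preserve order).
FIRST_CODON = {}
for codon, aa in CODON_TABLE.items():
    FIRST_CODON.setdefault(aa, codon)


def translate(dna):
    """Translate complete codons of an unambiguous DNA string."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def back_translate(protein):
    """Hypothetical simple back-translation: one fixed codon per residue."""
    return "".join(FIRST_CODON[aa] for aa in protein)


dna = back_translate("MLRS*")
print(dna)             # ATGTTACGTTCTTAA
print(translate(dna))  # MLRS*
```

Any other codon choice (e.g. a host's most frequent codons) would satisfy the round trip equally well, which is why a single string answer carries so little information.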

I'm not denying that a complex back-translation operation has real
utility, although I suspect there are multiple different solutions,
none of which will suit every problem, and that makes adding this to
the core Seq object hard to justify. Perhaps a function in
Bio.SeqUtils to create a nucleotide regex describing the possible back
translations of a protein sequence would suffice?
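A minimal sketch of the regex idea, assuming the standard genetic code and unambiguous protein input. `back_translation_regex` is a hypothetical helper, not an existing Bio.SeqUtils function. Each residue becomes an alternation of its codons, so the pattern matches exactly the DNA sequences that translate to the protein, with no over-coverage.

```python
import re
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated with bases in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Group codons by the amino acid (or stop, '*') they encode.
CODONS_FOR = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS_FOR.setdefault(aa, []).append("".join(codon))


def back_translation_regex(protein):
    """Hypothetical helper: regex matching every DNA sequence encoding `protein`."""
    parts = []
    for aa in protein.upper():
        codons = CODONS_FOR[aa]  # KeyError for ambiguous letters such as X
        parts.append(codons[0] if len(codons) == 1 else "(?:" + "|".join(codons) + ")")
    return "".join(parts)


pattern = back_translation_regex("MR")
print(pattern)  # ATG(?:CGT|CGC|CGA|CGG|AGA|AGG)

match = re.search(pattern, "CCATGAGGTT")
print(match.start(), match.group())  # 2 ATGAGG
```

Note that `re.search` on a longer sequence will report hits in any reading frame; restricting to one frame (or searching the reverse complement too) is exactly the kind of problem-specific choice that argues against a one-size-fits-all Seq method.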

If one of your real-world examples can be solved with a back_translate
that returns a simple string or Seq object, could you explain how?

Peter


