[BioPython] Trying to transcribe RNA = error?

Peter biopython at maubp.freeserve.co.uk
Wed Jul 16 14:37:43 UTC 2008


I asked Martin:
>> You would really prefer a single function call to do a combined
>> "transcribe reverse-complement" or "transcribe the other strand" over
>> doing the reverse complement and transcription in two steps?

Martin wrote:
> Yes, a single function handling arguments would be nice.
> ...
> This one handling optional argument strand with default value -1,
> but accepting [-1, 1, +1, 'watson', 'crick'].

No.  The default would have to be the coding strand (aka strand +1, or
the Crick strand).  I know that biologically transcription works on the
template strand (aka strand -1, or the Watson strand), but that is not
the convention in computational biology (at least in Biopython).
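
To make the point about the default concrete, here is a rough sketch of
what a combined call with a strand argument might look like.  The name
transcribe_strand and its strand argument are made up for illustration
and are not part of Bio.Seq; it is written in plain Python (upper-case
unambiguous DNA only) so that it stands alone.

```python
# Hypothetical helper, NOT part of Biopython: a strand argument whose
# default (+1) means "the input is the coding strand".
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def transcribe_strand(dna, strand=1):
    """Return the mRNA for an unambiguous upper-case DNA string."""
    if strand == 1:
        coding = dna
    elif strand == -1:
        # Input is the template strand: recover the coding strand first.
        coding = dna.translate(_COMPLEMENT)[::-1]
    else:
        raise ValueError("strand must be +1 or -1")
    # The mRNA has the same sequence as the coding strand, with U for T.
    return coding.replace("T", "U")

assert transcribe_strand("ATGGGAAAG") == "AUGGGAAAG"
assert transcribe_strand("CTTTCCCAT", strand=-1) == "AUGGGAAAG"
```

With the default of +1 this behaves like a plain transcribe() on the
coding strand; strand=-1 is just the reverse complement followed by
the same T to U substitution.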

In case we are talking at cross-purposes, consider this
double-stranded bit of DNA:

5' ATGGGAAAGGTAGGGCAAGCTTCCAGTTAG 3'
3' TACCCTTTCCATCCCGTTCGAAGGTCAATC 5'

and the associated mRNA:
5' AUGGGAAAGGUAGGGCAAGCUUCCAGUUAG 3'

This DNA is made up of two strands, which by convention are always read
from the 5' end to the 3' end, i.e.

DNA coding strand (aka Crick strand, strand +1):
5' ATGGGAAAGGTAGGGCAAGCTTCCAGTTAG 3'

DNA template strand (aka Watson strand, strand -1):
5' CTAACTGGAAGCTTGCCCTACCTTTCCCAT 3'
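
The relationship between the lower strand as drawn (3' to 5') and the
template strand as written here (5' to 3') can be checked by hand with
a few lines of plain Python.  This is only a sketch; the variable names
are illustrative and nothing here comes from Biopython.

```python
# Plain-Python check of the two strands (names are illustrative only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

top_5_to_3 = "ATGGGAAAGGTAGGGCAAGCTTCCAGTTAG"

# Base-by-base complement: the lower strand as drawn, i.e. read 3' to 5'.
bottom_3_to_5 = top_5_to_3.translate(COMPLEMENT)
assert bottom_3_to_5 == "TACCCTTTCCATCCCGTTCGAAGGTCAATC"

# Reversing it gives the same strand read 5' to 3' (the template strand).
bottom_5_to_3 = bottom_3_to_5[::-1]
assert bottom_5_to_3 == "CTAACTGGAAGCTTGCCCTACCTTTCCCAT"
```

In other words, "reverse complement" is the complement of each base
followed by reading the result in the opposite direction.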

If this region were transcribed into mRNA and translated, it would
give the protein:

MGKVGQASS (and stop)

In bioinformatics in general, people tend to work with the coding
strand rather than the template strand.  In Biopython, using strings,
you could represent all this as follows:

from Bio.Seq import reverse_complement, translate, transcribe, back_transcribe

coding_strand_dna = "ATGGGAAAGGTAGGGCAAGCTTCCAGTTAG"
template_strand_dna = reverse_complement(coding_strand_dna)
assert template_strand_dna == "CTAACTGGAAGCTTGCCCTACCTTTCCCAT"
assert coding_strand_dna == reverse_complement(template_strand_dna)

# Note: transcribe() goes from the CODING strand to the mRNA
messenger_rna = transcribe(coding_strand_dna)
assert coding_strand_dna == back_transcribe(messenger_rna)

protein = translate(messenger_rna)
# Note: you can also use translate() directly on the CODING strand:
assert protein == translate(coding_strand_dna)
assert protein == translate(reverse_complement(template_strand_dna))
assert protein == "MGKVGQASS*"

If that makes things clearer, maybe I should put something like this
in the Biopython tutorial... and/or the Bio.Seq docstring
documentation?  What do you think?

Peter


