[BioPython] Trying to transcribe RNA = error?
Martin MOKREJŠ
mmokrejs at ribosome.natur.cuni.cz
Wed Jul 16 21:50:37 UTC 2008
Hi Peter,
Thank you for the nice explanation. Pasting the text and code from
your answer into the Biopython Wiki or Tutorial would definitely be
sufficient, and it should be there.
Regarding transcribing the coding strand... you have probably
convinced me. ;-) I agree that one usually has a sequence in GenBank
format with annotated introns and exons, and wants to concatenate the
exons and translate the result. From this perspective it does not
make sense to reverse-complement the sequence unnecessarily.
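The GenBank use case above (splice the annotated exons of a plus-strand gene, then translate, with no reverse complement anywhere) can be sketched in plain Python. This is a minimal illustration, not Biopython code: the `splice()` and `translate()` helpers, the invented intron and the exon coordinates (assumed 0-based, half-open, on the plus strand) are all hypothetical. In Biopython itself the last step would be `Bio.Seq.translate`.

```python
# Standard genetic code (NCBI table 1), built from the usual TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(coding_dna):
    """Translate a coding-strand DNA string codon by codon ('*' = stop)."""
    usable = len(coding_dna) - len(coding_dna) % 3  # ignore a partial codon
    return "".join(
        CODON_TABLE[coding_dna[pos:pos + 3]] for pos in range(0, usable, 3)
    )


def splice(genomic, exons):
    """Concatenate exons given as 0-based, half-open (start, end) pairs."""
    return "".join(genomic[start:end] for start, end in exons)


# Hypothetical plus-strand gene: exon 1 + invented intron + exon 2.
genomic = "ATGGGAAAGGTAGGG" + "GTAAGTTTTTCAG" + "CAAGCTTCCAGTTAG"
exons = [(0, 15), (28, 43)]  # hypothetical annotation

cds = splice(genomic, exons)
print(cds)             # ATGGGAAAGGTAGGGCAAGCTTCCAGTTAG
print(translate(cds))  # MGKVGQASS*
```

Because the gene lies on the plus strand, the spliced exons already form the coding strand, so they can be translated directly.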
Martin
Peter wrote:
> I asked Martin:
>>> You would really prefer a single function call to do a combined
>>> "transcribe reverse-complement" or "transcribe the other strand" over
>>> doing the reverse complement and transcription in two steps?
>
> Martin wrote:
>> Yes, a single function handling arguments would be nice.
>> ...
>> This one, handling an optional strand argument with a default value of -1,
>> but accepting [-1, 1, +1, 'watson', 'crick'].
>
> No. The default would have to be the coding strand (aka strand +1, or
> the Crick strand). I know that biologically, transcription works on the
> template strand (aka strand -1, or the Watson strand), but that is not
> the convention in computational biology (at least not in Biopython).
>
> In case we are talking at cross-purposes, consider this double-stranded
> bit of DNA,
>
> 5' ATGGGAAAGGTAGGGCAAGCTTCCAGTTAG 3'
> 3' TACCCTTTCCATCCCGTTCGAAGGTCAATC 5'
>
> and associated mRNA,
> 5' AUGGGAAAGGUAGGGCAAGCUUCCAGUUAG 3'
>
> This DNA is made up of two strands, which by convention are always read
> from the 5' end to the 3' end, i.e.
>
> DNA coding strand (aka Crick strand, strand +1):
> 5' ATGGGAAAGGTAGGGCAAGCTTCCAGTTAG 3'
>
> DNA template strand (aka Watson strand, strand -1):
> 5' CTAACTGGAAGCTTGCCCTACCTTTCCCAT 3'
>
> If this region were transcribed into mRNA and translated, it would give:
>
> protein,
> MGKVGQASS (and stop)
>
> In bioinformatics in general, people tend to work with the coding
> strand rather than the template strand. In Biopython, using strings,
> you could represent all this as follows:
>
> from Bio.Seq import reverse_complement, translate, transcribe, back_transcribe
>
> coding_strand_dna = "ATGGGAAAGGTAGGGCAAGCTTCCAGTTAG"
> template_strand_dna = reverse_complement(coding_strand_dna)
> assert template_strand_dna == "CTAACTGGAAGCTTGCCCTACCTTTCCCAT"
> assert coding_strand_dna == reverse_complement(template_strand_dna)
>
> # Note: transcribe() goes from the CODING strand to the mRNA
> messenger_rna = transcribe(coding_strand_dna)
> assert coding_strand_dna == back_transcribe(messenger_rna)
>
> protein = translate(messenger_rna)
> # Note: you can also use translate() directly on the CODING strand:
> assert protein == translate(coding_strand_dna)
> assert protein == translate(reverse_complement(template_strand_dna))
> assert protein == "MGKVGQASS*"
>
> If that makes things clearer, maybe I should put something like this
> in the Biopython tutorial... and/or the Bio.Seq docstring
> documentation? What do you think?
>
> Peter
>
>
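The single-call interface discussed in the quoted exchange could look something like the sketch below. `transcribe_strand()` is a hypothetical helper, not part of Bio.Seq. Following Peter's point, it defaults to the coding strand (+1), and it maps 'crick' to +1 and 'watson' to -1 only because that is the naming used in this thread (other sources assign Watson and Crick differently). The sequences are the ones from Peter's example.

```python
# Hypothetical single-call "transcribe either strand" helper (not Bio.Seq API).
# Strand names follow this thread's usage:
# +1 / "crick" = coding strand, -1 / "watson" = template strand.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
_STRAND_ALIASES = {1: 1, "crick": 1, -1: -1, "watson": -1}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (read 5' to 3')."""
    return dna.translate(_COMPLEMENT)[::-1]


def transcribe_strand(dna, strand=+1):
    """Return the mRNA encoded by `dna`, given 5' to 3'.

    strand=+1 (default): `dna` is the coding strand, so only T -> U is
    needed, which is what Bio.Seq's transcribe() assumes.
    strand=-1: `dna` is the template strand, so it is reverse-complemented first.
    """
    key = strand.lower() if isinstance(strand, str) else strand
    if key not in _STRAND_ALIASES:
        raise ValueError(f"Unrecognised strand: {strand!r}")
    coding = dna if _STRAND_ALIASES[key] == 1 else reverse_complement(dna)
    return coding.replace("T", "U").replace("t", "u")


# The sequences from the example in this thread.
coding = "ATGGGAAAGGTAGGGCAAGCTTCCAGTTAG"
template = "CTAACTGGAAGCTTGCCCTACCTTTCCCAT"

mrna = transcribe_strand(coding)  # default strand=+1
print(mrna)  # AUGGGAAAGGUAGGGCAAGCUUCCAGUUAG
print(transcribe_strand(template, strand="watson") == mrna)  # True
print(transcribe_strand(template, strand=-1) == mrna)  # True
```

Since `+1` and `1` are the same integer in Python, one dictionary entry covers both. The two-step form using Bio.Seq's own `reverse_complement()` and `transcribe()`, as in the quoted code, does the same job without a new function.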
--
Dr. Martin Mokrejs
Dept. of Genetics and Microbiology
Faculty of Science, Charles University
Vinicna 5, 128 43 Prague, Czech Republic
tel: +420-2-2195 1716
http://www.iresite.org
http://www.iresite.org/~mmokrejs