[BioPython] Trying to transcribe RNA = error?

Tue Jul 15 23:32:48 UTC 2008

Hi,

Peter wrote:
> On Tue, Jul 15, 2008 at 11:53 PM, Martin MOKREJŠ
> <mmokrejs at ribosome.natur.cuni.cz> wrote:
>> I agree, but it should require specifying whether the plus or minus
>> strand will be transcribed. Otherwise, swapping 'T's for 'U's is
>> enough. ;-) Still a somewhat artificial function.
> 
> By convention a DNA sequence is taken as the coding strand, so a
> simple T to U replacement is enough to transcribe it to RNA.  This is
> somewhat artificial (and doesn't really capture double strandedness),
> but does seem to be a widely used convention.

I can't say anything else except that this is an extremely awkward function.
I would always rather take it as a string and replace the 'T's with 'U's
myself. Then I know what is going on, do not have to read the manual,
and can be sure the behaviour will never change.
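
To make the do-it-yourself route concrete, here is a minimal sketch using
plain Python strings. It assumes the input is the coding strand, as in the
convention quoted above; the function names are only illustrative and this
is not Biopython code.

```python
def transcribe(dna: str) -> str:
    """Return the RNA for a coding-strand DNA string (T -> U)."""
    return dna.replace("T", "U").replace("t", "u")


def back_transcribe(rna: str) -> str:
    """Inverse operation: RNA back to coding-strand DNA (U -> T)."""
    return rna.replace("U", "T").replace("u", "t")


coding_dna = "ATGGCCATTGTAATGGGCCGCTGA"
mrna = transcribe(coding_dna)
print(mrna)                   # AUGGCCAUUGUAAUGGGCCGCUGA
print(back_transcribe(mrna))  # ATGGCCATTGTAATGGGCCGCTGA
```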
> 
> You could do my_seq.reverse_complement().transcribe() if you did want
> the other strand transcribed.

Exactly. To get rid of that chaining, I would see some advantage in having
a transcribe() that is more intuitive, but only if it could cope with the
reverse strand as well.
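
Roughly what I have in mind, again as a plain-string sketch. The name
transcribe_strand() and its strand argument are hypothetical (nothing like
this exists in Biopython as far as this thread goes), and only unambiguous
A/C/G/T letters are handled.

```python
# Translation table for complementing unambiguous DNA letters.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Complement every base, then reverse the string."""
    return dna.translate(_COMPLEMENT)[::-1]


def transcribe_strand(dna: str, strand: int = 1) -> str:
    """Transcribe the given strand of `dna`.

    strand=+1 treats `dna` itself as the coding strand;
    strand=-1 treats its reverse complement as the coding strand.
    """
    if strand not in (1, -1):
        raise ValueError("strand must be +1 or -1")
    coding = dna if strand == 1 else reverse_complement(dna)
    return coding.replace("T", "U").replace("t", "u")


print(transcribe_strand("ATGC"))      # AUGC
print(transcribe_strand("ATGC", -1))  # GCAU
```

With strand=-1 this gives the same result as reverse complementing first and
then transcribing, only without the caller having to remember the order.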

>> What would be really meaningful is if the object were a GenBank record
>> with introns and transcribe() would rip out all the introns or, even
>> better, allow me to choose which ones to splice out, e.g. to retain
>> exons 1, 5, 7.
> 
> I was thinking it would be nice to take a SeqRecord with SeqFeatures
> (i.e. a parsed GenBank file) and ask Biopython to give you the
> nucleotide sequence for the feature.  In the case of intron/exons
> these are held using the subfeatures, so it would make sense to obey
> those.  I see this as a variation on splicing/subsetting (in python
> speak) a sequence.

Personally I find this the only reason to justify the existence of such
a function: if it can abstract me from the underlying format and give
me a true, spliced transcript. Of course, I would prefer it to raise
an exception if there were more than one intron and I had not specified
which intron index positions to use.
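
A sketch of the behaviour I mean, under several assumptions: exons are given
as 0-based, half-open (start, end) pairs on the coding strand, the caller
selects exons to keep by index rather than introns to remove, and the
function name spliced_transcript() is made up for illustration.

```python
def spliced_transcript(dna, exons, keep=None):
    """Join the selected exons of `dna` and transcribe them (T -> U).

    exons: list of (start, end) pairs, 0-based and half-open.
    keep:  indices into `exons` to retain; must be given explicitly
           when the record has more than one intron.
    """
    n_introns = max(len(exons) - 1, 0)
    if keep is None:
        if n_introns > 1:
            raise ValueError(
                "%d introns present; specify which exons to keep" % n_introns
            )
        keep = range(len(exons))
    spliced = "".join(dna[exons[i][0]:exons[i][1]] for i in keep)
    return spliced.replace("T", "U")


# Toy gene: three exons separated by two 4-base introns.
dna = "ATGAAA" + "GTAG" + "CCCGGG" + "GTAG" + "TTTTGA"
exons = [(0, 6), (10, 16), (20, 26)]

print(spliced_transcript(dna, exons, keep=[0, 2]))  # AUGAAAUUUUGA
print(spliced_transcript(dna, exons[:2]))           # AUGAAACCCGGG
```

Calling spliced_transcript(dna, exons) without keep raises ValueError here,
because with two introns the choice is ambiguous.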

> 
> Maybe something like my_record[my_feature] or
> my_record.extract_feature(my_feature)
> Or my_record.extract_feature(i) which would refer to my_record.features[i]

I am not much used to the Biopython coding style, so I cannot comment on this.
Again, I would never use an unknown function that is prone to change unless
I could not code the simple transformation myself. If it cannot handle
splicing, I don't think I need it. But maybe I am just a desperate user.
;-)

M.