[Biopython-dev] Rethinking Seq objects

Fri Apr 29 01:15:28 EDT 2005

Michael Hoffman wrote:
> On Wed, 27 Apr 2005, Michiel Jan Laurens de Hoon wrote:
>> Another option would be to get rid of alphabets altogether. What good 
>> are they otherwise?
> 
> They're useful for transcription/translation/reverse complement
> operations. And as far as I'm concerned, that's a good place to do
> error checking, should it be necessary.
> 

For transcription and translation, we don't need to know the alphabet. 
Effectively, by calling translate or transcribe, the user is telling us that the 
input sequence object is DNA or RNA, and that the output sequence is RNA (for 
transcription) or protein (for translation). Of course, when a character other 
than ACGTU is encountered, we need to raise an error. But the point is that 
knowing the alphabet doesn't tell us anything we don't already know.
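
As a minimal sketch of this idea (the function below is hypothetical and not
existing Biopython code; restricting the input to ACGT is an assumption made
for illustration), transcription can validate its input without any alphabet
object:

```python
def transcribe(dna):
    """Return the RNA transcript of a DNA string (hypothetical helper)."""
    # By calling transcribe, the caller asserts that the input is DNA,
    # so no alphabet object is needed to decide how to interpret it.
    dna = dna.upper()
    unexpected = set(dna) - set("ACGT")  # assumption: unambiguous DNA only
    if unexpected:
        raise ValueError(
            "not a DNA sequence, found: " + "".join(sorted(unexpected))
        )
    return dna.replace("T", "U")
```

Here the only check is against the set of allowed letters, which is the input
checking that an alphabet would otherwise provide.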

For reverse complement, we also don't need to know the alphabet; it is either 
DNA or RNA. The only exception is when a user wants to reverse complement a 
sequence that contains neither a T nor a U, so that we cannot tell from the 
sequence itself which of the two it is. But the current situation, where we 
have IUPACProtein, ExtendedIUPACProtein, IUPACAmbiguousDNA, IUPACUnambiguousDNA, 
ExtendedIUPACDNA, IUPACAmbiguousRNA, IUPACUnambiguousRNA alphabets, is 
overkill. It would be much easier to have a reverse_complement and an 
rna_reverse_complement function (or something like that).
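
A rough sketch of such a pair of functions, assuming plain Python strings and
unambiguous bases only (the translation tables are illustrative, and
characters outside them are passed through unchanged rather than rejected):

```python
# Complement tables for unambiguous bases (assumption: no IUPAC ambiguity codes).
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(seq):
    """Reverse complement, treating the input as DNA."""
    return seq.upper().translate(_DNA_COMPLEMENT)[::-1]


def rna_reverse_complement(seq):
    """Reverse complement, treating the input as RNA."""
    return seq.upper().translate(_RNA_COMPLEMENT)[::-1]
```

For a sequence with no T or U, such as "AACG", the two functions give
different answers ("CGTT" versus "CGUU"), so the choice of function carries
the information that the alphabet would otherwise have to supply.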

So I still don't see any use for alphabets other than input checking. Or am I 
missing something here?

--Michiel.

-- 
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon