[Biopython-dev] Rethinking Seq objects
Michiel Jan Laurens de Hoon
mdehoon at ims.u-tokyo.ac.jp
Fri Apr 29 01:15:28 EDT 2005
Michael Hoffman wrote:
> On Wed, 27 Apr 2005, Michiel Jan Laurens de Hoon wrote:
>> Another option would be to get rid of alphabets altogether. What good
>> are they otherwise?
>
> They're useful for transcription/translation/reverse complement
> operations. And as far as I'm concerned, that's a good place to do
> error checking, should it be necessary.
>
For transcription and translation, we don't need to know the alphabet.
Effectively, by calling translate or transcribe, the user is telling us that the
input sequence object is DNA or RNA, and that the output sequence is RNA (for
transcription) or protein (for translation). Of course, when a character other
than ACGTU is encountered, we need to raise an error. But the point is that
knowing the Alphabet doesn't tell us anything we don't already know.
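To make this concrete, here is a minimal sketch in plain Python of transcription and translation without any alphabet object. The names transcribe and translate follow the wording above, but everything else is an assumption made for illustration and not existing Biopython code: the use of plain strings instead of Seq objects, the helper _check_letters, raising ValueError, and rejecting sequences whose length is not a multiple of three. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def _check_letters(sequence, allowed):
    """Hypothetical helper: raise ValueError on any letter not in allowed."""
    for position, letter in enumerate(sequence):
        if letter not in allowed:
            raise ValueError(
                "unexpected character %r at position %d" % (letter, position)
            )


def transcribe(dna):
    """Return the RNA string for a DNA string; the call itself implies DNA."""
    dna = dna.upper()
    _check_letters(dna, "ACGT")
    return dna.replace("T", "U")


def translate(nucleotides):
    """Return the protein string for a DNA or RNA string."""
    sequence = nucleotides.upper()
    _check_letters(sequence, "ACGTU")
    sequence = sequence.replace("U", "T")
    if len(sequence) % 3 != 0:
        # Assumption: a trailing partial codon is treated as an error.
        raise ValueError("sequence length is not a multiple of three")
    return "".join(
        CODON_TABLE[sequence[i:i + 3]] for i in range(0, len(sequence), 3)
    )
```

With these definitions, transcribe("ATGGCC") gives "AUGGCC" and translate("ATGGCCTAA") gives "MA*", while translate("ATGN") raises ValueError. Neither function needs to be told the alphabet; the input check is the only place where the set of allowed letters matters.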
For reverse complement, we also don't need to know the alphabet; the sequence is
either DNA or RNA. The only ambiguous case is when a user wants to reverse
complement a sequence that contains neither T nor U, so that the sequence itself
does not tell us whether it is DNA or RNA. But the current situation, where we
have IUPACProtein, ExtendedIUPACProtein, IUPACAmbiguousDNA, IUPACUnambiguousDNA,
ExtendedIUPACDNA, IUPACAmbiguousRNA, IUPACUnambiguousRNA alphabets, is
overkill. It would be much easier to have a reverse_complement and an
rna_reverse_complement function (or something like that).
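A rough sketch of such a pair of functions, again on plain Python strings. The two public names are the ones suggested above; the signatures, the helper _reverse_complement, the restriction to unambiguous letters, and the use of ValueError are assumptions for illustration only, not existing Biopython code.

```python
# Translation tables for the unambiguous letters only (an assumption of
# this sketch; ambiguity codes could be added to both tables the same way).
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def _reverse_complement(sequence, table, allowed):
    """Hypothetical helper shared by the two public functions."""
    sequence = sequence.upper()
    for position, letter in enumerate(sequence):
        if letter not in allowed:
            raise ValueError(
                "unexpected character %r at position %d" % (letter, position)
            )
    # Complement every letter, then reverse the result.
    return sequence.translate(table)[::-1]


def reverse_complement(dna):
    """Reverse complement, treating the input as DNA."""
    return _reverse_complement(dna, _DNA_COMPLEMENT, "ACGT")


def rna_reverse_complement(rna):
    """Reverse complement, treating the input as RNA."""
    return _reverse_complement(rna, _RNA_COMPLEMENT, "ACGU")
```

For a sequence such as "ACG", which contains neither T nor U, the choice of function is what resolves the ambiguity: reverse_complement("ACG") returns "CGT", while rna_reverse_complement("ACG") returns "CGU".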
So I still don't see any use for alphabets other than input checking. Or am I
missing something here?
--Michiel.
--
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon