[Biopython-dev] New Biopython release coming up / Alphabets

Tue Jul 11 16:01:15 UTC 2006

On Jul 6, 2006, at 12:39 PM, Michiel Jan Laurens de Hoon wrote:

> Michael Hoffman wrote:
>> [Peter]
>>> But to be honest, I have generally used plain strings in my own
>>> programs, and meddled with alphabets only when needed (e.g. for
>>> translating from DNA to protein sequences).
>
> Note that there is a function "translate" in Bio.Seq that translates
> DNA to protein using plain strings.
>>
>> I agree. In general, I think that the alphabet stuff adds unnecessary
>> complexity to perhaps 95 % of the sort of things I would do with
>> Biopython. But as it stands I usually use strs myself instead.
>
> It appears that most people (myself included) use plain strings instead
> of Seq objects (= string + Alphabet). We should check on the biopython
> mailing list if anybody really needs alphabets, and if not get rid of
> them (after the upcoming Brooklyn-release (1.42) though).
>
> --Michiel.

I would argue strongly against removing the alphabets. You would lose
all of the cool features of Seq objects (complement,
reverse_complement). There are similar functions under Bio.SeqUtils,
but those are marked "Deprecated". From just looking around, I think
removing the alphabets would break many things.
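
To illustrate what is at stake, here is a rough sketch in plain,
present-day Python (not Biopython code; the names reverse_complement_dna
and _DNA_COMPLEMENT are made up for this example). A reverse complement
on a plain string has to assume the sequence is unambiguous DNA, which
is exactly the information an alphabet would otherwise carry:

```python
# Hypothetical helper, not part of Biopython.
# It silently assumes unambiguous DNA; nothing stops a caller from
# passing in a protein string and getting nonsense back.
_DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement_dna(seq):
    """Return the reverse complement of seq, treating it as DNA."""
    return seq.translate(_DNA_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement_dna("ATGC"))
```

Letters outside the mapping table (ambiguity codes, amino acids) pass
through unchanged, so the function cannot tell when it is being misused.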

Having said that, I do find them a pain to deal with, but that might
have more to do with the structure/layout of the classes. My simple
suggestion is to fix/change the base Alphabet classes in
Bio.Alphabet.__init__. I am trying to think of a way that we can have
a "true" GenericAlphabet class (not generic_alphabet = Alphabet())
and use just strings. The problem is that I don't know whether
using letters = None (or letters = []) will cause problems down the
road (checks like "if x in alphabet.letters" are used in many classes).
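
A minimal sketch of that worry, in plain Python rather than the real
Bio.Alphabet code (SketchAlphabet, is_valid_naive and is_valid_guarded
are hypothetical names used only here). With letters = None a membership
test raises TypeError, and with letters = [] it quietly rejects every
letter, so either choice needs an explicit guard wherever the check is
made:

```python
class SketchAlphabet:
    """Hypothetical stand-in for an alphabet class; not Biopython's Alphabet."""

    def __init__(self, letters=None):
        # None is meant to read as "no restriction on letters".
        self.letters = letters


def is_valid_naive(seq, alphabet):
    # The "if x in alphabet.letters" pattern, applied to every character.
    return all(ch in alphabet.letters for ch in seq)


def is_valid_guarded(seq, alphabet):
    # Treat letters = None as "anything goes" before testing membership.
    if alphabet.letters is None:
        return True
    return all(ch in alphabet.letters for ch in seq)


if __name__ == "__main__":
    generic = SketchAlphabet()           # letters = None
    empty = SketchAlphabet(letters=[])   # letters = []
    try:
        is_valid_naive("ACGT", generic)
    except TypeError as exc:
        print("naive check fails with letters = None:", exc)
    print(is_valid_naive("ACGT", empty))      # every letter is rejected
    print(is_valid_guarded("ACGT", generic))  # the guard accepts anything
```

The guarded version shows one possible convention; whether None or an
empty list should mean "unrestricted" would still have to be decided
and applied consistently.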

Also, I'm really confused as to what is going on in IUPAC.py with the  
default_manager stuff and _bootstrap.

Marc