[Biopython-dev] reduced alphabets

Iddo Friedberg idoerg at burnham.org
Fri Mar 12 16:45:57 EST 2004

Hi all,

I am thinking of incorporating reduced alphabets into Biopython. Reduced
(or redundant) alphabets represent protein sequences using an
alternative alphabet that lumps several amino acids into one letter,
based on physico-chemical traits. For example, the aliphatics (I, L, V)
are usually quite interchangeable, so many sequence studies lump them
into one letter. We don't have that, do we? This can also be applied to
DNA, although I have only heard of a 4->2 reduction (to purines and
pyrimidines), and it is usually less useful.
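
To make the idea concrete, here is a minimal plain-Python sketch of the two reductions mentioned above. It does not use Biopython; the table names, the helper reduce_string, and the choice of "L" as the letter for the aliphatic group are assumptions made for illustration. R and Y are the standard IUPAC codes for purine and pyrimidine.

```python
# Illustrative reduction tables (names and group letter "L" are hypothetical).
AA_ALIPHATIC = {"I": "L", "L": "L", "V": "L"}  # lump I, L, V into one letter
DNA_PUR_PYR = {"A": "R", "G": "R", "C": "Y", "T": "Y"}  # R = purine, Y = pyrimidine


def reduce_string(sequence, table):
    """Replace each letter by its reduced form.

    Letters that are absent from the table are kept unchanged.
    """
    return "".join(table.get(letter, letter) for letter in sequence)


print(reduce_string("MKVLIAG", AA_ALIPHATIC))  # MKLLLAG
print(reduce_string("GATTACA", DNA_PUR_PYR))   # RRYYRYR
```

Keeping unknown letters unchanged is one possible design choice; a stricter version could raise an error instead.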

You can see examples of reduced alphabets here:


I was thinking of making additions in two places:

1) In util.py I will add a function "reduce_sequence":

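from Bio import Alphabet, Seq

def reduce_sequence(seq, reduction_table, new_alphabet=None):
    """Given an amino-acid sequence, return it in reduced-alphabet form,
    based on the letter-translation table passed.
        seq: a Seq.Seq type sequence
        reduction_table: a dictionary whose
        keys are the "from" alphabet, and values
        are the "to" alphabet"""

    if new_alphabet is None:
        # Create a fresh alphabet instead of modifying the shared
        # Alphabet.single_letter_alphabet instance
        new_alphabet = Alphabet.SingleLetterAlphabet()
        new_alphabet.letters = ''
        # The reduced letters are the table's values (the "to" alphabet)
        for letter in reduction_table.values():
            if letter not in new_alphabet.letters:
                new_alphabet.letters += letter
        new_alphabet.size = len(new_alphabet.letters)
    new_seq = Seq.Seq('', new_alphabet)
    for letter in seq:
        # Look up each letter in the table that was passed in
        new_seq += reduction_table[letter]
    return new_seq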


2) In Bio.Alphabet I will:
2.1) add a module with some dictionaries mapping the 20- and 23-letter
aa alphabets to "brand name" reduced alphabets,
2.2) add another module, along the lines of IUPAC.py, with the brand-name
alphabets as instances of SingleLetterAlphabet.
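
As a rough sketch of what such a table module might contain, the plain-Python example below builds a reduction dictionary from a grouping and derives the letters of the reduced alphabet from it. The grouping is made up for illustration and is not one of the published "brand name" alphabets; EXAMPLE_GROUPS, build_reduction_table and reduced_letters are hypothetical names, not existing Biopython code.

```python
# Hypothetical grouping for illustration only; not a published alphabet.
# Keys are the reduced letters, values are the amino acids they stand for.
EXAMPLE_GROUPS = {
    "L": "ILVM",  # aliphatic / hydrophobic
    "F": "FWY",   # aromatic
    "K": "KRH",   # basic
    "D": "DE",    # acidic
    "S": "STNQ",  # polar
    "A": "AG",    # small
    "C": "C",
    "P": "P",
}


def build_reduction_table(groups):
    """Turn {reduced_letter: members} into {original_letter: reduced_letter}."""
    table = {}
    for reduced, members in groups.items():
        for letter in members:
            if letter in table:
                raise ValueError("letter %r assigned to two groups" % letter)
            table[letter] = reduced
    return table


def reduced_letters(table):
    """Letters of the reduced alphabet: the distinct values of the table."""
    return "".join(sorted(set(table.values())))


EXAMPLE_TABLE = build_reduction_table(EXAMPLE_GROUPS)
print(len(EXAMPLE_TABLE))              # 20
print(reduced_letters(EXAMPLE_TABLE))  # ACDFKLPS
```

Writing each alphabet as groups and generating the per-letter dictionary makes it easy to check that every amino acid is assigned to exactly one group.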

Comments, suggestions?



Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 713 9930
