[Biopython-dev] reduced alphabets
Iddo Friedberg
idoerg at burnham.org
Fri Mar 12 16:45:57 EST 2004
Hi all,
I am thinking of incorporating reduced alphabets into Biopython. Reduced
(or redundant) alphabets represent protein sequences using an alternative
alphabet that lumps several amino acids into one letter, based on
physico-chemical traits. For example, the aliphatics (I, L, V) are usually
quite interchangeable, so many sequence studies lump them into one letter.
We don't have that, do we? This can also be applied to DNA, although I have
only heard of a 4->2 reduction (to purines and pyrimidines), and it is
usually less useful.
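To make the idea concrete, here is a minimal plain-Python sketch of a reduction table. It does not use Biopython. The helper name table_from_groups and the choice of "I" as the letter standing for the aliphatics are assumptions made only for illustration. They are not taken from any published reduced alphabet. R and Y are the standard IUPAC codes for purine and pyrimidine.

```python
# Illustrative sketch only: names and group choices are assumptions.
def table_from_groups(groups):
    """Expand {reduced_letter: "members"} into {member: reduced_letter}."""
    table = {}
    for reduced_letter, members in groups.items():
        for member in members:
            table[member] = reduced_letter
    return table

# 4 -> 2 DNA reduction: purines (A, G) -> R, pyrimidines (C, T) -> Y.
dna_table = table_from_groups({"R": "AG", "Y": "CT"})

# Lump the aliphatics I, L, V into one letter (arbitrarily "I");
# every other amino acid maps to itself.
aliphatic_table = {aa: aa for aa in "ACDEFGHIKLMNPQRSTVWY"}
aliphatic_table.update(table_from_groups({"I": "ILV"}))

reduced_dna = "".join(dna_table[base] for base in "GATTACA")
reduced_protein = "".join(aliphatic_table[aa] for aa in "MKVLI")
print(reduced_dna)      # RRYYRYR
print(reduced_protein)  # MKIII
```

Writing the groups as "reduced letter -> members" keeps each table readable, while the expanded "member -> reduced letter" dictionary is the form needed for per-letter lookup.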
You can see examples of reduced alphabets here:
http://viscose.ifg.uni-muenster.de/html/alphabets.html
I was thinking of making additions in two places:
1) in util.py I will add a function "reduce_sequence":
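from Bio import Alphabet, Seq

def reduce_sequence(seq, reduction_table, new_alphabet=None):
    """Given an amino-acid sequence, return it in reduced alphabet form,
    based on the letter-translation table passed.

    seq: a Seq.Seq type sequence
    reduction_table: a dictionary whose keys are the "from" alphabet
                     and whose values are the "to" alphabet
    """
    if new_alphabet is None:
        # Use a fresh alphabet instance so that the shared
        # Alphabet.single_letter_alphabet object is not modified.
        new_alphabet = Alphabet.SingleLetterAlphabet()
        # The new alphabet holds the "to" letters, each listed once.
        new_alphabet.letters = ''
        for letter in reduction_table.values():
            if letter not in new_alphabet.letters:
                new_alphabet.letters += letter
        new_alphabet.size = len(new_alphabet.letters)
    new_seq = Seq.Seq('', new_alphabet)
    # Translate the sequence one letter at a time.
    for letter in seq:
        new_seq += reduction_table[letter]
    return new_seq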
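For comparison, the same letter-by-letter reduction can be sketched on plain Python strings, without Seq or Alphabet objects. The function name reduce_string and the decision to raise ValueError for letters missing from the table are assumptions of this sketch. They are not part of the proposal above, which would raise KeyError on an unmapped letter.

```python
# Illustrative sketch only: a plain-string analogue of reduce_sequence.
def reduce_string(sequence, reduction_table):
    """Return `sequence` with each letter replaced via `reduction_table`."""
    # Report letters that the table cannot translate, all at once.
    missing = sorted(set(sequence) - set(reduction_table))
    if missing:
        raise ValueError("letters not in reduction table: " + "".join(missing))
    # str.maketrans accepts a dict of single characters to strings.
    return sequence.translate(str.maketrans(reduction_table))

purine_pyrimidine = {"A": "R", "G": "R", "C": "Y", "T": "Y"}
print(reduce_string("ACGT", purine_pyrimidine))  # RYRY
```

Checking for unmapped letters up front matters in practice, because real sequences often contain ambiguity codes that a 20-letter table does not cover.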
******************
2) In Bio.Alphabet I will
2.1) add a module with some dictionaries mapping the 20- and 23-letter aa
alphabets to "brand name" reduced alphabets,
2.2) add another module, along the lines of IUPAC.py, with the brand name
alphabets as instances of SingleLetterAlphabet
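One way to picture point 2 is a small registry that pairs each named table with the letters of its reduced alphabet. The sketch below is plain Python and only an assumption about how such a module might be organised. ReducedAlphabet and REDUCED_ALPHABETS are hypothetical names, not Biopython classes, and the single toy table stands in for the real "brand name" tables.

```python
# Illustrative sketch only: hypothetical layout, not Biopython API.
from dataclasses import dataclass


@dataclass(frozen=True)
class ReducedAlphabet:
    """A named reduction table plus the letters it maps onto."""
    name: str
    table: dict

    @property
    def letters(self):
        # The reduced alphabet is the set of "to" letters, each once.
        return "".join(sorted(set(self.table.values())))


purine_pyrimidine = ReducedAlphabet(
    "purine_pyrimidine", {"A": "R", "G": "R", "C": "Y", "T": "Y"}
)

# Registry keyed by name, so callers can look a table up by its label.
REDUCED_ALPHABETS = {
    alphabet.name: alphabet for alphabet in (purine_pyrimidine,)
}

print(REDUCED_ALPHABETS["purine_pyrimidine"].letters)  # RY
```

Deriving the letters from the table keeps the two from drifting apart when a table is edited.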
Comments, suggestions?
Thanks,
./I
--
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 713 9930
http://ffas.ljcrf.edu/~iddo