[BioPython] Alphabets

Iddo Friedberg idoerg@cc.huji.ac.il
Tue, 12 Jun 2001 12:27:55 +0300 (GMT+0300)


Hi Andrew,

Well, I'm a bit wiser now, and yes, probably isinstance is the best way of
going about it. I still have a couple of reservations about the whole way
of typing alphabets, but I can think of no better way, for now.

On Mon, 11 Jun 2001, Andrew Dalke wrote:

: Can you give some example of why you want to check for the alphabet?


Yes. In my case this is for determining the information content of a
multiple alignment. In order to determine random expected frequencies for
letters, I need to know the alphabet type. For a 20 letter alphabet
(standard protein) that would be 0.05. But the same goes for a 23 letter
alphabet, assuming that instances of B, Z and X are rare enough to ignore.
(So using alphabet size is no good here).
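
A minimal sketch of that idea, assuming hypothetical alphabet classes
(Alphabet, ProteinAlphabet, ExtendedProteinAlphabet and DNAAlphabet are
stand-ins written for this example, not the actual Bio.Alphabet classes,
and expected_frequency / column_information are illustrative names). The
expected frequency is picked by type with isinstance, so the 23 letter
alphabet still gets 0.05:

```python
import math


# Hypothetical stand-ins for alphabet classes; not the Bio.Alphabet API.
class Alphabet:
    letters = ""


class ProteinAlphabet(Alphabet):
    letters = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids


class ExtendedProteinAlphabet(ProteinAlphabet):
    letters = ProteinAlphabet.letters + "BZX"  # 23 letters


class DNAAlphabet(Alphabet):
    letters = "ACGT"


def expected_frequency(alphabet):
    """Uniform background frequency chosen by alphabet type, not size."""
    if isinstance(alphabet, ProteinAlphabet):
        # Also true for the 23 letter subclass: B, Z and X are
        # assumed rare enough to ignore.
        return 1.0 / 20
    if isinstance(alphabet, DNAAlphabet):
        return 1.0 / 4
    raise TypeError(
        "no default expected frequency for %s" % type(alphabet).__name__
    )


def column_information(column, alphabet):
    """Information content in bits of one alignment column, measured
    against the uniform background for the given alphabet."""
    expected = expected_frequency(alphabet)
    total = 0.0
    for letter in set(column):
        observed = column.count(letter) / len(column)
        total += observed * math.log2(observed / expected)
    return total
```

An alphabet with no sensible default raises TypeError here, which is the
point where a caller would have to supply its own frequency table.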

Andrew:
: Still, I don't follow why this is needed. Why do you need to
: distinguish between protein and nucleic acids? Shouldn't there be
: a way to align proteins also using 3-letter codes? Or even
: non-biological sequences (like words of English text)?

True. In case of a multiple alignment of sequences composed of a redundant
protein alphabet, (or any other "special" alphabet) users have to provide
their own expected frequency table. AlignInfo makes provisions for that.
Still, typechecking is a good thing, even if it just serves the purpose of
an application raising an exception due to bad input (a mix of DNA and
protein sequences). On the other hand, there might be good input which
needs to be handled in a special way. (Say, a codon, 64 letter alphabet,
which is actually DNA, a 4-letter protein alphabet, a representation of
the protein in terms of secondary structure, and so on).
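
The bad-input case could be sketched roughly as follows. Everything here
is an assumption made for illustration: SimpleSeq, the alphabet classes
and check_consistent_alphabet are hypothetical names, not Biopython
code. The check walks the sequences and raises as soon as one of them
carries an alphabet of the wrong type:

```python
# Hypothetical stand-ins; not the Bio.Alphabet or Bio.Seq API.
class Alphabet:
    pass


class DNAAlphabet(Alphabet):
    pass


class ProteinAlphabet(Alphabet):
    pass


class SimpleSeq:
    """A sequence string tagged with an alphabet instance."""

    def __init__(self, data, alphabet):
        self.data = data
        self.alphabet = alphabet


def check_consistent_alphabet(sequences, base_class):
    """Raise TypeError if any sequence's alphabet is not an instance
    of base_class (subclasses of base_class are accepted)."""
    for index, seq in enumerate(sequences):
        if not isinstance(seq.alphabet, base_class):
            raise TypeError(
                "sequence %d has alphabet %s, expected %s"
                % (index, type(seq.alphabet).__name__, base_class.__name__)
            )


mixed = [
    SimpleSeq("ACGT", DNAAlphabet()),
    SimpleSeq("MKVL", ProteinAlphabet()),
]
try:
    check_consistent_alphabet(mixed, DNAAlphabet)
except TypeError as error:
    print("rejected:", error)
```

Because isinstance accepts subclasses, a special-purpose alphabet (a
codon alphabet derived from the DNA one, say) would pass the same check
and could then be handled separately.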


I'll just finish with my opening remark: seems like the current approach
is best, something which I learned while experimenting in the past several
hours. So, for now, unless someone can shed any new light on this, I'm
leaving the whole alphabet typing question as is.

Thanks Andrew,

Iddo

--

Iddo Friedberg                                  | Tel: +972-2-6758647
Dept. of Molecular Genetics and Biotechnology   | Fax: +972-2-6757308
The Hebrew University - Hadassah Medical School | email: idoerg@cc.huji.ac.il
POB 12272, Jerusalem 91120                      |
Israel                                          |
http://bioinfo.md.huji.ac.il/marg/people-home/iddo/