[BioPython] Alphabets

Mon, 11 Jun 2001 12:34:58 -0600

I'm at a client site right now so this will be brief.

Iddo Friedberg:
>I came across a problem when trying to do something with alphabets.

>I would like to check whether a given alignment's alphabet is a protein or 
>a nucleic acid.

>This, quite rightly, raises an exception when doing the following:
>
>gapped_prot = Alphabet.Gapped(IUPAC.IUPACProtein(),'-') 
>gapped_prot.contains(IUPAC.IUPACProtein())

Can you give some examples of why you want to check the alphabet?
I still don't have enough experience working with gaps and alignments.
My assumption was that an alignment program would either require
sequences that already have a gapped alphabet or, since requiring that
makes things more complicated for the caller, convert them to the
wrapped (gapped) form itself when needed:

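from Bio import Alphabet
from Bio.Seq import Seq

def align(seq1, seq2):
    # Wrap each sequence in a gapped alphabet ("-" as gap) if needed.
    # (Rebuilding via str() is one way; I don't recall the preferred one.)
    if not isinstance(seq1.alphabet, Alphabet.Gapped):
        seq1 = Seq(str(seq1), Alphabet.Gapped(seq1.alphabet, "-"))
    if not isinstance(seq2.alphabet, Alphabet.Gapped):
        seq2 = Seq(str(seq2), Alphabet.Gapped(seq2.alphabet, "-"))
    ...  # the alignment proper goes here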

At that point seq1 has a Gapped alphabet, so you can check whether
the alphabet it wraps is-a Protein alphabet.  That said, I've found
that too much friendliness like this can lead to complications later
on.
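To make that check concrete: the idea is to look through the Gapped
wrapper at the alphabet it wraps, rather than asking the gapped
alphabet whether it "contains" a protein alphabet.  A minimal sketch,
assuming a simplified class hierarchy; the classes below are
hypothetical stand-ins loosely modelled on the old Bio.Alphabet module
(since removed from Biopython), not the real library:

```python
# Hypothetical stand-ins, NOT the real Bio.Alphabet classes.
class Alphabet:
    """Base class for all alphabets."""

class ProteinAlphabet(Alphabet):
    """Marker class for protein alphabets."""

class NucleotideAlphabet(Alphabet):
    """Marker class for nucleic acid alphabets."""

class Gapped(Alphabet):
    """Wraps another alphabet and adds a gap character."""
    def __init__(self, alphabet, gap_char="-"):
        self.alphabet = alphabet  # the wrapped (underlying) alphabet
        self.gap_char = gap_char

def underlying_alphabet(alphabet):
    """Strip any number of Gapped wrappers and return the inner alphabet."""
    while isinstance(alphabet, Gapped):
        alphabet = alphabet.alphabet
    return alphabet

def is_protein(alphabet):
    """True if the alphabet, ignoring gap wrappers, is-a protein alphabet."""
    return isinstance(underlying_alphabet(alphabet), ProteinAlphabet)

gapped_prot = Gapped(ProteinAlphabet(), "-")
print(is_protein(gapped_prot))                    # True
print(is_protein(Gapped(NucleotideAlphabet())))   # False
```

Unwrapping in a loop also handles a gapped alphabet wrapped twice.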

Still, I don't follow why this is needed.  Why do you need to
distinguish between protein and nucleic acids?  Shouldn't there be
it also be possible to align proteins written with 3-letter codes?
Or even non-biological sequences (like words of English text)?
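As an illustration of why the aligner need not know the alphabet:
plain edit distance (a simplified stand-in for a real aligner, not
anything in Biopython) only compares items for equality, so the same
code works on one-letter residues, 3-letter codes, or English words.
A minimal sketch:

```python
from typing import Hashable, Sequence

def edit_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Minimum insertions, deletions and substitutions turning a into b.

    Items are only compared with ==, so no alphabet object is consulted.
    """
    prev = list(range(len(b) + 1))  # row for the empty prefix of a
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # match or substitute
        prev = cur
    return prev[-1]

print(edit_distance("kitten", "sitting"))                           # 3
print(edit_distance(["Ala", "Gly", "Ser"], ["Ala", "Ser"]))         # 1
print(edit_distance("the cat sat".split(), "the dog sat".split()))  # 1
```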

So for me to better understand this, I would like to see some
proposed examples of use.

                            Andrew
                            dalke@acm.org
