[BioPython] Alphabets
Andrew Dalke
andrew_dalke@hotmail.com
Mon, 11 Jun 2001 12:34:58 -0600
I'm at a client site right now so this will be brief.
Iddo Friedberg:
>I came across a problem when trying to do something with alphabets.
>I would like to check whether a given alignment's alphabet is a protein or
>a nucleic acid.
>This, quite rightly, raises an exception when doing the following:
>
>gapped_prot = Alphabet.Gapped(IUPAC.IUPACProtein(),'-')
>gapped_prot.contains(IUPAC.IUPACProtein())
Can you give an example of why you want to check for the alphabet?
I still don't have enough experience working with gaps and alignments.
My assumption was that an alignment program would either receive
sequences with a gapped alphabet already, or (since that makes things
more complicated) convert to wrapped form if needed.
  def align(seq1, seq2):
      if not isinstance(seq1.alphabet, Alphabet.Gapped):
          seq1 = Seq(seq1, Alphabet.Gapped(seq1.alphabet, "-"))
          # (or however is the right way to create a new sequence
          # with the same letters but different alphabet; don't recall)
      ... same with seq2 ...
Then at this point you can check whether the underlying alphabet of
seq1 (which is now a gapped Seq object) is-a Protein alphabet. Though
I've found that too much friendliness like this may lead to
complications later on.
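A rough sketch of that "unwrap, then is-a" check follows. It uses small
stand-in classes, not the real Bio.Alphabet module: the class names, the
.alphabet attribute on the wrapper, and the helpers base_alphabet() and
is_protein() are all assumptions made up for illustration.

```python
# Stand-in classes for illustration only; these are NOT the Bio.Alphabet API.
class Alphabet:
    letters = None


class ProteinAlphabet(Alphabet):
    letters = "ACDEFGHIKLMNPQRSTVWY"


class DNAAlphabet(Alphabet):
    letters = "ACGT"


class Gapped(Alphabet):
    """Hypothetical wrapper: another alphabet plus a gap character."""

    def __init__(self, alphabet, gap_char="-"):
        self.alphabet = alphabet          # the wrapped (underlying) alphabet
        self.gap_char = gap_char
        self.letters = alphabet.letters + gap_char


def base_alphabet(alphabet):
    """Peel off Gapped wrappers until a plain alphabet is reached."""
    while isinstance(alphabet, Gapped):
        alphabet = alphabet.alphabet
    return alphabet


def is_protein(alphabet):
    """True if the alphabet, once unwrapped, is-a ProteinAlphabet."""
    return isinstance(base_alphabet(alphabet), ProteinAlphabet)


gapped_prot = Gapped(ProteinAlphabet(), "-")
gapped_dna = Gapped(DNAAlphabet(), "-")
print(is_protein(gapped_prot))  # True
print(is_protein(gapped_dna))   # False
```

The point of the sketch is that the question "protein or nucleic acid?"
is asked of the wrapped alphabet, not of the gapped wrapper itself.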
Still, I don't follow why this is needed. Why do you need to
distinguish between protein and nucleic acids? Shouldn't there be
a way to align proteins also using 3-letter codes? Or even
non-biological sequences (like words of English text)?
So for me to better understand this, I would like to see some
proposed examples of use.
Andrew
dalke@acm.org