[BioPython] seq interface

Andrew Dalke dalke@acm.org
Wed, 05 Apr 2000 05:58:05 -0600


BTW, all that text I wrote boils down to a rather
simple class definition.  Here's the Seq object I'm
working with.  The required API methods/attributes are
marked with a "# Seq API" comment:


class Seq:
    def __init__(self, data, alphabet = Alphabet.generic_alphabet):
        # Enforce string storage
        assert type(data) == type("")  # must use a string
        
        self.data = data                           # Seq API
        self.alphabet = alphabet                   # Seq API
    def __repr__(self):
        return "%s(%s, %s)" % (self.__class__.__name__,
                               repr(self.data),
                               repr(self.alphabet))
    def __str__(self):
        if len(self.data) > 60:
            s = repr(self.data[:60] + " ...")
        else:
            s = repr(self.data)
        return "%s(%s, %s)" % (self.__class__.__name__, s,
                               repr(self.alphabet))
    # I don't think I like this method...
##    def __cmp__(self, other):
##        if isinstance(other, Seq):
##            return cmp(self.data, other.data)
##        else:
##            return cmp(self.data, other)
    def __len__(self): return len(self.data)       # Seq API
    def __getitem__(self, i): return self.data[i]  # Seq API
    def __getslice__(self, i, j):                  # Seq API
        i = max(i, 0); j = max(j, 0)
        return Seq(self.data[i:j], self.alphabet)

    def tostring(self):                            # Seq API
        return self.data

    def tomutable(self):   # Needed?  Or use a function?
        return MutableSeq(self.data, self.alphabet)

I haven't really described the "Alphabet" part, though...
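
To make the required API concrete, here is a minimal sketch of how the
interface might be exercised under Python 3.  It is an illustration only,
not the class from this post: StubAlphabet is a hypothetical placeholder
for the Alphabet classes (which are not described here), the MutableSeq
conversion is left out, and slices are handled inside __getitem__ because
Python 3 has no __getslice__.

```python
class StubAlphabet:
    """Hypothetical stand-in for the real (undescribed) Alphabet classes."""
    def __repr__(self):
        return "StubAlphabet()"


class Seq:
    def __init__(self, data, alphabet=None):
        # Enforce string storage, as the original class does with an assert.
        if not isinstance(data, str):
            raise TypeError("data must be a string")
        self.data = data                                  # Seq API
        self.alphabet = alphabet if alphabet is not None else StubAlphabet()

    def __repr__(self):
        return "%s(%r, %r)" % (self.__class__.__name__,
                               self.data, self.alphabet)

    def __len__(self):                                    # Seq API
        return len(self.data)

    def __getitem__(self, index):                         # Seq API
        # A slice yields a new Seq with the same alphabet;
        # a single index yields a plain one-character string.
        if isinstance(index, slice):
            return Seq(self.data[index], self.alphabet)
        return self.data[index]

    def tostring(self):                                   # Seq API
        return self.data


s = Seq("MKVLAAGIVG")
print(len(s))               # 10
print(s[0])                 # M
print(s[2:5].tostring())    # VLA
```

Indexing returns a plain string while slicing returns another Seq, which
matches the split between __getitem__ and __getslice__ in the class above.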

Also, here's a Prosite-to-re pattern converter in 6 lines
of code!

import string
_prosite_trans = string.maketrans(
      "abcdefghijklmnopqrstuvwxyzX}()<>",
      "ABCDEFGHIJKLMNOPQRSTUVW.YZ.]{}^$")

def prosite_to_re(pattern):
    """convert a valid Prosite pattern into an re string"""
    s = string.replace(pattern, "{", "[^")
    return string.translate(s, _prosite_trans, "-.")
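
The string-module functions used above no longer exist in Python 3.  A
rough equivalent, assuming the three-argument form of str.maketrans (its
third argument lists characters to delete), might look like the sketch
below.  The two patterns and the test sequence are made up to show the
syntax; they are not taken from any Prosite entry.

```python
import re

# Same translation table as the original: lowercase -> uppercase,
# x/X -> ".", "}" -> "]", "(" -> "{", ")" -> "}", "<" -> "^", ">" -> "$".
# The third argument deletes the "-" separators and the trailing ".".
_prosite_trans = str.maketrans(
    "abcdefghijklmnopqrstuvwxyzX}()<>",
    "ABCDEFGHIJKLMNOPQRSTUVW.YZ.]{}^$",
    "-.",
)


def prosite_to_re(pattern):
    """Convert a valid Prosite pattern into an re string."""
    # "{" must become "[^" before translating, because the table
    # turns "}" into "]" and "(" into "{".
    return pattern.replace("{", "[^").translate(_prosite_trans)


print(prosite_to_re("[AC]-x-V-x(4)-{ED}."))      # [AC].V.{4}[^ED]
print(prosite_to_re("<A-x-[ST](2)-x(0,1)-V."))   # ^A.[ST]{2}.{0,1}V

# The result can be handed straight to the re module.
m = re.search(prosite_to_re("[AC]-x-V-x(4)-{ED}."), "MMAQVLLLLKPP")
print(m.group())                                 # AQVLLLLK
```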

(prosite_to_re assumes the input pattern is syntactically
correct.)  I've been working on a strict checker as well; the
regular expression for a valid pattern is:

import re

prosite_re = re.compile(r"""
^<?                   # starts with an optional "<"
(
  [A-Zx]|             # a character OR
  \[[A-Z]+\]|         # something in []s OR
  \{[A-Z]+\}          # something in {}s
)(\(\d+(,\d+)?\))?    # optional count of the form "(i,j)" (",j" is optional)
(-                    # new terms separated by a '-'
 (
  [A-Zx]|             # a character OR
  \[[A-Z]+\]|         # something in []s OR
  \{[A-Z]+\}          # something in {}s
 )(\(\d+(,\d+)?\))?   # optional count
)*                    # repeat until done
>?                    # pattern ends with an optional ">"
\.$                   # description ends with a required "."
""", re.VERBOSE)


The checker also found that PS00539 and PS00267 are badly formatted -
I've sent mail to Prosite about it :)
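
As a rough idea of what running such a check looks like, here is a small
self-contained sketch in Python 3.  The regular expression is the one
given above; is_valid_prosite is a hypothetical helper name, and the
sample patterns are made up to exercise the syntax, not copied from the
Prosite entries mentioned.

```python
import re

# In verbose mode each comment must stay on one line; text that wraps
# onto the next line would be read as part of the pattern.
prosite_re = re.compile(r"""
^<?                   # starts with an optional "<"
(
  [A-Zx]|             # a character OR
  \[[A-Z]+\]|         # something in []s OR
  \{[A-Z]+\}          # something in {}s
)(\(\d+(,\d+)?\))?    # optional count, "(i)" or "(i,j)"
(-                    # new terms separated by a '-'
 (
  [A-Zx]|             # a character OR
  \[[A-Z]+\]|         # something in []s OR
  \{[A-Z]+\}          # something in {}s
 )(\(\d+(,\d+)?\))?   # optional count
)*                    # repeat until done
>?                    # pattern ends with an optional ">"
\.$                   # description ends with a required "."
""", re.VERBOSE)


def is_valid_prosite(pattern):
    """Return True if the pattern passes the strict syntax check."""
    return prosite_re.match(pattern) is not None


for pattern in ["[AC]-x-V-x(4)-{ED}.",      # well formed
                "<A-x-[ST](2)-x(0,1)-V.",   # well formed, anchored with "<"
                "[AC]-x-V-x(4)-{ED}",       # missing the trailing "."
                "A--V.",                    # empty term between dashes
                "a-x-V."]:                  # lowercase residue
    print(pattern, is_valid_prosite(pattern))
```

Only the first two patterns pass; the other three are rejected.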

I'm hoping to get working code out on Thursday, so people
can see how the alphabets and the (not yet mentioned)
property manager work.  

					Andrew
					dalke@acm.org