[BioPython] seq interface
Andrew Dalke
dalke@acm.org
Wed, 05 Apr 2000 05:58:05 -0600
BTW, all that text I wrote boils down to a rather
simple class definition. Here's the Seq object I'm
working with. The required API methods/attributes are
so marked:
class Seq:
    def __init__(self, data, alphabet = Alphabet.generic_alphabet):
        # Enforce string storage
        assert type(data) == type("")   # must use a string
        self.data = data                # Seq API
        self.alphabet = alphabet        # Seq API
    def __repr__(self):
        return "%s(%s, %s)" % (self.__class__.__name__,
                               repr(self.data),
                               repr(self.alphabet))
    def __str__(self):
        if len(self.data) > 60:
            s = repr(self.data[:60] + " ...")
        else:
            s = repr(self.data)
        return "%s(%s, %s)" % (self.__class__.__name__, s,
                               repr(self.alphabet))
    # I don't think I like this method...
##    def __cmp__(self, other):
##        if isinstance(other, Seq):
##            return cmp(self.data, other.data)
##        else:
##            return cmp(self.data, other)
    def __len__(self): return len(self.data)        # Seq API
    def __getitem__(self, i): return self.data[i]   # Seq API
    def __getslice__(self, i, j):                   # Seq API
        i = max(i, 0); j = max(j, 0)
        return Seq(self.data[i:j], self.alphabet)
    def tostring(self):                             # Seq API
        return self.data
    def tomutable(self):   # Needed?  Or use a function?
        return MutableSeq(self.data, self.alphabet)
I haven't really described the "Alphabet" part, though...
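For anyone reading this with a newer interpreter, here is a rough sketch of how the required Seq API might look in Python 3, where slices arrive through __getitem__ rather than __getslice__. The Alphabet class below is a hypothetical placeholder (the real one isn't described in this post), and raising TypeError in place of the assert is my own substitution, so treat this as an illustration rather than the actual implementation.

```python
class Alphabet:
    """Hypothetical placeholder; the real Alphabet is not described here."""
    def __repr__(self):
        return "Alphabet()"

generic_alphabet = Alphabet()

class Seq:
    def __init__(self, data, alphabet=generic_alphabet):
        # Enforce string storage (the original uses an assert for this)
        if not isinstance(data, str):
            raise TypeError("data must be a string")
        self.data = data            # Seq API
        self.alphabet = alphabet    # Seq API

    def __repr__(self):
        return "%s(%r, %r)" % (self.__class__.__name__,
                               self.data, self.alphabet)

    def __len__(self):              # Seq API
        return len(self.data)

    def __getitem__(self, index):   # Seq API
        # Python 3 has no __getslice__; slices are passed to __getitem__.
        if isinstance(index, slice):
            return Seq(self.data[index], self.alphabet)
        return self.data[index]

    def tostring(self):             # Seq API
        return self.data

seq = Seq("GATTACA")
sub = seq[1:4]   # a new Seq that shares the parent's alphabet
```

Indexing with an integer gives back a plain one-character string, while slicing gives back another Seq carrying the same alphabet, which matches the split between __getitem__ and __getslice__ in the class above.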
Also, here's a Prosite-to-re pattern converter in 6 lines
of code!
import string
_prosite_trans = string.maketrans(
    "abcdefghijklmnopqrstuvwxyzX}()<>",
    "ABCDEFGHIJKLMNOPQRSTUVW.YZ.]{}^$")
def prosite_to_re(pattern):
    """convert a valid Prosite pattern into an re string"""
    s = string.replace(pattern, "{", "[^")
    return string.translate(s, _prosite_trans, "-.")
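The string-module functions used above are gone in Python 3, so here is a sketch of the same idea with str.maketrans, which takes the characters to delete as a third argument. The translation table is copied from the code above; the two sample patterns are only illustrative inputs I made up in the usual Prosite style, not entries I've checked against the database.

```python
# Same table as above, with "-" and "." deleted during translation.
_PROSITE_TRANS = str.maketrans(
    "abcdefghijklmnopqrstuvwxyzX}()<>",
    "ABCDEFGHIJKLMNOPQRSTUVW.YZ.]{}^$",
    "-.",
)

def prosite_to_re(pattern):
    """Convert a valid Prosite pattern into an re string."""
    # "{" must become "[^" before translating, since "}" maps to "]".
    return pattern.replace("{", "[^").translate(_PROSITE_TRANS)

example = prosite_to_re("[AC]-x-V-x(4)-{ED}.")      # "[AC].V.{4}[^ED]"
anchored = prosite_to_re("<A-x-[ST](2)-x(0,1)-V.")  # "^A.[ST]{2}.{0,1}V"
```

The translation is a single pass, so the "." produced from "x" is not removed by the deletion of the trailing ".".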
The converter assumes the input pattern is syntactically correct.
I've been working on a strict checker as well, and the
regular expression for the pattern is:
import re
prosite_re = re.compile(r"""
^<?                    # starts with an optional "<"
(
  [A-Zx]|              # a character OR
  \[[A-Z]+\]|          # something in []s OR
  \{[A-Z]+\}           # something in {}s
)(\(\d+(,\d+)?\))?     # optional count of the form "(i,j)" (",j" is optional)
(-                     # new terms separated by a '-'
  (
    [A-Zx]|            # a character OR
    \[[A-Z]+\]|        # something in []s OR
    \{[A-Z]+\}         # something in {}s
  )(\(\d+(,\d+)?\))?   # optional count
)*                     # repeat until done
>?                     # pattern ends with an optional ">"
\.$                    # description ends with a required "."
""", re.VERBOSE)
That check also found that PS00539 and PS00267 are badly formatted -
I've sent mail to Prosite about it :)
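To show how such a check might be used, here is a small sketch that builds an equivalent expression from a single term definition, so the element syntax is written only once. The names _TERM and is_valid_prosite are my own, invented for this example, and the test strings are made up; this is a sketch of the same grammar, not the strict checker itself.

```python
import re

# One pattern element: a residue or "x", a [] set, or a {} exclusion,
# followed by an optional "(i)" or "(i,j)" repeat count.
_TERM = r"(?:[A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})(?:\(\d+(?:,\d+)?\))?"

# Optional "<", terms separated by "-", optional ">", required final ".".
prosite_re = re.compile(r"^<?" + _TERM + r"(?:-" + _TERM + r")*>?\.$")

def is_valid_prosite(pattern):
    """Return True if the pattern fits the grammar sketched above."""
    return prosite_re.match(pattern) is not None

ok = is_valid_prosite("[AC]-x-V-x(4)-{ED}.")       # well formed
no_period = is_valid_prosite("[AC]-x-V-x(4)-{ED}") # missing final "."
double_dash = is_valid_prosite("A--V.")            # empty term between dashes
```

Running the check before conversion keeps malformed entries from silently turning into regular expressions that match the wrong thing.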
I'm hoping to get working code out on Thursday, so people
can see how the alphabets and the (as yet to be mentioned)
property manager work.
Andrew
dalke@acm.org