[Biopython-dev] Bio.SubstMat (was: Re: Calculating motif scores)
Michiel de Hoon
mjldehoon at yahoo.com
Sat Jul 25 15:28:35 UTC 2009
Hi everybody,
Over the weekend I was looking at Bio.SubsMat and its documentation. There are a few points in Bio.SubsMat that would be handled differently in modern Python, but I thought I'd raise them here first before I make any changes:
1) The matrix types (NOTYPE = 0, ACCREP = 1, OBSFREQ = 2, SUBS = 3, EXPFREQ = 4, LO = 5) are currently global variables (at the module level of Bio.SubsMat). I think that these should be class variables of the Bio.SubsMat.SeqMat class.
2) The print_mat method. It would be more Pythonic to use __str__ and __format__ for this, though the latter is only available for Python versions >= 2.6.
3) The __sum__ method. I guess that this was intended to be __add__?
4) The sum_letters attribute. To calculate the sum of all values for a given letter, currently the following two functions are involved:
    def all_letters_sum(self):
        for letter in self.alphabet.letters:
            self.sum_letters[letter] = self.letter_sum(letter)

    def letter_sum(self,letter):
        assert letter in self.alphabet.letters
        sum = 0.
        for i in self.keys():
            if letter in i:
                if i[0] == i[1]:
                    sum += self[i]
                else:
                    sum += (self[i] / 2.)
        return sum
As you can see, the result is not returned, but stored in an attribute called sum_letters. I suggest replacing this with the following:

    def sum(self):
        result = {}
        for letter in self.alphabet.letters:
            result[letter] = 0.0
        # items() yields (pair, value); iterating over self alone gives only the keys
        for pair, value in self.items():
            i1, i2 = pair
            if i1 == i2:
                result[i1] += value
            else:
                # an off-diagonal value is split evenly between its two letters
                result[i1] += value / 2.
                result[i2] += value / 2.
        return result
so without storing the result in an attribute.
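To make the four points above a bit more concrete, here is a rough, self-contained sketch. ToySeqMat is a hypothetical stand-in and not the real Bio.SubsMat.SeqMat: it assumes a plain dict keyed by letter pairs, a string of letters instead of an alphabet object, a very simple __str__ layout, and that element-wise addition is what __sum__ was meant to do.

```python
class ToySeqMat(dict):
    """Hypothetical stand-in for SeqMat: maps (letter, letter) pairs to values."""

    # Point 1: matrix types as class variables instead of module-level globals.
    NOTYPE, ACCREP, OBSFREQ, SUBS, EXPFREQ, LO = range(6)

    def __init__(self, data, letters, mat_type=NOTYPE):
        dict.__init__(self, data)
        self.letters = letters      # assumed: a string such as "AC"
        self.mat_type = mat_type

    # Point 2: __str__ instead of a print_mat method (layout is an assumption).
    def __str__(self):
        rows = ["%s %s %6.2f" % (a, b, self[a, b]) for a, b in sorted(self)]
        return "\n".join(rows)

    # Point 3: __add__ instead of __sum__, assuming element-wise addition.
    def __add__(self, other):
        if set(self) != set(other):
            raise ValueError("matrices must have the same letter pairs")
        summed = dict((pair, self[pair] + other[pair]) for pair in self)
        return ToySeqMat(summed, self.letters, self.mat_type)

    # Point 4: return the per-letter sums instead of storing them.
    def sum(self):
        result = {}
        for letter in self.letters:
            result[letter] = 0.0
        for (i1, i2), value in self.items():
            if i1 == i2:
                result[i1] += value
            else:
                # split an off-diagonal value evenly between both letters
                result[i1] += value / 2.
                result[i2] += value / 2.
        return result


m = ToySeqMat({("A", "A"): 2.0, ("A", "C"): 1.0, ("C", "C"): 3.0}, "AC")
print(m.sum())  # {'A': 2.5, 'C': 3.5}
```

With this layout the per-letter sums add up to the sum of all matrix values (6.0 in the toy example), which is the same behaviour as the current letter_sum.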
Any comments, objections?
--Michiel
--- On Fri, 7/24/09, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> From: Michiel de Hoon <mjldehoon at yahoo.com>
> Subject: Re: [Biopython-dev] Calculating motif scores
> To: "Bartek Wilczynski" <bartek at rezolwenta.eu.org>
> Cc: biopython-dev at biopython.org
> Date: Friday, July 24, 2009, 5:34 AM
>
> > As for the PWM being a separate class and used by the motif:
> > I don't know. I'm using Bio.SubsMat.FreqTable for implementing
> > frequency table, so I understand that the new PWM class would
> > be basically a "smarter" FreqTable. I'm not sure whether it
> > solves any problems...
>
> Wow, I didn't even know the Bio.SubsMat module existed.
> As we have several different but related modules
> (Bio.Motif, Bio.SubsMat, Bio.Align), I think we should
> define the purpose and scope of each of these modules.
> Maybe a good way to start is the documentation. Bio.SubsMat
> is currently divided into two chapters (14.4 and 16.2). I'll
> have a look at this over the weekend to see if this can be
> cleaned up a bit.
>
> --Michiel.