[Biopython-dev] Bio.SubstMat (was: Re: Calculating motif scores)

Michiel de Hoon mjldehoon at yahoo.com
Sat Jul 25 15:28:35 UTC 2009


Hi everybody,

Over the weekend I was looking at Bio.SubsMat and its documentation. There are a few points in Bio.SubsMat that would be handled differently in modern Python, but I thought I'd raise them here before making any changes:

1) The matrix types (NOTYPE = 0, ACCREP = 1, OBSFREQ = 2, SUBS = 3, EXPFREQ = 4, LO = 5) are currently module-level global variables in Bio.SubsMat. I think they should be class variables of the Bio.SubsMat.SeqMat class.

2) The print_mat method. It would be more Pythonic to use __str__ and __format__ for this, though the latter is only available in Python 2.6 and later.

3) The __sum__ method. I guess that this was intended to be __add__?

4) The sum_letters attribute. To calculate the sum of all values for a given letter, the following two methods are currently involved:

   def all_letters_sum(self):
      for letter in self.alphabet.letters:
         self.sum_letters[letter] = self.letter_sum(letter)

   def letter_sum(self,letter):
      assert letter in self.alphabet.letters
      sum = 0.
      for i in self.keys():
         if letter in i:
            if i[0] == i[1]:
               sum += self[i]
            else:
               sum += (self[i] / 2.)
      return sum

As you can see, the overall result is not returned, but stored in an attribute called sum_letters. I suggest replacing this with the following:

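    def sum(self):
        # Start every letter of the alphabet at zero.
        result = {}
        for letter in self.alphabet.letters:
            result[letter] = 0.0
        # Iterate over (pair, value) items; iterating over self alone
        # would yield only the keys.
        for pair, value in self.items():
            i1, i2 = pair
            if i1 == i2:
                result[i1] += value
            else:
                # Split off-diagonal values evenly between both letters;
                # 2.0 avoids integer division on integer counts.
                result[i1] += value / 2.0
                result[i2] += value / 2.0
        return result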

This returns the sums directly, without storing them in an attribute.
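
To make the four points concrete, here is a rough sketch on a toy dict subclass. It is not the actual Bio.SubsMat.SeqMat code: the class name ToyMat, its constructor signature, the plain "letters" string standing in for the alphabet, the output layout of __str__, and the reading of __sum__ as entry-wise addition are all assumptions made for illustration only.

```python
class ToyMat(dict):
    """Hypothetical stand-in for SeqMat: maps letter pairs to values."""

    # Point 1: matrix types as class attributes instead of module globals.
    NOTYPE, ACCREP, OBSFREQ, SUBS, EXPFREQ, LO = range(6)

    def __init__(self, data=None, letters="", mat_type=NOTYPE):
        dict.__init__(self, data or {})
        self.letters = letters      # assumed: plain string of letters
        self.mat_type = mat_type

    def __str__(self):
        # Point 2: a printable form via str() instead of print_mat().
        # The one-pair-per-line layout is an assumption, not print_mat's.
        lines = []
        for pair in sorted(self):
            lines.append("%s%s %g" % (pair[0], pair[1], self[pair]))
        return "\n".join(lines)

    def __add__(self, other):
        # Point 3: assumed meaning of the old __sum__: entry-wise addition.
        result = ToyMat(self, self.letters, self.mat_type)
        for pair, value in other.items():
            result[pair] = result.get(pair, 0.0) + value
        return result

    def sum(self):
        # Point 4: return the per-letter sums instead of storing them.
        result = dict((letter, 0.0) for letter in self.letters)
        for pair, value in self.items():
            i1, i2 = pair
            if i1 == i2:
                result[i1] += value
            else:
                result[i1] += value / 2.0
                result[i2] += value / 2.0
        return result


m = ToyMat({("A", "A"): 4.0, ("A", "C"): 2.0, ("C", "C"): 6.0},
           letters="AC", mat_type=ToyMat.OBSFREQ)
totals = m.sum()    # {'A': 5.0, 'C': 7.0}
doubled = m + m     # every entry added to itself
```

With this layout the type constants are reached as ToyMat.OBSFREQ and so on, and callers get the letter sums as a return value rather than having to call one method and then read an attribute.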


Any comments, objections?

--Michiel

--- On Fri, 7/24/09, Michiel de Hoon <mjldehoon at yahoo.com> wrote:

> From: Michiel de Hoon <mjldehoon at yahoo.com>
> Subject: Re: [Biopython-dev] Calculating motif scores
> To: "Bartek Wilczynski" <bartek at rezolwenta.eu.org>
> Cc: biopython-dev at biopython.org
> Date: Friday, July 24, 2009, 5:34 AM
> 
> > As for the PWM being a separate class and used by the motif:
> > I don't know. I'm using Bio.SubsMat.FreqTable for implementing
> > frequency table, so I understand that the new PWM class would
> > be basically a "smarter" FreqTable. I'm not sure whether it
> > solves any problems...
> 
> Wow, I didn't even know the Bio.SubsMat module existed. 
> As we have several different but related modules
> (Bio.Motif, Bio.SubsMat, Bio.Align), I think we should
> define the purpose and scope of each of these modules.
> Maybe a good way to start is the documentation. Bio.SubsMat
> is currently divided into two chapters (14.4 and 16.2). I'll
> have a look at this over the weekend to see if this can be
> cleaned up a bit.
> 
> --Michiel.
> 
> 
>       