[Biopython-dev] Bio.SubstMat (was: Re: Calculating motif scores)

Iddo Friedberg idoerg at gmail.com
Sat Jul 25 20:57:59 UTC 2009


I'm the author of SubsMat, IIRC. Everything sounds good, but I would not make
Python 2.6-only changes that break on 2.5. Ubuntu still ships 2.5, and I
imagine other Linux distros do too.

Thanks,

Iddo

Would code those in myself, but I'm moving.

Iddo Friedberg
http://iddo-friedberg.net/contact.html

On Jul 25, 2009 8:35 AM, "Michiel de Hoon" <mjldehoon at yahoo.com> wrote:


Hi everybody,

Over the weekend I was looking at Bio.SubsMat and its documentation. There
are a few points in Bio.SubsMat that would be handled differently in modern
Python, but I thought I'd raise them here first before I make any changes:

1) The matrix types (NOTYPE = 0, ACCREP = 1, OBSFREQ = 2, SUBS = 3, EXPFREQ
= 4, LO = 5) are now global variables (at the level of Bio.SubsMat). I think
that these should be class variables of the Bio.SubsMat.SeqMat class.

2) The print_mat method. It would be more Pythonic to use __str__ or
__format__ for this, though the latter is only available for Python versions
>= 2.6.

3) The __sum__ method. I guess that this was intended to be __add__?

4) The sum_letters attribute. To calculate the sum of all values for a given
letter, currently the following two functions are involved:

  def all_letters_sum(self):
     for letter in self.alphabet.letters:
        self.sum_letters[letter] = self.letter_sum(letter)

  def letter_sum(self,letter):
     assert letter in self.alphabet.letters
     sum = 0.
     for i in self.keys():
        if letter in i:
           if i[0] == i[1]:
              sum += self[i]
           else:
              sum += (self[i] / 2.)
     return sum

As you can see, the result is not returned, but stored in an attribute
called sum_letters. I suggest replacing this with the following:

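   def sum(self):
       result = {}
       for letter in self.alphabet.letters:
           result[letter] = 0.0
       # iterate over (pair, value) items; iterating the dict itself yields keys only
       for pair, value in self.items():
           i1, i2 = pair
           if i1 == i2:
               result[i1] += value
           else:
               # off-diagonal entries are split evenly between both letters
               result[i1] += value / 2.0
               result[i2] += value / 2.0
       return result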

so without storing the result in an attribute.
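
To make points 1) to 4) concrete, here is a minimal sketch of how they
might fit together. ToySeqMat is a hypothetical stand-in written for this
illustration, not the actual Bio.SubsMat.SeqMat class; the letters argument
replaces the alphabet object, and the element-wise meaning given to __add__
is an assumption about what __sum__ was intended to do.

```python
class ToySeqMat(dict):
    """Hypothetical stand-in for SeqMat; keys are (letter, letter) pairs."""

    # Point 1: matrix types as class attributes instead of module globals.
    NOTYPE, ACCREP, OBSFREQ, SUBS, EXPFREQ, LO = range(6)

    def __init__(self, data=None, letters="", mat_type=NOTYPE):
        dict.__init__(self, data or {})
        self.letters = letters      # assumed replacement for alphabet.letters
        self.mat_type = mat_type

    def __str__(self):
        # Point 2: printing through __str__, which also works on Python 2.5.
        lines = []
        for pair in sorted(self):
            lines.append("%s%s %g" % (pair[0], pair[1], self[pair]))
        return "\n".join(lines)

    def __add__(self, other):
        # Point 3: element-wise addition (assumed intent of __sum__);
        # a pair missing from one matrix counts as 0.
        result = ToySeqMat(letters=self.letters, mat_type=self.mat_type)
        for pair in set(self) | set(other):
            result[pair] = self.get(pair, 0.0) + other.get(pair, 0.0)
        return result

    def sum(self):
        # Point 4: return the per-letter sums instead of storing them.
        result = dict((letter, 0.0) for letter in self.letters)
        for (i1, i2), value in self.items():
            if i1 == i2:
                result[i1] += value
            else:
                result[i1] += value / 2.0
                result[i2] += value / 2.0
        return result


m = ToySeqMat({("A", "A"): 4.0, ("A", "C"): 2.0, ("C", "C"): 6.0},
              letters="AC", mat_type=ToySeqMat.OBSFREQ)
totals = m.sum()     # per-letter sums, returned rather than stored
doubled = m + m      # uses __add__
```

With this layout, callers would write ToySeqMat.OBSFREQ rather than a
module-level constant, and print(m) replaces a call to print_mat.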


Any comments, objections?

--Michiel

--- On Fri, 7/24/09, Michiel de Hoon <mjldehoon at yahoo.com> wrote:

> From: Michiel de Hoon <mjldehoon at yahoo.com>
> Subject: Re: [Biopython-dev] Calculating motif scores
> To: "Bartek Wilczynski" <bartek at rezolwenta.eu.org>
> Cc: biopython-dev at biopython.org
> Date: Friday, July 24, 2009, 5:34 AM
>
> > As for the PWM being a separate class and used by the motif:
> > I don't know. I'm using Bio.SubsMat.FreqTable for implementing
> > frequency table, so I understand that the new PWM class would
> > be basically a "smarter" FreqTable. I'm not sure whether it
> > solves any problems...
>
> Wow, I didn't even know the Bio.SubsMat module existed.
> As we have several different but related modules
> (Bio.Motif, Bio.SubsMat, Bio.Align), I think we should
> define the purpose and scope of each of these modules.
> Maybe a good way to start is the documentation. Bio.SubsMat
> is currently divided into two chapters (14.4 and 16.2). I'll
> have a look at this over the weekend to see if this can be
> cleaned up a bit.
>
> --Michiel.
>
>
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>



