[Biopython-dev] Bio.SubstMat (was: Re: Calculating motif scores)
Michiel de Hoon
mjldehoon at yahoo.com
Tue Aug 25 10:41:20 UTC 2009
I did (3) and (4) below, and I added a __str__ method but I didn't touch the other print functions (2).
For (1), maybe a better way is to subclass the SeqMat class for each of the matrix types instead of storing the matrix type in self.mat_type. Any comments or objections (especially Iddo)?
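To make the subclassing idea for (1) concrete, here is a rough sketch. The class names are hypothetical and the base class is a plain dict stand-in, not the real Bio.SubsMat.SeqMat:

```python
# Hypothetical sketch: one subclass per matrix type instead of a mat_type flag.
# SeqMatSketch is a stand-in base class, not Bio.SubsMat.SeqMat itself.
class SeqMatSketch(dict):
    """Maps (letter, letter) pairs to values."""


class ObservedFrequencyMatrix(SeqMatSketch):
    """Hypothetical subclass for observed-frequency data."""


class LogOddsMatrix(SeqMatSketch):
    """Hypothetical subclass for log-odds scores."""


def describe(matrix):
    # Dispatch on the class rather than comparing an integer type code.
    if isinstance(matrix, LogOddsMatrix):
        return "log-odds"
    if isinstance(matrix, ObservedFrequencyMatrix):
        return "observed frequencies"
    return "untyped"
```

With this layout, type-specific behaviour could live in the subclass that needs it, and isinstance checks replace comparisons against integer codes.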
--Michiel.
--- On Sat, 7/25/09, Iddo Friedberg <idoerg at gmail.com> wrote:
> I'm the author of subsmat IIRC.
> Everything sounds good, but I would not make 2.6 changes
> that will break on 2.5. Ubuntu still uses 2.5 and I imagine
> other linux distros do too.
> 1) The matrix types (NOTYPE = 0, ACCREP = 1, OBSFREQ = 2,
> SUBS = 3, EXPFREQ = 4, LO = 5) are now global variables (at
> the level of Bio.SubsMat). I think that these should be
> class variables of the Bio.SubsMat.SeqMat class.
>
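A minimal sketch of what class-level type codes could look like. The class below is a dict stand-in with a hypothetical constructor, not the actual SeqMat signature:

```python
class SeqMatSketch(dict):
    # Type codes from item (1), as class attributes rather than module globals.
    NOTYPE = 0
    ACCREP = 1
    OBSFREQ = 2
    SUBS = 3
    EXPFREQ = 4
    LO = 5

    def __init__(self, data=None, mat_type=NOTYPE):
        # Hypothetical constructor: store the pair -> value data and the type code.
        dict.__init__(self, data or {})
        self.mat_type = mat_type
```

Callers would then write SeqMatSketch.LO instead of importing a module-level LO.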
> 2) The print_mat method. It would be more Pythonic to use
> __str__, __format__ for this, though the latter is only
> available for Python versions >= 2.6.
>
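As an illustration of the __str__ route, here is a small sketch on a dict stand-in. The lower-triangular layout and the letters argument are assumptions made for this example, not necessarily what print_mat produces:

```python
class SeqMatSketch(dict):
    def __init__(self, data, letters):
        dict.__init__(self, data)
        self.letters = letters  # stand-in for self.alphabet.letters

    def __str__(self):
        # One row per letter (lower triangle), then a footer of column labels.
        lines = []
        for i, row_letter in enumerate(self.letters):
            cells = []
            for col_letter in self.letters[: i + 1]:
                # Accept either key order for a pair; default to 0 if absent.
                value = self.get((row_letter, col_letter),
                                 self.get((col_letter, row_letter), 0))
                cells.append("%4d" % value)
            lines.append(row_letter + " " + " ".join(cells))
        lines.append("  " + " ".join("%4s" % c for c in self.letters))
        return "\n".join(lines)
```

Because str() and print both use __str__, no separate print method is needed for the default layout.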
> 3) The __sum__ method. I guess that this was intended to be
> __add__?
>
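For reference, Python never looks up a method named __sum__; the + operator dispatches to __add__. A minimal sketch, assuming the intended semantics are element-wise addition of two matrices (the class is a dict stand-in):

```python
class SeqMatSketch(dict):
    def __add__(self, other):
        # Element-wise sum over the union of keys; missing entries count as 0.
        result = SeqMatSketch()
        for key in set(self) | set(other):
            result[key] = self.get(key, 0) + other.get(key, 0)
        return result
```

With __add__ defined, a + b works directly on two such matrices.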
> 4) The sum_letters attribute. To calculate the sum of all
> values for a given letter, currently the following two
> functions are involved:
>
> def all_letters_sum(self):
>     for letter in self.alphabet.letters:
>         self.sum_letters[letter] = self.letter_sum(letter)
>
> def letter_sum(self, letter):
>     assert letter in self.alphabet.letters
>     sum = 0.
>     for i in self.keys():
>         if letter in i:
>             if i[0] == i[1]:
>                 sum += self[i]
>             else:
>                 sum += (self[i] / 2.)
>     return sum
>
> As you can see, the result is not returned, but stored in
> an attribute called sum_letters. I suggest replacing this
> with the following:
>
> def sum(self):
>     result = {}
>     for letter in self.alphabet.letters:
>         result[letter] = 0.0
>     # iterate over (pair, value) items, not over the keys alone
>     for pair, value in self.items():
>         i1, i2 = pair
>         if i1 == i2:
>             result[i1] += value
>         else:
>             # 2.0 avoids integer division on Python 2
>             result[i1] += value / 2.0
>             result[i2] += value / 2.0
>     return result
>
> so without storing the result in an attribute.
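A standalone sketch of the same calculation on a plain dict, to show the result being returned rather than stored. The function name letter_sums and the toy counts are made up for illustration:

```python
def letter_sums(matrix, letters):
    """Return {letter: sum}, splitting each off-diagonal value between its two letters."""
    result = dict((letter, 0.0) for letter in letters)
    for (i1, i2), value in matrix.items():
        if i1 == i2:
            result[i1] += value
        else:
            result[i1] += value / 2.0
            result[i2] += value / 2.0
    return result


# Toy data: keys are letter pairs, values are counts.
counts = {("A", "A"): 10, ("A", "C"): 4, ("C", "C"): 6}
sums = letter_sums(counts, "AC")
print(sums)
```

Here A gets 10 + 4/2 = 12 and C gets 6 + 4/2 = 8, so the per-letter sums add up to the total of all entries (20).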
>
> Any comments, objections?
>
> --Michiel
>
> --- On Fri, 7/24/09, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
> > From: Michiel de Hoon <mjldehoon at yahoo.com>
> > Subject: Re: [Biopython-dev] Calculating motif scores
> > To: "Bartek Wilczynski" <bartek at rezolwenta.eu.org>
> > Cc: biopython-dev at biopython.org
> > Date: Friday, July 24, 2009, 5:34 AM
> >
> > > As for the PWM being a separate class and used by the motif:
> > > I don't know. I'm using Bio.SubsMat.FreqTable for implementing
> > > frequency table, so I understand that the new PWM class would
> > > be basically a "smarter" FreqTable. I'm not sure whether it
> > > solves any problems...
> >
> > Wow, I didn't even know the Bio.SubsMat module existed.
> > As we have several different but related modules
> > (Bio.Motif, Bio.SubsMat, Bio.Align), I think we should
> > define the purpose and scope of each of these modules.
> > Maybe a good way to start is the documentation. Bio.SubsMat
> > is currently divided into two chapters (14.4 and 16.2). I'll
> > have a look at this over the weekend to see if this can be
> > cleaned up a bit.
> >
> > --Michiel.
>
> >
>
> >
>
> >
>
> > _______________________________________________
>
> > Biopython-dev mailing list
>
> > Biopython-dev at lists.open-bio.org
>
> > http://lists.open-bio.org/mailman/listinfo/biopython-dev