[Biopython-dev] Bio.motifs.matrix.PositionSpecificScoringMatrix.calculate - scoring ambiguous sequences

Sefa Kilic sefa1 at umbc.edu
Wed Jun 22 01:37:15 UTC 2016


Any thoughts?

On Mon, Jun 13, 2016 at 10:58 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> What do you think Michiel?
>
> Also related, earlier today I filed this issue:
> https://github.com/biopython/biopython/issues/851
>
> Peter
>
> On Mon, Jun 13, 2016 at 3:26 PM, Sefa Kilic <sefa1 at umbc.edu> wrote:
> > Hello all,
> >
> > I have been using the Bio.motifs PSSM search for a long time.
> > Occasionally, I work with genome sequences containing ambiguous bases.
> > Biopython currently does not support scoring sequences with ambiguous
> > bases, and I would like to propose a change to fix that.
> >
> > Currently, the "calculate" method of the PositionSpecificScoringMatrix
> > class checks whether the alphabets of both the motif and the sequence
> > are IUPAC.IUPACUnambiguousDNA. If they are not, a ValueError exception
> > is raised.
> >
> > The code itself, however, tolerates ambiguous bases in the sequence by
> > returning NaN. That is, given a PSSM of length L, every L-mer
> > subsequence that contains an ambiguous base is scored as NaN. I would
> > like to extend it and do the scoring properly for ambiguous sequences.
> > For instance, if the base is Y (C or T), it should be scored as the
> > average of scoring it as C and as T. If the base is N, it should be
> > scored as the average over all four bases:
> > [S(A) + S(T) + S(C) + S(G)] / 4.
> >
> > The change needs to be made on both the Python and C (_pwm.c) sides.
> > What do you think? If you agree, I can implement it and send a pull
> > request.
> >
> > Cheers,
> >
>
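
For concreteness, the averaging rule proposed in the quoted message could be sketched roughly as below. This is a pure-Python illustration, not Biopython code: the names IUPAC_EXPANSION, score_base, score_window, calculate and toy_pssm are all hypothetical, the PSSM is assumed to be a plain dict mapping each of A, C, G, T to a list of per-position scores, and the real change would also have to touch the C side (_pwm.c).

```python
# Illustrative sketch only; none of these names exist in Biopython.

# Standard IUPAC nucleotide codes mapped to the bases they can stand for.
IUPAC_EXPANSION = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def score_base(pssm, position, letter):
    """Average the PSSM scores of every base that `letter` may represent."""
    bases = IUPAC_EXPANSION[letter.upper()]
    return sum(pssm[base][position] for base in bases) / len(bases)


def score_window(pssm, window):
    """Score one L-mer by summing the per-position (averaged) scores."""
    return sum(score_base(pssm, i, letter) for i, letter in enumerate(window))


def calculate(pssm, sequence):
    """Score every L-mer of `sequence`, where L is the PSSM length."""
    length = len(pssm["A"])
    return [
        score_window(pssm, sequence[start:start + length])
        for start in range(len(sequence) - length + 1)
    ]


# Toy PSSM of length 2 (hypothetical values, assumed to be log-odds scores).
toy_pssm = {
    "A": [1.0, 0.0],
    "C": [0.0, 2.0],
    "G": [-1.0, 0.0],
    "T": [0.0, -2.0],
}

print(calculate(toy_pssm, "ANC"))  # prints [1.0, 2.0]
```

With this toy matrix, the window "AN" scores 1.0 (the N contributes the mean of the four position-2 scores, which is 0.0), and "NC" scores 2.0. Note that this averages the scores themselves, as the proposal describes, rather than averaging probabilities before taking the log.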