[Biopython] some problems with motif processing

Bartek Wilczynski bartek at rezolwenta.eu.org
Wed Apr 13 07:26:22 UTC 2011


Hi,

On Wed, Apr 13, 2011 at 4:43 AM, Philip Machanick <
philip.machanick at gmail.com> wrote:

> I don't have time to fix any of this now and have other options for my
> current project but in case anyone else is maintaining the Motif code:
>
>   1. the score_hit function is wrong; if it hits a character that isn't in
>   the alphabet it simply skips it; if e.g. you hit some repeat-masked
>   sequence that's all Ns this will give that position the maximum possible
>   score. 2 possible fixes: don't score any site that contains ambiguous
>   characters, or score them as if they are the average of the characters
>   they represent
>
I don't think I agree with you.  Skipping a character gives it a score of 0,
which is far from maximum and it is exactly the average log-odds (log(1)=0).
Doing something similar for other IUPAC ambiguous characters would make
sense; I'll look into it.
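
To make the scoring argument concrete, here is a minimal sketch. It is not the actual Bio.Motif code: the names `to_log_odds`, `score_hit` and the `ambiguous` parameter, the uniform background, and the two-column matrix are all assumptions made up for illustration. It compares skipping an unknown character (a contribution of 0) with averaging the log-odds of the bases an IUPAC code stands for. Note that 0 is the log of the background-weighted average odds ratio; averaging the log-odds themselves gives a value at or below 0.

```python
import math

# Uniform background frequencies; an assumption for this example.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def to_log_odds(pwm, background=BACKGROUND):
    """Convert columns of probabilities into columns of log2-odds.

    Probabilities must be non-zero (real data would need pseudocounts).
    """
    return [{b: math.log2(col[b] / background[b]) for b in col} for col in pwm]


def score_hit(log_odds, site, ambiguous="skip"):
    """Sum per-position log-odds for `site`.

    ambiguous="skip":    characters outside the alphabet contribute 0.
    ambiguous="average": IUPAC codes contribute the mean log-odds of the
                         bases they represent; anything else contributes 0.
    """
    total = 0.0
    for column, letter in zip(log_odds, site):
        if letter in column:
            total += column[letter]
        elif ambiguous == "average" and letter in IUPAC:
            bases = IUPAC[letter]
            total += sum(column[b] for b in bases) / len(bases)
        # Otherwise the character is skipped, i.e. it adds 0 to the score.
    return total


# Hypothetical two-column motif, made up for this example.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
LOG_ODDS = to_log_odds(PWM)

best = score_hit(LOG_ODDS, "AG")                           # consensus site
skipped = score_hit(LOG_ODDS, "NN")                        # skipping gives 0.0
averaged = score_hit(LOG_ODDS, "NN", ambiguous="average")  # below 0 here
print(best, skipped, averaged)
```

With this toy matrix, a run of Ns scores 0 when skipped, which is well below the consensus score, and slightly negative when averaged.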

>   2. the MEME parser is way too strict. It demands many features of a MEME
>   file that aren't in the minimum MEME motif spec
>   <http://meme.sdsc.edu/meme4_6_1/doc/meme-format.html>.
>   Since MEME isn't the only program that generates MEME motifs, this
>   restricts the code to working pretty much only with MEME outputs.
>
I'm not the original author of the MEME parser, but I guess it would be easy
to make the changes necessary to accept this minimal MEME format. Could you
give some examples of programs producing this format, so I could test the
code better?
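
As a rough idea of what a more lenient reader could look like, here is a sketch only, not the existing Biopython parser. The function name `parse_minimal_meme` and the sample text are hypothetical, and the sketch assumes the layout described in the linked spec: a `MOTIF <name>` line, then a `letter-probability matrix:` header, then one row of probabilities per position. Everything else (version line, alphabet, background) is simply ignored rather than required.

```python
def parse_minimal_meme(text):
    """Return {motif_name: rows}, where each row is a list of floats.

    Only MOTIF lines and the numeric rows after a
    'letter-probability matrix:' header are interpreted.
    """
    motifs = {}
    name = None
    in_matrix = False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # blank lines are ignored everywhere
        if in_matrix:
            try:
                motifs[name].append([float(f) for f in fields])
                continue
            except ValueError:
                in_matrix = False  # not a numeric row: the matrix has ended
        if fields[0] == "MOTIF" and len(fields) > 1:
            name = fields[1]
            motifs[name] = []
        elif fields[0] == "letter-probability" and name is not None:
            in_matrix = True
    return motifs


# Hypothetical input, made up for this example.
SAMPLE = """\
MEME version 4

ALPHABET= ACGT

MOTIF m1
letter-probability matrix: alength= 4 w= 2
 0.7 0.1 0.1 0.1
 0.1 0.1 0.7 0.1

MOTIF m2
letter-probability matrix: alength= 4 w= 1
 0.25 0.25 0.25 0.25
"""

parsed = parse_minimal_meme(SAMPLE)
print(sorted(parsed))
```

Reading rows until the first non-numeric line, instead of trusting the `w=` field, is a deliberate choice here so that files from other tools with slightly different headers still load.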


> Fixing these problems will make this functionality a whole lot more useful.
> Great if someone has time now, otherwise I'll put it down as a future
> student project.
>
Thanks for pointing out the issues you have with the library.

-- 
Bartek Wilczynski


