[Biopython] some problems with motif processing
Bartek Wilczynski
bartek at rezolwenta.eu.org
Wed Apr 13 07:26:22 UTC 2011
Hi,
On Wed, Apr 13, 2011 at 4:43 AM, Philip Machanick <
philip.machanick at gmail.com> wrote:
> I don't have time to fix any of this now and have other options for my
> current project but in case anyone else is maintaining the Motif code:
>
> 1. the score_hit function is wrong; if it hits a character that isn't in
> the alphabet it simply skips it; if e.g. you hit some repeat-masked
> sequence that's all Ns this will give that position the maximum possible
> score. 2 possible fixes: don't score any site that contains ambiguous
> characters, or score them as if they are the average of the characters
> they represent
>
I don't think I agree with you. Skipping a character gives it a score of 0,
which is far from the maximum; it is exactly the average log-odds
(log(1) = 0). Doing something similar for the other IUPAC ambiguous
characters would make sense; I'll look into it.
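
To illustrate the point, here is a minimal sketch. It is not the Bio.Motif
code: the names PWM, BACKGROUND, score_site and averaged_n_score are made up
for this example, the motif is invented, and a uniform background is assumed.
It shows that a skipped N contributes 0 to the sum, and that scoring N from
the averaged letter probabilities also gives 0 under that assumption.

```python
import math

# Assumed uniform background; a real motif may use a different one.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

# Hypothetical 3-column motif; each column's probabilities sum to 1.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]


def score_site(site, skip_unknown=True):
    """Sum of per-position log2-odds for a site as long as the motif.

    Letters outside the alphabet (e.g. N) are skipped, i.e. they add 0.
    With skip_unknown=False the whole site is rejected (returns None).
    """
    total = 0.0
    for column, letter in zip(PWM, site):
        if letter in column:
            total += math.log(column[letter] / BACKGROUND[letter], 2)
        elif not skip_unknown:
            return None
    return total


def averaged_n_score(column):
    """Log2-odds of N taken as the mean probability over all letters."""
    p = sum(column.values()) / len(column)
    b = sum(BACKGROUND.values()) / len(BACKGROUND)
    return math.log(p / b, 2)
```

With these numbers, score_site("NNN") is 0 while the consensus "AGC" scores
well above it, so an all-N stretch does not get the maximum score. Note that
the 0 comes from averaging the probabilities; with a non-uniform background
the averaged score for N would generally not be 0.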
> 2. the MEME parser is way too strict. It demands many features of a MEME
> file that aren't in the minimum MEME motif
> spec <http://meme.sdsc.edu/meme4_6_1/doc/meme-format.html>.
> Since MEME isn't the only program that generates MEME motifs, this
> restricts the code to working pretty much only with MEME outputs.
>
I'm not the original author of the MEME parser, but I guess it would be easy
to make the changes necessary to accept this minimal MEME format. Could you
give some examples of programs producing this format, so I could test the
code better?
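
As a rough idea of what a more tolerant reader could look like, here is a
sketch. It is not the existing Biopython parser; the function name
parse_minimal_meme is hypothetical. It assumes only that each motif starts
with a "MOTIF <name>" line, followed by a "letter-probability matrix:" line
and then rows of probabilities, and it ignores everything else.

```python
def parse_minimal_meme(text):
    """Return {motif_name: [row, ...]} where each row is a list of floats.

    Only MOTIF lines and letter-probability matrices are read; headers
    such as ALPHABET or background frequencies are skipped, not required.
    """
    motifs = {}
    name = None
    in_matrix = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("MOTIF"):
            fields = line.split()
            name = fields[1] if len(fields) > 1 else None
            if name is not None:
                motifs[name] = []
            in_matrix = False
        elif line.startswith("letter-probability matrix"):
            # Rows follow; optional key/value pairs on this line are ignored.
            in_matrix = name is not None
        elif in_matrix:
            try:
                row = [float(x) for x in line.split()]
            except ValueError:
                row = []
            if row:
                motifs[name].append(row)
            else:
                # A blank or non-numeric line ends the matrix.
                in_matrix = False
    return motifs
```

Real files from other programs would still be needed to check which of these
assumptions hold in practice.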
> Fixing these problems will make this functionality a whole lot more useful.
> Great if someone has time now, otherwise I'll put it down as a future
> student project.
>
Thanks for pointing out the issues you have with the library.
--
Bartek Wilczynski