[Biopython] some problems with motif processing

Philip Machanick philip.machanick at gmail.com
Wed Apr 13 02:43:30 UTC 2011


I don't have time to fix any of this now, and I have other options for my
current project, but in case anyone else is maintaining the Motif code, here
are two problems I ran into:

   1. The score_hit function is wrong: if it hits a character that isn't in
   the alphabet, it simply skips it. If, e.g., you hit some repeat-masked
   sequence that's all Ns, this will give that position the maximum possible
   score. Two possible fixes: don't score any site that contains ambiguous
   characters, or score each ambiguous character as the average of the
   characters it represents.
   2. The MEME parser is way too strict. It demands many features of a MEME
   file that aren't in the minimum MEME motif spec
   (http://meme.sdsc.edu/meme4_6_1/doc/meme-format.html).
   Since MEME isn't the only program that generates MEME motifs, this restricts
   the code to working pretty much only with MEME outputs.
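
To make the two fixes for item 1 concrete, here is a minimal sketch in plain
Python. It does not use or reproduce the Biopython Motif API: the function
name score_site, the "policy" parameter, and the list-of-dicts matrix layout
are all hypothetical, chosen only for illustration. It also assumes that
"average" means averaging the per-base scores, which is one possible reading
(averaging probabilities before taking logs would be another).

```python
# IUPAC nucleotide codes mapped to the unambiguous bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def score_site(log_odds, site, policy="average"):
    """Score one candidate site against a log-odds matrix.

    log_odds: list with one dict per motif position, mapping each of
              A/C/G/T to a score (hypothetical layout, not Biopython's).
    policy:   "reject"  -> return None if the site has any ambiguous character
              "average" -> score an ambiguous character as the mean score of
                           the bases it represents
    """
    if policy not in ("average", "reject"):
        raise ValueError("unknown policy: %r" % policy)
    if len(site) != len(log_odds):
        raise ValueError("site length does not match motif width")

    total = 0.0
    for column, char in zip(log_odds, site.upper()):
        bases = IUPAC.get(char)
        if bases is None:
            # Never skip silently: an unknown character is an error.
            raise ValueError("unexpected character: %r" % char)
        if len(bases) > 1 and policy == "reject":
            return None
        total += sum(column[b] for b in bases) / len(bases)
    return total


# Toy two-column matrix, invented for this example.
pssm = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -2.0, "C": 2.0, "G": 0.0, "T": 0.0},
]
best = score_site(pssm, "AC")                      # unambiguous site
masked = score_site(pssm, "NN")                    # all-N site, averaged
rejected = score_site(pssm, "AN", policy="reject") # ambiguous site, rejected
```

With either policy, an all-N site can no longer come out at or near the best
possible score, because no position is dropped from the sum.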
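
For item 2, a more lenient reader could look roughly like the sketch below.
It is not the Biopython parser, and parse_minimal_meme and the dict layout it
returns are hypothetical. It assumes, from my reading of the minimal format
page linked above, that only the "MEME version" line, a "MOTIF" line, and a
"letter-probability matrix:" block are required, and it ignores everything
else (alphabet, strands, background frequencies) rather than insisting on it.

```python
import re


def parse_minimal_meme(text):
    """Parse motifs from minimal MEME-format text (illustrative sketch).

    Returns a list of dicts with keys "name", "attrs" (the key= value pairs
    from the matrix header, kept as strings) and "matrix" (rows of floats).
    """
    motifs = []
    current = None
    in_matrix = False
    saw_version = False

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("MEME version"):
            saw_version = True
        elif line.startswith("MOTIF"):
            fields = line.split()
            if len(fields) < 2:
                raise ValueError("MOTIF line without a name")
            current = {"name": fields[1], "attrs": {}, "matrix": []}
            motifs.append(current)
            in_matrix = False
        elif line.startswith("letter-probability matrix:"):
            if current is None:
                raise ValueError("matrix found before any MOTIF line")
            # Header fields such as "w= 2" are optional, so take what is there.
            current["attrs"] = dict(re.findall(r"(\w+)=\s*(\S+)", line))
            in_matrix = True
        elif in_matrix:
            try:
                row = [float(x) for x in line.split()]
            except ValueError:
                in_matrix = False  # first non-numeric line ends the matrix
            else:
                current["matrix"].append(row)
        # Any other line (alphabet, strands, background, ...) is ignored.

    if not saw_version:
        raise ValueError("missing 'MEME version' line")
    return motifs


# Small hand-written input, invented for this example.
SAMPLE = """MEME version 4

ALPHABET= ACGT

MOTIF example
letter-probability matrix: alength= 4 w= 2 nsites= 20 E= 0
 0.7 0.1 0.1 0.1
 0.1 0.1 0.7 0.1
"""
motifs = parse_minimal_meme(SAMPLE)
```

The point of the design is that optional sections are tolerated when present
and not demanded when absent, so files written by programs other than MEME
still load.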

Fixing these problems will make this functionality a whole lot more useful.
Great if someone has time now, otherwise I'll put it down as a future
student project.
-- 
Philip Machanick (still in Australia for a while; note new mail address)
Rhodes University, Grahamstown 6140, South Africa
http://opinion-nation.blogspot.com/
+61-7-3871-0963 mobile +61 42 234 6909 skype philipmach


