[Biopython] some problems with motif processing
Philip Machanick
philip.machanick at gmail.com
Tue Apr 12 22:43:30 EDT 2011
I don't have time to fix any of this now, and I have other options for my
current project, but in case anyone else is maintaining the Motif code:
1. The score_hit function is wrong: if it hits a character that isn't in
the alphabet, it simply skips it. If, e.g., you hit a stretch of repeat-masked
sequence that's all Ns, this gives that position the maximum possible score.
Two possible fixes: don't score any site that contains ambiguous characters,
or score each ambiguous character as the average of the characters it
represents.
2. The MEME parser is way too strict. It demands many features of a MEME
file that aren't in the minimal MEME motif spec
(http://meme.sdsc.edu/meme4_6_1/doc/meme-format.html).
Since MEME isn't the only program that writes motifs in MEME format, this
restricts the code to working pretty much only with MEME output.
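To make point 1 concrete, here is a rough sketch in plain Python of the skipping behaviour and the two suggested fixes. It deliberately does not use the Biopython API: the PSSM (a hypothetical list of per-column dicts holding made-up log-odds scores), the IUPAC table and the function names score_skip, score_strict and score_average are all illustrative assumptions, not Bio.Motif code.

```python
# Hypothetical log-odds PSSM: one dict per motif column (base -> score).
# The numbers are made up purely for illustration.
PSSM = [
    {"A": 1.2, "C": -1.5, "G": -0.8, "T": -2.0},
    {"A": -1.0, "C": 1.6, "G": -1.2, "T": -0.9},
    {"A": -0.5, "C": -1.1, "G": 1.4, "T": -1.7},
]

# IUPAC nucleotide codes mapped to the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def _columns(pssm, site):
    """Pair each PSSM column with the upper-cased letter at that site position.

    Upper-casing means soft-masked (lowercase) bases are scored like normal ones.
    """
    if len(site) != len(pssm):
        raise ValueError("site length must equal motif length")
    return zip(pssm, site.upper())


def score_skip(pssm, site):
    """The behaviour described above: letters not in the alphabet are skipped."""
    total = 0.0
    for column, base in _columns(pssm, site):
        if base in column:  # an N is silently ignored, never penalised
            total += column[base]
    return total


def score_strict(pssm, site):
    """Fix 1: refuse to score (return None) any site with a non-ACGT letter."""
    pairs = list(_columns(pssm, site))
    if any(base not in column for column, base in pairs):
        return None
    return sum(column[base] for column, base in pairs)


def score_average(pssm, site):
    """Fix 2: score an ambiguous letter as the mean of the bases it represents."""
    total = 0.0
    for column, base in _columns(pssm, site):
        bases = IUPAC.get(base)
        if bases is None:
            return None  # not a recognised IUPAC code at all
        total += sum(column[b] for b in bases) / len(bases)
    return total
```

With these made-up numbers, score_skip gives "NNN" a score of 0.0, higher than a genuine but poor match such as "TTT" (-4.6), while score_strict rejects any site containing an N and score_average charges an N the column mean. Rejecting is the simplest option for masked repeats; averaging keeps partially ambiguous sites in play.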
Fixing these problems would make this functionality a whole lot more useful.
It would be great if someone has time now; otherwise I'll put it down as a
future student project.
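For point 2, a sketch of a more lenient reader, based on the minimal MEME motif format at the spec URL given in point 2: it only looks for MOTIF lines and the letter-probability matrix rows that follow them, and ignores everything else (version line, ALPHABET=, strands:, background frequencies, URL lines) if present. The function name parse_minimal_meme and its return value (a dict mapping motif name to a list of probability rows) are hypothetical, not the API of Biopython's MEME parser.

```python
import re


def _numeric_row(line):
    """Return the line's fields as floats, or None if it is not a matrix row."""
    fields = line.split()
    if not fields:
        return None
    try:
        return [float(f) for f in fields]
    except ValueError:
        return None


def parse_minimal_meme(text):
    """Map each motif name to its rows of letter probabilities.

    Only 'MOTIF' lines and 'letter-probability matrix:' blocks are required.
    If the header gives w=, at most that many rows are read; otherwise rows
    are read until the first non-numeric line.
    """
    motifs = {}
    lines = text.splitlines()
    name = None
    i = 0
    while i < len(lines):
        line = lines[i].strip()
        if line.startswith("MOTIF"):
            parts = line.split()
            name = parts[1] if len(parts) > 1 else "motif%d" % (len(motifs) + 1)
        elif line.startswith("letter-probability matrix") and name is not None:
            match = re.search(r"w=\s*(\d+)", line)
            width = int(match.group(1)) if match else None
            rows = []
            # Consume the numeric rows that immediately follow the header.
            while i + 1 < len(lines):
                row = _numeric_row(lines[i + 1])
                if row is None or (width is not None and len(rows) == width):
                    break
                rows.append(row)
                i += 1
            motifs[name] = rows
            name = None
        i += 1
    return motifs
```

This is deliberately more forgiving than the spec itself (it does not even require the version line), which is the opposite trade-off to the current parser; a real fix would probably sit somewhere in between.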
--
Philip Machanick (still in Australia for a while; note new mail address)
Rhodes University, Grahamstown 6140, South Africa
http://opinion-nation.blogspot.com/
+61-7-3871-0963 mobile +61 42 234 6909 skype philipmach