[Biojava-l] [HMM] detecting several instances of the same motif fails
Evert-Jan Blom
e.j.blom at rug.nl
Tue May 22 12:22:10 UTC 2007
Dear all,
Following the CookBook page
http://www.biojava.org/wiki/BioJava:CookBook:DP:HMM we implemented a
profile HMM in our application to detect instances of regulatory motifs.
As a test, we built a model from 10 identical copies of the sequence
TGCTGCTGCGGGCCC.
The model is then trained with a BaumWelchTrainer and decoded with each of
ScoreType.ODDS, ScoreType.PROBABILITY and ScoreType.NULL_MODEL.
The sequence we use for testing contains two instances of the motif, one
perfect and one with a single mismatch:
AAAATGCTGCTGCGGGCCCAAAAATGCTGCGGCGGGCCCAAA
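As a sanity check on where the two instances sit, independent of any HMM, a plain sliding-window mismatch count can be sketched as follows. This is neither BioJava nor HMMER code: the class MotifScan and its methods are hypothetical names introduced only for illustration, and the check allows substitutions only, not insertions or deletions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of BioJava or HMMER.
final class MotifScan {

    /** Number of positions at which two equal-length strings differ. */
    static int mismatches(String a, String b) {
        int n = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                n++;
            }
        }
        return n;
    }

    /** 0-based start offsets of all windows within maxMismatches of the motif. */
    static List<Integer> findHits(String seq, String motif, int maxMismatches) {
        List<Integer> hits = new ArrayList<>();
        for (int start = 0; start + motif.length() <= seq.length(); start++) {
            String window = seq.substring(start, start + motif.length());
            if (mismatches(window, motif) <= maxMismatches) {
                hits.add(start);
            }
        }
        return hits;
    }
}
```

With the motif TGCTGCTGCGGGCCC and the test sequence above, findHits(seq, motif, 1) returns the 0-based offsets 4 and 24, and findHits(seq, motif, 0) returns only 4.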
The original HMMER package reports 2 instances of the motif in the test
string, whereas BioJava gives very strange results.
With ScoreType.ODDS, only the second motif is detected:
{AAAATGCTGCTGCGGGCCCAAAAATGCTGCGGCGGGCCCAAA}
Log Odds = 7.65779871993799
i-0
i-0
i-0
i-0
i-0
i-0
i-0
i-0
i-0
i-0
i-0
i-0
i-0
i-0
i-0
i-0
i-0
i-0
i-0
i-0
i-0
i-0
i-0
i-0
m-1
m-2
m-3
m-4
m-5
m-6
d-7
m-8
m-9
m-10
m-11
m-12
m-13
d-14
d-15
i-15
i-15
i-15
i-15
i-15
i-15
With the second score type, only the first motif is detected:
Prob = -95.9806747848816
i-0
i-0
i-0
i-0
m-1
m-2
m-3
m-4
m-5
m-6
m-7
m-8
m-9
m-10
i-10
i-10
i-10
i-10
i-10
i-10
i-10
i-10
i-10
i-10
i-10
i-10
i-10
i-10
m-11
i-11
m-12
i-12
i-12
i-12
m-13
m-14
m-15
i-15
i-15
i-15
i-15
i-15
Finally the null model score type, whose result seems to make no sense at all:
Null = -94.11166855273558
m-1
m-2
m-3
m-4
m-5
m-6
m-7
m-8
m-9
m-10
m-11
m-12
m-13
m-14
m-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
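To read the state paths above against the test sequence, it helps to know which residues each path assigns to match states. The following is a minimal sketch under two assumptions: labels beginning with "m" and "i" each consume one residue, and labels beginning with "d" are silent delete states, as is usual for a profile HMM. PathSpan and matchSpan are hypothetical names, not BioJava API.

```java
import java.util.List;

// Hypothetical helper, not part of BioJava.
final class PathSpan {

    /**
     * Returns {first, last}: the 0-based sequence indices of the first and
     * last residues emitted by match states, or null if the path contains
     * no match state. Assumes labels start with 'm', 'i' or 'd' and that
     * 'd' (delete) states emit nothing.
     */
    static int[] matchSpan(List<String> path) {
        int pos = 0;      // index of the next residue to be consumed
        int first = -1;
        int last = -1;
        for (String state : path) {
            char kind = state.charAt(0);
            if (kind == 'd') {
                continue; // delete states consume no residue
            }
            if (kind == 'm') {
                if (first < 0) {
                    first = pos;
                }
                last = pos;
            }
            pos++;        // 'm' and 'i' states each consume one residue
        }
        return first < 0 ? null : new int[] {first, last};
    }
}
```

Applied to the three paths listed above, this gives match-state spans of 24 to 35 for the odds path, 4 to 36 for the probability path (with the long i-10 run in between), and 0 to 14 for the null model path.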
Is there an option to detect the second motif in the same run, as the
original HMMER does? Or am I missing some option that is not described
in the tutorial?
Thanks in advance
E.J.Blom