[Biojava-l] [HMM] detecting several instances of the same motif fails

Evert-Jan Blom e.j.blom at rug.nl
Tue May 22 12:22:10 UTC 2007


Dear all,

Following the CookBook page
http://www.biojava.org/wiki/BioJava:CookBook:DP:HMM, we implemented a
profile HMM in our application to detect regulatory motif instances. To
test it, we built a model from 10 identical sequences, each of them
TGCTGCTGCGGGCCC. The model was then trained with a BaumWelchTrainer and
decoded using ScoreType.ODDS, ScoreType.PROBABILITY and
ScoreType.NULL_MODEL.

The sequence we use for testing contains two instances of the motif: a
perfect one and one with a single mismatch.

AAAATGCTGCTGCGGGCCCAAAAATGCTGCGGCGGGCCCAAA
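As a reference point for the decoder output below, a brute-force scan
shows where the two instances sit. This is a minimal sketch, not part of
BioJava and not an HMM: it reports every 15-base window of the test
sequence that is within a chosen Hamming distance of the training
sequence. MotifScan, Hit and scan are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical ground-truth check: exhaustive Hamming-distance scan (not an HMM). */
public class MotifScan {

    /** 0-based start offset of a window and its number of mismatches to the motif. */
    record Hit(int start, int mismatches) {}

    static List<Hit> scan(String seq, String motif, int maxMismatches) {
        List<Hit> hits = new ArrayList<>();
        for (int start = 0; start + motif.length() <= seq.length(); start++) {
            int mismatches = 0;
            // Stop comparing early once the window is already over the threshold.
            for (int i = 0; i < motif.length() && mismatches <= maxMismatches; i++) {
                if (seq.charAt(start + i) != motif.charAt(i)) {
                    mismatches++;
                }
            }
            if (mismatches <= maxMismatches) {
                hits.add(new Hit(start, mismatches));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String motif = "TGCTGCTGCGGGCCC";
        String seq = "AAAATGCTGCTGCGGGCCCAAAAATGCTGCGGCGGGCCCAAA";
        for (Hit h : scan(seq, motif, 1)) {
            System.out.println("offset " + h.start() + ": "
                    + seq.substring(h.start(), h.start() + motif.length())
                    + " (" + h.mismatches() + " mismatch(es))");
        }
    }
}
```

With a threshold of one mismatch it reports offset 4 (exact match) and
offset 24 (one mismatch, T to G at the seventh base), i.e. the two
instances described above.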

The original HMMER package reports two instances of the motif in this
test sequence, whereas BioJava yields very strange results.

Results using ScoreType.ODDS (only the second motif is detected):

{AAAATGCTGCTGCGGGCCCAAAAATGCTGCGGCGGGCCCAAA}
Log Odds = 7.65779871993799
i-0
i-0
i-0
i-0
i-0
i-0
i-0
i-0
i-0
i-0
i-0
i-0
i-0
i-0
i-0
i-0
i-0
i-0
i-0
i-0
i-0
i-0
i-0
i-0
m-1
m-2
m-3
m-4
m-5
m-6
d-7
m-8
m-9
m-10
m-11
m-12
m-13
d-14
d-15
i-15
i-15
i-15
i-15
i-15
i-15

Results using the second score type, ScoreType.PROBABILITY (only the
first motif is detected):

Prob = -95.9806747848816
i-0
i-0
i-0
i-0
m-1
m-2
m-3
m-4
m-5
m-6
m-7
m-8
m-9
m-10
i-10
i-10
i-10
i-10
i-10
i-10
i-10
i-10
i-10
i-10
i-10
i-10
i-10
i-10
m-11
i-11
m-12
i-12
i-12
i-12
m-13
m-14
m-15
i-15
i-15
i-15
i-15
i-15

Results using ScoreType.NULL_MODEL, which seem to make no sense at all:
Null = -94.11166855273558
m-1
m-2
m-3
m-4
m-5
m-6
m-7
m-8
m-9
m-10
m-11
m-12
m-13
m-14
m-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
i-15
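To compare these three paths, it helps to map each one back onto
sequence coordinates. The sketch below assumes the printed labels have
the form type-column, where m- (match) and i- (insert) states each emit
one base and d- (delete) states are silent, as in a standard profile
HMM. PathSpan and its helpers are hypothetical names, not BioJava API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical helper: maps a printed state path ("i-0", "m-1", "d-7", ...) onto
 * 0-based sequence offsets. Assumes m- and i- states emit one base each and
 * d- states emit nothing.
 */
public class PathSpan {

    /** Number of bases the path emits; should equal the sequence length. */
    static int emittedCount(List<String> path) {
        int n = 0;
        for (String label : path) {
            if (label.charAt(0) != 'd') {
                n++;
            }
        }
        return n;
    }

    /** Returns {first, last} offsets emitted by match states, or null if there are none. */
    static int[] matchSpan(List<String> path) {
        int pos = 0;                       // offset of the next base to be emitted
        int first = -1;
        int last = -1;
        for (String label : path) {
            char type = label.charAt(0);
            if (type == 'd') {
                continue;                  // delete states consume no base
            }
            if (type == 'm') {
                if (first < 0) {
                    first = pos;
                }
                last = pos;
            }
            pos++;                         // match and insert states consume one base
        }
        return first < 0 ? null : new int[] {first, last};
    }

    /** The same label repeated n times, e.g. run("i-0", 24). */
    static List<String> run(String label, int n) {
        return Collections.nCopies(n, label);
    }

    /** Consecutive match labels m-from .. m-to. */
    static List<String> matches(int from, int to) {
        List<String> out = new ArrayList<>();
        for (int k = from; k <= to; k++) {
            out.add("m-" + k);
        }
        return out;
    }

    public static void main(String[] args) {
        String seq = "AAAATGCTGCTGCGGGCCCAAAAATGCTGCGGCGGGCCCAAA";
        // The ScoreType.ODDS path from this post, rebuilt with the helpers above.
        List<String> odds = new ArrayList<>(run("i-0", 24));
        odds.addAll(matches(1, 6));
        odds.add("d-7");
        odds.addAll(matches(8, 13));
        odds.addAll(List.of("d-14", "d-15"));
        odds.addAll(run("i-15", 6));

        int[] span = matchSpan(odds);
        System.out.println("bases emitted: " + emittedCount(odds));
        System.out.println("match span: " + Arrays.toString(span) + " "
                + seq.substring(span[0], span[1] + 1));
    }
}
```

Under those assumptions all three paths emit exactly 42 bases. The ODDS
path places its match states over offsets 24 to 35 (the second
instance), the PROBABILITY path over 4 to 36 with a long insert run in
between, and the NULL_MODEL path over 0 to 14. In each path the column
index only increases, so every path passes through the model columns
once and never returns to column 1.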

Is there a way to detect both motifs in the same run, as the original
HMMER does? Or am I missing some option that is not described in the
tutorial?

Thanks in advance

E.J.Blom
