[Biojava-dev] OutOfMemory when using a big Weight Matrix to find
Motifs in 1.3.1 but not in 1.3
mark.schreiber at group.novartis.com
Wed Jan 28 01:36:27 EST 2004
Hi Again,
I've found the problem.
The code starting at line 153 in DP needs changing from
    for (int c = 0; c < cols; c++) {
      score += scoreType.calculateScore(matrix.getColumn(c),
                                        symList.symbolAt(c + start));
    }
to
    for (int c = 0; c < cols; c++) {
      score += Math.log(scoreType.calculateScore(matrix.getColumn(c),
                                                 symList.symbolAt(c + start)));
    }
so it will be consistent with the scoreWeightMatrix() method that doesn't
use a ScoreType. Actually, changing it to a log will prevent underflow
errors on large WeightMatrices. Interestingly, the WeightMatrixAnnotator
converts it back to a normal probability with a Math.exp() operation
before annotation. I'm sure it doesn't need to be this convoluted??
Can someone add that fix to CVS? I'm having trouble with CVS just now so
I can't.
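As a rough illustration of the underflow point (a sketch only, not BioJava code: the class and method names are made up, and each matrix column is reduced to a single probability of 0.25), compare summing raw probabilities, multiplying them, and summing their logs:

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; nothing here is part of the BioJava API.
public class LogScoreSketch {

    // Adds raw per-column probabilities, as the unpatched loop does.
    static double sumOfProbabilities(double[] columnProbs) {
        double score = 0.0;
        for (double p : columnProbs) {
            score += p;
        }
        return score;
    }

    // Multiplies per-column probabilities; underflows for long matrices.
    static double productOfProbabilities(double[] columnProbs) {
        double score = 1.0;
        for (double p : columnProbs) {
            score *= p;
        }
        return score;
    }

    // Adds log-probabilities, as the patched loop does.
    static double sumOfLogs(double[] columnProbs) {
        double score = 0.0;
        for (double p : columnProbs) {
            score += Math.log(p);
        }
        return score;
    }

    public static void main(String[] args) {
        // Assumed toy input: a 2000-column matrix, each column scoring 0.25.
        double[] columnProbs = new double[2000];
        Arrays.fill(columnProbs, 0.25);

        System.out.println("sum of probabilities: " + sumOfProbabilities(columnProbs));
        System.out.println("product:              " + productOfProbabilities(columnProbs));
        System.out.println("sum of logs:          " + sumOfLogs(columnProbs));
    }
}
```

With this input the plain sum grows with the number of columns, so it clears a threshold like 0.1 at every position, while the product underflows to 0.0. The log sum stays a finite negative number that can still be compared between positions.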
Mark Schreiber
Principal Scientist (Bioinformatics)
Novartis Institute for Tropical Diseases (NITD)
1 Science Park Road
#04-14 The Capricorn
Singapore 117528
phone +65 6722 2973
fax +65 6722 2910
Bruno Aranda - e-BioIntel <elmosca at terra.es>
Sent by: biojava-dev-bounces at portal.open-bio.org
01/27/2004 07:30 PM
Please respond to biodev
To: biojava-dev at biojava.org
cc:
Subject: [Biojava-dev] OutOfMemory when using a big Weight Matrix to find Motifs in
1.3.1 but not in 1.3
Hi Mark,
I've tried to increase the memory heap to 512 MB but my little Linux box
almost died... However, I've found the origin of the problem. The class I
tested followed the steps of your wonderful tutorial, and I used the low
score threshold of "0.1". With the new ScoreType system I got too many
results for my motif (every base in the sequence), so too many features
were created and the OutOfMemoryError was raised.
Now, for instance, I can set a threshold of 4000 (?) and I get some
results (some of them with a probability higher than 5000 (?)), but I
don't understand why probability scores are that high. Well, I will send
a beer truck to your home if you can explain which probability is used
for these score matrices ;-). Thanks,
Bruno Aranda
ebioIntel