[Biojava-dev] OutOfMemory when using a big Weight Matrix to find
Motifs in 1.3.1 but not in 1.3
mark.schreiber at group.novartis.com
Wed Jan 28 01:10:43 EST 2004
Hi Bruno -
WeightMatrices used to be scored by the DP class using the following
method (called from within the WeightMatrixAnnotator):
public static double scoreWeightMatrix(
    WeightMatrix matrix, SymbolList symList, int start)
    throws IllegalSymbolException {
  double score = 0;
  int cols = matrix.columns();
  for (int c = 0; c < cols; c++) {
    score += Math.log(
        matrix.getColumn(c).getWeight(symList.symbolAt(c + start)));
  }
  return score;
}
They are now scored using this method from the DP class, with
ScoreType.PROBABILITY as the default:
public static double scoreWeightMatrix(
    WeightMatrix matrix,
    SymbolList symList,
    ScoreType scoreType,
    int start)
    throws IllegalSymbolException {
  double score = 0;
  int cols = matrix.columns();
  for (int c = 0; c < cols; c++) {
    score += Math.log(scoreType.calculateScore(
        matrix.getColumn(c), symList.symbolAt(c + start)));
  }
  return score;
}
As far as I can tell ScoreType.PROBABILITY does exactly the same thing as
before. It returns the weight of the symbol at that position. I'm not sure
I understand what is going on.
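As a sanity check on the two methods above, here is a minimal standalone
sketch in plain Java (no BioJava). The ScoreType interface, the PROBABILITY
constant, the Map-based columns and the toy matrix are stand-ins made up for
illustration, not the real BioJava classes. It assumes PROBABILITY simply
returns the column weight, in which case both variants give the same
sum of logs.

```java
import java.util.List;
import java.util.Map;

public class WeightMatrixScoreSketch {

    /** Hypothetical stand-in for BioJava's ScoreType; not the real interface. */
    interface ScoreType {
        double calculateScore(Map<Character, Double> column, char symbol);
    }

    /** Assumption: PROBABILITY just returns the weight of the symbol in the column. */
    static final ScoreType PROBABILITY = (column, symbol) -> column.get(symbol);

    /** Old style: sum of log(weight) over the window; start is 1-based. */
    static double scoreOld(List<Map<Character, Double>> matrix, String seq, int start) {
        double score = 0;
        for (int c = 0; c < matrix.size(); c++) {
            score += Math.log(matrix.get(c).get(seq.charAt(c + start - 1)));
        }
        return score;
    }

    /** New style: same loop, but the per-column value comes from the ScoreType. */
    static double scoreNew(List<Map<Character, Double>> matrix, String seq,
                           ScoreType scoreType, int start) {
        double score = 0;
        for (int c = 0; c < matrix.size(); c++) {
            score += Math.log(
                scoreType.calculateScore(matrix.get(c), seq.charAt(c + start - 1)));
        }
        return score;
    }

    public static void main(String[] args) {
        // Toy two-column matrix; each column's weights sum to 1.
        List<Map<Character, Double>> matrix = List.of(
            Map.of('a', 0.7, 'c', 0.1, 'g', 0.1, 't', 0.1),
            Map.of('a', 0.1, 'c', 0.1, 'g', 0.7, 't', 0.1));
        String seq = "tagc";

        // Slide the matrix along the sequence and print both scores per window.
        for (int start = 1; start + matrix.size() - 1 <= seq.length(); start++) {
            double oldScore = scoreOld(matrix, seq, start);
            double newScore = scoreNew(matrix, seq, PROBABILITY, start);
            // exp(score) recovers the product of the column weights.
            System.out.println(start + "\t" + oldScore + "\t" + newScore
                + "\t" + Math.exp(newScore));
        }
    }
}
```

In this sketch every weight is at most 1, so each log term is at most 0 and
the summed score can never be positive; exp(score) gives back the product of
the column weights, a value between 0 and 1.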
- Mark
Bruno Aranda - e-BioIntel <elmosca at terra.es>
Sent by: biojava-dev-bounces at portal.open-bio.org
01/27/2004 07:30 PM
Please respond to biodev
To: biojava-dev at biojava.org
cc:
Subject: [Biojava-dev] OutOfMemory when using a big Weight Matrix to find Motifs in
1.3.1 but not in 1.3
Hi Mark,
I've tried to increase the memory heap to 512 MB but my little Linux
box almost died... However, I've found the origin of the problem. The class I
tested followed the steps of your wonderful tutorial, and I used the low
score threshold of "0.1". With the new ScoreType system I got too many
results for my motif (every base in the sequence), so too many features
were created and the OutOfMemoryError was raised.
Now, for instance, I can set a threshold of 4000 (?) and I get some
results (some of them with a probability higher than 5000 (?)), but I
don't understand why probability scores are that high. Well, I will send
a beer truck to your home if you can explain which probability is used
for these score matrices ;-). Thanks,
Bruno Aranda
ebioIntel
_______________________________________________
biojava-dev mailing list
biojava-dev at biojava.org
http://biojava.org/mailman/listinfo/biojava-dev