[Biojava-dev] SymbolTokenizer for Meme class

mark.schreiber at novartis.com mark.schreiber at novartis.com
Tue Jul 12 04:51:30 EDT 2005


>I have used MEME for DNA sequences and produced text output (no html).
>BioJava version: 1.3

I would strongly recommend upgrading to BioJava 1.4 (now the official 
release) unless you have a strong attachment to version 1.3, which is 
over 2 years old now. Looking in CVS, at least one change was made to 
update the file to read MEME v3 output. That should fix the bug you see 
with "log"; I believe I made the same change you did.

>I don't know how Java's StreamTokenizer works, but the Meme constructor 
>seems to look for the keyword "ALPHABET". Then I guess it looks for the 
>first TT_WORD after that keyword, which is ACGT
>(ALPHABET: ACGT)
>It breaks when trying to build a SimpleSymbolList from ACGT using the
>SymbolTokenization I gave as parameter.
>
>However it works when I construct the parser in another way:
>SymbolTokenization ct = DNATools.getDNA().getTokenization("token");
>
>instead of
>
>SymbolTokenization ct = new CharacterTokenization(DNATools.getDNA(),true);

Sorry, I didn't read your email carefully. As you have discovered, the 
technique you use is the best way to get a SymbolTokenization. I should 
put this in BioJava in Anger.
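
For anyone wondering what the keyword scan described in the quoted text 
looks like, here is a minimal sketch using only java.io.StreamTokenizer. 
It is not the actual Meme constructor code; the class and method names 
(AlphabetScan, firstWordAfterAlphabet) are made up for illustration. It 
only shows why "ACGT" is the token that ends up being handed to the 
SymbolTokenization.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of BioJava: mimics the "find ALPHABET,
// then take the next word" scan described in the quoted message.
final class AlphabetScan {

    /** Returns the first word token after the keyword "ALPHABET", or null if none. */
    static String firstWordAfterAlphabet(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        boolean seenKeyword = false;
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                // With the default syntax table ':' is an ordinary character,
                // so it arrives as its own token and is skipped here.
                if (st.ttype != StreamTokenizer.TT_WORD) {
                    continue;
                }
                if (seenKeyword) {
                    return st.sval;
                }
                if ("ALPHABET".equals(st.sval)) {
                    seenKeyword = true;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(firstWordAfterAlphabet("ALPHABET: ACGT"));
    }
}
```

The word returned here is the string that then has to be turned into 
symbols, which is the step where the choice of SymbolTokenization matters.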


>There is another thing that does not work.
>The column distributions of the weight matrix class
>are not allowed to get negative values. On the one hand this is 
>semantically correct since it is a probability distribution. On the 
>other hand the Meme constructor tries to read the log-odds-score matrix.
>(looks for keyword "log"). I've changed the constructor (at my local 
>installation) to look for keyword "letter". Now it reads the 
>letter-probability matrix which is also given in the result files.

I believe this is fixed in BioJava 1.4 (see above). Let me know if this 
doesn't work.

>Is there a class for log-odds matrices?

Not really. WeightMatrices are backed by Distributions, which are not 
log-odds. However, WeightMatrices can use a log-odds ScoreType, which 
calculates the log odds of a Distribution versus its null Distribution.
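
To make the distinction concrete, here is a rough sketch in plain Java 
of how a log-odds score relates to a probability and a null 
(background) probability. It does not use the BioJava API; the LogOdds 
class and the choice of a base-2 logarithm are assumptions for 
illustration only. It shows why log-odds values can be negative while 
the probabilities stored in a Distribution cannot.

```java
// Hypothetical helper, not part of BioJava.
final class LogOdds {

    /** log2(p / q): positive when p > q, negative when p < q. */
    static double logOdds(double p, double q) {
        return Math.log(p / q) / Math.log(2.0);
    }

    /** Converts one weight-matrix column of probabilities to log-odds scores. */
    static double[] column(double[] probs, double[] nullProbs) {
        double[] scores = new double[probs.length];
        for (int i = 0; i < probs.length; i++) {
            scores[i] = logOdds(probs[i], nullProbs[i]);
        }
        return scores;
    }

    public static void main(String[] args) {
        double[] probs = {0.5, 0.125, 0.125, 0.25};      // A, C, G, T
        double[] uniform = {0.25, 0.25, 0.25, 0.25};     // null model
        for (double s : column(probs, uniform)) {
            System.out.println(s);
        }
    }
}
```

So the probabilities can be stored once and scored as log-odds on 
demand, rather than storing the (possibly negative) scores themselves.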

- Mark




