[Biojava-l] Generalized HMM in biojava?
Matthew Pocock
matthew.pocock at ncl.ac.uk
Mon Jan 23 06:58:41 EST 2006
On Monday 23 January 2006 11:43, wendy wong wrote:
> > OK - so you have a single HMM that emits whole columns of an alignment?
> > Usually to align three sequences, you would use a 3-head HMM where each
> > head emits one of the sequences.
>
> I am not sure it would work with a 3-head HMM, because here the
> sequences are related to each other by the phylogenetic tree. So for
> a fixed sequence order, the column ACC would have a different
> likelihood than CCA.
So you already have the alignment from a phylogenetic program and you are
using biojava to compute some other statistic over it?
>
> > You shouldn't be getting exceptions. This is almost certainly a bug.
> > Could you send the stack-trace?
>
> sure, here it is:
Thanks. I am not around until the end of the week. Could somebody take a
look at this?
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
>   at org.biojava.bio.symbol.LinearAlphabetIndex.buildIndex(LinearAlphabetIndex.java:108)
>   at org.biojava.bio.symbol.LinearAlphabetIndex.<init>(LinearAlphabetIndex.java:66)
>   at org.biojava.bio.symbol.AlphabetManager.getAlphabetIndex(AlphabetManager.java:1796)
>   at edu.cornell.bscb.evopromoter.TestingFunctions.main(TestingFunctions.java:61)
>
> I think I don't need the full alphabet of getDNA(), which has 16
> symbols. I reduced it to 5 (A,T, C, G, N), so I can have a state that
> contains more sites...
While this is a good idea in principle, it will actually be counter-productive
in BioJava. The DNA alphabet only has 4 'real' symbols - the nucleotides. The
other symbols (n included) are 'virtual' symbols constructed from sets of the
'real' symbols. By introducing 'N' as a first-class symbol, you have actually
grown the problem from 4^n to 5^n, which is probably not what you wanted :-)
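
To put rough numbers on that growth, here is a minimal sketch in plain Java. It does not use BioJava at all; the class name AlphabetGrowth and the method columnSymbols are made up for illustration, and it assumes n is the number of positions packed into one emitted symbol, so that k atomic symbols give k^n combinations.

```java
public class AlphabetGrowth {

    // Number of distinct combinations when each of n positions holds one of
    // `atomicSymbols` first-class symbols, i.e. atomicSymbols^n.
    // Math.multiplyExact throws ArithmeticException on long overflow.
    static long columnSymbols(int atomicSymbols, int n) {
        long total = 1;
        for (int i = 0; i < n; i++) {
            total = Math.multiplyExact(total, (long) atomicSymbols);
        }
        return total;
    }

    public static void main(String[] args) {
        // Columns: n, size with 4 atomic symbols, size with 5 atomic symbols.
        for (int n = 1; n <= 6; n++) {
            System.out.println(n + "\t" + columnSymbols(4, n) + "\t" + columnSymbols(5, n));
        }
    }
}
```

For three positions this gives 64 combinations with the four nucleotides against 125 once 'N' is counted as an atomic symbol, and the gap widens with every extra position.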
>
> thanks,
> wendy
Matthew