[Biojava-l] what it needs to do this with (Bio)Java...
Matthew Pocock
mrp@sanger.ac.uk
Thu, 20 Apr 2000 17:53:20 +0100
Gerald,
Gerald Loeffler wrote:
> o given 2 "NT-Sequence" objects, construct the optimal alignment
> between them, return it as an "Alignment" object, and also give a Score
> for the alignment. (I know, this is gonna be slow, but it's fine for my
> application...)--
I think we should end up with some "well known" alignment algorithms included for
free as public-static-final variables somewhere. For now, it is easy enough to
build an alignment algorithm that look like this:
i2\
|\/
start/end -*- match\
| \/
i1\
\/
// just alias the singleton localy to save typing fatigue
AlphabetManager am = AlphabetManager.instance();
// define what type of things you are going to align
Alphabet DNA = DNATools.getAlphabet(); // basicaly this is DNA
Alphabet gappedDNA = am.getGappedAlphabet(DNA); // with gaps
CrossProductAlphabet pairDNA = am.getCrossProductAlphabet(
Collections.nCopies(2, gappedDNA)
);
Residue gap = am.getGapResidue();
// the match state will consume a token from each sequence
EmissionState match = StateFactory.DEFAULT.createState(
pairDNA, new int [] { 1, 1 }, "Match"
);
// insert1 consumes a token from sequence 1 & skips one in sequence 2
EmissionState i1 = StateFactory.DEFAULT.createState(
pairDNA, new int [] { 1, 0 }, "Insert-1"
);
// insert2 consumes a token from sequence 2 & skips one in sequence 1
EmissionState i2 = StateFactory.DEFAULT.createState(
pairDNA, new int [] { 0, 1 }, "Insert-2"
);
Residue [] rA = new Residue[2];
List rL = Arrays.asList(rA);
// set the emission probabilities to some sensible set of values
for(Iterator i = DNA.iterator(); i.hasNext(); ) {
rA[0] = (Residue) i.next();
rA[1] = gap;
i1.setWeight(pairDNA.getResidue(rL), some_suitable_score);
for(Iterator j = DNA.iterator(); j.hasNext(); ) {
rA[1] = (Residue) j.next();
match.setWeight(pairDNA.getResidue(rL), some_suitable_score);
}
rA[1] = rA[0];
rA[0] = gap;
i2.setWeight(pairDNA.getResidue(rL), some_suitable_score);
}
DotState joiner = new SimpleDotState("Joiner");
// create the model
SimpleMarkovModel pairwise = new SimpleMarkovModel(
2, pairDNA
);
// add the states
pairwise.addState(i1);
pairwise.addState(i2);
pairwise.addState(match);
pairwise.addState(joiner);
// wire joiner to/from the beginning/end state
pairwise.createTransition(pairwise.magicalState(), joiner);
pairwise.setTransitionScore(pairwise.magicalState(), joiner, 0);
pairwise.createTransition(joiner, pairwise.magicalState());
pairwise.setTransitionScore(joiner, pairwise.magicalState(), end);
// wire joiner to the match state
pairwise.createTransition(joiner, match);
pairwise.setTransitionScore(joiner, match, matchPrior);
pairwise.createTransition(match, match);
pairwise.setTransitionScore(match, match, matchExcent);
paiwrise.createTransition(match, joiner);
pairwise.setTransitionScore(match, joiner, matchEnd);
// wire joiner to i1
paiwrise.createTransition(joiner, i1);
pairwise.setTransitionScore(joiner, i1, gapPenalty);
pairwise.createTransition(i1, i1);
pairwise.setTransitionScore(i1, i1, gapExtend);
paiwrise.createTransition(i1, joiner);
pairwise.setTransitionScore(i1, joiner, gapEnd);
//wire joiner to i2
pairwise.createTransition(joiner, i2);
pairwise.setTransitionScore(joiner, i2, gapPenalty);
pairwise.createTransition(i2, i2);
pairwise.setTransitionScore(i2, i2, gapExtend);
paiwrise.createTransition(i2, joiner);
pairwise.setTransitionScore(i2, joiner, gapEnd);
// create the DP object that implements this algorithm
DP pairwiseDP = DPFactory.createDP(pairwise);
// now perform an alignment
// you can re-use the DP object many times
StatePath sp = pairwiseDP.viterbi(new ResidueList [] { seq1, seq2 });
Of course, you need to figure out what your probabilities/lods are going to be for
the emission probabilities & also the transition scores. I always end up working
with probabilities estimated from training data so I haven't yet delved into the
magic of blosum & the like.
Have fun
Matthew
Joon: You're out of your tree
Sam: It wasn't my tree
(Benny & Joon)