[Dynamite] July (attn Ewan)
Guy Slater
guy@ebi.ac.uk
Tue, 23 May 2000 20:43:32 +0100 (BST)
On Fri, 19 May 2000, Ian Holmes wrote:
> I actually found something at
>
> http://www.iro.umontreal.ca/~lapalme/Algorithms-functional.html
>
> It has general DP code, with applications for matrix multiplication and
> other things, but not string comparison.
>
> I'm going to get this book, it looks cool..
I've had a play around with Haskell now ...
it requires a /very/ different way of thinking.
Unsuprisingly, (I suppose), you can't just describe
the recurrence relations, because you end up
with exponential running time. You need to describe
the dynamic programming itself.
/* ie. naive edit distance calculation C: */
static int naiveEditDistance(char *a, char *b){
register int match, insert, delete;
if(!*a) return strlen(b);
if(!*b) return strlen(a);
match = ((*a == *b)?0:1) + naive_edit(a+1, b+1);
insert = 1 + naive_edit(a+1, b);
delete = 1 + naive_edit(a, b+1);
return (insert < delete)
? ((insert < match)?insert:match)
: ((delete < match)?delete:match);
}
-- the same in Haskell:
naiveEdit :: Eq a => [a] -> [a] -> Int
naiveEdit xs [] = length xs
naiveEdit [] ys = length ys
naiveEdit (x:xs) (y:ys)
= minThree (matchPair x y + naiveEdit xs ys)
(1 + naiveEdit xs (y:ys))
(1 + naiveEdit (x:xs) ys)
Anyway, these are useless,
so I've attempted a proper implementation ...
------------------
--- BEGIN CODE ---
----------------------------------------
-- Edit distance calculations in haskell
-- Guy St.C. Slater. 20:03:2000
----------------------------------------
-- Find score for equivalenced symbol pair
matchPair :: Eq a => a -> a -> Int
matchPair x y
| x==y = 0
| otherwise = 1
-- Initialise an edge position
initPos :: Int -> Int
initPos x = x
-- Initialise an edge row
initEdge :: Int -> [Int]
initEdge 0 = [initPos 0]
initEdge x = initEdge (x-1) ++ [ initPos x ]
-- Find the minimum of three Ordered values
minThree :: Ord a => a -> a -> a -> a
minThree x y z = min (min x y) z
-- Calculate a cell
calcCell :: Int -> Int -> Int -> Char -> Char -> Int
calcCell diag above left qysym dbsym =
minThree (diag + matchPair qysym dbsym)
(1 + left)
(1 + above)
-- Calculate a row
calcRow :: [Char] -> [Char] -> Int -> Int -> [Int] -> [Int]
calcRow qys dbs 0 db_pos prev_row = [ initPos db_pos ]
calcRow qys dbs qy_pos db_pos prev_row =
curr_row ++ [ calcCell (prev_row!!(qy_pos-1))
(prev_row!!(qy_pos))
(curr_row!!(qy_pos-1))
(qys!!(qy_pos-1))
(dbs!!(db_pos-1)) ]
where curr_row = (calcRow qys dbs
(qy_pos-1) db_pos prev_row)
-- Calculate the whole matrix (one row at a time)
calcMat :: [Char] -> [Char] -> Int -> Int -> [Int]
calcMat qys dbs qylen 0 = initEdge qylen
calcMat qys dbs qylen dblen =
calcRow qys dbs qylen dblen
(calcMat qys dbs qylen (dblen-1))
-- Calculate the edit distance
alignScore :: [Char] -> [Char] -> Int
alignScore xs ys = (calcMat xs ys lxs (length ys))!!lxs
where lxs = (length xs)
-- Example (the answer is six)
main = print((alignScore "TRACEBACK" "BACKSPACE"))
--- END CODE ---
----------------
... as this is the first code I've written in Haskell, I should imagine
the quality is very poor (Thinking back to my first C). I haven't learnt
much of the language yet.
This should have quadratic running time, but the way I've maniuplated
lists is very inefficient, and should be rewritten.
I reckon Haskell could be useful for prototyping, or microcode generation,
(plenty of polymorphism) but I doubt that the performance would be up
to much else.
Anyway, I hope you get the idea,
Guy.
--