[Dynamite] July (attn Ewan)

Guy Slater guy@ebi.ac.uk
Tue, 23 May 2000 20:43:32 +0100 (BST)


On Fri, 19 May 2000, Ian Holmes wrote:

> I actually found something at
> 
> http://www.iro.umontreal.ca/~lapalme/Algorithms-functional.html
> 
> It has general DP code, with applications for matrix multiplication and 
> other things, but not string comparison.
> 
> I'm going to get this book, it looks cool..

I've had a play around with Haskell now ...
it requires a /very/ different way of thinking.

Unsuprisingly, (I suppose), you can't just describe
the recurrence relations, because you end up
with exponential running time.  You need to describe
the dynamic programming itself.

/* ie. naive edit distance calculation C: */

static int naiveEditDistance(char *a, char *b){
    register int match, insert, delete;
    if(!*a) return strlen(b);
    if(!*b) return strlen(a);
    match  = ((*a == *b)?0:1) + naive_edit(a+1, b+1);
    insert = 1 + naive_edit(a+1, b);
    delete = 1 + naive_edit(a, b+1);
    return (insert < delete)
        ? ((insert < match)?insert:match)
        : ((delete < match)?delete:match);
    }

-- the same in Haskell:

naiveEdit :: Eq a => [a] -> [a] -> Int
naiveEdit xs [] = length xs
naiveEdit [] ys = length ys
naiveEdit (x:xs) (y:ys)
  = minThree (matchPair x y + naiveEdit xs ys)
             (1 + naiveEdit xs (y:ys)) 
             (1 + naiveEdit (x:xs) ys)

Anyway, these are useless,
so I've attempted a proper implementation ...

------------------
--- BEGIN CODE ---

----------------------------------------
-- Edit distance calculations in haskell
-- Guy St.C. Slater. 20:03:2000
----------------------------------------

-- Find score for equivalenced symbol pair
matchPair :: Eq a => a -> a -> Int
matchPair x y
          | x==y      = 0     
          | otherwise = 1

-- Initialise an edge position
initPos :: Int -> Int
initPos x = x

-- Initialise an edge row
initEdge :: Int -> [Int]
initEdge 0 = [initPos 0]
initEdge x = initEdge (x-1) ++ [ initPos x ]

-- Find the minimum of three Ordered values
minThree :: Ord a => a -> a -> a -> a
minThree x y z = min (min x y) z

-- Calculate a cell
calcCell :: Int -> Int -> Int -> Char -> Char -> Int
calcCell diag above left qysym dbsym = 
         minThree (diag + matchPair qysym dbsym)
                  (1 + left)
                  (1 + above)

-- Calculate a row
calcRow :: [Char] -> [Char] -> Int -> Int -> [Int] -> [Int]
calcRow qys dbs 0 db_pos prev_row = [ initPos db_pos ]
calcRow qys dbs qy_pos db_pos prev_row =
          curr_row ++ [ calcCell (prev_row!!(qy_pos-1))
                                 (prev_row!!(qy_pos))
                                 (curr_row!!(qy_pos-1))
                                 (qys!!(qy_pos-1))
                                 (dbs!!(db_pos-1)) ]
          where curr_row = (calcRow qys dbs 
                           (qy_pos-1) db_pos prev_row)

-- Calculate the whole matrix (one row at a time)
calcMat :: [Char] -> [Char] -> Int -> Int -> [Int]
calcMat qys dbs qylen 0 = initEdge qylen
calcMat qys dbs qylen dblen = 
                calcRow qys dbs qylen dblen
               (calcMat qys dbs qylen (dblen-1))

-- Calculate the edit distance
alignScore :: [Char] -> [Char] -> Int
alignScore xs ys = (calcMat xs ys lxs (length ys))!!lxs
                 where lxs = (length xs)

-- Example (the answer is six)
main = print((alignScore "TRACEBACK" "BACKSPACE"))

--- END CODE ---
----------------

... as this is the first code I've written in Haskell, I should imagine
the quality is very poor (Thinking back to my first C).  I haven't learnt
much of the language yet.

This should have quadratic running time, but the way I've maniuplated
lists is very inefficient, and should be rewritten.

I reckon Haskell could be useful for prototyping, or microcode generation,
(plenty of polymorphism) but I doubt that the performance would be up
to much else.

Anyway, I hope you get the idea,

Guy.
--