...and how to fake Lisp macros with Template Haskell

(I wrote this article in response to a comment by sigfpe. You may find it pretty dry reading, unless you want to build domain-specific languages in Haskell. Proceed at your own risk.)

Haskell's built-in Monad type has some serious limitations. We can fix those limitations using a number of advanced Haskell techniques, including Template Haskell, Haskell's closest equivalent to Lisp macros.

We can illustrate the limitations of Monad with an example from math. In set theory, we can define a set by specifying how to compute each element:

{ xy : x ∈ {1,2,4}, y ∈ {1,2,4} }

We can read this as, "the set of all xy, where x is one of {1,2,4}, and y is one of {1,2,4}." To calculate the answer, we first multiply together all the possible combinations:

1×1=1, 1×2=2, 1×4=4, 2×1=2, 2×2=4, 2×4=8, 4×1=4, 4×2=8, 4×4=16

We then collect up the answers, and---because we're working with sets--we throw away the duplicates:

{1,2,4,8,16}

Can we do the same thing in Haskell? Well, using Haskell's list monad, we can write:

listExample = do
  x <- [1,2,4]
  y <- [1,2,4]
  return (x*y)

But when we run this, Haskell gives us lots of duplicate values:

> listExample
[1,2,4,2,4,8,4,8,16]

Our problem: We're using lists (which can contain duplicate values) to represent sets (which can't). Can we fix this by switching to Haskell's Data.Set?

import qualified Data.Set as S

-- This doesn't work.
setExample = do
  x <- S.fromList [1,2,4]
  y <- S.fromList [1,2,4]
  return (x*y)

Unfortunately, this code fails spectacularly. A Haskell monad is required to work for any types a and b:

class Monad m where
  return :: a -> m a
  fail :: String -> m a
  (>>=) :: m a -> (a -> m b) -> m b

But Data.Set only works for some types. Specifically, it requires that values of type a can be ordered:

data (Ord a) => Set a = ...

As it turns out, we can make Data.Set into a monad. But be warned: The solution involves some pretty ugly Haskell abuse.

Splitting Monad in half

If we want to put a restrictions on a and b, we'll need to move them into the signature of Monad. My first attempt looked like this:

{-# LANGUAGE MultiParamTypeClasses,
             UndecidableInstances #-}

import Prelude hiding (return, fail, (>>=))
import qualified Prelude

-- This won't work:
class NewMonad m a b where
  return :: a -> m a
  fail :: String -> m a
  (>>=) :: m a -> (a -> m b) -> m b

Unfortunately, the type b doesn't appear anywhere in return or fail. This makes the type checker sad. But there's a workaround, discovered by oleg. We can split Monad into two pieces:

class Monad1 m a where
  return :: a -> m a
  fail :: String -> m a

class (Monad1 m a, Monad1 m b) =>
      Monad2 m a b where
  (>>=) :: m a -> (a -> m b) -> m b

Using these type classes, we can finally make Set a monad:

instance (Ord a) =>
         Monad1 S.Set a where
  return = S.singleton
  fail _ = S.empty

instance (Ord a, Ord b) =>
         Monad2 Set.Set a b where
  m >>= f = (setJoin . S.map f) m
    where setJoin = S.unions . S.toList

This gets us very close to our goal:

setExample :: S.Set Int
setExample =
  S.fromList [1,2,4] >>= \x ->
  S.fromList [1,2,4] >>= \y ->
  return (x*y)

When we run this code, the duplicates are gone:

> setExample
fromList [1,2,4,8,16]

Rebuilding do: Macros

Unfortunately, GHC won't let us use the built-in do syntax. Even though we've carefully hidden the regular >>= operator, GHC goes digging around in the libraries and finds it anyway. So we need to somehow replace the built-in do with a new version that uses our redefined >>=.

Update: As Brandon points out below, we can get the same result with -fno-implicit-prelude, which will force GHC to use whatever >>= is in scope. Thanks, Brandon!

We can build our own version of do using Template Haskell. Template Haskell allows us to generate code at compile-time, in fashion similar to Lisp macros. In the code below, $(...) means "insert some code here," and [|...|] parses an expression and returns it as a data structure.

setExample' :: S.Set Int
setExample' = $(restricted [|do
  x <- S.fromList [1,2,4]
  y <- S.fromList [1,2,4]
  return (x*y) |])

The function restricted maps a parsed expression to another parsed expression.

As it turns out, we're not allowed to define restricted (or any of our earlier type classes) in the same file as setExample'. So we need to create a file "RestrictedMonad.hs" and move all our definitions into it:

{-# LANGUAGE MultiParamTypeClasses,
             UndecidableInstances,
             TemplateHaskell #-}

module RestrictedMonad (
  Monad1, Monad2, return, fail, (>>=),
  restricted
  ) where

import Prelude hiding (return, fail, (>>=))
import qualified Prelude

import qualified Data.Set as S
import Language.Haskell.TH
import Language.Haskell.TH.Syntax

-- Our earlier definitions go here...

Now, we can define our "macro":

-- Maps a quoted expression to a quoted
-- expression.
restricted :: Q Exp -> Q Exp
restricted code = do
  (DoE stmts) <- code
  expand stmts

Before we can define expand, we need one more piece. Template Haskell is based on monads, and we've redefined the built-in monad operators. To generate our replacement code, we'll need to use the built-in return operator:

ret :: a -> Q a
ret = Prelude.return

Now we're ready to define expand. This function expands a parsed do-body into a series of function calls. The implementation gets a little messy at times, due to limitations in Template Haskell.

-- pat <- expr; stmts...  We use lamE here
-- because we can't insert 'pat' directly
-- into the [|...|] form.
--
-- Note that we don't call 'fail' on
-- pattern-match failure, but we should.
expand (BindS pat expr:stmts) =
  [| $(ret expr) >>=
       $(lamE [ret pat]
              (expand stmts)) |]

-- let decls...; stmts...
expand (LetS decls:stmts) =
  letE (fmap ret decls)
       (expand stmts)

-- The final expression in the 'do'.
expand (NoBindS expr:[]) = ret expr

-- expr; stmts...
expand (NoBindS expr:stmts) =
  [| $(ret expr) >>=
       (\_ -> $(expand stmts)) |]

expand stmts =
  error ("Malformed 'do': "++show stmts)

For more information on Template Haskell, see Template Meta-programming for Haskell and DSL Implementation in MetaOCaml, Template Haskell, and C++ (PDF).

Some lesser-known monad laws

There are three well-known laws that all monads must obey. But there are some lesser-known monad laws that are automatically enforced in standard Haskell, thanks to the type checker. But now that we allow our monad to use the operations defined by Ord, we need to prove these extra laws manually.

This should be a complete list:

-- The standard monad laws.
join . return == id
join . fmap return == id
join . join == join . fmap join

-- The other monad laws: Return and join
-- are natural transformations.
fmap f . return == return . f
fmap f . join == join . fmap (fmap f)

-- The functor laws.
fmap id == id
fmap (f . g) == fmap f . fmap g

Ultimately, I'd like to see Data.Set become a monad in standard Haskell.