How to make Data.Set a monad
…and how to fake Lisp macros with Template Haskell
(I wrote this article in response to a comment by sigfpe. You may find it pretty dry reading, unless you want to build domain-specific languages in Haskell. Proceed at your own risk.)
Haskell’s built-in Monad
type has some serious limitations. We can fix those limitations using a number of advanced Haskell techniques, including Template Haskell, Haskell’s closest equivalent to Lisp macros.
We can illustrate the limitations of Monad
with an example from math. In set theory, we can define a set by specifying how to compute each
element:
{ xy : x ∈ {1,2,4}, y ∈ {1,2,4} }
We can read this as, “the set of all xy, where x is one of {1,2,4}, and y is one of {1,2,4}.” To calculate the answer, we first multiply together all the possible combinations:
1×1=1, 1×2=2, 1×4=4, 2×1=2, 2×2=4, 2×4=8, 4×1=4, 4×2=8, 4×4=16
We then collect up the answers, and—because we’re working with sets–we throw away the duplicates:
{1,2,4,8,16}
Can we do the same thing in Haskell? Well, using Haskell’s list monad, we can write:
listExample = do
x <- [1,2,4]
y <- [1,2,4]
return (x*y)
But when we run this, Haskell gives us lots of duplicate values:
> listExample
[1,2,4,2,4,8,4,8,16]
Our problem: We’re using lists (which can contain duplicate
values) to represent sets (which can’t). Can we fix this by switching to
Haskell’s Data.Set
?
import qualified Data.Set as S
-- This doesn't work.
setExample = do
x <- S.fromList [1,2,4]
y <- S.fromList [1,2,4]
return (x*y)
Unfortunately, this code fails spectacularly. A Haskell monad is required
to work for any types a
and b
:
class Monad m where
return :: a -> m a
fail :: String -> m a
(>>=) :: m a -> (a -> m b) -> m b
But Data.Set
only works for some types. Specifically, it
requires that values of type a
can be ordered:
data (Ord a) => Set a = ...
As it turns out, we can make Data.Set
into a monad. But be
warned: The solution involves some pretty ugly Haskell abuse.
Splitting Monad
in half
If we want to put a restrictions on a
and b
,
we’ll need to move them into the signature of Monad
. My first
attempt looked like this:
{-# LANGUAGE MultiParamTypeClasses,
UndecidableInstances #-}
import Prelude hiding (return, fail, (>>=))
import qualified Prelude
-- This won't work:
class NewMonad m a b where
return :: a -> m a
fail :: String -> m a
(>>=) :: m a -> (a -> m b) -> m b
Unfortunately, the type b
doesn’t appear anywhere in
return
or fail
. This makes the type checker sad.
But there’s a workaround, discovered by oleg. We can split
Monad
into two pieces:
class Monad1 m a where
return :: a -> m a
fail :: String -> m a
class (Monad1 m a, Monad1 m b) =>
Monad2 m a b where
(>>=) :: m a -> (a -> m b) -> m b
Using these type classes, we can finally make Set
a monad:
instance (Ord a) =>
Monad1 S.Set a where
return = S.singleton
fail _ = S.empty
instance (Ord a, Ord b) =>
Monad2 Set.Set a b where
m >>= f = (setJoin . S.map f) m
where setJoin = S.unions . S.toList
This gets us very close to our goal:
setExample :: S.Set Int
setExample =
S.fromList [1,2,4] >>= \x ->
S.fromList [1,2,4] >>= \y ->
return (x*y)
When we run this code, the duplicates are gone:
> setExample
fromList [1,2,4,8,16]
Rebuilding do
: Macros
Unfortunately, GHC won’t let us use the built-in do
syntax.
Even though we’ve carefully hidden the regular >>=
operator,
GHC goes digging around in the libraries and finds it anyway. So we need
to somehow replace the built-in do
with a new version that
uses our redefined >>=
.
Update: As Brandon points out below, we can get the same result with -fno-implicit-prelude
, which will force GHC to use whatever >>=
is in scope. Thanks, Brandon!
We can build our own version of do
using Template
Haskell. Template Haskell allows us to generate code at compile-time,
in fashion similar to Lisp macros. In the code below, $(...)
means “insert some code here,” and [|...|]
parses an
expression and returns it as a data structure.
setExample' :: S.Set Int
setExample' = $(restricted [|do
x <- S.fromList [1,2,4]
y <- S.fromList [1,2,4]
return (x*y) |])
The function
restricted
maps a parsed expression to another parsed
expression.
As it turns out, we’re not allowed to define restricted
(or
any of our earlier type classes) in the same file as
setExample'
. So we need to create a file “RestrictedMonad.hs”
and move all our definitions into it:
{-# LANGUAGE MultiParamTypeClasses,
UndecidableInstances,
TemplateHaskell #-}
module RestrictedMonad (
Monad1, Monad2, return, fail, (>>=),
restricted
) where
import Prelude hiding (return, fail, (>>=))
import qualified Prelude
import qualified Data.Set as S
import Language.Haskell.TH
import Language.Haskell.TH.Syntax
-- Our earlier definitions go here...
Now, we can define our “macro”:
-- Maps a quoted expression to a quoted
-- expression.
restricted :: Q Exp -> Q Exp
restricted code = do
(DoE stmts) <- code
expand stmts
Before we can define expand
, we need one more piece. Template
Haskell is based on monads, and we’ve redefined the built-in monad
operators. To generate our replacement code, we’ll need to use the
built-in return
operator:
ret :: a -> Q a
ret = Prelude.return
Now we’re ready to define expand
. This function expands a parsed do
-body into a series of
function calls. The implementation gets a little messy at times, due to limitations in
Template Haskell.
-- pat <- expr; stmts... We use lamE here
-- because we can't insert 'pat' directly
-- into the [|...|] form.
--
-- Note that we don't call 'fail' on
-- pattern-match failure, but we should.
expand (BindS pat expr:stmts) =
[| $(ret expr) >>=
$(lamE [ret pat]
(expand stmts)) |]
-- let decls...; stmts...
expand (LetS decls:stmts) =
letE (fmap ret decls)
(expand stmts)
-- The final expression in the 'do'.
expand (NoBindS expr:[]) = ret expr
-- expr; stmts...
expand (NoBindS expr:stmts) =
[| $(ret expr) >>=
(\_ -> $(expand stmts)) |]
expand stmts =
error ("Malformed 'do': "++show stmts)
For more information on Template Haskell, see Template Meta-programming for Haskell and DSL Implementation in MetaOCaml, Template Haskell, and C++ (PDF).
Some lesser-known monad laws
There are three well-known laws that all monads must obey. But
there are some lesser-known monad laws that are automatically
enforced in standard Haskell, thanks to the type checker. But now that we
allow our monad to use the operations defined by Ord
, we
need to prove these extra laws manually.
This should be a complete list:
-- The standard monad laws.
join . return == id
join . fmap return == id
join . join == join . fmap join
-- The other monad laws: Return and join
-- are natural transformations.
fmap f . return == return . f
fmap f . join == join . fmap (fmap f)
-- The functor laws.
fmap id == id
fmap (f . g) == fmap f . fmap g
Ultimately, I’d like to see Data.Set
become a monad in standard Haskell.
Want to contact me about this article? Or if you're looking for something else to read, here's a list of popular posts.
-fno-implicit-prelude
, as Brandon points out. Still, the Template Haskell solution is definitely interesting in its own right: There are some nice features there, which could be useful for other things. pepe: Oh, that's cool. That might be the right way to get something like this into standard Haskell.