## Monads in 15 minutes: Backtracking and Maybe

Posted by Eric Kidd Mon, 12 Mar 2007 19:39:00 GMT

This morning, a programmer visited #haskell and asked how to implement backtracking. Not surprisingly, most of the answers involved monads. After all, monads are ubiquitous in Haskell: They’re used for IO, for probability, for error reporting, and even for quantum mechanics. If you program in Haskell, you’ll probably want to understand monads. So where’s the best place to start?

A friend of mine claims he didn’t truly understand monads until he
understood `join`

. But once he figured that out, everything was
suddenly obvious. That’s the way it worked for me, too. But relatively
few monad tutorials are based on `join`

, so there’s an open niche in a crowded market.

This monad tutorial uses `join`

. Even better, it attempts to
cram everything you need to know about monads into 15 minutes. (Hey, everybody needs a gimmick, right?)

### Backtracking: The lazy way to code

We begin with a backtracking constraint solver. The idea: Given
possible values for `x`

and `y`

, we want to pick
those values which have a product of 8:

```
solveConstraint = do
x <- choose [1,2,3]
y <- choose [4,5,6]
guard (x*y == 8)
return (x,y)
```

Every time `choose`

is called, we save the current program
state. And every time `guard`

fails, we backtrack to a saved
state and try again. Eventually, we’ll hit the right answer:

```
> take 1 solveConstraint
[(2,4)]
```

Let’s build this program step-by-step in Haskell. When we’re done, we’ll have a monad.

### Implementing `choose`

How can we implement `choose`

in Haskell? The obvious version
hits a dead-end quickly:

```
-- Pick one element from the list, saving
-- a backtracking point for later on.
choose :: [a] -> a
choose xs = ...
```

We could be slightly sneakier, and return *all* the possible choices
as a list. We’ll use `Choice`

whenever we talk
about these lists, just to keep things clear:

```
type Choice a = [a]
choose :: [a] -> Choice a
choose xs = xs
```

Running this program returns all the possible answers:

```
> choose [1,2,3]
[1,2,3]
```

Now, since Haskell is a lazy language, we can work with infinite numbers of choices, and only compute those we actually need:

```
> take 3 (choose [1..])
[1,2,3]
```

Because Haskell doesn’t compute answers until we ask for them, we get the actual backtracking for free!

### Combining several choices

Now we have the list `[1,2,3]`

from our example. But what about
the list `[4,5,6]`

? Let’s ignore `guard`

for a minute, and work on getting the final pairs of numbers, unfiltered by any constraint.

For *each* item in the first list, we
need to pair it with *every* item in the second list. We can do
that using `map`

and the following helper function:

```
pair456 :: Int -> Choice (Int,Int)
pair456 x = choose [(x,4), (x,5), (x,6)]
```

Sure enough, this gives us all 9 combinations:

```
> map pair456 (choose [1,2,3])
[[(1,4),(1,5),(1,6)],
[(2,4),(2,5),(2,6)],
[(3,4),(3,5),(3,6)]]
```

But now we have two layers of lists. We can fix that using
`join`

:

```
join :: Choice (Choice a) -> Choice a
join choices = concat choices
```

This collapses the two layers into one:

```
> join (map pair456 (choose [1,2,3]))
[(1,4),(1,5),(1,6),
(2,4),(2,5),(2,6),
(3,4),(3,5),(3,6)]
```

Now that we have `join`

and `map`

, we have two-thirds
of a monad! (Math trivia: In category theory, `join`

is usually
written μ.)

In Haskell, `join`

and `map`

are usually combined
into a single operator:

```
-- Hide the standard versions so we can
-- reimplement them.
import Prelude hiding ((>>=), return)
(>>=) :: Choice a -> (a -> Choice b) ->
Choice b
choices >>= f = join (map f choices)
```

This allows us to simplify our example even further:

```
> choose [1,2,3] >>= pair456
[(1,4),(1,5),(1,6),
(2,4),(2,5),(2,6),
(3,4),(3,5),(3,6)]
```

### Completing our monad: `return`

We’re getting close! We only need to define the third
monad function (and then figure out what to do about `guard`

).

The missing function is almost too trivial to mention: Given a single value of
type `a`

, we need a convenient way to construct a value of type
`Choice a`

:

```
return :: a -> Choice a
return x = choose [x]
```

(More math trivia: `return`

is also known as `unit`

and η. That’s a lot of names for a very simple idea.)

Let’s start assembling the pieces. In the code below, ```
(\x ->
...)
```

creates a function with a single argument `x`

. Pay
careful attention to the parentheses:

```
makePairs :: Choice (Int,Int)
makePairs =
choose [1,2,3] >>= (\x ->
choose [4,5,6] >>= (\y ->
return (x,y)))
```

When run, this gives us a list of all possible combinations of `x`

and `y`

:

```
> makePairs
[(1,4),(1,5),(1,6),
(2,4),(2,5),(2,6),
(3,4),(3,5),(3,6)]
```

As it turns out, this is a really common idiom, so Haskell provides some nice syntactic sugar for us:

```
makePairs' :: Choice (Int,Int)
makePairs' = do
x <- choose [1,2,3]
y <- choose [4,5,6]
return (x,y)
```

This is equivalent to our previous implementation:

```
> makePairs'
[(1,4),(1,5),(1,6),
(2,4),(2,5),(2,6),
(3,4),(3,5),(3,6)]
```

### The final piece: `guard`

In our backtracking monad, we can represent failure as a choice between zero options. (And indeed, this is known as the “zero” for our monad. Not all useful monads have zeros, but you’ll see them occasionally.)

```
-- Define a "zero" for our monad. This
-- represents failure.
mzero :: Choice a
mzero = choose []
-- Either fail, or return something
-- useless and continue the computation.
guard :: Bool -> Choice ()
guard True = return ()
guard False = mzero
```

And now we’re in business:

```
solveConstraint = do
x <- choose [1,2,3]
y <- choose [4,5,6]
guard (x*y == 8)
return (x,y)
```

Note that since the return value of `guard`

is boring, we don’t actually bind it to any variable. Haskell treats this as if we had written:

```
-- "_" is an anonymous variable.
_ <- guard (x*y == 8)
```

That’s it!

```
> take 1 solveConstraint
[(2,4)]
```

### Another monad: `Maybe`

Every monad has three pieces: `return`

, `map`

and
`join`

. This pattern crops up everywhere. For example, we can
represent a computation which might fail using the `Maybe`

monad:

```
returnMaybe :: a -> Maybe a
returnMaybe x = Just x
mapMaybe :: (a -> b) -> Maybe a -> Maybe b
mapMaybe f Nothing = Nothing
mapMaybe f (Just x) = Just (f x)
joinMaybe :: Maybe (Maybe a) -> Maybe a
joinMaybe Nothing = Nothing
joinMaybe (Just x) = x
```

Once again, we can use `do`

to string together individual steps
which might fail:

```
tryToComputeX :: Maybe Int
tryToComputeX = ...
maybeExample :: Maybe (Int, Int)
maybeExample = do
x <- tryToComputeX
y <- tryToComputeY x
return (x,y)
```

Once you can explain how this works, you understand monads. And you’ll start to see this pattern everywhere. There’s something deep about monads and abstract algebra that I don’t understand, but which keeps cropping up over and over again.

### Miscellaneous notes

In Haskell, monads are normally defined using the `Monad`

type
class. This requires you to define two functions: `return`

and
`>>=`

. The `map`

function for monads is actually
named `fmap`

, and you can find it in the `Functor`

type class.

Also, every monad should obey three fairly reasonable rules if you don’t want bad things to happen:

```
-- Adding and collapsing an outer layer
-- leaves a value unchanged.
join (return xs) == xs
-- Adding and collapsing an inner layer
-- leaves a value unchanged.
join (fmap return xs) == xs
-- Join order doesn't matter.
join (join xs) == join (fmap join xs)
```

That’s it! For more information, see Wadler’s Monads for Functional Programming and the excellent All About Monads tutorial. And if you liked the backtracking monad, you’ll also like The Reasoned Schemer.

(This tutorial is dedicated to “ivanm” on #haskell. I hope this helps! And many thanks to Reg Braithwaite for commenting on an early draft.)

**Update:** Replaced `mult456`

with `pair456`

, and explained what’s going on with `guard`

in the final backtracking example.

Dan Psaid about 1 hour later:Three cheers for ‘join’!

The function ‘join’ has an important advantage over ‘bind’ – it’s not a higher order function. I think this may be what makes it easier to grasp for some people.

ivanmsaid about 3 hours later:Thanks for this! It makes the monad concept a little easier.

But now you’ve defined a new error type of mzero ;)

The only thing that confuses me (as I don’t have a Haskell interpreter handy to play with) is how “guard (x*y == 8)” works… aren’t x and y lists? Or does it try every value of the two lists compared to each other?

Ericsaid about 3 hours later:Oops! I should have explained that better. I may update the main post later, but for now, here are some notes.

First,

`guard (x*y 8)`

in the`do`

-body is short for`ignored <- guard (x*y 8)`

. So you’re really looking at this:As to why

`x`

and`y`

are integers, not lists: Remember that`xs >>= f`

is just`join (map f xs)`

. So the individual bindings expand to something that looks like:...where

`x`

is clearly bound to an individual value from`xs`

.When I’m trying to understand something like this, I have the most luck if I go over things a couple of times (and maybe write a test program or two). Reading this kind of Haskell is more like reading math than it is like reading regular code. Definitely a chance of pace. :-)

Good luck, and please don’t hesitate to ask if you have any more questions!

quicksilversaid about 16 hours later:Actually map for Monads isn’t called ‘fmap’ it’s called liftM, and you can define liftM given >>= and return (liftM f xs = xs >>= return.f).

Monads in haskell aren’t automatically Functors, although by rights they should be, and given a Monad you should get a valid functor instance using fmap = liftM.

The concept of failure, in monads which have it, is very powerful though. Nice tutorial.

Ericsaid about 16 hours later:quicksilver: Thanks for the kind words!

And yeah, I’m exceedingly annoyed that Haskell monads aren’t automatically functors. But they should be! :-)

You’re absolutely right that

`liftM`

is guaranteed to exist, and that`fmap`

is left up to the whims of library authors. But still, the connection between`fmap`

and monads is pretty important from a mathematical perspective.petekazsaid 1 day later:I don’t understand the guard (x*y==8) piece of the puzzle. If the result of this is bound to ignored, I’m missing in the case of guard True, how does it return (x,y). And then conversely, in the case of guard False, why does it skip return (x,y)?

Ericsaid 1 day later:petekaz: Yup, that’s the tricky bit.

`guard`

returns either`[()]`

(if is succeeds), or`[]`

(if it fails).So if the

`guard`

succeeds, you can imagine it as:...and if it fails:

If we go back to the

`map`

example, we can see how it works:When we call

`join`

, two branches of the computation disappear entirely:So where are the

`x`

and`y`

in this example? Well, they’re the arguments of the anonymous functions in my previous comment, so they’ll be available when we need them in the final step. (It might help to read about closures if none of this makes any sense at all.)Alternatively, are you familiar with list comprehensions? If so, this program is equivalent to:

If

`x*y`

doesn’t equal`8`

, then there are no values to pick from for`ignored`

, and the final`(x,y)`

won’t be computed.Anyway, I hope one of these perspectives will help you puzzle it out!

Michaelsaid 1 day later:Nice article, Eric.

Another fun citation of type “If you like the backtracking monad…” is Oleg Kiselyov’s paper:

http://okmij.org/ftp/Computation/monads.html#LogicT

Ericsaid 1 day later:Yeah, I was reading that yesterday evening! I recommended it highly for anybody who’s into backtracking monads.

Miles Gouldsaid 17 days later:A friend of mine claims he didn’t truly understand monads until he understood join.There’s a reason why category theorists invariably define monads in terms of join rather than bind!

As for your question about monads and algebra: I don’t really understand monad transformers myself, so can’t help with that bit, but there is indeed a deep connection between monads and algebraic theories. Any standard category theory book should help you here – try Borceux’s Handbook of Categorical Algebra. Or just google for “category theory lecture notes”.

I found enlightenment came from considering this: an /algebra/ for a monad T on a category C is an object A and a morphism a:TA->A, satisfying some obvious compatibility conditions with eta and mu. Now, what is an algebra for the List monad, also known as the “free monoid” monad? How about the free group monad?

Sorry if I’m teaching you to suck eggs here…

Ericsaid 18 days later:Initial algebras are definitely interesting! And they

arerelated to monads, at least according to Edmund Robinson’s Variations on Algebra: monadicity and generalisations of equational theories (which I haven’t finished reading yet).A monad is a triple (

M,η,μ), whereMis a functor, and η:a→M aand μ:M (M a)→M aare natural transformations.A monad morphism is a transformation from one monad to another, e.g., (

M,η,μ) → (M′,η′,μ′). And you can build various interesting algebras by composing monad morphisms.And that’s the bit I still haven’t wrapped my brain completely around. :-)

osfameronsaid 36 days later:Your 15 minutes are not my 15 minutes ;-)

Some things are still unclear: like why you need to use “choose [1,2,3]” given that that just evaluates to [1,2,3]. I have a feeling that might be important though?

Still, working through the code bit by bit is very helpful, and I

almostunderstand some of it now… though I am still having problems with the guard – I don’t understand how it affects the following return statement.I will give another “15 minutes” to this tonight if I get a chance…

Ericsaid 36 days later:Y15MMY: Your 15 minutes may vary. ;-)

The

`choose`

function does nothing but make the code more readable; it’s not actually needed, not even for the type declaration.The

`guard`

is fairly sneaky. If the condition is true, it chooses a dummy value from a single-element list and throws it away. If the condition is false, though, it “chooses” an element from an empty list.Imagine, for a moment, the guard condition is always false. We bind

`x`

to one of 3 values, and`y`

to one of 3 values, giving us 3×3=9 different possible answers.If we imagine that the

`guard`

condition is always false, it would choose from 0 possibilities, giving us 3×3×0=0 possible answers. Do you see how the empty choice suppresses answers? It doesn’t have to effect the`return`

statement directly, because we never make it there.Of course, if the

`guard`

is only falsesomeof the time, then it only suppresses certain answers.I hope this helps! Please don’t hesitate to ask more questions.

osfameronsaid 37 days later:Thanks for the reply: ok, I think I get that better: because >>= is defined in terms of map, it combines with the output of guard as well as the unfiltered 3×3, that’s the piece that I wasn’t seeing.

I think in the description above, saying that the output of guard is “boring” maybe throws you off the scent a little? It’s only boring to us in that we don’t need a variable bound to it, but it still participates in the join.

Steven Brandtsaid 106 days later:As an old imperative programmer I can’t help but think this is just like:

for(int x=1;x<=3;x++) { for(int y=4;y<=6;y++) { if(x*y == 8) { list.append(new Pair(x,y)); } } }

Except the imperative code is a lot easier to understand. I’m trying really hard to understand monads and see their value, but I still can’t quite get there.

Ericsaid 108 days later:Steven: Don’t assume that the simplest example of something is also the most compelling. :-)

For more interesting monads, see the the paper on composable memory transactions, the various probability monads, sigfpe’s article on graph-walking monads, and the paper on using monads for safe hardware abstraction (PDF). If you work with .NET, you may also want to check out Microsoft’s new LINQ technology, which relies heavily on monads for database access and XML processing.

Once you understand monads, you start seeing them everywhere—they’re very general tools, and they can be used to solve a wide variety of problems.

As with any other abstraction, you can do without monads. But if one abstraction solves so many problems so elegantly, it’s worth learning about.

Eriksaid 138 days later:So, before going back over this posting using a full fifteen minutes (I just skimmed it), why use an example that could be solved with the following?

fun :: [Int] → [Int] → [(Int, Int)] fun r s = [(x,y) | x <

~~r, y <~~s, x*y == 8]I find that when toy problems with both simple and obvious solutions are used to demonstrate hard concepts it can obfuscate the idea because while you may be showing the how of the concept, you aren’t explaining the why.

Could you maybe show an example of join that wouldn’t be simple to express with a simple list comprehension?

Ericsaid 138 days later:Erik: Well, a list comprehension

isa monad. Or, if you want to look at it the other way around, a monad is a generalized comprehension that can be defined for almost any parameterized data type.For a more complicated monad, see Bayes’ rule in Haskell. For a rather different monad, see SPJ’s papers on Software Transactional Memory. Basically, by slightly generalizing list comprehensions, you get a tool which solves a surprisingly large and diverse set of problems.

I’ll write a longer post one of these days cataloging a bunch of useful monads. Until then, would anyone like to plug their favorite monads? :-)

Eriksaid 139 days later:Eric: Yes, after making that post I went on to read the “All About Monads” link you placed at the end of the post and came to that understanding. Thanks!

Michaelsaid 153 days later:Nice tutorial, maybe would be a good idea to abstract to tuples of arbitrary length instead of just pairs, to make the backtracking even more obvious.

Drew Vogelsaid 220 days later:I understand the Haskell constructs involved in your example, but I had trouble finding the actual backtracking. I learned of backtracking as an optimization of a search algorithm that worked by tossing out large chunks of the search space. Because 2*4 is the only combination that equals 8, it isn’t obvious that the “take 1” is throwing out the rest of the problem space. Perhaps you could use a combination of the even function and infinite lists to increase the search space and illustrate how you are tossing out a portion of the search space.

Also, a single file with the final code would be wonderful. It’s hard looking at it outside of vim :)

Samposaid 221 days later:Now that was helpful. Time well spend even though it was a bit more than 15 minutes :-) I think you managed to bring me on the verge of understanding monads.

I think the Option -monad explanation did it for me… I’m reiterating just to see if I really understand: I see the the functions returnMaybe, mapMaybe and joinMaybe as ‘rules’. For example mapMaybe tells how to handle function calls where arguments belong to Maybe -monad. 1) If function gets called with Nothing, the return value will always be Nothing. 2) If function gets called with a valid value Just(x), the value x is extracted from Just(x) and handed to the function.

This ensures the Maybe works as expected but not every function needs to know how to handle them.

joinMaybe defines how to handle the returned (nested) values. If the nested values all are Just(something) we can get rid of extra Justs and keep values. If there is a single Nothing, everything will end up being Nothing.

Doug Auclairsaid 246 days later:Finally! (an aside: I did read the LogicT paper) I’ve been looking for concise backtracking in Haskell. Of course, it’s been there all the time, and there’s been several mentions that monads permit backtracking, but your tutorial, Eric, gave me a simple and clear example. Thank you. I’ll start working through some other (similar) examples.

I’ve also read Reasoned Schemer (Dr. William Byrd autographed it for me at the PADL 2006) several times, so I’m looking to combine that and LogicT and your example into a full-blown (concise) rule- or relational-based programming system. Backtracking is key, but do I need full unification? Not too sure.

Thanks, again.

Doug Auclairsaid 247 days later:Of course, it’s been pointed out in other papers about monads that the do notation:

do x <- choose [1,2,3] ...

is very similar to the list compression notation: [x|x <- [1,2,3]]

(in fact, gopher, Haskell’s predecessor, they were so similar that they were the same thing)

And, seeing that the type Choice is an alias for the list type, the monadic expression can simply be rewritten: [(x,y)|x<

~~[1,2,3], y<~~[4,5,6], x*y == 8]Haskell programmers claim their list compression is even more powerful that Prolog’s, and the above example is good supporting evidence …

Dan Dingleberrysaid 261 days later:Dude, WTH is a monad? at least define the dang thing first for casual passers by.

Ericsaid 261 days later:Monads are a somewhat mysterious programming language feature, most commonly encountered in Haskell. See the examples I link to in the first paragraph.

Basically, monads are a clever bit of category theory that can be used to structure control flow in a program.

If you weren’t already thinking, “Hmm, I really ought to learn more about monads,” then you might be happier avoiding the entire topic. :-)

Michaelsaid 261 days later:To add to what Eric wrote, I’d go so far as to say that monads aren’t even all that mysterious, but in fact you will have an easier time understanding them (at least in terms of their utility to a programmer) if you just forget about category theory entirely.

From a programmer’s perspective, a monad is more or less a way of factoring out some common code from various types of operations that are composed sequentially. The fact that this is possible seems strange at firstand I think that’s what leads people to feel monadic types are confusing. Well, that, and the terminology, which can be misleadingly abstruse.

Davidsaid 320 days later:Thanks for this! I’ve read a lot about Haskell but I’ve been confounded by monads. I’ve read every tutorial I’ve found and each helps me understand a little bit more. This one was very helpful. I have an unanswered question though…

The map part of a monad doesn’t seem to have to be the actual “map” function. The Maybe monad’s “map” seems to just propagate the Nothing or Just along as appropriate. Is this correct?

How does this work in the case of the IO monad? I’ve been staring at the GHC prelude code for the past 15 minutes but I’m not fully comprehending what it is doing. I

thinkits >>= function is just doing a function application instead of a “map”, but I’m not sure (that case part is confusing me). Idosee the “join” where it strips out nested IOs. Can anyone help out?Ericsaid 321 days later:David: Excellent questions! The “map” function used in monads is a generalization of the regular “map”.

This will make more sense if you think of “Maybe” as a very specialized containera container which can only hold a single element. From that perspective, “mapMaybe” looks very natural:

In English, this says: “If the container is empty, leave it that way. If it contains an element, apply ‘f’ to the element.” Basically, it’s no different than applying “map” to a zero- or one-element list.

As for the IO monad, don’t worry too much about it. :-) It’s actually a pretty atypical monad.

But if it helps, try reading the type “IO a” as “an IO action returning type a”. Again, if you squint at it right, you can think of an IO action as a single-element collection. You just have to some IO to find out what element is stored in it!

The “mapIO” function can be defined as:

Does this help any?

Joel Houghsaid 394 days later:Great article! A few things clicked for me while reading. I found that writing out the choice monad example step by step helped me to understand. Like so:

Thanks for the epiphanies!

Doug Auclairsaid 428 days later:Linking this article on an introductory post about Maybe, List and Either monads and their uses for nondeterministic programming.

Coincidentally, I wrote a same-game-like puzzle solver (cf. comp.lang.haskell), where I needed to flatten a list of lists. I didn’t see `flatten` in the Prelude or List, so I wrote it out. I then realized that `msum` is flatten for lists of depth 1 … then I reread this article where join :: m (m a) → m a does the same thing. Applied that in my code and noticed I had this pattern: join $ map f x … and replaced that with x >>= f.

Sigh! Months later and I’m still not recognizing when to use a simple bind!