## Refactoring probability distributions, part 2: Random sampling

Posted by Eric Kidd Wed, 21 Feb 2007 23:53:00 GMT

In Part 1, we cloned PFP, a library for computing with probability distributions. PFP represents a distribution as a list of possible values, each with an associated probability.

But in the real world, things aren’t always so easy. What if we wanted to pick a random number between 0 and 1? Our previous implementation would break, because there’s an infinite number of values between 0 and 1—they don’t exactly fit in a list.

As it turns out, Sungwoo Park and colleagues found an elegant solution to this problem. They represented probability distributions as sampling functions, resulting in something called the λ_{◯} calculus. (I have no idea how to pronounce this!)

With a little bit of hacking, we can use their sampling functions as a drop-in replacement for PFP.

### A common interface

Since we will soon have two ways to represent probability distributions, we need to define a common interface.

```
type Weight = Float
class (Functor d, Monad d) => Dist d where
weighted :: [(a, Weight)] -> d a
uniform :: Dist d => [a] -> d a
uniform = weighted . map (\x -> (x, 1))
```

The function `uniform`

will create an equally-weighted distribution from a list of values. Using this API, we can represent a two-child family as follows:

```
data Child = Girl | Boy
deriving (Show, Eq, Ord)
child :: Dist d => d Child
child = uniform [Girl, Boy]
family :: Dist d => d [Child]
family = do
child1 <- child
child2 <- child
return [child1, child2]
```

Now, we need to implement this API two different ways: Once with lists, and a second time with sampling functions.

### Finite distibutions, revisted

The monad from part 1 now becomes:

```
type FDist = PerhapsT ([])
instance Dist FDist where
weighted [] = error "Empty distribution"
weighted xws = PerhapsT (map weight xws)
where weight (x,w) = Perhaps x (P (w / sum))
sum = foldl' (+) 0 (map snd xws)
exact :: FDist a -> [Perhaps a]
exact = runPerhapsT
```

We can run this as follows:

```
> exact family
[Perhaps [Girl,Girl] 25.0%,
Perhaps [Girl,Boy] 25.0%,
Perhaps [Boy,Girl] 25.0%,
Perhaps [Boy,Boy] 25.0%]
```

### Sampling functions

This second part is also easy. First, we define a type `Rand a`

, which represents a value of type `a`

computed using random numbers.

We implement `Rand`

by reduction to Haskell’s IO monad. (There’s any number of other ways to do this, of course.)

```
newtype Rand a = Rand { runRand :: IO a }
randomFloat :: Rand Float
randomFloat = Rand (getStdRandom (random))
```

Next, we make `Rand`

a trivial, sequential monad:

```
instance Functor Rand where
fmap = liftM
instance Monad Rand where
return x = Rand (return x)
r >>= f = Rand (do x <- runRand r
runRand (f x))
```

We can turn any `FDist`

into a `Rand`

by picking an element at random:

```
instance Dist Rand where
weighted = liftF . weighted
liftF :: FDist a -> Rand a
liftF fdist = do
n <- randomFloat
pick (P n) (runPerhapsT fdist)
pick :: Monad m => Prob -> [Perhaps a] -> m a
pick _ [] = fail "No values to pick from"
pick n ((Perhaps x p):ps)
| n <= p = return x
| otherwise = pick (n-p) ps
```

Of course, a single random sample won’t do us much good. We need functions for repeated sampling and for displaying the results:

```
sample :: Rand a -> Int -> Rand [a]
sample r n = sequence (replicate n r)
sampleIO r n = runRand (sample r n)
histogram :: Ord a => [a] -> [Int]
histogram = map length . group . sort
```

And sure enough, we get a nice, even distribution:

```
> liftM histogram (sampleIO family 1000)
[243,242,254,261]
```

That’s it! Two monads for the price of one.

Part 3: Coming soon.

David Housesaid 5 days later:Couldn’t you just use newtype-deriving to make Rand a monad, like you did for making Prob an instance of Num in the first part? Also,

`sequence (replicate f xs)`

is just`replicateM f xs`

.Great articles, by the way!

Ericsaid 5 days later:I suspect that newtype-deriving will work quite nicely. Thanks!

It might also be desirable to use the Rand monad from the Haskell wiki, or perhaps the splittable version (which might make it feasible to run sampling computations in parallel?).