# Bayes' rule in Haskell, or why drug tests don't work

Part 3 of Refactoring Probability Distributions.

(Part 1: PerhapsT,
Part 2: Sampling functions)

A very senior Microsoft developer who moved to Google told me that Google works and thinks at a higher level of abstraction than Microsoft. "Google uses Bayesian filtering the way Microsoft uses the if statement," he said.-Joel Spolsky

I really love this quote, because it's insanely provocative
to any language designer. What *would* a programming language look
like if Bayes' rule were as simple as an `if`

statement?

Let's start with a toy problem, and refactor it until Bayes' rule is baked right into our programming language.

Imagine, for a moment, that we're in charge of administering drug tests for a small business. We'll represent each employee's test results (and drug use) as follows:

```
data Test = Pos | Neg
deriving (Show, Eq)
data HeroinStatus = User | Clean
deriving (Show, Eq)
```

Assuming that 0.1% of our employees have used heroin recently, and that our test is 99% accurate, we can model the testing process as follows:

```
drugTest1 :: Dist d => d (HeroinStatus, Test)
drugTest1 = do
heroinStatus <- percentUser 0.1
testResult <-
if heroinStatus == User
then percentPos 99
else percentPos 1
return (heroinStatus, testResult)
-- Some handy distributions.
percentUser p = percent p User Clean
percentPos p = percent p Pos Neg
-- A weighted distribution with two elements.
percent p x1 x2 =
weighted [(x1, p), (x2, 100-p)]
```

This code is based our FDist monad, which is in turn based on
PFP. Don't worry if it seems slightly mysterious; you can think of the
“`<-`

" operator as choosing an element from a probability
distribution.

Running our drug test shows every possible combination of the two variables:

```
> exact drugTest1
[Perhaps (User,Pos) 0.1%,
Perhaps (User,Neg) 0.0%,
Perhaps (Clean,Pos) 1.0%,
Perhaps (Clean,Neg) 98.9%]
```

If you look carefully, we have a problem. Most of the employees who test positive are actually clean! Let's tweak our code a bit, and try to zoom in on the positive test results.

### Ignoring negative test results

We don't care about employees who test negative for heroin use. We can
throw away those results using Haskell's `Maybe`

type:

```
drugTest2 :: Dist d => d (Maybe HeroinStatus)
drugTest2 = do
(heroinStatus, testResult) <- drugTest1
return (if testResult == Pos
then Just heroinStatus
else Nothing)
```

This shows us just the variables we're interested in, but the percentages are still a mess:

```
> exact drugTest2
[Perhaps (Just User) 0.1%,
Perhaps Nothing 0.0%,
Perhaps (Just Clean) 1.0%,
Perhaps Nothing 98.9%]
```

Ideally, we want to reach into that distribution, discard all the
`Nothing`

values, and then normalize the remaining percentages
so that they add up to 100%. We can do that with a bit of Haskell code:

```
value (Perhaps x _) = x
prob (Perhaps _ p) = p
catMaybes' :: [Perhaps (Maybe a)] -> [Perhaps a]
catMaybes' [] = []
catMaybes' (Perhaps Nothing _ : xs) =
catMaybes' xs
catMaybes' (Perhaps (Just x) p : xs) =
Perhaps x p : catMaybes' xs
onlyJust :: FDist (Maybe a) -> FDist a
onlyJust dist
| total > 0 = PerhapsT (map adjust filtered)
| otherwise = PerhapsT []
where filtered = catMaybes' (runPerhapsT dist)
total = sum (map prob filtered)
adjust (Perhaps x p) =
Perhaps x (p / total)
```

And sure enough, that lets us zoom right in on the interesting values:

```
> exact (onlyJust drugTest2)
[Perhaps User 9.0%,
Perhaps Clean 91.0%]
```

OK, that's definitely not good news. Even though our test is 99% accurate, 91% of the people we accuse will be innocent!

(If this seems
counter-intuitive, imagine what happens if we have *no* employees who
use heroin. Out of 1000 employees, 10 will have a positive test result,
and **100%** of them will be innocent.)

### Baking Maybe into our monad

The above code gets the right answer, but it's still pretty awkward. We
have `Just`

and `Nothing`

all over the place,
stinking up our application code. Why don't we hide them inside our monad?

Fortunately, we can do just that, using the `MaybeT`

monad
transformer. Don't worry if you don't understand the details:

```
type FDist' = MaybeT FDist
-- Monads are Functors, no matter what
-- Haskell thinks.
instance Functor FDist' where
fmap = liftM
instance Dist FDist' where
weighted xws = lift (weighted xws)
```

As Russel and Norvig point out (chapter 13), cancelling out the impossible worlds and normalizing the remaining probabilities is equivalent to Bayes' rule. So in homage, we can write:

```
bayes :: FDist' a -> [Perhaps a]
bayes = exact . onlyJust . runMaybeT
```

We're missing just one piece, a statement to prune out impossible worlds:

```
condition :: Bool -> FDist' ()
condition = MaybeT . return . toMaybe
where toMaybe True = Just ()
toMaybe False = Nothing
```

And now, here's our final drug test.

```
drugTest3 :: FDist' HeroinStatus ->
FDist' HeroinStatus
drugTest3 prior = do
heroinStatus <- prior
testResult <-
if heroinStatus == User
then percentPos 99
else percentPos 1
-- As easy as an 'if' statement:
condition (testResult == Pos)
return heroinStatus
```

This gives us the same results as before:

```
> bayes (drugTest3 (percentUser 0.1))
[Perhaps User 9.0%,
Perhaps Clean 91.0%]
```

So testing all of our employees is still hopeless. But what if we only tested employees with clear signs of heroin abuse? In that case, there's probably a 50/50 chance of drug use.

And that gives us remarkably better results. Out of the people who test positive, 99% will be using drugs:

```
> bayes (drugTest3 (percentUser 50))
[Perhaps User 99.0%,
Perhaps Clean 1.0%]
```

The moral of this story: No matter how accurate our drug test, we shouldn't bother to run it unless we have probable cause.

Similar constraints apply to any population-wide surveillance: If you're searching for something sufficiently rare (criminals, terrorists, strange diseases), it doesn't matter how good your tests are. If you test everyone, you'll drown under thousands of false positives.

### Extreme Haskell geeking

If we go back and look at part 1, this gives us:

```
type FDist' = MaybeT (PerhapsT [])
```

This has some interesting consequences:

- If we collapse
`MaybeT`

into`PerhapsT`

, we can work with probability distributions that don't sum to 1, where the "missing" probability represents an impossible world. - We can add
`condition`

to`Rand`

(part 2) using`MaybeT Rand`

. Bayes' rule is basically the combination of`MaybeT`

and a suitable`catMaybes`

function applied to any probability distribution monad.

Also worth noting: Popular theories of natural language semantics are based on the λ-calculus. Chung-chieh Shan has a fascinating paper showing how to incorporate monads and monad transformers into this model. If we replaced Chung-chieh Shan's Set monad with one of our Bayesian monads, what would we get? (Currently, I have no idea.)

Part 4: The naive Bayes classifier

Part 5: What happens if we replace MaybeT with PerhapsT?

Want to contact me about this article? Or if you're looking for something else to read, here's a list of popular posts.

hasa sense of "fuzzy category").`Fractional`

instance for`Prob`

as you're using division in`onlyJust`

. This is fine, you can just derive it.`Functor`

instance for`FDist'`

isn't needed; there's already an`instance (Functor m) => Functor (MaybeT m)`

in the`MaybeT`

module.`condition`

is just`guard`

.