Bayes' rule in Haskell, or why drug tests don't work
Part 3 of Refactoring Probability Distributions.
(Part 1: PerhapsT,
Part 2: Sampling functions)
A very senior Microsoft developer who moved to Google told me that Google works and thinks at a higher level of abstraction than Microsoft. “Google uses Bayesian filtering the way Microsoft uses the if statement,” he said. -Joel Spolsky
I really love this quote, because it’s insanely provocative
to any language designer. What would a programming language look
like if Bayes’ rule were as simple as an if
statement?
Let’s start with a toy problem, and refactor it until Bayes’ rule is baked right into our programming language.
Imagine, for a moment, that we’re in charge of administering drug tests for a small business. We’ll represent each employee’s test results (and drug use) as follows:
data Test = Pos | Neg
deriving (Show, Eq)
data HeroinStatus = User | Clean
deriving (Show, Eq)
Assuming that 0.1% of our employees have used heroin recently, and that our test is 99% accurate, we can model the testing process as follows:
drugTest1 :: Dist d => d (HeroinStatus, Test)
drugTest1 = do
heroinStatus <- percentUser 0.1
testResult <-
if heroinStatus == User
then percentPos 99
else percentPos 1
return (heroinStatus, testResult)
-- Some handy distributions.
percentUser p = percent p User Clean
percentPos p = percent p Pos Neg
-- A weighted distribution with two elements.
percent p x1 x2 =
weighted [(x1, p), (x2, 100-p)]
This code is based our FDist monad, which is in turn based on
PFP. Don’t worry if it seems slightly mysterious; you can think of the
“<-
” operator as choosing an element from a probability
distribution.
Running our drug test shows every possible combination of the two variables:
> exact drugTest1
[Perhaps (User,Pos) 0.1%,
Perhaps (User,Neg) 0.0%,
Perhaps (Clean,Pos) 1.0%,
Perhaps (Clean,Neg) 98.9%]
If you look carefully, we have a problem. Most of the employees who test positive are actually clean! Let’s tweak our code a bit, and try to zoom in on the positive test results.
Ignoring negative test results
We don’t care about employees who test negative for heroin use. We can
throw away those results using Haskell’s Maybe
type:
drugTest2 :: Dist d => d (Maybe HeroinStatus)
drugTest2 = do
(heroinStatus, testResult) <- drugTest1
return (if testResult == Pos
then Just heroinStatus
else Nothing)
This shows us just the variables we’re interested in, but the percentages are still a mess:
> exact drugTest2
[Perhaps (Just User) 0.1%,
Perhaps Nothing 0.0%,
Perhaps (Just Clean) 1.0%,
Perhaps Nothing 98.9%]
Ideally, we want to reach into that distribution, discard all the
Nothing
values, and then normalize the remaining percentages
so that they add up to 100%. We can do that with a bit of Haskell code:
value (Perhaps x _) = x
prob (Perhaps _ p) = p
catMaybes' :: [Perhaps (Maybe a)] -> [Perhaps a]
catMaybes' [] = []
catMaybes' (Perhaps Nothing _ : xs) =
catMaybes' xs
catMaybes' (Perhaps (Just x) p : xs) =
Perhaps x p : catMaybes' xs
onlyJust :: FDist (Maybe a) -> FDist a
onlyJust dist
| total > 0 = PerhapsT (map adjust filtered)
| otherwise = PerhapsT []
where filtered = catMaybes' (runPerhapsT dist)
total = sum (map prob filtered)
adjust (Perhaps x p) =
Perhaps x (p / total)
And sure enough, that lets us zoom right in on the interesting values:
> exact (onlyJust drugTest2)
[Perhaps User 9.0%,
Perhaps Clean 91.0%]
OK, that’s definitely not good news. Even though our test is 99% accurate, 91% of the people we accuse will be innocent!
(If this seems counter-intuitive, imagine what happens if we have no employees who use heroin. Out of 1000 employees, 10 will have a positive test result, and 100% of them will be innocent.)
Baking Maybe into our monad
The above code gets the right answer, but it’s still pretty awkward. We
have Just
and Nothing
all over the place,
stinking up our application code. Why don’t we hide them inside our monad?
Fortunately, we can do just that, using the MaybeT
monad
transformer. Don’t worry if you don’t understand the details:
type FDist' = MaybeT FDist
-- Monads are Functors, no matter what
-- Haskell thinks.
instance Functor FDist' where
fmap = liftM
instance Dist FDist' where
weighted xws = lift (weighted xws)
As Russel and Norvig point out (chapter 13), cancelling out the impossible worlds and normalizing the remaining probabilities is equivalent to Bayes’ rule. So in homage, we can write:
bayes :: FDist' a -> [Perhaps a]
bayes = exact . onlyJust . runMaybeT
We’re missing just one piece, a statement to prune out impossible worlds:
condition :: Bool -> FDist' ()
condition = MaybeT . return . toMaybe
where toMaybe True = Just ()
toMaybe False = Nothing
And now, here’s our final drug test.
drugTest3 :: FDist' HeroinStatus ->
FDist' HeroinStatus
drugTest3 prior = do
heroinStatus <- prior
testResult <-
if heroinStatus == User
then percentPos 99
else percentPos 1
-- As easy as an 'if' statement:
condition (testResult == Pos)
return heroinStatus
This gives us the same results as before:
> bayes (drugTest3 (percentUser 0.1))
[Perhaps User 9.0%,
Perhaps Clean 91.0%]
So testing all of our employees is still hopeless. But what if we only tested employees with clear signs of heroin abuse? In that case, there’s probably a 50/50 chance of drug use.
And that gives us remarkably better results. Out of the people who test positive, 99% will be using drugs:
> bayes (drugTest3 (percentUser 50))
[Perhaps User 99.0%,
Perhaps Clean 1.0%]
The moral of this story: No matter how accurate our drug test, we shouldn’t bother to run it unless we have probable cause.
Similar constraints apply to any population-wide surveillance: If you’re searching for something sufficiently rare (criminals, terrorists, strange diseases), it doesn’t matter how good your tests are. If you test everyone, you’ll drown under thousands of false positives.
Extreme Haskell geeking
If we go back and look at part 1, this gives us:
type FDist' = MaybeT (PerhapsT [])
This has some interesting consequences:
- If we collapse
MaybeT
intoPerhapsT
, we can work with probability distributions that don’t sum to 1, where the “missing” probability represents an impossible world. - We can add
condition
toRand
(part 2) usingMaybeT Rand
. Bayes’ rule is basically the combination ofMaybeT
and a suitablecatMaybes
function applied to any probability distribution monad.
Also worth noting: Popular theories of natural language semantics are based on the λ-calculus. Chung-chieh Shan has a fascinating paper showing how to incorporate monads and monad transformers into this model. If we replaced Chung-chieh Shan’s Set monad with one of our Bayesian monads, what would we get? (Currently, I have no idea.)
Part 4: The naive Bayes classifier
Part 5: What happens if we replace MaybeT with PerhapsT?
Want to contact me about this article? Or if you're looking for something else to read, here's a list of popular posts.
Fractional
instance forProb
as you're using division inonlyJust
. This is fine, you can just derive it.Functor
instance forFDist'
isn't needed; there's already aninstance (Functor m) => Functor (MaybeT m)
in theMaybeT
module.condition
is justguard
.