Bayes' rule in Haskell, or why drug tests don't work

Posted by Eric Kidd Thu, 22 Feb 2007 18:11:00 GMT

Part 3 of Refactoring Probability Distributions.
(Part 1: PerhapsT, Part 2: Sampling functions)

A very senior Microsoft developer who moved to Google told me that Google works and thinks at a higher level of abstraction than Microsoft. “Google uses Bayesian filtering the way Microsoft uses the if statement,” he said. -Joel Spolsky

I really love this quote, because it’s insanely provocative to any language designer. What would a programming language look like if Bayes’ rule were as simple as an if statement?

Let’s start with a toy problem, and refactor it until Bayes’ rule is baked right into our programming language.

Imagine, for a moment, that we’re in charge of administering drug tests for a small business. We’ll represent each employee’s test results (and drug use) as follows:

data Test = Pos | Neg
  deriving (Show, Eq)

data HeroinStatus = User | Clean
  deriving (Show, Eq)

Assuming that 0.1% of our employees have used heroin recently, and that our test is 99% accurate, we can model the testing process as follows:

drugTest1 :: Dist d => d (HeroinStatus, Test)
drugTest1 = do
  heroinStatus <- percentUser 0.1
  testResult <-
    if heroinStatus == User
      then percentPos 99
      else percentPos 1
  return (heroinStatus, testResult)

-- Some handy distributions.
percentUser p = percent p User Clean
percentPos p = percent p Pos Neg

-- A weighted distribution with two elements.
percent p x1 x2 =
  weighted [(x1, p), (x2, 100-p)]

This code is based our FDist monad, which is in turn based on PFP. Don’t worry if it seems slightly mysterious; you can think of the “<-” operator as choosing an element from a probability distribution.

Running our drug test shows every possible combination of the two variables:

> exact drugTest1
[Perhaps (User,Pos) 0.1%,
 Perhaps (User,Neg) 0.0%,
 Perhaps (Clean,Pos) 1.0%,
 Perhaps (Clean,Neg) 98.9%]

If you look carefully, we have a problem. Most of the employees who test positive are actually clean! Let’s tweak our code a bit, and try to zoom in on the positive test results.

Read more...

Tags , , , , ,

Refactoring probability distributions, part 2: Random sampling

Posted by Eric Kidd Wed, 21 Feb 2007 23:53:00 GMT

In Part 1, we cloned PFP, a library for computing with probability distributions. PFP represents a distribution as a list of possible values, each with an associated probability.

But in the real world, things aren’t always so easy. What if we wanted to pick a random number between 0 and 1? Our previous implementation would break, because there’s an infinite number of values between 0 and 1—they don’t exactly fit in a list.

As it turns out, Sungwoo Park and colleagues found an elegant solution to this problem. They represented probability distributions as sampling functions, resulting in something called the λ calculus. (I have no idea how to pronounce this!)

With a little bit of hacking, we can use their sampling functions as a drop-in replacement for PFP.

A common interface

Since we will soon have two ways to represent probability distributions, we need to define a common interface.

type Weight = Float

class (Functor d, Monad d) => Dist d where
  weighted :: [(a, Weight)] -> d a

uniform :: Dist d => [a] -> d a
uniform = weighted . map (\x -> (x, 1))

The function uniform will create an equally-weighted distribution from a list of values. Using this API, we can represent a two-child family as follows:

data Child = Girl | Boy
  deriving (Show, Eq, Ord)

child :: Dist d => d Child
child = uniform [Girl, Boy]

family :: Dist d => d [Child]
family = do
  child1 <- child
  child2 <- child
  return [child1, child2]

Now, we need to implement this API two different ways: Once with lists, and a second time with sampling functions.

Read more...

Tags , , , ,

Refactoring probability distributions, part 1: PerhapsT

Posted by Eric Kidd Wed, 21 Feb 2007 08:06:00 GMT

(Warning: This article is a bit more technical than most of my stuff. It assumes prior knowledge of monads and monad transformers.)

Martin Erwig and Steve Kollmansberger wrote PFP, a really sweet Haskell library for computing with probabilities. To borrow their example:

die :: Dist Int
die = uniform [1..6]

If we roll a die, we get the expected distribution of results:

> die
1  16.7%
2  16.7%
3  16.7%
4  16.7%
5  16.7%
6  16.7%

If you haven’t seen PFP before, I strongly encourage you to check it out. You can use it to solve all sorts of probability puzzles.

Anyway, I discovered an interesting way to implement PFP using monad transformers. Here’s what it looks like:

type Dist = PerhapsT ([])

uniform = weighted . map (\x -> (x, 1))

In other words, Dist can be written by adding some semantics to the standard list monad.

Perhaps: A less specific version of Maybe

First, let’s define a simple probability type:

newtype Prob = P Float
  deriving (Eq, Ord, Num)

instance Show Prob where
  show (P p) = show intPart ++ "." ++ show fracPart ++ "%"
    where digits = round (1000 * p)
          intPart = digits `div` 10
          fracPart = digits `mod` 10

Thanks to the deriving (Num) declaration, we can treat Prob like any other numeric type.

We can now define Perhaps, which represents a value with an associated probability:

data Perhaps a = Perhaps a Prob
  deriving (Show)

Now, this is just a generalization of Haskell’s built-in Maybe type, which treats a value as either present (probability 1) or absent (probability 0). All we’ve added is a range of possibilities in between: Perhaps x 0.5 represents a 50% chance of having a value.

Note that there’s one small trick here: When the probability of a value is 0, we may not actually know it! But because Haskell is a lazy language, we can write:

Perhaps undefined 0

We’ll need a convenient way to test for this case, to make sure we don’t try to use any undefined values:

neverHappens (Perhaps _ 0) = True
neverHappens _             = False

So Perhaps is just like Maybe. As it turns out, they’re both monads, and they both have an associated monad transformer.

Read more...

Tags , , , ,