Posted by Eric Kidd
Thu, 22 Feb 2007 23:11:00 GMT
Part 3 of Refactoring Probability Distributions.
(Part 1: PerhapsT,
Part 2: Sampling functions)
A very senior Microsoft developer who moved to Google told
me that Google works and thinks at a higher level of abstraction than
Microsoft. “Google uses Bayesian filtering the way Microsoft uses the if
statement,” he said. -Joel Spolsky
I really love this quote, because it’s insanely provocative
to any language designer. What would a programming language look
like if Bayes’ rule were as simple as an if statement?
Let’s start with a toy problem, and refactor it until Bayes’ rule is baked
right into our programming language.
Imagine, for a moment, that we’re in charge of administering drug tests for
a small business. We’ll represent each employee’s test results (and drug use) as follows:
data Test = Pos | Neg
deriving (Show, Eq)
data HeroinStatus = User | Clean
deriving (Show, Eq)
Assuming that 0.1% of our employees have used heroin recently, and that our test is 99%
accurate, we can model the testing process as follows:
drugTest1 :: Dist d => d (HeroinStatus, Test)
drugTest1 = do
heroinStatus <- percentUser 0.1
testResult <-
if heroinStatus == User
then percentPos 99
else percentPos 1
return (heroinStatus, testResult)
percentUser p = percent p User Clean
percentPos p = percent p Pos Neg
percent p x1 x2 =
weighted [(x1, p), (x2, 100p)]
This code is based our FDist monad, which is in turn based on
PFP. Don’t worry if it seems slightly mysterious; you can think of the
“<-” operator as choosing an element from a probability
distribution.
Running our drug test shows every possible combination of the two
variables:
> exact drugTest1
[Perhaps (User,Pos) 0.1%,
Perhaps (User,Neg) 0.0%,
Perhaps (Clean,Pos) 1.0%,
Perhaps (Clean,Neg) 98.9%]
If you look carefully, we have a problem. Most of the employees who test
positive are actually clean! Let’s tweak our code a bit, and try to zoom
in on the positive test results.
Read more...
Tags Haskell, Math, Monads, Probability, Recommended | 15 comments
Posted by Eric Kidd
Thu, 22 Feb 2007 04:53:00 GMT
In Part 1, we cloned PFP, a library for computing with probability distributions. PFP represents a distribution as a list of possible values, each with an associated probability.
But in the real world, things aren’t always so easy. What if we wanted to pick a random number between 0 and 1? Our previous implementation would break, because there’s an infinite number of values between 0 and 1—they don’t exactly fit in a list.
As it turns out, Sungwoo Park and colleagues found an elegant solution to this problem. They represented probability distributions as sampling functions, resulting in something called the λ◯ calculus. (I have no idea how to pronounce this!)
With a little bit of hacking, we can use their sampling functions as a drop-in replacement for PFP.
A common interface
Since we will soon have two ways to represent probability distributions, we need to define a common interface.
type Weight = Float
class (Functor d, Monad d) => Dist d where
weighted :: [(a, Weight)] -> d a
uniform :: Dist d => [a] -> d a
uniform = weighted . map (\x -> (x, 1))
The function uniform will create an equally-weighted distribution from a list of values. Using this API, we can represent a two-child family as follows:
data Child = Girl | Boy
deriving (Show, Eq, Ord)
child :: Dist d => d Child
child = uniform [Girl, Boy]
family :: Dist d => d [Child]
family = do
child1 <- child
child2 <- child
return [child1, child2]
Now, we need to implement this API two different ways: Once with lists, and a second time with sampling functions.
Read more...
Tags Haskell, Math, Monads, Probability | 2 comments
Posted by Eric Kidd
Wed, 21 Feb 2007 13:06:00 GMT
(Warning: This article is a bit more technical than most of my stuff. It assumes prior knowledge of monads and monad transformers.)
Martin Erwig and Steve Kollmansberger wrote PFP, a really sweet Haskell library for computing with probabilities. To borrow their example:
die :: Dist Int
die = uniform [1..6]
If we roll a die, we get the expected distribution of results:
> die
1 16.7%
2 16.7%
3 16.7%
4 16.7%
5 16.7%
6 16.7%
If you haven’t seen PFP before, I strongly encourage you to check it out. You can use it to solve all sorts of probability puzzles.
Anyway, I discovered an interesting way to implement PFP using monad transformers. Here’s what it looks like:
type Dist = PerhapsT ([])
uniform = weighted . map (\x -> (x, 1))
In other words, Dist can be written by adding some semantics to the standard list monad.
Perhaps: A less specific version of Maybe
First, let’s define a simple probability type:
newtype Prob = P Float
deriving (Eq, Ord, Num)
instance Show Prob where
show (P p) = show intPart ++ "." ++ show fracPart ++ "%"
where digits = round (1000 * p)
intPart = digits `div` 10
fracPart = digits `mod` 10
Thanks to the deriving (Num) declaration, we can treat Prob like any other numeric type.
We can now define Perhaps, which represents a value with an associated probability:
data Perhaps a = Perhaps a Prob
deriving (Show)
Now, this is just a generalization of Haskell’s built-in Maybe type, which treats a value as either present (probability 1) or absent (probability 0). All we’ve added is a range of possibilities in between: Perhaps x 0.5 represents a 50% chance of having a value.
Note that there’s one small trick here: When the probability of a value is 0, we may not actually know it! But because Haskell is a lazy language, we can write:
We’ll need a convenient way to test for this case, to make sure we don’t try to use any undefined values:
neverHappens (Perhaps _ 0) = True
neverHappens _ = False
So Perhaps is just like Maybe. As it turns out, they’re both monads, and they both have an associated monad transformer.
Read more...
Tags Haskell, Math, Monads, Probability | 10 comments