Financial crisis background and Munger on the banks

Posted by Eric Kidd Tue, 05 May 2009 09:36:00 GMT

Charlie Munger is the long-time partner of Warren Buffet. Of the two, he’s the more politically conservative. Their company, Berkshire Hathaway, has recently bought big stakes in several of the better-off investment banks.

Recently, Munger sharply criticized the management of the investment banks, saying they’ve grown too politically powerful. The key quote:

“We need to remove from the investment banking and the commercial banking industries a lot of the practices and prerogatives that they have so lovingly possessed,” Munger said. “If they are too big to fail, they are too big to be allowed to be as gamey and venal as they’ve been – and as stupid as they’ve been.” (, via Baseline Scenario.)

What does the bankers’ stupidity have to do with the usual themes of this blog? Well, much of the crisis comes down to bad probability calculations: the big banks have been treating highly correlated events as though they were independent events.

Some good background material on the crisis:

Tags , ,

Probability monads at Hac 07 II

Posted by Eric Kidd Tue, 02 Oct 2007 07:50:00 GMT

From October 5-7, I’ll be at the Haskell Hackathon in Freiburg.

I’ll be working on probability monads, attempting to turn my various blog articles into a real Haskell library.

Some resources:

If you were a peer reviewer, or gave me feedback on the paper, my humble thanks–and apologies. I haven’t had a chance to revise the paper yet, and so your feedback is not yet included.

See you at the Hackathon!

Tags , , , ,

Freiburg in October: Scheme, Dylan, and probability monads

Posted by Eric Kidd Tue, 18 Sep 2007 12:36:00 GMT

Good morning! I’ll be in Freiburg for several events this October, including CUFP and the Haskell Hackathon.

Commercial Users of Functional Programming (CUFP)

On October 4th, I’ll be speaking at CUFP ’07, describing the use of Scheme in a real-world multimedia engine. Some likely topics:

  1. How we switched to Scheme (and why refactoring is your friend).
  2. How our artists learned to program in Scheme (it’s all about the tools).
  3. The tension between functional programming and objects: Can we have both?

Dylan Beer Night

Once upon a time, I dreamt of generic functions and built RPMs for Gwydion Dylan.

Some current and former Dylan hackers are hoping to meet in Freiburg, most likely on October 4th. If you’re at ICFP or one of the workshops, we’d love to hear from you.

Haskell Hackathon

I’ll be at the Haskell Hackathon from Friday to Sunday.

Perhaps it’s time to whip Control.Monad.Probability into shape?

Tags , , , ,

Robot localization using a particle system monad

Posted by Eric Kidd Thu, 19 Apr 2007 20:43:00 GMT

Refactoring Probability Distributions: part 1, part 2, part 3, part 4, part 5

Welcome to the 5th (and final) installment of Refactoring Probability Distributions! Today, let’s begin with an example from Bayesian Filters for Location Estimation (PDF), an excellent paper by Fox and colleagues.

In their example, we have a robot in a hallway with 3 doors. Unfortunately, we don’t know where in the hallway the robot is located:

The vertical black lines are “particles.” Each particle represents a possible location of our robot, chosen at random along the hallway. At first, our particles are spread along the entire hallway (the top row of black lines). Each particle begins life with a weight of 100%, represented by the height of the black line.

Now imagine that our robot has a “door sensor,” which currently tells us that we’re in front of a door. This allows us to rule out any particle which is located between doors.

So we multiply the weight of each particle by 100% (if it’s in front of a door) or 0% (if it’s between doors), which gives us the lower row of particles. If our sensor was less accurate, we might use 90% and 10%, respectively.

What would this example look like in Haskell? We could build a giant list of particles (with weights), but that would require us to do a lot of bookkeeping by hand. Instead, we use a monad to hide all the details, allowing us to work with a single particle at a time.

localizeRobot :: WPS Int
localizeRobot = do
  -- Pick a random starting location.
  -- This will be our first particle.
  pos1 <- uniform [0..299]
  -- We know we're at a door.  Particles
  -- in front of a door get a weight of
  -- 100%, others get 0%.
  if doorAtPosition pos1
    then weight 1
    else weight 0
  -- ...

What happens if our robot drives forward?


Tags , , , , ,

Smart classification using Bayesian monads in Haskell

Posted by Eric Kidd Sat, 03 Mar 2007 09:02:00 GMT

(Refactoring Probability Distributions: part 1, part 2, part 3, part 4)

The world is full of messy classification problems:

  • “Is this order fraudulent?”
  • “It this e-mail a spam?”
  • “What blog posts would Rachel find interesting?”
  • “Which intranet documents is Sam looking for?”

In each case, we want to classify something: Orders are either valid or fraudulent, messages are either spam or non-spam, blog posts are either interesting or boring. Unfortunately, most software is terrible at making these distinctions. For example, why can’t my RSS reader go out and track down the 10 most interesting blog posts every day?

Some software, however, can make these distinctions. Google figures out when I want to watch a movie, and shows me specialized search results. And most e-mail clients can identify spam with over 99% accuracy. But the vast majority of software is dumb, incapable of dealing with the messy dilemmas posed by the real world.

So where can we learn to improve our software?

Outside of Google’s shroud of secrecy, the most successful classifiers are spam filters. And most modern spam filters are inspired by Paul Graham’s essay A Plan for Spam.

So let’s go back to the source, and see what we can learn. As it turns out, we can formulate a lot of the ideas in A Plan for Spam in a straightforward fashion using a Bayesian monad.

Functions from distributions to distributions

Let’s begin with spam filtering. By convention, we divide messages into “spam” and “ham”, where “ham” is the stuff we want to read.

data MsgType = Spam | Ham
  deriving (Show, Eq, Enum, Bounded)

Let’s assume that we’ve just received a new e-mail. Without even looking at it, we know there’s a certain chance that it’s a spam. This gives us something called a “prior distribution” over MsgType.

> bayes msgTypePrior
[Perhaps Spam 64.2%, Perhaps Ham 35.8%]

But what if we know that the first word of the message is “free”? We can use that information to calculate a new distribution.

> bayes (hasWord "free" msgTypePrior)
[Perhaps Spam 90.5%, Perhaps Ham 9.5%]

The function hasWord takes a string and a probability distribution, and uses them to calculate a new probability distribution:

hasWord :: String -> FDist' MsgType ->
           FDist' MsgType
hasWord word prior = do
  msgType <- prior
  wordPresent <-
    wordPresentDist msgType word
  condition wordPresent
  return msgType

This code is based on the Bayesian monad from part 3. As before, the “<-” operator selects a single item from a probability distribution, and “condition” asserts that an expression is true. The actual Bayesian inference happens behind the scenes (handy, that).

If we have multiple pieces of evidence, we can apply them one at a time. Each piece of evidence will update the probability distribution produced by the previous step:

hasWords []     prior = prior
hasWords (w:ws) prior = do
  hasWord w (hasWords ws prior)

The final distribution will combine everything we know:

> bayes (hasWords ["free","bayes"] msgTypePrior)
[Perhaps Spam 34.7%, Perhaps Ham 65.3%]

This technique is known as the naive Bayes classifier. Looked at from the right angle, it’s surprisingly simple.

(Of course, the naive Bayes classifier assumes that all of our evidence is independent. In theory, this is a pretty big assumption. In practice, it works better than you might think.)

But this still leaves us with a lot of questions: How do we keep track of our different classifiers? How do we decide which ones to apply? And do we need to fudge the numbers to get reasonable results?

In the following sections, I’ll walk through various aspects of Paul Graham’s A Plan for Spam, and show how to generalize it. If you want to follow along, you can download the code using Darcs:

darcs get

Tags , , , , ,

Bayes' rule in Haskell, or why drug tests don't work

Posted by Eric Kidd Thu, 22 Feb 2007 18:11:00 GMT

Part 3 of Refactoring Probability Distributions.
(Part 1: PerhapsT, Part 2: Sampling functions)

A very senior Microsoft developer who moved to Google told me that Google works and thinks at a higher level of abstraction than Microsoft. “Google uses Bayesian filtering the way Microsoft uses the if statement,” he said. -Joel Spolsky

I really love this quote, because it’s insanely provocative to any language designer. What would a programming language look like if Bayes’ rule were as simple as an if statement?

Let’s start with a toy problem, and refactor it until Bayes’ rule is baked right into our programming language.

Imagine, for a moment, that we’re in charge of administering drug tests for a small business. We’ll represent each employee’s test results (and drug use) as follows:

data Test = Pos | Neg
  deriving (Show, Eq)

data HeroinStatus = User | Clean
  deriving (Show, Eq)

Assuming that 0.1% of our employees have used heroin recently, and that our test is 99% accurate, we can model the testing process as follows:

drugTest1 :: Dist d => d (HeroinStatus, Test)
drugTest1 = do
  heroinStatus <- percentUser 0.1
  testResult <-
    if heroinStatus == User
      then percentPos 99
      else percentPos 1
  return (heroinStatus, testResult)

-- Some handy distributions.
percentUser p = percent p User Clean
percentPos p = percent p Pos Neg

-- A weighted distribution with two elements.
percent p x1 x2 =
  weighted [(x1, p), (x2, 100-p)]

This code is based our FDist monad, which is in turn based on PFP. Don’t worry if it seems slightly mysterious; you can think of the “<-” operator as choosing an element from a probability distribution.

Running our drug test shows every possible combination of the two variables:

> exact drugTest1
[Perhaps (User,Pos) 0.1%,
 Perhaps (User,Neg) 0.0%,
 Perhaps (Clean,Pos) 1.0%,
 Perhaps (Clean,Neg) 98.9%]

If you look carefully, we have a problem. Most of the employees who test positive are actually clean! Let’s tweak our code a bit, and try to zoom in on the positive test results.


Tags , , , , ,

Refactoring probability distributions, part 2: Random sampling

Posted by Eric Kidd Wed, 21 Feb 2007 23:53:00 GMT

In Part 1, we cloned PFP, a library for computing with probability distributions. PFP represents a distribution as a list of possible values, each with an associated probability.

But in the real world, things aren’t always so easy. What if we wanted to pick a random number between 0 and 1? Our previous implementation would break, because there’s an infinite number of values between 0 and 1—they don’t exactly fit in a list.

As it turns out, Sungwoo Park and colleagues found an elegant solution to this problem. They represented probability distributions as sampling functions, resulting in something called the λ calculus. (I have no idea how to pronounce this!)

With a little bit of hacking, we can use their sampling functions as a drop-in replacement for PFP.

A common interface

Since we will soon have two ways to represent probability distributions, we need to define a common interface.

type Weight = Float

class (Functor d, Monad d) => Dist d where
  weighted :: [(a, Weight)] -> d a

uniform :: Dist d => [a] -> d a
uniform = weighted . map (\x -> (x, 1))

The function uniform will create an equally-weighted distribution from a list of values. Using this API, we can represent a two-child family as follows:

data Child = Girl | Boy
  deriving (Show, Eq, Ord)

child :: Dist d => d Child
child = uniform [Girl, Boy]

family :: Dist d => d [Child]
family = do
  child1 <- child
  child2 <- child
  return [child1, child2]

Now, we need to implement this API two different ways: Once with lists, and a second time with sampling functions.


Tags , , , ,

Refactoring probability distributions, part 1: PerhapsT

Posted by Eric Kidd Wed, 21 Feb 2007 08:06:00 GMT

(Warning: This article is a bit more technical than most of my stuff. It assumes prior knowledge of monads and monad transformers.)

Martin Erwig and Steve Kollmansberger wrote PFP, a really sweet Haskell library for computing with probabilities. To borrow their example:

die :: Dist Int
die = uniform [1..6]

If we roll a die, we get the expected distribution of results:

> die
1  16.7%
2  16.7%
3  16.7%
4  16.7%
5  16.7%
6  16.7%

If you haven’t seen PFP before, I strongly encourage you to check it out. You can use it to solve all sorts of probability puzzles.

Anyway, I discovered an interesting way to implement PFP using monad transformers. Here’s what it looks like:

type Dist = PerhapsT ([])

uniform = weighted . map (\x -> (x, 1))

In other words, Dist can be written by adding some semantics to the standard list monad.

Perhaps: A less specific version of Maybe

First, let’s define a simple probability type:

newtype Prob = P Float
  deriving (Eq, Ord, Num)

instance Show Prob where
  show (P p) = show intPart ++ "." ++ show fracPart ++ "%"
    where digits = round (1000 * p)
          intPart = digits `div` 10
          fracPart = digits `mod` 10

Thanks to the deriving (Num) declaration, we can treat Prob like any other numeric type.

We can now define Perhaps, which represents a value with an associated probability:

data Perhaps a = Perhaps a Prob
  deriving (Show)

Now, this is just a generalization of Haskell’s built-in Maybe type, which treats a value as either present (probability 1) or absent (probability 0). All we’ve added is a range of possibilities in between: Perhaps x 0.5 represents a 50% chance of having a value.

Note that there’s one small trick here: When the probability of a value is 0, we may not actually know it! But because Haskell is a lazy language, we can write:

Perhaps undefined 0

We’ll need a convenient way to test for this case, to make sure we don’t try to use any undefined values:

neverHappens (Perhaps _ 0) = True
neverHappens _             = False

So Perhaps is just like Maybe. As it turns out, they’re both monads, and they both have an associated monad transformer.


Tags , , , ,

Probabilistic Functional Programming is cool

Posted by Eric Kidd Sat, 10 Feb 2007 16:14:00 GMT

Syntaxfree is hacking on Martin Erwig’s probability monad. This is one of the coolest monads out there—it allows you to trivially solve all kinds of probability problems.

Mikael Johansson has a good example.

I hope to write a bit more about probability monads soon. There’s already a long post sitting on my hard drive, and some more ideas that I’m still trying to puzzle out.

In the meantime, I’d like to recommend The Haskell Road to Logic, Maths and Programming. There’s an excellent review available.

Tags , , ,

Fromberger spam filtering paper

Posted by Eric Mon, 30 Sep 2002 00:00:00 GMT

Michael Fromberger has written a nice formal analysis (PDF) of Paul Graham's Plan for Spam.

Tags ,

Older posts: 1 2