Posted by Eric Kidd
Tue, 02 Oct 2007 11:50:00 GMT
From October 5-7, I’ll be at the Haskell Hackathon in Freiburg.
I’ll be working on probability monads, attempting to turn my various blog articles into a real Haskell library.
Some resources:
If you were a peer reviewer, or gave me feedback on the paper, my humble thanks–and apologies. I haven’t had a chance to revise the paper yet, and so your feedback is not yet included.
See you at the Hackathon!
Tags Haskell, Math, Monads, Probability | 5 comments
Posted by Eric Kidd
Tue, 18 Sep 2007 16:36:00 GMT
Good morning! I’ll be in Freiburg for several events this October, including CUFP and the Haskell Hackathon.
Commercial Users of Functional Programming (CUFP)
On October 4th, I’ll be speaking at CUFP ’07, describing the use of Scheme in a real-world multimedia engine. Some likely topics:
- How we switched to Scheme (and why refactoring is your friend).
- How our artists learned to program in Scheme (it’s all about the tools).
- The tension between functional programming and objects: Can we have both?
Dylan Beer Night
Once upon a time, I dreamt of generic functions and built RPMs for Gwydion Dylan.
Some current and former Dylan hackers are hoping to meet in Freiburg, most likely on October 4th. If you’re at ICFP or one of the workshops, we’d love to hear from you.
Haskell Hackathon
I’ll be at the Haskell Hackathon from Friday to Sunday.
Perhaps it’s time to whip Control.Monad.Probability into shape?
Tags Dylan, Haskell, Monads, Probability, Scheme | no comments
Posted by Eric Kidd
Sat, 28 Apr 2007 15:53:00 GMT
Bowling is a tricky game to score. It’s just complicated enough
to act as a good programming exercise. And Ron Jeffries has performed this
exercise many times, in C#, Smalltalk, and other
languages. He’s been searching for a tidy and elegant solution, one
which makes the rules of bowling as clear as possible.
In the past, though, Jeffries has been a bit skeptical of Haskell
implementations of bowling:
The recursive [Haskell] solution, however, is questionable on
more fundamental grounds. A game of bowling consists of ten frames, not
less or more, and the “ten-ness” of the game is not represented in the
recursive solutions at all. Even if we let that slide, the recursive
solutions make it a bit hard to understand what’s going on.
Let’s see if we can do better. No knowledge of bowling is required–if we
do this right, our program should be at least as clear as an
English-language version of the rules.
Along the way, we’ll encounter lazy lists, an interesting recursion combinator, and Hoogle, the Haskell search engine.
The rules of bowling
In bowling, we roll balls down a lane, trying to knock down pins. If we
know how many pins we knock down with each ball, we can compute the final
score. So our program looks something like this:
type Balls = [Int]
type Score = Int
scoreGame :: Balls -> Score
scoreGame balls = ???
But how do we implement scoreGame?
Scoring a frame
A bowling game is divided into 10 frames. Ordinary frames consist of 1 or 2 balls. The 10th frame may have an additional 1 or 2 bonus balls, which we discuss below.
To score an individual frame, we need to do two things: (1) calculate the score for our frame, and (2) figure out where the next frame
starts. Our scoring function will return both pieces of information:
scoreFrame :: Balls -> (Score, Balls)
If we knock down all 10 pins with the first ball in a frame
(x1), we call it a strike, and move on to the next
frame immediately. But we also get a bonus—we’re allowed to count balls
y1 and y2 from the next frame towards
this frame’s score:
scoreFrame (x1: y1:y2:ys) | x1 == 10 =
(x1+y1+y2, y1:y2:ys)
If we knock down all the pins using two balls (x1 and
x2), we call it a spare. And we get to count one ball from
the next frame as our bonus:
scoreFrame (x1:x2: y1:ys) | x1+x2 == 10 =
(x1+x2+y1, y1:ys)
If we don’t manage to knock all 10 pins with two balls, we call it an open
frame. And we don’t get any bonus:
scoreFrame (x1:x2: ys) =
(x1+x2, ys)
What happens if we have a strike or a spare in the 10th frame?
We get to roll our bonus balls anyway. Conventionally, these extra balls
are recorded as part of the 10th frame (making it 3 balls long), but
they’re really just phantom balls hanging off the end of the game.
Next, we need to turn scoreFrame into a recursive function.
Read more...
Tags Haskell | 11 comments
Posted by Eric Kidd
Fri, 20 Apr 2007 00:43:00 GMT
Refactoring Probability Distributions:
part 1, part 2, part 3, part 4, part 5
Welcome to the 5th (and final) installment of Refactoring Probability
Distributions! Today, let’s begin with an example from
Bayesian Filters for Location Estimation (PDF), an excellent paper by Fox and colleagues.
In their example, we have a robot in a hallway with 3 doors. Unfortunately, we don’t know where in the hallway the robot is located:

The vertical black lines are “particles.” Each particle represents a
possible location of our robot, chosen at random along the hallway. At
first, our particles are spread along the entire hallway (the top
row of black lines). Each particle begins life with a weight of 100%, represented by the height of the black line.
Now imagine that our robot has a “door sensor,” which currently tells us that
we’re in front of a door. This allows us to rule out any particle which is located between doors.
So we multiply the weight of each particle by 100% (if it’s in front of a door) or 0% (if it’s between doors), which gives us the lower row of particles. If our sensor was less accurate, we might use 90% and 10%, respectively.
What would this example look like in Haskell? We could build a
giant list of particles (with weights), but that would require us to do a
lot of bookkeeping by hand. Instead, we use a monad to hide all the
details, allowing us to work with a single particle at a time.
localizeRobot :: WPS Int
localizeRobot = do
pos1 <- uniform [0..299]
if doorAtPosition pos1
then weight 1
else weight 0
What happens if our robot drives forward?
Read more...
Tags Haskell, Math, Monads, Probability, Robots | 9 comments
Posted by Eric Kidd
Fri, 16 Mar 2007 02:29:00 GMT
…and how to fake Lisp macros with Template Haskell
(I wrote this article in response to a comment by
sigfpe. You may find it pretty dry reading, unless you want to
build domain-specific languages in Haskell. Proceed at your own
risk.)
Haskell’s built-in Monad type has some serious limitations. We can fix those limitations using a number of advanced Haskell techniques, including Template Haskell, Haskell’s closest equivalent to Lisp macros.
We can illustrate the limitations of Monad with an example from math. In set theory, we can define a set by specifying how to compute each
element:
{ xy : x ∈ {1,2,4}, y ∈ {1,2,4} }
We can read this as, “the set of all xy, where x
is one of {1,2,4}, and y is one of {1,2,4}.” To calculate the
answer, we first multiply together all the possible combinations:
1×1=1, 1×2=2, 1×4=4, 2×1=2, 2×2=4, 2×4=8, 4×1=4, 4×2=8, 4×4=16
We then collect up the answers, and—because we’re working with sets–we
throw away the duplicates:
{1,2,4,8,16}
Can we do the same thing in Haskell? Well, using Haskell’s list
monad, we can write:
listExample = do
x <- [1,2,4]
y <- [1,2,4]
return (x*y)
But when we run this, Haskell gives us lots of duplicate values:
> listExample
[1,2,4,2,4,8,4,8,16]
Our problem: We’re using lists (which can contain duplicate
values) to represent sets (which can’t). Can we fix this by switching to
Haskell’s Data.Set?
import qualified Data.Set as S
setExample = do
x <- S.fromList [1,2,4]
y <- S.fromList [1,2,4]
return (x*y)
Unfortunately, this code fails spectacularly. A Haskell monad is required
to work for any types a and b:
class Monad m where
return :: a -> m a
fail :: String -> m a
(>>=) :: m a -> (a -> m b) -> m b
But Data.Set only works for some types. Specifically, it
requires that values of type a can be ordered:
data (Ord a) => Set a = ...
As it turns out, we can make Data.Set into a monad. But be
warned: The solution involves some pretty ugly Haskell abuse.
Read more...
Tags Haskell, Macros, Monads | 9 comments
Posted by Eric Kidd
Mon, 12 Mar 2007 23:39:00 GMT
This morning, a programmer visited #haskell and asked how to implement
backtracking. Not surprisingly, most of the answers involved
monads. After all, monads are ubiquitous in Haskell: They’re used for IO,
for probability, for error reporting, and even for quantum
mechanics. If you program in Haskell, you’ll
probably want to understand monads. So where’s the best place to start?
A friend of mine claims he didn’t truly understand monads until he
understood join. But once he figured that out, everything was
suddenly obvious. That’s the way it worked for me, too. But relatively
few monad tutorials are based on join, so there’s an open niche in a crowded market.
This monad tutorial uses join. Even better, it attempts to
cram everything you need to know about monads into 15 minutes. (Hey, everybody needs a gimmick, right?)
Backtracking: The lazy way to code
We begin with a backtracking constraint solver. The idea: Given
possible values for x and y, we want to pick
those values which have a product of 8:
solveConstraint = do
x <- choose [1,2,3]
y <- choose [4,5,6]
guard (x*y == 8)
return (x,y)
Every time choose is called, we save the current program
state. And every time guard fails, we backtrack to a saved
state and try again. Eventually, we’ll hit the right answer:
> take 1 solveConstraint
[(2,4)]
Let’s build this program step-by-step in Haskell. When we’re done, we’ll
have a monad.
Read more...
Tags Haskell, Math, Monads | 31 comments
Posted by Eric Kidd
Sun, 11 Mar 2007 00:05:00 GMT
Haskell is a marvellous language, but there are some things I don’t like about it. My least favorite: Haskell has no fewer than 8
different APIs for reporting errors.
To make a bad situation worse, the choice of API varies between
popular libraries. To give a particularly unfortunate example,
Network.URI.parseURI and Network.HTTP.simpleHTTP report
errors in entirely different ways, turning a “download this URL” program into a page of code,
nearly half of which is devoted to dealing with various kinds of errors. (The rest is boilerplate that could be refactored into a nice wrapper.)
Let’s begin with a toy function, the simplest possible program that could
actually fail:
As every algebra student knows, we can’t divide by zero. Using this
function as our example, let’s take a look at all the different ways we
can implement error-reporting in Haskell.
Read more...
Tags Haskell | 7 comments
Posted by Eric Kidd
Mon, 05 Mar 2007 14:32:00 GMT
Monads are a remarkably powerful tool for building specialized programming languages. Some examples include:
But there’s a bunch of things I don’t understand about monads. In each case, my confusion involves some aspect of the underlying math that “bubbles up” to affect the design of specialized languages.
(Warning: Obscure monad geeking ahead.)
Commutative monads
A “commutative monad” is any monad where we can replace the expression:
…with:
…without changing the meaning. Examples of commutative monads include Reader and Rand. This is an important property, because it might allow us to parallelize the commonly-used sequence function across huge numbers of processors:
sequence :: (Monad m) => [m a] -> m [a]
Simon Peyton Jones lists this problem as Open Challenge #2, saying:
Commutative monads are very common. (Environment,
unique supply, random number generation.) For these, monads over-sequentialise.
Wanted: theory and notation for some cool compromise.
Commutative monad morphisms
Monad morphisms are the category theory equivalent of Haskell’s monad transformers. Haskell’s monad transformers can be expressed as monad layerings, which correspond to the monad morphisms of category theory.
Many complicated monads break down into a handful of monad transformers, often in surprising ways.
But composing monad transformers is a mess, because they interact in poorly-understood ways. In general, the following two types have very different semantics:
FooT (BarT m)
BarT (FooT m)
If FooT and BarT commute with each other, however, the two types would be equivalent. This is helpful when building large stacks of monad transformers.
Chung-chieh Shan encountered a related problem when applying monad morphisms to build a theory of natural language semantics:
It remains to be seen whether monads would provide the appropriate
conceptual encapsulation for a semantic theory with broader coverage. In
particular, for both natural and programming language semantics, combining monads—or perhaps monad-like objects—remains an open issue that
promises additional insight.
Monad morphisms and abstract algebra
Dan Piponi has been drawing some fascinating connections between monad morphisms and abstract algebra. See, for example:
This approach seems to throw a lot of light on monad morphisms—but at least in my case, the light only highlights my confusion.
Of the three problems listed here, this is the one most likely to be discussed in a textbook somewhere. And a solution to this problem would likely help significantly with the other two.
So, my question: Does anybody have any books, papers or ideas that might help untangle this mess?
Update: Be sure to see the comment thread on the second Dan Piponi post above and Chung-chieh Shan’s excellent bibliography on monad transformers.
Tags Haskell, Math, Monads | 6 comments
Posted by Eric Kidd
Sat, 03 Mar 2007 14:02:00 GMT
(Refactoring Probability Distributions: part 1, part 2,
part 3, part 4)
The world is full of messy classification problems:
- “Is this order fraudulent?”
- “It this e-mail a spam?”
- “What blog posts would Rachel find interesting?”
- “Which intranet documents is Sam looking for?”
In each case, we want to classify something: Orders are either valid or
fraudulent, messages are either spam or non-spam, blog posts are either
interesting or boring. Unfortunately, most software is terrible at
making these distinctions. For example, why can’t my RSS reader go out and
track down the 10 most interesting blog posts every day?
Some software, however, can make these distinctions.
Google figures out when I want to watch a movie, and shows me specialized
search results. And most e-mail clients can identify spam with over
99% accuracy. But the vast majority of software is dumb, incapable of
dealing with the messy dilemmas posed by the real world.
So where can we learn to improve our software?
Outside of Google’s shroud
of secrecy, the most successful classifiers are spam filters. And most modern
spam filters are inspired by Paul Graham’s essay A Plan for Spam.
So let’s go back to the source, and see what we can learn. As it turns out, we can formulate a lot of the ideas in A Plan
for Spam in a straightforward fashion using a Bayesian
monad.
Functions from distributions to distributions
Let’s begin with spam filtering. By convention, we divide messages into
“spam” and “ham”, where “ham” is the stuff we want to read.
data MsgType = Spam | Ham
deriving (Show, Eq, Enum, Bounded)
Let’s assume that we’ve just received a new e-mail. Without even looking
at it, we know there’s a certain chance that it’s a spam. This gives us
something called a “prior distribution” over MsgType.
> bayes msgTypePrior
[Perhaps Spam 64.2%, Perhaps Ham 35.8%]
But what if we know that the first word of the message is “free”? We can
use that information to calculate a new distribution.
> bayes (hasWord "free" msgTypePrior)
[Perhaps Spam 90.5%, Perhaps Ham 9.5%]
The function hasWord takes a string and a probability
distribution, and uses them to calculate a new probability distribution:
hasWord :: String -> FDist' MsgType ->
FDist' MsgType
hasWord word prior = do
msgType <- prior
wordPresent <-
wordPresentDist msgType word
condition wordPresent
return msgType
This code is based on the Bayesian monad from part 3. As before,
the “<-” operator selects a single item from a probability
distribution, and “condition” asserts that an expression is true. The
actual Bayesian inference happens behind the scenes (handy, that).
If we have multiple pieces of evidence, we can apply them one at a time.
Each piece of evidence will update the probability distribution produced by
the previous step:
hasWords [] prior = prior
hasWords (w:ws) prior = do
hasWord w (hasWords ws prior)
The final distribution will combine everything we know:
> bayes (hasWords ["free","bayes"] msgTypePrior)
[Perhaps Spam 34.7%, Perhaps Ham 65.3%]
This technique is known as the naive Bayes classifier. Looked at from the right angle, it’s surprisingly simple.
(Of course, the naive Bayes classifier assumes that all of our evidence is independent. In theory, this is a pretty big assumption. In practice, it works better than you might think.)
But this still leaves us with a lot of questions: How do we keep track of
our different classifiers? How do we decide which ones to apply? And do
we need to fudge the numbers to get reasonable results?
In the following sections, I’ll walk through various aspects of Paul
Graham’s A Plan for Spam, and show how to generalize it. If you
want to follow along, you can download the code using Darcs:
darcs get http://www.randomhacks.net/darcs/probability
Read more...
Tags Haskell, Math, Monads, Probability, Spam | 5 comments
Posted by Eric Kidd
Thu, 22 Feb 2007 23:11:00 GMT
Part 3 of Refactoring Probability Distributions.
(Part 1: PerhapsT,
Part 2: Sampling functions)
A very senior Microsoft developer who moved to Google told
me that Google works and thinks at a higher level of abstraction than
Microsoft. “Google uses Bayesian filtering the way Microsoft uses the if
statement,” he said. -Joel Spolsky
I really love this quote, because it’s insanely provocative
to any language designer. What would a programming language look
like if Bayes’ rule were as simple as an if statement?
Let’s start with a toy problem, and refactor it until Bayes’ rule is baked
right into our programming language.
Imagine, for a moment, that we’re in charge of administering drug tests for
a small business. We’ll represent each employee’s test results (and drug use) as follows:
data Test = Pos | Neg
deriving (Show, Eq)
data HeroinStatus = User | Clean
deriving (Show, Eq)
Assuming that 0.1% of our employees have used heroin recently, and that our test is 99%
accurate, we can model the testing process as follows:
drugTest1 :: Dist d => d (HeroinStatus, Test)
drugTest1 = do
heroinStatus <- percentUser 0.1
testResult <-
if heroinStatus == User
then percentPos 99
else percentPos 1
return (heroinStatus, testResult)
percentUser p = percent p User Clean
percentPos p = percent p Pos Neg
percent p x1 x2 =
weighted [(x1, p), (x2, 100p)]
This code is based our FDist monad, which is in turn based on
PFP. Don’t worry if it seems slightly mysterious; you can think of the
“<-” operator as choosing an element from a probability
distribution.
Running our drug test shows every possible combination of the two
variables:
> exact drugTest1
[Perhaps (User,Pos) 0.1%,
Perhaps (User,Neg) 0.0%,
Perhaps (Clean,Pos) 1.0%,
Perhaps (Clean,Neg) 98.9%]
If you look carefully, we have a problem. Most of the employees who test
positive are actually clean! Let’s tweak our code a bit, and try to zoom
in on the positive test results.
Read more...
Tags Haskell, Math, Monads, Probability, Recommended | 15 comments