Posted by Eric Kidd
Fri, 20 May 2011 20:01:00 GMT

Last month, the folks at Lab49 explained how to compute the derivative of a data structure. This is a great example of how to write about mathematical subjects for a casual audience: They draw analogies to well-known programming languages, they follow a single, well-chosen thread of explanation, and there’s a clever payoff at the end.

The Lab49 blog post is, of course, based on two classic papers by Conor McBride, and Huet’s original paper The Zipper.

If you’re interested in real-world applications of this technique, there’s a great explanation in the final chapter of Learn You a Haskell for Great Good. If you’re interested in some deeper mathematical connections, see the discussion at Lambda the Ultimate.
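For readers who want to see the payoff in code: the one-hole context of a list is a pair of lists, which is exactly a zipper. Here's a minimal sketch (my own, not code from any of the papers above):

```
-- A minimal list zipper: the context of the focused element is the
-- reversed prefix plus the suffix.
data Zipper a = Zipper [a] a [a]
  deriving (Show, Eq)

fromList :: [a] -> Zipper a
fromList (x:xs) = Zipper [] x xs
fromList []     = error "fromList: empty list"

-- Moving the focus is O(1): shuffle one element between the contexts.
left, right :: Zipper a -> Zipper a
left  (Zipper (l:ls) x rs) = Zipper ls l (x:rs)
left  z                    = z
right (Zipper ls x (r:rs)) = Zipper (x:ls) r rs
right z                    = z
```

Differentiating richer types (trees, for instance) produces richer contexts, but the pattern is the same.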

Tags Haskell, Math | 5 comments

Posted by Eric Kidd
Thu, 12 May 2011 12:09:00 GMT

A question asked while standing in the shower: What do all of the following have in common?

- Banach and Brouwer fixed points. If you’re in Manhattan, and you crumple up a map of Manhattan and place it on the ground, at least one point on your map will be exactly over the corresponding point on the ground. (This is true even if your map is
*larger* than life.)
- The fixed points computed by the Y combinator, which is used to construct anonymous recursive functions in the lambda calculus.
- The Nash equilibrium, which is the stable equilibrium of a multi-player game (and one of the key ideas of economics). See also this lovely—if metaphorical—rant by Scott Aaronson.
- The eigenvectors of a matrix, which will still point in the same direction after multiplication by the matrix.

At what level of abstraction are all these important ideas really just the same idea? If we strip everything down to generalized abstract nonsense, is there a nice simple formulation that covers all of the above?
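For the lambda-calculus bullet, at least, Haskell gives a two-line answer. A quick sketch using the standard fixed-point combinator (the same `fix` exported by `Data.Function`):

```
-- fix f is a fixed point of f, since fix f = f (fix f).
fix :: (a -> a) -> a
fix f = f (fix f)

-- Anonymous recursion: factorial defined without naming itself.
factorial :: Integer -> Integer
factorial = fix (\rec n -> if n <= 1 then 1 else n * rec (n - 1))
```

Whether Banach, Brouwer, Nash, and eigenvectors all fit under the same categorical roof is exactly the open question.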

(I can’t play with this shiny toy today; I have to work.)

Tags Haskell, Math | 2 comments

Posted by Eric Kidd
Tue, 02 Oct 2007 07:50:00 GMT

From October 5-7, I’ll be at the Haskell Hackathon in Freiburg.

I’ll be working on probability monads, attempting to turn my various blog articles into a real Haskell library.

Some resources:

If you were a peer reviewer, or gave me feedback on the paper, my humble thanks, and my apologies. I haven’t had a chance to revise the paper yet, and so your feedback is not yet included.

See you at the Hackathon!

Tags Haskell, Math, Monads, Probability, ProbabilityMonads

Posted by Eric Kidd
Thu, 19 Apr 2007 20:43:00 GMT

**Refactoring Probability Distributions:**
part 1, part 2, part 3, part 4, **part 5**

Welcome to the 5th (and final) installment of *Refactoring Probability
Distributions!* Today, let’s begin with an example from
Bayesian Filters for Location Estimation (PDF), an excellent paper by Fox and colleagues.

In their example, we have a robot in a hallway with 3 doors. Unfortunately, we don’t know *where* in the hallway the robot is located:

The vertical black lines are “particles.” Each particle represents a
possible location of our robot, chosen at random along the hallway. At
first, our particles are spread along the entire hallway (the top
row of black lines). Each particle begins life with a weight of 100%, represented by the height of the black line.

Now imagine that our robot has a “door sensor,” which currently tells us that
we’re in front of a door. This allows us to rule out any particle which is located *between* doors.

So we multiply the weight of each particle by 100% (if it’s in front of a door) or 0% (if it’s between doors), which gives us the lower row of particles. If our sensor were less accurate, we might use 90% and 10%, respectively.
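In code, that update step is just a pointwise multiplication over the particle list. Here's a standalone sketch (the names are hypothetical, not the library's):

```
-- A particle is a possible position together with a weight.
type Particle = (Int, Double)

-- Multiply each weight by the sensor likelihood: `accuracy` if the
-- reading agrees with the particle's position, 1 - accuracy otherwise.
reweight :: (Int -> Bool) -> Double -> [Particle] -> [Particle]
reweight atDoor accuracy ps =
  [ (pos, w * if atDoor pos then accuracy else 1 - accuracy)
  | (pos, w) <- ps ]
```

The monadic version below hides exactly this bookkeeping.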

What would this example look like in Haskell? We *could* build a
giant list of particles (with weights), but that would require us to do a
lot of bookkeeping by hand. Instead, we use a monad to hide all the
details, allowing us to work with a single particle at a time.

```
localizeRobot :: WPS Int
localizeRobot = do
  pos1 <- uniform [0..299]
  if doorAtPosition pos1
    then weight 1
    else weight 0
```

What happens if our robot drives forward?

Read more...
Tags Haskell, Math, Monads, Probability, ProbabilityMonads, Robots

Posted by Eric Kidd
Mon, 12 Mar 2007 19:39:00 GMT

This morning, a programmer visited #haskell and asked how to implement
backtracking. Not surprisingly, most of the answers involved
monads. After all, monads are ubiquitous in Haskell: They’re used for IO,
for probability, for error reporting, and even for quantum
mechanics. If you program in Haskell, you’ll
probably want to understand monads. So where’s the best place to start?

A friend of mine claims he didn’t truly understand monads until he
understood `join`. But once he figured that out, everything was
suddenly obvious. That’s the way it worked for me, too. But relatively
few monad tutorials are based on `join`, so there’s an open niche in a crowded market.

This monad tutorial uses `join`. Even better, it attempts to
cram everything you need to know about monads into 15 minutes. (Hey, everybody needs a gimmick, right?)

### Backtracking: The lazy way to code

We begin with a backtracking constraint solver. The idea: Given
possible values for `x` and `y`, we want to pick
those values which have a product of 8:

```
solveConstraint = do
  x <- choose [1,2,3]
  y <- choose [4,5,6]
  guard (x*y == 8)
  return (x,y)
```

Every time `choose` is called, we save the current program
state. And every time `guard` fails, we backtrack to a saved
state and try again. Eventually, we’ll hit the right answer:

```
> take 1 solveConstraint
[(2,4)]
```

Let’s build this program step-by-step in Haskell. When we’re done, we’ll
have a monad.
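The full definitions come later, but the behavior is easy to preview in the plain list monad, where `choose` is just the identity and `guard` is the standard one from `Control.Monad`:

```
import Control.Monad (guard)

-- In the list monad, binding a list tries every element in turn,
-- and guard prunes the branches where the condition fails.
choose :: [a] -> [a]
choose = id

solve :: [(Int, Int)]
solve = do
  x <- choose [1,2,3]
  y <- choose [4,5,6]
  guard (x*y == 8)
  return (x,y)
-- take 1 solve == [(2,4)]
```

Laziness means the search stops as soon as we demand only the first answer.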

Read more...
Tags Haskell, Math, Monads

Posted by Eric Kidd
Wed, 07 Mar 2007 08:01:00 GMT

Download the free book or visit the official site

My college linear algebra course was held early in the morning, and
it was devoted almost entirely to blackboard proofs. The professor would
stand in front of the room, half asleep, and write:

“Theorem. Lemma. Lemma. Proof. Theorem…”

Despite this experience, I somehow managed to learn about eigenvectors
and kernels. Or at least, I learned how to write proofs about them. But I
had no intuition for linear algebra: I couldn’t visualize it, and I couldn’t
explain why anybody, anywhere, ever *cared* about eigenvectors.

Years later, in a computer vision class, I finally learned to care about
linear algebra. It could solve all sorts of cool problems!
(Eigenfaces, in particular, blew me away.) And since then, I’ve
encountered linear algebra everywhere. But my intuition is still
piecemeal, built from half-a-dozen applications over the years.

My motto for math is, “If it keeps showing up, build a rock-solid intuition
for how it works.” And towards that end, I’ve been looking for a good
linear algebra textbook.

My ideal linear algebra textbook would:

- Include plenty of motivating examples.
- Show how to solve real-world problems.
- Devote plenty of time to proofs.

The proofs, after all, are necessary in the real world. If you ever
attempt to do something slightly odd, you’ll want to
prove that it actually works.

### Jim Hefferon’s *Linear Algebra*

Professor Jim Hefferon’s *Linear Algebra* is available as a free PDF
download. But don’t be fooled by the price: Hefferon’s book is
better than most of the expensive tomes sold in college bookstores.

Everything in Hefferon’s book is superbly motivated. The first chapter
begins with two real-world examples: Unknown weights placed on balances,
and the ratios of complex molecules in chemical reactions. These examples
are used to introduce Gauss’s method for solving systems of linear
equations. Further into the book, the examples begin to tie back to
earlier chapters. Determinants, for example, are motivated by the
usefulness of recognizing isomorphisms and invertible matrices.

But Hefferon’s emphasis on real-world examples is admirably balanced by an
abundance of proofs. The first proof appears on page 4, and nearly
everything is proven either in the main text or in the exercises. This
will be helpful for readers who (like me) are trying to bring more rigor to
their mathematical thinking.

### The “Topics”: Fascinating real-world problems

The most delightful part of the book, however, is the “Topics” at the end
of each chapter. These cover a wide range of fields, including biology,
economics, probability and abstract algebra. The topic “Stable
Populations” begins:

Imagine a reserve park with animals from a species that we are
trying to protect. The park doesn’t have a fence and so animals cross the
boundary, both from the inside out and in the other direction. Every year,
10% of the animals from inside of the park leave, and 1% of the animals
from the outside find their way in. We can ask if we can find a stable
level of population for this park: is there a population that, once
established, will stay constant over time, with the number of animals
leaving equal to the number of animals entering?

Hefferon relates the solution to Markov chains and eigenvalues, cementing
several important intuitions firmly in place.
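The arithmetic is easy to check for yourself. A quick sketch in Haskell (my own; the book is language-neutral): iterate the yearly transition and watch it converge to the eigenvector with eigenvalue 1.

```
-- Yearly transition from the problem statement: 10% of insiders
-- leave the park, and 1% of outsiders find their way in.
step :: (Double, Double) -> (Double, Double)
step (inside, outside) =
  ( 0.90 * inside + 0.01 * outside
  , 0.10 * inside + 0.99 * outside )

-- From any initial split of a fixed total population, iteration
-- converges to inside : outside = 1 : 10.
stable :: (Double, Double)
stable = iterate step (0.5, 0.5) !! 1000
```

At the stable point, 10% of `inside` equals 1% of `outside`, so one eleventh of the animals live inside the park.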

Other topics include basic electronics, space shuttle O-rings, and the
number of games required to win the World Series. There are plenty of
CS-related discussions, too: a survey of things that can go wrong in naive
numeric code, the time required to calculate determinants, and
how the memory hierarchy affects array layout.

Hefferon’s love for linear algebra is infectious, and his “Topics” will
appeal to anybody who does recreational math.

### “Free” as in “freedom”

*Linear Algebra* is published under the GNU Free Documentation
License and the Creative Commons Share Alike license.

What
this means: You may make copies of the book, or even print them out at a
copyshop and charge students a fee. You may also create a custom version of
the textbook, and share it with anybody who’s interested. The only
restriction: You must “share alike,” honoring the original author’s terms
as you pass along the textbook.

### Miscellaneous notes

Hefferon has put out a call for extra material. In particular, he’d love
to have a section on quantum mechanics:

Several people have asked me about a Topic on eigenvectors and
eigenvalues in Quantum Mechanics. Sadly, I don’t know any QM. If you can
help, that’d be great.

On the downside, the internal PDF links in *Linear Algebra* are broken
in Mac OS X Preview. This is odd, because the LaTeX `hyperref`
package usually works fine with Preview.

The reddit discussion of
*Linear Algebra* has pointers to several other linear algebra textbooks, with varying emphasis. And many other free math
textbooks are available online.

If you have any favorite math books (paper or PDF, for any area of
mathematics), please feel free to recommend them in the comment thread!

Download the free book or visit the official site

Tags Math

Posted by Eric Kidd
Mon, 05 Mar 2007 09:32:00 GMT

Monads are a remarkably powerful tool for building specialized programming languages. Some examples include:

But there’s a bunch of things I don’t understand about monads. In each case, my confusion involves some aspect of the underlying math that “bubbles up” to affect the design of specialized languages.

(Warning: Obscure monad geeking ahead.)

### Commutative monads

A “commutative monad” is any monad where we can replace the expression `do { a <- ma; b <- mb; f a b }` with `do { b <- mb; a <- ma; f a b }` without changing the meaning. Examples of commutative monads include `Reader` and `Rand`. This is an important property, because it might allow us to parallelize the commonly-used `sequence` function across huge numbers of processors:

```
sequence :: (Monad m) => [m a] -> m [a]
```

Simon Peyton Jones lists this problem as Open Challenge #2, saying:

Commutative monads are very common. (Environment,
unique supply, random number generation.) For these, monads over-sequentialise.

Wanted: theory and notation for some cool compromise.
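The property itself is easy to check concretely. The function (reader) monad from base is commutative, for instance; binding in either order gives the same result:

```
-- Two readers of the same Int environment, bound in opposite orders.
-- In the ((->) r) monad, (<-) just applies the function to the
-- shared environment, so order cannot matter.
ab, ba :: Int -> (Int, Int)
ab = do { a <- (+ 1); b <- (* 2); return (a, b) }
ba = do { b <- (* 2); a <- (+ 1); return (a, b) }
-- ab 10 == ba 10  (both give (11, 20))
```

What's missing is a type-level way to say this, so the compiler can exploit it.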

### Commutative monad morphisms

~~Monad morphisms are the category theory equivalent of Haskell’s monad transformers.~~ *Haskell’s monad transformers can be expressed as monad layerings, which correspond to the monad morphisms of category theory.*

Many complicated monads break down into a handful of monad transformers, often in surprising ways.

But composing monad transformers is a mess, because they interact in poorly-understood ways. In general, the following two types have very different semantics:

```
FooT (BarT m)
BarT (FooT m)
```

If `FooT` and `BarT` commuted with each other, however, the two types would be equivalent. This would be helpful when building large stacks of monad transformers.
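The semantic difference is easy to see if we unroll the two stackings by hand. A sketch with hand-rolled type synonyms (assuming the standard `transformers` definitions, but without importing them):

```
-- StateT s Maybe a unrolls to s -> Maybe (a, s):
-- failure discards the state along with the result.
type StateMaybe s a = s -> Maybe (a, s)

-- MaybeT (State s) a unrolls to s -> (Maybe a, s):
-- failure still returns the state it had accumulated.
type MaybeState s a = s -> (Maybe a, s)

failSM :: StateMaybe Int ()
failSM _ = Nothing

failMS :: MaybeState Int ()
failMS s = (Nothing, s)
```

Neither behavior is wrong; the problem is that we have no general theory telling us when such a swap is safe.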

Chung-chieh Shan encountered a related problem when applying monad morphisms to build a theory of natural language semantics:

It remains to be seen whether monads would provide the appropriate
conceptual encapsulation for a semantic theory with broader coverage. In
particular, for both natural and programming language semantics, combining monads—or perhaps monad-like objects—remains an open issue that
promises additional insight.

### Monad morphisms and abstract algebra

Dan Piponi has been drawing some fascinating connections between monad morphisms and abstract algebra. See, for example:

This approach seems to throw a lot of light on monad morphisms—but at least in my case, the light only highlights my confusion.

Of the three problems listed here, this is the one most likely to be discussed in a textbook somewhere. And a solution to this problem would likely help significantly with the other two.

So, my question: Does anybody have any books, papers or ideas that might help untangle this mess?

*Update: Be sure to see the comment thread on the second Dan Piponi post above and Chung-chieh Shan’s excellent bibliography on monad transformers.*

Tags Haskell, Math, Monads

Posted by Eric Kidd
Sat, 03 Mar 2007 09:02:00 GMT

(Refactoring Probability Distributions: part 1, part 2,
part 3, **part 4**)

The world is full of messy classification problems:

- “Is this order fraudulent?”
- “Is this e-mail a spam?”
- “What blog posts would Rachel find interesting?”
- “Which intranet documents is Sam looking for?”

In each case, we want to classify something: Orders are either valid or
fraudulent, messages are either spam or non-spam, blog posts are either
interesting or boring. Unfortunately, most software is *terrible* at
making these distinctions. For example, why can’t my RSS reader go out and
track down the 10 most interesting blog posts every day?

Some software, however, *can* make these distinctions.
Google figures out when I want to watch a movie, and shows me specialized
search results. And most e-mail clients can identify spam with over
99% accuracy. But the vast majority of software is dumb, incapable of
dealing with the messy dilemmas posed by the real world.

So where can we learn to improve our software?

Outside of Google’s shroud
of secrecy, the most successful classifiers are spam filters. And most modern
spam filters are inspired by Paul Graham’s essay A Plan for Spam.

So let’s go back to the source, and see what we can learn. As it turns out, we can formulate a lot of the ideas in A Plan
for Spam in a straightforward fashion using a Bayesian
monad.

### Functions from distributions to distributions

Let’s begin with spam filtering. By convention, we divide messages into
“spam” and “ham”, where “ham” is the stuff we want to read.

```
data MsgType = Spam | Ham
  deriving (Show, Eq, Enum, Bounded)
```

Let’s assume that we’ve just received a new e-mail. Without even looking
at it, we know there’s a certain chance that it’s a spam. This gives us
something called a “prior distribution” over `MsgType`.

```
> bayes msgTypePrior
[Perhaps Spam 64.2%, Perhaps Ham 35.8%]
```

But what if we know that the first word of the message is “free”? We can
use that information to calculate a new distribution.

```
> bayes (hasWord "free" msgTypePrior)
[Perhaps Spam 90.5%, Perhaps Ham 9.5%]
```

The function `hasWord` takes a string and a probability
distribution, and uses them to calculate a new probability distribution:

```
hasWord :: String -> FDist' MsgType ->
           FDist' MsgType
hasWord word prior = do
  msgType <- prior
  wordPresent <-
    wordPresentDist msgType word
  condition wordPresent
  return msgType
```

This code is based on the Bayesian monad from part 3. As before,
the “`<-`” operator selects a single item from a probability
distribution, and “condition” asserts that an expression is true. The
actual Bayesian inference happens behind the scenes (handy, that).

If we have multiple pieces of evidence, we can apply them one at a time.
Each piece of evidence will update the probability distribution produced by
the previous step:

```
hasWords []     prior = prior
hasWords (w:ws) prior = do
  hasWord w (hasWords ws prior)
```

The final distribution will combine everything we know:

```
> bayes (hasWords ["free","bayes"] msgTypePrior)
[Perhaps Spam 34.7%, Perhaps Ham 65.3%]
```

This technique is known as the naive Bayes classifier. Looked at from the right angle, it’s surprisingly simple.

(Of course, the naive Bayes classifier assumes that all of our evidence is independent. In theory, this is a pretty big assumption. In practice, it works better than you might think.)
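Numerically, each `hasWord` step is one application of Bayes’ rule followed by renormalization. Stripped of the monad, the update over our two classes is a one-liner (names hypothetical, not from the library):

```
-- One naive-Bayes step over (Spam, Ham):
-- posterior is proportional to prior times likelihood,
-- renormalized so the two probabilities sum to 1.
update :: (Double, Double) -> (Double, Double) -> (Double, Double)
update (pSpam, pHam) (lSpam, lHam) = (s / total, h / total)
  where
    s     = pSpam * lSpam
    h     = pHam  * lHam
    total = s + h
```

Folding `update` over a list of per-word likelihoods is all the classifier does.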

But this still leaves us with a lot of questions: How do we keep track of
our different classifiers? How do we decide which ones to apply? And do
we need to fudge the numbers to get reasonable results?

In the following sections, I’ll walk through various aspects of Paul
Graham’s A Plan for Spam, and show how to generalize it. If you
want to follow along, you can download the code using Darcs:

`darcs get http://www.randomhacks.net/darcs/probability`

Read more...
Tags Haskell, Math, Monads, Probability, ProbabilityMonads, Spam

Posted by Eric Kidd
Thu, 22 Feb 2007 18:11:00 GMT

Part 3 of Refactoring Probability Distributions.

(Part 1: PerhapsT,
Part 2: Sampling functions)

*A very senior Microsoft developer who moved to Google told
me that Google works and thinks at a higher level of abstraction than
Microsoft. “Google uses Bayesian filtering the way Microsoft uses the if
statement,” he said.* -Joel Spolsky

I really love this quote, because it’s insanely provocative
to any language designer. What *would* a programming language look
like if Bayes’ rule were as simple as an `if` statement?

Let’s start with a toy problem, and refactor it until Bayes’ rule is baked
right into our programming language.

Imagine, for a moment, that we’re in charge of administering drug tests for
a small business. We’ll represent each employee’s test results (and drug use) as follows:

```
data Test = Pos | Neg
  deriving (Show, Eq)

data HeroinStatus = User | Clean
  deriving (Show, Eq)
```

Assuming that 0.1% of our employees have used heroin recently, and that our test is 99%
accurate, we can model the testing process as follows:

```
drugTest1 :: Dist d => d (HeroinStatus, Test)
drugTest1 = do
  heroinStatus <- percentUser 0.1
  testResult <-
    if heroinStatus == User
      then percentPos 99
      else percentPos 1
  return (heroinStatus, testResult)

percentUser p = percent p User Clean
percentPos p = percent p Pos Neg

percent p x1 x2 =
  weighted [(x1, p), (x2, 100-p)]
```

This code is based on our FDist monad, which is in turn based on
PFP. Don’t worry if it seems slightly mysterious; you can think of the
“`<-`” operator as choosing an element from a probability
distribution.

Running our drug test shows every possible combination of the two
variables:

```
> exact drugTest1
[Perhaps (User,Pos) 0.1%,
Perhaps (User,Neg) 0.0%,
Perhaps (Clean,Pos) 1.0%,
Perhaps (Clean,Neg) 98.9%]
```

If you look carefully, we have a problem. Most of the employees who test
positive are actually clean! Let’s tweak our code a bit, and try to zoom
in on the positive test results.
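Before reading on, it’s worth checking the arithmetic by hand. From the table above, conditioning on a positive test is just renormalizing the two `Pos` rows:

```
-- P(User | Pos) = P(User, Pos) / (P(User, Pos) + P(Clean, Pos))
pUserGivenPos :: Double
pUserGivenPos = (0.001 * 0.99) / (0.001 * 0.99 + 0.999 * 0.01)
-- roughly 0.09: barely 9% of the positives are actual users
```

The monadic version automates exactly this renormalization.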

Read more...
Tags Haskell, Math, Monads, Probability, ProbabilityMonads, Recommended

Posted by Eric Kidd
Wed, 21 Feb 2007 23:53:00 GMT

In Part 1, we cloned PFP, a library for computing with probability distributions. PFP represents a distribution as a list of possible values, each with an associated probability.

But in the real world, things aren’t always so easy. What if we wanted to pick a random number between 0 and 1? Our previous implementation would break, because there’s an infinite number of values between 0 and 1—they don’t exactly fit in a list.

As it turns out, Sungwoo Park and colleagues found an elegant solution to this problem. They represented probability distributions as sampling functions, resulting in something called the λ◯ calculus. (I have no idea how to pronounce this!)

With a little bit of hacking, we can use their sampling functions as a drop-in replacement for PFP.

### A common interface

Since we will soon have two ways to represent probability distributions, we need to define a common interface.

```
type Weight = Float

class (Functor d, Monad d) => Dist d where
  weighted :: [(a, Weight)] -> d a

uniform :: Dist d => [a] -> d a
uniform = weighted . map (\x -> (x, 1))
```

The function `uniform` will create an equally-weighted distribution from a list of values. Using this API, we can represent a two-child family as follows:

```
data Child = Girl | Boy
  deriving (Show, Eq, Ord)

child :: Dist d => d Child
child = uniform [Girl, Boy]

family :: Dist d => d [Child]
family = do
  child1 <- child
  child2 <- child
  return [child1, child2]
```

Now, we need to implement this API two different ways: Once with lists, and a second time with sampling functions.
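As a preview of the second implementation, here’s a hedged sketch of what a sampling-function type might look like. The names here are hypothetical, and the real implementation in the post differs:

```
import System.Random (StdGen, mkStdGen, randomR)

-- A sampling function consumes a random-number supply and produces
-- one value drawn from the distribution, plus the leftover supply.
newtype Sample a = Sample { runSample :: StdGen -> (a, StdGen) }

instance Functor Sample where
  fmap f (Sample s) = Sample $ \g ->
    let (x, g') = s g in (f x, g')

instance Applicative Sample where
  pure x = Sample $ \g -> (x, g)
  Sample sf <*> Sample sx = Sample $ \g ->
    let (f, g')  = sf g
        (x, g'') = sx g'
    in (f x, g'')

instance Monad Sample where
  Sample s >>= f = Sample $ \g ->
    let (x, g') = s g in runSample (f x) g'

-- Pick a value with probability proportional to its weight.
weighted :: [(a, Float)] -> Sample a
weighted xs = Sample $ \g ->
  let total   = sum (map snd xs)
      (r, g') = randomR (0, total) g
      pick ((x, w):rest) acc
        | r <= acc + w || null rest = x
        | otherwise                 = pick rest (acc + w)
      pick [] _ = error "weighted: empty list"
  in (pick xs 0, g')
```

Because `Sample` is a monad, the `child` and `family` code above runs unchanged against it; infinite value spaces stop being a problem because we only ever draw samples.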

Read more...
Tags Haskell, Math, Monads, Probability, ProbabilityMonads