How to make Data.Set a monad

Posted by Eric Kidd Fri, 16 Mar 2007 02:29:00 GMT

…and how to fake Lisp macros with Template Haskell

(I wrote this article in response to a comment by sigfpe. You may find it pretty dry reading, unless you want to build domain-specific languages in Haskell. Proceed at your own risk.)

Haskell’s built-in Monad type has some serious limitations. We can fix those limitations using a number of advanced Haskell techniques, including Template Haskell, Haskell’s closest equivalent to Lisp macros.

We can illustrate the limitations of Monad with an example from math. In set theory, we can define a set by specifying how to compute each element:

{ xy : x ∈ {1,2,4}, y ∈ {1,2,4} }

We can read this as, “the set of all xy, where x is one of {1,2,4}, and y is one of {1,2,4}.” To calculate the answer, we first multiply together all the possible combinations:

1×1=1, 1×2=2, 1×4=4, 2×1=2, 2×2=4, 2×4=8, 4×1=4, 4×2=8, 4×4=16

We then collect up the answers, and—because we’re working with sets–we throw away the duplicates:

{1,2,4,8,16}

Can we do the same thing in Haskell? Well, using Haskell’s list monad, we can write:

listExample = do
  x <- [1,2,4]
  y <- [1,2,4]
  return (x*y)

But when we run this, Haskell gives us lots of duplicate values:

> listExample
[1,2,4,2,4,8,4,8,16]

Our problem: We’re using lists (which can contain duplicate values) to represent sets (which can’t). Can we fix this by switching to Haskell’s Data.Set?

import qualified Data.Set as S

-- This doesn't work.
setExample = do
  x <- S.fromList [1,2,4]
  y <- S.fromList [1,2,4]
  return (x*y)

Unfortunately, this code fails spectacularly. A Haskell monad is required to work for any types a and b:

class Monad m where
  return :: a -> m a
  fail :: String -> m a
  (>>=) :: m a -> (a -> m b) -> m b

But Data.Set only works for some types. Specifically, it requires that values of type a can be ordered:

data (Ord a) => Set a = ...

As it turns out, we can make Data.Set into a monad. But be warned: The solution involves some pretty ugly Haskell abuse.

Read more...

Tags , ,  | 9 comments

Monads in 15 minutes: Backtracking and Maybe

Posted by Eric Kidd Mon, 12 Mar 2007 23:39:00 GMT

This morning, a programmer visited #haskell and asked how to implement backtracking. Not surprisingly, most of the answers involved monads. After all, monads are ubiquitous in Haskell: They’re used for IO, for probability, for error reporting, and even for quantum mechanics. If you program in Haskell, you’ll probably want to understand monads. So where’s the best place to start?

A friend of mine claims he didn’t truly understand monads until he understood join. But once he figured that out, everything was suddenly obvious. That’s the way it worked for me, too. But relatively few monad tutorials are based on join, so there’s an open niche in a crowded market.

This monad tutorial uses join. Even better, it attempts to cram everything you need to know about monads into 15 minutes. (Hey, everybody needs a gimmick, right?)

Backtracking: The lazy way to code

We begin with a backtracking constraint solver. The idea: Given possible values for x and y, we want to pick those values which have a product of 8:

solveConstraint = do
  x <- choose [1,2,3]
  y <- choose [4,5,6]
  guard (x*y == 8)
  return (x,y)

Every time choose is called, we save the current program state. And every time guard fails, we backtrack to a saved state and try again. Eventually, we’ll hit the right answer:

> take 1 solveConstraint
[(2,4)]

Let’s build this program step-by-step in Haskell. When we’re done, we’ll have a monad.

Read more...

Tags , ,  | 31 comments

8 ways to report errors in Haskell

Posted by Eric Kidd Sun, 11 Mar 2007 00:05:00 GMT

Haskell is a marvellous language, but there are some things I don’t like about it. My least favorite: Haskell has no fewer than 8 different APIs for reporting errors.

To make a bad situation worse, the choice of API varies between popular libraries. To give a particularly unfortunate example, Network.URI.parseURI and Network.HTTP.simpleHTTP report errors in entirely different ways, turning a “download this URL” program into a page of code, nearly half of which is devoted to dealing with various kinds of errors. (The rest is boilerplate that could be refactored into a nice wrapper.)

Let’s begin with a toy function, the simplest possible program that could actually fail:

myDiv x y = x / y

As every algebra student knows, we can’t divide by zero. Using this function as our example, let’s take a look at all the different ways we can implement error-reporting in Haskell.

Read more...

Tags  | 7 comments

Jim Hefferon's Linear Algebra: A free textbook with fascinating applications

Posted by Eric Kidd Wed, 07 Mar 2007 13:01:00 GMT

My college linear algebra course was held early in the morning, and it was devoted almost entirely to blackboard proofs. The professor would stand in front of the room, half asleep, and write:

“Theorem. Lemma. Lemma. Proof. Theorem…”

Despite this experience, I somehow managed to learn about eigenvectors and kernels. Or at least, I learned how to write proofs about them. But I had no intuition for linear algebra: I couldn’t visualize it, and I couldn’t explain why anybody, anywhere, ever cared about eigenvectors.

Years later, in a computer vision class, I finally learned to care about linear algebra. It could solve all sorts of cool problems! (Eigenfaces, in particular, blew me away.) And since then, I’ve encountered linear algebra everywhere. But my intuition is still piecemeal, built from half-a-dozen applications over the years.

My motto for math is, “If it keeps showing up, build a rock-solid intuition for how it works.” And towards that end, I’ve been looking for a good linear algebra textbook.

My ideal linear algebra textbook would:

  1. Include plenty of motivating examples.
  2. Show how to solve real-world problems.
  3. Devote plenty of time to proofs.

The proofs, after all, are necessary in the real world. If you ever attempt to do something slightly odd, you’ll want to prove that it actually works.

Jim Hefferon’s Linear Algebra

Professor Jim Hefferon’s Linear Algebra is available as a free PDF download. But don’t be fooled by the price: Hefferon’s book is better than most of the expensive tomes sold in college bookstores.

Everything in Hefferon’s book is superbly motivated. The first chapter begins with two real-world examples: Unknown weights placed on balances, and the ratios of complex molecules in chemical reactions. These examples are used to introduce Gauss’s method for solving systems of linear equations. Further into the book, the examples begin to tie back to earlier chapters. Determinants, for example, are motivated by the usefulness of recognizing isomorphisms and invertible matrices.

But Hefferon’s emphasis on real-world examples is admirably balanced by an abundance of proofs. The first proof appears on page 4, and nearly everything is proven either in the main text or in the exercises. This will be helpful for readers who (like me) are trying to bring more rigor to their mathematical thinking.

The “Topics”: Fascinating real-world problems

The most delightful part of the book, however, are the “Topics” at the end of each chapter. These cover a wide range of fields, including biology, economics, probability and abstract algebra. The topic “Stable Populations” begins:

Imagine a reserve park with animals from a species that we are trying to protect. The park doesn’t have a fence and so animals cross the boundary, both from the inside out and in the other direction. Every year, 10% of the animals from inside of the park leave, and 1% of the animals from the outside find their way in. We can ask if we can find a stable level of population for this park: is there a population that, once established, will stay constant over time, with the number of animals leaving equal to the number of animals entering?

Hefferon relates the solution to Markov chains and eigenvalues, cementing several important intuitions firmly in place.

Other topics include basic electronics, space shuttle O-rings, and the number of games required to win the World Series. There are plenty of CS-related discussions, too: a survey of things that can go wrong in naive numeric code, the time required to calculate determinants, and how the memory hierarchy affects array layout.

Hefferon’s love for linear algebra is infectious, and his “Topics” will appeal to anybody who does recreational math.

“Free” as in “freedom”

Linear Algebra is published under the GNU Free Documentation License and the Creative Commons Share Alike license.

What this means: You may make copies of the book, or even print them out at a copyshop and charge students a fee. You may also create a custom version of the textbook, and share it with anybody who’s interested. The only restriction: You must “share alike,” honoring the original author’s terms as you pass along the textbook.

Miscellaneous notes

Hefferon has put out a call for extra material. In particular, he’d love to have a section on quantum mechanics:

Several people have asked me about a Topic on eigenvectors and eigenvalues in Quantum Mechanics. Sadly, I don’t know any QM. If you can help, that’d be great.

On the downside, the internal PDF links in Linear Algebra are broken in MacOS X Preview. This is odd, because the LaTeX hyperref package usually works fine with Preview.

The reddit discussion of Linear Algebra has pointers to several other linear algebra textbooks, with varying emphasis. And many other free math textbooks are available online.

If you have any favorite math books (paper or PDF, for any area of mathematics), please feel free to recommend them in the comment thread!

Tags  | 15 comments

Three things I don't understand about monads

Posted by Eric Kidd Mon, 05 Mar 2007 14:32:00 GMT

Monads are a remarkably powerful tool for building specialized programming languages. Some examples include:

But there’s a bunch of things I don’t understand about monads. In each case, my confusion involves some aspect of the underlying math that “bubbles up” to affect the design of specialized languages.

(Warning: Obscure monad geeking ahead.)

Commutative monads

A “commutative monad” is any monad where we can replace the expression:

do a <- ma
   b <- mb
   f a b

…with:

do b <- mb
   a <- ma
   f a b

…without changing the meaning. Examples of commutative monads include Reader and Rand. This is an important property, because it might allow us to parallelize the commonly-used sequence function across huge numbers of processors:

sequence :: (Monad m) => [m a] -> m [a]

Simon Peyton Jones lists this problem as Open Challenge #2, saying:

Commutative monads are very common. (Environment, unique supply, random number generation.) For these, monads over-sequentialise.

Wanted: theory and notation for some cool compromise.

Commutative monad morphisms

Monad morphisms are the category theory equivalent of Haskell’s monad transformers. Haskell’s monad transformers can be expressed as monad layerings, which correspond to the monad morphisms of category theory.

Many complicated monads break down into a handful of monad transformers, often in surprising ways.

But composing monad transformers is a mess, because they interact in poorly-understood ways. In general, the following two types have very different semantics:

FooT (BarT m)
BarT (FooT m)

If FooT and BarT commute with each other, however, the two types would be equivalent. This is helpful when building large stacks of monad transformers.

Chung-chieh Shan encountered a related problem when applying monad morphisms to build a theory of natural language semantics:

It remains to be seen whether monads would provide the appropriate conceptual encapsulation for a semantic theory with broader coverage. In particular, for both natural and programming language semantics, combining monads—or perhaps monad-like objects—remains an open issue that promises additional insight.

Monad morphisms and abstract algebra

Dan Piponi has been drawing some fascinating connections between monad morphisms and abstract algebra. See, for example:

This approach seems to throw a lot of light on monad morphisms—but at least in my case, the light only highlights my confusion.

Of the three problems listed here, this is the one most likely to be discussed in a textbook somewhere. And a solution to this problem would likely help significantly with the other two.

So, my question: Does anybody have any books, papers or ideas that might help untangle this mess?

Update: Be sure to see the comment thread on the second Dan Piponi post above and Chung-chieh Shan’s excellent bibliography on monad transformers.

Tags , ,  | 6 comments

Smart classification using Bayesian monads in Haskell

Posted by Eric Kidd Sat, 03 Mar 2007 14:02:00 GMT

(Refactoring Probability Distributions: part 1, part 2, part 3, part 4)

The world is full of messy classification problems:

  • “Is this order fraudulent?”
  • “It this e-mail a spam?”
  • “What blog posts would Rachel find interesting?”
  • “Which intranet documents is Sam looking for?”

In each case, we want to classify something: Orders are either valid or fraudulent, messages are either spam or non-spam, blog posts are either interesting or boring. Unfortunately, most software is terrible at making these distinctions. For example, why can’t my RSS reader go out and track down the 10 most interesting blog posts every day?

Some software, however, can make these distinctions. Google figures out when I want to watch a movie, and shows me specialized search results. And most e-mail clients can identify spam with over 99% accuracy. But the vast majority of software is dumb, incapable of dealing with the messy dilemmas posed by the real world.

So where can we learn to improve our software?

Outside of Google’s shroud of secrecy, the most successful classifiers are spam filters. And most modern spam filters are inspired by Paul Graham’s essay A Plan for Spam.

So let’s go back to the source, and see what we can learn. As it turns out, we can formulate a lot of the ideas in A Plan for Spam in a straightforward fashion using a Bayesian monad.

Functions from distributions to distributions

Let’s begin with spam filtering. By convention, we divide messages into “spam” and “ham”, where “ham” is the stuff we want to read.

data MsgType = Spam | Ham
  deriving (Show, Eq, Enum, Bounded)

Let’s assume that we’ve just received a new e-mail. Without even looking at it, we know there’s a certain chance that it’s a spam. This gives us something called a “prior distribution” over MsgType.

> bayes msgTypePrior
[Perhaps Spam 64.2%, Perhaps Ham 35.8%]

But what if we know that the first word of the message is “free”? We can use that information to calculate a new distribution.

> bayes (hasWord "free" msgTypePrior)
[Perhaps Spam 90.5%, Perhaps Ham 9.5%]

The function hasWord takes a string and a probability distribution, and uses them to calculate a new probability distribution:

hasWord :: String -> FDist' MsgType ->
           FDist' MsgType
hasWord word prior = do
  msgType <- prior
  wordPresent <-
    wordPresentDist msgType word
  condition wordPresent
  return msgType

This code is based on the Bayesian monad from part 3. As before, the “<-” operator selects a single item from a probability distribution, and “condition” asserts that an expression is true. The actual Bayesian inference happens behind the scenes (handy, that).

If we have multiple pieces of evidence, we can apply them one at a time. Each piece of evidence will update the probability distribution produced by the previous step:

hasWords []     prior = prior
hasWords (w:ws) prior = do
  hasWord w (hasWords ws prior)

The final distribution will combine everything we know:

> bayes (hasWords ["free","bayes"] msgTypePrior)
[Perhaps Spam 34.7%, Perhaps Ham 65.3%]

This technique is known as the naive Bayes classifier. Looked at from the right angle, it’s surprisingly simple.

(Of course, the naive Bayes classifier assumes that all of our evidence is independent. In theory, this is a pretty big assumption. In practice, it works better than you might think.)

But this still leaves us with a lot of questions: How do we keep track of our different classifiers? How do we decide which ones to apply? And do we need to fudge the numbers to get reasonable results?

In the following sections, I’ll walk through various aspects of Paul Graham’s A Plan for Spam, and show how to generalize it. If you want to follow along, you can download the code using Darcs:

darcs get http://www.randomhacks.net/darcs/probability
Read more...

Tags , , , ,  | 5 comments