Probabilistic Functional Programming is cool

Posted by Eric Kidd Sat, 10 Feb 2007 21:14:00 GMT

Syntaxfree is hacking on Martin Erwig’s probability monad. This is one of the coolest monads out there—it allows you to trivially solve all kinds of probability problems.

Mikael Johansson has a good example.

I hope to write a bit more about probability monads soon. There’s already a long post sitting on my hard drive, and some more ideas that I’m still trying to puzzle out.

In the meantime, I’d like to recommend The Haskell Road to Logic, Maths and Programming. There’s an excellent review available.

Tags , , ,  | no comments

Map fusion: Making Haskell 225% faster

Posted by Eric Kidd Sat, 10 Feb 2007 14:55:00 GMT

Or, how to optimize MapReduce, and when folds are faster than loops

Purely functional programming might actually be worth the pain, if you care about large-scale optimization.

Lately, I’ve been studying how to speed up parallel algorithms. Many parallel algorithms, such as Google’s MapReduce, have two parts:

  1. First, you transform the data by mapping one or more functions over each value.
  2. Next, you repeatedly merge the transformed data, “reducing” it down to a final result.

Unfortunately, there’s a couple of nasty performance problems lurking here. We really want to combine all those steps into a single pass, so that we can eliminate temporary working data. But we don’t always want to do this optimization by hand—it would be better if the compiler could do it for us.

As it turns out, Haskell is an amazing testbed for this kind of optimization. Let’s build a simple model, show where it breaks, and then crank the performance way up.

Trees, and the performance problems they cause

We’ll use single-threaded trees for our testbed. They’re simple enough to demonstrate the basic idea, and they can be generalized to parallel systems. (If you want know how, check out the papers at the end of this article.)

A tree is either empty, or it is a node with a left child, a value and a right child:

data Tree a = Empty
            | Node (Tree a) a (Tree a)
  deriving (Show)

Here’s a sample tree containing three values:

tree = (Node left 2 right)
  where left  = (Node Empty 1 Empty)
        right = (Node Empty 3 Empty)

We can use treeMap to apply a function to every value in a tree, creating a new tree:

treeMap :: (a -> b) -> Tree a -> Tree b

treeMap f Empty = Empty
treeMap f (Node l x r) =
  Node (treeMap f l) (f x) (treeMap f r)

Using treeMap, we can build various functions that manipulate trees:

-- Double each value in a tree.
treeDouble tree = treeMap (*2) tree

-- Add one to each value in a tree.
treeIncr tree   = treeMap (+1) tree

What if we want to add up all the values in a tree? Well, we could write a simple recursive sum function:

treeSum Empty = 0
treeSum (Node l x r) =
  treeSum l + x + treeSum r

But for reasons that will soon become clear, it’s much better to refactor the recursive part of treeSum into a reusable treeFold function (“fold” is Haskell’s name for “reduce”):

treeFold f b Empty = b
treeFold f b (Node l x r) =
  f (treeFold f b l) x (treeFold f b r)

treeSum t = treeFold (\l x r -> l+x+r) 0 t

Now we can double all the values in a tree, add 1 to each, and sum up the result:

treeSum (treeIncr (treeDouble tree))

But there’s a very serious problem with this code. Imagine that we’re working with a million-node tree. The two calls to treeMap (buried inside treeIncr and treeDouble) will each create a new million-node tree. Obviously, this will kill our performance, and it will make our garbage collector cry.

Fortunately, we can do a lot better than this, thanks to some funky GHC extensions.

Read more...

Tags , ,  | 7 comments

The first Carnival of Mathematics

Posted by Eric Kidd Fri, 09 Feb 2007 14:25:00 GMT

A blog carnival is a summary of recent posts on an interesting topic. A typical carnival appears weekly or monthly, and moves from blog to blog with each edition.

If you’re into math, go check out the first Carnival of Mathematics. There’s something for everyone:

I’ve gotten plenty of submissions that span the entire gamut of math-blogging: education, pure math, applied math, debunking bad math - it’s all there. Only the gender distribution could be made slightly more equal (and that’s an understatement). I’m linking to the posters in roughly increasing order of mathematical difficulty, but don’t let my opinions deter you from reading the posts closer to the bottom.

Why study math? Even the most abstract math has an uncanny tendency to solve unexpected problems:

[T]he enormous usefulness of mathematics in the natural sciences is something bordering on the mysterious and [there] is no rational explanation for it.

And since I’ve started hacking on Haskell, I’ve been equally surprised at how often deep math can solve real-world programming problems.

Tags  | no comments