Yesterday, I was working on a Haskell program that read in megabytes of data, parsed it, and wrote a subset of the data back to standard output. At first it was pretty fast: 7 seconds for everything.

But then I made the mistake of parsing some floating point numbers, and printing them back out. My performance died: 120 seconds.

You can see similar problems at the Great Language Shootout. Haskell runs at half the speed of C on many benchmarks, then suddenly drops to 1/20th on others.

Here’s what’s going on, and how to fix it.

(Many thanks to Don Stewart and the other folks on #haskell for helping me figure this out!)

Profiling in Haskell

GHC, the Glasgow Haskell Compiler, has a really nice profiler. To use it, ask the compiler to turn on profiling:

$ ghc -prof -auto-all -O --make ParseData.hs

Then, run your program with some extra flags, and look at the output:

$ ./ParseData +RTS -p -RTS < data.csv > /dev/null
$ less ParseData.prof

You may also need to assign some “cost centers,” which give you fine-grained profiling inside larger functions. See the manual for details.
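As a rough illustration, a cost center can be attached to any expression with GHC's SCC pragma. The sketch below is not the original ParseData.hs: the `parseField` function and the "parseField" label are hypothetical names chosen for the example, and it assumes the input is one number per line.

```haskell
module Main (main) where

import Data.List (foldl')

-- Hypothetical parsing step. The SCC pragma gives the expression its own
-- cost center, so it shows up as a separate "parseField" row in the .prof file.
parseField :: String -> Double
parseField s = {-# SCC "parseField" #-} read s

-- Sum every line of the input, parsing each one with parseField.
sumFields :: String -> Double
sumFields = foldl' (+) 0 . map parseField . lines

main :: IO ()
main = do
  input <- getContents
  print (sumFields input)
```

Compiled with -prof and run with +RTS -p as above, the time and allocation spent in `read` should be reported under the "parseField" cost center rather than being folded into the enclosing function.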

The Culprits: String, read and show

As it turns out, my program suffered from the typical problems:

  1. Haskell’s standard String type is painfully slow. Code which touches it is doomed to run at 1/20th the speed of C. To fix this problem, use ByteString for high-performance work, which should get you back up near half the speed of C.

  2. read and show are not your friends. Either replace them with ByteString functions like readInt, or write your own versions by hand. Don Stewart has provided some nice examples that call out to C.
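Here is a minimal sketch of both tips, assuming the input is one integer per line (my real data was messier, and the helper names `parseInt` and `sumInts` are made up for this example). It reads everything as a ByteString and parses with `readInt` from Data.ByteString.Char8 instead of `read`. Note that `readInt` only handles integers, so floating-point fields would still need a hand-written parser.

```haskell
module Main (main) where

import qualified Data.ByteString.Char8 as B
import Data.List (foldl')
import Data.Maybe (mapMaybe)

-- Parse the leading integer on a line. readInt returns the number plus the
-- unparsed remainder, or Nothing if the line does not start with an integer.
parseInt :: B.ByteString -> Maybe Int
parseInt line = fmap fst (B.readInt line)

-- Sum every line that parses as an integer, skipping the rest.
sumInts :: B.ByteString -> Int
sumInts = foldl' (+) 0 . mapMaybe parseInt . B.lines

main :: IO ()
main = do
  input <- B.getContents
  -- A single show for the final result; the per-line work never touches String.
  B.putStrLn (B.pack (show (sumInts input)))
```

The only remaining String conversion is the one `show` on the final result, so the hot loop stays entirely in ByteString.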

Obviously, the standard library needs to get better in these areas. And it wouldn’t hurt to apply these tips to the Haskell code in the Great Language Shootout, either.

But my program now runs in 14 seconds, and 65% of that time is in the one remaining call to show. So it looks like Haskell is OK for high-performance text-munging after all!

(Update: The latest Haskell performance notes can be found on the Haskell wiki.)