Yesterday, I was working on a Haskell program that read in megabytes of data, parsed it, and wrote a subset of the data back to standard output. At first it was pretty fast: 7 seconds for everything.
But then I made the mistake of parsing some floating point numbers, and printing them back out. My performance died: 120 seconds.
You can see similar problems at the Great Language Shootout. Haskell runs at half the speed of C for many benchmarks, then suddenly drops to 1/20th for others.
Here's what's going on, and how to fix it.
(Many thanks to Don Stewart and the other folks on #haskell for helping me figure this out!)
Profiling in Haskell
GHC, the Glasgow Haskell Compiler, has a really nice profiler. To use it, ask the compiler to turn on profiling:
$ ghc -prof -auto-all -O --make ParseData.hs
Then, run your program with some extra flags, and look at the output:
$ ./ParseData +RTS -p -RTS < data.csv > /dev/null
$ less ParseData.prof
You may also need to assign some "cost centers," which give you fine-grained profiling inside larger functions. See the manual for details.
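Here's a minimal sketch of a manual cost centre, using GHC's SCC pragma. The function name and input are my own invented example, not from the original program; the point is just that the annotated expression will show up as its own line in the .prof output:

```haskell
module Main where

-- A hypothetical parsing helper. The SCC pragma attaches a cost
-- centre named "parseLine" to this expression, so the profiler
-- reports time and allocation for it separately.
parseLine :: String -> Int
parseLine s = {-# SCC "parseLine" #-} length (words s)

main :: IO ()
main = print (parseLine "alpha beta gamma")
```

Compile with -prof as above, run with +RTS -p -RTS, and look for the "parseLine" row in the profile.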
The Culprits: String, read and show
As it turns out, my program suffered from the typical problems:
The String type is painfully slow. Code that touches it is doomed to run at 1/20th the speed of C. To fix this problem, use ByteString for high-performance work, which should get you back up near half the speed of C.

read and show are not your friends, either. Replace them with ByteString functions such as readInt, or write your own versions by hand. Don Stewart has provided some nice examples that call out to C.
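To make the ByteString advice concrete, here's a small sketch (my own example, not code from the original program) that parses a line of space-separated integers with Data.ByteString.Char8.readInt instead of Prelude's read:

```haskell
module Main where

import qualified Data.ByteString.Char8 as B

-- Parse every integer on a ByteString, skipping spaces between
-- them. readInt returns the parsed Int plus the unconsumed rest,
-- so we recurse on the remainder until nothing more parses.
parseInts :: B.ByteString -> [Int]
parseInts s =
    case B.readInt (B.dropWhile (== ' ') s) of
      Just (n, rest) -> n : parseInts rest
      Nothing        -> []

main :: IO ()
main = print (parseInts (B.pack "10 20 30"))
```

The same shape works for whole files: read input with B.getContents, split with B.lines, and parse each line without ever building a String.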
Obviously, the standard library needs to get better in these areas. And it wouldn't hurt to apply these tips to the Haskell code in the Great Language Shootout, either.
But my program now runs in 14 seconds, and 65% of that time is spent in the one remaining call to show. So it looks like Haskell is OK for high-performance text-munging after all!
(Update: The latest Haskell performance notes can be found on the Haskell wiki.)