Yesterday, I was working on a Haskell program that read in megabytes of data, parsed it, and wrote a subset of the data back to standard output. At first it was pretty fast: 7 seconds for everything.
But then I made the mistake of parsing some floating point numbers, and printing them back out. My performance died: 120 seconds.
You can see similar problems at the Great Language Shootout. Haskell runs at half the speed of C for many benchmarks, then suddenly drops to 1/20th for others.
Here's what's going on, and how to fix it.
(Many thanks to Don Stewart and the other folks on #haskell for helping me figure this out!)
Profiling in Haskell
GHC, the Glasgow Haskell Compiler, has a really nice profiler. To use it, ask the compiler to turn on profiling:
$ ghc -prof -auto-all -O --make ParseData.hs
Then, run your program with some extra flags, and look at the output:
$ ./ParseData +RTS -p -RTS < data.csv > /dev/null
$ less ParseData.prof
You may also need to assign some "cost centers," which give you fine-grained profiling inside larger functions. See the manual for details.
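Here's a minimal sketch of a manual cost centre, using GHC's SCC pragma. The function name and input are my own invented example, not from the original program; the point is just that the annotated expression will show up as its own line in the .prof output:

```haskell
module Main where

-- A hypothetical parsing helper. The SCC pragma attaches a cost
-- centre named "parseLine" to this expression, so the profiler
-- reports time and allocation for it separately.
parseLine :: String -> Int
parseLine s = {-# SCC "parseLine" #-} length (words s)

main :: IO ()
main = print (parseLine "alpha beta gamma")
```

Compile with -prof as above, run with +RTS -p -RTS, and look for the "parseLine" row in the profile.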
The Culprits: String, read and show
As it turns out, my program suffered from the typical problems:
The String type is painfully slow. Code that touches it is doomed to run at 1/20th the speed of C. To fix this problem, use ByteString for high-performance work, which should get you back up near half the speed of C.

read and show are not your friends, either. Replace them with ByteString functions such as readInt, or write your own versions by hand. Don Stewart has provided some nice examples that call out to C.
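To make the ByteString advice concrete, here's a small sketch (my own example, not code from the original program) that parses a line of space-separated integers with Data.ByteString.Char8.readInt instead of Prelude's read:

```haskell
module Main where

import qualified Data.ByteString.Char8 as B

-- Parse every integer on a ByteString, skipping spaces between
-- them. readInt returns the parsed Int plus the unconsumed rest,
-- so we recurse on the remainder until nothing more parses.
parseInts :: B.ByteString -> [Int]
parseInts s =
    case B.readInt (B.dropWhile (== ' ') s) of
      Just (n, rest) -> n : parseInts rest
      Nothing        -> []

main :: IO ()
main = print (parseInts (B.pack "10 20 30"))
```

The same shape works for whole files: read input with B.getContents, split with B.lines, and parse each line without ever building a String.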
Obviously, the standard library needs to get better in these areas. And it wouldn't hurt to apply these tips to the Haskell code in the Great Language Shootout, either.
But my program now runs in 14 seconds, and 65% of that time is spent in the one remaining call to show. So it looks like Haskell is OK for high-performance text-munging after all!
(Update: The latest Haskell performance notes can be found on the Haskell wiki.)