<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Random Hacks: Tag Performance</title>
    <link>http://www.randomhacks.net/articles/tag/Performance?tag=Performance</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Technology and Other Fun Stuff</description>
    <item>
      <title>Map fusion: Making Haskell 225% faster</title>
      <description>&lt;p&gt;&lt;b&gt;Or, how to optimize MapReduce, and when folds are faster than loops&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;Purely functional programming might actually be worth the pain, if you care about large-scale optimization.&lt;/p&gt;

&lt;p&gt;Lately, I&amp;#8217;ve been studying how to speed up parallel algorithms.  Many
parallel algorithms, such as Google&amp;#8217;s &lt;a href="http://labs.google.com/papers/mapreduce.html"&gt;MapReduce&lt;/a&gt;, have two parts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;First, you transform the data by mapping one or more functions over each value.&lt;/li&gt;
&lt;li&gt;Next, you repeatedly merge the transformed data, &amp;#8220;reducing&amp;#8221; it down to a
final result.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Unfortunately, there are a couple of nasty performance problems lurking here.  We &lt;i&gt;really&lt;/i&gt; want to combine all those steps into a single pass, so that we can eliminate temporary working data. But we don&amp;#8217;t always want to do this optimization by hand; it would be better if the compiler could do it for us.&lt;/p&gt;

&lt;p&gt;As it turns out, Haskell is an amazing testbed for this kind of
optimization. Let&amp;#8217;s build a simple model, show where it breaks, and then
crank the performance &lt;i&gt;way&lt;/i&gt; up.&lt;/p&gt;

&lt;h3&gt;Trees, and the performance problems they cause&lt;/h3&gt;

&lt;p&gt;We&amp;#8217;ll use single-threaded trees for our testbed. They&amp;#8217;re simple enough to demonstrate the basic idea, and they can be generalized to parallel systems. (If you want to know how, check out the papers at the end of this article.)&lt;/p&gt;

&lt;p&gt;A tree is either empty, or it is
a node with a left child, a value and a right child:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_haskell "&gt;&lt;span class='hs-keyword'&gt;data&lt;/span&gt; &lt;span class='hs-conid'&gt;Tree&lt;/span&gt; &lt;span class='hs-varid'&gt;a&lt;/span&gt; &lt;span class='hs-keyglyph'&gt;=&lt;/span&gt; &lt;span class='hs-conid'&gt;Empty&lt;/span&gt;
            &lt;span class='hs-keyglyph'&gt;|&lt;/span&gt; &lt;span class='hs-conid'&gt;Node&lt;/span&gt; &lt;span class='hs-layout'&gt;(&lt;/span&gt;&lt;span class='hs-conid'&gt;Tree&lt;/span&gt; &lt;span class='hs-varid'&gt;a&lt;/span&gt;&lt;span class='hs-layout'&gt;)&lt;/span&gt; &lt;span class='hs-varid'&gt;a&lt;/span&gt; &lt;span class='hs-layout'&gt;(&lt;/span&gt;&lt;span class='hs-conid'&gt;Tree&lt;/span&gt; &lt;span class='hs-varid'&gt;a&lt;/span&gt;&lt;span class='hs-layout'&gt;)&lt;/span&gt;
  &lt;span class='hs-keyword'&gt;deriving&lt;/span&gt; &lt;span class='hs-layout'&gt;(&lt;/span&gt;&lt;span class='hs-conid'&gt;Show&lt;/span&gt;&lt;span class='hs-layout'&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Here&amp;#8217;s a sample tree containing three values:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_haskell "&gt;&lt;span class='hs-definition'&gt;tree&lt;/span&gt; &lt;span class='hs-keyglyph'&gt;=&lt;/span&gt; &lt;span class='hs-layout'&gt;(&lt;/span&gt;&lt;span class='hs-conid'&gt;Node&lt;/span&gt; &lt;span class='hs-varid'&gt;left&lt;/span&gt; &lt;span class='hs-num'&gt;2&lt;/span&gt; &lt;span class='hs-varid'&gt;right&lt;/span&gt;&lt;span class='hs-layout'&gt;)&lt;/span&gt;
  &lt;span class='hs-keyword'&gt;where&lt;/span&gt; &lt;span class='hs-varid'&gt;left&lt;/span&gt;  &lt;span class='hs-keyglyph'&gt;=&lt;/span&gt; &lt;span class='hs-layout'&gt;(&lt;/span&gt;&lt;span class='hs-conid'&gt;Node&lt;/span&gt; &lt;span class='hs-conid'&gt;Empty&lt;/span&gt; &lt;span class='hs-num'&gt;1&lt;/span&gt; &lt;span class='hs-conid'&gt;Empty&lt;/span&gt;&lt;span class='hs-layout'&gt;)&lt;/span&gt;
        &lt;span class='hs-varid'&gt;right&lt;/span&gt; &lt;span class='hs-keyglyph'&gt;=&lt;/span&gt; &lt;span class='hs-layout'&gt;(&lt;/span&gt;&lt;span class='hs-conid'&gt;Node&lt;/span&gt; &lt;span class='hs-conid'&gt;Empty&lt;/span&gt; &lt;span class='hs-num'&gt;3&lt;/span&gt; &lt;span class='hs-conid'&gt;Empty&lt;/span&gt;&lt;span class='hs-layout'&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;We can use &lt;code&gt;treeMap&lt;/code&gt; to apply a function to every value in a
tree, creating a new tree:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_haskell "&gt;&lt;span class='hs-definition'&gt;treeMap&lt;/span&gt; &lt;span class='hs-keyglyph'&gt;::&lt;/span&gt; &lt;span class='hs-layout'&gt;(&lt;/span&gt;&lt;span class='hs-varid'&gt;a&lt;/span&gt; &lt;span class='hs-keyglyph'&gt;-&amp;gt;&lt;/span&gt; &lt;span class='hs-varid'&gt;b&lt;/span&gt;&lt;span class='hs-layout'&gt;)&lt;/span&gt; &lt;span class='hs-keyglyph'&gt;-&amp;gt;&lt;/span&gt; &lt;span class='hs-conid'&gt;Tree&lt;/span&gt; &lt;span class='hs-varid'&gt;a&lt;/span&gt; &lt;span class='hs-keyglyph'&gt;-&amp;gt;&lt;/span&gt; &lt;span class='hs-conid'&gt;Tree&lt;/span&gt; &lt;span class='hs-varid'&gt;b&lt;/span&gt;

&lt;span class='hs-definition'&gt;treeMap&lt;/span&gt; &lt;span class='hs-varid'&gt;f&lt;/span&gt; &lt;span class='hs-conid'&gt;Empty&lt;/span&gt; &lt;span class='hs-keyglyph'&gt;=&lt;/span&gt; &lt;span class='hs-conid'&gt;Empty&lt;/span&gt;
&lt;span class='hs-definition'&gt;treeMap&lt;/span&gt; &lt;span class='hs-varid'&gt;f&lt;/span&gt; &lt;span class='hs-layout'&gt;(&lt;/span&gt;&lt;span class='hs-conid'&gt;Node&lt;/span&gt; &lt;span class='hs-varid'&gt;l&lt;/span&gt; &lt;span class='hs-varid'&gt;x&lt;/span&gt; &lt;span class='hs-varid'&gt;r&lt;/span&gt;&lt;span class='hs-layout'&gt;)&lt;/span&gt; &lt;span class='hs-keyglyph'&gt;=&lt;/span&gt;
  &lt;span class='hs-conid'&gt;Node&lt;/span&gt; &lt;span class='hs-layout'&gt;(&lt;/span&gt;&lt;span class='hs-varid'&gt;treeMap&lt;/span&gt; &lt;span class='hs-varid'&gt;f&lt;/span&gt; &lt;span class='hs-varid'&gt;l&lt;/span&gt;&lt;span class='hs-layout'&gt;)&lt;/span&gt; &lt;span class='hs-layout'&gt;(&lt;/span&gt;&lt;span class='hs-varid'&gt;f&lt;/span&gt; &lt;span class='hs-varid'&gt;x&lt;/span&gt;&lt;span class='hs-layout'&gt;)&lt;/span&gt; &lt;span class='hs-layout'&gt;(&lt;/span&gt;&lt;span class='hs-varid'&gt;treeMap&lt;/span&gt; &lt;span class='hs-varid'&gt;f&lt;/span&gt; &lt;span class='hs-varid'&gt;r&lt;/span&gt;&lt;span class='hs-layout'&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Using &lt;code&gt;treeMap&lt;/code&gt;, we can build various functions that manipulate
trees:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_haskell "&gt;&lt;span class='hs-comment'&gt;-- Double each value in a tree.&lt;/span&gt;
&lt;span class='hs-definition'&gt;treeDouble&lt;/span&gt; &lt;span class='hs-varid'&gt;tree&lt;/span&gt; &lt;span class='hs-keyglyph'&gt;=&lt;/span&gt; &lt;span class='hs-varid'&gt;treeMap&lt;/span&gt; &lt;span class='hs-layout'&gt;(&lt;/span&gt;&lt;span class='hs-varop'&gt;*&lt;/span&gt;&lt;span class='hs-num'&gt;2&lt;/span&gt;&lt;span class='hs-layout'&gt;)&lt;/span&gt; &lt;span class='hs-varid'&gt;tree&lt;/span&gt;

&lt;span class='hs-comment'&gt;-- Add one to each value in a tree.&lt;/span&gt;
&lt;span class='hs-definition'&gt;treeIncr&lt;/span&gt; &lt;span class='hs-varid'&gt;tree&lt;/span&gt;   &lt;span class='hs-keyglyph'&gt;=&lt;/span&gt; &lt;span class='hs-varid'&gt;treeMap&lt;/span&gt; &lt;span class='hs-layout'&gt;(&lt;/span&gt;&lt;span class='hs-varop'&gt;+&lt;/span&gt;&lt;span class='hs-num'&gt;1&lt;/span&gt;&lt;span class='hs-layout'&gt;)&lt;/span&gt; &lt;span class='hs-varid'&gt;tree&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;What if we want to add up all the values in a tree?  Well, we could write a
simple recursive sum function:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_haskell "&gt;&lt;span class='hs-definition'&gt;treeSum&lt;/span&gt; &lt;span class='hs-conid'&gt;Empty&lt;/span&gt; &lt;span class='hs-keyglyph'&gt;=&lt;/span&gt; &lt;span class='hs-num'&gt;0&lt;/span&gt;
&lt;span class='hs-definition'&gt;treeSum&lt;/span&gt; &lt;span class='hs-layout'&gt;(&lt;/span&gt;&lt;span class='hs-conid'&gt;Node&lt;/span&gt; &lt;span class='hs-varid'&gt;l&lt;/span&gt; &lt;span class='hs-varid'&gt;x&lt;/span&gt; &lt;span class='hs-varid'&gt;r&lt;/span&gt;&lt;span class='hs-layout'&gt;)&lt;/span&gt; &lt;span class='hs-keyglyph'&gt;=&lt;/span&gt;
  &lt;span class='hs-varid'&gt;treeSum&lt;/span&gt; &lt;span class='hs-varid'&gt;l&lt;/span&gt; &lt;span class='hs-varop'&gt;+&lt;/span&gt; &lt;span class='hs-varid'&gt;x&lt;/span&gt; &lt;span class='hs-varop'&gt;+&lt;/span&gt; &lt;span class='hs-varid'&gt;treeSum&lt;/span&gt; &lt;span class='hs-varid'&gt;r&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;But for reasons that will soon become clear, it&amp;#8217;s much better to refactor
the recursive part of &lt;code&gt;treeSum&lt;/code&gt; into a reusable
&lt;code&gt;treeFold&lt;/code&gt; function (&amp;#8220;fold&amp;#8221; is Haskell&amp;#8217;s name for &amp;#8220;reduce&amp;#8221;):&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_haskell "&gt;&lt;span class='hs-definition'&gt;treeFold&lt;/span&gt; &lt;span class='hs-varid'&gt;f&lt;/span&gt; &lt;span class='hs-varid'&gt;b&lt;/span&gt; &lt;span class='hs-conid'&gt;Empty&lt;/span&gt; &lt;span class='hs-keyglyph'&gt;=&lt;/span&gt; &lt;span class='hs-varid'&gt;b&lt;/span&gt;
&lt;span class='hs-definition'&gt;treeFold&lt;/span&gt; &lt;span class='hs-varid'&gt;f&lt;/span&gt; &lt;span class='hs-varid'&gt;b&lt;/span&gt; &lt;span class='hs-layout'&gt;(&lt;/span&gt;&lt;span class='hs-conid'&gt;Node&lt;/span&gt; &lt;span class='hs-varid'&gt;l&lt;/span&gt; &lt;span class='hs-varid'&gt;x&lt;/span&gt; &lt;span class='hs-varid'&gt;r&lt;/span&gt;&lt;span class='hs-layout'&gt;)&lt;/span&gt; &lt;span class='hs-keyglyph'&gt;=&lt;/span&gt;
  &lt;span class='hs-varid'&gt;f&lt;/span&gt; &lt;span class='hs-layout'&gt;(&lt;/span&gt;&lt;span class='hs-varid'&gt;treeFold&lt;/span&gt; &lt;span class='hs-varid'&gt;f&lt;/span&gt; &lt;span class='hs-varid'&gt;b&lt;/span&gt; &lt;span class='hs-varid'&gt;l&lt;/span&gt;&lt;span class='hs-layout'&gt;)&lt;/span&gt; &lt;span class='hs-varid'&gt;x&lt;/span&gt; &lt;span class='hs-layout'&gt;(&lt;/span&gt;&lt;span class='hs-varid'&gt;treeFold&lt;/span&gt; &lt;span class='hs-varid'&gt;f&lt;/span&gt; &lt;span class='hs-varid'&gt;b&lt;/span&gt; &lt;span class='hs-varid'&gt;r&lt;/span&gt;&lt;span class='hs-layout'&gt;)&lt;/span&gt;

&lt;span class='hs-definition'&gt;treeSum&lt;/span&gt; &lt;span class='hs-varid'&gt;t&lt;/span&gt; &lt;span class='hs-keyglyph'&gt;=&lt;/span&gt; &lt;span class='hs-varid'&gt;treeFold&lt;/span&gt; &lt;span class='hs-layout'&gt;(&lt;/span&gt;&lt;span class='hs-keyglyph'&gt;\&lt;/span&gt;&lt;span class='hs-varid'&gt;l&lt;/span&gt; &lt;span class='hs-varid'&gt;x&lt;/span&gt; &lt;span class='hs-varid'&gt;r&lt;/span&gt; &lt;span class='hs-keyglyph'&gt;-&amp;gt;&lt;/span&gt; &lt;span class='hs-varid'&gt;l&lt;/span&gt;&lt;span class='hs-varop'&gt;+&lt;/span&gt;&lt;span class='hs-varid'&gt;x&lt;/span&gt;&lt;span class='hs-varop'&gt;+&lt;/span&gt;&lt;span class='hs-varid'&gt;r&lt;/span&gt;&lt;span class='hs-layout'&gt;)&lt;/span&gt; &lt;span class='hs-num'&gt;0&lt;/span&gt; &lt;span class='hs-varid'&gt;t&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
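
&lt;p&gt;To see why the refactoring pays off, here is a small sketch that reuses &lt;code&gt;treeFold&lt;/code&gt; for two other reductions. The names &lt;code&gt;countNode&lt;/code&gt;, &lt;code&gt;deeper&lt;/code&gt;, &lt;code&gt;treeSize&lt;/code&gt; and &lt;code&gt;treeDepth&lt;/code&gt; are made up for this illustration; only &lt;code&gt;Tree&lt;/code&gt; and &lt;code&gt;treeFold&lt;/code&gt; come from the definitions above:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_haskell "&gt;data Tree a = Empty
            | Node (Tree a) a (Tree a)
  deriving (Show)

treeFold f b Empty = b
treeFold f b (Node l x r) =
  f (treeFold f b l) x (treeFold f b r)

-- Hypothetical helper: count this node plus both subtrees.
countNode l _ r = l + 1 + r

-- Hypothetical helper: one more than the deeper subtree.
deeper l _ r = 1 + max l r

treeSize t  = treeFold countNode 0 t
treeDepth t = treeFold deeper 0 t

main = do
  let tree = Node (Node Empty 1 Empty) 2 (Node Empty 3 Empty)
  print (treeSize tree)   -- 3
  print (treeDepth tree)  -- 2
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Only the combining function and the starting value change; the recursion itself is written once, in &lt;code&gt;treeFold&lt;/code&gt;.&lt;/p&gt;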

&lt;p&gt;Now we can double all the values in a tree, add 1 to each, and sum up the
result:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_haskell "&gt;&lt;span class='hs-definition'&gt;treeSum&lt;/span&gt; &lt;span class='hs-layout'&gt;(&lt;/span&gt;&lt;span class='hs-varid'&gt;treeIncr&lt;/span&gt; &lt;span class='hs-layout'&gt;(&lt;/span&gt;&lt;span class='hs-varid'&gt;treeDouble&lt;/span&gt; &lt;span class='hs-varid'&gt;tree&lt;/span&gt;&lt;span class='hs-layout'&gt;)&lt;/span&gt;&lt;span class='hs-layout'&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
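
&lt;p&gt;As a quick check, the definitions so far can be assembled into one small program and run against the sample tree. This is only a sketch: &lt;code&gt;add3&lt;/code&gt; is a made-up name that stands in for the lambda in &lt;code&gt;treeSum&lt;/code&gt;, and the comments show the output expected from the derived &lt;code&gt;Show&lt;/code&gt; instance:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_haskell "&gt;data Tree a = Empty
            | Node (Tree a) a (Tree a)
  deriving (Show)

treeMap f Empty = Empty
treeMap f (Node l x r) =
  Node (treeMap f l) (f x) (treeMap f r)

treeFold f b Empty = b
treeFold f b (Node l x r) =
  f (treeFold f b l) x (treeFold f b r)

treeDouble t = treeMap (*2) t
treeIncr t   = treeMap (+1) t

-- add3 stands in for the lambda used in treeSum above.
add3 l x r = l + x + r
treeSum t  = treeFold add3 0 t

tree = Node left 2 right
  where left  = Node Empty 1 Empty
        right = Node Empty 3 Empty

main = do
  print (treeDouble tree)
  -- Node (Node Empty 2 Empty) 4 (Node Empty 6 Empty)
  print (treeIncr (treeDouble tree))
  -- Node (Node Empty 3 Empty) 5 (Node Empty 7 Empty)
  print (treeSum (treeIncr (treeDouble tree)))
  -- 15
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;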

&lt;p&gt;But there&amp;#8217;s a very serious problem with this code.  Imagine that we&amp;#8217;re
working with a million-node tree.  The two calls to &lt;code&gt;treeMap&lt;/code&gt;
(buried inside &lt;code&gt;treeIncr&lt;/code&gt; and &lt;code&gt;treeDouble&lt;/code&gt;) will each
create a &lt;i&gt;new&lt;/i&gt; million-node tree.  Obviously, this will kill our performance,
and it will make our garbage collector cry.&lt;/p&gt;
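
&lt;p&gt;One way to avoid the extra tree is to fuse the two maps by hand, relying on the fact that mapping &lt;code&gt;g&lt;/code&gt; and then &lt;code&gt;f&lt;/code&gt; gives the same tree as mapping &lt;code&gt;f . g&lt;/code&gt; once. The sketch below only illustrates that idea; the names &lt;code&gt;unfused&lt;/code&gt;, &lt;code&gt;fused&lt;/code&gt; and &lt;code&gt;add3&lt;/code&gt; are invented for it:&lt;/p&gt;

&lt;div class="typocode"&gt;&lt;pre&gt;&lt;code class="typocode_haskell "&gt;data Tree a = Empty
            | Node (Tree a) a (Tree a)
  deriving (Show)

treeMap f Empty = Empty
treeMap f (Node l x r) =
  Node (treeMap f l) (f x) (treeMap f r)

treeFold f b Empty = b
treeFold f b (Node l x r) =
  f (treeFold f b l) x (treeFold f b r)

-- add3 stands in for the lambda used in treeSum above.
add3 l x r = l + x + r
treeSum t  = treeFold add3 0 t

-- Two maps: each one produces a new tree.
unfused t = treeSum (treeMap (+1) (treeMap (*2) t))

-- One map with the composed function: a single new tree.
fused t = treeSum (treeMap ((+1) . (*2)) t)

main = do
  let tree = Node (Node Empty 1 Empty) 2 (Node Empty 3 Empty)
  print (unfused tree)  -- 15
  print (fused tree)    -- 15
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Both versions give the same answer, but rewriting every pipeline like this by hand quickly gets tedious.&lt;/p&gt;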

&lt;p&gt;Fortunately, we can do a lot better than this, thanks to some funky GHC
extensions.&lt;/p&gt;&lt;p&gt;&lt;a href="http://www.randomhacks.net/articles/2007/02/10/map-fusion-and-haskell-performance"&gt;Read More&lt;/a&gt;&lt;/p&gt;</description>
      <pubDate>Sat, 10 Feb 2007 09:55:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:9e56e1ad-e953-4a5a-89ff-e4708979f952</guid>
      <author>Eric Kidd</author>
      <link>http://www.randomhacks.net/articles/2007/02/10/map-fusion-and-haskell-performance</link>
      <category>Haskell</category>
      <category>Performance</category>
      <category>Recommended</category>
      <trackback:ping>http://www.randomhacks.net/articles/trackback/290</trackback:ping>
    </item>
    <item>
      <title>High-Performance Haskell</title>
      <description>&lt;p&gt;Yesterday, I was working on a Haskell program that read in megabytes of data, parsed it, and wrote a subset of the data back to standard output. At first it was pretty fast: &lt;strong&gt;7 seconds&lt;/strong&gt; for everything.&lt;/p&gt;

&lt;p&gt;But then I made the mistake of &lt;em&gt;parsing&lt;/em&gt; some floating point numbers, and printing them back out. My performance died: &lt;strong&gt;120 seconds&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You can see similar problems at the &lt;a href="http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&amp;amp;lang=ghc&amp;amp;lang2=gcc"&gt;Great Language Shootout&lt;/a&gt;. Haskell runs at half the speed of C for many benchmarks, then suddenly drops to 1/20th for others.&lt;/p&gt;

&lt;p&gt;Here&amp;#8217;s what&amp;#8217;s going on, and how to fix it.&lt;/p&gt;

&lt;p&gt;(Many thanks to Don Stewart and the other folks on #haskell for helping me figure this out!)&lt;/p&gt;&lt;p&gt;&lt;a href="http://www.randomhacks.net/articles/2007/01/22/high-performance-haskell"&gt;Read More&lt;/a&gt;&lt;/p&gt;</description>
      <pubDate>Mon, 22 Jan 2007 08:32:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:c4ca3a93-c9f2-410b-854c-8a8c49481f10</guid>
      <author>Eric Kidd</author>
      <link>http://www.randomhacks.net/articles/2007/01/22/high-performance-haskell</link>
      <category>Haskell</category>
      <category>Performance</category>
      <trackback:ping>http://www.randomhacks.net/articles/trackback/236</trackback:ping>
    </item>
  </channel>
</rss>

