<p>podsnap (https://blog.podsnap.com/): Apply Within - Bringing applicative desugaring to scala for-notation, by Peter Fraenkel, 2018-10-17</p>
<h1>Stupidly Obscure Programming in a Troubled Time</h1>
<p><img src="./images/patrick-hammer.png"></p>
<p>Since obsessively underlining passages in a tattered copy of
<a href="https://www.powells.com/book/-9780749390549">Goodbye to Berlin</a>
hasn't proven to be the uplifting diversion I was hoping for, I
resolved to bash my head against some really complicated scala code that I'm not
qualified to write and that nobody is asking for either.</p>
<p>So that this exercise in self-abuse could pass for a reasonable use
of my time - but not too reasonable, since that would be a giveaway -
I decided to link it to an existing, longstanding obsession of mine:
paradigms and constructs for concurrency.</p>
<p>Warning of what's to come:</p>
<ol>
<li>Concurrent data retrieval in Haskell with <code>Haxl</code> and in Scala with <code>Fetch</code>,
observing that <code>Haxl</code> benefits from extensions to the <code>do</code> construct that
we don't have with Scala's <code>for</code>.</li>
<li>Tedious construction of a Scala macro that purports to extend <code>for</code> in the
same fashion. Many sad lessons are learned along the way.</li>
<li>Demonstration of success in this endeavor, with some rueful acknowledgments.</li>
<li>An unexpected rant about programming that uses types with names from category
theory in combination with blocks of expressions involving left-arrows.</li>
</ol>
<h1>Waiting for guff</h1>
<p><img src="./images/guffman.jpg" width=500></p>
<p>A couple of years ago, I offered to the ungrateful internet a numbing
<a href="http://blog.podsnap.com/qaxl.html">fugue of mumblings</a> that
purported to explain how parallelism and batching were achieved in the
<a href="https://www.youtube.com/watch?v=sT6VJkkhy0o">Haxl</a> package for Haskell.
For no obvious reason,
nearly all my code was in Clojure, which made no sense in the context
of Haskell, but paid some dividends in the end, because the modern[ish] miracle
of homoiconicity, plus a nice interface to
<a href="http://docs.paralleluniverse.co/quasar/">quasar</a> fibers, facilitated an alternate
implementation based on asynchronicity and no monadic guff whatsoever.</p>
<p>Two years is a long time to wait for monadic guff, but good things - well,
<em>things</em> anyway - come to those who wait.</p>
<p>Buried within the
verbiage<sup id="fnref:verbiage"><a class="footnote-ref" href="#fn:verbiage">1</a></sup> of my previous post were a few pieces of information
that will be useful today.</p>
<p>First, <code>Haxl</code> is a monad, allowing data retrieval operations to be composed
in <code>do</code> notation. In this contrived example, we retrieve the BFF of each member of
some <code>grp</code>, and then we produce a list of the full names of each of the BFFs.</p>
<div class="highlight"><pre><span></span><code>  <span class="n">runHaxl</span> <span class="n">env</span> <span class="o">$</span> <span class="kr">do</span>
      <span class="n">ids</span> <span class="ow"><-</span> <span class="n">getIds</span> <span class="n">grp</span>
      <span class="n">mapM</span> <span class="n">getBffName</span> <span class="n">ids</span>
    <span class="kr">where</span>
      <span class="n">getBffName</span> <span class="n">id</span> <span class="ow">=</span> <span class="kr">do</span>
        <span class="n">bffId</span> <span class="ow"><-</span> <span class="n">getBff</span> <span class="n">id</span>
        <span class="n">fn</span> <span class="ow"><-</span> <span class="n">getFirstName</span> <span class="n">bffId</span>
        <span class="n">ln</span> <span class="ow"><-</span> <span class="n">getLastName</span> <span class="n">bffId</span>
        <span class="n">return</span> <span class="o">$</span> <span class="n">fn</span> <span class="o">++</span> <span class="s">" "</span> <span class="o">++</span> <span class="n">ln</span>
</code></pre></div>
<p>The cool part is that, assuming proper implementations of <code>getBff</code>, <code>getFirstName</code>
and <code>getLastName</code>, the queries to assemble this information will be batched into
two queries that</p>
<ol>
<li>retrieve the <code>bffId</code>s for everyone in the group</li>
<li>retrieve the first and last names for all the <code>bffId</code>s so obtained.</li>
</ol>
<p>It does this by not performing the queries as soon as they're made, but (loosely)
by building up a tree of which queries depend on the results of which other ones,
accumulating as many queries as it can run without those unknown dependencies,
executing them in a batch, and then repeating, using all the information now available
from the previous queries.</p>
<p>There's a bit of special magic necessary to thwart the usual ordering described
in nested <code>fmap</code> and <code>>>=</code>. In fact, the Haskell compiler has been
<a href="https://ghc.haskell.org/trac/ghc/wiki/ApplicativeDo">rejiggered</a> to take advantage
of the fact that</p>
<ul>
<li><code>Haxl</code> is an applicative functor</li>
<li>Neither the <code>getLastName</code> nor <code>getFirstName</code> call depends on the other</li>
</ul>
<p>and automatically translate the 2nd <code>do</code> block to</p>
<div class="highlight"><pre><span></span><code>  <span class="n">getBffName</span> <span class="n">id</span> <span class="ow">=</span> <span class="kr">do</span>
    <span class="n">bffId</span> <span class="ow"><-</span> <span class="n">getBff</span> <span class="n">id</span>
    <span class="p">(</span><span class="n">fn</span><span class="p">,</span><span class="n">ln</span><span class="p">)</span> <span class="ow"><-</span> <span class="p">(,)</span> <span class="o"><$></span> <span class="n">getFirstName</span> <span class="n">bffId</span> <span class="o"><*></span> <span class="n">getLastName</span> <span class="n">bffId</span>
    <span class="n">return</span> <span class="o">$</span> <span class="n">fn</span> <span class="o">++</span> <span class="s">" "</span> <span class="o">++</span> <span class="n">ln</span>
</code></pre></div>
<p>where <code>fn</code> and <code>ln</code> are now retrieved simultaneously.</p>
<p>Without this magic, there would have been three queries, one to get the <code>bffId</code>s,
and then one each to get the first and last names. With the magic, the last two
queries are combined. So for a group of three, magic gets us from 9 logical queries in 3 batches to 9 in 2.</p>
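<p>The round-by-round batching described above can be sketched in a few dozen lines of
plain Scala. This is a toy model under loud assumptions, not the actual implementation of
Haxl or of any Scala library: the names <code>Done</code>, <code>Blocked</code>,
<code>zip</code>, <code>request</code> and the in-memory <code>db</code> are all invented
for illustration, and every value is a <code>String</code> to keep the types small. The
point is only that an applicative-style <code>zip</code> can merge the pending keys of two
independent computations into one round, while <code>flatMap</code> cannot look past the
first one.</p>
<div class="highlight"><pre><span></span><code>object BatchingSketch {
  type Key = (String, Int) // (hypothetical data source name, id)

  // A computation is either finished, or blocked on some keys plus a
  // continuation that resumes once those keys have been looked up.
  sealed trait Fetch[A] {
    def flatMap[B](f: A => Fetch[B]): Fetch[B] = this match {
      case Done(a)          => f(a)
      case Blocked(keys, k) => Blocked[B](keys, data => k(data).flatMap(f))
    }
    def map[B](f: A => B): Fetch[B] = flatMap[B](a => Done(f(a)))
  }
  final case class Done[A](value: A) extends Fetch[A]
  final case class Blocked[A](keys: Set[Key], resume: Map[Key, String] => Fetch[A]) extends Fetch[A]

  // Applicative-style combination: if both sides are blocked, their keys are merged.
  def zip[A, B](fa: Fetch[A], fb: Fetch[B]): Fetch[(A, B)] = (fa, fb) match {
    case (Done(a), Done(b))                 => Done((a, b))
    case (Done(_), Blocked(kb, rb))         => Blocked[(A, B)](kb, d => zip(fa, rb(d)))
    case (Blocked(ka, ra), Done(_))         => Blocked[(A, B)](ka, d => zip(ra(d), fb))
    case (Blocked(ka, ra), Blocked(kb, rb)) => Blocked[(A, B)](ka ++ kb, d => zip(ra(d), rb(d)))
  }

  def traverse[A, B](as: List[A])(f: A => Fetch[B]): Fetch[List[B]] =
    as.foldRight(Done(List.empty[B]): Fetch[List[B]]) { (a, acc) =>
      zip(f(a), acc).map { case (b, bs) => b :: bs }
    }

  // Made-up data, mirroring the BFF example.
  val db: Map[Key, String] = Map(
    ("bff", 1) -> "2", ("bff", 2) -> "3", ("bff", 3) -> "1",
    ("first", 1) -> "Biff", ("first", 2) -> "Heinz", ("first", 3) -> "Tree",
    ("last", 1) -> "Loman", ("last", 2) -> "Doofenshmirtz", ("last", 3) -> "Trunks")

  def request(source: String, id: Int): Fetch[String] =
    Blocked[String](Set((source, id)), data => Done(data((source, id))))

  // Purely monadic: the last name cannot be requested until the first name is back.
  def bffNameSeq(id: Int): Fetch[String] =
    request("bff", id).flatMap { bff =>
      request("first", bff.toInt).flatMap { fn =>
        request("last", bff.toInt).map(ln => s"$fn $ln")
      }
    }

  // The two independent name lookups are zipped, so they share a round.
  def bffNamePar(id: Int): Fetch[String] =
    request("bff", id).flatMap { bff =>
      zip(request("first", bff.toInt), request("last", bff.toInt))
        .map { case (fn, ln) => s"$fn $ln" }
    }

  // Execute everything currently pending as one round, then resume; count the rounds.
  def run[A](fetch: Fetch[A], rounds: Int = 0): (A, Int) = fetch match {
    case Done(a) => (a, rounds)
    case Blocked(keys, resume) =>
      val data = keys.map(k => k -> db(k)).toMap
      run(resume(data), rounds + 1)
  }

  def main(args: Array[String]): Unit = {
    println(run(traverse(List(1, 2, 3))(bffNameSeq)))
    println(run(traverse(List(1, 2, 3))(bffNamePar)))
  }
}
</code></pre></div>
<p>In this sketch the purely monadic version needs three rounds for the three-person group
and the zipped version needs two, with the same list of names either way. A real library
would additionally group each round's keys by data source, which the toy skips.</p>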
<h2>Go fetch</h2>
<p><img src="./images/fetch-with-ruff-ruffman.jpg" width=400 ></p>
<p>In Scala, there's a library similar to Haxl, called <a href="https://www.47deg.com/blog/fetch-scala-library/">Fetch</a>,
which (again, assuming that proper data sources have been written) can
batch together what look like queries for individual items. Just accept for now
that the various <code>getXXX</code> methods return <code>Fetch</code> instances.</p>
<div class="highlight"><pre><span></span><code> <span class="n">getIds</span><span class="p">(</span><span class="n">grp</span><span class="p">).</span><span class="n">traverse</span> <span class="p">{</span> <span class="n">id</span> <span class="o">=></span>
   <span class="k">for</span> <span class="p">{</span>
     <span class="n">bffId</span> <span class="o"><-</span> <span class="n">getBff</span><span class="p">(</span><span class="n">id</span><span class="p">)</span>
     <span class="n">fn</span> <span class="o"><-</span> <span class="n">getFirstName</span><span class="p">(</span><span class="n">bffId</span><span class="p">)</span>
     <span class="n">ln</span> <span class="o"><-</span> <span class="n">getLastName</span><span class="p">(</span><span class="n">bffId</span><span class="p">)</span>
   <span class="p">}</span> <span class="k">yield</span> <span class="s">s"</span><span class="si">$</span><span class="n">fn</span><span class="s"> </span><span class="si">$</span><span class="n">ln</span><span class="s">"</span>
 <span class="p">}</span>
</code></pre></div>
<p>Unfortunately, there is no special applicative magic in Scala's <code>for</code>
comprehension, so if we wanted to get the job done in two batches, we'd
have to do the applicative transform ourselves...</p>
<p>Sort of. Scala is a language that, depending on your perspective, either
provides sophisticated support for multi-argument functions, or fails
to recognize that multi-argument functions are actually curried
single-argument functions, so "applicative" in Scala is sort of a
qualitative equivalent: Not support per se for a <code><*></code> combinator</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">starcyclops</span><span class="p">[</span><span class="nc">F</span><span class="p">[</span><span class="n">_</span><span class="p">],</span> <span class="nc">A</span><span class="p">,</span> <span class="nc">B</span><span class="p">](</span><span class="n">f</span><span class="p">:</span> <span class="nc">F</span><span class="p">[</span><span class="nc">A</span> <span class="o">=></span> <span class="nc">B</span><span class="p">]):</span> <span class="nc">F</span><span class="p">[</span><span class="nc">A</span><span class="p">]</span> <span class="o">=></span> <span class="nc">F</span><span class="p">[</span><span class="nc">B</span><span class="p">]</span>
</code></pre></div>
<p>but rather support for what you would want to accomplish by chaining such
things together, that is, apply a function of N parameters to N arguments
of type <code>F[_]</code>, e.g.</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">mapN</span><span class="p">[</span><span class="nc">F</span><span class="p">[</span><span class="n">_</span><span class="p">],</span> <span class="nc">T1</span><span class="p">,</span> <span class="p">...</span> <span class="nc">Tn</span><span class="p">,</span> <span class="nc">R</span><span class="p">](</span><span class="n">f</span><span class="p">:</span> <span class="p">(</span><span class="nc">T1</span><span class="p">,</span><span class="nc">T2</span> <span class="p">...</span> <span class="nc">Tn</span><span class="p">)</span> <span class="o">=></span> <span class="nc">R</span><span class="p">)</span>
<span class="p">(</span><span class="nc">F</span><span class="p">[</span><span class="nc">T1</span><span class="p">],</span><span class="nc">F</span><span class="p">[</span><span class="nc">T2</span><span class="p">]</span> <span class="p">...</span> <span class="nc">F</span><span class="p">[</span><span class="nc">Tn</span><span class="p">]):</span> <span class="nc">F</span><span class="p">[</span><span class="nc">R</span><span class="p">]</span>
</code></pre></div>
<p>In addition to <code>mapN</code>, there are various syntactical conveniences. For
example, it's very common to want <code>mapN(TupleN.apply _)(f1, f2, ... fN)</code>, so
Cats provides syntactic sugar to express this as <code>(f1, f2, ... fN).tupled</code>.</p>
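<p>To make the relationship between a two-argument <code>mapN</code> and
<code>tupled</code> concrete without pulling in any library, here is a hand-rolled sketch
for <code>Option</code>. The <code>Applicative</code> trait, <code>map2</code> and
<code>optionApplicative</code> below are hypothetical stand-ins written for this example;
the real Cats type classes are richer and are organized differently.</p>
<div class="highlight"><pre><span></span><code>object TupledSketch {
  // Hypothetical minimal type class; an assumption for illustration, not the Cats definition.
  trait Applicative[F[_]] {
    def pure[A](a: A): F[A]

    // Apply a two-argument function to two independent F values.
    def map2[A, B, R](fa: F[A], fb: F[B])(f: (A, B) => R): F[R]

    // tupled is just map2 with the tuple constructor as the function.
    def tupled[A, B](fa: F[A], fb: F[B]): F[(A, B)] = map2(fa, fb)((a, b) => (a, b))
  }

  val optionApplicative: Applicative[Option] = new Applicative[Option] {
    def pure[A](a: A): Option[A] = Some(a)
    def map2[A, B, R](fa: Option[A], fb: Option[B])(f: (A, B) => R): Option[R] =
      fa.flatMap(a => fb.map(b => f(a, b)))
  }

  def main(args: Array[String]): Unit = {
    // Both arguments present: the function is applied.
    println(optionApplicative.map2(Option("Biff"), Option("Loman"))((fn, ln) => s"$fn $ln"))
    // One argument missing: the whole result is missing.
    println(optionApplicative.tupled(Option(1), Option.empty[String]))
  }
}
</code></pre></div>
<p>Neither argument of <code>map2</code> is computed from the other, which is exactly the
independence that a batching library can exploit.</p>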
<p>The "applicative transform" that we must perform manually thus looks like this:</p>
<div class="highlight"><pre><span></span><code> <span class="n">getIds</span><span class="p">(</span><span class="n">grp</span><span class="p">).</span><span class="n">traverse</span> <span class="p">{</span> <span class="n">id</span> <span class="o">=></span>
   <span class="k">for</span> <span class="p">{</span>
     <span class="n">bffId</span> <span class="o"><-</span> <span class="n">getBff</span><span class="p">(</span><span class="n">id</span><span class="p">)</span>
     <span class="p">(</span><span class="n">fn</span><span class="p">,</span><span class="n">ln</span><span class="p">)</span> <span class="o"><-</span> <span class="p">(</span><span class="n">getFirstName</span><span class="p">(</span><span class="n">bffId</span><span class="p">),</span> <span class="n">getLastName</span><span class="p">(</span><span class="n">bffId</span><span class="p">)).</span><span class="n">tupled</span>
   <span class="p">}</span> <span class="k">yield</span> <span class="s">s"</span><span class="si">$</span><span class="n">fn</span><span class="s"> </span><span class="si">$</span><span class="n">ln</span><span class="s">"</span>
 <span class="p">}</span>
</code></pre></div>
<p>In the next section, we'll try to verify that Fetch actually delivers the
expected concurrency.</p>
<h2>Quick demonstration of concurrency with Fetch</h2>
<p>Closely following the examples in the <a href="http://47deg.github.io/fetch/docs.html">fetch documentation</a>,
I wrote some fake data sources that pretend to divine best-friends-for-life and
names, but actually just look them up in a static dictionary. All data sources extend
a trait defined in the Fetch library,</p>
<div class="highlight"><pre><span></span><code><span class="k">trait</span> <span class="nc">DataSource</span><span class="p">[</span><span class="nc">I</span><span class="p">,</span> <span class="nc">A</span><span class="p">]</span> <span class="p">{</span>
<span class="k">def</span> <span class="nf">name</span><span class="p">:</span> <span class="nc">String</span>
<span class="k">def</span> <span class="nf">fetch</span><span class="p">[</span><span class="nc">F</span><span class="p">[</span><span class="n">_</span><span class="p">]</span> <span class="p">:</span> <span class="nc">ConcurrentEffect</span> <span class="p">:</span> <span class="nc">Par</span><span class="p">](</span><span class="n">id</span><span class="p">:</span> <span class="nc">I</span><span class="p">):</span> <span class="nc">F</span><span class="p">[</span><span class="nc">Option</span><span class="p">[</span><span class="nc">A</span><span class="p">]]</span>
<span class="k">def</span> <span class="nf">batch</span><span class="p">[</span><span class="nc">F</span><span class="p">[</span><span class="n">_</span><span class="p">]</span> <span class="p">:</span> <span class="nc">ConcurrentEffect</span> <span class="p">:</span> <span class="nc">Par</span><span class="p">](</span><span class="n">ids</span><span class="p">:</span> <span class="nc">NonEmptyList</span><span class="p">[</span><span class="nc">I</span><span class="p">]):</span> <span class="nc">F</span><span class="p">[</span><span class="nc">Map</span><span class="p">[</span><span class="nc">I</span><span class="p">,</span> <span class="nc">A</span><span class="p">]]</span>
<span class="p">}</span>
</code></pre></div>
<p>where the <code>fetch</code> method fetches one value, and <code>batch</code> batches up several at once. Note that,
while the trait is defined in the Fetch library, it nowhere refers to other Fetch
types. The implicit <code>ConcurrentEffect</code> is just something with a <code>runCancelable</code> method,
which does what it sounds like it does, and <code>Par</code> has a <code>parallel</code> method, whose
return value gives us access to various <code>Applicative</code> methods. So what we're doing
is guaranteeing that <code>fetch</code> and <code>batch</code> will be used with constructs that can
be run in parallel. Those classes come from the <a href="https://typelevel.org/cats/">Cats</a>
library, rather than Fetch.</p>
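<p>The <code>F[_] : ConcurrentEffect : Par</code> notation in those signatures is a pair of
context bounds, which the compiler turns into extra implicit parameters. A minimal sketch
of that desugaring, using a made-up type class <code>Labelled</code> as a stand-in (it is
an assumption for illustration and has nothing to do with Cats or Fetch):</p>
<div class="highlight"><pre><span></span><code>object ContextBoundSketch {
  // Hypothetical type class standing in for ConcurrentEffect or Par.
  trait Labelled[F[_]] { def label: String }

  // Context-bound form, in the style of the DataSource signatures.
  def describe[F[_]: Labelled]: String = implicitly[Labelled[F]].label

  // Roughly what that means: an implicit parameter carrying the instance.
  def describeExplicit[F[_]](implicit ev: Labelled[F]): String = ev.label

  implicit val listLabelled: Labelled[List] = new Labelled[List] { val label = "List" }

  def main(args: Array[String]): Unit = {
    println(describe[List])
    println(describeExplicit[List])
  }
}
</code></pre></div>
<p>Both calls resolve the same implicit instance; a method with two context bounds simply
receives two such implicit parameters.</p>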
<p>The boilerplate for my fake data sources can be condensed into a sub-trait,</p>
<div class="highlight"><pre><span></span><code> <span class="k">trait</span> <span class="nc">DictSource</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span><span class="nc">B</span><span class="p">]</span> <span class="k">extends</span> <span class="nc">DataSource</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span><span class="nc">B</span><span class="p">]</span> <span class="p">{</span>
<span class="k">protected</span> <span class="kd">val</span> <span class="n">dict</span><span class="p">:</span> <span class="nc">Map</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span><span class="nc">B</span><span class="p">]</span>
<span class="k">final</span> <span class="k">override</span> <span class="k">def</span> <span class="nf">fetch</span><span class="p">[</span><span class="nc">F</span><span class="p">[</span><span class="n">_</span><span class="p">]</span> <span class="p">:</span> <span class="nc">ConcurrentEffect</span> <span class="p">:</span> <span class="nc">Par</span><span class="p">](</span><span class="n">a</span><span class="p">:</span> <span class="nc">A</span><span class="p">)</span> <span class="o">=</span> <span class="n">logReturn</span><span class="p">(</span><span class="n">dict</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="n">a</span><span class="p">),</span> <span class="s">s"One </span><span class="si">$</span><span class="n">name</span><span class="s">"</span><span class="p">)</span>
<span class="k">final</span> <span class="k">override</span> <span class="k">def</span> <span class="nf">batch</span><span class="p">[</span><span class="nc">F</span><span class="p">[</span><span class="n">_</span><span class="p">]</span> <span class="p">:</span> <span class="nc">ConcurrentEffect</span> <span class="p">:</span> <span class="nc">Par</span><span class="p">](</span><span class="n">as</span><span class="p">:</span> <span class="nc">NonEmptyList</span><span class="p">[</span><span class="nc">A</span><span class="p">])</span> <span class="o">=</span>
<span class="n">logReturn</span><span class="p">(</span><span class="n">dict</span><span class="p">.</span><span class="n">filterKeys</span><span class="p">(</span><span class="n">as</span><span class="p">.</span><span class="n">toList</span><span class="p">.</span><span class="n">contains</span><span class="p">),</span> <span class="s">s"</span><span class="si">${</span><span class="n">as</span><span class="p">.</span><span class="n">size</span><span class="si">}</span><span class="s"> x </span><span class="si">$</span><span class="n">name</span><span class="s">"</span><span class="p">)</span>
<span class="k">private</span> <span class="k">def</span> <span class="nf">logReturn</span><span class="p">[</span><span class="nc">F</span><span class="p">[</span><span class="n">_</span><span class="p">]</span> <span class="p">:</span> <span class="nc">ConcurrentEffect</span> <span class="p">:</span> <span class="nc">Par</span><span class="p">,</span> <span class="nc">A</span><span class="p">](</span><span class="n">result</span><span class="p">:</span> <span class="nc">A</span><span class="p">,</span> <span class="n">msg</span><span class="p">:</span> <span class="nc">String</span><span class="p">)</span> <span class="o">=</span> <span class="p">{</span>
<span class="kd">val</span> <span class="n">id</span> <span class="o">=</span> <span class="nc">Thread</span><span class="p">.</span><span class="n">currentThread</span><span class="p">.</span><span class="n">getId</span>
<span class="nc">Sync</span><span class="p">[</span><span class="nc">F</span><span class="p">].</span><span class="n">delay</span><span class="p">(</span><span class="n">println</span><span class="p">(</span><span class="s">s"Requesting [tid=</span><span class="si">$</span><span class="n">id</span><span class="s">] </span><span class="si">$</span><span class="n">msg</span><span class="s">"</span><span class="p">))</span> <span class="o">>></span>
<span class="nc">Sync</span><span class="p">[</span><span class="nc">F</span><span class="p">].</span><span class="n">delay</span><span class="p">(</span><span class="nc">Thread</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">100</span><span class="p">))</span> <span class="o">>></span>
<span class="nc">Sync</span><span class="p">[</span><span class="nc">F</span><span class="p">].</span><span class="n">delay</span><span class="p">(</span><span class="n">println</span><span class="p">(</span><span class="s">s"Receiving [tid=</span><span class="si">$</span><span class="n">id</span><span class="s">] </span><span class="si">$</span><span class="n">msg</span><span class="s">"</span><span class="p">))</span> <span class="o">>></span>
<span class="nc">Sync</span><span class="p">[</span><span class="nc">F</span><span class="p">].</span><span class="n">pure</span><span class="p">(</span><span class="n">result</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>that looks for results in its <code>dict</code>ionary, and, conveniently, documents the
data requests on stdout so we can see what's happening and when. Other
than thinking up some amusing names, implementing the three data sources
is trivial.</p>
<div class="highlight"><pre><span></span><code> <span class="k">type</span> <span class="nc">PersonId</span> <span class="o">=</span> <span class="nc">Int</span>
<span class="k">type</span> <span class="nc">LastName</span> <span class="o">=</span> <span class="nc">String</span>
<span class="k">type</span> <span class="nc">FirstName</span> <span class="o">=</span> <span class="nc">String</span>
<span class="k">type</span> <span class="nc">BFF</span> <span class="o">=</span> <span class="nc">PersonId</span>
<span class="k">implicit</span> <span class="k">object</span> <span class="nc">BFFSource</span> <span class="k">extends</span> <span class="nc">DictSource</span><span class="p">[</span><span class="nc">PersonId</span><span class="p">,</span> <span class="nc">BFF</span><span class="p">]</span> <span class="p">{</span>
<span class="k">override</span> <span class="kd">val</span> <span class="n">name</span> <span class="o">=</span> <span class="s">"Best Friends Forever"</span>
<span class="k">override</span> <span class="kd">val</span> <span class="n">dict</span> <span class="o">=</span> <span class="nc">Map</span><span class="p">(</span><span class="mi">1</span> <span class="o">→</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">2</span> <span class="o">→</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">3</span> <span class="o">→</span> <span class="mi">1</span><span class="p">)</span>
<span class="p">}</span>
<span class="k">implicit</span> <span class="k">object</span> <span class="nc">FirstNameSource</span> <span class="k">extends</span> <span class="nc">DictSource</span><span class="p">[</span><span class="nc">PersonId</span><span class="p">,</span> <span class="nc">FirstName</span><span class="p">]</span> <span class="p">{</span>
<span class="k">override</span> <span class="kd">val</span> <span class="n">name</span> <span class="o">=</span> <span class="s">"First Name"</span>
<span class="k">override</span> <span class="kd">val</span> <span class="n">dict</span> <span class="o">=</span> <span class="nc">Map</span><span class="p">(</span><span class="mi">1</span> <span class="o">→</span> <span class="s">"Biff"</span><span class="p">,</span> <span class="mi">2</span> <span class="o">→</span> <span class="s">"Heinz"</span><span class="p">,</span> <span class="mi">3</span> <span class="o">→</span> <span class="s">"Tree"</span><span class="p">)</span>
<span class="p">}</span>
<span class="k">implicit</span> <span class="k">object</span> <span class="nc">LastNameSource</span> <span class="k">extends</span> <span class="nc">DictSource</span><span class="p">[</span><span class="nc">PersonId</span><span class="p">,</span> <span class="nc">LastName</span><span class="p">]</span> <span class="p">{</span>
<span class="k">override</span> <span class="kd">val</span> <span class="n">name</span> <span class="o">=</span> <span class="s">"Last Name"</span>
<span class="k">override</span> <span class="kd">val</span> <span class="n">dict</span> <span class="o">=</span> <span class="nc">Map</span><span class="p">(</span><span class="mi">1</span> <span class="o">→</span> <span class="s">"Loman"</span><span class="p">,</span> <span class="mi">2</span> <span class="o">→</span> <span class="s">"Doofenshmirtz"</span><span class="p">,</span> <span class="mi">3</span> <span class="o">→</span> <span class="s">"Trunks"</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div>
<p>At this point, something actually called "Fetch" enters the picture,
via the following methods that will produce
<code>Fetch</code> instances that use the defined data sources, given a particular
concurrency construct <code>F</code>.</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">getBFF</span><span class="p">[</span><span class="nc">F</span><span class="p">[</span><span class="n">_</span><span class="p">]:</span> <span class="nc">ConcurrentEffect</span> <span class="p">:</span> <span class="nc">Par</span><span class="p">](</span><span class="n">id</span><span class="p">:</span> <span class="nc">PersonId</span><span class="p">):</span> <span class="nc">Fetch</span><span class="p">[</span><span class="nc">F</span><span class="p">,</span><span class="nc">BFF</span><span class="p">]</span> <span class="o">=</span>
<span class="nc">Fetch</span><span class="p">(</span><span class="n">id</span><span class="p">,</span> <span class="nc">BFFSource</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">getFirstName</span><span class="p">[</span><span class="nc">F</span><span class="p">[</span><span class="n">_</span><span class="p">]:</span> <span class="nc">ConcurrentEffect</span> <span class="p">:</span> <span class="nc">Par</span><span class="p">](</span><span class="n">id</span><span class="p">:</span> <span class="nc">PersonId</span><span class="p">):</span> <span class="nc">Fetch</span><span class="p">[</span><span class="nc">F</span><span class="p">,</span> <span class="nc">FirstName</span><span class="p">]</span> <span class="o">=</span>
<span class="nc">Fetch</span><span class="p">(</span><span class="n">id</span><span class="p">,</span> <span class="nc">FirstNameSource</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">getLastName</span><span class="p">[</span><span class="nc">F</span><span class="p">[</span><span class="n">_</span><span class="p">]:</span> <span class="nc">ConcurrentEffect</span> <span class="p">:</span> <span class="nc">Par</span><span class="p">](</span><span class="n">id</span><span class="p">:</span> <span class="nc">PersonId</span><span class="p">):</span> <span class="nc">Fetch</span><span class="p">[</span><span class="nc">F</span><span class="p">,</span> <span class="nc">LastName</span><span class="p">]</span> <span class="o">=</span>
<span class="nc">Fetch</span><span class="p">(</span><span class="n">id</span><span class="p">,</span> <span class="nc">LastNameSource</span><span class="p">)</span>
</code></pre></div>
<p>Three concerns are nicely separated and can be implemented independently:</p>
<ol>
<li>The <code>DataSource</code>, which in real life would actually go to some database and
retrieve information, singly or in batches.</li>
<li>The <code>ConcurrentEffect</code>/<code>Par</code> construct for running the data retrieval routines.</li>
<li>The <code>Fetch</code> logic to gather up queries to be run in batches.</li>
</ol>
<p>This decoupling comes at the cost of quite a bit of type complexity, however.
In an earlier
version of the library, <code>Fetch</code> had one type parameter, specifying the type of thing
being fetched, and, since the properties of <code>Fetch</code> did not depend on the <code>F</code>,
we didn't need to carry around the implicit guarantees that <code>F</code> has <code>ConcurrentEffect</code>
and <code>Par</code> instances. (Presumably that ugliness will disappear with implicit function
types in Scala 3.)</p>
<p>Now we'll assemble a query to get the full name of the BFF of one person:</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">getBFFName</span><span class="p">[</span><span class="nc">F</span><span class="p">[</span><span class="n">_</span><span class="p">]:</span> <span class="nc">ConcurrentEffect</span> <span class="p">:</span> <span class="nc">Par</span><span class="p">](</span><span class="n">id</span><span class="p">:</span> <span class="nc">PersonId</span><span class="p">):</span> <span class="nc">Fetch</span><span class="p">[</span><span class="nc">F</span><span class="p">,</span> <span class="nc">String</span><span class="p">]</span> <span class="o">=</span> <span class="k">for</span> <span class="p">{</span>
<span class="n">bff</span> <span class="o">←</span> <span class="n">getBFF</span><span class="p">(</span><span class="n">id</span><span class="p">)</span>
<span class="n">fn</span> <span class="o">←</span> <span class="n">getFirstName</span><span class="p">(</span><span class="n">bff</span><span class="p">)</span>
<span class="n">ln</span> <span class="o">←</span> <span class="n">getLastName</span><span class="p">(</span><span class="n">bff</span><span class="p">)</span>
<span class="p">}</span> <span class="k">yield</span> <span class="s">s"</span><span class="si">$</span><span class="n">fn</span><span class="s"> </span><span class="si">$</span><span class="n">ln</span><span class="s">"</span>
</code></pre></div>
<p>and another that traverses this to get three BFFs,</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">getBFFs</span><span class="p">[</span><span class="nc">F</span><span class="p">[</span><span class="n">_</span><span class="p">]:</span> <span class="nc">ConcurrentEffect</span> <span class="p">:</span> <span class="nc">Par</span><span class="p">]:</span> <span class="nc">Fetch</span><span class="p">[</span><span class="nc">F</span><span class="p">,</span><span class="nc">List</span><span class="p">[</span><span class="nc">String</span><span class="p">]]</span> <span class="o">=</span> <span class="nc">List</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">).</span><span class="n">traverse</span><span class="p">(</span><span class="n">getBFFName</span><span class="p">[</span><span class="nc">F</span><span class="p">])</span>
</code></pre></div>
<p>run the query,</p>
<div class="highlight"><pre><span></span><code> <span class="n">println</span><span class="p">(</span><span class="nc">Fetch</span><span class="p">.</span><span class="n">run</span><span class="p">[</span><span class="nc">IO</span><span class="p">](</span><span class="n">getBFFs</span><span class="p">).</span><span class="n">unsafeRunSync</span><span class="p">())</span>
</code></pre></div>
<p>and see what kind of parallelism we get, as indicated by the <code>println</code>s in
the contrived <code>logReturn</code> method:</p>
<p>First try:</p>
<pre>
<mark style="color: blue">Requesting [tid=15] 3 x BFF</mark>
<mark style="color: red">Receiving [tid=15] 3 x BFF</mark>
<mark style="color: blue">Requesting [tid=13] 3 x Last Name</mark>
<mark style="color: red">Receiving [tid=13] 3 x Last Name</mark>
<mark style="color: blue">Requesting [tid=14] 3 x First Name</mark>
<mark style="color: red">Receiving [tid=14] 3 x First Name</mark>
List(Heinz Doofenshmirtz, Tree Trunks, Biff Loman)
</pre>
<p>This is about as expected - though it is a bit sad that none of the
BFFships are mutual. While we logically queried for 9 items - fully
interleaved as requests for BFF, LastName, FirstName, BFF, LastName, FirstName, etc. -
our queries ran in only 3 batches, with like queries grouped together.
Also, as expected, no advantage was taken of the fact that the two name
queries were theoretically independent.</p>
<p>If we manually parallelize the two independent name queries:</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">getBFF2</span><span class="p">[</span><span class="nc">F</span><span class="p">[</span><span class="n">_</span><span class="p">]:</span> <span class="nc">ConcurrentEffect</span> <span class="p">:</span> <span class="nc">Par</span><span class="p">](</span><span class="n">id</span><span class="p">:</span> <span class="nc">PersonId</span><span class="p">):</span> <span class="nc">Fetch</span><span class="p">[</span><span class="nc">F</span><span class="p">,</span> <span class="nc">String</span><span class="p">]</span> <span class="o">=</span> <span class="k">for</span> <span class="p">{</span>
<span class="n">bff</span> <span class="o">←</span> <span class="n">getBFF</span><span class="p">(</span><span class="n">id</span><span class="p">)</span>
<span class="p">(</span><span class="n">fn</span><span class="p">,</span><span class="n">ln</span><span class="p">)</span> <span class="o">←</span> <span class="p">(</span><span class="n">getFirstName</span><span class="p">(</span><span class="n">bff</span><span class="p">),</span> <span class="n">getLastName</span><span class="p">(</span><span class="n">bff</span><span class="p">)).</span><span class="n">tupled</span>
<span class="p">}</span> <span class="k">yield</span> <span class="s">s"</span><span class="si">$</span><span class="n">fn</span><span class="s"> </span><span class="si">$</span><span class="n">ln</span><span class="s">"</span>
</code></pre></div>
<p>we see a subtly different execution:</p>
<pre>
<mark style="color: blue">Requesting [tid=15] 3 x BFF</mark>
<mark style="color: red">Receiving [tid=15] 3 x BFF</mark>
<mark style="color: blue">Requesting [tid=13] 3 x Last Name</mark>
<mark style="color: blue">Requesting [tid=14] 3 x First Name</mark>
<mark style="color: red">Receiving [tid=13] 3 x Last Name</mark>
<mark style="color: red">Receiving [tid=14] 3 x First Name</mark>
List(Heinz Doofenshmirtz, Tree Trunks, Biff Loman)
</pre>
<p>Note how the grouping has changed. Instead of fully interleaved Requesting/Receiving
messages, both first and last name requests go out simultaneously. We get the
desired reduction: nine logical queries, executed in just two rounds.</p>
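<p>To see why tupling makes the difference, here is a toy model of the idea. It is emphatically
not how <code>Fetch</code> is implemented, and every name in it (<code>ToyFetch</code>, <code>Query</code>,
<code>Done</code>, <code>Blocked</code>) is made up for illustration. A query is either finished or blocked on
a set of keys; <code>flatMap</code> hides the next request inside a function, while <code>tupled</code> can see
both sides and merge their keys into one round.</p>
<div class="highlight"><pre><span></span><code>object ToyFetch {
  // Toy model only; the real Fetch and Haxl types are considerably richer.
  sealed trait Query[A] {
    def flatMap[B](f: A => Query[B]): Query[B] = this match {
      case Done(a)             => f(a)
      // Whatever f will request stays hidden until this round has completed.
      case Blocked(keys, cont) => Blocked(keys, env => cont(env).flatMap(f))
    }
    def map[B](f: A => B): Query[B] = flatMap(a => Done(f(a)))
  }
  final case class Done[A](value: A) extends Query[A]
  final case class Blocked[A](keys: Set[String], cont: Map[String, String] => Query[A]) extends Query[A]

  def get(key: String): Query[String] = Blocked(Set(key), env => Done(env(key)))

  // Applicative combination: both sides are visible, so their pending keys can be merged.
  def tupled[A, B](qa: Query[A], qb: Query[B]): Query[(A, B)] = (qa, qb) match {
    case (Done(a), Done(b))                 => Done((a, b))
    case (Blocked(k, c), Done(b))           => Blocked(k, env => tupled(c(env), Done(b)))
    case (Done(a), Blocked(k, c))           => Blocked(k, env => tupled(Done(a), c(env)))
    case (Blocked(k1, c1), Blocked(k2, c2)) => Blocked(k1 ++ k2, env => tupled(c1(env), c2(env)))
  }

  // Answers one round of blocked keys at a time; returns the result and the number of rounds.
  def run[A](q: Query[A], db: Map[String, String], rounds: Int = 0): (A, Int) = q match {
    case Done(a)             => (a, rounds)
    case Blocked(keys, cont) => run(cont(db.filter { case (k, _) => keys(k) }), db, rounds + 1)
  }

  val db = Map("bff:1" -> "7", "first:7" -> "Tree", "last:7" -> "Trunks")

  // Nested flatMaps: three rounds.
  val sequential: Query[String] =
    get("bff:1").flatMap(b => get(s"first:$b").flatMap(fn => get(s"last:$b").map(ln => s"$fn $ln")))

  // The two name lookups tupled together: two rounds.
  val grouped: Query[String] =
    get("bff:1").flatMap(b => tupled(get(s"first:$b"), get(s"last:$b")).map { case (fn, ln) => s"$fn $ln" })
}
</code></pre></div>
<p>Running <code>ToyFetch.run(ToyFetch.sequential, ToyFetch.db)</code> takes three rounds, while the
<code>grouped</code> version gets the same answer in two.</p>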
<h2>But I don't want to do anything manually</h2>
<p><img src="./images/lazycat.jpg" width=400></p>
<p>Why should important people like us be forced to transform our code manually?
Less frivolously, since the whole purpose of libraries like <code>Haxl</code> and <code>Fetch</code> is
to detect operations that could occur concurrently and batch them automatically,
it's annoying to have to keep track of the limits to their detection abilities.</p>
<p><img src="./images/apply-now.jpg" width=400></p>
<p>It would be cool if Scala's <code>for</code> could do something similar to Haskell's <code>do</code>,
detecting when the monads are in fact parallelizable and independent, and
tupling them together.</p>
<p>One reason this is difficult to accomplish in Scala is that at the time that
<code>for</code> desugaring occurs, the compiler has not yet performed type resolution,
so it has no way of knowing whether the rhs of the left-arrow has applicative
powers: <code>for</code> comprehensions simply get textually converted into nested <code>flatMap</code> and <code>map</code>
calls, e.g.</p>
<div class="highlight"><pre><span></span><code> <span class="k">for</span><span class="p">(</span><span class="n">i</span> <span class="o"><-</span> <span class="nc">Some</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span> <span class="n">j</span> <span class="o"><-</span> <span class="nc">Some</span><span class="p">(</span><span class="mi">2</span><span class="p">);</span> <span class="n">k</span> <span class="o"><-</span> <span class="nc">Some</span><span class="p">(</span><span class="mi">3</span><span class="p">))</span> <span class="k">yield</span> <span class="n">i</span><span class="o">+</span><span class="n">j</span><span class="o">+</span><span class="n">k</span>
</code></pre></div>
<p>becomes</p>
<div class="highlight"><pre><span></span><code> <span class="nc">Some</span><span class="p">(</span><span class="mi">1</span><span class="p">).</span><span class="n">flatMap</span><span class="p">(((</span><span class="n">i</span><span class="p">)</span> <span class="o">=></span> <span class="nc">Some</span><span class="p">(</span><span class="mi">2</span><span class="p">).</span><span class="n">flatMap</span><span class="p">(((</span><span class="n">j</span><span class="p">)</span> <span class="o">=></span> <span class="nc">Some</span><span class="p">(</span><span class="mi">3</span><span class="p">).</span><span class="n">map</span><span class="p">(((</span><span class="n">k</span><span class="p">)</span> <span class="o">=></span> <span class="n">i</span><span class="p">.</span><span class="n">$plus</span><span class="p">(</span><span class="n">j</span><span class="p">).</span><span class="n">$plus</span><span class="p">(</span><span class="n">k</span><span class="p">)))))))</span>
</code></pre></div>
<p>irrespective of the abilities of the <code>Option</code> class. In fact, we could stick in
arbitrary monad poseurs, and the parser couldn't care less.</p>
<div class="highlight"><pre><span></span><code> <span class="k">for</span><span class="p">(</span><span class="n">i</span> <span class="o"><-</span> <span class="nc">Harry</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span> <span class="n">j</span> <span class="o"><-</span> <span class="nc">Dick</span><span class="p">(</span><span class="mi">2</span><span class="p">);</span> <span class="n">k</span> <span class="o"><-</span> <span class="nc">Jane</span><span class="p">(</span><span class="mi">3</span><span class="p">))</span> <span class="k">yield</span> <span class="n">i</span><span class="o">+</span><span class="n">j</span><span class="o">+</span><span class="n">k</span>
<span class="nc">Harry</span><span class="p">(</span><span class="mi">1</span><span class="p">).</span><span class="n">flatMap</span><span class="p">(((</span><span class="n">i</span><span class="p">)</span> <span class="o">=></span> <span class="nc">Dick</span><span class="p">(</span><span class="mi">2</span><span class="p">).</span><span class="n">flatMap</span><span class="p">(((</span><span class="n">j</span><span class="p">)</span> <span class="o">=></span> <span class="nc">Jane</span><span class="p">(</span><span class="mi">3</span><span class="p">).</span><span class="n">map</span><span class="p">(((</span><span class="n">k</span><span class="p">)</span> <span class="o">=></span> <span class="n">i</span><span class="p">.</span><span class="n">$plus</span><span class="p">(</span><span class="n">j</span><span class="p">).</span><span class="n">$plus</span><span class="p">(</span><span class="n">k</span><span class="p">)))))))</span>
</code></pre></div>
<p>Obviously the compiler will eventually complain, just not yet.</p>
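<p>In fact, all the typer eventually insists on is that methods called <code>flatMap</code> and <code>map</code>
exist with workable shapes. As a minimal sketch, here is a made-up <code>Poseur</code> class (not from
any library) with no typeclass and no laws behind it, which <code>for</code> accepts without a murmur:</p>
<div class="highlight"><pre><span></span><code>object PoseurDemo {
  // Hypothetical class: it has map and flatMap methods of a workable shape,
  // but no typeclass instance and no monad laws behind it.
  final case class Poseur(n: Int) {
    def flatMap(f: Int => Poseur): Poseur = f(n)
    def map(f: Int => Int): Poseur = Poseur(f(n))
  }

  val sugared: Poseur =
    for (i <- Poseur(1); j <- Poseur(2); k <- Poseur(3)) yield i + j + k

  // What the parser turns the expression above into, written out by hand.
  val desugared: Poseur =
    Poseur(1).flatMap(i => Poseur(2).flatMap(j => Poseur(3).map(k => i + j + k)))
}
</code></pre></div>
<p>Both values come out as <code>Poseur(6)</code>.</p>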
<p>If we want to get messy with compiler internals and fetch <code>(j,k)</code> at once, we're going
to have to do so <em>after</em> the typer has run, which means we'll be working with
code that has already been converted into nested maps.</p>
<p>Writing a macro to do this will be awkward,
but not impossible, especially if we set our sights relatively low.
For example, I'll assume the simplest possible <code>for</code> constructs - no guards, no <code>=</code>
assignments. Additionally, the grouping will only work for independent terms that happen
to be adjacent; there will be no re-ordering of the code.</p>
<p>Also, I'm going to keep testing to an absolute minimum, guaranteeing that the macro
will work under practically no other circumstances, as is fitting, given that
the whole purpose of this work is self-abuse.</p>
<h2>Under the Macroscope</h2>
<p>Scala macros are fun and easy, in the sense that it's incredibly easy to make
obscure mistakes and fun to laugh at yourself afterwards for making them.
Also, they're difficult to debug, so when you do manage to fix a problem,
you feel very proud of yourself.</p>
<p>The goal is to be able to write</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">getBFF</span><span class="p">[</span><span class="nc">F</span><span class="p">[</span><span class="n">_</span><span class="p">]:</span> <span class="nc">ConcurrentEffect</span> <span class="p">:</span> <span class="nc">Par</span><span class="p">](</span><span class="n">id</span><span class="p">:</span> <span class="nc">PersonId</span><span class="p">):</span> <span class="nc">Fetch</span><span class="p">[</span><span class="nc">F</span><span class="p">,</span> <span class="nc">String</span><span class="p">]</span> <span class="o">=</span>
<span class="nc">LiftFetchTuples</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">{</span>
<span class="n">bff</span> <span class="o">←</span> <span class="n">getBFF</span><span class="p">(</span><span class="n">id</span><span class="p">)</span>
<span class="n">fn</span> <span class="o">←</span> <span class="n">getFirstName</span><span class="p">(</span><span class="n">bff</span><span class="p">)</span>
<span class="n">ln</span> <span class="o">←</span> <span class="n">getLastName</span><span class="p">(</span><span class="n">bff</span><span class="p">)</span>
<span class="p">}</span> <span class="k">yield</span> <span class="s">s"</span><span class="si">$</span><span class="n">fn</span><span class="s"> </span><span class="si">$</span><span class="n">ln</span><span class="s">"</span>
<span class="p">}</span>
</code></pre></div>
<p>where <code>LiftFetchTuples.apply</code> is actually a macro that teases apart the map-nest
passed to it as an argument, and converts it into the version with
<code>(fn,ln) ← (getFirstName(bff), getLastName(bff)).tupled</code> in it.</p>
<p>I'm going to omit the most boring details (only the <em>most</em> boring details),
so if you want those, take a look at the
<a href="https://github.com/pnf/scala-playground2/blob/master/applicator/src/main/scala/applicator/LiftTuples.scala">actual code</a>
on github.</p>
<p>We start with the standard macro boilerplate,</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">apply</span><span class="p">[</span><span class="nc">M</span><span class="p">[</span><span class="n">_</span><span class="p">],</span> <span class="nc">T</span><span class="p">](</span><span class="n">expr</span><span class="p">:</span> <span class="nc">Fetch</span><span class="p">[</span><span class="nc">M</span><span class="p">,</span><span class="nc">T</span><span class="p">])</span>
<span class="p">(</span><span class="k">implicit</span> <span class="n">semigroupal</span><span class="p">:</span> <span class="nc">Semigroupal</span><span class="p">[</span><span class="nc">Fetch</span><span class="p">[</span><span class="nc">M</span><span class="p">,</span> <span class="o">?</span><span class="p">]],</span>
<span class="n">invariant</span><span class="p">:</span> <span class="nc">Invariant</span><span class="p">[</span><span class="nc">Fetch</span><span class="p">[</span><span class="nc">M</span><span class="p">,</span><span class="o">?</span><span class="p">]]):</span> <span class="nc">Fetch</span><span class="p">[</span><span class="nc">M</span><span class="p">,</span><span class="nc">T</span><span class="p">]</span> <span class="o">=</span>
<span class="n">macro</span> <span class="n">liftTuplesImpl</span><span class="p">[</span><span class="nc">M</span><span class="p">,</span> <span class="nc">T</span><span class="p">]</span>
</code></pre></div>
<p>made more complicated than usual by the presence of implicit parameters. We want
a guarantee before the macro is even invoked that
the <code>Fetch</code> being passed to us is amenable to fancy tupling, which
essentially means that there's a <code>Semigroupal</code> typeclass for it. You might wonder why
we need guarantees beyond knowing that the expression is in fact a <code>Fetch</code>: It turns
out that <code>Fetch</code> isn't <em>always</em> semigroupal, just as it doesn't always have
a <code>ConcurrentEffect</code> - the implicit conversions that make it so
depend on implicits of the specific type parameters.</p>
<p>As is the custom in these parts, heavy lifting occurs in a method ending in <code>Impl</code>.
It's not required to have a silly ending that rhymes with "pimple," but if you give your
macro a nice name, then other people will be jealous.</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">liftTuplesImpl</span><span class="p">[</span><span class="nc">M</span><span class="p">[</span><span class="n">_</span><span class="p">],</span> <span class="nc">T</span><span class="p">](</span><span class="n">c</span><span class="p">:</span> <span class="nc">Context</span><span class="p">)(</span><span class="n">expr</span><span class="p">:</span> <span class="n">c</span><span class="p">.</span><span class="nc">Tree</span><span class="p">)(</span> <span class="n">semigroupal</span><span class="p">:</span> <span class="n">c</span><span class="p">.</span><span class="nc">Tree</span><span class="p">,</span> <span class="n">invariant</span><span class="p">:</span> <span class="n">c</span><span class="p">.</span><span class="nc">Tree</span><span class="p">)</span> <span class="o">=</span> <span class="o">???</span>
</code></pre></div>
<p>Notice that the keyword <code>implicit</code> no longer shows up. The evidence expressions are passed
to us as Abstract Syntax <code>Tree</code>s, just like any other argument.</p>
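<p>To make the implicit-becomes-tree business concrete, here is a much smaller macro of the same
shape. It is only a sketch: <code>ShowEvidence</code> is a hypothetical name, it assumes Scala 2 with
<code>scala-reflect</code> on the classpath, and like any macro it has to be compiled before the code
that calls it. All it does is return the printed form of whatever evidence the compiler resolved.</p>
<div class="highlight"><pre><span></span><code>import scala.language.experimental.macros
import scala.reflect.macros.blackbox.Context

object ShowEvidence {
  // Hypothetical macro: the implicit Ordering is resolved at the call site as usual...
  def apply[T](expr: T)(implicit ord: Ordering[T]): String = macro impl[T]

  // ...and arrives here as an ordinary tree, in a plain (non-implicit) parameter list.
  def impl[T](c: Context)(expr: c.Tree)(ord: c.Tree): c.Tree = {
    import c.universe._
    Literal(Constant(show(ord)))
  }
}
</code></pre></div>
<p>From a separately compiled file, <code>ShowEvidence(42)</code> should then evaluate to a string naming
the <code>Ordering[Int]</code> instance that implicit search picked.</p>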
<h2>Overlapping recursion</h2>
<p><img src="./images/pile-of-cats.jpg" width=400></p>
<p>The expression tree we're passed represents nested application of map and flatMap.
Aggressively cleaned up, the abstract syntax tree will look like</p>
<div class="highlight"><pre><span></span><code> <span class="nc">Apply</span><span class="p">(</span><span class="nc">TypeApply</span><span class="p">(</span><span class="nc">Select</span><span class="p">(</span><span class="n">vx</span><span class="p">,</span> <span class="n">flatMapName</span><span class="p">),</span> <span class="n">typeParams</span><span class="p">),</span>
<span class="nc">Function</span><span class="p">(</span><span class="n">args</span><span class="p">,</span> <span class="nc">PossiblyNestedMap</span><span class="p">))</span>
</code></pre></div>
<p>where</p>
<ul>
<li><code>Apply(fun, args)</code> represents <code>fun(args)</code></li>
<li><code>TypeApply(fun, tparams)</code> represents <code>fun[tparams]</code></li>
<li><code>Function(arg :: Nil, body)</code> represents <code>arg => body</code></li>
<li><code>PossiblyNestedMap</code> represents the entire expression, recursively.</li>
</ul>
<p>The first 3 are predefined members of <code>Trees</code>. We'll have to write an extractor
for the last.</p>
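<p>If you'd like to poke at these node types outside of a macro, the runtime reflection universe
exposes the same case classes. The following is a small sketch, assuming Scala 2 with
<code>scala-reflect</code> available; <code>TreeShapes</code>, <code>xs</code> and <code>ys</code> are placeholder names,
and the tree is untyped, so it is only roughly what a macro would be handed.</p>
<div class="highlight"><pre><span></span><code>import scala.reflect.runtime.universe._

object TreeShapes {
  // Untyped tree, as produced by the parser; xs and ys are never resolved.
  val tree: Tree = q"xs.flatMap[Int](x => ys)"

  val described: String = tree match {
    // fun[targs](param => body), where fun is qual.name
    case Apply(TypeApply(Select(qual, name), targs), Function(param :: Nil, body) :: Nil) =>
      s"$qual . $name [${targs.size} type arg] (${param.name} => $body)"
    case _ =>
      "some other shape"
  }
}
</code></pre></div>
<p>The first case is the one that matches here, binding <code>qual</code> to the <code>xs</code> identifier and
<code>name</code> to <code>flatMap</code>.</p>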
<p>The <a href="https://docs.scala-lang.org/overviews/quasiquotes/syntax-summary.html">quasiquote</a>
string context allows one to express the AST more like the code itself, but it has some
limitations, and I couldn't figure out how to use it for recursive extraction. I
was able to use it for tree construction in places, as you'll see below.</p>
<p>Here's a full-blown extractor that looks for such patterns, which I'll annotate
line by line. The <code>unapply</code> method
matches application of <code>vx.flatMap[T]</code></p>
<div class="highlight"><pre><span></span><code> <span class="k">object</span> <span class="nc">PossiblyNestedMap</span> <span class="p">{</span>
<span class="k">def</span> <span class="nf">unapply</span><span class="p">(</span><span class="n">tree</span><span class="p">:</span> <span class="nc">Tree</span><span class="p">):</span> <span class="nc">Option</span><span class="p">[</span><span class="nc">Tree</span><span class="p">]</span> <span class="o">=</span> <span class="n">tree</span> <span class="k">match</span> <span class="p">{</span>
<span class="k">case</span> <span class="nc">Apply</span><span class="p">(</span><span class="nc">TypeApply</span><span class="p">(</span><span class="nc">Select</span><span class="p">(</span><span class="n">vx</span><span class="p">,</span> <span class="n">flatMapName</span><span class="p">),</span> <span class="n">outerMethTypeParam</span><span class="p">),</span>
</code></pre></div>
<p>to a closure</p>
<div class="highlight"><pre><span></span><code> <span class="nc">Function</span><span class="p">(</span><span class="n">vArgDef</span> <span class="o">::</span> <span class="nc">Nil</span><span class="p">,</span>
</code></pre></div>
<p>whose body is another <code>PossiblyNestedMap</code> - that is, <code>unapply</code> will be called
recursively on the body,</p>
<div class="highlight"><pre><span></span><code> <span class="nc">PossiblyNestedMap</span><span class="p">(</span>
</code></pre></div>
<p>and we pick apart whatever got returned from that recursive call
into the inner <code>wx.mapOrFlatMap[T]</code>,</p>
<div class="highlight"><pre><span></span><code> <span class="nc">Apply</span><span class="p">(</span><span class="nc">TypeApply</span><span class="p">(</span><span class="nc">Select</span><span class="p">(</span><span class="n">wx</span><span class="p">,</span> <span class="n">innerMeth</span><span class="p">),</span> <span class="n">innerMethTypeParam</span><span class="p">),</span>
</code></pre></div>
<p>applied to an inner closure whose <code>expr</code> body we for now don't care about</p>
<div class="highlight"><pre><span></span><code> <span class="nc">Function</span><span class="p">(</span><span class="n">wArgDef</span> <span class="o">::</span> <span class="nc">Nil</span><span class="p">,</span> <span class="n">expr</span><span class="p">)</span> <span class="o">::</span> <span class="nc">Nil</span><span class="p">)))</span> <span class="o">::</span> <span class="nc">Nil</span><span class="p">)</span>
</code></pre></div>
<p>but may itself have once been a <code>PossiblyNestedMap</code> - meaning our recursive call to <code>unapply</code>
would have called itself again. Note that the <code>innerMeth</code> may be a <code>map</code> or a <code>flatMap</code>
depending on where we are in the <code>for</code> expression.</p>
<p>Our basic plan is to re-express this as one <code>(vx,wx).tupled.flatMap( ... )</code>
and return the applicative expression from the extractor. Assuming we've
written the extractor, the main job of our macro will be to transform the tree recursively,
replacing every instance
of an appropriate <code>PossiblyNestedMap</code> with its applicative equivalent.</p>
<div class="highlight"><pre><span></span><code> <span class="k">class</span> <span class="nc">TupleLiftingTransformer</span> <span class="k">extends</span> <span class="nc">Transformer</span> <span class="p">{</span>
<span class="k">override</span> <span class="k">def</span> <span class="nf">transform</span><span class="p">(</span><span class="n">tree</span><span class="p">:</span> <span class="nc">Tree</span><span class="p">):</span> <span class="nc">Tree</span> <span class="o">=</span> <span class="n">tree</span> <span class="k">match</span> <span class="p">{</span>
<span class="k">case</span> <span class="nc">PossiblyNestedMap</span><span class="p">(</span><span class="n">xformed</span><span class="p">)</span> <span class="o">⇒</span> <span class="n">xformed</span>
<span class="k">case</span> <span class="n">_</span> <span class="o">⇒</span> <span class="bp">super</span><span class="p">.</span><span class="n">transform</span><span class="p">(</span><span class="n">tree</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">(</span><span class="k">new</span> <span class="nc">TupleLiftingTransformer</span><span class="p">).</span><span class="n">transform</span><span class="p">(</span><span class="n">expr</span><span class="p">)</span>
</code></pre></div>
<p>In our extractor, by the way, there'll be a second, more boring case, for a non-nested
map application. We still need to transform it recursively in case there's something
interesting lurking in sub-expressions, but that's all we have to do. Finally, there's
the most common case, where no map of any sort is found.</p>
<div class="highlight"><pre><span></span><code> <span class="k">case</span> <span class="nc">Apply</span><span class="p">(</span><span class="nc">TypeApply</span><span class="p">(</span><span class="nc">Select</span><span class="p">(</span><span class="n">wm</span><span class="p">,</span> <span class="n">comb</span><span class="p">),</span> <span class="n">_</span><span class="p">),</span> <span class="nc">Function</span><span class="p">(</span><span class="n">_</span> <span class="o">::</span> <span class="nc">Nil</span><span class="p">,</span> <span class="n">_</span><span class="p">)</span> <span class="o">::</span> <span class="nc">Nil</span><span class="p">)</span> <span class="o">⇒</span>
<span class="nc">Some</span><span class="p">(</span><span class="bp">super</span><span class="p">.</span><span class="n">transform</span><span class="p">(</span><span class="n">tree</span><span class="p">))</span>
<span class="k">case</span> <span class="n">_</span> <span class="o">=></span> <span class="nc">None</span>
</code></pre></div>
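<p>The same extractor-plus-<code>Transformer</code> dance can be tried at small scale on the runtime
universe. This sketch has nothing to do with <code>Fetch</code>: the hypothetical <code>Plus</code> extractor
stands in for <code>PossiblyNestedMap</code>, rewriting <code>a + b</code> into <code>b + a</code> wherever it
turns up, and the transformer falls back to <code>super.transform</code> for everything else. It assumes
Scala 2 with <code>scala-reflect</code> available.</p>
<div class="highlight"><pre><span></span><code>import scala.reflect.runtime.universe._

object SwapPlusDemo {
  // Extractor that returns the already-rewritten tree, recursing into both operands.
  object Plus {
    def unapply(tree: Tree): Option[Tree] = tree match {
      case Apply(Select(lhs, op), rhs :: Nil) if op.decodedName.toString == "+" =>
        Some(Apply(Select(swapper.transform(rhs), op), swapper.transform(lhs) :: Nil))
      case _ =>
        None
    }
  }

  object swapper extends Transformer {
    override def transform(tree: Tree): Tree = tree match {
      case Plus(swapped) => swapped
      case _             => super.transform(tree) // keep looking in sub-expressions
    }
  }

  // The addition is buried inside a call, so the fallback case has to find it.
  val result: Tree = swapper.transform(q"f(a + b)")
}
</code></pre></div>
<p>The result has the same structure as the tree for <code>f(b + a)</code>.</p>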
<h2>Cats have claws</h2>
<p><img src="./images/cat-claw.jpg"></p>
<p>You might expect that <code>vx</code> and <code>wx</code> would represent expressions of type <code>Fetch</code>,
but you would be wrong. You see, <code>Fetch</code> doesn't actually have a <code>flatMap</code>
method, but can be converted implicitly into a <code>FlatMap.Ops</code>, which does.
If we examine <code>vx</code> in the debugger, we might (well, I did) see something like this</p>
<div class="highlight"><pre><span></span><code> <span class="n">cats</span><span class="p">.</span><span class="n">syntax</span><span class="p">.</span><span class="n">`package`</span><span class="p">.</span><span class="n">all</span><span class="p">.</span><span class="n">toFlatMapOps</span><span class="p">[[</span><span class="nc">A</span><span class="p">]</span><span class="nc">Fetch</span><span class="p">[</span><span class="nc">F</span><span class="p">,</span><span class="nc">A</span><span class="p">],</span> <span class="nc">Author</span><span class="p">]</span>
<span class="p">(...</span> <span class="n">our</span> <span class="n">getWhatever</span> <span class="n">call</span> <span class="p">...)</span>
<span class="p">(</span><span class="n">fetch</span><span class="p">.</span><span class="n">`package`</span><span class="p">.</span><span class="n">fetchM</span><span class="p">[</span><span class="nc">F</span><span class="p">](</span><span class="n">evidence$47</span><span class="p">))</span>
</code></pre></div>
<p>The implicit <code>toFlatMapOps</code> method requires evidence that its argument is in the <code>FlatMap</code>
typeclass. <code>FlatMap</code> is implemented by <code>Monad</code>, an instance of which the implicit
<code>fetchM</code> method can provide, as long as it has evidence that the <code>F</code> type parameter
is itself a <code>Monad</code>, which was in turn provided by the implicit parameter of
<code>LiftFetchTuples.apply</code>, to which scala assigned the symbol <code>evidence$47</code>.
So there's a hell of a lot of implicit evidence being called forth so that <code>Fetch</code>
can avoid implementing its own <code>flatMap</code>.</p>
<p>Our immediate problem is to retrieve the actual <code>Fetch</code> from the <code>FlatMap.Ops</code>
it was converted to.
Fortunately, these <code>Ops</code> classes all have a <code>.self</code> method that returns the
original object. So we'll assign</p>
<div class="highlight"><pre><span></span><code> <span class="kd">val</span> <span class="n">vm</span> <span class="o">=</span> <span class="n">q</span><span class="s">"$vx.self"</span>
<span class="kd">val</span> <span class="n">wm</span> <span class="o">=</span> <span class="n">q</span><span class="s">"$wx.self"</span>
</code></pre></div>
<p>here using the convenient quasiquote string context.</p>
<p>Having extracted our <code>get</code> calls, we now need to make sure that they're independent;
i.e. the argument to <code>vm</code>'s closure should not be used anywhere within <code>wm</code>. This
is as easy as searching for the <code>v</code> symbol in the <code>w</code> closure. If we find the
symbol, we give up, printing out an informative message and carrying on with the
remaining transformation of the tree:</p>
<div class="highlight"><pre><span></span><code> <span class="kd">val</span> <span class="n">vUsed</span> <span class="o">=</span> <span class="n">wm</span><span class="p">.</span><span class="n">find</span><span class="p">(</span><span class="n">vArgDef</span><span class="p">.</span><span class="n">symbol</span> <span class="o">==</span> <span class="n">_</span><span class="p">.</span><span class="n">symbol</span><span class="p">)</span>
<span class="k">if</span><span class="p">(</span><span class="n">vUsed</span><span class="p">.</span><span class="n">isDefined</span><span class="p">)</span> <span class="p">{</span>
<span class="n">c</span><span class="p">.</span><span class="n">info</span><span class="p">(</span><span class="n">vUsed</span><span class="p">.</span><span class="n">get</span><span class="p">.</span><span class="n">pos</span><span class="p">,</span> <span class="s">s"Not lifting, because </span><span class="si">${</span><span class="n">vValDef</span><span class="p">.</span><span class="n">symbol</span><span class="si">}</span><span class="s"> is used on rhs"</span><span class="p">,</span> <span class="kc">true</span><span class="p">)</span>
<span class="nc">Some</span><span class="p">(</span><span class="bp">super</span><span class="p">.</span><span class="n">transform</span><span class="p">(</span><span class="n">tree</span><span class="p">))</span>
<span class="p">}</span>
</code></pre></div>
<p>If we pass this test, it should be fine to create a new tupled <code>Fetch</code>. Since
<code>.tupled</code> is just going to trigger another implicit search, it's cleaner to
call the <code>tuple2</code> function it will end up calling, explicitly passing in its implicit
evidence, which we have from the arguments of our macro:</p>
<div class="highlight"><pre><span></span><code> <span class="kd">val</span> <span class="n">newQual</span> <span class="o">=</span> <span class="n">q</span><span class="s">"_root_.cats.Semigroupal.tuple2($vm, $wm)($semigroupal, $invariant)"</span>
</code></pre></div>
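<p>To convince yourself that the explicit call and the syntax are the same thing, you can try both
on something tamer than <code>Fetch</code>. This sketch uses <code>Option</code> and assumes a cats version
where <code>import cats.implicits._</code> brings in both the instances and the tuple syntax;
<code>Tuple2Demo</code> is just a name for the example.</p>
<div class="highlight"><pre><span></span><code>import cats.{Invariant, Semigroupal}
import cats.implicits._

object Tuple2Demo {
  // The syntax route: an implicit search finds the Semigroupal and Invariant instances.
  val viaSyntax: Option[(Int, String)] = (Option(1), Option("a")).tupled

  // The explicit route: same function, with the evidence passed by hand.
  val viaFunction: Option[(Int, String)] =
    Semigroupal.tuple2(Option(1), Option("a"))(Semigroupal[Option], Invariant[Option])
}
</code></pre></div>
<p>Both come out as <code>Some((1, "a"))</code>. In the macro, the two evidence arguments are simply
the trees we were handed.</p>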
<p>When we <code>flatMap</code> over this new monad, its closure will take a <code>Tuple2[V,W]</code>,
where <code>V</code> and <code>W</code> were the types taken by the two original closures. Here, we extract
the types of the original closure arguments, create the new tuple type, and synthesize
a new argument with a guaranteed unique name:</p>
<div class="highlight"><pre><span></span><code> <span class="kd">val</span> <span class="n">vt</span> <span class="o">=</span> <span class="n">oldClosure</span><span class="p">.</span><span class="n">vparams</span><span class="p">.</span><span class="n">head</span><span class="p">.</span><span class="n">tpt</span><span class="p">.</span><span class="n">tpe</span>
<span class="kd">val</span> <span class="n">wt</span> <span class="o">=</span> <span class="n">oldInnerClosure</span><span class="p">.</span><span class="n">vparams</span><span class="p">.</span><span class="n">head</span><span class="p">.</span><span class="n">tpt</span><span class="p">.</span><span class="n">tpe</span>
<span class="kd">val</span> <span class="n">tupleOfVWtt</span><span class="p">:</span> <span class="nc">Tree</span> <span class="o">=</span> <span class="n">tq</span><span class="s">"($vt,$wt)"</span>
<span class="kd">val</span> <span class="n">vwArgName</span> <span class="o">=</span> <span class="n">internal</span><span class="p">.</span><span class="n">reificationSupport</span><span class="p">.</span><span class="n">freshTermName</span><span class="p">(</span><span class="s">"x$"</span><span class="p">)</span>
<span class="kd">val</span> <span class="n">vwArgDef</span> <span class="o">=</span> <span class="nc">ValDef</span><span class="p">(</span><span class="nc">Modifiers</span><span class="p">(</span><span class="nc">Flag</span><span class="p">.</span><span class="nc">SYNTHETIC</span> <span class="o">|</span> <span class="nc">Flag</span><span class="p">.</span><span class="nc">PARAM</span><span class="p">),</span> <span class="n">vwArgName</span><span class="p">,</span> <span class="n">tupleOfVWtt</span><span class="p">,</span> <span class="nc">EmptyTree</span><span class="p">)</span>
</code></pre></div>
<p>The body of the new closure still refers to the old <code>v</code> and <code>w</code>, which we must
populate with values from the tuple.</p>
<div class="highlight"><pre><span></span><code> <span class="kd">val</span> <span class="n">newBody</span> <span class="o">=</span> <span class="nc">Block</span><span class="p">(</span>
<span class="n">c</span><span class="p">.</span><span class="n">internal</span><span class="p">.</span><span class="n">valDef</span><span class="p">(</span><span class="n">vValDef</span><span class="p">.</span><span class="n">symbol</span><span class="p">,</span> <span class="n">q</span><span class="s">"$vwArgName._1"</span><span class="p">),</span>
<span class="n">c</span><span class="p">.</span><span class="n">internal</span><span class="p">.</span><span class="n">valDef</span><span class="p">(</span><span class="n">wValDef</span><span class="p">.</span><span class="n">symbol</span><span class="p">,</span> <span class="n">q</span><span class="s">"$vwArgName._2"</span><span class="p">),</span>
<span class="n">transform</span><span class="p">(</span><span class="n">body</span><span class="p">))</span>
<span class="kd">val</span> <span class="n">newClosure</span> <span class="o">=</span> <span class="nc">Function</span><span class="p">(</span><span class="n">vwArgDef</span> <span class="o">::</span> <span class="nc">Nil</span><span class="p">,</span> <span class="n">newBody</span><span class="p">)</span>
</code></pre></div>
<p>Altogether, the new closure we build is going to look a bit like</p>
<div class="highlight"><pre><span></span><code> <span class="n">x$123</span><span class="p">:</span> <span class="p">(</span><span class="nc">V</span><span class="p">,</span> <span class="nc">W</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="kd">val</span> <span class="n">v</span> <span class="o">=</span> <span class="n">x$123</span><span class="p">.</span><span class="n">_1</span>
<span class="kd">val</span> <span class="n">w</span> <span class="o">=</span> <span class="n">x$123</span><span class="p">.</span><span class="n">_2</span>
<span class="n">originalBodyOfYieldExpression</span><span class="p">(</span><span class="n">v</span><span class="p">,</span> <span class="n">w</span><span class="p">)</span> <span class="c1">// more or less</span>
<span class="p">}</span>
</code></pre></div>
<p>Finally, we put together the full map/flatMap over the new tupled <code>Fetch</code> with the new
closure. (This is one of the places where quasiquotes annoyingly didn't work.)</p>
<div class="highlight"><pre><span></span><code> <span class="kd">val</span> <span class="n">ret</span> <span class="o">=</span> <span class="nc">Apply</span><span class="p">(</span><span class="nc">TypeApply</span><span class="p">(</span><span class="nc">Select</span><span class="p">(</span><span class="n">newQual</span><span class="p">,</span> <span class="n">innerMeth</span><span class="p">),</span> <span class="n">innerMethTypeParam</span><span class="p">),</span> <span class="n">newClosure</span> <span class="o">::</span> <span class="nc">Nil</span><span class="p">)</span>
</code></pre></div>
<h2>OK, not finally</h2>
<p>We're not really done. First we have to typecheck our new expression,</p>
<div class="highlight"><pre><span></span><code> <span class="kd">val</span> <span class="n">rett</span> <span class="o">=</span> <span class="n">c</span><span class="p">.</span><span class="n">typecheck</span><span class="p">(</span><span class="n">ret</span><span class="p">)</span>
</code></pre></div>
<p>and even after we do that, there's still a bit of a mess to clean up. One big pitfall
of scala macros is that, in addition to the implicit "ownership" of AST elements that are
components of other AST elements, every symbol we use has to be explicitly owned by another
symbol. For example, all <code>val</code>s declared directly in a function body have symbols
whose <code>.owner</code> is the symbol of the containing <code>Function</code>. The tree of symbol ownership
must exactly parallel the tree of AST elements. Even the slightest mistake will lead
to obscure errors during the delambdafy phase, much later in the compilation.</p>
<p>Symbol ownership gets assigned
automatically for us during initial typing, but when we muck up the tree as I've just
done, correct ownership must sometimes be established manually. In this case, we want
to make sure</p>
<ol>
<li>that the new closure has the same owner as the old outer closure, and</li>
<li>that the new <code>v</code> and <code>w</code> vals belong to the new closure, instead of
the old outer and inner closures respectively.</li>
</ol>
<p>To change the owner of the new closure, we need first to get its symbol; as that
symbol was assigned just now when we typechecked, we need to extract it from the
typed expression:</p>
<div class="highlight"><pre><span></span><code> <span class="kd">val</span> <span class="nc">Apply</span><span class="p">(</span><span class="n">_</span><span class="p">,</span> <span class="n">newClosureTyped</span> <span class="o">::</span> <span class="nc">Nil</span><span class="p">)</span> <span class="o">=</span> <span class="n">rett</span>
<span class="n">c</span><span class="p">.</span><span class="n">internal</span><span class="p">.</span><span class="n">setOwner</span><span class="p">(</span><span class="n">newClosureTyped</span><span class="p">.</span><span class="n">symbol</span><span class="p">,</span> <span class="n">oldClosure</span><span class="p">.</span><span class="n">symbol</span><span class="p">.</span><span class="n">owner</span><span class="p">)</span>
</code></pre></div>
<p>Now we ensure that the two old parameters belong to the new closure,</p>
<div class="highlight"><pre><span></span><code> <span class="n">c</span><span class="p">.</span><span class="n">internal</span><span class="p">.</span><span class="n">changeOwner</span><span class="p">(</span><span class="n">rett</span><span class="p">,</span> <span class="n">vValDef</span><span class="p">.</span><span class="n">symbol</span><span class="p">.</span><span class="n">owner</span><span class="p">,</span> <span class="n">newClosureTyped</span><span class="p">.</span><span class="n">symbol</span><span class="p">)</span>
<span class="n">c</span><span class="p">.</span><span class="n">internal</span><span class="p">.</span><span class="n">changeOwner</span><span class="p">(</span><span class="n">rett</span><span class="p">,</span> <span class="n">wValDef</span><span class="p">.</span><span class="n">symbol</span><span class="p">.</span><span class="n">owner</span><span class="p">,</span> <span class="n">newClosureTyped</span><span class="p">.</span><span class="n">symbol</span><span class="p">)</span>
<span class="n">c</span><span class="p">.</span><span class="n">info</span><span class="p">(</span><span class="n">tree</span><span class="p">.</span><span class="n">pos</span><span class="p">,</span> <span class="s">s"Lifting to </span><span class="si">$</span><span class="n">rett</span><span class="s">"</span><span class="p">,</span> <span class="kc">true</span><span class="p">)</span>
<span class="nc">Some</span><span class="p">(</span><span class="n">rett</span><span class="p">)</span>
</code></pre></div>
<h1>It's alive!</h1>
<p><img src="./images/young-frankenstein.jpg" width=550></p>
<p>The <code>LiftFetchTuples</code>'d code compiles, with a few new messages that the macro generated:</p>
<pre>
[info] /Users/pnf/dev/scala-playground/core/src/main/scala/fetchy/Fetchy.scala:202:25: Not lifting, because value bff is used on rhs
[info] fn ← getFirstName(bff)
[info] ^
[info] /Users/pnf/dev/scala-playground/core/src/main/scala/fetchy/Fetchy.scala:202:10: Lifting to cats.syntax.`package`.all.toFunctorOps[[A]fetch.Fetch[F,A], (fetchy.TestBFF.FirstName, fetchy.TestBFF.LastName)](cats.Semigroupal.tuple2[[A]fetch.Fetch[F,A], fetchy.TestBFF.FirstName, fetchy.TestBFF.LastName](cats.syntax.`package`.all.toFlatMapOps[[A]fetch.Fetch[F,A], fetchy.TestBFF.FirstName](TestBFF.this.getFirstName[F](bff)(evidence$54, evidence$55))(fetch.`package`.fetchM[F](evidence$54)).self, cats.syntax.`package`.all.toFunctorOps[[A]fetch.Fetch[F,A], fetchy.TestBFF.LastName](TestBFF.this.getLastName[F](bff)(evidence$54, evidence$55))(fetch.`package`.fetchM[F](evidence$54)).self)(fetch.`package`.fetchM[F](evidence$54), fetch.`package`.fetchM[F](evidence$54)))(fetch.`package`.fetchM[F](evidence$54)).map[String](((x$13: (fetchy.TestBFF.FirstName, fetchy.TestBFF.LastName)) => {
[info] val fn: fetchy.TestBFF.FirstName = x$13._1;
[info] val ln: fetchy.TestBFF.LastName = x$13._2;
[info] scala.StringContext.apply("", " ", "").s(fn, ln)
[info] }))
[info] fn ← getFirstName(bff)
</pre>
<p>The "Not lifting" message is straightforward: we need to know <code>bff</code> before we can
call <code>getFirstName</code>, so <code>bff</code> and <code>fn</code> cannot be tupled.</p>
<p>What we can initially tell from the second message and the daunting scroll bar
is that we apparently generated a
huge amount of code. The full insanity is more easily appreciated by adding
some line-feeds and shortening a few FQCN prefixes. One thing that's clear
immediately is that the vast majority of the code was actually inserted by
scala implicit searches:</p>
<div class="highlight"><pre><span></span><code><span class="n">toFunctorOps</span><span class="p">[[</span><span class="nc">A</span><span class="p">]</span><span class="nc">Fetch</span><span class="p">[</span><span class="nc">F</span><span class="p">,</span><span class="nc">A</span><span class="p">],</span> <span class="p">(</span><span class="nc">FirstName</span><span class="p">,</span> <span class="nc">LastName</span><span class="p">)]</span>
<span class="p">(</span>
<span class="nc">Semigroupal</span><span class="p">.</span><span class="n">tuple2</span><span class="p">[[</span><span class="nc">A</span><span class="p">]</span><span class="nc">Fetch</span><span class="p">[</span><span class="nc">F</span><span class="p">,</span><span class="nc">A</span><span class="p">],</span> <span class="nc">FirstName</span><span class="p">,</span> <span class="nc">LastName</span><span class="p">]</span> <span class="c1">// <-- LOOK HERE SECOND</span>
<span class="p">(</span>
<span class="n">toFlatMapOps</span><span class="p">[[</span><span class="nc">A</span><span class="p">]</span><span class="nc">Fetch</span><span class="p">[</span><span class="nc">F</span><span class="p">,</span><span class="nc">A</span><span class="p">],</span> <span class="nc">FirstName</span><span class="p">]</span>
<span class="p">(</span>
<span class="n">getFirstName</span><span class="p">[</span><span class="nc">F</span><span class="p">](</span><span class="n">bff</span><span class="p">)(</span><span class="n">ev$54</span><span class="p">,</span> <span class="n">ev$55</span><span class="p">)</span> <span class="c1">// <-- LOOK HERE FIRST</span>
<span class="p">)</span>
<span class="p">(</span><span class="n">fetchM</span><span class="p">[</span><span class="nc">F</span><span class="p">](</span><span class="n">ev$54</span><span class="p">)).</span><span class="n">self</span><span class="p">,</span>
<span class="n">toFunctorOps</span><span class="p">[[</span><span class="nc">A</span><span class="p">]</span><span class="nc">Fetch</span><span class="p">[</span><span class="nc">F</span><span class="p">,</span><span class="nc">A</span><span class="p">],</span> <span class="nc">LastName</span><span class="p">]</span>
<span class="p">(</span>
<span class="n">getLastName</span><span class="p">[</span><span class="nc">F</span><span class="p">](</span><span class="n">bff</span><span class="p">)(</span><span class="n">ev$54</span><span class="p">,</span> <span class="n">ev$55</span><span class="p">)</span>
<span class="p">)</span>
<span class="p">(</span><span class="n">fetchM</span><span class="p">[</span><span class="nc">F</span><span class="p">](</span><span class="n">ev$54</span><span class="p">)).</span><span class="n">self</span>
<span class="p">)</span>
<span class="p">(</span>
<span class="n">fetchM</span><span class="p">[</span><span class="nc">F</span><span class="p">](</span><span class="n">ev$54</span><span class="p">),</span> <span class="n">fetchM</span><span class="p">[</span><span class="nc">F</span><span class="p">](</span><span class="n">ev$54</span><span class="p">)</span>
<span class="p">)</span>
<span class="p">)</span>
<span class="p">(</span>
<span class="n">fetch</span><span class="p">.</span><span class="n">`package`</span><span class="p">.</span><span class="n">fetchM</span><span class="p">[</span><span class="nc">F</span><span class="p">](</span><span class="n">ev$54</span><span class="p">)</span>
<span class="p">)</span>
<span class="p">.</span><span class="n">map</span><span class="p">[</span><span class="nc">String</span><span class="p">](((</span><span class="n">x$13</span><span class="p">:</span> <span class="p">(</span><span class="nc">FirstName</span><span class="p">,</span> <span class="nc">LastName</span><span class="p">))</span> <span class="o">=></span> <span class="p">{</span>
<span class="kd">val</span> <span class="n">fn</span><span class="p">:</span> <span class="nc">FirstName</span> <span class="o">=</span> <span class="n">x$13</span><span class="p">.</span><span class="n">_1</span><span class="p">;</span>
<span class="kd">val</span> <span class="n">ln</span><span class="p">:</span> <span class="nc">LastName</span> <span class="o">=</span> <span class="n">x$13</span><span class="p">.</span><span class="n">_2</span><span class="p">;</span>
<span class="n">scala</span><span class="p">.</span><span class="nc">StringContext</span><span class="p">.</span><span class="n">apply</span><span class="p">(</span><span class="s">""</span><span class="p">,</span> <span class="s">" "</span><span class="p">,</span> <span class="s">""</span><span class="p">).</span><span class="n">s</span><span class="p">(</span><span class="n">fn</span><span class="p">,</span> <span class="n">ln</span><span class="p">)</span>
<span class="p">}))</span>
</code></pre></div>
<p>At the <code>LOOK HERE FIRST</code> comment, you can see the original
<code>getFirstName[F](bff)(ev$54, ev$55)</code>, with its <code>ConcurrentEffect</code> and <code>Par</code>
evidence.
It was implicitly wrapped in a call to <code>toFlatMapOps</code>, which was
implicitly provided with monadic evidence <code>(fetchM[F](ev$54))</code>, which
itself relied on the <code>ConcurrentEffect</code> evidence. Now we undo that
wrapping with <code>.self</code>.</p>
<p>At <code>LOOK HERE SECOND</code>, you see the two <code>Fetch</code> queries passed to
<code>Semigroupal.tuple2</code>, the result of which must itself be wrapped in
<code>toFunctorOps</code> in order to call <code>.map</code> on it.</p>
<p>At the bottom, you see the new combined closure, which takes a tuple
and extracts the original <code>fn</code> and <code>ln</code> to use in the string expression.</p>
<p>But the really exciting part is that, when we run it, we see the
same batch grouping as if we had performed the tupling manually!</p>
<pre>
<mark style="color: blue">Requesting [tid=15] 3 x BFF</mark>
<mark style="color: red">Receiving [tid=15] 3 x BFF</mark>
<mark style="color: blue">Requesting [tid=13] 3 x Last Name</mark>
<mark style="color: blue">Requesting [tid=14] 3 x First Name</mark>
<mark style="color: red">Receiving [tid=13] 3 x Last Name</mark>
<mark style="color: red">Receiving [tid=14] 3 x First Name</mark>
List(Heinz Doofenshmirtz, Tree Trunks, Biff Loman)
</pre>
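<p>For anyone who would rather see the shape of the rewrite without the implicit noise,
here is a minimal sketch in plain Scala. It uses <code>Option</code> instead of <code>Fetch</code>,
and the <code>tuple2</code> helper is my own hypothetical stand-in for <code>Semigroupal.tuple2</code>,
so nothing here batches anything; it only shows the before and after forms.</p>
<div class="highlight"><pre><span></span><code>object TuplingSketch {
  // Hypothetical stand-in for Semigroupal.tuple2, specialised to Option.
  def tuple2[A, B](fa: Option[A], fb: Option[B]): Option[(A, B)] =
    (fa, fb) match {
      case (Some(a), Some(b)) => Some((a, b))
      case _                  => None
    }

  def firstName: Option[String] = Some("Tree")
  def lastName: Option[String]  = Some("Trunks")

  // What we write: this desugars to firstName.flatMap(fn => lastName.map(ln => ...)),
  // so lastName is only evaluated inside the closure over fn.
  val monadic: Option[String] =
    for {
      fn <- firstName
      ln <- lastName
    } yield s"$fn $ln"

  // What the rewrite aims for: tuple the two independent values first, then a
  // single map whose closure re-binds the original names from the tuple.
  val applicative: Option[String] =
    tuple2(firstName, lastName).map { x =>
      val fn = x._1
      val ln = x._2
      s"$fn $ln"
    }
}
</code></pre></div>
<p>Both values come out the same; the difference only matters for a type like <code>Fetch</code>,
where the tupled form lets the two queries proceed together.</p>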
<h1>What have we learned today?</h1>
<p><img src="./images/ub-iwerks-teacher.png"></p>
<p>If you've made it this far, I hope you've learned that not all self-deprecation
should be taken ironically. I warned you that this post constituted self-abuse,
and that it would be stupidly obscure. Maybe you learned something about Fetch
(having filtered out my ignorant misrepresentations), or maybe you are
now more than ever convinced that you either do or do not want to write scala macros -
self-knowledge that can be useful in life and work.</p>
<p>I haven't learned anything. I never learn. I did, however, come away with a
bit of a distaste for writing Haskell-style programs in Scala.</p>
<p>The way Cats implements typeclasses through implicit evidence and conversions
makes sense - or can be made sense of - but it seems really complicated. When
encountering a <code>.map</code>, the first assumption of many programmers will be that this
method was implemented in the class or some trait of its target. By this point,
we're all used to the standard typeclass pattern, where a <code>Foo</code> is wrapped with
an <code>implicit class SpecialFooOps(foo: Foo)</code>, but the process can get
byzantine when, say, <code>Foo</code> has a type parameter, <code>Foo[T]</code>, and <code>SpecialFooOps</code> can only
be instantiated with implicit evidence of <code>Bar[T]</code>, and the implicit <code>def</code> providing
that evidence in turn relies on some further evidence of a <code>Wiz[T]</code>.</p>
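<p>To make that chain concrete, here is a small self-contained sketch. The names
<code>Foo</code>, <code>Bar</code>, <code>Wiz</code> and <code>SpecialFooOps</code> are the made-up ones from the paragraph above,
and the method names are equally hypothetical; this is not how Cats spells any of it,
just the same pattern at toy scale.</p>
<div class="highlight"><pre><span></span><code>object ImplicitChainSketch {
  // Every name in this object is hypothetical, chosen to match the prose.
  final case class Foo[T](value: T)

  trait Wiz[T] { def describe(t: T): String }
  trait Bar[T] { def render(t: T): String }

  // Base evidence: a Wiz[Int] exists.
  implicit val intWiz: Wiz[Int] = new Wiz[Int] {
    def describe(t: Int): String = s"the integer $t"
  }

  // Bar[T] evidence is only available when Wiz[T] evidence is.
  implicit def barFromWiz[T](implicit wiz: Wiz[T]): Bar[T] = new Bar[T] {
    def render(t: T): String = "[" + wiz.describe(t) + "]"
  }

  // The extension method only exists given Bar[T] evidence.
  implicit class SpecialFooOps[T](foo: Foo[T])(implicit bar: Bar[T]) {
    def special: String = bar.render(foo.value)
  }

  // What we write...
  val sugared: String = Foo(42).special

  // ...and roughly what the compiler elaborates it into.
  val explicit: String =
    new SpecialFooOps[Int](Foo(42))(barFromWiz[Int](intWiz)).special
}
</code></pre></div>
<p>Three hops for one method call, and nothing at the call site hints at any of them.</p>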
<p>This would be less of a problem if IDEs were capable of following a
long chain of implicit dependencies, but, at this point at least,
IntelliJ is not, and the only reliable way I found to untangle
the implicitness was to set break points in the debugger, then to paste
the observed values into an editor and format them. What's worse,
IntelliJ is as likely to fail to locate an implicit parameter -
hurling jagged red lightning down upon screens of innocent source -
as to smilingly approve the worst delinquencies, abandoning you to the
tender mercies of scalac typer errors.</p>
<h2>And for what?</h2>
<p><img src="./images/trois-mousquetires.jpg" width=400></p>
<p>These typing complications are, unfortunately, most evident when <code>for</code> constructs
are involved. For example, the reason I wrote,</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">getBFFs</span><span class="p">[</span><span class="nc">F</span><span class="p">[</span><span class="n">_</span><span class="p">]:</span> <span class="nc">ConcurrentEffect</span> <span class="p">:</span> <span class="nc">Par</span><span class="p">]</span> <span class="o">=</span> <span class="nc">List</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">).</span><span class="n">traverse</span><span class="p">(</span><span class="n">getBFF</span><span class="p">[</span><span class="nc">F</span><span class="p">])</span>
</code></pre></div>
<p>breaking the <code>for</code> construct into its own method,
was not actually to emphasize composability (which I guess it sort of does),
but so I wouldn't have to look at this:</p>
<p><img src="./images/getBFFsInRed.png" width=400></p>
<p>Clearly, one can't expect an IDE to intuit the arbitrary manipulations of my macro,
but this example is practically straight out of Fetch documentation.</p>
<p>I maintain that the element most central to broad feasibility of what
Martin Odersky calls "categorical programming" is the <code>for</code> or <code>do</code> construct,
by which an involution of closures presents itself in procedural guise.
Beyond dispute, when the illusion works, the code looks beautiful.
It is also claimed that, when the beautiful code compiles, it is probably correct,
because otherwise typechecking would have failed. I'm dubious of that point
but prepared for argument's sake to accept its truth given a sufficiently clever
typechecker. However, with an insufficiently clever typechecker - one whose
inferencing abilities require frequent nudges, one that disagrees with its multiple
personalities across the tool-chain - you are forced to mentally re-ravel the tidy stack
of left-arrows into the underlying mess of combinators to begin to guess what went wrong.</p>
<p>Of course I enjoy that sort of thing. Or anyway I find it conveniently distracting sometimes.</p>
<p>I close with my 2nd or 3rd favorite quotation from Oscar Wilde:</p>
<blockquote>
<p>Jack. Is that clever?</p>
<p>Algernon. It is perfectly phrased! and quite as true as any observation in civilised life should be.</p>
<p>Jack. I am sick to death of cleverness. Everybody is clever nowadays. You can’t go anywhere without meeting clever people. The thing has become an absolute public nuisance. I wish to goodness we had a few fools left.</p>
</blockquote>
<div class="footnote">
<hr>
<ol>
<li id="fn:verbiage">
<p>A word that is frowned upon by my employer as ostensibly meaningless.
It's not. I use it here in the sense of the manifestation of verbosity,
excessive use of many words where few - perhaps none - would have sufficed. <a class="footnote-backref" href="#fnref:verbiage" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
</ol>
</div>Do you believe in magic?2018-09-16T00:00:00-04:002018-09-16T00:00:00-04:00Peter Fraenkeltag:blog.podsnap.com,2018-09-16:/rollo.html<p><img src="./images/believe-in-magic.jpg" width=500></p>
<p>Donald Knuth famously opined:</p>
<blockquote>
<p>“The real problem is that programmers have spent far too much time
worrying about efficiency in the wrong places and at the wrong
times; premature optimization is the root of all evil (or at least
most of it) in programming.”</p>
</blockquote>
<p>I broadly agree. It's incredibly difficult to guess which parts of
our code will turn out to be performance bottlenecks, and,
conversely, which parts will become development bottlenecks because of
the obscurity introduced by manual optimization. Eventually one
develops instincts for this sort of thing, though they may be of less
general applicability than we think, and they may be rapidly outdated
by evolution of the computational ecosystem.</p>
<p>There are many sad stories about optimization gone wrong, but this
story will be sadder still, because it will be optimization we didn't
deliberately perform, but implicitly accepted by choosing to
believe in magic.</p>
<h2>What we write about when we write about code</h2>
<p><img alt="The Team" src="./images/angrymop.png" width=400></p>
<p>It's even debatable whether what we write is actually "code" - in the
traditional sense of instructions given to a machine. We certainly
type a lot, but what we type gets passed through so many stages of
interpretation, compilation, scheduling, rearrangement and speculative
second-guessing by the time it comes to actually
shooting photons from a pixel, that it might be more accurate to say
that we're typing out suggestions to a team of magical animated brooms.</p>
<p>It would be nice to think that our helpers had read Knuth, but,
seriously? They're brooms. Moreover, if they don't do any
optimization at all, we'll get angry, because they're going to do
stupid things like taking forever to fill up a bucket because they
insist on checking array bounds after each added drop. Of course if
they over-optimize we'll <a href="https://www.youtube.com/watch?v=Rrm8usaH0sM">also get angry</a>.</p>
<h1>Let's make a hash of it</h1>
<p>Not all optimizations are unreasonable. It recently transpired that an
assiduous young programmer took the time to profile their code and discovered
a surprising hotspot. They had a class like this</p>
<div class="highlight"><pre><span></span><code> <span class="k">case</span> <span class="k">class</span> <span class="nc">Foo</span><span class="p">(</span><span class="n">r</span><span class="p">:</span> <span class="nc">Range</span><span class="p">,</span> <span class="n">thing</span><span class="p">:</span> <span class="nc">String</span><span class="p">)</span> <span class="p">{</span>
<span class="k">def</span> <span class="nf">bar</span> <span class="o">=</span> <span class="n">r</span><span class="p">.</span><span class="n">map</span> <span class="p">{</span><span class="n">i</span> <span class="o">=></span> <span class="s">s"Once there was a </span><span class="si">$</span><span class="n">thing</span><span class="s"> named </span><span class="si">$</span><span class="n">thing</span><span class="si">$</span><span class="n">i</span><span class="s">"</span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>perhaps exactly like this, but probably not. The main thing is that it's a case
class, so it has equality semantics, e.g.</p>
<div class="highlight"><pre><span></span><code> <span class="nc">Foo</span><span class="p">(</span><span class="mi">1</span> <span class="n">to</span> <span class="mi">5</span><span class="p">,</span> <span class="s">"dog"</span><span class="p">)</span> <span class="o">==</span> <span class="nc">Foo</span><span class="p">(</span><span class="mi">1</span> <span class="n">to</span> <span class="mi">5</span><span class="p">,</span> <span class="s">"dog"</span><span class="p">)</span>
</code></pre></div>
<p>Scala creates the <code>.equals</code> method for us, in a more or less obvious fashion,
requiring both <code>r</code> and <code>thing</code> to match. When you have equality, you also have
to have a compatible <code>.hashCode</code>, and scala defines one of these for us too,
as a combination of hashes of the two members. A good <code>.hashCode</code> is generally
expected to satisfy two criteria. First, there should be no detectable pattern
in how the codes for distinct items are distributed modulo powers of 2, so that
if we use these as keys in a hash table, our buckets will fill up evenly.
Second, they should be much cheaper to compute than full equality,
so we can quickly determine that two
items are different, or that a particular item is not present in a hash table.</p>
<p>So scala is going to have to compute the hash of the <code>Range</code> member, which ought
to be easy, as it's completely determined by its starting, ending and step values,
so we compute three cheap integer hashes and then mix them together (somehow - we'll
see how scala does it later).
But there's a rub. A scala <code>Range</code> is also a <code>Seq</code>, like <code>List</code> and <code>Vector</code> and
so on, and <code>Seq</code>s with like elements are expected to compare and hash equal. So
we can't "cheat" and compute the hashcode of a <code>Range</code> using only the three
integers that describe it; we have to actually elaborate all the numbers in the
<code>Range</code>. While it's incredibly cheap to create a range <code>(0 to Int.MaxValue)</code>, it's
incredibly expensive to calculate that range's <code>.hashCode</code>.</p>
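<p>The constraint is easy to check for yourself. A minimal sketch, relying only on the
standard collections; the object and value names are mine:</p>
<div class="highlight"><pre><span></span><code>object RangeHashSketch {
  val range: Range        = 1 to 5
  val vector: Vector[Int] = Vector(1, 2, 3, 4, 5)
  val list: List[Int]     = List(1, 2, 3, 4, 5)

  // Seqs with like elements compare equal, whatever their concrete type...
  val sameElementsEqual: Boolean = range == vector && vector == list

  // ...so their hashes are obliged to agree as well.
  val sameHashes: Boolean = range.## == vector.## && vector.## == list.##
}
</code></pre></div>
<p>Whatever shortcut we take for the <code>Range</code> has to keep <code>sameHashes</code> true.</p>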
<p>(Minor point: in scala, we'll usually use the built-in <code>.##</code> instead of the
java <code>.hashCode</code> method, to comport with scala equality conventions, such
as equality of numbers when represented either as <code>Int</code> or <code>Long</code>.)</p>
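<p>A quick sketch of that difference, with names of my own choosing. The values are typed
as <code>Any</code> so that we go through the boxed representations:</p>
<div class="highlight"><pre><span></span><code>object HashHashSketch {
  val asInt: Any  = -1
  val asLong: Any = -1L

  // Scala's == treats the boxed Int and the boxed Long as equal...
  val equalValues: Boolean = asInt == asLong

  // ...and ## is defined to be consistent with that equality.
  val equalScalaHashes: Boolean = asInt.## == asLong.##

  // The Java hashCodes come from java.lang.Integer and java.lang.Long,
  // which make no promise to agree across types.
  val javaHashes: (Int, Int) = (asInt.hashCode, asLong.hashCode)
}
</code></pre></div>
<p>Printing <code>javaHashes</code> is a cheap way to see whether the two Java hashes happen to
coincide for a given value.</p>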
<p>What our diligently profiling developer discovered is that their program was spending
an unreasonable amount of time computing <code>Range</code> hashes.</p>
<h2>In which I have a clever idea</h2>
<p><img alt="Clever Little Piglet" src="./images/piglet.png" width=500></p>
<p>So, methought, what if we computed the <code>Range</code> hash the cheap way
using just 3 numbers, and for the fully elaborated <code>Seq</code>s, iterate
over them as usual, mixing in hashes for every element, but we check
to see if the difference between elements is constant, i.e. if they
could have been expressed as ranges. If the difference is not
constant, we'll use the full all-element hash; if it is constant,
we'll throw out the full hash and compute the cheap one. The existing
code in the scala collections library looks approximately like:</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">hash</span><span class="p">(</span><span class="n">a</span><span class="p">:</span> <span class="nc">Array</span><span class="p">[</span><span class="nc">Int</span><span class="p">])</span> <span class="o">=</span> <span class="p">{</span>
<span class="kd">var</span> <span class="n">h</span> <span class="o">=</span> <span class="n">seed</span>
<span class="kd">var</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">while</span><span class="p">(</span><span class="n">i</span> <span class="o"><</span> <span class="n">a</span><span class="p">.</span><span class="n">length</span><span class="p">)</span> <span class="p">{</span>
<span class="n">h</span> <span class="o">=</span> <span class="n">mix</span><span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="n">a</span><span class="p">(</span><span class="n">i</span><span class="p">).</span><span class="n">##</span><span class="p">)</span>
<span class="n">i</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="p">}</span>
<span class="n">h</span>
<span class="p">}</span>
</code></pre></div>
<p>and, very roughly, assuming we've checked the length already, we'll change it
to:</p>
<div class="highlight"><pre><span></span><code> <span class="kd">var</span> <span class="n">h</span> <span class="o">=</span> <span class="n">mix</span><span class="p">(</span><span class="n">mix</span><span class="p">(</span><span class="n">seed</span><span class="p">,</span><span class="n">a</span><span class="p">(</span><span class="mi">0</span><span class="p">).</span><span class="n">##</span><span class="p">),</span> <span class="n">a</span><span class="p">(</span><span class="mi">1</span><span class="p">).</span><span class="n">##</span><span class="p">)</span>
<span class="kd">val</span> <span class="n">diff</span> <span class="o">=</span> <span class="n">a</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="o">-</span> <span class="n">a</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="kd">var</span> <span class="n">valid</span> <span class="o">=</span> <span class="kc">true</span>
<span class="kd">var</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">2</span>
<span class="k">while</span><span class="p">(</span><span class="n">i</span> <span class="o"><</span> <span class="n">a</span><span class="p">.</span><span class="n">length</span><span class="p">)</span> <span class="p">{</span>
<span class="n">h</span> <span class="o">=</span> <span class="n">mix</span><span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="n">a</span><span class="p">(</span><span class="n">i</span><span class="p">).</span><span class="n">##</span><span class="p">)</span>
<span class="n">valid</span> <span class="o">=</span> <span class="n">valid</span> <span class="o">&&</span> <span class="p">(</span><span class="n">a</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="o">-</span> <span class="n">a</span><span class="p">(</span><span class="n">i</span><span class="o">-</span><span class="mi">1</span><span class="p">))</span><span class="o">==</span><span class="n">diff</span>
<span class="n">i</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="p">}</span>
<span class="k">if</span><span class="p">(</span><span class="n">valid</span><span class="p">)</span> <span class="n">rangeHash</span><span class="p">(</span><span class="n">a</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span><span class="n">diff</span><span class="p">,</span><span class="n">a</span><span class="p">(</span><span class="n">i</span><span class="o">-</span><span class="mi">1</span><span class="p">))</span>
<span class="k">else</span> <span class="n">h</span>
</code></pre></div>
<p>We're actually checking the difference in hash codes of the elements rather than the
difference in the elements themselves, as this generic method could end up
with elements that don't support subtraction.
The pathological case of enormous sequences that could have been ranges
if only they'd thought of it
will now be slightly more expensive, since we compute the hash of
three additional elements, but the relevance of that cost goes down
as 1/n, we hope.</p>
<p>Here's the actual code we'll be testing:</p>
<div class="highlight"><pre><span></span><code><span class="k">final</span> <span class="k">def</span> <span class="nf">arrayHash</span><span class="p">[</span><span class="nd">@specialized</span> <span class="nc">T</span><span class="p">](</span><span class="n">a</span><span class="p">:</span> <span class="nc">Array</span><span class="p">[</span><span class="nc">T</span><span class="p">],</span> <span class="n">i0</span><span class="p">:</span> <span class="nc">Int</span><span class="p">,</span> <span class="n">seed</span><span class="p">:</span> <span class="nc">Int</span><span class="p">):</span> <span class="nc">Int</span> <span class="o">=</span> <span class="p">{</span>
<span class="kd">var</span> <span class="n">h</span> <span class="o">=</span> <span class="n">seed</span>
<span class="kd">var</span> <span class="n">i</span> <span class="o">=</span> <span class="n">i0</span>
<span class="kd">val</span> <span class="n">l</span> <span class="o">=</span> <span class="n">a</span><span class="p">.</span><span class="n">length</span>
<span class="k">while</span> <span class="p">(</span><span class="n">i</span> <span class="o"><</span> <span class="n">l</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">val</span> <span class="n">hash</span> <span class="o">=</span> <span class="n">a</span><span class="p">(</span><span class="n">i</span><span class="p">).</span><span class="o">##</span>
<span class="n">h</span> <span class="o">=</span> <span class="n">mix</span><span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="n">hash</span><span class="p">)</span>
<span class="n">i</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="p">}</span>
<span class="n">finalizeHash</span><span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="n">a</span><span class="p">.</span><span class="n">length</span><span class="p">)</span>
<span class="p">}</span>
<span class="k">final</span> <span class="k">def</span> <span class="nf">arrayHashRangeCompatible</span><span class="p">[</span><span class="nd">@specialized</span> <span class="nc">T</span><span class="p">](</span><span class="n">a</span><span class="p">:</span> <span class="nc">Array</span><span class="p">[</span><span class="nc">T</span><span class="p">],</span> <span class="n">seed</span><span class="p">:</span> <span class="nc">Int</span><span class="p">):</span> <span class="nc">Int</span> <span class="o">=</span> <span class="p">{</span>
<span class="kd">var</span> <span class="n">h</span> <span class="o">=</span> <span class="n">seed</span>
<span class="kd">val</span> <span class="n">l</span> <span class="o">=</span> <span class="n">a</span><span class="p">.</span><span class="n">length</span>
<span class="n">l</span> <span class="k">match</span> <span class="p">{</span>
<span class="k">case</span> <span class="mi">0</span> <span class="o">⇒</span>
<span class="n">finalizeHash</span><span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="k">case</span> <span class="mi">1</span> <span class="o">⇒</span>
<span class="n">finalizeHash</span><span class="p">(</span><span class="n">mix</span><span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="n">a</span><span class="p">(</span><span class="mi">0</span><span class="p">).</span><span class="n">##</span><span class="p">),</span> <span class="mi">1</span><span class="p">)</span>
<span class="k">case</span> <span class="n">_</span> <span class="o">⇒</span>
<span class="kd">val</span> <span class="n">initial</span> <span class="o">=</span> <span class="n">a</span><span class="p">(</span><span class="mi">0</span><span class="p">).</span><span class="o">##</span>
<span class="n">h</span> <span class="o">=</span> <span class="n">mix</span><span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="n">initial</span><span class="p">)</span>
<span class="kd">val</span> <span class="n">h0</span> <span class="o">=</span> <span class="n">h</span>
<span class="kd">var</span> <span class="n">prev</span> <span class="o">=</span> <span class="n">a</span><span class="p">(</span><span class="mi">1</span><span class="p">).</span><span class="o">##</span>
<span class="kd">val</span> <span class="n">rangeDiff</span> <span class="o">=</span> <span class="n">prev</span> <span class="o">-</span> <span class="n">initial</span>
<span class="kd">var</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">2</span>
<span class="k">while</span> <span class="p">(</span><span class="n">i</span> <span class="o"><</span> <span class="n">l</span><span class="p">)</span> <span class="p">{</span>
<span class="n">h</span> <span class="o">=</span> <span class="n">mix</span><span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="n">prev</span><span class="p">)</span>
<span class="kd">val</span> <span class="n">hash</span> <span class="o">=</span> <span class="n">a</span><span class="p">(</span><span class="n">i</span><span class="p">).</span><span class="o">##</span>
<span class="k">if</span><span class="p">(</span><span class="n">rangeDiff</span> <span class="o">!=</span> <span class="n">hash</span> <span class="o">-</span> <span class="n">prev</span><span class="p">)</span>
<span class="k">return</span> <span class="n">arrayHash</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span> <span class="n">mix</span><span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="n">hash</span><span class="p">))</span>
<span class="n">prev</span> <span class="o">=</span> <span class="n">hash</span>
<span class="n">i</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="p">}</span>
<span class="n">rangeHash</span><span class="p">(</span><span class="n">h0</span><span class="p">,</span> <span class="n">rangeDiff</span><span class="p">,</span> <span class="n">prev</span><span class="p">,</span> <span class="n">seed</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">final</span> <span class="k">def</span> <span class="nf">rangeHash</span><span class="p">(</span><span class="n">h0</span><span class="p">:</span> <span class="nc">Int</span><span class="p">,</span> <span class="n">diff</span><span class="p">:</span> <span class="nc">Int</span><span class="p">,</span> <span class="n">end</span><span class="p">:</span> <span class="nc">Int</span><span class="p">,</span> <span class="n">seed</span><span class="p">:</span> <span class="nc">Int</span><span class="p">):</span> <span class="nc">Int</span> <span class="o">=</span>
<span class="n">finalizeHash</span><span class="p">(</span><span class="n">mix</span><span class="p">(</span><span class="n">mix</span><span class="p">(</span><span class="n">h0</span><span class="p">,</span> <span class="n">diff</span><span class="p">),</span> <span class="n">end</span><span class="p">),</span> <span class="n">seed</span><span class="p">)</span>
</code></pre></div>
<p>and these are the mysterious <code>mix</code> functions, straight from the scala 2.13 source:</p>
<div class="highlight"><pre><span></span><code> <span class="k">final</span> <span class="k">def</span> <span class="nf">mix</span><span class="p">(</span><span class="n">hash</span><span class="p">:</span> <span class="nc">Int</span><span class="p">,</span> <span class="n">data</span><span class="p">:</span> <span class="nc">Int</span><span class="p">):</span> <span class="nc">Int</span> <span class="o">=</span> <span class="p">{</span>
<span class="kd">var</span> <span class="n">h</span> <span class="o">=</span> <span class="n">mixLast</span><span class="p">(</span><span class="n">hash</span><span class="p">,</span> <span class="n">data</span><span class="p">)</span>
<span class="n">h</span> <span class="o">=</span> <span class="n">rotl</span><span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="mi">13</span><span class="p">)</span>
<span class="n">h</span> <span class="o">*</span> <span class="mi">5</span> <span class="o">+</span> <span class="mh">0xe6546b64</span>
<span class="p">}</span>
<span class="k">final</span> <span class="k">def</span> <span class="nf">mixLast</span><span class="p">(</span><span class="n">hash</span><span class="p">:</span> <span class="nc">Int</span><span class="p">,</span> <span class="n">data</span><span class="p">:</span> <span class="nc">Int</span><span class="p">):</span> <span class="nc">Int</span> <span class="o">=</span> <span class="p">{</span>
<span class="kd">var</span> <span class="n">k</span> <span class="o">=</span> <span class="n">data</span>
<span class="n">k</span> <span class="o">*=</span> <span class="mh">0xcc9e2d51</span>
<span class="n">k</span> <span class="o">=</span> <span class="n">rotl</span><span class="p">(</span><span class="n">k</span><span class="p">,</span> <span class="mi">15</span><span class="p">)</span>
<span class="n">k</span> <span class="o">*=</span> <span class="mh">0x1b873593</span>
<span class="n">hash</span> <span class="n">^</span> <span class="n">k</span>
<span class="p">}</span>
</code></pre></div>
<p>Some form of this range-compatible hash will probably <a href="https://github.com/scala/scala/pull/7212">make it into</a>
scala 2.13.</p>
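<p>The constraint that motivates all of this can be checked directly. The sketch below
(the object and method names are mine, purely for illustration) assumes nothing beyond the
standard collections: a <code>Range</code> and a materialized <code>Vector</code> holding the same
elements compare equal, so they have to agree on <code>hashCode</code>, whichever algorithm
each one uses to get there.</p>
<div class="highlight"><pre><span></span><code>object RangeHashContract {
  // A Range and a Vector with the same Ints are equal as Seqs,
  // so their hash codes must match, however each one is computed.
  def agrees(start: Int, end: Int, step: Int): Boolean = {
    val r = Range.inclusive(start, end, step)
    val v = r.toVector
    List(r == v, r.hashCode == v.hashCode).forall(identity)
  }

  def main(args: Array[String]): Unit = {
    println(agrees(1, 10, 1))
    println(agrees(0, 100, 7))
  }
}
</code></pre></div>
<p>This says nothing about speed, only about the contract that any clever
<code>Range</code> hash has to honor.</p>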
<h2>Sweet, sweet benchmarking</h2>
<p>It's clear that it's going to be fast to calculate the hash code of an actual <code>Range</code>,
which will be O(3),
but it would be prudent to measure the O(n) effect of all the extra code
we added to the original array hash to keep track of whether it could have
been represented as a range.</p>
<p>Benchmarking the performance of java (or any JVM) code is notoriously difficult
(though not, as we'll see, notoriously enough). Java is optimized "just in
time" by JVMs that are clever enough to realize that the timing loop
you wrote doesn't actually do anything and should therefore be optimized away.
We need to play correspondingly clever tricks in order
to produce realistic estimates of the time our code will actually run in real life.
Fortunately, the OpenJDK project provides a tool called <a href="http://openjdk.java.net/projects/code-tools/jmh/">jmh</a>
that does all that for us. It's a bit tedious to set up, but ultimately you
get some mvn fu and a bunch of annotations to use.
Actually, I didn't even have to set it up, because
the scala language developers have very responsibly included <code>jmh</code>-based performance tests
for critical code. The relevant piece of benchmark code for one of the
hash computations is roughly:</p>
<div class="highlight"><pre><span></span><code><span class="nd">@BenchmarkMode</span><span class="p">(</span><span class="nc">Array</span><span class="p">(</span><span class="nc">Mode</span><span class="p">.</span><span class="nc">AverageTime</span><span class="p">))</span>
<span class="nd">@Warmup</span><span class="p">(</span><span class="n">iterations</span> <span class="o">=</span> <span class="mi">10</span><span class="p">)</span>
<span class="nd">@Measurement</span><span class="p">(</span><span class="n">iterations</span> <span class="o">=</span> <span class="mi">10</span><span class="p">)</span>
<span class="nd">@OutputTimeUnit</span><span class="p">(</span><span class="nc">TimeUnit</span><span class="p">.</span><span class="nc">NANOSECONDS</span><span class="p">)</span>
<span class="p">...</span>
<span class="nd">@Benchmark</span> <span class="k">def</span> <span class="nf">A_arrayHashOrdered</span><span class="p">(</span><span class="n">bh</span><span class="p">:</span> <span class="nc">Blackhole</span><span class="p">):</span> <span class="nc">Unit</span> <span class="o">=</span> <span class="p">{</span>
<span class="kd">val</span> <span class="n">h</span> <span class="o">=</span> <span class="nc">MurmurHash3</span><span class="p">.</span><span class="n">arrayHashTestOrig</span><span class="p">(</span><span class="n">ordered</span><span class="p">,</span> <span class="nc">MurmurHash3</span><span class="p">.</span><span class="n">seqSeed</span><span class="p">)</span>
<span class="n">bh</span><span class="p">.</span><span class="n">consume</span><span class="p">(</span><span class="n">h</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div>
<p>When we run this (conveniently, by calling one of scala's <code>sbt</code> tasks),</p>
<div class="highlight"><pre><span></span><code>> bench/jmh:run scala.util.hashing.MurmurHash3Benchmark
</code></pre></div>
<p>something like 10 minutes goes by, and we eventually get a report</p>
<pre>
[info] MurmurHash3Benchmark.A3_arrayHashOrdered 10 avgt 20 <mark>17.796</mark> ± 0.274 ns/op
[info] MurmurHash3Benchmark.A3_arrayHashOrdered 100 avgt 20 167.554 ± 2.505 ns/op
[info] MurmurHash3Benchmark.A3_arrayHashOrdered 1000 avgt 20 1952.867 ± 12.756 ns/op
[info] MurmurHash3Benchmark.A3_arrayHashOrdered 10000 avgt 20 <mark style="color: red">19609.605</mark> ± 400.499 ns/op
[info] MurmurHash3Benchmark.B_rangeOptimizedArrayHashOrdered 10 avgt 20 <mark>18.925</mark> ± 0.244 ns/op
[info] MurmurHash3Benchmark.B_rangeOptimizedArrayHashOrdered 100 avgt 20 162.394 ± 1.271 ns/op
[info] MurmurHash3Benchmark.B_rangeOptimizedArrayHashOrdered 1000 avgt 20 1609.690 ± 16.071 ns/op
[info] MurmurHash3Benchmark.B_rangeOptimizedArrayHashOrdered 10000 avgt 20 <mark style="color: red">16111.728</mark> ± 155.433 ns/op
</pre>
<p>which at first glance looks reasonable - about <mark>19ns</mark> to compute the hash of an
array of length 10, up from <mark>18ns</mark> with the original algorithm. But what the
hell is going on for the <mark style="color: red">longer arrays</mark>? It seems like adding extra work to the
algorithm actually made it 20% faster!</p>
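<p>To pin down how big the effect is, here is the arithmetic on the numbers from the
report above. This is a throwaway sketch (the object name and layout are mine) that just
divides the original timing by the range-optimized one for each array size.</p>
<div class="highlight"><pre><span></span><code>object SpeedupCheck {
  // (array size, original ns/op, range-optimized ns/op), copied from the report.
  val report: List[(Int, Double, Double)] = List(
    (10, 17.796, 18.925),
    (100, 167.554, 162.394),
    (1000, 1952.867, 1609.690),
    (10000, 19609.605, 16111.728)
  )

  // A ratio above 1.0 means the range-optimized version took less time per call.
  def speedups: List[(Int, Double)] =
    report.map { case (size, original, optimized) => (size, original / optimized) }

  def main(args: Array[String]): Unit =
    speedups.foreach { case (size, ratio) => println(f"$size%6d  $ratio%.3f") }
}
</code></pre></div>
<p>For the two largest sizes the ratio lands a little above 1.2, while for n=10 it is
below 1, which matches the eyeball reading of the table.</p>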
<h2>News you can use!</h2>
<p>On the basis of this happy news, I resolved to add pointless extra
computation to all of my methods, starting with the array hash. I'll keep
track of the previous element's hash in <code>var dork</code>, and call <code>mix</code> on it
instead of <code>hash</code> in the cases where it happens to be the same as <code>hash</code>
anyway. We'll force the computer to decide which of two identical <code>mix</code>
invocations to make:</p>
<div class="highlight"><pre><span></span><code> <span class="k">final</span> <span class="k">def</span> <span class="nf">arrayHashTestStupid</span><span class="p">[</span><span class="nd">@specialized</span> <span class="nc">T</span><span class="p">](</span><span class="n">a</span><span class="p">:</span> <span class="nc">Array</span><span class="p">[</span><span class="nc">T</span><span class="p">],</span> <span class="n">seed</span><span class="p">:</span> <span class="nc">Int</span><span class="p">):</span> <span class="nc">Int</span> <span class="o">=</span> <span class="p">{</span>
<span class="kd">var</span> <span class="n">h</span> <span class="o">=</span> <span class="n">seed</span><span class="p">;</span> <span class="kd">var</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="kd">val</span> <span class="n">l</span> <span class="o">=</span> <span class="n">a</span><span class="p">.</span><span class="n">length</span>
<span class="kd">var</span> <span class="n">dork</span> <span class="o">=</span> <span class="n">seed</span>
<span class="k">while</span> <span class="p">(</span><span class="n">i</span> <span class="o"><</span> <span class="n">l</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">val</span> <span class="n">hash</span> <span class="o">=</span> <span class="n">a</span><span class="p">(</span><span class="n">i</span><span class="p">).</span><span class="o">##</span>
<span class="k">if</span><span class="p">(</span><span class="n">dork</span> <span class="o">==</span> <span class="n">hash</span><span class="p">)</span> <span class="n">h</span> <span class="o">=</span> <span class="n">mix</span><span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="n">dork</span><span class="p">)</span>
<span class="k">else</span> <span class="n">h</span> <span class="o">=</span> <span class="n">mix</span><span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="n">hash</span><span class="p">)</span>
<span class="n">dork</span> <span class="o">=</span> <span class="n">hash</span>
<span class="n">i</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="p">}</span>
<span class="n">finalizeHash</span><span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="n">a</span><span class="p">.</span><span class="n">length</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div>
<p>And it seems to work quite well,</p>
<div class="highlight"><pre><span></span><code>[info] Benchmark (size) Mode Cnt Score Error Units
[info] MurmurHash3Benchmark.A1_arrayHashOrderedStupid 10 avgt 20 18.778 ± 0.568 ns/op
[info] MurmurHash3Benchmark.A1_arrayHashOrderedStupid 100 avgt 20 163.833 ± 1.548 ns/op
[info] MurmurHash3Benchmark.A1_arrayHashOrderedStupid 1000 avgt 20 1626.775 ± 12.343 ns/op
[info] MurmurHash3Benchmark.A1_arrayHashOrderedStupid 10000 avgt 20 16261.636 ± 75.772 ns/op
</code></pre></div>
<p>a little worse for n=10 but consistently better for large arrays.</p>
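<p>One thing worth ruling out is that the vandalized version computes something
different. Below is a minimal sketch, specialized to <code>Array[Int]</code> and with
names of my own choosing (<code>plain</code>, <code>stupid</code>), that leans on the public
<code>mix</code> and <code>finalizeHash</code> from <code>scala.util.hashing.MurmurHash3</code>
and compares the two loops on the same input.</p>
<div class="highlight"><pre><span></span><code>import scala.util.hashing.MurmurHash3.{finalizeHash, mix}

object StupidHashCheck {
  // The straightforward loop: mix in every element's hash, then finalize.
  def plain(a: Array[Int], seed: Int): Int = {
    var h = seed
    var i = 0
    while (i != a.length) {
      h = mix(h, a(i).##)
      i += 1
    }
    finalizeHash(h, a.length)
  }

  // Same loop with the pointless branch: when dork equals hash, mixing in
  // dork is the same as mixing in hash, so both arms do identical work.
  def stupid(a: Array[Int], seed: Int): Int = {
    var h = seed
    var i = 0
    var dork = seed
    while (i != a.length) {
      val hash = a(i).##
      if (dork == hash) h = mix(h, dork)
      else h = mix(h, hash)
      dork = hash
      i += 1
    }
    finalizeHash(h, a.length)
  }

  def main(args: Array[String]): Unit = {
    val ordered = Array.tabulate(1000)(identity)
    println(plain(ordered, 42) == stupid(ordered, 42))
  }
}
</code></pre></div>
<p>Whatever the benchmark is measuring, it is at least measuring two routes to the same answer.</p>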
<h1>Veterinary dentistry</h1>
<p><img src="./images/laughing_horse.png" width=400 alt="gift horse"></p>
<p>But still.</p>
<p>One is told not to look at such things, but it does seem that this gift horse has
something very unusual going on in its mouth. Maybe we should look at its, uhm,
bite code.</p>
<p>There are various ways to show the byte code that scala (or javac) is generating.
If you're using IntelliJ, this is as easy as <strong>View/Show Byte Code</strong>. The sheer volume
is a little disorienting, but fortunately the byte code contains markers like
<code>LINENUMBER 137 L10</code>, which means that what follows is the code corresponding to
line 137 in the scala source, and the location in the byte code is now called <code>L10</code>.</p>
<p><em>[It turns out that, while the byte code makes sense, it doesn't explain the performance
anomaly,
so feel free to skip this section and go right to the next, even more tedious
section.]</em></p>
<p>So here's what we get for the bit of <code>arrayHash</code> starting with computation of the
hash of the <code>i</code>th element:</p>
<div class="highlight"><pre><span></span><code> L10
LINENUMBER 137 L10
INVOKEVIRTUAL scala/runtime/ScalaRunTime$.array_apply (Ljava/lang/Object;I)Ljava/lang/Object;
INVOKESTATIC scala/runtime/Statics.anyHash (Ljava/lang/Object;)I
L11
ISTORE 7
L12
</code></pre></div>
<p>Here's the call to <code>mix</code>,</p>
<div class="highlight"><pre><span></span><code> LINENUMBER 138 L12
ALOAD 0
ILOAD 3
ILOAD 7
INVOKEVIRTUAL scala/util/hashing/MurmurHash3.mix (II)I
ISTORE 3
</code></pre></div>
<p>and here's where we increment the index and jump back to the start of the loop.</p>
<div class="highlight"><pre><span></span><code> L13
LINENUMBER 139 L13
ILOAD 4
ICONST_1
IADD
ISTORE 4
L14
LINENUMBER 136 L14
GOTO L8
</code></pre></div>
<p>That seems reasonable enough. Now let's look at the corresponding snippet from
<code>arrayHashTestStupid</code>. It starts out about the same,</p>
<div class="highlight"><pre><span></span><code> INVOKEVIRTUAL scala/runtime/ScalaRunTime$.array_apply (Ljava/lang/Object;I)Ljava/lang/Object;
INVOKESTATIC scala/runtime/Statics.anyHash (Ljava/lang/Object;)I
L11
ISTORE 7
</code></pre></div>
<p>but here's where we compare <code>hash == dork</code> and jump to L13 if it's not true:</p>
<div class="highlight"><pre><span></span><code>L12
LINENUMBER 151 L12
ILOAD 6
ILOAD 7
IF_ICMPNE L13
</code></pre></div>
<p>If we get here, they must have been equal, so we call <code>mix</code> on <code>dork</code>,</p>
<div class="highlight"><pre><span></span><code> LINENUMBER 152 L14
ALOAD 0
ILOAD 3
ILOAD 6
INVOKEVIRTUAL scala/util/hashing/MurmurHash3.mix (II)I
ISTORE 3
GOTO L15
</code></pre></div>
<p>otherwise, we jump here and call <code>mix</code> on <code>hash</code> directly:</p>
<div class="highlight"><pre><span></span><code> L13
LINENUMBER 154 L13
FRAME APPEND [I]
ALOAD 0
ILOAD 3
ILOAD 7
INVOKEVIRTUAL scala/util/hashing/MurmurHash3.mix (II)I
ISTORE 3
</code></pre></div>
<p>Irrespective of which path we took, here's where we store the
hash in <code>dork</code>, increment <code>i</code> and hop back to the start of the loop:</p>
<div class="highlight"><pre><span></span><code> L15
LINENUMBER 155 L15
FRAME SAME
ILOAD 7
ISTORE 6
L16
LINENUMBER 156 L16
ILOAD 4
ICONST_1
IADD
ISTORE 4
L17
LINENUMBER 149 L17
GOTO L8
</code></pre></div>
<p>This makes sense... but there's still nothing that explains why the vandalized
code should actually be faster.</p>
<h1>Optimal Hell</h1>
<p><img src="./images/bosco-hell.png" width="500" alt="Garden of Earthly Delights"/></p>
<p>Of course this isn't the end of the story. The Java JIT (just-in-time)
compiler will take the byte code at runtime and compile the hot parts into
actual machine instructions. While humans actually wrote the JVM,
it's customary to regard it as a magical artifact that transmutes
our high-level source into a perfect match for whatever CPU we happen
to be running on.</p>
<p>With a bit more work, we can look behind the curtain,
and disassemble the emergent machine code. Doing so requires a few java parameters,</p>
<div class="highlight"><pre><span></span><code> -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=print,*MurmurHash3.arrayHashTest*
</code></pre></div>
<p>which are googleable enough, but also a plugin called <code>hsdis</code>, not provided by default with
the jdk, to disassemble the machine code into human-readable form. The
<a href="https://wiki.openjdk.java.net/display/HotSpot/PrintAssembly">openjdk wiki</a>
unhelpfully advises</p>
<blockquote>
<p>Once you succeed in building or downloading the hsdis binary library
(in the following named DLL), you have to install it next to your
libjvm.so (jvm.dll on Windows), in the same folder.</p>
</blockquote>
<p>Fortunately, the source code for the plugin is in the jdk source tree, but
building a plugin that worked on my macbook, with my version of java felt like
an interminable yak shave. The procedure, generally, is to obtain the
java source,</p>
<div class="highlight"><pre><span></span><code>hg clone http://hg.openjdk.java.net/jdk8/jdk8
<span class="nb">cd</span> jdk8
bash ./get_source.sh
</code></pre></div>
<p>and, in the subdirectory for <code>hsdis</code>, untar the source code for an appropriate
version of GNU <a href="https://www.gnu.org/software/binutils/">binutils</a>:</p>
<div class="highlight"><pre><span></span><code><span class="nb">cd</span> hotspot/src/share/tools/hsdis
wget http://ftp.gnu.org/gnu/binutils/binutils-2.28.1.tar.gz
tar -xf binutils-2.28.1.tar.gz
</code></pre></div>
<p>The trick is finding the precise version of binutils that happens to be
compatible with your version of your OS and with the version of the jdk.
Neither the latest version, 2.31, nor various versions recommended
on the internet over the past 5 years would ultimately compile, so 2.28
may or may not work for you. Once you've got all the source in place,
you run <code>make</code> in the <code>hsdis</code> directory, which will properly configure
and build binutils and incorporate its disassembler in a dynamic
library that you then need to install with the rest of java's libraries.
For me, the alphanumeric specifics were:</p>
<div class="highlight"><pre><span></span><code>make <span class="nv">BINUTILS</span><span class="o">=</span>binutils-2.28.1 all64
sudo cp -p build/macosx-amd64/hsdis-amd64.dylib /Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home/jre/lib/server
</code></pre></div>
<p>Shockingly, once built and installed, this worked a treat. I added the
incantatory parameters to <code>javaOpts</code> in the <code>build.sbt</code>, sat back, and watched
my <code>tmux</code> buffer blur with yard after yard of lovely Intel assembly. High-speed
blur turned out to be an interesting way to parse the results. Here's a
side-by-side view, with the original <code>arrayHash</code> on the left and
<code>arrayHashTestStupid</code> on the right. I've made it too small to read, to make
the point that you can tell something is up just by the layout:</p>
<p><img src="./images/arrayHashBoth2.png" alt="blur" width=600></p>
<p>What's interesting is that the original code seems to result in long
vertical bands of assembly-looking stuff, with occasional bursts
of comment text, while the stupid code produces a much more
homogeneous pattern of interspersed code and text. Let's zoom in
on one of those vertical bands,</p>
<pre>
[info] 0x0000000102ab5560: <mark>imul $0xcc9e2d51</mark>,0x10(%r14,%r13,4),%r11d
[info] 0x0000000102ab5569: movslq %r13d,%rax
[info] 0x0000000102ab556c: imul $0xcc9e2d51,0x2c(%r14,%rax,4),%r10d
[info] 0x0000000102ab5575: imul $0xcc9e2d51,0x24(%r14,%rax,4),%r9d
[info] 0x0000000102ab557e: imul $0xcc9e2d51,0x28(%r14,%rax,4),%r8d
[info] 0x0000000102ab5587: imul $0xcc9e2d51,0x20(%r14,%rax,4),%edi
[info] 0x0000000102ab5590: imul $0xcc9e2d51,0x14(%r14,%rax,4),%ecx
[info] 0x0000000102ab5599: imul $0xcc9e2d51,0x18(%r14,%rax,4),%edx
[info] 0x0000000102ab55a2: imul $0xcc9e2d51,0x1c(%r14,%rax,4),%ebp
</pre>
<p>and recall that <code>mixLast</code> contained the line:</p>
<div class="highlight"><pre><span></span><code> <span class="n">k</span> <span class="o">*=</span> <span class="mh">0xcc9e2d51</span>
</code></pre></div>
<p>and that <code>mixLast</code> is called once by <code>mix</code>, which is in turn called once per
iteration in <code>arrayHash</code>. If our instructions were being followed sequentially,
we'd see each of these multiplications interspersed with the bit rotation and
addition in <code>mix</code>, and the computation of each element's hash in <code>arrayHash</code>.
Instead, all the multiplications are occurring in a bunch. It appears that the JVM has</p>
<ol>
<li>inlined the calls from <code>arrayHash</code> to <code>mix</code> to <code>mixLast</code></li>
<li>unrolled the inner loop into chunks of 8 elements</li>
<li>rearranged the inlined code so that similar operations occur together</li>
</ol>
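<p>To make those three steps concrete, here is roughly what they would look like if done
by hand at the source level. This is my own sketch, not anything the JIT emits: it works on
<code>Array[Int]</code>, skips <code>finalizeHash</code>, unrolls by 4 rather than 8 to keep it
short, and the names <code>rolled</code> and <code>unrolledBy4</code> are made up.</p>
<div class="highlight"><pre><span></span><code>import java.lang.Integer.rotateLeft

object UnrollSketch {
  private final val C1 = 0xcc9e2d51
  private final val C2 = 0x1b873593
  private final val Add = 0xe6546b64

  // Step 1 only: mix and mixLast inlined, one element per iteration.
  def rolled(a: Array[Int], seed: Int): Int = {
    var h = seed
    var i = 0
    while (i != a.length) {
      val k = rotateLeft(a(i) * C1, 15) * C2 // body of mixLast
      h = rotateLeft(h ^ k, 13) * 5 + Add    // xor, then the rest of mix
      i += 1
    }
    h
  }

  // Steps 2 and 3: unrolled by 4, with like operations grouped together.
  // Only the chain through h has to run in order.
  def unrolledBy4(a: Array[Int], seed: Int): Int = {
    var h = seed
    var i = 0
    val bulk = a.length - a.length % 4
    while (i != bulk) {
      // all the multiplications by the first constant, in a bunch
      val m0 = a(i) * C1
      val m1 = a(i + 1) * C1
      val m2 = a(i + 2) * C1
      val m3 = a(i + 3) * C1
      // then the 15-bit rotates and second multiplications
      val k0 = rotateLeft(m0, 15) * C2
      val k1 = rotateLeft(m1, 15) * C2
      val k2 = rotateLeft(m2, 15) * C2
      val k3 = rotateLeft(m3, 15) * C2
      // and finally the serial part
      h = rotateLeft(h ^ k0, 13) * 5 + Add
      h = rotateLeft(h ^ k1, 13) * 5 + Add
      h = rotateLeft(h ^ k2, 13) * 5 + Add
      h = rotateLeft(h ^ k3, 13) * 5 + Add
      i += 4
    }
    while (i != a.length) { // leftover elements, one at a time
      val k = rotateLeft(a(i) * C1, 15) * C2
      h = rotateLeft(h ^ k, 13) * 5 + Add
      i += 1
    }
    h
  }
}
</code></pre></div>
<p>Both functions return the same value for any input. The regrouping is legal because each
<code>k</code> depends only on its own element, and only the updates to <code>h</code> form a chain.</p>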
<p>Now let's look at the code following this burst of multiplications. We
see operations apparently corresponding to lines in <code>mix</code> and <code>mixLast</code>,
not applied either iteration-by-iteration or like-operation-by-operation
but scattered about:</p>
<pre>
[info] 0x0000000102ab55ab: <mark style="color: red">rol $0xf,%r11d</mark>
[info] 0x0000000102ab55af: <mark style="color: red">rol $0xf,%ebp</mark>
[info] 0x0000000102ab55b2: <mark style="color: green">imul $0x1b873593,%r11d,%r11d</mark>
[info] 0x0000000102ab55b9: xor %ebx,%r11d ;*ixor
[info] 0x0000000102ab55bc: <mark style="color: green">imul $0x1b873593,%ebp,%eax</mark>
[info] 0x0000000102ab55c2: rol $0xd,%r11d ;*ior ; - java.lang.Integer::rotateLeft@7 (line 1475)
[info] 0x0000000102ab55c6: <mark style="color: red">rol $0xf,%edx</mark>
[info] 0x0000000102ab55c9: <mark style="color: blue">mov %r11d,%ebx</mark>
[info] 0x0000000102ab55cc: <mark style="color: blue">shl $0x2,%ebx</mark>
[info] 0x0000000102ab55cf: <mark style="color: blue">add %r11d,%ebx</mark>
[info] 0x0000000102ab55d2: <mark style="color: green">imul $0x1b873593,%edx,%edx</mark>
[info] 0x0000000102ab55d8: add $0xe6546b64,%ebx
</pre>
<p>There are some <mark style="color: red">15-bit left rotations</mark>,
some <mark style="color: green">multiplication by 0x1b873593</mark>,
one case of the xor in <code>hash ^ k</code>, a
<mark style="color: blue">multiplication by 5</mark> and generally
a great scattering of snippets from <code>mix</code> and <code>mixLast</code>, applied
in some obscure order to various elements of the unrolled iteration
block.</p>
<p>Perhaps the specific arrangement has something to do
with assumptions about how my processor (i5-7360U) pipelines and
generally performs instruction-level parallelism, but based on
the benchmarks it appears
that <strong>the assumptions were wrong.</strong></p>
<p>The <code>arrayHashTestStupid</code> assembly, on the other hand, consists of repeated blocks
clearly applicable to a single iteration:
First the multiply, 15-bit rotate, multiply, xor sequence from
<code>mixLast</code>,</p>
<div class="highlight"><pre><span></span><code>[info] 0x0000000107170e54: imul $0xcc9e2d51,%r9d,%esi ;*imul
[info] 0x0000000107170e5b: rol $0xf,%esi
[info] 0x0000000107170e5e: imul $0x1b873593,%esi,%r10d
[info] 0x0000000107170e65: xor %ecx,%r10d ;*ixor
</code></pre></div>
<p>then the 13-bit rotate, sneaky multiplication by 5 and addition in <code>mix</code>,</p>
<div class="highlight"><pre><span></span><code>[info] 0x0000000107170e68: rol $0xd,%r10d ;*ior ; - java.lang.Integer::rotateLeft@7 (line 1475)
[info] 0x0000000107170e6c: mov %r10d,%ecx
[info] 0x0000000107170e6f: shl $0x2,%ecx
[info] 0x0000000107170e72: add %r10d,%ecx
[info] 0x0000000107170e75: add $0xe6546b64,%ecx ;*iadd
</code></pre></div>
<p>and finally the pointless conditional I added:</p>
<div class="highlight"><pre><span></span><code>[info] 0x0000000107170e7b: cmp %eax,%r9d
[info] 0x0000000107170e7e: je 0x0000000107170f78 ;*if_icmpne
</code></pre></div>
<p>There are 8 blocks like this, corresponding to the silly pair of
equivalent <code>mix</code> calls, unrolled into chunks of 4 iterations. Again,
we see unrolling and inlining, but this time we don't see major rearrangement of
the inlined code.</p>
<h2>Mystery solved</h2>
<p>So the mystery - why ruining the algorithm made it faster - has been solved.
We didn't so much make the code better as prevent the JVM from going to great
lengths to make it worse. Adding a little bit of complexity deterred
the JIT from introducing more of its own.</p>
<p>That is maddening.</p>
<h2>Mad as hell</h2>
<p><img src="./images/mad-as-hell.png" width=300></p>
<p>It seems like the JVM has a sick obsession with counter-effective loop-unrolling
strategies, but we don't have to take it anymore. Indeed, if we run with
<code>-XX:LoopUnrollLimit=0</code>, performance seems to be better all around.</p>
<div class="highlight"><pre><span></span><code>[info] MurmurHash3Benchmark.A_arrayHashOrdered 10 avgt 20 17.811 ± 0.252 ns/op
[info] MurmurHash3Benchmark.A_arrayHashOrdered 100 avgt 20 169.959 ± 7.789 ns/op
[info] MurmurHash3Benchmark.A_arrayHashOrdered 1000 avgt 20 1554.601 ± 8.282 ns/op
[info] MurmurHash3Benchmark.A_arrayHashOrdered 10000 avgt 20 16147.977 ± 416.489 ns/op
[info] MurmurHash3Benchmark.C_arrayHashOrderedStupid 10 avgt 20 18.444 ± 0.377 ns/op
[info] MurmurHash3Benchmark.C_arrayHashOrderedStupid 100 avgt 20 163.245 ± 3.010 ns/op
[info] MurmurHash3Benchmark.C_arrayHashOrderedStupid 1000 avgt 20 1566.859 ± 13.584 ns/op
[info] MurmurHash3Benchmark.C_arrayHashOrderedStupid 10000 avgt 20 15893.297 ± 66.690 ns/op
</code></pre></div>
<p>That's in some sense better, but very unsatisfying. If the JVM's loop unrolling is
generally useful, then shutting it off everywhere just to make this one routine
faster would be a mistake. And if the JVM's loop unrolling is <em>always</em>
counter-effective, then we don't want to call attention to the way we've been blindly
using it anyway for so long.</p>
<h2>Some puns are too stupid even for me</h2>
<p><img src="./images/graal-online.png" width=500></p>
<p><a href="https://www.graalvm.org/">Graal</a> <em>[The VM, not the game, right? Can you find a better
illustration? -ed]</em>
is worth learning about for many reasons, but
for present purposes, the most important reason is that it comes with a modern
rewrite of the JIT optimizer. The new JIT seems to be
unreasonably effective <a href="http://aleksandar-prokopec.com/resources/docs/graal-collections.pdf">for scala code</a>
(<a href="https://na.scaladays.org/schedule/twitters-quest-for-a-wholly-graal-runtime">twitter</a> now uses it
extensively in production),
and we might expect improvements generally, just due to renewed focus on an
aspect of the JVM that hasn't received much attention for years. The
downside of graal is that its name encourages people to make godawful puns that
they quickly regret.</p>
<p>You can download a full graal JRE compatible with Java 1.8 from <a href="https://github.com/oracle/graal/releases">github</a>.</p>
<p>Re-running the tests with the graal installation directory passed to <code>sbt -java-home</code>,
and no longer explicitly suppressing loop unrolling, we get timings that are not only reasonable,
but better all around:</p>
<div class="highlight"><pre><span></span><code>[info] MurmurHash3Benchmark.A_arrayHashOrdered 10 avgt 20 15.299 ± 0.398 ns/op
[info] MurmurHash3Benchmark.A_arrayHashOrdered 100 avgt 20 158.088 ± 1.905 ns/op
[info] MurmurHash3Benchmark.A_arrayHashOrdered 1000 avgt 20 1592.159 ± 30.341 ns/op
[info] MurmurHash3Benchmark.A_arrayHashOrdered 10000 avgt 20 15584.329 ± 102.944 ns/op
[info] MurmurHash3Benchmark.C_arrayHashOrderedStupid 10 avgt 20 15.667 ± 0.202 ns/op
[info] MurmurHash3Benchmark.C_arrayHashOrderedStupid 100 avgt 20 159.971 ± 1.250 ns/op
[info] MurmurHash3Benchmark.C_arrayHashOrderedStupid 1000 avgt 20 1602.509 ± 32.796 ns/op
[info] MurmurHash3Benchmark.C_arrayHashOrderedStupid 10000 avgt 20 16031.193 ± 318.084 ns/op
</code></pre></div>
<p>As would be reasonably expected, the stupid version is slightly slower than the original,
though it is now (pleasantly) surprising how <em>very</em> slightly slower it is.
The generated assembly is also a lot less quirky, with two consecutive blocks like,</p>
<div class="highlight"><pre><span></span><code>[info] 0x000000010f481c1d: mov $0xcc9e2d51,%eax
[info] 0x000000010f481c22: imul 0x10(%rdx),%eax ; - scala.util.hashing.MurmurHash3::mixLast@3 (line 28)
[info] 0x000000010f481c26: rol $0xf,%eax ; - java.lang.Integer::rotateLeft@6 (line 1475)
[info] 0x000000010f481c29: imul $0x1b873593,%eax,%eax ; - scala.util.hashing.MurmurHash3::mixLast@17 (line 30)
[info] 0x000000010f481c2f: xor %eax,%ecx ; - scala.util.hashing.MurmurHash3::mixLast@21 (line 32)
[info] 0x000000010f481c31: rol $0xd,%ecx ; - java.lang.Integer::rotateLeft@6 (line 1475)
[info] 0x000000010f481c34: mov %ecx,%eax
[info] 0x000000010f481c36: shl $0x2,%eax
[info] 0x000000010f481c39: add %ecx,%eax ; - scala.util.hashing.MurmurHash3::mix@16 (line 19)
[info] 0x000000010f481c3b: lea -0x19ab949c(%rax),%eax ; - scala.util.hashing.MurmurHash3::mix@19 (line 19)
</code></pre></div>
<p>showing that the loop has been unrolled by 2, with no significant re-ordering of inlined
code within the iteration. The last line trickily uses the <code>lea</code> (load effective address)
instruction just to add the constant 0xe6546b64, which happens to be
the 32-bit two's-complement negative of 0x19ab949c.</p>
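<p>As a quick sanity check of that claim, here is a minimal sketch in Clojure (chosen only because it runs on the same JVM; the helper names <code>add-32</code>, <code>lea-32</code> and <code>times-5</code> are made up for illustration). It assumes 32-bit wraparound arithmetic, as for a JVM <code>int</code>, and emulates it by masking longs:</p>

```clojure
;; Emulate 32-bit wraparound by masking 64-bit longs (an assumption of this sketch).
(def mask 0xffffffff)
(def c-add 0xe6546b64) ; the constant that gets added in mix
(def c-lea 0x19ab949c) ; the displacement that the lea subtracts

;; Hypothetical helpers, named for illustration only.
(defn add-32 [x] (bit-and (+ x c-add) mask))                 ; x + 0xe6546b64
(defn lea-32 [x] (bit-and (- x c-lea) mask))                 ; x - 0x19ab949c
(defn times-5 [x] (bit-and (+ (bit-shift-left x 2) x) mask)) ; shl by 2, then add

;; The two constants sum to exactly 2^32, so they are negatives modulo 2^32.
(def constants-cancel? (= (+ c-add c-lea) (bit-shift-left 1 32)))

;; Adding one constant and subtracting the other agree on every sample,
;; and shift-then-add agrees with multiplying by 5.
(def samples [0 1 0x19ab949c 0x7fffffff mask])
(def same-result? (every? #(= (add-32 %) (lea-32 %)) samples))
(def shift-add-is-times-5? (every? #(= (times-5 %) (bit-and (* 5 %) mask)) samples))
```

<p>All three checks come out true, which is why the <code>shl</code>, <code>add</code> and <code>lea</code> at the end of the block amount to multiplying by 5 and adding the constant.</p>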
<p>Note that graal's output is not only faster, but it is much closer to what you
might have imagined naively. Much closer, in fact, to a literal transcription
of the original algorithm in scala. <em>The optimal performance occurs near the
point of minimum optimization.</em></p>
<p>It's interesting to compare graal here with the output of Intel's latest optimizing compiler,
on a hand-coded <a href="https://gcc.godbolt.org/z/0oFPIU">C version</a> of <code>arrayHash</code>.
Like graal, icc unrolls the loop into two chunks,</p>
<div class="highlight"><pre><span></span><code> lea r10d, DWORD PTR [-234652+rdx+rdx*4] #14.20
imul eax, DWORD PTR [4+rdi+r9*4], -862048943 #5.5
shl eax, 15 #6.14
imul edx, eax, -51821 #7.5
xor r10d, edx #9.19
shl r10d, 13 #13.14
lea edx, DWORD PTR [-430675100+r10+r10*4] #14.20
</code></pre></div>
<p>(This disassembler likes to print in decimal, but the constants are
the same.) There is some additional trickery - like cramming the
multiplication by 5 into the <code>lea</code> instruction - but it's all in the category
of taking advantage of documented instructions to compress the code a
little. Once the technique has been explained, you can easily do it
yourself.</p>
<h1>Summary: The real magic turns out to be no magic</h1>
<p><img src="./images/miracle.jpg" width=400></p>
<p>Testing and benchmarking tell us whether our code behaved acceptably
under a particular set of circumstances, but they don't tell us why,
or whether the behavior will be robust to other circumstances. As
developers, we are continually obliged to acknowledge the limits of our
understanding. We can reason thoroughly about the code we write,
and perhaps adequately about the code we read, but there will be
some layers, beyond or beneath, whose soundness we ultimately need to
trust based on incomplete information and reputation. It is therefore
reassuring when the functioning of these layers is at least
<em>plausibly comprehensible</em>, that we can at least imagine understanding
it ourselves if we had enough time, that it is not simply
attributed to sorcery. Because sorcery, in addition to being
definitively opaque to understanding, is also sometimes dead wrong.</p>
<p>Parallelize all the things -- Deconstructing Haxl, with Clojure macros, topological sorting and Quasar fibers (Peter Fraenkel, 2016-10-28)</p>
<p><img alt="Parallel Lines" src="./images/parallel.jpg"></p>
<h1>Select, few</h1>
<p>Recently, there's been a lot of interest in DSLs to address the so-called
<a href="http://use-the-index-luke.com/sql/join/nested-loops-join-n1-problem">N+1 selects problem</a>,
which describes the very common situation where you</p>
<ol>
<li>Do one database query to get a list of things, say user IDs;</li>
<li>for each, call some sort of processing function that happens to</li>
<li>do another query, say for each user's name.</li>
</ol>
<p>It can be even worse than that, as in this example, where (for some reason)
we want to append the last name of every person in a group to the first name
of their best friend. (Maybe they're getting married.)</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">mangleNamesFromGroup</span> <span class="p">[</span><span class="nv">gid</span><span class="p">]</span>
<span class="p">(</span><span class="nb">join </span><span class="s">", "</span> <span class="p">(</span><span class="nb">map </span> <span class="o">#</span><span class="p">(</span><span class="nb">str </span><span class="p">(</span><span class="nf">getFirstName</span> <span class="p">(</span><span class="nf">getBff</span> <span class="nv">%</span><span class="p">))</span> <span class="s">" "</span> <span class="p">(</span><span class="nf">getLastName</span> <span class="nv">%</span><span class="p">))</span>
<span class="p">(</span><span class="nf">getUserIds</span> <span class="nv">gid</span><span class="p">))))</span>
</code></pre></div>
<p>To the database, this looks like a call to select <code>id</code>s in the group, followed by
interleaved calls to <code>getBff</code>, <code>getFirstName</code>, <code>getLastName</code>. So it's sort of
the <code>3N+1</code> selects problem.</p>
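<p>To make the count concrete, here is a minimal sketch that replaces the database with a hypothetical in-memory map and counts lookups. The <code>db</code> contents, <code>lookup</code> and <code>query-count</code> are invented for illustration; only the shape of <code>mangleNamesFromGroup</code> comes from the code above.</p>

```clojure
(require '[clojure.string :refer [join]])

;; Hypothetical stand-in for the database: each lookup counts as one select.
(def query-count (atom 0))
(def db {:ids   {37 [123 321]}
         :bff   {123 777, 321 888}
         :first {777 "Ada", 888 "Grace"}
         :last  {123 "Hopper", 321 "Lovelace"}})

(defn lookup [table k]
  (swap! query-count inc)
  (get-in db [table k]))

(defn getUserIds   [gid] (lookup :ids gid))
(defn getBff       [id]  (lookup :bff id))
(defn getFirstName [id]  (lookup :first id))
(defn getLastName  [id]  (lookup :last id))

(defn mangleNamesFromGroup [gid]
  (join ", " (map #(str (getFirstName (getBff %)) " " (getLastName %))
                  (getUserIds gid))))

(def names (mangleNamesFromGroup 37))
;; With N = 2 users, @query-count is now 3N+1 = 7.
```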
<p>Code like this can be easily optimized in different ways, depending on
the capabilities of the database. E.g., maybe there are efficient plural versions of
all the <code>get</code> functions:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">mangleNamesFromGroup</span> <span class="p">[</span><span class="nv">gid</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">ids</span> <span class="p">(</span><span class="nf">getUserIds</span> <span class="nv">gid</span><span class="p">)</span>
<span class="nv">lns</span> <span class="p">(</span><span class="nf">getLastNames</span> <span class="nv">ids</span><span class="p">)</span>
<span class="nv">bfs</span> <span class="p">(</span><span class="nf">getBffs</span> <span class="nv">ids</span><span class="p">)</span>
<span class="nv">fns</span> <span class="p">(</span><span class="nf">getFirstNames</span> <span class="nv">bfs</span><span class="p">)]</span>
<span class="p">(</span><span class="nb">join </span><span class="s">", "</span> <span class="p">(</span><span class="nb">map </span><span class="o">#</span><span class="p">(</span><span class="nb">str </span><span class="nv">%1</span> <span class="s">" "</span> <span class="nv">%2</span><span class="p">)</span> <span class="nv">fns</span> <span class="nv">lns</span><span class="p">))))</span>
</code></pre></div>
<p>Or, if the database supports joins directly, we could write a special bespoke
function to do the whole thing for us.</p>
<p>It's also possible that it's ok to continue doing repeated queries for
single values, <em>as long as similar queries are performed at
approximately the same time</em><sup id="fnref:sametime"><a class="footnote-ref" href="#fn:sametime">2</a></sup>. We still have to rewrite the
code so the queries aren't alternating and blocked on each other:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">mangleNamesFromGroup</span> <span class="p">[</span><span class="nv">gid</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">ids</span> <span class="p">(</span><span class="nf">getUserIds</span> <span class="nv">gid</span><span class="p">)</span>
<span class="nv">lns</span> <span class="p">(</span><span class="nb">map </span><span class="nv">getLastName</span> <span class="nv">ids</span><span class="p">)</span>
<span class="nv">bfs</span> <span class="p">(</span><span class="nb">map </span><span class="nv">getBff</span> <span class="nv">ids</span><span class="p">)</span>
<span class="nv">fns</span> <span class="p">(</span><span class="nb">map </span><span class="nv">getFirstName</span> <span class="nv">bfs</span><span class="p">)]</span>
<span class="p">(</span><span class="nb">join </span><span class="s">", "</span> <span class="p">(</span><span class="nb">map </span><span class="o">#</span><span class="p">(</span><span class="nb">str </span><span class="nv">%1</span> <span class="s">" "</span> <span class="nv">%2</span><span class="p">)</span> <span class="nv">fns</span> <span class="nv">lns</span><span class="p">))))</span>
</code></pre></div>
<p>Now, a wave of <code>getLastName</code>s, followed by <code>getBff</code>s, and then the <code>getFirstName</code>s go out,
and assuming that some facility is grouping each batch into a single query, we now have
a <code>3+1=4</code> problem.</p>
<p>This is not awful, but... no actually it is. Why should we have to
contort our code around the best practices of whatever database we
happen to be using? Why should we have to refactor our code just
because we switch to a new database that wants the batching to be done
slightly differently? (And you also might wonder why we wait to do the
<code>getBff</code>s until the <code>getLastName</code>s are done.)</p>
<p>Don't you wish you could just sprinkle some pixie dust on the code we
started with, and it would somehow all be ok?</p>
<p>Haskell programmers, delicate dreamers that they are, prominently
<a href="https://ocharles.org.uk/blog/posts/2014-03-24-queries-in-loops-without-a-care-in-the-world.html">bemoaned</a> the lack of pixie dust.
And then they made some.</p>
<h1>Haxl arose</h1>
<p>In 2014, Simon Marlow (of <a href="http://shop.oreilly.com/product/0636920026365.do">Parallel and Concurrent Programming in Haskell</a>
fame) and collaborators at Facebook published
<a href="http://community.haskell.org/~simonmar/papers/haxl-icfp14.pdf">There is no Fork: an Abstraction for Efficient, Concurrent, and Concise Data Access</a>.
(There's also a nice <a href="https://www.youtube.com/watch?v=T-oekV8Pwv8">talk</a>, and Facebook even open-sourced the
<a href="https://github.com/facebook/Haxl">code</a>.)</p>
<p>There's a Haxl-inspired package for
<a href="https://github.com/kachayev/muse">clojure</a>, and
<a href="https://github.com/getclump/clump/blob/master/README.md">at</a>
<a href="https://47deg.github.io/fetch/">least</a> <a href="https://engineering.twitter.com/university/videos/introducing-stitch">three</a>
for Scala, though the last of these, Stitch, is a private, internal project at Twitter.</p>
<p>At the risk of over-simplifying, I'm going to divide their idea into
two pieces and largely talk about only one of them.</p>
<h2>Piece I</h2>
<p>One of the pieces
is about how to cause code written using standard Haskell language
features and library abstractions (<code>do</code> notation, applicative
functors - plus a little textual code transformation) to induce some
kind of batched I/O behavior. For example, using their <code>Fetch</code> type
might look like</p>
<div class="highlight"><pre><span></span><code> <span class="n">runHaxl</span> <span class="n">env</span> <span class="o">$</span> <span class="kr">do</span>
<span class="n">ids</span> <span class="ow"><-</span> <span class="n">getIds</span> <span class="n">grp</span>
<span class="n">mapM</span> <span class="n">getBffName</span> <span class="n">ids</span>
<span class="kr">where</span>
<span class="n">getBffName</span> <span class="n">id</span> <span class="ow">=</span> <span class="kr">do</span>
<span class="n">bff</span> <span class="ow"><-</span> <span class="n">getBff</span> <span class="n">id</span>
<span class="n">fn</span> <span class="ow"><-</span> <span class="n">getFirstName</span> <span class="n">bff</span>
<span class="n">ln</span> <span class="ow"><-</span> <span class="n">getLastName</span> <span class="n">id</span>
<span class="n">return</span> <span class="o">$</span> <span class="n">fn</span> <span class="o">++</span> <span class="s">" "</span> <span class="o">++</span> <span class="n">ln</span>
</code></pre></div>
<p>and the scala implementations are along the lines of</p>
<div class="highlight"><pre><span></span><code> <span class="n">getIds</span><span class="p">(</span><span class="n">grp</span><span class="p">).</span><span class="n">traverse</span> <span class="p">{</span><span class="n">id</span> <span class="o">=></span>
<span class="k">for</span> <span class="p">{</span><span class="n">bff</span> <span class="o"><-</span> <span class="n">getBff</span><span class="p">(</span><span class="n">id</span><span class="p">)</span>
<span class="n">fn</span> <span class="o"><-</span> <span class="n">getFirstName</span><span class="p">(</span><span class="n">bff</span><span class="p">)</span>
<span class="n">ln</span> <span class="o"><-</span> <span class="n">getLastName</span><span class="p">(</span><span class="n">id</span><span class="p">)}</span>
<span class="k">yield</span> <span class="s">s"</span><span class="si">$</span><span class="n">fn</span><span class="s"> </span><span class="si">$</span><span class="n">ln</span><span class="s">"</span>
<span class="p">}</span>
</code></pre></div>
<p>(These might not be exactly right, but since I didn't tag this as a Haskell or Scala post, that's ok,
yes?) There was, of course, concurrency before Haxl, but, as they note:</p>
<blockquote>
<p>All these previous formulations of concurrency monads used some kind
of fork operation to explicitly indicate when to create a new thread
of control. In contrast, in this paper there is no fork. The
concurrency will be implicit in the structure of the computations we
write.</p>
</blockquote>
<p>They achieve this by introducing an applicative functor:</p>
<blockquote>
<p>This is the key piece of our design: when computations in <code>Fetch</code>
are composed using the <code><*></code> operator, both arguments of <code><*></code> can
be explored to search for Blocked computations, which creates the
possibility that a computation may be blocked on multiple things
simultaneously. This is in contrast to the monadic bind operator,
<code>>>=</code>, which does not admit exploration of both arguments, because
the right hand side cannot be evaluated without the result from the
left.</p>
</blockquote>
<p>This means that there's actually slightly more than the usual <code>do</code> magic going on.
In this case, the <code>getBff</code> and <code>getLastName</code> ought to be completely independent, so they
actually preprocess the second <code>do</code> into</p>
<div class="highlight"><pre><span></span><code> <span class="kr">do</span>
<span class="p">(</span><span class="n">bff</span><span class="p">,</span><span class="n">ln</span><span class="p">)</span> <span class="ow"><-</span> <span class="p">(,)</span> <span class="o"><$></span> <span class="p">(</span><span class="n">getBff</span> <span class="n">id</span><span class="p">)</span> <span class="o"><*></span> <span class="p">(</span><span class="n">getLastName</span> <span class="n">id</span><span class="p">)</span>
<span class="n">fn</span> <span class="ow"><-</span> <span class="n">getFirstName</span> <span class="n">bff</span>
<span class="n">return</span> <span class="o">$</span> <span class="n">fn</span> <span class="o">++</span> <span class="s">" "</span> <span class="o">++</span> <span class="n">ln</span>
</code></pre></div>
<p>using a compiler <a href="http://research.microsoft.com/en-us/um/people/simonpj/papers/list-comp/applicativedo.pdf">extension</a>.</p>
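<p>The distinction drawn in the quote above can be mimicked with plain data. The following is a toy model in Clojure, not Haxl's actual implementation: a computation is a map that is either <code>{:done v}</code> or <code>{:blocked #{...}}</code>, and the names <code>done</code>, <code>request</code>, <code>ap2</code> and <code>bind</code> are made up for this sketch.</p>

```clojure
;; Toy model: a computation is {:done v} or {:blocked #{requests}}.
(defn done [v] {:done v})
(defn request [q] {:blocked #{q}})
(defn blocked-on [c] (get c :blocked #{}))

;; Applicative-style combination: both arguments are available for inspection,
;; so the requests of both sides can be collected in one round.
(defn ap2 [f a b]
  (if (and (contains? a :done) (contains? b :done))
    (done (f (:done a) (:done b)))
    {:blocked (into (blocked-on a) (blocked-on b))}))

;; Monadic bind: f cannot even be called until a has a value,
;; so only a's requests are visible.
(defn bind [a f]
  (if (contains? a :done)
    (f (:done a))
    {:blocked (blocked-on a)}))

(def applicative
  (ap2 vector (request [:bff 123]) (request [:lastName 123])))

(def monadic
  (bind (request [:bff 123]) (fn [bff] (request [:firstName bff]))))
```

<p>Here <code>applicative</code> is blocked on both requests at once, while <code>monadic</code> exposes only the first, since the function that would produce the second has not been called yet.</p>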
<p>There is much
<a href="http://elvishjerricco.github.io/2016/09/17/abstracting-async-concurrently.html">handwringing</a> over
the need to use an applicative implementation that differs from the
default <code>ap</code> defined for monads. Whether or not the controversy
interests you, it's not of much significance in Clojure, where
multi-argument <code>map</code>-like functions abound, and typological conundra
do not keep us up at night.</p>
<p>Moreover, I will argue that the need to use
applicatives in the first place is somewhat artificial. Once we admit
that we're willing to do some sort of dependency-aware code transformation (as with <code>do</code>),
we can avoid applicatives by doing just a little more of it.</p>
<h2>Piece II</h2>
<p>The language-agnostic piece is that code that looks like it's doing normal stuff
like looping and fetching is actually constructing a dependency graph.
Instead of returning results directly, functions return a <em>mutable</em>
<code>Result</code> object that contains either</p>
<ol>
<li>an actual result <strong>value</strong>,</li>
<li>a list of other <code>Result</code>s that we're <strong>waiting</strong> on, or</li>
<li>a description of a <strong>query</strong> that could fulfill the <code>Result</code>.</li>
</ol>
<p>The actual <code>query</code> functions are memoized to return a <code>Result</code> object that starts out as
a <code>Result(query="whatever")</code>, but at some point in the future can have a value poked into it.</p>
<p>Non-query function calls are jiggered so that, if any of their arguments is a non-<strong>value</strong> <code>Result</code>, they
will themselves return a <code>Result(waiting=...)</code>, listing the <code>Result</code>s on which they depend.
Eventually, we end up
with a tree of nested <code>Result</code>s,
the leaves of which are queries to fulfill. We scoop up all the
leaves, figure out how to fulfill the queries and cram the results
back into their <code>Result</code> object, and repeat the program from the beginning.</p>
<p>On the second run, we get a new batch of leaf queries to fulfill, and we repeat until there are no
blocks, and a result can be computed.</p>
<p>The really important thing here is that, once we write our program
using Haxl, the "scoop" and "figure" work is <em>somewhat decomplected</em>
from program logic. I say somewhat, because in writing with the Haxl
DSL in the first place, we've already made concessions to the fact
that batching is in some way important.</p>
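<p>As a rough illustration of the "figure" step, one hypothetical strategy is to group the scooped-up leaf queries by their SQL text, so that each group can go to the database as a single plural query. The <code>batch</code> function and the representation of a query as a <code>[sql arg]</code> pair are assumptions of this sketch, not part of Haxl.</p>

```clojure
;; Leaf queries as [sql arg] pairs (an assumed representation).
(def leaves
  [["select bff from stuff where id=" 123]
   ["select lastName from stuff where id=" 123]
   ["select bff from stuff where id=" 321]
   ["select lastName from stuff where id=" 321]])

;; Group by SQL text, keeping the arguments of each group in order.
(defn batch [queries]
  (into {} (for [[sql qs] (group-by first queries)]
             [sql (mapv second qs)])))

(def batches (batch leaves))
;; batches maps each SQL prefix to [123 321]: two round trips instead of four.
```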
<h2>Running mangleNames under a Haxl-like system</h2>
<p>It's interesting to trace through our example program, noting what
<code>Result</code> trees might be produced in each pass. Actually, the trees I'm
noting were produced by a quick-and-dirty Clojure implementation that
will be explained in the next section, which you might want to skip to
if you prefer reading Clojure code to imagining it. (Or if you'll
be annoyed when you discover that the next section is really a superset
of this one.)</p>
<p>Our haxlified <code>mangleNames</code> on the first pass might return something
like,</p>
<div class="highlight"><pre><span></span><code> Result[:waiting (Result[:query ("select id from groups where gid=" 37)])]
</code></pre></div>
<p>indicating that we're blocked on a result that's blocked on our first query.</p>
<p>Assuming that our system knows how to run the query, the results get shoved back into the <code>Result</code>, giving
us a tree that looks like this:</p>
<div class="highlight"><pre><span></span><code> Result[:waiting [Result[:value [321 123]]]]
</code></pre></div>
<p>Now we run the program again, and the memoized groups query returns the now
fulfilled <code>Result[:value [321 123]]</code> object, so we block a little further down. The new tree is blocked on a different group of queries:</p>
<div class="highlight"><pre><span></span><code> Result[:waiting (
Result[:waiting (
Result[:waiting (
Result[:waiting (
Result[:query ("select bff from stuff where id=" 123)])]
Result[:query ("select lastName from stuff where id=" 123)])]
Result[:waiting (
Result[:waiting (
Result[:query ("select bff from stuff where id=" 321)])]
Result[:query ("select lastName from stuff where id=" 321)])])])]
</code></pre></div>
<p>Now we scoop up these two batches of similar queries, let our database combine them with whatever clever magic it prefers,
and inject the answers into the <code>Result</code> objects so the tree looks like</p>
<div class="highlight"><pre><span></span><code> Result[:waiting (
Result[:waiting (
Result[:waiting (
Result[:waiting (
Result[:value 777])]
Result[:value "Trump"])]
Result[:waiting (
Result[:waiting (
Result[:value 888])]
Result[:value "Putin"])])])]
</code></pre></div>
<p>and run again, now yielding a batch of similar queries for first names:</p>
<div class="highlight"><pre><span></span><code> Result[:waiting (
Result[:waiting (
Result[:waiting (
Result[:query ("select firstName from stuff where id=" 777)])]
Result[:waiting (
Result[:query ("select firstName from stuff where id=" 888)])])])]
</code></pre></div>
<p>whose results we stuff back into their <code>Result</code> holders:</p>
<div class="highlight"><pre><span></span><code> Result[:waiting (
Result[:waiting (
Result[:waiting (
Result[:value "Mutt"])]
Result[:waiting
(Result[:value "Jeff"])])])]
</code></pre></div>
<p>And we run the program a fourth time, finally getting back</p>
<div class="highlight"><pre><span></span><code> "Mutt Trump, Jeff Putin"
</code></pre></div>
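<p>The four passes traced above can be reproduced with a small replay loop. What follows is a toy model, simpler than a real Haxl-like system: fetched values are kept in a cache keyed by query rather than poked into a mutable <code>Result</code>, queries are abbreviated to keyword vectors, and <code>fake-db</code>, <code>pending?</code>, <code>mangle</code>, <code>run-until-done</code> and the other function names are chosen for this sketch.</p>

```clojure
(require '[clojure.string :as str])

;; Simplification: fetched values live in a cache keyed by query. A pending
;; computation is an atom holding [:query q] or [:waiting pendings].
(def cache (atom {}))
(defn pending? [x] (instance? clojure.lang.Atom x))

(defn query [& q]
  (let [q (vec q)]
    (if (contains? @cache q) (@cache q) (atom [:query q]))))

;; Call f unless some argument is still pending.
(defn block-apply [f & args]
  (let [bs (filter pending? args)]
    (if (seq bs) (atom [:waiting bs]) (apply f args))))

;; Map f over coll, unless coll or any mapped item is still pending.
(defn block-map [f coll]
  (if (pending? coll)
    (atom [:waiting [coll]])
    (apply block-apply list (map f coll))))

;; Collect the leaf queries of a pending tree.
(defn reap [x]
  (if-not (pending? x)
    []
    (let [[tag payload] @x]
      (case tag
        :query   [payload]
        :waiting (mapcat reap payload)))))

;; Re-run the program from the top, filling the cache, until nothing is pending.
(defn run-until-done [program fake-db]
  (loop [pass 1]
    (let [r (program)]
      (if-not (pending? r)
        {:result r :passes pass}
        (do (doseq [q (reap r)] (swap! cache assoc q (fake-db q)))
            (recur (inc pass)))))))

;; Pretend database, using the sample values from the trace.
(def fake-db {[:ids 37] [123 321]
              [:bff 123] 777, [:bff 321] 888
              [:last 123] "Trump", [:last 321] "Putin"
              [:first 777] "Mutt", [:first 888] "Jeff"})

(defn mangle [gid]
  (block-apply #(str/join ", " %)
    (block-map (fn [id]
                 (block-apply str
                   (block-apply #(query :first %) (query :bff id))
                   " "
                   (query :last id)))
               (query :ids gid))))

(def outcome (run-until-done #(mangle 37) fake-db))
```

<p>Under these assumptions, <code>outcome</code> holds the final string together with a pass count of 4, matching the trace.</p>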
<h2>A shoddy implementation in Clojure</h2>
<p>It was instructive for me to hack this up in Clojure, even though there's already
a more professional
<a href="https://www.youtube.com/watch?v=T-oekV8Pwv8">implementation</a> called <a href="https://github.com/kachayev/muse">Muse</a>.</p>
<p>First, note that the Haskell Haxl <code>Result</code> has an <code>IORef</code> in the middle.</p>
<div class="highlight"><pre><span></span><code><span class="kr">data</span> <span class="kt">Result</span> <span class="n">a</span> <span class="ow">=</span> <span class="kt">Done</span> <span class="n">a</span> <span class="o">|</span> <span class="kt">Blocked</span> <span class="p">(</span><span class="kt">Seq</span> <span class="kt">BlockedRequest</span><span class="p">)</span> <span class="p">(</span><span class="kt">Fetch</span> <span class="n">a</span><span class="p">)</span>
<span class="kr">data</span> <span class="kt">BlockedRequest</span> <span class="ow">=</span> <span class="n">forall</span> <span class="n">a</span> <span class="o">.</span> <span class="kt">BlockedRequest</span> <span class="p">(</span><span class="kt">Request</span> <span class="n">a</span><span class="p">)</span> <span class="p">(</span><span class="kt">IORef</span> <span class="p">(</span><span class="kt">FetchStatus</span> <span class="n">a</span><span class="p">))</span>
<span class="kr">data</span> <span class="kt">FetchStatus</span> <span class="n">a</span> <span class="ow">=</span> <span class="kt">NotFetched</span> <span class="o">|</span> <span class="kt">FetchSuccess</span> <span class="n">a</span>
</code></pre></div>
<p>They don't actually avoid mutation altogether, and neither will we.</p>
<p>The possibly blocking object</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">deftype </span><span class="nv">Result</span> <span class="p">[</span><span class="nv">a</span><span class="p">])</span>
<span class="p">(</span><span class="kd">defmethod </span><span class="nv">print-method</span> <span class="nv">Result</span> <span class="p">[</span><span class="nv">o</span>, <span class="nv">w</span><span class="p">]</span> <span class="p">(</span><span class="nf">.write</span> <span class="nv">w</span> <span class="p">(</span><span class="nb">str </span><span class="s">"Result"</span> <span class="o">@</span><span class="p">(</span><span class="nf">.a</span> <span class="nv">o</span><span class="p">))))</span>
</code></pre></div>
<p>contains a <code>volatile</code> that will hold <code>[:value final-result]</code>,
<code>[:query some-query]</code> or <code>[:waiting list-of-results]</code>. I overrode <code>print-method</code> to
make the output a little more presentable here. You probably wouldn't do that in real life.</p>
<p>A basic utility function tries to extract results from a potential <code>Result</code>,
returning the extracted value if possible:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">try-get</span> <span class="p">[</span><span class="nv">r</span><span class="p">]</span>
<span class="p">(</span><span class="nb">if-not </span><span class="p">(</span><span class="nb">instance? </span><span class="nv">Result</span> <span class="nv">r</span><span class="p">)</span> <span class="nv">r</span>
<span class="p">(</span><span class="nf">match</span> <span class="p">[</span><span class="o">@</span><span class="p">(</span><span class="nf">.a</span> <span class="nv">r</span><span class="p">)]</span>
<span class="p">[[</span><span class="ss">:value</span> <span class="nv">v</span><span class="p">]]</span> <span class="nv">v</span>
<span class="ss">:else</span> <span class="nv">r</span><span class="p">)))</span>
</code></pre></div>
<p>The next most fundamental operation is to apply a function to a collection...</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">try-coll</span> <span class="p">[</span><span class="nv">coll</span> <span class="nv">f</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">coll</span> <span class="p">(</span><span class="nf">try-get</span> <span class="nv">coll</span><span class="p">)]</span>
</code></pre></div>
<p>or what we hope is a collection, as it might itself be a <code>Result</code>...</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nb">instance? </span><span class="nv">Result</span> <span class="nv">coll</span><span class="p">)</span> <span class="nv">coll</span>
</code></pre></div>
<p>or contain unfulfilled <code>Result</code>s...</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">coll</span> <span class="p">(</span><span class="nb">map </span><span class="nv">try-get</span> <span class="nv">coll</span><span class="p">)</span>
<span class="nv">bs</span> <span class="p">(</span><span class="nb">filter </span><span class="o">#</span><span class="p">(</span><span class="nb">instance? </span><span class="nv">Result</span> <span class="nv">%</span><span class="p">)</span> <span class="nv">coll</span><span class="p">)]</span>
</code></pre></div>
<p>in which case, we note our dependency on them,</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nb">seq </span><span class="nv">bs</span><span class="p">)</span> <span class="p">(</span><span class="nf">->Result</span> <span class="p">(</span><span class="nf">volatile!</span> <span class="p">[</span><span class="ss">:waiting</span> <span class="nv">bs</span><span class="p">]))</span>
</code></pre></div>
<p>but if the collection is complete, we actually evaluate the function,</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">try-get</span> <span class="p">(</span><span class="nf">f</span> <span class="nv">coll</span><span class="p">)))))))</span>
</code></pre></div>
<p>Our queries are just placeholders:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="k">def </span><span class="nv">query</span> <span class="p">(</span><span class="nf">memoize</span> <span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="o">&</span> <span class="nv">q</span><span class="p">]</span> <span class="p">(</span><span class="nf">->Result</span> <span class="p">(</span><span class="nf">volatile!</span> <span class="p">[</span><span class="ss">:query</span> <span class="nv">q</span><span class="p">])))))</span>
</code></pre></div>
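<p>Memoization is what lets a later run see an earlier run's answers: calling <code>query</code> again with the same arguments hands back the very same <code>Result</code>, so a value poked into it is visible on the next pass. A minimal sketch, repeating the definitions from above so that it stands alone (<code>r1</code>, <code>r2</code> and the other <code>def</code> names are illustrative):</p>

```clojure
(deftype Result [a])
(def query (memoize (fn [& q] (->Result (volatile! [:query q])))))

;; Two calls with the same arguments...
(def r1 (query "select bff from stuff where id=" 123))
(def r2 (query "select bff from stuff where id=" 123))

;; ...return the identical cached Result object.
(def same-object? (identical? r1 r2))

;; Poke a value into the first handle; the second handle sees it too.
(vreset! (.a r1) [:value 777])
(def seen-through-r2 @(.a r2))
```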
<p>And we write some helper functions to invoke functions whose arguments
might be <code>Result</code>s or which might return them:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">block-apply</span> <span class="p">[</span><span class="nv">f</span> <span class="o">&</span> <span class="nv">args</span><span class="p">]</span> <span class="p">(</span><span class="nf">try-coll</span> <span class="nv">args</span> <span class="o">#</span><span class="p">(</span><span class="nb">apply </span><span class="nv">f</span> <span class="nv">%</span><span class="p">)))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">block-map</span> <span class="p">[</span><span class="nv">f</span> <span class="nv">coll</span><span class="p">]</span> <span class="p">(</span><span class="nf">try-coll</span> <span class="p">(</span><span class="nf">try-coll</span> <span class="nv">coll</span> <span class="o">#</span><span class="p">(</span><span class="nb">map </span><span class="nv">f</span> <span class="nv">%</span><span class="p">))</span> <span class="nv">identity</span><span class="p">))</span>
</code></pre></div>
<p>For demo purposes, it will be useful to extract all the query leaf nodes in a tree:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">reap</span> <span class="p">[</span><span class="nv">b</span><span class="p">]</span>
<span class="p">(</span><span class="nb">if-not </span><span class="p">(</span><span class="nb">instance? </span><span class="nv">Result</span> <span class="nv">b</span><span class="p">)</span> <span class="p">[]</span>
<span class="p">(</span><span class="nf">match</span> <span class="p">[</span><span class="o">@</span><span class="p">(</span><span class="nf">.a</span> <span class="nv">b</span><span class="p">)]</span>
<span class="p">[[</span><span class="ss">:query</span> <span class="nv">_</span><span class="p">]]</span> <span class="p">[</span><span class="nv">b</span><span class="p">]</span>
<span class="p">[[</span><span class="ss">:waiting</span> <span class="nv">rs</span><span class="p">]]</span> <span class="p">(</span><span class="nb">mapcat </span><span class="nv">reap</span> <span class="nv">rs</span><span class="p">)</span>
<span class="ss">:else</span> <span class="p">[])))</span>
</code></pre></div>
<h2>Testing out the shoddy implementation in Clojure</h2>
<p>As promised, this is going to look just like the pretend example in the section before last.</p>
<p>We define some query functions,</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">getUserIds</span> <span class="p">[</span><span class="nv">gid</span><span class="p">]</span> <span class="p">(</span><span class="nf">query</span> <span class="s">"select id from groups where gid="</span> <span class="nv">gid</span><span class="p">))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">getFirstName</span> <span class="p">[</span><span class="nv">id</span><span class="p">]</span> <span class="p">(</span><span class="nf">query</span> <span class="s">"select firstName from stuff where id="</span> <span class="nv">id</span><span class="p">))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">getLastName</span> <span class="p">[</span><span class="nv">id</span><span class="p">]</span> <span class="p">(</span><span class="nf">query</span> <span class="s">"select lastName from stuff where id="</span> <span class="nv">id</span><span class="p">))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">getBff</span> <span class="p">[</span><span class="nv">id</span><span class="p">]</span> <span class="p">(</span><span class="nf">query</span> <span class="s">"select bff from stuff where id="</span> <span class="nv">id</span><span class="p">))</span>
</code></pre></div>
<p>in terms of which we write the program, using our <code>block-</code> helpers to indicate that the
<code>map</code> or function application might be blocking:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">mangleNamesFromGroup</span> <span class="p">[</span><span class="nv">gid</span><span class="p">]</span>
<span class="p">(</span><span class="nf">block-apply</span> <span class="nb">join </span><span class="s">", "</span>
<span class="p">(</span><span class="nf">block-map</span>
<span class="o">#</span><span class="p">(</span><span class="nf">block-apply</span> <span class="nb">str </span> <span class="p">(</span><span class="nf">block-apply</span> <span class="nv">getFirstName</span> <span class="p">(</span><span class="nf">getBff</span> <span class="nv">%</span><span class="p">))</span> <span class="s">" "</span>
<span class="p">(</span><span class="nf">getLastName</span> <span class="nv">%</span><span class="p">))</span>
<span class="p">(</span><span class="nf">getUserIds</span> <span class="nv">gid</span><span class="p">))))</span>
</code></pre></div>
<p>If we were doing the queries sequentially, there would be alternating
calls to <code>getBff</code> and <code>getFirstName</code>, which would be difficult to optimize.
Let's see what our haxly conniving gets us:</p>
<div class="highlight"><pre><span></span><code><span class="nv">repl></span> <span class="p">(</span><span class="k">def </span><span class="nv">x1</span> <span class="p">(</span><span class="nf">mangleNamesFromGroup</span> <span class="mi">37</span><span class="p">))</span> <span class="nv">x1</span>
</code></pre></div>
<p>As expected, we're blocked on the first query, which we pretend returns <code>[123 321]</code>:</p>
<div class="highlight"><pre><span></span><code><span class="nv">repl></span> <span class="p">(</span><span class="kd">defn </span><span class="nv">fulfill</span> <span class="p">[</span><span class="nv">r</span> <span class="nv">v</span><span class="p">]</span> <span class="p">(</span><span class="nf">vreset!</span> <span class="p">(</span><span class="nf">.a</span> <span class="nv">r</span><span class="p">)</span> <span class="p">[</span><span class="ss">:value</span> <span class="nv">v</span><span class="p">])</span> <span class="nv">r</span><span class="p">)</span>
<span class="nv">repl></span> <span class="p">(</span><span class="nb">-> </span><span class="nv">x1</span> <span class="nv">reap</span> <span class="nb">first </span><span class="p">(</span><span class="nf">fulfill</span> <span class="p">[</span><span class="mi">123</span> <span class="mi">321</span><span class="p">]))</span>
<span class="nv">repl></span> <span class="nv">x1</span>
<span class="nv">Result</span><span class="p">[</span><span class="ss">:waiting</span> <span class="p">(</span><span class="nf">Result</span><span class="p">[</span><span class="ss">:value</span> <span class="p">[</span><span class="mi">123</span> <span class="mi">321</span><span class="p">]])]</span>
</code></pre></div>
<p>Now run the exact same function again,</p>
<div class="highlight"><pre><span></span><code><span class="nv">repl></span> <span class="p">(</span><span class="k">def </span><span class="nv">x2</span> <span class="p">(</span><span class="nf">mangleNamesFromGroup</span> <span class="mi">37</span><span class="p">))</span>
<span class="nv">repl></span> <span class="nv">x2</span>
<span class="nv">Result</span><span class="p">[</span><span class="ss">:waiting</span> <span class="p">(</span>
<span class="nv">Result</span><span class="p">[</span><span class="ss">:waiting</span> <span class="p">(</span>
<span class="nv">Result</span><span class="p">[</span><span class="ss">:waiting</span> <span class="p">(</span>
<span class="nv">Result</span><span class="p">[</span><span class="ss">:waiting</span> <span class="p">(</span>
<span class="nv">Result</span><span class="p">[</span><span class="ss">:query</span> <span class="p">(</span><span class="s">"select bff from stuff where id="</span> <span class="mi">123</span><span class="p">)])]</span>
<span class="nv">Result</span><span class="p">[</span><span class="ss">:query</span> <span class="p">(</span><span class="s">"select lastName from stuff where id="</span> <span class="mi">123</span><span class="p">)])]</span>
<span class="nv">Result</span><span class="p">[</span><span class="ss">:waiting</span> <span class="p">(</span>
<span class="nv">Result</span><span class="p">[</span><span class="ss">:waiting</span> <span class="p">(</span>
<span class="nv">Result</span><span class="p">[</span><span class="ss">:query</span> <span class="p">(</span><span class="s">"select bff from stuff where id="</span> <span class="mi">321</span><span class="p">)])]</span>
<span class="nv">Result</span><span class="p">[</span><span class="ss">:query</span> <span class="p">(</span><span class="s">"select lastName from stuff where id="</span> <span class="mi">321</span><span class="p">)])])])]</span>
</code></pre></div>
<p>This time we get all the queries depending on the user ids we have, and we
fake fulfillment:</p>
<div class="highlight"><pre><span></span><code><span class="nv">repl></span> <span class="p">(</span><span class="nb">doall </span><span class="p">(</span><span class="nb">map </span><span class="o">#</span><span class="p">(</span><span class="k">do </span><span class="p">(</span><span class="nf">fulfill</span> <span class="nv">%1</span> <span class="nv">%2</span><span class="p">))</span> <span class="p">(</span><span class="nf">reap</span> <span class="nv">x2</span><span class="p">)</span> <span class="p">[</span><span class="mi">777</span> <span class="s">"Trump"</span> <span class="mi">888</span> <span class="s">"Putin"</span><span class="p">]))</span>
<span class="nv">repl></span> <span class="nv">x2</span>
<span class="nv">Result</span><span class="p">[</span><span class="ss">:waiting</span> <span class="p">(</span>
<span class="nv">Result</span><span class="p">[</span><span class="ss">:waiting</span> <span class="p">(</span>
<span class="nv">Result</span><span class="p">[</span><span class="ss">:waiting</span> <span class="p">(</span>
<span class="nv">Result</span><span class="p">[</span><span class="ss">:waiting</span> <span class="p">(</span>
<span class="nv">Result</span><span class="p">[</span><span class="ss">:value</span> <span class="mi">777</span><span class="p">])]</span>
<span class="nv">Result</span><span class="p">[</span><span class="ss">:value</span> <span class="s">"Trump"</span><span class="p">])]</span>
<span class="nv">Result</span><span class="p">[</span><span class="ss">:waiting</span> <span class="p">(</span>
<span class="nv">Result</span><span class="p">[</span><span class="ss">:waiting</span> <span class="p">(</span>
<span class="nv">Result</span><span class="p">[</span><span class="ss">:value</span> <span class="mi">888</span><span class="p">])]</span>
<span class="nv">Result</span><span class="p">[</span><span class="ss">:value</span> <span class="s">"Putin"</span><span class="p">])])])]</span>
</code></pre></div>
<p>and again</p>
<div class="highlight"><pre><span></span><code><span class="nv">repl></span> <span class="p">(</span><span class="k">def </span><span class="nv">x3</span> <span class="p">(</span><span class="nf">mangleNamesFromGroup</span> <span class="mi">37</span><span class="p">))</span>
<span class="nv">repl></span> <span class="nv">x3</span>
<span class="nv">Result</span><span class="p">[</span><span class="ss">:waiting</span> <span class="p">(</span>
<span class="nv">Result</span><span class="p">[</span><span class="ss">:waiting</span> <span class="p">(</span>
<span class="nv">Result</span><span class="p">[</span><span class="ss">:waiting</span> <span class="p">(</span>
<span class="nv">Result</span><span class="p">[</span><span class="ss">:query</span> <span class="p">(</span><span class="s">"select firstName from stuff where id="</span> <span class="mi">777</span><span class="p">)])]</span>
<span class="nv">Result</span><span class="p">[</span><span class="ss">:waiting</span> <span class="p">(</span>
<span class="nv">Result</span><span class="p">[</span><span class="ss">:query</span> <span class="p">(</span><span class="s">"select firstName from stuff where id="</span> <span class="mi">888</span><span class="p">)])])])]]</span>
<span class="nv">repl></span> <span class="p">(</span><span class="nb">doall </span><span class="p">(</span><span class="nb">map </span><span class="o">#</span><span class="p">(</span><span class="k">do </span><span class="p">(</span><span class="nf">fulfill</span> <span class="nv">%1</span> <span class="nv">%2</span><span class="p">))</span> <span class="p">(</span><span class="nf">reap</span> <span class="nv">x3</span><span class="p">)</span> <span class="p">[</span><span class="s">"Mutt"</span> <span class="s">"Jeff"</span><span class="p">]))</span>
<span class="nv">repl></span> <span class="nv">x3</span>
<span class="nv">Result</span><span class="p">[</span><span class="ss">:waiting</span> <span class="p">(</span>
<span class="nv">Result</span><span class="p">[</span><span class="ss">:waiting</span> <span class="p">(</span>
<span class="nv">Result</span><span class="p">[</span><span class="ss">:waiting</span> <span class="p">(</span>
<span class="nv">Result</span><span class="p">[</span><span class="ss">:value</span> <span class="s">"Mutt"</span><span class="p">])]</span>
<span class="nv">Result</span><span class="p">[</span><span class="ss">:waiting</span>
<span class="p">(</span><span class="nf">Result</span><span class="p">[</span><span class="ss">:value</span> <span class="s">"Jeff"</span><span class="p">])])])]</span>
</code></pre></div>
<p>and again,</p>
<div class="highlight"><pre><span></span><code><span class="nv">repl></span> <span class="p">(</span><span class="k">def </span><span class="nv">x4</span> <span class="p">(</span><span class="nf">mangleNamesFromGroup</span> <span class="mi">37</span><span class="p">))</span>
<span class="nv">repl></span> <span class="nv">x4</span>
<span class="s">"Mutt Trump, Jeff Putin"</span>
</code></pre></div>
<h1>Exactly what problem have we solved?</h1>
<p>Or at least what problem would we have solved if we hadn't typed in the query responses manually?</p>
<p>Actually, I think we've solved three problems that are only incidentally related.</p>
<p>Problem 1: We've managed to batch similar queries together. That's
nice, but, as noted before, the database might have done this anyway
had we simply sent them at approximately the same time.</p>
<p>Problem 2: We've <em>desequentialized</em> individual, interleaved queries,
so they can be written to reflect the logical use of their results
rather than the preferred timing of execution.</p>
<p>Problem 3: We did this without explicitly forking.</p>
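<p>To make Problem 1 concrete, here is a minimal sketch of what batching could look like once the pending queries have been reaped. It assumes each query is a <code>[sql-prefix id]</code> pair, like the contents of the <code>:query</code> results above; <code>batch-queries</code> and the rewrite of <code>id=</code> into <code>id in (...)</code> are hypothetical, not part of the shoddy implementation.</p>
<div class="highlight"><pre><span></span><code>(require '[clojure.string :as string])

;; Hypothetical helper: the "id in (...)" rewrite is an assumption
;; about the SQL dialect, chosen only for illustration.
(defn batch-queries
  "Groups pending [sql-prefix id] queries by prefix and turns each
  group into a single query over all of its ids."
  [queries]
  (for [[prefix qs] (group-by first queries)]
    (str (string/replace prefix #"id=$" "id in (")
         (string/join ", " (map second qs))
         ")")))

(batch-queries [["select bff from stuff where id=" 123]
                ["select lastName from stuff where id=" 123]
                ["select bff from stuff where id=" 321]
                ["select lastName from stuff where id=" 321]])
</code></pre></div>
<p>Fed the four queries from the second pass, this yields two strings, one selecting <code>bff</code> and one selecting <code>lastName</code>, each ending in <code>id in (123, 321)</code>.</p>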
<p>If you're a web developer, you might think that this is like an elixir for gout,
as a binge on blocking IO is a bit of a luxury to begin with.
Had we been forced from the beginning to write non-blocking code - say, in javascript -
the sequentiality problem would likely never have arisen.<sup id="fnref:nodejs"><a class="footnote-ref" href="#fn:nodejs">1</a></sup> In fact, it might
have required willful effort to force the queries to run sequentially and interleaved.</p>
<p>To prove the point, let's temporarily drop the "there is no fork" part of our
adventure. We'll mock asynchronous queries as returning a <code>core.async</code> channel
that prints to stdout so we can see what it's doing and then, after a short delay,
just munges some strings:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">aquery</span> <span class="p">[</span><span class="nv">s</span> <span class="nv">i</span><span class="p">]</span> <span class="p">(</span><span class="nf">go</span> <span class="p">(</span><span class="nb">println </span><span class="p">(</span><span class="nb">str </span><span class="nv">s</span> <span class="s">"("</span> <span class="nv">i</span> <span class="s">") "</span><span class="p">))</span> <span class="p">(</span><span class="nf"><!</span> <span class="p">(</span><span class="nf">timeout</span> <span class="mi">1000</span><span class="p">))</span> <span class="p">(</span><span class="nb">str </span><span class="nv">s</span> <span class="s">"-"</span> <span class="nv">i</span><span class="p">)))</span>
</code></pre></div>
<p>The simplest async version of our program just launches the queries and dereferences
them immediately, using <code>a/map</code> to turn a sequence of channels into a single channel:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">mangleNamesFromGroupAsync</span> <span class="p">[</span><span class="nv">gid</span><span class="p">]</span>
<span class="p">(</span><span class="nf">a/<!!</span> <span class="p">(</span><span class="nf">a/map</span> <span class="nb">vector </span>
<span class="p">(</span><span class="nb">map </span><span class="o">#</span><span class="p">(</span><span class="nf">go</span> <span class="p">(</span><span class="nb">str </span><span class="p">(</span><span class="nf"><!</span> <span class="p">(</span><span class="nf">aquery</span> <span class="s">"getName"</span> <span class="p">(</span><span class="nf"><!</span> <span class="p">(</span><span class="nf">aquery</span> <span class="s">"getBff"</span> <span class="nv">%</span><span class="p">))))</span>
<span class="s">" "</span> <span class="p">(</span><span class="nf"><!</span> <span class="p">(</span><span class="nf">aquery</span> <span class="s">"getLastName"</span> <span class="nv">%</span><span class="p">))))</span>
<span class="p">(</span><span class="nf">a/<!!</span> <span class="p">(</span><span class="nf">aquery</span> <span class="s">"getIds"</span> <span class="nv">gid</span><span class="p">)))))</span> <span class="p">)</span>
</code></pre></div>
<p>With some minor reformatting:</p>
<div class="highlight"><pre><span></span><code><span class="nv">repl</span><span class="o">></span> <span class="ss">(</span><span class="nv">mangleNamesFromGroupAsync</span> <span class="mi">37</span><span class="ss">)</span>
<span class="nv">getIds</span><span class="ss">(</span><span class="mi">37</span><span class="ss">)</span>
<span class="o"><</span><span class="k">pause</span><span class="o">></span>
<span class="nv">getBff</span><span class="ss">(</span><span class="mi">123</span><span class="ss">)</span> <span class="nv">getBff</span><span class="ss">(</span><span class="mi">321</span><span class="ss">)</span>
<span class="o"><</span><span class="k">pause</span><span class="o">></span>
<span class="nv">getName</span><span class="ss">(</span><span class="mi">777</span><span class="ss">)</span> <span class="nv">getName</span><span class="ss">(</span><span class="mi">888</span><span class="ss">)</span>
<span class="o"><</span><span class="k">pause</span><span class="o">></span>
<span class="nv">getLastName</span><span class="ss">(</span><span class="mi">123</span><span class="ss">)</span> <span class="nv">getLastName</span><span class="ss">(</span><span class="mi">321</span><span class="ss">)</span>
<span class="o"><</span><span class="k">pause</span><span class="o">></span>
[<span class="s2">"</span><span class="s">Mutt Trump</span><span class="s2">"</span> <span class="s2">"</span><span class="s">Jeff Putin</span><span class="s2">"</span>]
</code></pre></div>
<p>The output shows that all the <code>getBff</code> queries were sent, without waiting, before any of the <code>getName</code> queries,
but with a lot less fuss and mutation, so maybe we were in nirvana all along...</p>
<h2>Not so fast</h2>
<p>First, while it might be second nature to clojurescript (or whatever
script) developers, this <code>(a/map vector (map #(go (<! aquery ...) ...)
...)</code> business is already a bit removed from plain old <code>(map (aquery
..))</code>. Without thinking about it too much, we complected our code
beyond its functional intent to also express an opinion about how
asynchronicity should be juggled.</p>
<p>Furthermore, it's still not as asynchronous as it should be. There's no reason for
the <code>getBff</code>s and <code>getLastName</code>s to go out in different waves, but we introduced an
artificial dependency in the order of arguments to <code>str</code>. To remove the dependency,
we need to contort a bit further, explicitly launching as many of our queries as we
can, before we need them:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">mangleNamesFromGroupAsync2</span> <span class="p">[</span><span class="nv">gid</span><span class="p">]</span>
<span class="p">(</span><span class="nf">a/<!!</span> <span class="p">(</span><span class="nf">a/map</span> <span class="nb">vector </span>
<span class="p">(</span><span class="nb">map </span><span class="o">#</span><span class="p">(</span><span class="nf">go</span> <span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">c1</span> <span class="p">(</span><span class="nf">go</span> <span class="p">(</span><span class="nf"><!</span> <span class="p">(</span><span class="nf">aquery</span> <span class="s">"getName"</span> <span class="p">(</span><span class="nf"><!</span> <span class="p">(</span><span class="nf">aquery</span> <span class="s">"getBff"</span> <span class="nv">%</span><span class="p">)))))</span>
<span class="nv">c2</span> <span class="p">(</span><span class="nf">aquery</span> <span class="s">"getLastName"</span> <span class="nv">%</span><span class="p">)]</span>
<span class="p">(</span><span class="nb">str </span><span class="p">(</span><span class="nf"><!</span> <span class="nv">c1</span><span class="p">)</span> <span class="s">" "</span> <span class="p">(</span><span class="nf"><!</span> <span class="nv">c2</span><span class="p">))))</span>
<span class="p">(</span><span class="nf">a/<!!</span> <span class="p">(</span><span class="nf">aquery</span> <span class="s">"getIds"</span> <span class="nv">gid</span><span class="p">)))))</span> <span class="p">)</span>
</code></pre></div>
<p>Now</p>
<div class="highlight"><pre><span></span><code><span class="nv">repl></span> <span class="p">(</span><span class="nf">mangleNamesFromGroupAsync2</span> <span class="mi">37</span><span class="p">)</span>
<span class="nv">getIds</span><span class="p">(</span><span class="mi">37</span><span class="p">)</span>
<span class="nv"><pause></span>
<span class="nv">getLastName</span><span class="p">(</span><span class="mi">123</span><span class="p">)</span> <span class="nv">getLastName</span><span class="p">(</span><span class="mi">321</span><span class="p">)</span>
<span class="nv">getBff</span><span class="p">(</span><span class="mi">123</span><span class="p">)</span> <span class="nv">getBff</span><span class="p">(</span><span class="mi">321</span><span class="p">)</span>
<span class="nv"><pause></span>
<span class="nv">getName</span><span class="p">(</span><span class="mi">777</span><span class="p">)</span> <span class="nv">getName</span><span class="p">(</span><span class="mi">888</span><span class="p">)</span>
<span class="nv"><pause></span>
<span class="p">[</span><span class="s">"Mutt Trump"</span> <span class="s">"Jeff Putin"</span><span class="p">]</span>
</code></pre></div>
<p>we have only two pauses after the initial <code>getIds</code> query, instead of three.</p>
<p>So even with a certain amount of the magic taken care of by asynchronous constructs,
we still have to shovel the code around to get it to behave. Not only have we not
rid ourselves of fork, but we've made deploying it correctly a central part of the
coding challenge.</p>
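<p>The same chore shows up with any explicit forking primitive. As a rough sketch using only <code>future</code> from <code>clojure.core</code> (<code>slow-query</code> is a made-up stand-in for a blocking query, not one of the functions above), the only way to overlap two independent queries is to launch both before dereferencing either:</p>
<div class="highlight"><pre><span></span><code>(defn slow-query
  "Hypothetical blocking query: sleeps for a second, then munges a string."
  [label id]
  (Thread/sleep 1000)
  (str label "-" id))

(defn one-at-a-time [id]
  ;; str evaluates its arguments in order, so the second query does
  ;; not start until the first has returned: about two seconds.
  (str (slow-query "getBff" id) " " (slow-query "getLastName" id)))

(defn launched-early [id]
  ;; Fork both queries first and only then wait: about one second.
  (let [bff   (future (slow-query "getBff" id))
        lname (future (slow-query "getLastName" id))]
    (str (deref bff) " " (deref lname))))
</code></pre></div>
<p>Both functions return the same string; they differ only in where we chose to put the forks, which is exactly the decision we were hoping not to have to make.</p>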
<h1>A tree is a tree</h1>
<div class="highlight"><pre><span></span><code><span class="nv">I</span> <span class="nv">mean</span>, <span class="k">if</span> <span class="nv">you</span><span class="s1">'</span><span class="s">ve looked at a hundred thousand acres or so of trees — you know,</span>
<span class="nv">a</span> <span class="nv">tree</span> <span class="nv">is</span> <span class="nv">a</span> <span class="nv">tree</span>, <span class="nv">how</span> <span class="nv">many</span> <span class="nv">more</span> <span class="k">do</span> <span class="nv">you</span> <span class="nv">need</span> <span class="nv">to</span> <span class="nv">look</span> <span class="nv">at</span>?
<span class="o">-</span> <span class="nv">Ronald</span> <span class="nv">Reagan</span>
</code></pre></div>
<p>Our naive implementation</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">mangleNamesFromGroup</span> <span class="p">[</span><span class="nv">gid</span><span class="p">]</span>
<span class="p">(</span><span class="nb">join </span><span class="s">", "</span> <span class="p">(</span><span class="nb">map </span> <span class="o">#</span><span class="p">(</span><span class="nb">str </span><span class="p">(</span><span class="nf">getFirstName</span> <span class="p">(</span><span class="nf">getBff</span> <span class="nv">%</span><span class="p">))</span> <span class="s">" "</span> <span class="p">(</span><span class="nf">getLastName</span> <span class="nv">%</span><span class="p">))</span>
<span class="p">(</span><span class="nf">getUserIds</span> <span class="nv">gid</span><span class="p">))))</span>
</code></pre></div>
<p>has an implicit tree structure, defined by its nested parentheses.</p>
<div class="highlight"><pre><span></span><code> (map . . )
| |
#(str . " " . ) (getUserIds gid)
/ \
(getFirstName . ) (getLastName %)
|
(getBff %)
</code></pre></div>
<p>Then, when we ran</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">mangleNamesFromGroup</span> <span class="p">[</span><span class="nv">gid</span><span class="p">]</span>
<span class="p">(</span><span class="nf">block-apply</span> <span class="nb">join </span><span class="s">", "</span>
<span class="p">(</span><span class="nf">block-map</span>
<span class="o">#</span><span class="p">(</span><span class="nf">block-apply</span> <span class="nb">str </span> <span class="p">(</span><span class="nf">block-apply</span> <span class="nv">getFirstName</span> <span class="p">(</span><span class="nf">getBff</span> <span class="nv">%</span><span class="p">))</span> <span class="s">" "</span>
<span class="p">(</span><span class="nf">getLastName</span> <span class="nv">%</span><span class="p">))</span>
<span class="p">(</span><span class="nf">getUserIds</span> <span class="nv">gid</span><span class="p">))))</span>
</code></pre></div>
<p>we got a <code>Result</code> tree, whose edges were defined by references to
child <code>Result</code>s in the <code>:waiting</code> list. Admittedly, we didn't get it all at once,
but in 3 successive passes:</p>
<div class="highlight"><pre><span></span><code> :waiting______________
/ \
Pass 1 :query :waiting_______
getIds / \
:waiting__ :waiting___
/ \ \ / \ \
Pass 2 :query :query \ :query :query \
getLN getBff \ getLN getBff \
:query :query
Pass 3 getFN getFN
</code></pre></div>
<p>After we translated</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">mangleNamesFromGroupAsync</span> <span class="p">[</span><span class="nv">gid</span><span class="p">]</span>
<span class="p">(</span><span class="nf">a/<!!</span> <span class="p">(</span><span class="nf">a/map</span> <span class="nb">vector </span>
<span class="p">(</span><span class="nb">map </span><span class="o">#</span><span class="p">(</span><span class="nf">go</span> <span class="p">(</span><span class="nb">str </span><span class="p">(</span><span class="nf"><!</span> <span class="p">(</span><span class="nf">aquery</span> <span class="s">"getName"</span> <span class="p">(</span><span class="nf"><!</span> <span class="p">(</span><span class="nf">aquery</span> <span class="s">"getBff"</span> <span class="nv">%</span><span class="p">))))</span>
<span class="s">" "</span> <span class="p">(</span><span class="nf"><!</span> <span class="p">(</span><span class="nf">aquery</span> <span class="s">"getLastName"</span> <span class="nv">%</span><span class="p">))))</span>
<span class="p">(</span><span class="nf">a/<!!</span> <span class="p">(</span><span class="nf">aquery</span> <span class="s">"getIds"</span> <span class="nv">gid</span><span class="p">)))))</span> <span class="p">)</span>
</code></pre></div>
<p>to</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">mangleNamesFromGroupAsync2</span> <span class="p">[</span><span class="nv">gid</span><span class="p">]</span>
<span class="p">(</span><span class="nf">a/<!!</span> <span class="p">(</span><span class="nf">a/map</span> <span class="nb">vector </span>
<span class="p">(</span><span class="nb">map </span><span class="o">#</span><span class="p">(</span><span class="nf">go</span> <span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">c1</span> <span class="p">(</span><span class="nf">go</span> <span class="p">(</span><span class="nf"><!</span> <span class="p">(</span><span class="nf">aquery</span> <span class="s">"getName"</span> <span class="p">(</span><span class="nf"><!</span> <span class="p">(</span><span class="nf">aquery</span> <span class="s">"getBff"</span> <span class="nv">%</span><span class="p">)))))</span>
<span class="nv">c2</span> <span class="p">(</span><span class="nf">aquery</span> <span class="s">"getLastName"</span> <span class="nv">%</span><span class="p">)]</span>
<span class="p">(</span><span class="nb">str </span><span class="p">(</span><span class="nf"><!</span> <span class="nv">c1</span><span class="p">)</span> <span class="s">" "</span> <span class="p">(</span><span class="nf"><!</span> <span class="nv">c2</span><span class="p">))))</span>
<span class="p">(</span><span class="nf">a/<!!</span> <span class="p">(</span><span class="nf">aquery</span> <span class="s">"getIds"</span> <span class="nv">gid</span><span class="p">)))))</span> <span class="p">)</span>
</code></pre></div>
<p>we get an extremely similar tree, with edges defined by asynchronous waits.
(The <code><n></code> below are supposed to indicate the nth invocation of the closure.)</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="n">map__________</span><span class="w"></span>
<span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="err">\</span><span class="w"></span>
<span class="w"> </span><span class="n">aquery</span><span class="w"> </span><span class="err">#</span><span class="p">(</span><span class="k">go</span><span class="w"> </span><span class="o"><</span><span class="mi">0</span><span class="o">></span><span class="w"> </span><span class="p">........</span><span class="w"> </span><span class="o"><</span><span class="mi">1</span><span class="o">></span><span class="w"> </span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">getIds</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="err">\</span><span class="w"></span>
<span class="w"> </span><span class="n">let______</span><span class="w"> </span><span class="n">let______</span><span class="w"></span>
<span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="err">\</span><span class="w"></span>
<span class="w"> </span><span class="n">aquery</span><span class="w"> </span><span class="n">aquery</span><span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="n">aquery</span><span class="w"> </span><span class="n">aquery</span><span class="w"> </span><span class="err">\</span><span class="w"></span>
<span class="w"> </span><span class="n">getLN</span><span class="w"> </span><span class="n">getBFF</span><span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="n">getLN</span><span class="w"> </span><span class="n">getBFF</span><span class="w"> </span><span class="err">\</span><span class="w"></span>
<span class="w"> </span><span class="n">aquery</span><span class="w"> </span><span class="n">aquery</span><span class="w"></span>
<span class="w"> </span><span class="n">getFN</span><span class="w"> </span><span class="n">getFN</span><span class="w"></span>
</code></pre></div>
<p>In this case, there are no formal batches, but the vertical position
of the node in the diagram indicates <em>roughly</em> when the query actually
occurs. In practice, they may be jittered up and down a bit, and the
order in which they are executed is no longer deterministic, but if
we, like Haxl, assume that queries are referentially transparent, the
programs will return identical results.</p>
<p>Another difference is, as noted previously, that, with no explicit
batching, the <code>core.async</code> approach relies on <em>someone else</em> to
recognize a run of similar queries, and if the someone lives on the
server, this will cause increased network chatter. At the same time,
the Haxl style has the ironic side effect of separating computation
and querying into distinct, <em>sequential</em> phases, in the name of
<em>desequentializing</em> the queries within each phase.</p>
<p>In both cases, similar functional expressions were converted into
similar dependency trees, with queries occurring in similar temporal
waves. And in both cases, the waves are ordered in the same way, with the
deepest dependencies coming first, that is, in
<a href="https://en.wikipedia.org/wiki/Topological_sorting">topological sort order</a>.</p>
<p>We infer that the mechanics of converting a functional
expression into parallelized, asynchronous form (which so far, we've
done manually) ought to be very similar to the creation of <code>Result</code>
trees in a Haxl approach, and that both of them are essentially topological
sorts.</p>
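<p>As a small illustration of the topological-sort reading, the sketch below walks the quoted body of the mapped closure and assigns each query a wave number: one more than the deepest query nested among its arguments. The names <code>query-fns</code>, <code>query-heights</code> and <code>waves</code> are invented for this sketch, and <code>id</code> stands in for <code>%</code>.</p>
<div class="highlight"><pre><span></span><code>;; Hypothetical: the set of function names we treat as queries.
(def query-fns '#{getUserIds getFirstName getLastName getBff})

(defn query-heights
  "Returns [height pairs], where height is the longest chain of nested
  queries in form and pairs holds a [height name] entry per query call."
  [form]
  (if (seq? form)
    (let [results (map query-heights (rest form))
          below   (apply max 0 (map first results))
          pairs   (mapcat second results)]
      (if (query-fns (first form))
        [(inc below) (cons [(inc below) (first form)] pairs)]
        [below pairs]))
    [0 ()]))

(defn waves
  "Sorted map from wave number to the queries that can run in that wave."
  [form]
  (into (sorted-map)
        (for [[h ps] (group-by first (second (query-heights form)))]
          [h (map second ps)])))

(waves '(str (getFirstName (getBff id)) " " (getLastName id)))
;; {1 (getBff getLastName), 2 (getFirstName)}
</code></pre></div>
<p>Wave 1 corresponds to the second pass in the <code>Result</code> tree above, and wave 2 to the third.</p>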
<p>That one can achieve batching and parallelism through asynchronous code transformation
is not a novel observation. There's a 2014 paper from
Ramachandra, Chavan, Guravannavar, and Sudarshan at IIT actually called
<a href="https://arxiv.org/pdf/1402.5781v1.pdf">Program Transformations for Asynchronous and Batched Query Submission</a>.
They present a set of formal transformation rules for general procedural programs,
and refer to an implementation for Java in their <a href="https://www.cse.iitb.ac.in/infolab/dbridge/">DBridge</a>
optimizer.
Additionally, about 17 minutes into Jake Donham's
<a href="https://engineering.twitter.com/university/videos/introducing-stitch">Stitch talk</a>,
he describes a possible batching strategy as "accumulating calls until you hit some threshold or something like that,"
which seems to imply that these calls are arriving asynchronously.</p>
<p>It's worth noting that, while the IIT optimizer allows you not just to
write batching-oblivious code, but to write <em>exactly</em> the same code as
you would if queries were instantaneous, this is not an essential feature of
compile-time desequentializing. It's a convenience, but a less important
step beyond the main goal of letting code express functional
intent in a logical way.</p>
<p>(It may have occurred to the careful reader that I am not carefully
distinguishing concurrency and parallelism. This is true but uninteresting.)</p>
<h1>Gifts from the Gods</h1>
<p>What is best in life - according to the original script of Conan the Barbarian -
is to discover that there's some nominally difficult thing that might
actually be easy, because you can slough off nearly all the work onto other people. Once
we generalize "doing all our queries in batches" to "doing everything in parallel", we can
rely on a few great tools in the Clojure tool-shed.</p>
<h2>Gift I: Homoiconicity</h2>
<p>The first set of tools comes from the fact that algorithms expressed
in lisp are vastly easier to transform than in most languages. Haxl
derivatives create a <code>Result</code> structure at runtime; DBridge parses
Java code into data structures to which transformation rules are then
applied. Clojure programs are already a data structure, so we get to
skip a step and transform the code directly.</p>
<p>The transformation, as noted, is really a topological sort. Given</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">foo</span> <span class="nv">form1</span> <span class="nv">form2</span> <span class="nv">...</span><span class="p">)</span>
</code></pre></div>
<p>we simply want</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">xform</span> <span class="o">`</span><span class="p">(</span><span class="nf">foo</span> <span class="nv">form1</span> <span class="nv">form2</span> <span class="nv">...</span><span class="p">))</span>
</code></pre></div>
<p>to become</p>
<div class="highlight"><pre><span></span><code> <span class="o">`</span><span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">c1</span> <span class="p">(</span><span class="nf">go</span> <span class="o">~</span><span class="p">(</span><span class="nf">xform</span> <span class="nv">form1</span><span class="p">))</span>
<span class="nv">c2</span> <span class="p">(</span><span class="nf">go</span> <span class="o">~</span><span class="p">(</span><span class="nf">xform</span> <span class="nv">form2</span><span class="p">))</span>
<span class="nv">...</span> <span class="p">]</span>
<span class="p">(</span><span class="nf">foo</span> <span class="p">(</span><span class="nf">&lt;!</span> <span class="nv">c1</span><span class="p">)</span> <span class="p">(</span><span class="nf">&lt;!</span> <span class="nv">c2</span><span class="p">)</span> <span class="nv">...</span><span class="p">))</span>
</code></pre></div>
<p>It's clear how this parallelizes independent argument forms; we can also see that
this depth-first recursive transformation has the effect of
hoisting nested arguments earlier in the evaluation sequence.
For a boring function of a function,</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">foo</span> <span class="p">(</span><span class="nf">bar</span> <span class="nv">x</span><span class="p">))</span>
</code></pre></div>
<p>we get</p>
<div class="highlight"><pre><span></span><code> <span class="o">`</span><span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">c1</span> <span class="p">(</span><span class="nf">go</span> <span class="o">~</span><span class="p">(</span><span class="nf">xform</span> <span class="o">`</span><span class="p">(</span><span class="nf">bar</span> <span class="nv">x</span><span class="p">)))]</span>
<span class="p">(</span><span class="nf">foo</span> <span class="p">(</span><span class="nf">&lt;!</span> <span class="nv">c1</span><span class="p">)))</span>
</code></pre></div>
<p>which becomes</p>
<div class="highlight"><pre><span></span><code> <span class="o">`</span><span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">c1</span> <span class="p">(</span><span class="nf">go</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">c2</span> <span class="p">(</span><span class="nf">go</span> <span class="nv">x</span><span class="p">)]</span>
<span class="p">(</span><span class="nf">bar</span> <span class="p">(</span><span class="nf">&lt;!</span> <span class="nv">c2</span><span class="p">))))]</span>
<span class="p">(</span><span class="nf">foo</span> <span class="p">(</span><span class="nf">&lt;!</span> <span class="nv">c1</span><span class="p">)))</span>
</code></pre></div>
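<p>To make the rewriting concrete, here is a minimal sketch of <code>xform</code> written as an ordinary function over quoted forms. It is an illustration only, not the macro developed in this post: it assumes plain Clojure <code>future</code> and <code>deref</code> as stand-ins for <code>go</code> and <code>&lt;!</code>, and it naively treats every list as a function call, so special forms and macros are out of scope.</p>
<div class="highlight"><pre><span></span><code>(defn xform
  "Rewrite (f a b ...) so that each argument which is itself a call runs in
   its own future. Symbols and literals are left alone. Sketch only."
  [form]
  (if-not (seq? form)
    form                                   ; leaf: nothing to parallelize
    (let [[f &amp; args] form
          ;; one [symbol transformed-arg] pair per nested call, nil otherwise
          slots    (map (fn [a] (when (seq? a) [(gensym "c") (xform a)])) args)
          ;; binding block: c1 (future arg1) c2 (future arg2) ...
          bs       (mapcat (fn [[c a]] [c `(future ~a)]) (remove nil? slots))
          ;; call-site arguments: (deref c) for hoisted calls, the original otherwise
          new-args (map (fn [slot a] (if slot `(deref ~(first slot)) a)) slots args)]
      (if (seq bs)
        `(let [~@bs] (~f ~@new-args))
        form))))

(eval (xform '(+ (* 2 3) (- 10 4))))   ;=&gt; 12
(xform '(+ 1 2))                       ;=&gt; (+ 1 2)
</code></pre></div>
<p>Because the recursion happens before the enclosing <code>let</code> is built, nested calls such as <code>(foo (bar (baz x)))</code> end up as futures inside futures, which is the hoisting described above.</p>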
<p>In building up the new structure, we traverse the call graph exactly once, so
the whole procedure is O(N), where N is the total number of arguments of all
functions in the program.
It seems that the Haxl procedure requires
re-generating the graph each time leaf nodes are extracted, making it
O(N log N). This reflects the fact that
the Haxl ordering is stronger than topological: not only
is every query run before its result is required (which is of course a necessary
ordering), no
query is run before the previous batch completes (which is really not
necessary).</p>
<p>Once you get past a few ritual incantations, it is almost trivial to produce a macro
that does our transformation:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defmacro </span><span class="nv">parallelize-func-stupid</span> <span class="p">[</span><span class="nv">form</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[[</span><span class="nv">f</span> <span class="o">&</span> <span class="nv">args</span><span class="p">]</span> <span class="nv">form</span> <span class="c1">;; Extract function and arguments</span>
<span class="nv">cs</span> <span class="p">(</span><span class="nf">repeatedly</span> <span class="p">(</span><span class="nb">count </span><span class="nv">args</span><span class="p">)</span> <span class="nv">gensym</span><span class="p">)</span> <span class="c1">;; Generate some channel names</span>
<span class="nv">pargs</span> <span class="p">(</span><span class="nb">map </span><span class="nv">parallelize-func-stupid</span> <span class="nv">args</span><span class="p">)</span> <span class="c1">;; Recursively transform the args</span>
<span class="nv">bs</span> <span class="p">(</span><span class="nb">interleave </span><span class="nv">cs</span> <span class="c1">;; Construct the binding block</span>
<span class="p">(</span><span class="nb">map </span><span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">parg</span><span class="p">]</span> <span class="o">`</span><span class="p">(</span><span class="nf">a/go</span> <span class="o">~</span><span class="nv">parg</span><span class="p">))</span> <span class="nv">pargs</span><span class="p">))</span>
<span class="nv">args</span> <span class="p">(</span><span class="nb">map </span><span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">c</span><span class="p">]</span> <span class="o">`</span><span class="p">(</span><span class="nf">a/&lt;!</span> <span class="o">~</span><span class="nv">c</span><span class="p">))</span> <span class="nv">cs</span><span class="p">)]</span> <span class="c1">;; Deref the channels</span>
<span class="o">`</span><span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="o">~@</span><span class="nv">bs</span><span class="p">]</span> <span class="p">(</span><span class="o">~</span><span class="nv">f</span> <span class="o">~@</span><span class="nv">args</span><span class="p">))))</span>
</code></pre></div>
<p>Presto!</p>
<p>Well, an incomplete presto. This will actually blow up for arguments that are not
themselves function applications, which means it only works on infinitely sized programs.
Also, we've been acting as if <code>go</code> returned a promise that can be dereferenced an
arbitrary number of times, which of course it really doesn't.
We'll do better below.</p>
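<p>The second complaint is easy to see at the REPL. The following sketch assumes <code>clojure.core.async</code> is on the classpath and uses the blocking <code>&lt;!!</code> and <code>&gt;!!</code> operations so that it can run outside a <code>go</code> block. The channel returned by <code>go</code> delivers its value once and then closes, whereas a <code>promise-chan</code> hands the same value to every taker.</p>
<div class="highlight"><pre><span></span><code>(require '[clojure.core.async :as a])

;; The channel returned by go yields the block's value once, then is closed,
;; so a second take returns nil.
(defn take-twice-from-go []
  (let [c (a/go 42)]
    [(a/&lt;!! c) (a/&lt;!! c)]))

;; A promise-chan keeps delivering the first value put on it.
(defn take-twice-from-promise-chan []
  (let [p (a/promise-chan)]
    (a/&gt;!! p 42)
    [(a/&lt;!! p) (a/&lt;!! p)]))

(take-twice-from-go)            ;=&gt; [42 nil]
(take-twice-from-promise-chan)  ;=&gt; [42 42]
</code></pre></div>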
<h2>Gift II: JVM Fibers and Quasar</h2>
<p>So far, all my parallelization examples have used <code>core.async</code>, which is certainly not a bad way to go,
but it might be fun to look at an interesting development in the JVM world.
<a href="http://docs.paralleluniverse.co/quasar/">Quasar</a> is an implementation of fibers in Java,
with a Clojure API called <a href="http://docs.paralleluniverse.co/pulsar/">Pulsar</a>.</p>
<p>A fiber presents itself as very much like a normal thread, but much
cheaper to create. Like other inexpensive concurrency constructs,
fibers are, under the hood, essentially co-routines - multiple streams
of execution handing off control to each other cooperatively, so there
may be vastly more such streams than the number of threads available
(though of course the thread pool does limit the number of fibers that
can be actively doing anything at a given time).</p>
<p>In <code>core.async</code>, cooperation is achieved by rewriting code into
state-machine form, using macros. In Quasar, it's done by rewriting
at the byte-code level, so that methods marked "suspendable" can give
up control when waiting on each other. An advantage of the bytecode
approach is that stack traces are preserved, making debugging easier,
and fibers are said to be more efficient than channels when there are
vast numbers of them.</p>
<p>One of the available Pulsar APIs is a drop-in replacement for
<code>core.async</code>, but the requirement to wrap channel operations in a <code>go</code>
block now feels like an artificial inconvenience.
Additionally, what we need for
haxilating is not really a CSP channel but a promise. While
<code>clojure.core.async</code> implements promises on top of channels, the
Pulsar promise is closer to the core fiber abstraction, and
<code>co.paralleluniverse.pulsar.async</code> channels represent an additional
layer of abstraction, so I preferred to use promises directly.</p>
<p>The basic element of a Pulsar program is the suspendable function,
which you define with <code>defsfn</code>, e.g.</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">q/defsfn</span> <span class="nv">my-suspendable</span> <span class="p">[</span><span class="nv">x</span><span class="p">]</span> <span class="p">(</span><span class="nf">other-suspendable</span> <span class="nv">x</span><span class="p">))</span>
</code></pre></div>
<p>It's not turtles all the way down: at some point, one of the
suspendables will actually give up control while waiting for a
callback from some external event, and Pulsar provides an <code>await</code>
macro to facilitate that:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">q/defsfn</span> <span class="nv">other-suspendable</span> <span class="p">[</span><span class="nv">x</span><span class="p">]</span> <span class="o">@</span><span class="p">(</span><span class="nf">fiber</span> <span class="p">(</span><span class="nf">q/await</span> <span class="nv">httpkit/request</span> <span class="nv">x</span><span class="p">)))</span>
</code></pre></div>
<p>Here, the macro expands to call the workhorse <code>httpkit/request</code> with
our argument <code>x</code>, plus a callback, and then faux blocks (meaning that
the Quasar runtime parks us) until that
callback is called. (Actually, Pulsar has already wrapped
<code>httpkit/kit</code> in a <code>defreq</code> macro, from which this example is
shamelessly copied.) One thing that's cool about these suspendable
functions is that they can be used in normal code if you want, but you
can identify them with <code>suspendable?</code>, which is going to be useful in
determining exactly which forms will benefit from parallelization.</p>
<p>The function parallelization macro in Pulsar-land is pretty much the
same as in <code>core.async</code> country, except that we create promises using
<code>(q/promise (fn [] form-to-fulfill-promise))</code>:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defmacro </span><span class="nv">parallelize-func-stupid2</span> <span class="p">[</span><span class="nv">form</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[[</span><span class="nv">f</span> <span class="o">&</span> <span class="nv">args</span><span class="p">]</span> <span class="nv">form</span>
<span class="nv">ps</span> <span class="p">(</span><span class="nf">repeatedly</span> <span class="p">(</span><span class="nb">count </span><span class="nv">args</span><span class="p">)</span> <span class="nv">gensym</span><span class="p">)</span>
<span class="nv">pargs</span> <span class="p">(</span><span class="nb">map </span><span class="nv">parallelize-func-stupid2</span> <span class="nv">args</span><span class="p">)</span>
<span class="nv">bs</span> <span class="p">(</span><span class="nb">interleave </span><span class="nv">ps</span> <span class="p">(</span><span class="nb">map </span><span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">parg</span><span class="p">]</span> <span class="o">`</span><span class="p">(</span><span class="nf">q/promise</span> <span class="p">(</span><span class="k">fn </span><span class="p">[]</span> <span class="o">~</span><span class="nv">parg</span><span class="p">)))</span> <span class="nv">pargs</span><span class="p">))</span>
<span class="nv">args</span> <span class="p">(</span><span class="nb">map </span><span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">p</span><span class="p">]</span> <span class="o">`</span><span class="p">(</span><span class="nb">deref </span><span class="o">~</span><span class="nv">p</span><span class="p">))</span> <span class="nv">ps</span><span class="p">)]</span>
<span class="o">`</span><span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="o">~@</span><span class="nv">bs</span><span class="p">]</span> <span class="p">(</span><span class="o">~</span><span class="nv">f</span> <span class="o">~@</span><span class="nv">args</span><span class="p">))))</span>
</code></pre></div>
<h1>QAXL - Quasar Haxl</h1>
<p>Per their evocative suffix, the above parallelization macros are on the
dimwitted side. Fortunately, we mostly have to train them <em>not</em> to do
certain things, like insisting on transforming everything whether or not
it needs it.</p>
<p>Qaxl is structured around a central <code>parallelize</code> dispatch function that takes a form
and invokes one of several <code>parallelize-particular-form</code>s,
which in turn call <code>parallelize</code> recursively on elements of the form as needed.
Each of these functions returns a hash-map of</p>
<ul>
<li><code>:par</code> - which is <code>true</code> if any <code>q/suspendable?</code> functions were found within the form at any level</li>
<li><code>:form</code> - the parallelized form itself</li>
</ul>
<p>When <code>:par</code> is false, we know we can just leave that form alone.</p>
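<p>As a rough picture of that contract, here is a self-contained miniature of the dispatch idea. It is a sketch under loud assumptions rather than Qaxl itself: <code>future</code> stands in for the Pulsar promise, a hypothetical set of symbols named <code>slow?</code> stands in for <code>q/suspendable?</code>, the helper names <code>parallelize-symbol</code> and <code>parallelize-call</code> are invented for the example, and symbol substitution is left out entirely.</p>
<div class="highlight"><pre><span></span><code>(def slow? #{'fetch})   ; hypothetical stand-in for q/suspendable?

(declare parallelize)

(defn- parallelize-symbol [s]
  {:form s :par (contains? slow? s)})

(defn- parallelize-call [[f &amp; args]]
  (let [head (parallelize f)
        ps   (map parallelize args)
        ;; a fresh symbol for each argument worth running concurrently
        syms (map (fn [p] (when (:par p) (gensym "p"))) ps)
        bs   (mapcat (fn [s p] (when s [s `(future ~(:form p))])) syms ps)
        args (map (fn [s p] (if s `(deref ~s) (:form p))) syms ps)
        call `(~(:form head) ~@args)]
    {:form (if (seq bs) `(let [~@bs] ~call) call)
     :par  (boolean (or (:par head) (some :par ps)))}))

(defn parallelize
  "Dispatch on the shape of the form; always return {:form ... :par ...}."
  [form]
  (cond
    (symbol? form)                 (parallelize-symbol form)
    (and (seq? form) (seq form))   (parallelize-call form)
    :else                          {:form form :par false}))
</code></pre></div>
<p>With some function <code>fetch</code> defined, <code>(parallelize '(+ (fetch 1) (fetch 2) (* 3 4)))</code> returns <code>:par true</code> and a <code>let</code> that binds two futures, while <code>(parallelize '(* 3 4))</code> returns the form untouched with <code>:par false</code>.</p>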
<p>In addition to the input form, it will also be necessary to pass a
hash-map <code>s2p</code> of symbols to promises. Usually, <code>s2p</code> will be passed
down unchanged when processing sub-forms, but it will be augmented in
the course of processing a <code>(let ...)</code> form.</p>
<p>The fundamental building block for a number of form handlers is a utility that takes a sequence of
forms, parallelizes each into a <code>{:par ... :form ...}</code> map, and pulls out the ones where <code>:par</code> is true
into <code>let</code> bindings of promises. This is a reduction over the parallelized maps by a function
<code>(fn [[bs args] p] ...)</code>, which
accumulates bindings into a vector <code>bs</code>, and arguments into a vector <code>args</code> (just
assume for now that the main <code>parallelize</code> dispatch function exists):</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn- </span><span class="nv">par-to-bindings</span> <span class="p">[</span><span class="nv">forms</span> <span class="nv">s2p</span><span class="p">]</span> <span class="c1">;; =&gt; [ps bs args]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">ps</span> <span class="p">(</span><span class="nb">map </span><span class="o">#</span><span class="p">(</span><span class="nf">parallelize</span> <span class="nv">%</span> <span class="nv">s2p</span><span class="p">)</span> <span class="nv">forms</span><span class="p">)</span>
<span class="p">[</span><span class="nv">bs</span> <span class="nv">args</span><span class="p">]</span> <span class="p">(</span><span class="nb">reduce </span><span class="p">(</span><span class="k">fn </span><span class="p">[[</span><span class="nv">bs</span> <span class="nv">args</span><span class="p">]</span> <span class="nv">p</span><span class="p">]</span>
<span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="ss">:par</span> <span class="nv">p</span><span class="p">)</span>
</code></pre></div>
<p>If <code>p</code> is <code>:par</code>allelizable, then <code>bs</code> is augmented to bind a new promise, and <code>args</code> is
augmented to dereference that promise,</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">ch</span> <span class="p">(</span><span class="nb">gensym </span><span class="s">"p"</span><span class="p">)]</span>
<span class="p">[(</span><span class="nb">concat </span><span class="nv">bs</span> <span class="p">[</span><span class="nv">ch</span> <span class="o">`</span><span class="p">(</span><span class="nf">q/promise</span> <span class="p">(</span><span class="k">fn </span><span class="p">[]</span> <span class="o">~</span><span class="p">(</span><span class="ss">:form</span> <span class="nv">p</span><span class="p">)))])</span>
<span class="p">(</span><span class="nb">conj </span><span class="nv">args</span> <span class="o">`</span><span class="p">(</span><span class="nb">deref </span><span class="o">~</span><span class="nv">ch</span><span class="p">))])</span>
</code></pre></div>
<p>while if it's not parallelizable, <code>bs</code> stays the same, and we just append <code>p</code>'s form to <code>args</code>:</p>
<div class="highlight"><pre><span></span><code> <span class="p">[</span><span class="nv">bs</span> <span class="p">(</span><span class="nb">conj </span><span class="nv">args</span> <span class="p">(</span><span class="ss">:form</span> <span class="nv">p</span><span class="p">))]))</span>
<span class="p">[[]</span> <span class="p">[]]</span>
<span class="nv">ps</span><span class="p">)]</span>
<span class="p">[</span><span class="nv">ps</span> <span class="nv">bs</span> <span class="p">(</span><span class="nb">seq </span><span class="nv">args</span><span class="p">)]))</span>
</code></pre></div>
<p>With the binding function defined, <code>parallelize-func</code> doesn't have to do much more than
build the <code>let</code> if there are promise bindings, or a normal form if there aren't.</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn- </span><span class="nv">parallelize-func</span> <span class="p">[</span><span class="nv">forms</span> <span class="nv">s2p</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[[</span><span class="nv">ps</span> <span class="nv">bs</span> <span class="nv">forms</span><span class="p">]</span> <span class="p">(</span><span class="nf">par-to-bindings</span> <span class="nv">forms</span> <span class="nv">s2p</span><span class="p">)]</span>
<span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nb">seq </span><span class="nv">bs</span><span class="p">)</span>
<span class="p">{</span><span class="ss">:form</span> <span class="o">`</span><span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="o">~@</span><span class="nv">bs</span><span class="p">]</span> <span class="p">(</span><span class="o">~@</span><span class="nv">forms</span><span class="p">))</span> <span class="ss">:par</span> <span class="nv">true</span><span class="p">}</span>
<span class="p">{</span><span class="ss">:form</span> <span class="o">`</span><span class="p">(</span><span class="o">~@</span><span class="nv">forms</span><span class="p">)})))</span>
</code></pre></div>
<p>In some cases, we can't just transform the way a function is called, but we need to
actually change the function. For example, it would be silly to transform</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nb">map </span><span class="nv">foo</span> <span class="p">[</span><span class="mi">1</span> <span class="mi">2</span> <span class="mi">3</span><span class="p">])</span>
</code></pre></div>
<p>into</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">let</span>
<span class="p">[</span><span class="nv">p30443</span>
<span class="p">(</span><span class="nf">q/promise</span> <span class="p">(</span><span class="k">fn </span><span class="p">[]</span> <span class="nv">foo</span><span class="p">))]</span>
<span class="p">(</span><span class="nb">map </span><span class="o">@</span><span class="nv">p30443</span> <span class="p">[</span><span class="mi">1</span> <span class="mi">2</span> <span class="mi">3</span><span class="p">]))</span>
</code></pre></div>
<p>We really need a completely new map, which doesn't turn the function itself
into a promise, but instead creates promises for each <em>application</em> of the function,
<code>deref</code>ing them in bulk after they've all been launched.</p>
<p>Haxl and friends have this problem too. You can see the explicit <code>.traverse</code> in the
Scala example above, and Haxl actually implements <code>mapM</code> for their fetch class as
<code>traverse</code>. In our case, since we've got the whole code form to play with, we can keep
the same name, but only bother transforming it if necessary.</p>
<p>We start with our <code>par-to-bindings</code> utility, after which we'll know if there's
anything parallel going on:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn- </span><span class="nv">parallelize-map</span> <span class="p">[[</span><span class="nv">_</span> <span class="o">&</span> <span class="nv">forms</span><span class="p">]</span> <span class="nv">s2p</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[[</span><span class="nv">ps</span> <span class="nv">bs</span> <span class="p">[</span><span class="nv">f</span> <span class="o">&</span> <span class="nv">args</span><span class="p">]]</span> <span class="p">(</span><span class="nf">par-to-bindings</span> <span class="nv">forms</span> <span class="nv">s2p</span><span class="p">)</span>
<span class="nv">m1</span> <span class="p">(</span><span class="nb">if-not </span><span class="p">(</span><span class="ss">:par</span> <span class="p">(</span><span class="nb">first </span><span class="nv">ps</span><span class="p">))</span>
<span class="o">`</span><span class="p">(</span><span class="nb">map </span><span class="o">~</span><span class="nv">f</span> <span class="o">~@</span><span class="nv">args</span><span class="p">)</span>
</code></pre></div>
<p>If we do need to parallelize, we do so by <code>map</code>ping twice: first to create
a sequence of promises, and then to <code>deref</code> them:</p>
<div class="highlight"><pre><span></span><code> <span class="o">`</span><span class="p">(</span><span class="nb">map </span><span class="nv">deref</span>
<span class="p">(</span><span class="nb">doall </span><span class="p">(</span><span class="nb">map </span><span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="o">&</span> <span class="nv">xs#</span><span class="p">]</span>
<span class="p">(</span><span class="nf">q/promise</span> <span class="p">(</span><span class="k">fn </span><span class="p">[]</span> <span class="p">(</span><span class="nb">apply </span><span class="o">~</span><span class="nv">f</span> <span class="nv">xs#</span><span class="p">))))</span>
<span class="o">~@</span><span class="nv">args</span> <span class="p">))))]</span>
<span class="p">{</span><span class="ss">:form</span> <span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nb">seq </span><span class="nv">bs</span><span class="p">)</span> <span class="o">`</span><span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="o">~@</span><span class="nv">bs</span><span class="p">]</span> <span class="o">~</span><span class="nv">m1</span><span class="p">)</span> <span class="nv">m1</span><span class="p">)</span> <span class="ss">:par</span> <span class="p">(</span><span class="nb">some </span><span class="ss">:par</span> <span class="nv">ps</span><span class="p">)}))</span>
</code></pre></div>
<p>Note the <code>doall</code>, without which laziness will resequentialize our masterpieces.</p>
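<p>The effect of the <code>doall</code> can be checked without any timing at all. The sketch below is an assumption-laden stand-in for the generated code: it uses <code>future</code> rather than a Pulsar promise, and the names <code>launch</code>, <code>collect</code>, <code>run</code> and <code>log</code> are invented for the example. Each launch and each dereference is logged on the calling thread, so the order of log entries shows whether all the work was started before any of it was awaited.</p>
<div class="highlight"><pre><span></span><code>(def log (atom []))

(defn launch [x]
  (swap! log conj [:launch x])       ; logged on the calling thread
  (future (* x x)))

(defn collect [fut]
  (let [v @fut]
    (swap! log conj [:deref v])
    v))

(defn run [eager?]
  (reset! log [])
  (let [xs       (list 1 2 3)        ; a list is not chunked, so map is lazy per element
        launched (map launch xs)
        futs     (if eager? (doall launched) launched)]
    (doall (map collect futs))
    @log))

(run false)
;=&gt; [[:launch 1] [:deref 1] [:launch 2] [:deref 4] [:launch 3] [:deref 9]]
(run true)
;=&gt; [[:launch 1] [:launch 2] [:launch 3] [:deref 1] [:deref 4] [:deref 9]]
</code></pre></div>
<p>With a vector as input the lazy version can appear to work, because chunked sequences realize a batch of elements at a time; the explicit <code>doall</code> removes that dependence on the input type.</p>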
<p>Not surprisingly, parallelizing existing <code>(let ...)</code> forms
is among the more complicated operations, because we need to remember which symbols in
the original code correspond to which promises, causing <code>s2p</code> to evolve as we process.
As usual, "evolve" means a reduction over something, in this case
the original bindings, while accumulating in <code>bs1</code> the bindings of promises to
generated symbols, in <code>bs2</code> the bindings of original symbols to the deref'd promises,
and in <code>s2p</code> the new map of symbol bindings.</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">parallelize-let</span> <span class="p">[[</span><span class="nv">_</span> <span class="nv">bs</span> <span class="o">&</span> <span class="nv">forms</span><span class="p">]</span> <span class="nv">s2p</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[[</span><span class="nv">bs1</span> <span class="nv">bs2</span><span class="p">]</span> <span class="p">(</span><span class="nf">reduce</span>
<span class="p">(</span><span class="k">fn </span><span class="p">[[</span><span class="nv">bs1</span> <span class="nv">bs2</span> <span class="nv">s2p</span><span class="p">]</span> <span class="p">[</span><span class="nv">s</span> <span class="nv">form</span><span class="p">]]</span>
</code></pre></div>
<p>We parallelize the righthand side of each original binding, using the current symbol map:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="k">let </span><span class="p">[{</span><span class="ss">:keys</span> <span class="p">[</span><span class="nv">form</span> <span class="nv">par</span><span class="p">]}</span> <span class="p">(</span><span class="nf">parallelize</span> <span class="nv">form</span> <span class="nv">s2p</span><span class="p">)]</span>
<span class="p">(</span><span class="k">if </span><span class="nv">par</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">ch</span> <span class="p">(</span><span class="nb">gensym </span><span class="s">"p"</span><span class="p">)]</span>
</code></pre></div>
<p>Bind the promises:</p>
<div class="highlight"><pre><span></span><code> <span class="p">[(</span><span class="nb">concat </span><span class="nv">bs1</span> <span class="p">[</span><span class="nv">ch</span> <span class="o">`</span><span class="p">(</span><span class="nf">q/promise</span> <span class="p">(</span><span class="k">fn </span><span class="p">[]</span> <span class="o">~</span><span class="nv">form</span><span class="p">))])</span>
</code></pre></div>
<p>Bind original symbols to the deref'd promises:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nb">concat </span><span class="nv">bs2</span> <span class="p">[</span><span class="nv">s</span> <span class="o">`</span><span class="p">(</span><span class="nb">deref </span><span class="o">~</span><span class="nv">ch</span><span class="p">)])</span>
</code></pre></div>
<p>Augment the symbol table:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nb">assoc </span><span class="nv">s2p</span> <span class="nv">s</span> <span class="nv">ch</span><span class="p">)])</span>
</code></pre></div>
<p>Unless the rhs wasn't parallelized, in which case we paste the binding as-is:</p>
<div class="highlight"><pre><span></span><code> <span class="p">[</span><span class="nv">bs1</span> <span class="p">(</span><span class="nb">concat </span><span class="nv">bs2</span> <span class="p">[</span><span class="nv">s</span> <span class="nv">form</span><span class="p">])</span> <span class="nv">s2p</span><span class="p">])))</span>
<span class="p">[[]</span> <span class="p">[]</span> <span class="nv">s2p</span><span class="p">]</span>
<span class="p">(</span><span class="nf">partition</span> <span class="mi">2</span> <span class="nv">bs</span><span class="p">))</span>
<span class="nv">ps</span> <span class="p">(</span><span class="nf">parallelize-forms</span> <span class="nv">forms</span> <span class="nv">s2p</span><span class="p">)]</span>
<span class="p">{</span><span class="ss">:form</span> <span class="o">`</span><span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="o">~@</span><span class="p">(</span><span class="nb">concat </span><span class="nv">bs1</span> <span class="nv">bs2</span><span class="p">)]</span> <span class="o">~@</span><span class="p">(</span><span class="nb">map </span><span class="ss">:form</span> <span class="nv">ps</span><span class="p">))</span> <span class="ss">:par</span> <span class="p">(</span><span class="nb">or </span><span class="p">(</span><span class="nb">seq </span><span class="nv">bs1</span><span class="p">)</span> <span class="p">(</span><span class="nb">some </span><span class="ss">:par</span> <span class="nv">ps</span><span class="p">))}))</span>
</code></pre></div>
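<p>The ordering of the generated bindings is easier to see on a small hand-expanded example. This is a simplified picture, not actual macro output: it assumes <code>future</code> in place of the Pulsar promise, a made-up <code>slow-inc</code> in place of a suspendable query, and readable names instead of gensyms, and it omits the extra promise that the function handler would wrap around a substituted symbol.</p>
<div class="highlight"><pre><span></span><code>(defn slow-inc [x] (Thread/sleep 50) (inc x))

;; What the programmer writes: c depends on a, b is independent.
(defn original []
  (let [a (slow-inc 1)
        b (slow-inc 10)
        c (slow-inc a)]
    (+ a b c)))

;; Shape of the rewritten form: promise bindings (bs1) first,
;; then the original symbols bound to dereferenced promises (bs2).
(defn rewritten []
  (let [p1 (future (slow-inc 1))
        p2 (future (slow-inc 10))
        p3 (future (slow-inc (deref p1)))   ; a replaced via the symbol table
        a  (deref p1)
        b  (deref p2)
        c  (deref p3)]
    (+ a b c)))

(original)   ;=&gt; 16
(rewritten)  ;=&gt; 16
</code></pre></div>
<p>Since <code>a</code> is not yet bound when <code>p3</code> is created, the right-hand side has to refer to <code>(deref p1)</code> instead, which is the job of the symbol-to-promise map.</p>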
<p>This relies on dispatching in recursions of <code>parallelize</code> to <code>parallelize-subst</code>, which
makes the substitutions dictated by <code>s2p</code></p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn- </span><span class="nv">parallelize-subst</span> <span class="p">[</span><span class="nv">s</span> <span class="nv">s2p</span><span class="p">]</span>
<span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nb">symbol? </span><span class="nv">s</span><span class="p">)</span>
<span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nb">contains? </span><span class="nv">s2p</span> <span class="nv">s</span><span class="p">)</span>
<span class="p">{</span><span class="ss">:form</span> <span class="o">`</span><span class="p">(</span><span class="nb">deref </span><span class="o">~</span><span class="p">(</span><span class="nb">get </span><span class="nv">s2p</span> <span class="nv">s</span><span class="p">))</span> <span class="ss">:s2p</span> <span class="nv">s2p</span> <span class="ss">:par</span> <span class="nv">true</span><span class="p">}</span>
</code></pre></div>
<p>and, while we're at it, also marks symbols as parallelizable if they represent suspendable functions:</p>
<div class="highlight"><pre><span></span><code> <span class="p">{</span><span class="ss">:form</span> <span class="nv">s</span> <span class="ss">:par</span> <span class="p">(</span><span class="nf">some-></span> <span class="nv">s</span> <span class="nb">resolve var-get </span><span class="nv">q/suspendable?</span><span class="p">)</span> <span class="ss">:s2p</span> <span class="nv">s2p</span><span class="p">})</span>
<span class="p">{</span><span class="ss">:form</span> <span class="nv">s</span> <span class="ss">:s2p</span> <span class="nv">s2p</span><span class="p">}))</span>
</code></pre></div>
<h1>Are we done?</h1>
<p>If the goal is to mirror the <code>do</code>/<code>mapM</code> style of Haxl, then maybe yes.</p>
<p>With our query faked as a simple Quasar suspendable,</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">q/defsfn</span> <span class="nv">qquery</span> <span class="p">[</span><span class="nv">s</span> <span class="nv">i</span><span class="p">]</span> <span class="p">(</span><span class="nb">println </span><span class="p">(</span><span class="nb">str </span><span class="nv">s</span> <span class="s">"("</span> <span class="nv">i</span> <span class="s">")"</span><span class="p">))</span> <span class="p">(</span><span class="nf">q/sleep</span> <span class="mi">2000</span><span class="p">)</span> <span class="p">(</span><span class="nb">get </span><span class="nv">aanswers</span> <span class="p">[</span><span class="nv">s</span> <span class="nv">i</span><span class="p">]))</span>
</code></pre></div>
<p>(<code>q/sleep</code> is non-blocking in fiber-land), the test program is reasonably pretty:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">qaxl</span> <span class="p">(</span><span class="nb">map </span><span class="o">#</span><span class="p">(</span><span class="nb">str </span><span class="p">(</span><span class="nf">qquery</span> <span class="s">"getName"</span> <span class="p">(</span><span class="nf">qquery</span> <span class="s">"getBff"</span> <span class="nv">%</span><span class="p">))</span> <span class="p">(</span><span class="nf">qquery</span> <span class="s">"getLastName"</span> <span class="nv">%</span><span class="p">))</span>
<span class="p">(</span><span class="nf">qquery</span> <span class="s">"getIds"</span> <span class="mi">37</span><span class="p">)))</span>
</code></pre></div>
<p>We get</p>
<div class="highlight"><pre><span></span><code><span class="nv">getIds</span><span class="ss">(</span><span class="mi">37</span><span class="ss">)</span>
<span class="o"><</span><span class="k">pause</span><span class="o">></span>
<span class="nv">getLastName</span><span class="ss">(</span><span class="mi">321</span><span class="ss">)</span> <span class="nv">getBff</span><span class="ss">(</span><span class="mi">123</span><span class="ss">)</span> <span class="nv">getBff</span><span class="ss">(</span><span class="mi">321</span><span class="ss">)</span> <span class="nv">getLastName</span><span class="ss">(</span><span class="mi">123</span><span class="ss">)</span>
<span class="o"><</span><span class="k">pause</span><span class="o">></span>
<span class="nv">getName</span><span class="ss">(</span><span class="mi">888</span><span class="ss">)</span> <span class="nv">getName</span><span class="ss">(</span><span class="mi">777</span><span class="ss">)</span>
<span class="o"><</span><span class="k">pause</span><span class="o">></span>
<span class="ss">(</span><span class="s2">"</span><span class="s">MuttTrump</span><span class="s2">"</span> <span class="s2">"</span><span class="s">JeffPutin</span><span class="s2">"</span><span class="ss">)</span>
</code></pre></div>
<p>which is exactly what we want.</p>
<p>In around a
<a href="https://gist.github.com/pnf/2d2fbba570106d4aa14d4820e3260fee">hundred lines</a>
of code, we've handled function application,
<code>map</code> and simple <code>let</code> (minus the fancy destructuring capabilities),
but if the goal is to make arbitrary code fully parallelized simply by
wrapping it in our <code>qaxl</code> macro, there's a long way to go. Handling
the special forms, macros and higher-order functions of the standard
Clojure libraries would be daunting enough. Dealing correctly with
anything we might find in a library may be impossible.</p>
<h1>Takeaways</h1>
<ol>
<li>Launching waves of asynchronous queries is pretty much equivalent
to full-on batching, and if you write non-blocking code for a living,
you've probably been doing it without thinking about it.</li>
<li>The transformation into waves is basically a topological sort.</li>
<li>In Haxl, you sort by repeatedly pruning the leaves of a sequence
of trees emitted by a monadic expression.</li>
<li>In Qaxl, you sort the normal way, by DFS of the tree.</li>
<li>With both, the concurrency is implicit in the structure of the computation, and
there is <em>still no fork.</em></li>
</ol>
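<p>To make the "waves are a topological sort" point concrete, here is a minimal sketch in Python rather than Clojure. It is not the Qaxl implementation: it peels off every node whose dependencies are already satisfied, which is closer to the leaf-pruning described for Haxl than to a DFS. The function name <code>waves</code> and the dependency map are assumptions made up for illustration; the query labels echo the test run above, and which Bff belongs to which ID is deliberately left unspecified.</p>

```python
def waves(deps):
    """Group nodes into waves that could each be launched concurrently.

    deps maps each node to the set of nodes it depends on.
    Assumption: every dependency also appears as a key in deps.
    """
    remaining = {node: set(d) for node, d in deps.items()}
    result = []
    while remaining:
        # Everything with no unsatisfied dependency can run in this wave.
        ready = sorted(n for n, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle (or a dependency missing from deps)")
        result.append(ready)
        for n in ready:
            del remaining[n]
        # Mark this wave's nodes as satisfied for everyone still waiting.
        for d in remaining.values():
            d.difference_update(ready)
    return result


# Hypothetical dependency map mirroring the shape of the test program.
deps = {
    "getIds(37)": set(),
    "getBff(123)": {"getIds(37)"},
    "getBff(321)": {"getIds(37)"},
    "getLastName(123)": {"getIds(37)"},
    "getLastName(321)": {"getIds(37)"},
    "getName(bff of 123)": {"getBff(123)"},
    "getName(bff of 321)": {"getBff(321)"},
}

for wave in waves(deps):
    print(wave)
```

<p>Under these assumptions the sketch produces three waves, matching the three pauses in the output above.</p>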
<div class="footnote">
<hr>
<ol>
<li id="fn:nodejs">
<p>See, for example, chapter 6 of <a href="https://www.nodejsdesignpatterns.com/">NodeJS Design Patterns</a>. <a class="footnote-backref" href="#fnref:nodejs" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:sametime">
<p>The <a href="http://developer.couchbase.com/documentation/server/4.0/developer-guide/batching-operations.html">couchbase documentation</a>,
for example, explains that </p>
<blockquote>
<p>Asynchronous clients inherently batch operations: because the
application receives the response at a later stage in the
application, it may attain batching via issuing many requests in
sequence.</p>
</blockquote>
<p>implying that either the server or the client-side API code is doing
something clever when similar queries are made around the same time.
<a href="http://williamedwardscoder.tumblr.com/post/16516763725/how-i-got-massively-faster-db-with-async-batching">This post</a> makes the point more generally. <a class="footnote-backref" href="#fnref:sametime" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
</ol>
</div>Losing my religion: Switching to windows + linux under Hyper-V2016-10-27T00:00:00-04:002016-10-27T00:00:00-04:00Peter Fraenkeltag:blog.podsnap.com,2016-10-27:/windy.html<p>(See <strong>Updates</strong> section at end for latest notes.)</p>
<p><img alt="virgil-n-me" src="./images/dante.jpg"></p>
<p>In the middle of the journey of our life I came to myself within a
dark wood where I was no longer comfortable identifying myself as a
Mac user. Superficially, the relationship with my MacBook Air had
been rewarding. It lasted all day, ran the necessary few bits of
commercial software I needed, and otherwise did a great job of
pretending to be running unix. Also, I felt a pleasant sense of
well-moderated individuality when my fellow developers and I exchanged
knowing glances at our discreetly weathered, FP-inflected decal
collections.</p>
<p>But there is an equivalent, in the realm of our relationships with machinery,
to the discovery that someone we know has seen Cats more than a dozen times or
regards Robert Ludlum as a serious author. For me, that came with the breathless
speculation about an upcoming Mac "refresh," said to feature
an "e-Paper function key row" and better multimedia Siri integration (or something).
This sort of vulgarity did not sit well with my professed enthusiasm for middle-period
Mahler and/or the Oxford comma.</p>
<p>Also, I couldn't follow along with
<a href="http://jvns.ca/categories/strace/">cool posts on strace</a>.</p>
<p>If I had the courage of my convictions, I would just run linux and be
done with it, but hewing that way without revealing myself as a
pretentious idiot would be difficult. To use linux without embarrassment,
one actually needs to know what one is doing, and I do not.</p>
<p>Armed with this self-knowledge, I hatched the idea of selling the Mac and buying
a newish Windows 10 laptop. I would be a contrarian! Not only that, but, since
I don't hang around with anyone who knows anything about developing on Windows,
I could easily spin my accomplishments as impressive.</p>
<h1><img alt="Reality bytes" src="./images/vampiri.jpg"></h1>
<p>Getting anything done on Windows is a bloody fucking nightmare. They made a big deal
recently about <a href="https://msdn.microsoft.com/en-us/commandline/wsl/about">WSL</a>, the "Windows
Subsystem for Linux". It's also known as "bash for windows", which is a grotesquely
stupid name that belies the ambitious goal of executing undiluted linux elf64
binaries by fully emulating the linux kernel interface. Unfortunately,
it <a href="https://github.com/Microsoft/BashOnWindows/issues">doesn't really work</a>,
a fact which is both illustrated and exemplified by
<a href="https://github.com/ethanhs/WSL-Programs">lists</a> of programs you can run.
Yeah, it does a decent job of letting you pipe about text, but, while you
might evaluate "programs you can run" with the 80:20 rule, the decency of an
API is better appraised with the 100:0 rule. Installing a mountain of dependencies
and then discovering that
<a href="https://github.com/Microsoft/BashOnWindows/issues/541">the JVM hangs</a>
is the sort of experience that not even masochists enjoy.</p>
<p>If you are indeed mostly interested in a Potemkin village of apparently functional
unix command-line utilities, there are less ambitious projects like
<a href="http://www.cygwin.com">cygwin</a> and <a href="http://www.mingw.org">MinGW</a>, the latter of which
underlies <a href="https://github.com/git-for-windows/build-extra">git-bash</a>.
Unlike WSL, these do not attempt binary compatibility with actual linux, but require
you to compile especially for the framework. The process for doing so is pretty
well baked by now, so the likelihood that someone else has already run into and
solved the dev-related issue you have is quite high. Generally speaking, google
is your friend.</p>
<p>Sometimes, though, it's a friend in the misery-loves-company sense.
It was, for example, pretty shocking to discover that python
setuptools has been <a href="https://github.com/pypa/setuptools/pull/536">broken under MinGW</a>
for half a year, and nobody really cares.
To install <a href="http://blog.getpelican.com">pelican</a> (without which you would not be
reading this deathless prose), it is actually necessary to patch two different
setuptools source files. Moreover, aborted setups will have left your installation in
an inconsistent state that can only be repaired by manually removing files.
(Doubly exasperatingly, another thing you can't install without patching is
virtualenv, so there is truly no way to avoid a day of awfulness.)</p>
<p>Similar problems crop up with other, supposedly OS-independent languages and
tools. For example, the <a href="http://scalameta.org">scalameta</a> examples don't run.
These issues reflect the broad preference of the development community for
working in linux-like environments. Windows problems get ironed out when it's time to
ship to end-users... but who cares about end users?</p>
<h1><a href="http://www.virtualrealityitalia.org/associazione-culturale-vr-italia.php"><img alt="VR" src="./images/vr-italia.jpg"></a></h1>
<p>If WSL, MinGW and Cygwin are a little half-assed, perhaps virtualization can
provide the full buttock. The remainder of this post is a recipe for getting
linux to run well and continuously enough under Windows 10's built-in "Hyper-V"
that you truly have the best of both worlds. It's a little complicated.</p>
<h2>Installing Hyper-V</h2>
<p>Read about <a href="https://msdn.microsoft.com/virtualization/hyperv_on_windows/quick_start/walkthrough_compatibility">requirements</a> first.
In summary, you can't be running the cheaper "Home" variety of Windows, and <code>systeminfo.exe</code>
must report that your hardware supports virtualization.</p>
<p>If you pass those hurdles, you may need to make sure that virtualization is enabled
in your BIOS. Then you can enable it in Windows from the "Turn Windows features
on or off" screen, and reboot.</p>
<h2>Setting up a network environment</h2>
<p>You will want your linux installation to be able to communicate both with the
outside world and with the windows host. What seems to be the easiest way to
accomplish this is to create two "virtual switches", which will show up as two
interfaces in linux.</p>
<p>Run the "Hyper-V Manager" and from the right-hand pane launch the "Virtual Switch
Manager." Create two switches.</p>
<ol>
<li>Call the first "External Virtual Switch" and select an "External network", connected
to your host's main network adapter.</li>
<li>Call the second "Internal Virtual Switch" and select "Internal network" (<em>not</em>
"Private network"; that's a network that not even the host can see).</li>
</ol>
<p>It will turn out that this isn't quite enough, but we'll repair it later.</p>
<h2>Create the virtual machine</h2>
<p>Again from the right-hand Hyper-V pane, select "New Virtual Machine" and follow
the wizard. Things to be careful about:</p>
<ol>
<li>You want a "generation 2" VM. Otherwise it's only going to know about quaint
old things like floppies and CD-ROMs. This setting cannot be changed.</li>
<li>You should disable secure boot, which apparently causes some versions of linux
not to boot.</li>
<li>You do <em>not</em> want dynamic memory, as appealing as that sounds, because there are
memory corruption problems, supposedly related to linux issues. You will be able
to adjust the amount of memory later, but if you're serious about living in
both worlds simultaneously, you'll set it to half the available RAM.</li>
<li>Create a new virtual disk. Have the courage to make it large.</li>
<li>On the networking page, pick the <em>External</em> switch.</li>
<li>Under "Installation options", you might as well enter the location of your
linux .iso.</li>
</ol>
<p>Now, from the lower right pane, start the VM and connect to it (I think one of the
reasons Hyper-V is less resource intensive is that it doesn't bother emulating the
display unless you're connected). Go through the full installation process, with
as many reboots as it requires.</p>
<h2>Fix the networking</h2>
<p>At this point, you have a bridged networking environment. Both the VM and your host
have DHCP leases from the network you're on. To connect from the host to the guest
and vice-versa, you'll need to know the IP addresses you've been given, which will
be a bit awkward (and it will be impossible to maintain a connection when you're away
from WiFi).</p>
<p>Shut down the linux guest, and select its "Settings". Choose "Add Hardware", add a
network adapter, and connect it to the internal switch. You'll also find an "automatic start" option, and
it will be best to disable this entirely.</p>
<p>Now, in the Windows control panel, find the
"Network Connections" page. You'll see a bewildering variety of little networky
icons, two of which will have the names you gave to your virtual switches.
Select "Properties" for the internal switch, highlight the "Internet Protocol Version 4"
line, and select "Properties" again. Set a static IP address with a nice ring to it,
preferably not one starting 192.168.1, as that will someday soon clash with a
dynamic IP address you're handed on another network. The mask should be 255.255.255.0,
and you can leave gateway blank.</p>
<p>Boot the guest; <code>ifconfig</code> should now show you two <code>eth</code> interfaces. Of those, one
will have an IP address and the other won't. Whichever one doesn't have an address is
the internal switch; note its hardware address.
Depending on your distro, the next steps will be different.
On Ubuntu, you'll right-click on the network icon on the menu bar and "Edit
connections." There will be two connections, with unhelpful names. Open them in turn
to find the one with the hardware device ID of the internal switch. Then, on that
connection's IPv4 tab, manually set an address on the same subnet as you chose
on windows (and the same subnet mask).</p>
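<p>If you want to sanity-check the addresses before reaching for <code>ping</code>, a few lines of Python will confirm that host and guest really are on the same subnet. This is only a sketch: the helper name <code>same_subnet</code> and the addresses in the usage lines are placeholders, so substitute whatever you actually assigned.</p>

```python
import ipaddress


def same_subnet(host_ip, guest_ip, netmask="255.255.255.0"):
    """Return True if guest_ip lies in the network defined by host_ip and netmask."""
    # strict=False lets us pass a host address rather than the network address.
    network = ipaddress.ip_network(f"{host_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(guest_ip) in network


# Placeholder addresses: a static host IP and two candidate guest IPs.
print(same_subnet("192.168.19.100", "192.168.19.101"))
print(same_subnet("192.168.19.100", "192.168.1.5"))
```

<p>The first call should report a match and the second should not, since only the first guest address shares the host's first three octets under a 255.255.255.0 mask.</p>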
<p>Verify now that you can ping the linux guest from the windows host. You might also
install openssh-server and verify that you can ssh to the linux guest. Most likely,
you will not be able to ping the windows host from linux, however. This is due
to the stupidity of the windows firewall.</p>
<h2>Fix the firewall</h2>
<p>Windows supports the notion of a network "profile", which can be either "private" or
"public". When you connect to a network for the first time, you're given the option
of allowing network discovery, and if you say yes, you'll have designated the network
as private. Otherwise, it's public.</p>
<p>The Windows firewall can be turned on and off separately for public and private
networks. You certainly don't want to turn it off for public networks, but turning
it off for private networks is an acceptable compromise. That can be done from the control
panel in a more or less obvious way.</p>
<p>You will find, with some annoyance, that Windows has decided to designate your
internal switch as a public network. Changing this is incredibly difficult and will
require the Windows abomination known as PowerShell. From the start menu (or whatever
they call it these days), find PowerShell, right-click it and launch as administrator.
Within PowerShell, type <code>Get-NetConnectionProfile</code>. You should see your external
and internal switches; at least the internal one will show "NetworkCategory: Public".</p>
<p>You can change this state of affairs with something like</p>
<div class="highlight"><pre><span></span><code>Set-NetConnectionProfile -InterfaceAlias "vEthernet (Internal Virtual Switch) 2" -NetworkCategory Private
</code></pre></div>
<p>(using the exact InterfaceAlias listed by the previous command).</p>
<p>This is nice, but you'll find that it gets reset every time you boot. We thus want
to fix it every time we boot. Find a nice directory and create <code>SetInternalPrivate.ps1</code>,
containing this nonsense (adjusting the first line as appropriate):</p>
<div class="highlight"><pre><span></span><code><span class="x">$log = "c:\Users\YOU\wherever\SetInternalPrivate.log"</span>
<span class="x">New-Item -ItemType "File" $log -Force</span>
<span class="x">Get-Date | Out-File $log -Append</span>
<span class="x">Import-Module NetConnection</span>
<span class="x">Get-Module | Out-File $log -Append</span>
<span class="x">$profiles = Get-NetConnectionProfile | Where-Object </span><span class="cp">{</span><span class="nv">$_.InterfaceAlias</span> <span class="o">-</span><span class="na">like</span> <span class="s2">"*Internal Virtual Switch*"</span><span class="cp">}</span><span class="x"></span>
<span class="x">$profiles | ForEach-Object </span><span class="cp">{</span><span class="nv">$_</span> <span class="o">|</span> <span class="na">Set</span><span class="o">-</span><span class="na">NetConnectionProfile</span> <span class="o">-</span><span class="na">NetworkCategory</span> <span class="s2">"Private"</span><span class="cp">}</span><span class="x"></span>
<span class="x">$profiles | Out-File $log -Append</span>
<span class="x">Get-NetConnectionProfile | Out-File $log -Append</span>
<span class="x">Set-NetFirewallProfile -Profile Private -Enabled False | Out-File $log -Append</span>
<span class="x">Get-NetFirewallProfile | Out-File $log -Append</span>
</code></pre></div>
<p>The last line makes sure the firewall is off for private networks - another setting
that seems to get undone by Windows on your behalf with startling frequency.
If you run all of this in your administrator PowerShell window, you should now be
able to ping the host from the guest.</p>
<p>In the same directory, create a <code>SetInternalPrivate.bat</code> containing</p>
<div class="highlight"><pre><span></span><code> powershell.exe -Command c:\Users\YOU\wherever\SetInternalPrivate.ps1
</code></pre></div>
<p>Now find Task Scheduler in the start menu, right-click and run as administrator.
There's a little folder structure on the left; create a new folder for yourself.
In that folder, Create a Basic Task, triggered "When I log on", to "Start a Program" and
choose your <code>.bat</code> file. Immediately select the new task's properties.</p>
<p>There are a few possible ways forward now, but the most robust seems to be the following.
Under security options, run it as yourself and "Run only when the user is logged on", but
check "Run with highest privileges." On the Conditions tab, you want to uncheck
"Start the task only if the computer is on AC power" (you can imagine how much fun I
had discovering that).</p>
<p>Now reboot the windows host. If all has gone well, when you log in,
you'll see a black command window pop up to run the script, and you should then
be able to verify that the internal switch is private and that the firewall is
off for private networks.</p>
<p>From the Hyper-V console, start the VM. The reason we disabled auto-startup is that
we wanted to make sure it got started after the network was fixed.</p>
<h2>Sharing is caring</h2>
<p>On Windows, make sure that your home directory, or some other directory, is
shared, and shared only with you. (I.e. not literally "shared", but "made available
in some sense that we don't wish to describe using the English language".)</p>
<p>On linux, we invoke the ritual incantations to mount this shared drive. The
first horror is that we'll have to store our windows password in plain-text.
Create something like <code>/home/you/.credentials-cifs</code> containing</p>
<div class="highlight"><pre><span></span><code>username=you
password=topsecret
</code></pre></div>
<p>and chmod it to 0600. (Hopefully, you chose to encrypt your
windows disk with bitlocker, because otherwise, yes, your password is
potentially visible to anyone with physical access to your computer.)</p>
<p>Now (<code>sudo</code>'d) edit <code>/etc/fstab</code> and add</p>
<div class="highlight"><pre><span></span><code>//192.168.19.100/Users/you /media/win cifs uid=you,credentials=/home/you/.credentials-cifs,iocharset=utf8,sec=ntlm 0 0
</code></pre></div>
<p>Barring typos, you should now be able to <code>sudo mount -a</code> and see your windows
home directory under /media/win.</p>
<p><img alt="la dolce vita" src="./images/la-dolce-vita-23.jpg"></p>
<p>Once the VM is launched, I find that everything else just works.
I usually don't bother to
"Connect" to the VM at all, but just keep a putty window open, with tmux running on
the guest. Because of the internal switch, I can remain connected across
suspensions and multiple wifi networks. The one main change to my development
workflow is that I run emacs in text mode (under tmux), as I haven't found
a glitch-free combination of emacs builds and windows X-servers, but I've found
I actually prefer it this way.</p>
<h1>Updates</h1>
<h2>Sun Oct 30 14:42:27 EDT 2016</h2>
<p>It is suggested that one not select "secure boot" when setting up the VM. Apparently, some versions of
linux will not boot. I obeyed and didn't investigate.</p>
<p>There is a trove of perplexing information on Ubuntu under Hyper-V in
<a href="https://technet.microsoft.com/en-us/windows-server-docs/compute/hyper-v/supported-ubuntu-virtual-machines-on-hyper-v">this technet article</a>.
After reading it, I switched to the supposedly lighter weight virtual kernel:</p>
<div class="highlight"><pre><span></span><code> sudo apt-get install linux-virtual-lts-xenial
sudo apt-get install linux-tools-virtual-lts-xenial linux-cloud-tools-virtual-lts-xenial
</code></pre></div>
<p>Despite this, <code>uname</code> still reports <code>4.4.0-45-generic</code>, but nothing seems to be broken.</p>
<p>I've recently been setting X11 forwarding in putty and running the <a href="https://sourceforge.net/projects/vcxsrv/">VcXsrv</a>
X11 server, which seems to be Xming but compiled directly with VC++ and supposedly more performant. It seems to work.</p>
<p>After some research and fiddling, I discovered that exported X11 <code>emacs</code> will render properly if you</p>
<div class="highlight"><pre><span></span><code> <span class="nb">export</span> <span class="nv">LIBGL_ALWAYS_INDIRECT</span><span class="o">=</span><span class="m">1</span>
</code></pre></div>
<p>It still produces a lot of <code>GLib</code> and <code>GTk</code> warnings on the console.</p>Scala Partial Functions Are Disgusting and Unnecessary2016-06-28T00:00:00-04:002016-06-28T00:00:00-04:00Peter Fraenkeltag:blog.podsnap.com,2016-06-28:/impartial.html<div class="highlight"><pre><span></span><code>Thou whoreson zed! thou unnecessary letter!
- King Lear, 2.2.61
</code></pre></div>
<p><a href="https://en.wikipedia.org/wiki/Partial_function">Partial functions</a>
are great in theory. By "great in theory", I mean that the basic idea is pretty simple but with very little additional
research you can make weighty pronouncements involving the words "in" and "theory", as in, "in category theory, something something surjective algebra
manifold."</p>
<p>The simplest example of a partial function is something like the logarithm, which isn't defined for arguments less than or equal to
zero. Of course, it's also not defined for arguments that are strings or colors or egg-laying mammals, a restriction we usually discuss in
terms of types. And one can discuss a function's domain in <a href="https://en.wikipedia.org/wiki/Dependent_type">terms of type</a>, but we
<a href="https://stackoverflow.com/questions/12935731/any-reason-why-scala-does-not-explicitly-support-dependent-types/12937819">usually</a>
choose not to.</p>
<p>In Scala, when you write a normal function, like <code>(d: Double) => Math.log(d)</code>, you get a <code>Function1</code>, which is just a class that happens to have an <code>apply</code> method:</p>
<div class="highlight"><pre><span></span><code> <span class="k">object</span> <span class="nc">loggy</span> <span class="k">extends</span> <span class="nc">Function1</span><span class="p">[</span><span class="nc">Double</span><span class="p">,</span><span class="nc">Double</span><span class="p">]</span> <span class="p">{</span>
<span class="k">def</span> <span class="nf">apply</span><span class="p">(</span><span class="n">d</span><span class="p">:</span> <span class="nc">Double</span><span class="p">):</span> <span class="nc">Double</span> <span class="o">=</span> <span class="nc">Math</span><span class="p">.</span><span class="n">log</span><span class="p">(</span><span class="n">d</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div>
<p>and a <code>PartialFunction</code> is just like this, except you can ask where it is defined:</p>
<div class="highlight"><pre><span></span><code> <span class="k">object</span> <span class="nc">ploggy</span> <span class="k">extends</span> <span class="nc">PartialFunction</span><span class="p">[</span><span class="nc">Double</span><span class="p">,</span><span class="nc">Double</span><span class="p">]</span> <span class="p">{</span>
<span class="k">def</span> <span class="nf">apply</span><span class="p">(</span><span class="n">d</span><span class="p">:</span> <span class="nc">Double</span><span class="p">):</span> <span class="nc">Double</span> <span class="o">=</span> <span class="nc">Math</span><span class="p">.</span><span class="n">log</span><span class="p">(</span><span class="n">d</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">isDefinedAt</span><span class="p">(</span><span class="n">d</span><span class="p">:</span> <span class="nc">Double</span><span class="p">):</span> <span class="nc">Boolean</span> <span class="o">=</span> <span class="n">d</span> <span class="o">></span> <span class="mf">0.0</span>
<span class="p">}</span>
</code></pre></div>
<p>If you just call <code>ploggy(0.0)</code>, you won't get an exception, just the usual floating-point <code>-Infinity</code> (or <code>NaN</code> for a negative argument), but Scala gives you other, spiffier ways to invoke it, such as</p>
<div class="highlight"><pre><span></span><code> <span class="n">ploggy</span><span class="p">.</span><span class="n">applyOrElse</span><span class="p">(</span><span class="mf">0.0</span><span class="p">,</span> <span class="mi">42</span><span class="o">*</span><span class="n">_</span><span class="p">)</span>
</code></pre></div>
<p>where we've taken the small liberty of assuming that if you passed a non-positive argument to our log function, you probably meant to multiply by 42
instead, and</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="n">ploggy</span> <span class="n">orElse</span> <span class="n">anotherPartialFunction</span><span class="p">)(</span><span class="mf">0.0</span><span class="p">)</span>
</code></pre></div>
<p>where our backup function is also partial.</p>
<p>Those formulations are not commonly used. More popular is the <code>collect</code> combinator</p>
<div class="highlight"><pre><span></span><code> <span class="nc">Seq</span><span class="p">(</span><span class="o">-</span><span class="mf">2.0</span><span class="p">,</span><span class="mf">1.0</span><span class="p">).</span><span class="n">collect</span><span class="p">(</span><span class="n">ploggy</span><span class="p">)</span>
</code></pre></div>
<p>which combines a <code>filter</code> on <code>.isDefinedAt</code> and <code>map</code> over what's left, yielding <code>Seq(0.0)</code> in this case. It is even more common to see <code>collect</code>
used with a match-like syntax</p>
<div class="highlight"><pre><span></span><code> <span class="nc">Some</span><span class="p">(</span><span class="mi">0</span><span class="p">).</span><span class="n">collect</span> <span class="p">{</span>
<span class="k">case</span> <span class="mi">1</span> <span class="o">=></span> <span class="s">"Hello"</span>
<span class="k">case</span> <span class="mi">2</span> <span class="o">=></span> <span class="s">"Goodbye"</span>
<span class="p">}</span>
</code></pre></div>
<p>here applied to an <code>Option</code> rather than a sequence and in this case expected to return <code>None</code>.</p>
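<p>For readers who think better outside Scala, the "filter on <code>isDefinedAt</code>, then map" reading of <code>collect</code> can be mimicked in a few lines of Python. This is an analogy under stated assumptions, not how the Scala library is implemented: the <code>PartialFunction</code> class, its method names and the free-standing <code>collect</code> helper are all invented here for illustration.</p>

```python
import math


class PartialFunction:
    """A function bundled with a predicate saying where it is defined."""

    def __init__(self, is_defined_at, fn):
        self.is_defined_at = is_defined_at
        self._fn = fn

    def __call__(self, x):
        # Like Scala's apply: no domain check, the caller is on their own.
        return self._fn(x)

    def apply_or_else(self, x, default):
        # Use the function where defined, otherwise fall back to default(x).
        return self._fn(x) if self.is_defined_at(x) else default(x)


def collect(xs, pf):
    """Keep only the elements where pf is defined, then map pf over them."""
    return [pf(x) for x in xs if pf.is_defined_at(x)]


# Python stand-in for the ploggy object defined earlier.
ploggy = PartialFunction(lambda d: d > 0.0, math.log)

print(collect([-2.0, 1.0], ploggy))
print(ploggy.apply_or_else(0.0, lambda d: 42 * d))
```

<p>The first call drops <code>-2.0</code> and maps the logarithm over what remains; the second takes the fallback branch because <code>0.0</code> is outside the declared domain.</p>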
<p>If we ask politely,</p>
<div class="highlight"><pre><span></span><code> scala -Xprint:typer -e <span class="s1">'Some(1).collect { case 1 => "Hello"; case 2 => "Goodbye"}'</span>
</code></pre></div>
<p>Scala will reveal the trained hippopotamus behind the curtain:</p>
<div class="highlight"><pre><span></span><code> <span class="p">...</span>
<span class="n">scala</span><span class="p">.</span><span class="nc">Some</span><span class="p">.</span><span class="n">apply</span><span class="p">[</span><span class="nc">Int</span><span class="p">](</span><span class="mi">1</span><span class="p">).</span><span class="n">collect</span><span class="p">[</span><span class="nc">String</span><span class="p">](({</span>
<span class="nd">@SerialVersionUID</span><span class="p">(</span><span class="n">value</span> <span class="o">=</span> <span class="mi">0</span><span class="p">)</span> <span class="k">final</span> <span class="o"><</span><span class="n">synthetic</span><span class="o">></span> <span class="k">class</span> <span class="nc">$anonfun</span> <span class="k">extends</span> <span class="n">scala</span><span class="p">.</span><span class="n">runtime</span><span class="p">.</span><span class="nc">AbstractPartialFunction</span><span class="p">[</span><span class="nc">Int</span><span class="p">,</span><span class="nc">String</span><span class="p">]</span> <span class="k">with</span> <span class="nc">Serializable</span> <span class="p">{</span>
<span class="k">def</span> <span class="nf"><</span><span class="n">init</span><span class="o">></span><span class="p">():</span> <span class="o"><</span><span class="n">$anon</span><span class="p">:</span> <span class="nc">Int</span> <span class="o">=></span> <span class="nc">String</span><span class="o">></span> <span class="o">=</span> <span class="p">{</span>
<span class="n">$anonfun</span><span class="p">.</span><span class="bp">super</span><span class="p">.</span><span class="o"><</span><span class="n">init</span><span class="o">></span><span class="p">();</span>
<span class="p">()</span>
<span class="p">};</span>
<span class="k">final</span> <span class="k">override</span> <span class="k">def</span> <span class="nf">applyOrElse</span><span class="p">[</span><span class="nc">A1</span> <span class="o"><:</span> <span class="nc">Int</span><span class="p">,</span> <span class="nc">B1</span> <span class="o">>:</span> <span class="nc">String</span><span class="p">](</span><span class="n">x1</span><span class="p">:</span> <span class="nc">A1</span><span class="p">,</span> <span class="n">default</span><span class="p">:</span> <span class="nc">A1</span> <span class="o">=></span> <span class="nc">B1</span><span class="p">):</span> <span class="nc">B1</span> <span class="o">=</span> <span class="p">((</span><span class="n">x1</span><span class="p">.</span><span class="k">asInstanceOf</span><span class="p">[</span><span class="nc">Int</span><span class="p">]:</span> <span class="nc">Int</span><span class="p">):</span> <span class="nc">Int</span> <span class="nd">@unchecked</span><span class="p">)</span> <span class="k">match</span> <span class="p">{</span>
<span class="k">case</span> <span class="mi">1</span> <span class="o">=></span> <span class="s">"Hello"</span>
<span class="k">case</span> <span class="mi">2</span> <span class="o">=></span> <span class="s">"Goodbye"</span>
<span class="k">case</span> <span class="p">(</span><span class="n">defaultCase$</span> <span class="o">@</span> <span class="n">_</span><span class="p">)</span> <span class="o">=></span> <span class="n">default</span><span class="p">.</span><span class="n">apply</span><span class="p">(</span><span class="n">x1</span><span class="p">)</span>
<span class="p">};</span>
<span class="k">final</span> <span class="k">def</span> <span class="nf">isDefinedAt</span><span class="p">(</span><span class="n">x1</span><span class="p">:</span> <span class="nc">Int</span><span class="p">):</span> <span class="nc">Boolean</span> <span class="o">=</span> <span class="p">((</span><span class="n">x1</span><span class="p">.</span><span class="k">asInstanceOf</span><span class="p">[</span><span class="nc">Int</span><span class="p">]:</span> <span class="nc">Int</span><span class="p">):</span> <span class="nc">Int</span> <span class="nd">@unchecked</span><span class="p">)</span> <span class="k">match</span> <span class="p">{</span>
<span class="k">case</span> <span class="mi">1</span> <span class="o">=></span> <span class="kc">true</span>
<span class="k">case</span> <span class="mi">2</span> <span class="o">=></span> <span class="kc">true</span>
<span class="k">case</span> <span class="p">(</span><span class="n">defaultCase$</span> <span class="o">@</span> <span class="n">_</span><span class="p">)</span> <span class="o">=></span> <span class="kc">false</span>
<span class="p">}</span>
<span class="p">};</span>
<span class="k">new</span> <span class="n">$anonfun</span><span class="p">()</span>
<span class="p">}:</span> <span class="nc">PartialFunction</span><span class="p">[</span><span class="nc">Int</span><span class="p">,</span><span class="nc">String</span><span class="p">]))</span>
<span class="p">};</span>
<span class="p">...</span>
</code></pre></div>
<p>This is no ordinary hippopotamus. The scala typer has divined the presence of a match block where a <code>PartialFunction</code> is expected</p>
<div class="highlight"><pre><span></span><code> <span class="k">case</span> <span class="nc">Match</span><span class="p">(</span><span class="n">sel</span><span class="p">,</span> <span class="n">cases</span><span class="p">)</span> <span class="k">if</span> <span class="p">(</span><span class="n">sel</span> <span class="n">ne</span> <span class="nc">EmptyTree</span><span class="p">)</span> <span class="o">&&</span> <span class="p">(</span><span class="n">pt</span><span class="p">.</span><span class="n">typeSymbol</span> <span class="o">==</span> <span class="nc">PartialFunctionClass</span><span class="p">)</span> <span class="o">=></span> <span class="p">...</span>
</code></pre></div>
<p>and replicated our innocuous looking switch clause into two different methods of a brand new class definition. It feels imbalanced,
a conflation of concerns, that the central typing logic of the compiler is in the business of extending one very specific class in
<code>scala-library.jar</code>.</p>
<h1>Oh Snozzcumbers</h1>
<div class="highlight"><pre><span></span><code> ‘It’s disgusterous!’ the BFG gurgled. ‘It’s sickable! It’s
rotsome! It’s maggotwise! Try it yourself, this foulsome
partial function!’
‘No, thank you,’ Sophie said, backing away. ‘It’s all you’re
going to be guzzling around here from now on so you might as well
get used to it,’ said the BFG. ‘Go on, you snipsy little winkle,
have a go!’
Sophie took a small nibble. ‘Ugggggggh!’ she spluttered. ‘Oh no!
Oh gosh! Oh help!’ She spat it out quickly. ‘It tastes of
frogskins!’ she gasped. ‘And rotten fish!’
‘Worse than that!’ cried the BFG, roaring with laughter. ‘To me it
is tasting of clockcoaches and slime-wanglers!’
</code></pre></div>
<p>And yet Dahl was writing in the context of an alternative to cannibalism, while partial functions are an alternative to maybe... well, an alternative to
<code>Maybe</code>, or <code>Option</code> as we Scalactites call it. <code>Some(1).flatMap{case 1 => Some("Hello"); case 2 => Some("Goodbye"); case _ => None}</code> emerges
from the typer without significant change:</p>
<div class="highlight"><pre><span></span><code> <span class="n">scala</span><span class="p">.</span><span class="nc">Some</span><span class="p">.</span><span class="n">apply</span><span class="p">[</span><span class="nc">Int</span><span class="p">](</span><span class="mi">1</span><span class="p">).</span><span class="n">flatMap</span><span class="p">[</span><span class="nc">String</span><span class="p">](((</span><span class="n">x0$1</span><span class="p">:</span> <span class="nc">Int</span><span class="p">)</span> <span class="o">=></span> <span class="n">x0$1</span> <span class="k">match</span> <span class="p">{</span>
<span class="k">case</span> <span class="mi">1</span> <span class="o">=></span> <span class="n">scala</span><span class="p">.</span><span class="nc">Some</span><span class="p">.</span><span class="n">apply</span><span class="p">[</span><span class="nc">String</span><span class="p">](</span><span class="s">"Hello"</span><span class="p">)</span>
<span class="k">case</span> <span class="mi">2</span> <span class="o">=></span> <span class="n">scala</span><span class="p">.</span><span class="nc">Some</span><span class="p">.</span><span class="n">apply</span><span class="p">[</span><span class="nc">String</span><span class="p">](</span><span class="s">"Goodbye"</span><span class="p">)</span>
<span class="k">case</span> <span class="n">_</span> <span class="o">=></span> <span class="n">scala</span><span class="p">.</span><span class="nc">None</span>
<span class="p">}))</span>
</code></pre></div>
<p>I'm not being entirely fair, because the lambda is going to get turned into a <code>Function1</code> during the uncurry phase,</p>
<div class="highlight"><pre><span></span><code> scala -Xprint:uncurry -e <span class="s1">'Some(1).flatMap {case 1 => Some("Hello"); case 2 => Some("Goodbye"); case _ => None}'</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code> <span class="nc">Some</span><span class="p">[</span><span class="nc">Int</span><span class="p">](</span><span class="mi">1</span><span class="p">).</span><span class="n">flatMap</span><span class="p">[</span><span class="nc">String</span><span class="p">]({</span>
<span class="nd">@SerialVersionUID</span><span class="p">(</span><span class="n">value</span> <span class="o">=</span> <span class="mi">0</span><span class="p">)</span> <span class="k">final</span> <span class="o"><</span><span class="n">synthetic</span><span class="o">></span> <span class="k">class</span> <span class="nc">$anonfun</span> <span class="k">extends</span> <span class="n">scala</span><span class="p">.</span><span class="n">runtime</span><span class="p">.</span><span class="nc">AbstractFunction1</span><span class="p">[</span><span class="nc">Int</span><span class="p">,</span><span class="nc">Option</span><span class="p">[</span><span class="nc">String</span><span class="p">]]</span> <span class="k">with</span> <span class="nc">Serializable</span> <span class="p">{</span>
<span class="k">def</span> <span class="nf"><</span><span class="n">init</span><span class="o">></span><span class="p">():</span> <span class="o"><</span><span class="n">$anon</span><span class="p">:</span> <span class="nc">Int</span> <span class="o">=></span> <span class="nc">Option</span><span class="p">[</span><span class="nc">String</span><span class="p">]</span><span class="o">></span> <span class="o">=</span> <span class="p">{</span>
<span class="n">$anonfun</span><span class="p">.</span><span class="bp">super</span><span class="p">.</span><span class="o"><</span><span class="n">init</span><span class="o">></span><span class="p">();</span>
<span class="p">()</span>
<span class="p">};</span>
<span class="k">final</span> <span class="k">def</span> <span class="nf">apply</span><span class="p">(</span><span class="n">x0$1</span><span class="p">:</span> <span class="nc">Int</span><span class="p">):</span> <span class="nc">Option</span><span class="p">[</span><span class="nc">String</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span>
<span class="k">case</span> <span class="o"><</span><span class="n">synthetic</span><span class="o">></span> <span class="kd">val</span> <span class="n">x1</span><span class="p">:</span> <span class="nc">Int</span> <span class="o">=</span> <span class="n">x0$1</span><span class="p">;</span>
<span class="n">x1</span> <span class="k">match</span> <span class="p">{</span>
<span class="k">case</span> <span class="mi">1</span> <span class="o">=></span> <span class="k">new</span> <span class="nc">Some</span><span class="p">[</span><span class="nc">String</span><span class="p">](</span><span class="s">"Hello"</span><span class="p">)</span>
<span class="k">case</span> <span class="mi">2</span> <span class="o">=></span> <span class="k">new</span> <span class="nc">Some</span><span class="p">[</span><span class="nc">String</span><span class="p">](</span><span class="s">"Goodbye"</span><span class="p">)</span>
<span class="k">case</span> <span class="n">_</span> <span class="o">=></span> <span class="n">scala</span><span class="p">.</span><span class="nc">None</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">};</span>
<span class="p">(</span><span class="k">new</span> <span class="o"><</span><span class="n">$anon</span><span class="p">:</span> <span class="nc">Int</span> <span class="o">=></span> <span class="nc">Option</span><span class="p">[</span><span class="nc">String</span><span class="p">]</span><span class="o">></span><span class="p">():</span> <span class="nc">Int</span> <span class="o">=></span> <span class="nc">Option</span><span class="p">[</span><span class="nc">String</span><span class="p">])</span>
<span class="p">})</span>
</code></pre></div>
<p>While we're spared gratuitous doubling of the core logic, we still get
a bespoke <code>.class</code> file and constructor.</p>
<h1>Lambda functions are the new normal</h1>
<p>But not for long! With
Scala 2.12 and Java 1.8, lambda functions will be implemented with
<code>invokedynamic</code>. Under 2.11, we can get a taste of this advanced
feature with <code>-Ydelambdafy:method</code>,</p>
<div class="highlight"><pre><span></span><code> scala -Ydelambdafy:method -Xprint:uncurry -e <span class="s1">'Some(1).flatMap {case 1 => Some("Hello"); case 2 => Some("Goodbye"); case _ => None}'</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code> <span class="nc">Some</span><span class="p">[</span><span class="nc">Int</span><span class="p">](</span><span class="mi">1</span><span class="p">).</span><span class="n">flatMap</span><span class="p">[</span><span class="nc">String</span><span class="p">]({</span>
<span class="k">final</span> <span class="o"><</span><span class="n">artifact</span><span class="o">></span> <span class="k">def</span> <span class="nf">$anonfun</span><span class="p">(</span><span class="n">x0$1</span><span class="p">:</span> <span class="nc">Int</span><span class="p">):</span> <span class="nc">Option</span><span class="p">[</span><span class="nc">String</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span>
<span class="k">case</span> <span class="o"><</span><span class="n">synthetic</span><span class="o">></span> <span class="kd">val</span> <span class="n">x1</span><span class="p">:</span> <span class="nc">Int</span> <span class="o">=</span> <span class="n">x0$1</span><span class="p">;</span>
<span class="n">x1</span> <span class="k">match</span> <span class="p">{</span>
<span class="k">case</span> <span class="mi">1</span> <span class="o">=></span> <span class="k">new</span> <span class="nc">Some</span><span class="p">[</span><span class="nc">String</span><span class="p">](</span><span class="s">"Hello"</span><span class="p">)</span>
<span class="k">case</span> <span class="mi">2</span> <span class="o">=></span> <span class="k">new</span> <span class="nc">Some</span><span class="p">[</span><span class="nc">String</span><span class="p">](</span><span class="s">"Goodbye"</span><span class="p">)</span>
<span class="k">case</span> <span class="n">_</span> <span class="o">=></span> <span class="n">scala</span><span class="p">.</span><span class="nc">None</span>
<span class="p">}</span>
<span class="p">};</span>
<span class="p">((</span><span class="n">x0$1</span><span class="p">:</span> <span class="nc">Int</span><span class="p">)</span> <span class="o">=></span> <span class="n">$anonfun</span><span class="p">(</span><span class="n">x0$1</span><span class="p">))</span>
<span class="p">})</span>
</code></pre></div>
<p>which, again, doesn't look that much different from the original. Now that lambda functions have gone mainstream, now that Brian
Goetz has decreed that they should be pretty and efficient, Scala doesn't
have to try so hard to fake them.
Specifically, the obvious way to fake them has been hardwired into the JVM - Single Method Interfaces are
a thing. Unfortunately, Double Method Interfaces have not attained thing status, and there's no reason to think they will, because that
would require convincing Brian Goetz that partial functions should be pretty and efficient, and we all know that partial functions are
disgusting and unnecessary.</p>
<p>Where, then, did the idea come from? Not from Haskell, where the
<a href="https://wiki.haskell.org/Partial_functions">main guidance</a> is on how
to avoid them. Not from Python, where - bless their hearts - they
seem to have
<a href="https://docs.python.org/2/library/functools.html">confused</a> the idea
with currying. Not <a href="http://idris.readthedocs.io/en/latest/tutorial/typesfuns.html#totality">Idris</a>,
whose dependent type system largely avoids the problem by checking the domain at
compile time. We have something over Idris, in that Scala could deal better with - I don't know -
a function defined only where the Riemann zeta function is less than 1/2.</p>
<p>(Pause briefly while you imagine me hunting for 30 minutes for just the right illustration and then
deciding it was silly.)</p>
<h1>TL;DR</h1>
<p>When you're tempted to write a partial function, write a function that returns <code>Option</code> instead.</p>
<p>Here's what Chaucer has to say on the subject:</p>
<div class="highlight"><pre><span></span><code>O blisful God, that art so Iust and trewe!
Lo, how that thou biwreyest mordre alway!
Parcial fonctione wol out, that see we day by day.
Parcial fonctione is so wlatsom and abhominable
To God, that is so Iust and resonable,
- The Nun's Priest's Tale
</code></pre></div>
<p>Notes on Horrible Code, by Peter Fraenkel, 2016-06-17</p>
<p>Recently, I came across some horrible, horrible code. I immediately pasted it into a messaging application and quickly received an
expression of solidarity. With what though? My indignation? Grief? Amusement? I find it suspicious that code I consider horrible
tends to have been written by people I already disliked for some reason.</p>
<p>Here are some things I have complained about recently:</p>
<ol>
<li>Code that was the source of a bug that was in retrospect obvious.</li>
<li>Code that could be deleted with no change to the behavior of the program other than, possibly, a performance improvement.</li>
<li>Extremely inefficient code, perhaps extravagant in memory use or accidentally elevating the order of complexity.</li>
<li>Extremely efficient code, in one tiny section of the program that made no measurable contribution to overall performance.</li>
<li>Flow control making use of (a) <code>if/else</code>, (b) monadic combinators, or (c) pattern matching, in lieu of one of the other two.</li>
</ol>
<p>Occupying their own special corner of my lizard brain are the following nomenclatural iniquities:</p>
<ol>
<li>Ambiguous or (gasp) misspelled names. I got very upset recently
about a block of code that contained three nearly identically named
variables, differing only in the placement of underscores.</li>
<li>Comical adherence to coding standards. Well, comical to me anyway. I found <code>val userUuid = UUID.randomUUID</code> hilarious.</li>
<li>Symbols that were literally the opposite of the true meaning, for
example a list of excluded things called <code>included</code>.</li>
<li>Symbols that had clearly been repurposed without being renamed, for
example <code>hostname</code> containing a user count.</li>
</ol>
<p>Here are some things that I often do before complaining about horrible code:</p>
<ol>
<li>Googling to make sure that the complaint is legitimate.</li>
<li>Learning from such googling that the complaint is not legitimate and then observing my blood pressure fluctuate in an
interplay of rising embarrassment and receding indignation.</li>
<li>Recalling such painful personal experiences to illustrate my sense of fairness and stiffen my resolve.</li>
<li>Recalling that time when I woke up in a cold sweat the night after noisily pointing out someone's mistake that
turned out either not to be a mistake or to be my mistake and deciding not to send my artfully composed email after all.</li>
<li>But not deleting it from Drafts either.</li>
<li>Realizing that, while I know that the code in question is horrible, I don't understand precisely why, and so spending the rest
of the afternoon reading about catamorphisms.</li>
<li>Realizing and then suppressing the thought that I wrote code that was horrible in a very similar way, actually quite recently.</li>
</ol>
<p>With reference to that last point:</p>
<div class="highlight"><pre><span></span><code>An ignorant person is one who doesn't know what you have just found out.
- Will Rogers
</code></pre></div>
<p>Neurological reality:</p>
<ol>
<li>This code makes me nervous and raises my cortisol levels. I have OCD, for chrissake.</li>
<li>But my genuine surprise is no less wounding than <a href="https://www.recurse.com/manual#no-feigned-surprise">feigned surprise</a>.</li>
</ol>
<p>Miscellany:</p>
<ol>
<li>This mess makes it hard to do my job and may lead to <em>outages</em>!</li>
<li>Seriously, what is the objective importance of either my job or your outages?</li>
<li>It's so <a href="http://xkcd.com/1695/">funny!</a></li>
</ol>
<p>Existential crises after pondering a world where no code could be called horrible anymore:</p>
<ol>
<li>Could code in such a world ever meaningfully be called good - or wonderful? Would code not so
designated be by implication horrible?</li>
<li>But craftsmanship is an objective good, no?</li>
<li>Why? Isn't craftsmanship basically about making nice things for rich people to display ostentatiously? Doesn't
all morality come down to aesthetic preference anyway?</li>
<li>Something that I find important is evidently not important to everyone. This makes me moderately sad.</li>
<li>If two people disagree about what is horrible, can anything they say to each other be called communication?</li>
<li>Glimpses of a chaotic hell. Heat death of the universe.</li>
</ol>
<p>Know when to fold 'em: visualizing the streaming, concurrent reduce, by Peter Fraenkel, 2016-06-04</p>
<p>Holy cow! Can you believe your luck? What better way to spend some portion of whatever day of
the week it is than to think about parallelizing reduction algorithms?</p>
<p>Really? You can't think of one? In that case, I'll kick off our inevitable friendship by contributing a Clojure
implementation of a fancy algorithm that asynchronously reduces arbitrary input streams while
preserving order. Then I'll scare you with glimpses of my obsessive quest to visualize what this
algorithm is actually doing; if you hang around, I'll show you what I eventually cobbled together
with <code>clojurescript</code> and <code>reagent</code>. (Hint. It's an <strong>ANIMATION!</strong> And <strong>COLORFUL!</strong>)</p>
<p>(If you want to play
with the algorithm or the visualization yourself, it's better to
clone it from <a href="https://github.com/pnf/sreduce">github</a> than to attempt copying and
pasting the chopped up code below.)</p>
<h1>Every hand's a winner</h1>
<p>The first really interesting topic in functional programming is reduction. After absorbing the idea that
mutating state is a "bad thing", you learn that you can have your stateless cake and eat it too.</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nb">reduce + </span><span class="mi">0</span> <span class="p">(</span><span class="nb">range </span><span class="mi">10</span><span class="p">))</span>
</code></pre></div>
<p>does essentially the same thing as</p>
<div class="highlight"><pre><span></span><code> <span class="kt">int</span> <span class="n">total</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span> <span class="n">i</span><span class="o"><</span><span class="mi">10</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="n">total</span> <span class="o">+=</span> <span class="n">i</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">return</span> <span class="n">total</span><span class="p">;</span>
</code></pre></div>
<p>without the fuss of maintaining the running sum explicitly. It becomes second nature to translate this sort
of accumulation operation into a fold, and it's usually <code>foldl</code> - the one where the running accumulant is
always the left-hand side of the reduction operator.</p>
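<p>For readers who want the fold spelled out, here is a minimal Python sketch (not from the original post, whose code is Clojure). The helper <code>foldl_trace</code> is a hypothetical name introduced only for illustration; it records every intermediate accumulator, which makes it plain that each step has to wait for the one before it.</p>

```python
from functools import reduce
from operator import add

# Left fold: the running accumulator is always the left-hand argument.
total = reduce(add, range(10), 0)


def foldl_trace(op, init, xs):
    """Left fold that also returns every intermediate accumulator value.

    Hypothetical helper for illustration; not part of any library.
    """
    acc = init
    steps = []
    for x in xs:
        acc = op(acc, x)  # step k cannot start until step k-1 has finished
        steps.append(acc)
    return acc, steps


result, steps = foldl_trace(lambda a, b: a + b, "", "abcdefgh")
```

<p>Here <code>total</code> is the same sum as the loop above, and <code>steps</code> grows one letter at a time, so the chain of dependencies is as long as the input.</p>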
<h2>foldl and foldr considered slightly harmful</h2>
<p>In August 2009, Guy Steele, then at Sun Microsystems, gave a fantastic talk titled
<a href="https://vimeo.com/6624203">Organizing Functional Code for Parallel Execution; or, foldl and foldr Considered Slightly Harmful</a>,
alleging that this beautiful paradigm is hostile to concurrency. He's right. By construction, you can only do one reduction
operation at a time. That has the advantage of being extremely easy to visualize. ASCII art will
do (for now):</p>
<div class="highlight"><pre><span></span><code><span class="mf">1</span> <span class="n">a</span> <span class="n">b</span> <span class="n">c</span> <span class="n">d</span> <span class="n">e</span> <span class="n">f</span> <span class="n">g</span> <span class="n">h</span>
<span class="err">\</span> <span class="o">/</span> <span class="o">/</span> <span class="o">/</span> <span class="o">/</span> <span class="o">/</span> <span class="o">/</span> <span class="o">/</span>
<span class="mf">2</span> <span class="n">ab</span> <span class="o">/</span> <span class="o">/</span> <span class="o">/</span> <span class="o">/</span> <span class="o">/</span> <span class="o">/</span>
<span class="err">\</span> <span class="o">/</span> <span class="o">/</span> <span class="o">/</span> <span class="o">/</span> <span class="o">/</span> <span class="o">/</span>
<span class="mf">3</span> <span class="n">abc</span> <span class="o">/</span> <span class="o">/</span> <span class="o">/</span> <span class="o">/</span> <span class="o">/</span>
<span class="err">\</span> <span class="o">/</span> <span class="o">/</span> <span class="o">/</span> <span class="o">/</span> <span class="o">/</span>
<span class="mf">4</span> <span class="n">abcd</span> <span class="o">/</span> <span class="o">/</span> <span class="o">/</span> <span class="o">/</span>
<span class="err">\</span> <span class="o">/</span> <span class="o">/</span> <span class="o">/</span> <span class="o">/</span>
<span class="mf">5</span> <span class="n">abcde</span> <span class="o">/</span> <span class="o">/</span> <span class="o">/</span>
<span class="err">\</span> <span class="o">/</span> <span class="o">/</span> <span class="o">/</span>
<span class="mf">6</span> <span class="n">abcdef</span> <span class="o">/</span> <span class="o">/</span>
<span class="err">\</span> <span class="o">/</span> <span class="o">/</span>
<span class="mf">7</span> <span class="n">abcdefg</span> <span class="o">/</span>
<span class="err">\</span> <span class="o">/</span>
<span class="mf">8</span> <span class="n">abcdefgh</span>
</code></pre></div>
<p>Here, we are concatenating the first 8 letters of the alphabet, and it will take us 7 sequential steps (aka <code>O(n)</code>) to do it.
Steele points out that you can do better if you know in advance that your reduction operator is associative; we can do bits of
the reduction in parallel and then reduce the reduced bits. Concatenation is the ultimate in associative operators; we can grab
any random consecutive sequence and reduce it, then treat the reduction as if it were just another element of input, e.g.</p>
<div class="highlight"><pre><span></span><code> <span class="n">a</span> <span class="n">b</span> <span class="n">c</span> <span class="n">d</span> <span class="n">e</span> <span class="n">f</span> <span class="n">g</span> <span class="n">h</span> <span class="o">--></span> <span class="n">a</span> <span class="n">b</span> <span class="n">c</span> <span class="n">def</span> <span class="n">g</span> <span class="n">h</span> <span class="o">--></span> <span class="n">abc</span> <span class="n">def</span> <span class="n">gh</span> <span class="o">--></span> <span class="n">abcdefgh</span>
</code></pre></div>
<p>Approaching this not so randomly, we can repeatedly divide and conquer</p>
<div class="highlight"><pre><span></span><code><span class="mf">1</span> <span class="n">a</span> <span class="n">b</span> <span class="n">c</span> <span class="n">d</span> <span class="n">e</span> <span class="n">f</span> <span class="n">g</span> <span class="n">h</span>
<span class="err">\</span> <span class="o">/</span> <span class="err">\</span> <span class="o">/</span> <span class="err">\</span> <span class="o">/</span> <span class="err">\</span> <span class="o">/</span>
<span class="mf">2</span> <span class="n">ab</span> <span class="n">cd</span> <span class="n">ef</span> <span class="n">gh</span>
<span class="err">\</span> <span class="o">/</span> <span class="err">\</span> <span class="o">/</span>
<span class="mf">3</span> <span class="n">abcd</span> <span class="n">efgh</span>
<span class="err">\</span> <span class="o">/</span>
<span class="mf">4</span> <span class="n">abcdefgh</span>
</code></pre></div>
<p>now completing in only 3 steps (aka <code>O(log n)</code>).
The main thrust of Steele's talk is that you should
use data structures that foster this sort of associative reduction. The minimal requirement of
such a structure is that you're in possession of all its elements, so you know how to divide
them in half.
This may remind you of merge-sort, which is in fact a parallelizable, associative reduction, taking advantage
of the fact that merging is associative.</p>
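<p>The divide-and-conquer picture can be sketched in a few lines of Python. This is an illustrative, single-threaded version, assuming an associative operator and a non-empty, fully known input; <code>tree_reduce</code> is a made-up name, and the pairs within one level are merely <em>eligible</em> to run in parallel rather than actually doing so.</p>

```python
def tree_reduce(op, xs):
    """Reduce xs pairwise, level by level.

    Returns (result, number_of_levels). Assumes op is associative and
    xs is non-empty. Hypothetical helper for illustration only.
    """
    level = list(xs)
    depth = 0
    while len(level) > 1:
        # The pairs in one level do not depend on each other,
        # so a real implementation could evaluate them concurrently.
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # odd element is carried up unchanged
        level = nxt
        depth += 1
    return level[0], depth


result, depth = tree_reduce(lambda a, b: a + b, "abcdefgh")
odd_result, odd_depth = tree_reduce(lambda a, b: a + b, "abcde")
```

<p>For the eight letters, <code>depth</code> comes out as 3, matching the diagram, and the odd-length case still preserves order because the leftover element always stays at the right-hand end.</p>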
<h2>associative vs commutative</h2>
<p>Suppose, however, that you're reducing a stream of unknown length. It isn't clear anymore how to divvy
up the inputs. That isn't a problem if our reduction operation is commutative, rather than just
associative. In that case, we can just launch as many reductions as we can, combining new elements as they
become available with reduced values, as they complete. If the reduction operator isn't actually
commutative, accuracy will suffer:</p>
<div class="highlight"><pre><span></span><code><span class="mf">1</span> <span class="n">a</span> <span class="n">b</span> <span class="n">c</span> <span class="n">d</span> <span class="n">e</span> <span class="n">f</span> <span class="n">g</span> <span class="n">h</span>
<span class="err">\</span> <span class="o">/</span> <span class="err">\</span> <span class="o">/</span> <span class="o">/</span> <span class="err">|</span> <span class="err">|</span> <span class="err">|</span>
<span class="mf">2</span> <span class="n">ab</span> <span class="err">\</span> <span class="o">/</span> <span class="o">/</span> <span class="err">|</span> <span class="err">|</span> <span class="err">|</span>
<span class="err">\</span> <span class="n">cd</span> <span class="o">/</span> <span class="err">|</span> <span class="err">|</span> <span class="err">|</span>
<span class="err">\</span> <span class="err">\</span> <span class="o">/</span> <span class="err">|</span> <span class="err">|</span> <span class="err">|</span>
<span class="err">\</span> <span class="o">/</span><span class="err">\</span> <span class="o">/</span> <span class="err">|</span> <span class="err">|</span>
<span class="err">\</span> <span class="o">/</span> <span class="err">\</span> <span class="o">/</span> <span class="err">|</span> <span class="err">|</span>
<span class="mf">3</span> <span class="n">abe</span> <span class="err">\</span> <span class="o">/</span> <span class="o">/</span> <span class="err">|</span>
<span class="err">\</span> <span class="n">cdf</span> <span class="o">/</span> <span class="err">|</span>
<span class="err">\</span> <span class="err">\</span><span class="o">/</span> <span class="err">|</span>
<span class="err">\</span> <span class="o">/</span><span class="err">\</span> <span class="o">/</span>
<span class="mf">4</span> <span class="n">abeg</span> <span class="n">cdfh</span>
<span class="err">\</span> <span class="o">/</span>
<span class="mf">5</span> <span class="n">abegcdfh</span>
</code></pre></div>
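<p>To make the damage concrete, the following Python sketch hard-codes the pairing order drawn in the diagram and applies it to a commutative operator (integer addition) and a non-commutative one (string concatenation). The function name <code>arrival_order_reduce</code> is an assumption made for this example.</p>

```python
def arrival_order_reduce(op, xs):
    """Combine exactly eight inputs in the pairing order of the diagram.

    Hypothetical helper; the fixed order stands in for 'whatever
    happened to be available when a reduction slot freed up'.
    """
    a, b, c, d, e, f, g, h = xs
    ab, cd = op(a, b), op(c, d)          # step 2
    abe, cdf = op(ab, e), op(cd, f)      # step 3
    abeg, cdfh = op(abe, g), op(cdf, h)  # step 4
    return op(abeg, cdfh)                # step 5


def plus(x, y):
    return x + y


numbers_ok = arrival_order_reduce(plus, range(8)) == sum(range(8))
letters = arrival_order_reduce(plus, "abcdefgh")
```

<p>Addition does not care about the shuffle, so <code>numbers_ok</code> is true, while <code>letters</code> comes back scrambled rather than in alphabetical order.</p>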
<p>To take an extremely practical example,
suppose I needed to keep track of the orientation of my remote-control spaceship (USS Podsnap), which
transmits to me a stream of 3D
<a href="https://en.wikipedia.org/wiki/Rotation_matrix">rotation matrices</a>, each representing a course correction.
Matrix multiplication is associative, so a streaming associative reduce is just the ticket. Matrix
multiplication is not, however, commutative, so if I mess up the order I will be lost in space. (Note that
2D rotation matrices - each a rotation by a single angle - do commute, so this wouldn't be a problem for my
remote-control wheat combine.)</p>
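<p>A quick numerical check of those claims, as a dependency-free Python sketch. The helper names (<code>matmul</code>, <code>rot2</code>, <code>rot_x</code>, <code>rot_z</code>, <code>close</code>) are invented for this example, and the angles are arbitrary.</p>

```python
import math


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def rot2(theta):
    """2D rotation by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def rot_x(theta):
    """3D rotation about the x axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]


def rot_z(theta):
    """3D rotation about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def close(a, b, tol=1e-9):
    """Element-wise comparison that tolerates floating-point noise."""
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


quarter = math.pi / 2
A, B, C = rot_x(quarter), rot_z(quarter), rot_x(0.4)

commutes_2d = close(matmul(rot2(0.3), rot2(1.1)), matmul(rot2(1.1), rot2(0.3)))
commutes_3d = close(matmul(A, B), matmul(B, A))
associative_3d = close(matmul(matmul(A, B), C), matmul(A, matmul(B, C)))
```

<p>The 2D products agree in either order, the 3D products do not, and regrouping the 3D product leaves it unchanged, which is exactly the combination that calls for an order-preserving associative reduce.</p>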
<p>It seems that I need a true streaming, associative reduce -- where order matters, but nobody has told me
when inputs will stop arriving, at what rate they will arrive, or how long the reductions themselves will take.</p>
<h2>streaming associative reduce - a possible algorithm</h2>
<p>Here's a possible approach. We maintain multiple queues, and label them 1, 2, 4 etc., corresponding to
reductions of that many elements. When the very first element arrives, we throw it onto (the empty) queue #1.
When subsequent elements arrive, we check if there's anything at the head of queue #1 and, if so, launch
a reduction with it; otherwise, we put it in queue #1. If we do launch a reduction involving an element from queue #1,
we'll add a placeholder to queue #2, into which the result of the reduction will be placed when it completes.
After a while queue #2 may contain a number of placeholders, some populated with results of completed reductions,
others still pending. As soon as we have two complete reductions at the head of queue #2, we launch a reduction
for them, and put onto queue #4 a placeholder for the result. And so on.</p>
<p>Sometime after the stream finally completes, we'll find ourselves with each queue containing at most one
reduced value. Because of the way we constructed these queues, we know that any reduction in queue i involves
only input elements preceding those involved in reductions in any other queue j &lt; i. Accordingly, we just take
the single values remaining, put them in reverse order of queue label, and treat them as a new input
series to reduce.</p>
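<p>Before any concurrency enters the picture, the bookkeeping can be modelled synchronously. This is only a
sketch under a big simplifying assumption: every reduction completes instantly, so each level holds at most
one value and no placeholders are needed. The names <code>carry</code> and <code>sync-assoc-reduce</code> are
made up for illustration, and levels are numbered 0, 1, 2 rather than 1, 2, 4.</p>

```clojure
(defn carry
  "Place x on `level`. If an earlier value is already waiting there, combine
  the pair (earlier value on the left) and carry the result up a level."
  [f levels level x]
  (if (contains? levels level)
    (recur f (dissoc levels level) (inc level) (f (levels level) x))
    (assoc levels level x)))

(defn sync-assoc-reduce
  "Synchronous model of the streaming reduce: no channels, no concurrency."
  [f xs]
  (let [levels (reduce (fn [levels x] (carry f levels 0 x)) {} xs)]
    (->> (sort-by key > levels)   ; highest level first, i.e. earliest inputs first
         (map val)
         (reduce f))))            ; fold the leftovers, preserving input order

(sync-assoc-reduce str ["a" "b" "c" "d" "e" "f" "g"]) ;=> "abcdefg"
```

<p>With seven inputs the leftovers are <code>abcd</code>, <code>ef</code> and <code>g</code>, sitting on
levels 2, 1 and 0, which is why they have to be read back from the highest level down.</p>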
<p>I think we're almost at the limits of ASCII visualization, but let's try anyway. Values in parentheses
below are pending:</p>
<div class="highlight"><pre><span></span><code> 1 2 4
---------- ----------- -------
a
a b
c (ab)
c d ab
e ab (cd)
e f ab cd
g (ef) (abcd)
g h ef abcd
ef (gh) abcd
efgh abcd
abcd efgh
(abcdefgh)
abcdefgh
</code></pre></div>
<p>The actual state at any point in time is going to depend on:</p>
<ol>
<li>When inputs arrive.</li>
<li>The amount of time a reduction takes.</li>
<li>The permitted level of concurrency.</li>
</ol>
<p>Note that we might not actually achieve the permitted level of concurrency, because we might not
yet have two consecutive completed reductions at the front of a queue.
Suppose that queue 2 looks like this (left is front):</p>
<div class="highlight"><pre><span></span><code><span class="mf">2</span><span class="p">:</span> <span class="p">(</span><span class="n">ab</span><span class="p">)</span> <span class="n">cd</span> <span class="p">(</span><span class="n">ef</span><span class="p">)</span> <span class="n">gh</span>
</code></pre></div>
<p>For some reason reducing a/b and e/f is taking longer than reducing c/d and g/h. Only when
a/b finishes</p>
<div class="highlight"><pre><span></span><code><span class="mi">2</span><span class="o">:</span> <span class="n">ab</span> <span class="n">cd</span> <span class="o">(</span><span class="n">ef</span><span class="o">)</span> <span class="n">gh</span>
</code></pre></div>
<p>can we grab the two head elements and launch a reduction of them in queue #4</p>
<div class="highlight"><pre><span></span><code><span class="mf">2</span><span class="p">:</span> <span class="p">(</span><span class="n">ef</span><span class="p">)</span> <span class="n">gh</span>
<span class="mf">4</span><span class="p">:</span> <span class="p">(</span><span class="n">abcd</span><span class="p">)</span>
</code></pre></div>
<p>Now imagine a case where reductions take essentially no time compared to the
arrival interval. Since we do a new reduction instantly upon receiving a new
element, the algorithm reduces to <code>foldl</code>, plus a decorative binary counter
as we shuffle partial reductions among the queues:</p>
<p><img src="./images/red-in1000-red1-np1.gif"></p>
<p>Now another, where inputs arrive as fast as we take them,
reductions take 1 second, and we can do up to 10 of them at a time.
(<code>n</code> is the actual number of inputs left;
<code>np</code> is the actual number of reductions currently running; green
squares represent the number of completed reductions, red the number in flight;
the actual ordering of reductions and placeholders in the queue is not shown):</p>
<p><img src="./images/red-in1-red1000-np10.gif"></p>
<p>See how we immediately fill up queue #2 with
placeholders for 10 reductions, which we replenish as they complete and
spill over into later queues.</p>
<p>Finally, here's a complicated example: inputs arrive every millisecond
(essentially as fast as we can consume them),
reductions take between 1 and 100ms, and we are willing to run
up to 10 of them in parallel.</p>
<p><img src="./images/reduce-n1000-r100-i1-c100.gif"></p>
<p>It achieves pretty good concurrency, slowing down only during the final cleanup.</p>
<h1>Learn to play it right</h1>
<p>Clojure of course contains <code>reduce</code> (and <code>educe</code> and <code>transduce</code> and... well, we've
<a href="http://blog.podsnap.com/ducers.html">been</a>
<a href="http://blog.podsnap.com/ducers2.html">down</a>
<a href="http://blog.podsnap.com/ducers3.html">that</a>
<a href="http://blog.podsnap.com/ducers4.html">road</a>
<a href="http://blog.podsnap.com/lost-in-translation.html">already</a>), and it even
contains an associative reduce, which sticks it to those stuck-up
Haskellers by calling itself <code>fold</code>.<sup id="fnref:fold"><a class="footnote-ref" href="#fn:fold">1</a></sup>
Our reduce will look like</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">assoc-reduce</span> <span class="p">[</span><span class="nv">f</span> <span class="nv">c-in</span><span class="p">])</span>
</code></pre></div>
<p>where <code>f</code> is a function of two arguments, returning a <code>core.async</code> channel that delivers their reduction, and <code>c-in</code> is a channel
of inputs; <code>assoc-reduce</code> returns a channel that will deliver the final reduction. In <a href="http://typedclojure.org">typesprach</a> it
would look like this:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="ss">:forall</span> <span class="p">[</span><span class="nv">A</span><span class="p">]</span> <span class="nv">assoc-reduce</span>
<span class="p">([</span><span class="nv">f</span> <span class="ss">:-</span> <span class="p">(</span><span class="nf">Fn</span> <span class="p">[</span><span class="nv">A</span> <span class="nv">A</span> <span class="nb">-> </span><span class="p">(</span><span class="nf">Chan</span> <span class="nv">A</span><span class="p">)])</span>
<span class="nv">c-in</span> <span class="ss">:-</span> <span class="p">(</span><span class="nf">Chan</span> <span class="nv">A</span><span class="p">)</span>
<span class="p">]</span> <span class="ss">:-</span> <span class="p">(</span><span class="nf">Chan</span> <span class="nv">A</span><span class="p">)))</span>
</code></pre></div>
<p>The central data structure for this algorithm is a queue of place-holders, which I ultimately
implemented as a vector of volatiles. That's a bit of a compromise, as it would be possible to
employ a fully functional data structure, but we can structure our code to localize the impurity.</p>
<p>When launching a reduction, we place a new
<code>(volatile! nil)</code> at the
end of the queue where its result is supposed to go, and when the answer comes back,
we <code>vreset!</code> the volatile. <em>Crucially</em>, we do not let this resetting occur
asynchronously, because we're going to want to identify all pairs of completed reductions later,
and it would be disruptive for results to teleport in while we're inspecting. Instead,
we'll arrange for each incoming reduction result to be accompanied by the destination <code>volatile</code>,
in which we can place it at a time of our choosing.</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">iq</span> <span class="p">(</span><span class="nb">inc </span><span class="nv">old-queue-number</span><span class="p">)</span>
<span class="nv">v</span> <span class="p">(</span><span class="nf">volatile!</span> <span class="nv">nil</span><span class="p">)</span>
<span class="nv">queues</span> <span class="p">(</span><span class="nf">update-in</span> <span class="nv">queues</span> <span class="p">[</span><span class="nv">iq</span><span class="p">]</span> <span class="nb">conj </span><span class="nv">v</span><span class="p">)]</span> <span class="c1">;; put placeholder volatile on queue</span>
<span class="p">(</span><span class="nf">go</span> <span class="p">(</span><span class="nf">>!</span> <span class="nv">reduction-channel</span> <span class="c1">;; launch reduction asynchronously</span>
<span class="p">[</span><span class="nv">iq</span> <span class="c1">;; queue number</span>
<span class="p">(</span><span class="nf"><!</span> <span class="p">(</span><span class="nf">f</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">))</span> <span class="c1">;; reduction result</span>
<span class="nv">v</span><span class="p">])))</span> <span class="c1">;; destination volatile</span>
</code></pre></div>
<p>The main loop now knows exactly where to put the results, and we know exactly when they were
put there. No race conditions here.</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">go-loop</span> <span class="p">[</span><span class="nv">queues</span> <span class="p">{}]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[[</span><span class="nv">iq</span> <span class="nv">r</span> <span class="nv">v</span><span class="p">]</span> <span class="p">(</span><span class="nf"><!</span> <span class="nv">reduction-channel</span><span class="p">)</span>
<span class="nv">_</span> <span class="p">(</span><span class="nf">vreset!</span> <span class="nv">v</span> <span class="nv">r</span><span class="p">)</span>
<span class="nv">queues</span> <span class="p">(</span><span class="nf">launch-reductions-using-latest</span> <span class="nv">queues</span><span class="p">)]</span>
<span class="p">(</span><span class="nf">recur</span> <span class="nv">queues</span><span class="p">)))</span>
</code></pre></div>
<p>What then? After a reduction comes back, we may have an opportunity to
launch more, by pulling pairs of reduced values off the current queue
(this is the operation we didn't want to be disrupted by incoming results)
for further reduction into the next queue:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">launch-reductions</span> <span class="p">[</span><span class="nv">c-redn</span> <span class="nv">f</span> <span class="nv">iq</span> <span class="nv">queue</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">pairs</span> <span class="p">(</span><span class="nb">take-while </span><span class="p">(</span><span class="k">fn </span><span class="p">[[</span><span class="nv">a</span> <span class="nv">b</span><span class="p">]]</span> <span class="p">(</span><span class="nb">and </span><span class="nv">a</span> <span class="nv">b</span><span class="p">))</span>
<span class="p">(</span><span class="nf">partition</span> <span class="mi">2</span> <span class="p">(</span><span class="nb">map deref </span><span class="nv">queue</span><span class="p">)))]</span>
<span class="p">(</span><span class="nb">mapv </span><span class="p">(</span><span class="k">fn </span><span class="p">[[</span><span class="nv">a</span> <span class="nv">b</span><span class="p">]]</span> <span class="c1">;; mapv is eager, so every go block launches right away</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">v</span> <span class="p">(</span><span class="nf">volatile!</span> <span class="nv">nil</span><span class="p">)]</span>
<span class="p">(</span><span class="nf">go</span> <span class="p">(</span><span class="nf">>!</span> <span class="nv">c-redn</span> <span class="p">[(</span><span class="nb">inc </span><span class="nv">iq</span><span class="p">)</span> <span class="p">(</span><span class="nf"><!</span> <span class="p">(</span><span class="nf">f</span> <span class="nv">a</span> <span class="nv">b</span><span class="p">))</span> <span class="nv">v</span><span class="p">]))</span>
<span class="nv">v</span><span class="p">))</span> <span class="nv">pairs</span><span class="p">)))</span>
</code></pre></div>
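<p>The pair-picking part can be tried in isolation, with no channels at all. A minimal sketch, assuming (as in
the code above) that a pending placeholder is a volatile holding <code>nil</code>; <code>ready-pairs</code> is
just an illustrative name for the <code>take-while</code>/<code>partition</code> expression, and the queue
mirrors the <code>(ab) cd (ef) gh</code> example from earlier:</p>

```clojure
(defn ready-pairs
  "Leading pairs of completed results from a queue of volatiles;
  stops at the first pair with a pending (nil) member."
  [queue]
  (->> (map deref queue)
       (partition 2)
       (take-while (fn [[a b]] (and a b)))))

(def queue [(volatile! nil) (volatile! "cd") (volatile! nil) (volatile! "gh")])

(ready-pairs queue)            ;=> ()  since (ab) at the head is still pending
(vreset! (first queue) "ab")   ; the a/b reduction comes back
(ready-pairs queue)            ;=> (("ab" "cd"))  while (ef) still blocks gh
```

<p>Note that <code>gh</code> has to wait for its left-hand neighbour even though it finished long ago;
that is the price of preserving order.</p>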
<p>So far, we've thought about what to do with results coming off a reduction channel;
we also have to worry about raw inputs. Life will be a little simpler if we make
the input channel look like the reduction channel, so we map our stream of <code>x</code>s into
<code>[0 x nil]</code>s. One used to do this with <code>(async/map&lt; f c-in)</code>, but that's been deprecated
in favor of channels with built-in transducers, so we'll create one of those and pipe
our input channel to it:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">c-in</span> <span class="p">(</span><span class="nf">pipe</span> <span class="nv">c-in-orig</span> <span class="p">(</span><span class="nf">chan</span> <span class="mi">1</span> <span class="p">(</span><span class="nb">map </span><span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">x</span><span class="p">]</span> <span class="p">[</span><span class="mi">0</span> <span class="nv">x</span> <span class="nv">nil</span><span class="p">]))))]</span> <span class="nv">...</span><span class="p">)</span>
</code></pre></div>
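<p>To watch the transducer channel do its thing at a REPL, something like the following should work. It is a
sketch for JVM Clojure (it uses the blocking <code>&lt;!!</code>, which ClojureScript lacks) and assumes
<code>core.async</code> is on the classpath; the three keywords are a made-up stand-in for the real input stream.</p>

```clojure
(require '[clojure.core.async :as async :refer [chan pipe <!!]])

(let [c-in-orig (async/to-chan [:x :y :z])   ; stand-in input stream, closes when drained
      c-in      (pipe c-in-orig (chan 1 (map (fn [x] [0 x nil]))))]
  (<!! (async/into [] c-in)))                ; collect everything until c-in closes
;=> [[0 :x nil] [0 :y nil] [0 :z nil]]
```

<p>Because <code>pipe</code> closes the downstream channel when the upstream one closes, the main loop can
still detect the end of input by taking a <code>nil</code>.</p>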
<p>Then we'll listen with <code>alts!</code> on <code>[c-redn c-in]</code>, taking real or fake reductions
as they arrive.</p>
<p>Actually, it's a little more complicated than that, because we don't want to find
ourselves listening when no results are expected, and we don't want to accept more
inputs when already at maximum parallelization. This means we're going to have to
keep track of a little more state than just the queues. Specifically, we'll keep
<code>c-in</code>, with the convention that it's set to nil when closed, and <code>np</code>, the
number of reductions currently in flight:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">go-loop</span> <span class="p">[{</span><span class="ss">:keys</span> <span class="p">[</span><span class="nv">c-in</span> <span class="nv">queues</span> <span class="nv">np</span><span class="p">]</span> <span class="ss">:as</span> <span class="nv">state</span><span class="p">}</span> <span class="p">{</span><span class="ss">:c-in</span> <span class="nv">c-in</span> <span class="ss">:queues</span> <span class="p">{}</span> <span class="ss">:np</span> <span class="mi">0</span><span class="p">}]</span>
</code></pre></div>
<p>The first thing we do in the loop is build a list of channels (possibly empty - a case we'll
handle a bit further down)</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nb">if-let </span><span class="p">[</span><span class="nv">cs</span> <span class="p">(</span><span class="nb">seq </span><span class="p">(</span><span class="nb">filter identity </span><span class="p">(</span><span class="nf">list</span>
<span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nb">pos? </span><span class="nv">np</span><span class="p">)</span> <span class="nv">c-redn</span><span class="p">)</span> <span class="c1">;; include reductions if some are expected</span>
<span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nb">< </span><span class="nv">np</span> <span class="nv">np-max</span><span class="p">)</span> <span class="nv">c-in</span><span class="p">)</span> <span class="c1">;; include c-in if still open and below np-max</span>
<span class="p">)))]</span>
</code></pre></div>
<p>and listen for our "stuff":</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="k">let </span><span class="p">[[[</span><span class="nv">l</span> <span class="nv">res</span> <span class="nv">v</span><span class="p">]</span> <span class="nv">c</span><span class="p">]</span> <span class="p">(</span><span class="nf">alts!</span> <span class="nv">cs</span><span class="p">)]</span>
</code></pre></div>
<p>The only reason we might get back <code>nil</code> here is that the input channel has been closed,
in which case we record that fact and continue looping:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nb">if-not </span><span class="nv">l</span>
<span class="p">(</span><span class="nf">recur</span> <span class="p">(</span><span class="nb">assoc </span><span class="nv">state</span> <span class="ss">:c-in</span> <span class="nv">nil</span><span class="p">))</span>
</code></pre></div>
<p>If we do get back a reduction, we put it in the volatile expecting it,</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">q</span> <span class="p">(</span><span class="k">if </span><span class="nv">v</span>
<span class="p">(</span><span class="k">do </span><span class="p">(</span><span class="nf">vreset!</span> <span class="nv">v</span> <span class="nv">res</span><span class="p">)</span> <span class="p">(</span><span class="nf">queues</span> <span class="nv">l</span><span class="p">))</span> <span class="c1">;; real reduction</span>
<span class="p">(</span><span class="nb">concat </span><span class="p">(</span><span class="nf">queues</span> <span class="mi">0</span><span class="p">)</span> <span class="p">[(</span><span class="nf">volatile!</span> <span class="nv">res</span><span class="p">)]))</span> <span class="c1">;; actually an input</span>
</code></pre></div>
<p>launch as many reductions as we can from pairs at the head of the queue,</p>
<div class="highlight"><pre><span></span><code> <span class="nv">vs</span> <span class="p">(</span><span class="nf">launch-reductions</span> <span class="nv">c-redn</span> <span class="nv">f</span> <span class="nv">l</span> <span class="nv">q</span><span class="p">)</span>
<span class="nv">nr</span> <span class="p">(</span><span class="nb">count </span><span class="nv">vs</span><span class="p">)</span>
<span class="nv">q</span> <span class="p">(</span><span class="nb">drop </span><span class="p">(</span><span class="nb">* </span><span class="mi">2</span> <span class="nv">nr</span><span class="p">)</span> <span class="nv">q</span><span class="p">)</span>
</code></pre></div>
<p>adjust the number of running reductions accordingly (if <code>l</code> is zero, this is an input,
which does not indicate that a reduction has finished),</p>
<div class="highlight"><pre><span></span><code> <span class="nv">np</span> <span class="p">(</span><span class="nf">cond-></span> <span class="p">(</span><span class="nb">+ </span><span class="nv">np</span> <span class="nv">nr</span><span class="p">)</span> <span class="p">(</span><span class="nb">pos? </span><span class="nv">l</span><span class="p">)</span> <span class="nv">dec</span><span class="p">)</span>
</code></pre></div>
<p>put the placeholders on the next queue,</p>
<div class="highlight"><pre><span></span><code> <span class="nv">l2</span> <span class="p">(</span><span class="nb">inc </span><span class="nv">l</span><span class="p">)</span>
<span class="nv">q2</span> <span class="p">(</span><span class="nb">concat </span><span class="p">(</span><span class="nf">queues</span> <span class="nv">l2</span><span class="p">)</span> <span class="nv">vs</span><span class="p">)]</span>
</code></pre></div>
<p>and continue looping</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">recur</span> <span class="p">(</span><span class="nb">assoc </span><span class="nv">state</span> <span class="ss">:n</span> <span class="nv">n</span> <span class="ss">:np</span> <span class="nv">np</span> <span class="ss">:queues</span> <span class="p">(</span><span class="nb">assoc </span><span class="nv">queues</span> <span class="nv">l</span> <span class="nv">q</span> <span class="nv">l2</span> <span class="nv">q2</span><span class="p">))))))</span>
</code></pre></div>
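<p>The <code>np</code> arithmetic in that loop is easy to get backwards, so here it is pulled out on its own.
<code>next-np</code> is a hypothetical helper wrapping the same <code>cond-&gt;</code> expression; recall that
raw inputs arrive tagged with level 0, while real reductions carry a positive level.</p>

```clojure
(defn next-np
  "np after handling one message: add the nr newly launched reductions and,
  if the message came from a real reduction (level l > 0), retire that one."
  [np nr l]
  (cond-> (+ np nr) (pos? l) dec))

(next-np 3 1 0) ;=> 4  a raw input paired up and launched a reduction
(next-np 3 0 2) ;=> 2  a reduction finished, nothing new could launch
(next-np 3 1 1) ;=> 3  one finished and one launched
```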
<p>In the case where <code>c-in</code> was closed and <code>np</code> was zero, our queues contain nothing but
complete reductions, which we extract in reverse order</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">reds</span> <span class="p">(</span><span class="nf">->></span> <span class="p">(</span><span class="nb">seq </span><span class="nv">queues</span><span class="p">)</span>
<span class="p">(</span><span class="nb">sort-by </span><span class="nv">first</span><span class="p">)</span> <span class="c1">;; sort by queue number</span>
<span class="p">(</span><span class="nb">map </span><span class="nv">second</span><span class="p">)</span> <span class="c1">;; extract queues</span>
<span class="p">(</span><span class="nb">map </span><span class="nv">first</span><span class="p">)</span> <span class="c1">;; take the head, if any</span>
<span class="p">(</span><span class="nb">filter </span><span class="nv">identity</span><span class="p">)</span> <span class="c1">;; ignore empty heads</span>
<span class="p">(</span><span class="nb">map </span><span class="nv">deref</span><span class="p">)</span> <span class="c1">;; unpack the volatile</span>
<span class="nv">reverse</span>
<span class="p">)]</span>
</code></pre></div>
<p>If there's only one reduction, we're well and truly done. Otherwise, we treat the new series as inputs:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nb"><= </span><span class="p">(</span><span class="nb">count </span><span class="nv">reds</span><span class="p">)</span> <span class="mi">1</span><span class="p">)</span>
<span class="p">(</span><span class="nf">>!</span> <span class="nv">c-result</span> <span class="p">(</span><span class="nb">first </span><span class="nv">reds</span><span class="p">))</span> <span class="c1">;; return result</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">c-in</span> <span class="p">(</span><span class="nf">chan</span> <span class="mi">1</span> <span class="p">(</span><span class="nb">map </span><span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">x</span><span class="p">]</span> <span class="p">[</span><span class="mi">0</span> <span class="nv">x</span> <span class="nv">nil</span><span class="p">])))]</span>
<span class="p">(</span><span class="nf">onto-chan</span> <span class="nv">c-in</span> <span class="nv">reds</span><span class="p">)</span>
<span class="p">(</span><span class="nf">recur</span> <span class="p">{</span><span class="ss">:n</span> <span class="p">(</span><span class="nb">count </span><span class="nv">reds</span><span class="p">)</span> <span class="ss">:c-in</span> <span class="nv">c-in</span> <span class="ss">:queues</span> <span class="p">{}</span> <span class="ss">:np</span> <span class="mi">0</span><span class="p">}))))))</span>
</code></pre></div>
<h1>Knowin' what the cards were</h1>
<p>Surprisingly, it wasn't that difficult to get this working. While the state is a bit messy,
we're careful to "modify" it only on one thread, and we enjoy the masochistic frisson of
admonishment every time we type one of Clojure's <em>mutation alert</em><sup id="fnref:TM"><a class="footnote-ref" href="#fn:TM">2</a></sup> exclamation points.</p>
<p>Unfortunately, I suffer from a rare disorder in which new algorithms
induce psychotic hallucinations. For hours after "discovering" binary
trees as an adolescent, I paced slowly back and forth in my friend
Steve's family's living room, grinning at phantom nixie
numbers<sup id="fnref:numbers"><a class="footnote-ref" href="#fn:numbers">3</a></sup> dancing before my eyes and gesticulating decisively,
like some demented conductor. (Subsequently, I used that knowledge to
implement in BASIC an animal guessing game, which I taught to
disambiguate some kid named Jeremy from a pig with the question, "is
it greasy?", so in some ways I was a normal teenager.)</p>
<p>The streaming reduce is particularly attractive - pearls swept forth
in quadrilles and copulae - but I guess the graphic equalizer thingie
is an ok approximation. Still, I couldn't even see how to make one of
those without some horrible sacrifice, like learning javascript.
Someday, I will be able to write only clojure, and some kind soul will
translate it into whatever craziness the browser wants.</p>
<h2>Someday is today</h2>
<p>The combination of clojurescript and
<a href="https://holmsand.github.io/reagent/index.html">reagent</a> over react
allows you to do a tremendous amount with the bare minimum of webbish
cant. The basic idea of reagent is that you use a special
implementation of <code>atom</code></p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="kd">defonce </span><span class="nv">mystate</span> <span class="p">(</span><span class="nf">r/atom</span> <span class="s">"Yowsa!"</span><span class="p">))</span>
</code></pre></div>
<p>which can be <code>swap!</code>ed and <code>reset!</code> as usual and, when dereferenced in the middle of HTML
(here represented as <a href="https://github.com/weavejester/hiccup">hiccup</a>)</p>
<div class="highlight"><pre><span></span><code> <span class="p">[</span><span class="ss">:div</span> <span class="s">"The value of mystate is "</span> <span class="o">@</span><span class="nv">mystate</span><span class="p">]</span>
</code></pre></div>
<p>just plugs in the value as if it had been typed there, updating it whenever it changes. You can
also update attributes, which is particularly interesting in SVG elements:</p>
<div class="highlight"><pre><span></span><code> <span class="p">[</span><span class="ss">:svg</span> <span class="p">[</span><span class="ss">:rect</span> <span class="ss">:height</span> <span class="mi">10</span> <span class="ss">:width</span> <span class="o">@</span><span class="nv">applause-volume</span><span class="p">]]</span>
</code></pre></div>
<p>It's handy to use
<code>core.async</code> to glue an otherwise web-agnostic application to depiction in terms of <code>r/atom</code>s, e.g.</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">go-loop</span> <span class="p">[]</span>
<span class="p">(</span><span class="nf">reset!</span> <span class="nv">mystate</span> <span class="p">(</span><span class="nf">use-contents-of</span> <span class="p">(</span><span class="nf"><!</span> <span class="nv">mychannel</span><span class="p">)))</span>
<span class="p">(</span><span class="nf">recur</span><span class="p">))</span>
</code></pre></div>
<p>Since <code>assoc-reduce</code> was already keeping track of a <code>state</code>, I just
introduced an optional <code>debug</code> parameter - a channel which, if not
nil, should receive the state whenever it's updated. To simulate
varying rates of input and reduction, we use <code>timeout</code>s, optionally
fixed or random:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">pluss</span> <span class="p">[</span><span class="nv">t</span> <span class="nv">do-rand</span><span class="p">]</span>
<span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">a</span> <span class="nv">b</span><span class="p">]</span> <span class="p">(</span><span class="nf">go</span> <span class="p">(</span><span class="nf"><!</span> <span class="p">(</span><span class="nf">timeout</span> <span class="p">(</span><span class="k">if </span><span class="nv">do-rand</span> <span class="p">(</span><span class="nb">rand-int </span><span class="nv">t</span><span class="p">)</span> <span class="nv">t</span><span class="p">)))</span> <span class="p">(</span><span class="nb">+ </span><span class="nv">a</span> <span class="nv">b</span><span class="p">))))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">delay-spool</span> <span class="p">[</span><span class="nv">as</span> <span class="nv">t</span> <span class="nv">do-rand</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">c</span> <span class="p">(</span><span class="nf">chan</span><span class="p">)]</span>
<span class="p">(</span><span class="nf">go-loop</span> <span class="p">[[</span><span class="nv">a</span> <span class="o">&</span> <span class="nv">as</span><span class="p">]</span> <span class="nv">as</span><span class="p">]</span>
<span class="p">(</span><span class="k">if </span><span class="nv">a</span> <span class="p">(</span><span class="k">do </span><span class="p">(</span><span class="nf">>!</span> <span class="nv">c</span> <span class="nv">a</span><span class="p">)</span>
<span class="p">(</span><span class="nf"><!</span> <span class="p">(</span><span class="nf">timeout</span> <span class="p">(</span><span class="k">if </span><span class="nv">do-rand</span> <span class="p">(</span><span class="nb">rand-int </span><span class="nv">t</span><span class="p">)</span> <span class="nv">t</span><span class="p">)))</span>
<span class="p">(</span><span class="nf">recur</span> <span class="nv">as</span><span class="p">))</span>
<span class="p">(</span><span class="nf">async/close!</span> <span class="nv">c</span><span class="p">)))</span>
<span class="nv">c</span><span class="p">))</span>
</code></pre></div>
<p>There's some uninteresting massaging of the state into queue lengths,
and some even less interesting boilerplate to read parameters from
constructs originally intended for CGI, but in less than half an hour,
the following emerges:</p>
<div id="app" style="float: left;">Here it is></div>
<script src="extra/jss/sreduce.js"></script>
<script>sreduce.core.run();</script>
<p>It would be more work to make this efficient and to prevent you from
breaking it with silly inputs, but I feel vindicated in waiting for
the clojurescript ecosystem to catch up to my laziness.</p>
<p>(Of course what I'm really hoping for is that somebody actually
animates the dancing pearls for me. Or at least tells me how to get
a carriage return after the damn thing.)</p>
<p>Go forth and reduce!</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:fold">
<p>Haskellers respond with wounding jeers that we do not understand monoids and semigroups,
which we will pretend not to care about but obsess over in private. <a class="footnote-backref" href="#fnref:fold" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:TM">
<p>A trademark of Cognitect Industries, all rights reserved. <a class="footnote-backref" href="#fnref:TM" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:numbers">
<p>Yes, numbers. We didn't have pointers back then, so you made structures with arrays of indices into
other arrays. Glory days. <a class="footnote-backref" href="#fnref:numbers" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
</ol>
</div>Spying on myself2016-01-04T17:00:00-05:002016-01-04T17:00:00-05:00Peter Fraenkeltag:blog.podsnap.com,2016-01-04:/amcrest.html<p>Who knew that paranoia could be so generous? Not only does it allow you to worry about a larger fraction of the world's
hazards, it can even increase the number of potential hazards for you to worry about.</p>
<p>I read ads on the subway, so I know that you can't be too careful these days. There are no bounds, for example,
to the treachery my dog might be perpetrating while I'm hard at work, and it is essential, therefore, to have a 7-day
rolling video archive, so that, say, 5 days after the shredding of a particularly valuable chunk of styrofoam, I can
confront Fido (not her real name) with irrefutable evidence.</p>
<h1>Shameless commerce</h1>
<p>So I bought me one of these <a href="http://www.amazon.com/gp/product/B0145OQTPG/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=B0145OQTPG&linkCode=as2&tag=podsnappery-20&linkId=L73O44M6E74DGYR6">Amcrest thingies</a>:</p>
<p><a rel="nofollow" href="http://www.amazon.com/gp/product/B0145OQTPG/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=B0145OQTPG&linkCode=as2&tag=podsnappery-20&linkId=DATYFGI7QUAYRZF2"><img border="0" src="http://ws-na.amazon-adsystem.com/widgets/q?_encoding=UTF8&ASIN=B0145OQTPG&Format=_SL110_&ID=AsinImage&MarketPlace=US&ServiceVersion=20070822&WS=1&tag=podsnappery-20" ></a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=podsnappery-20&l=as2&o=1&a=B0145OQTPG" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></p>
<p>(By the way, this is an Amazon affiliate link, and I really would appreciate it if you clicked through and bought the camera, notwithstanding the very good reasons not to.)</p>
<p>Then, I noticed <a href="http://www.amazon.com/review/R1MWF430QEPG0W?_encoding=UTF8&asin=B0145OQXCK&cdForum=Fx1Y0DDFR3SWFMU&cdMSG=addedToThread&cdPage=&cdThread=Tx33WJ8DQKAGHKR&newContentID=Mx3KDZTKUUFV6KY&newContentNum=2&store=photo#CustomerDiscussionsNRPB">this disturbing review</a>, the gist of which is that, if you follow the standard instructions
for setting up the camera with <a href="https://en.wikipedia.org/wiki/Universal_Plug_and_Play">UPnP</a>, you'll be exposing a totally unsecured CGI interface to the internet at large.
"Not to worry, Fido," I said, "I'm an Information Technology professional, and I would never do anything that stupid. Your disgusting secrets are safe with me." Blinking
quizzically, Fido batted herself in the face, dislodging a blackened clump of eye mucus, which she quickly slurped up. "True," I said, "but I have UPnP disabled, I've
verified with <a href="https://nmap.org/">nmap</a> that incoming ports are all closed, and I
access the camera only via web services, to which the camera makes outgoing connections... What? Wireshark, you say? Oh fine." Sometimes I wonder if dog ownership is
worth the effort.</p>
<h1>Puzzlement</h1>
<p><a rel="nofollow" href="http://www.amazon.com/gp/product/B0015IUDV2/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=B0015IUDV2&linkCode=as2&tag=podsnappery-20&linkId=PLDAWR6T4WYYJ5BP"><img border="0" src="http://ws-na.amazon-adsystem.com/widgets/q?_encoding=UTF8&ASIN=B0015IUDV2&Format=_SL110_&ID=AsinImage&MarketPlace=US&ServiceVersion=20070822&WS=1&tag=podsnappery-20" ></a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=podsnappery-20&l=as2&o=1&a=B0015IUDV2" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></p>
<p>(Another Amazon affiliate link, by the way. Whatever it is, it's previously used but supposedly in good condition. Give it a try.)</p>
<p>The more I say about Wireshark, the more likely I am to reveal my ignorance about something important, so, if you haven't had the pleasure yet,
go <a href="https://www.wireshark.org/">read about it</a>, and come back after a few weeks of the sort of thing you think is fun but probably disturbs a lot of people.
Basically, as we all now know, if you run Wireshark, it will scoop up all the packets it sees and let you filter and search them in various ways. What I was
looking for was traffic to and from the camera. I already knew, by looking at router logs, what IP address the camera went by and that it had made various connections
into IP ranges owned by Amazon Web Services. Strangely, though, Wireshark couldn't see any of this. All I saw in the logs was the occasional message between my
laptop and the camera's configuration web interface, which I'd already verified was invisible to the outside world.</p>
<h1>Nervous little dogs</h1>
<p><img width=300px src="images/dogcoffee.jpg"></p>
<p>As an active (in the sense of this-very-minute) practitioner of internet scaremongering, I am brimming with sympathy for
the authors of breathless Wireshark tutorials about the fun you can have in a coffee shop, but my dog doesn't live in
a coffee shop. For a number of reasons that in retrospect were a bit obvious, it's not always easy to
capture traffic to or from a machine <em>other than the one on which you are running Wireshark</em>. There are many
long and intimidating articles about why this is so, but it all comes down to two common cases:</p>
<ol>
<li>It's a wired network. In this case, all the traffic is flowing through an ethernet <em>switch</em>, which
is meticulously routing to each device only the packets that are meant for it. Back when switches
were really expensive, you could count on the ethernet <em>hub</em> to send all packets to all devices,
which were supposed to avert their eyes from irrelevant ones. (You'll read a lot about setting your NIC to "promiscuous mode,"
but that only affects what the card does with packets it receives. It doesn't induce the switch to send them
in the first place.)</li>
<li>Alternatively, it's a wireless network. And, since we're all responsible adults - unlike those hippies in charge
of America's coffee shops - it's likely using WPA2, which features
session-level encryption, so, whether or not other people's business is physically reaching your NIC, you aren't going to be able to
make sense of it. (Note that this encryption is independent of whether or not the contents of the TCP communication
are encrypted.)</li>
</ol>
<p>Of course, since we own the network, we have the ability to modify it, such that Wireshark is in the path of ungarbled packets.
The most common advice is to stick an old-fashioned ethernet hub on the network, positioned so that the traffic you're interested in must
pass through it, and then plug your laptop (or whatever you're using to run Wireshark) into the hub. Alternatively, you can
endow your switch with hub-like capabilities (or deficiencies). Fancy switches have specific sniffer features for this, but
not-so-fancy consumer routers are generally running linux and so have iptables. Looking at the DHCP leases on my router, I can
see that the camera is roosting at 192.168.2.98, while my laptop is 192.168.2.87. Now, I want to take all traffic <code>-d</code>estined
for the camera or <code>-s</code>ourced from it, and <code>--tee</code> a copy off to my laptop:</p>
<div class="highlight"><pre><span></span><code>iptables -t mangle -A POSTROUTING -d 192.168.2.98 -j ROUTE --tee --gw 192.168.2.87
iptables -t mangle -A PREROUTING -s 192.168.2.98 -j ROUTE --tee --gw 192.168.2.87
</code></pre></div>
<p>Having done this, we can start up capture in Wireshark with a <code>host 192.168.2.98</code> filter, and there's quite a bit more to see.</p>
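<p>If you expect to repeat this for other devices, the two rules can be generated rather than typed. The following is only a sketch: the helper name <code>tee_rules</code> is made up for illustration, and it simply reproduces the <code>ROUTE --tee --gw</code> form used above without checking whether a given router's iptables build actually ships that target.</p>

```python
import ipaddress


def tee_rules(camera_ip, monitor_ip):
    """Build the two mangle-table rules that copy a device's traffic to a monitor.

    Hypothetical helper: it only formats the commands shown in the post and
    does not run them or verify that the ROUTE target exists on the router.
    """
    # Validate both addresses up front so a typo fails here, not on the router.
    camera = ipaddress.ip_address(camera_ip)
    monitor = ipaddress.ip_address(monitor_ip)
    if camera == monitor:
        raise ValueError("camera and monitor must be different hosts")
    tee = f"-j ROUTE --tee --gw {monitor}"
    return [
        # Packets destined for the camera, copied as they leave the router.
        f"iptables -t mangle -A POSTROUTING -d {camera} {tee}",
        # Packets sourced from the camera, copied as they arrive.
        f"iptables -t mangle -A PREROUTING -s {camera} {tee}",
    ]


for rule in tee_rules("192.168.2.98", "192.168.2.87"):
    print(rule)
```

<p>With the addresses from my network, this prints the same two commands as above.</p>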
<h1>Ouch</h1>
<p><img src="images/ragingbull.jpg" width=400px></p>
<p>In particular, we see something like this:</p>
<div class="highlight"><pre><span></span><code><span class="mf">3054</span> <span class="mf">19.206767</span> <span class="n">ec2</span><span class="o">-</span><span class="mf">54</span><span class="o">-</span><span class="mf">81</span><span class="o">-</span><span class="mf">2</span><span class="o">-</span><span class="mf">137.</span><span class="n">compute</span><span class="o">-</span><span class="mf">1.</span><span class="n">amazonaws</span><span class="mf">.</span><span class="n">com</span> <span class="mf">192.168.2.98</span> <span class="n">FTP</span> <span class="mf">86</span> <span class="n">Response</span><span class="p">:</span> <span class="mf">220</span> <span class="p">(</span><span class="n">vsFTPd</span> <span class="mf">2.2.2</span><span class="p">)</span>
<span class="mf">3056</span> <span class="mf">19.209820</span> <span class="mf">192.168.2.98</span> <span class="n">ec2</span><span class="o">-</span><span class="mf">54</span><span class="o">-</span><span class="mf">81</span><span class="o">-</span><span class="mf">2</span><span class="o">-</span><span class="mf">137.</span><span class="n">compute</span><span class="o">-</span><span class="mf">1.</span><span class="n">amazonaws</span><span class="mf">.</span><span class="n">com</span> <span class="n">FTP</span> <span class="mf">81</span> <span class="n">Request</span><span class="p">:</span> <span class="n">USER</span> <span class="n">cam12345</span>
<span class="mf">3059</span> <span class="mf">19.236939</span> <span class="mf">192.168.2.98</span> <span class="n">ec2</span><span class="o">-</span><span class="mf">54</span><span class="o">-</span><span class="mf">81</span><span class="o">-</span><span class="mf">2</span><span class="o">-</span><span class="mf">137.</span><span class="n">compute</span><span class="o">-</span><span class="mf">1.</span><span class="n">amazonaws</span><span class="mf">.</span><span class="n">com</span> <span class="n">FTP</span> <span class="mf">83</span> <span class="n">Request</span><span class="p">:</span> <span class="n">PASS</span> <span class="mf">1</span><span class="n">a23456b</span>
<span class="mf">3074</span> <span class="mf">19.416028</span> <span class="mf">192.168.2.98</span> <span class="n">ec2</span><span class="o">-</span><span class="mf">54</span><span class="o">-</span><span class="mf">81</span><span class="o">-</span><span class="mf">2</span><span class="o">-</span><span class="mf">137.</span><span class="n">compute</span><span class="o">-</span><span class="mf">1.</span><span class="n">amazonaws</span><span class="mf">.</span><span class="n">com</span> <span class="n">FTP</span> <span class="mf">101</span> <span class="n">Request</span><span class="p">:</span> <span class="n">stor</span> <span class="mf">2016</span><span class="o">-</span><span class="mf">1</span><span class="o">-</span><span class="mf">2</span><span class="o">-</span><span class="mf">03</span><span class="o">-</span><span class="mf">04</span><span class="o">-</span><span class="mf">05</span><span class="o">-</span><span class="n">AmcFtp</span><span class="mf">.</span><span class="n">mp4</span>
</code></pre></div>
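<p>To give a sense of how little effort an eavesdropper needs here, the sketch below pulls the interesting fields out of the Info column of those four packets using nothing but string splitting. The function name <code>sniff_ftp</code> and the list <code>CAPTURED</code> are invented for illustration; the strings themselves are copied from the capture above.</p>

```python
# Info column of the four captured packets shown above.
CAPTURED = [
    "Response: 220 (vsFTPd 2.2.2)",
    "Request: USER cam12345",
    "Request: PASS 1a23456b",
    "Request: stor 2016-1-2-03-04-05-AmcFtp.mp4",
]


def sniff_ftp(info_lines):
    """Collect the argument of every FTP request, keyed by command name."""
    found = {}
    for line in info_lines:
        direction, _, payload = line.partition(": ")
        if direction != "Request":
            continue  # server responses carry no client secrets
        command, _, argument = payload.partition(" ")
        # FTP commands are case-insensitive, so normalise "stor" to "STOR".
        found[command.upper()] = argument
    return found


secrets = sniff_ftp(CAPTURED)
print(secrets["USER"], secrets["PASS"], secrets["STOR"])
```

<p>No decoding step appears anywhere in that function, because the control channel has nothing to decode.</p>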
<p>In case this isn't clear, my "security" camera is transmitting video by ftp, in plaintext, along with a username and password.
"Surely someone can explain this to me, Fido, but who?" Fido removed her snout from a delicate location and belched. "Worth a
try," I admitted. Indeed, the packet stream contained thousands of bytes along the lines of</p>
<div class="highlight"><pre><span></span><code><span class="mf">0000</span> <span class="n">a4</span> <span class="mf">5</span><span class="n">e</span> <span class="mf">60</span> <span class="n">d6</span> <span class="n">c9</span> <span class="mf">69</span> <span class="n">bc</span> <span class="n">ee</span> <span class="mf">7</span><span class="n">b</span> <span class="mf">7</span><span class="n">d</span> <span class="mf">10</span> <span class="n">b8</span> <span class="mf">08</span> <span class="mf">00</span> <span class="mf">45</span> <span class="mf">00</span> <span class="mf">.</span><span class="o">^</span><span class="err">`</span><span class="mf">..</span><span class="n">i</span><span class="mf">..</span><span class="err">{}</span><span class="mf">....</span><span class="n">E</span><span class="mf">.</span>
<span class="mf">00</span><span class="n">a0</span> <span class="mf">67</span> <span class="mf">31</span> <span class="mf">0</span><span class="n">b</span> <span class="mf">30</span> <span class="mf">09</span> <span class="mf">06</span> <span class="mf">03</span> <span class="mf">55</span> <span class="mf">04</span> <span class="mf">06</span> <span class="mf">13</span> <span class="mf">02</span> <span class="mf">43</span> <span class="mf">41</span> <span class="mf">31</span> <span class="mf">0</span><span class="n">b</span> <span class="n">g1</span><span class="mf">.0...</span><span class="n">U</span><span class="mf">....</span><span class="n">CA1</span><span class="mf">.</span>
<span class="mf">00</span><span class="n">b0</span> <span class="mf">30</span> <span class="mf">09</span> <span class="mf">06</span> <span class="mf">03</span> <span class="mf">55</span> <span class="mf">04</span> <span class="mf">08</span> <span class="mf">13</span> <span class="mf">02</span> <span class="mf">4</span><span class="n">f</span> <span class="mf">4</span><span class="n">e</span> <span class="mf">31</span> <span class="mf">0</span><span class="n">f</span> <span class="mf">30</span> <span class="mf">0</span><span class="n">d</span> <span class="mf">06</span> <span class="mf">0...</span><span class="n">U</span><span class="mf">....</span><span class="kr">ON</span><span class="mf">1.0..</span>
<span class="mf">00</span><span class="n">c0</span> <span class="mf">03</span> <span class="mf">55</span> <span class="mf">04</span> <span class="mf">07</span> <span class="mf">13</span> <span class="mf">06</span> <span class="mf">4</span><span class="n">f</span> <span class="mf">74</span> <span class="mf">74</span> <span class="mf">61</span> <span class="mf">77</span> <span class="mf">61</span> <span class="mf">31</span> <span class="mf">11</span> <span class="mf">30</span> <span class="mf">0</span><span class="n">f</span> <span class="mf">.</span><span class="n">U</span><span class="mf">....</span><span class="n">Ottawa1</span><span class="mf">.0.</span>
<span class="mf">00</span><span class="n">d0</span> <span class="mf">06</span> <span class="mf">03</span> <span class="mf">55</span> <span class="mf">04</span> <span class="mf">0</span><span class="n">a</span> <span class="mf">13</span> <span class="mf">08</span> <span class="mf">43</span> <span class="mf">61</span> <span class="mf">6</span><span class="n">d</span> <span class="mf">63</span> <span class="mf">6</span><span class="n">c</span> <span class="mf">6</span><span class="n">f</span> <span class="mf">75</span> <span class="mf">64</span> <span class="mf">31</span> <span class="mf">..</span><span class="n">U</span><span class="mf">....</span><span class="n">Camcloud1</span>
<span class="mf">00</span><span class="n">e0</span> <span class="mf">11</span> <span class="mf">30</span> <span class="mf">0</span><span class="n">f</span> <span class="mf">06</span> <span class="mf">03</span> <span class="mf">55</span> <span class="mf">04</span> <span class="mf">0</span><span class="n">b</span> <span class="mf">13</span> <span class="mf">08</span> <span class="mf">43</span> <span class="mf">61</span> <span class="mf">6</span><span class="n">d</span> <span class="mf">63</span> <span class="mf">6</span><span class="n">c</span> <span class="mf">6</span><span class="n">f</span> <span class="mf">.0...</span><span class="n">U</span><span class="mf">....</span><span class="n">Camclo</span>
<span class="mf">00</span><span class="n">f0</span> <span class="mf">75</span> <span class="mf">64</span> <span class="mf">31</span> <span class="mf">14</span> <span class="mf">30</span> <span class="mf">12</span> <span class="mf">06</span> <span class="mf">03</span> <span class="mf">55</span> <span class="mf">04</span> <span class="mf">03</span> <span class="mf">13</span> <span class="mf">0</span><span class="n">b</span> <span class="mf">44</span> <span class="mf">61</span> <span class="mf">6</span><span class="n">e</span> <span class="n">ud1</span><span class="mf">.0...</span><span class="n">U</span><span class="mf">....</span><span class="n">Dan</span>
<span class="mf">0100</span> <span class="mf">20</span> <span class="mf">42</span> <span class="mf">75</span> <span class="mf">72</span> <span class="mf">6</span><span class="n">b</span> <span class="mf">65</span> <span class="mf">74</span> <span class="mf">74</span> <span class="mf">00</span> <span class="mf">69</span> <span class="mf">30</span> <span class="mf">67</span> <span class="mf">31</span> <span class="mf">0</span><span class="n">b</span> <span class="mf">30</span> <span class="mf">09</span> <span class="n">Burkett</span><span class="mf">.</span><span class="n">i0g1</span><span class="mf">.0.</span>
<span class="mf">0110</span> <span class="mf">06</span> <span class="mf">03</span> <span class="mf">55</span> <span class="mf">04</span> <span class="mf">06</span> <span class="mf">13</span> <span class="mf">02</span> <span class="mf">43</span> <span class="mf">41</span> <span class="mf">31</span> <span class="mf">0</span><span class="n">b</span> <span class="mf">30</span> <span class="mf">09</span> <span class="mf">06</span> <span class="mf">03</span> <span class="mf">55</span> <span class="mf">..</span><span class="n">U</span><span class="mf">....</span><span class="n">CA1</span><span class="mf">.0...</span><span class="n">U</span>
<span class="mf">0120</span> <span class="mf">04</span> <span class="mf">08</span> <span class="mf">13</span> <span class="mf">02</span> <span class="mf">4</span><span class="n">f</span> <span class="mf">4</span><span class="n">e</span> <span class="mf">31</span> <span class="mf">0</span><span class="n">f</span> <span class="mf">30</span> <span class="mf">0</span><span class="n">d</span> <span class="mf">06</span> <span class="mf">03</span> <span class="mf">55</span> <span class="mf">04</span> <span class="mf">07</span> <span class="mf">13</span> <span class="mf">....</span><span class="kr">ON</span><span class="mf">1.0...</span><span class="n">U</span><span class="mf">...</span>
<span class="mf">0130</span> <span class="mf">06</span> <span class="mf">4</span><span class="n">f</span> <span class="mf">74</span> <span class="mf">74</span> <span class="mf">61</span> <span class="mf">77</span> <span class="mf">61</span> <span class="mf">31</span> <span class="mf">11</span> <span class="mf">30</span> <span class="mf">0</span><span class="n">f</span> <span class="mf">06</span> <span class="mf">03</span> <span class="mf">55</span> <span class="mf">04</span> <span class="mf">0</span><span class="n">a</span> <span class="mf">.</span><span class="n">Ottawa1</span><span class="mf">.0...</span><span class="n">U</span><span class="mf">..</span>
<span class="mf">0140</span> <span class="mf">13</span> <span class="mf">08</span> <span class="mf">43</span> <span class="mf">61</span> <span class="mf">6</span><span class="n">d</span> <span class="mf">63</span> <span class="mf">6</span><span class="n">c</span> <span class="mf">6</span><span class="n">f</span> <span class="mf">75</span> <span class="mf">64</span> <span class="mf">31</span> <span class="mf">11</span> <span class="mf">30</span> <span class="mf">0</span><span class="n">f</span> <span class="mf">06</span> <span class="mf">03</span> <span class="mf">..</span><span class="n">Camcloud1</span><span class="mf">.0...</span>
<span class="mf">0150</span> <span class="mf">55</span> <span class="mf">04</span> <span class="mf">0</span><span class="n">b</span> <span class="mf">13</span> <span class="mf">08</span> <span class="mf">43</span> <span class="mf">61</span> <span class="mf">6</span><span class="n">d</span> <span class="mf">63</span> <span class="mf">6</span><span class="n">c</span> <span class="mf">6</span><span class="n">f</span> <span class="mf">75</span> <span class="mf">64</span> <span class="mf">31</span> <span class="mf">14</span> <span class="mf">30</span> <span class="n">U</span><span class="mf">....</span><span class="n">Camcloud1</span><span class="mf">.0</span>
<span class="mf">0160</span> <span class="mf">12</span> <span class="mf">06</span> <span class="mf">03</span> <span class="mf">55</span> <span class="mf">04</span> <span class="mf">03</span> <span class="mf">13</span> <span class="mf">0</span><span class="n">b</span> <span class="mf">44</span> <span class="mf">61</span> <span class="mf">6</span><span class="n">e</span> <span class="mf">20</span> <span class="mf">42</span> <span class="mf">75</span> <span class="mf">72</span> <span class="mf">6</span><span class="n">b</span> <span class="mf">...</span><span class="n">U</span><span class="mf">....</span><span class="n">Dan</span> <span class="n">Burk</span>
<span class="mf">0170</span> <span class="mf">65</span> <span class="mf">74</span> <span class="mf">74</span> <span class="mf">0</span><span class="n">e</span> <span class="mf">00</span> <span class="mf">00</span> <span class="mf">00</span> <span class="n">ett</span><span class="mf">....</span>
</code></pre></div>
<p>so clearly at least one person wants my attention. Dan Burkett, you may have guessed, lives in Ottawa, and is
affiliated with a company called Camcloud, to which Amcrest apparently outsources its web-based offerings.</p>
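<p>The readable strings in that dump are not an accident of the ASCII column: they are fields of a certificate's distinguished name, each stored as an object identifier followed by a short string. Here is a rough sketch of decoding two of them by hand, using bytes copied from the dump. The helper names are made up, and it handles only the short-form lengths and PrintableString values that appear in this fragment, so it is nowhere near a general ASN.1 parser.</p>

```python
# Last byte of the OID 2.5.4.x -> X.520 attribute name.
X520_NAMES = {
    3: "commonName",
    6: "countryName",
    7: "localityName",
    8: "stateOrProvinceName",
    10: "organizationName",
    11: "organizationalUnitName",
}

# Bytes copied from the dump: offsets 0x00d0-0x00de and 0x00f6-0x0107.
ORG = bytes.fromhex("06 03 55 04 0a 13 08 43 61 6d 63 6c 6f 75 64")
NAME = bytes.fromhex("06 03 55 04 03 13 0b 44 61 6e 20 42 75 72 6b 65 74 74")


def read_tlv(buf, pos):
    """Read one DER tag-length-value triple with a short-form length."""
    tag, length = buf[pos], buf[pos + 1]
    if length & 0x80:
        raise ValueError("long-form lengths are not handled in this sketch")
    start = pos + 2
    return tag, buf[start:start + length], start + length


def decode_attribute(buf):
    """Decode an OID under 2.5.4 followed by a PrintableString value."""
    tag, oid, pos = read_tlv(buf, 0)
    if tag != 0x06 or oid[:2] != b"\x55\x04":
        raise ValueError("expected an object identifier under 2.5.4")
    tag, value, _ = read_tlv(buf, pos)
    if tag != 0x13:
        raise ValueError("expected a PrintableString")
    return X520_NAMES.get(oid[2], f"2.5.4.{oid[2]}"), value.decode("ascii")


print(decode_attribute(ORG))
print(decode_attribute(NAME))
```

<p>So <code>55 04 0a</code> marks the organization and <code>55 04 03</code> the common name, which is how Camcloud and Mr. Burkett come to be introduced.</p>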
<p><img src="images/shoelessjoe.png" width=400px></p>
<p>I fired off a nice email to info@camcloud.com, inquiring about
"the methods you use to protect the privacy and security of users." Surprisingly, they wrote back.
Unsurprisingly, they don't really see an issue here:</p>
<div class="highlight"><pre><span></span><code><span class="n">Your</span> <span class="n">packet</span> <span class="n">analysis</span> <span class="k">is</span> <span class="n">correct</span> <span class="n">that</span> <span class="n">FTP</span> <span class="k">is</span> <span class="n">used</span> <span class="k">for</span> <span class="n">media</span> <span class="n">transfer</span> <span class="n">during</span> <span class="n">media</span> <span class="n">upload</span><span class="p">,</span> <span class="n">which</span> <span class="k">is</span> <span class="n">very</span> <span class="n">common</span> <span class="k">for</span> <span class="n">almost</span> <span class="n">all</span> <span class="n">IP</span> <span class="n">cameras</span><span class="o">.</span> <span class="n">It</span><span class="s1">'s important to point out that our FTP server implementation does not permit user access or any file retrieval (upload only) and every camera'</span><span class="n">s</span> <span class="n">FTP</span> <span class="n">credentials</span> <span class="n">are</span> <span class="n">unique</span> <span class="n">to</span> <span class="n">that</span> <span class="n">camera</span><span class="p">,</span> <span class="ow">and</span> <span class="n">are</span> <span class="n">destroyed</span> <span class="n">when</span> <span class="n">the</span> <span class="n">camera</span> <span class="k">is</span> <span class="n">removed</span> <span class="n">from</span> <span class="n">the</span> <span class="n">cloud</span><span class="o">.</span>
</code></pre></div>
<p>The upload-only ftp drop is very slightly reassuring, in that spies would need to capture and reassemble each file in real time, rather than perusing a library of
mp4's at their leisure. On the other hand, it did occur to me that a careful burglar - less interested in spying on me than preventing me from spying on him - might think
to overwrite incriminating uploads. Please don't tell Fido, but I verified that (1) I could indeed log into the ftp server manually, (2) I could not RETR the file
that my camera had uploaded, but (3) I could indeed overwrite it with one containing only the string "Hello".</p>
<p>Also, for what meanings of "very common" and "almost all" is "very common for almost all" distinguishable from "very common" or "almost all" separately?
I worry about these things.</p>
<h1>What now?</h1>
<p><img src="images/highanxiety.jpg"></p>
<p>In the grand scheme of IoT disaster scenarios, the Amcrest ProHD 1080P is probably not going to be our national undoing, but
it's still kind of creepy. We're used to the idea that peril lurks in every email (so sorry, HotH00kup69),
and there doesn't seem to be much percentage in wiring our refrigerator to the matrix, but security cameras sound like they ought
to be sort of secure. Phrased as, "hey, let's take this mysterious little computer running software about which we know nothing, cram it with gigabytes of
personal data and expose it to the internet," the idea is less appealing.</p>
<p>Worse, while it took a bit of effort to uncover the minor fiasco of plaintext ftp, increased sophistication on the part of the provider
would have the primary effect of hiding the mechanism from users like me, and only secondarily and unverifiably the effect of increasing security.
I don't own a <a href="https://nest.com/camera/meet-nest-cam/">NestCam</a>, but I'd be surprised if the fine people at a wholly owned subsidiary of
Google would at this point be making any mistakes crude enough for me to discover and blog about. On the other hand,
Google <a href="http://techcrunch.com/2015/06/01/google-photos-reminder-smile-its-free-youre-the-product/">may not be</a>
the most trustworthy guardian of your visual records.</p>
<p>Anyway, this is why you shouldn't have a dog.</p>Impostor!2016-01-04T17:00:00-05:002016-01-04T17:00:00-05:00Peter Fraenkeltag:blog.podsnap.com,2016-01-04:/impostor.html<h1>Zoltar</h1>
<p><img style="float: left" src="./images/zoltar.jpg" width="20%"/></p>
<p>I took an "Impostor Syndrome" <a href="http://www.empresshasnoclothes.com/articles-detail.php?aid=520&cid=4">quiz</a>
consisting of the following 6 statements</p>
<ol>
<li>When people praise me for something I've accomplished, I'm afraid I won't be able to live up to their expectations of me in the future.</li>
<li>At times, I feel my success has been due to some kind of luck.</li>
<li>Sometimes I'm afraid others will discover how much knowledge or ability I really lack.</li>
<li>When I've succeeded at something and received recognition for my accomplishments, I have doubts that I can keep repeating that success.</li>
<li>I often compare my ability to those around me and think they may be more intelligent than I am.</li>
<li>If I am going to receive a promotion or recognition of some kind, I hesitate to tell others until it is an accomplished fact.</li>
</ol>
<p>to which one assigns a score from 1 to 5, where 5 means complete
agreement. I got 29, which is like <a href="http://zoltarmachine.com/">Zoltar</a> handing out a card
that reads</p>
<div class="highlight"><pre><span></span><code><span class="nv">Thin</span><span class="o">-</span><span class="nv">skinned</span> <span class="nv">buffoon</span>. <span class="nv">You</span> <span class="k">pause</span> <span class="nv">between</span> <span class="nv">stupidities</span> <span class="nv">only</span> <span class="nv">long</span> <span class="nv">enough</span>
<span class="nv">to</span> <span class="nv">cringe</span> <span class="nv">at</span> <span class="nv">those</span> <span class="nv">you</span> <span class="nv">uttered</span> <span class="nv">years</span> <span class="nv">ago</span>.
</code></pre></div>
<p>What a relief, then, to learn that
impostor syndrome is a <a href="http://qz.com/606727/is-imposter-syndrome-a-sign-of-greatness/">sign of greatness</a>!
I'm like John Steinbeck and Jodie Foster. A broad spectrum of humanity wants to be me.</p>
<p>Unless.</p>
<p>Unless I'm not really an impostor. Maybe I've only been pretending. Maybe I'm one of those confident jerk types,
but I pretend to be an impostor so that people will say nice things to me out of pity. An impostor at imposturing.</p>
<h1>Recursive Impostature</h1>
<p><img style="float: left" src="./images/matrix.jpg" width="20%"/></p>
<p>I'm here to tell you (by which I mean me) that you really are an impostor, so you can stop worrying.
You are a slab of meat hurtling through the cosmos, and claiming to be more than that is
outright pretension. (I forgot to say, you're also pretentious.) Acting like your puny thoughts
can in any perceptible way staunch the entropic dissolution of our universe is just your way of
advertising ignorance. And it's not even "our universe"; your outrageous fraudulence is just part of
a huge <a href="http://www.simulation-argument.com/matrix.html">simulation</a>, run by even more outrageous
frauds.</p>
<p>I call them the uber-frauds. These unmitigated assholes went out and built computers and learned how to
make them perform acts of ostentatious complexity. Some of them studied for years, plumping their disposable
carcasses with "knowledge," developing "judgment" and performing various acts of "kindness" along the way -
when they of all people (or whatever) had to have known goddamn well what a sham it was. Impostors.</p>
<p>Even relative to each other, they were impostors. One of their nastier simulated characters was a Russian
physicist named Lev Landau, who kept a <a href="https://en.wikipedia.org/wiki/Lev_Landau#Landau.27s_List">list</a> of
other physicists, scoring them on a logarithmic scale. Isaac Newton was at the top, ranked zero,
with ability decreasing by a factor of 10 for every unit increase.
Albert Einstein, at 0.5, was $10^\frac{1}{2}\simeq 3.2\times$
stupider than Newton. Landau gave himself a 2.5, making him 100 times stupider still, and a typical physicist in a top-flight
department gets a 4.5, corresponding to one ten-thousandth of an Einstein.</p>
<p>Landau's insight<sup id="fnref:others"><a class="footnote-ref" href="#fn:others">1</a></sup> is not these exact numeric values, but that ability sorts itself out on a log scale.
The person next to you is not 10% more or less capable, but 10 times one way or the other. </p>
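<p>For anyone who wants to check the arithmetic, the scale reduces to one line: the gap in rank is the exponent of ten. A small sketch, with the function name <code>ability_ratio</code> invented for the purpose and the ranks taken from the paragraph above:</p>

```python
def ability_ratio(better_rank, worse_rank):
    """Factor by which the lower-ranked physicist outperforms the higher-ranked one.

    On Landau's scale a smaller number is better, and each whole unit of
    difference is a factor of 10.
    """
    return 10 ** (worse_rank - better_rank)


NEWTON, EINSTEIN, LANDAU, TYPICAL = 0, 0.5, 2.5, 4.5

print(f"Newton vs Einstein:  {ability_ratio(NEWTON, EINSTEIN):.2f}x")
print(f"Einstein vs Landau:  {ability_ratio(EINSTEIN, LANDAU):.0f}x")
print(f"Einstein vs typical: {ability_ratio(EINSTEIN, TYPICAL):.0f}x")
```

<p>Half a unit is a factor of a bit over three, two units is a hundred, and four units is ten thousand.</p>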
<p>This distribution is taught in Simulation 101 at Uberfraud University, where students ranked 6 or 7
hang desperately on every word from the 4.5 at the front of the lecture hall, imagining themselves
some 31.6 to 316 times beyond their station. Impostors, all of them. And I hear that the Dr. 4.5
was one in her time.</p>
<p>But she got better. Because that's what these uber-fraud super-beings
do. They do stuff they should know to leave to their superiors, and
act all confident when they really aren't, and then one day they find
themselves pretending to be able to do new and different stuff,
because the stuff they used to pretend to be able to do is now so
doable that they can no longer remember how impressively impossible it
used to be.</p>
<h1>Awesome Power!</h1>
<p><img style="float: left" src="./images/hugh.jpg" width="20%"/></p>
<p>This may not be what you want to hear, but impostor syndrome is not something you recover
from. The gift of self-doubt is one that keeps on giving, relentlessly, indispensably.
What may be more reassuring is that, at some point, it ceases to be entirely unpleasant.
You still have the weird feeling of titanium blades exploding through your knuckles, but
you know from experience that the discomfort passes.</p>
<p>Annoyingly, there are people in this world who accomplish great things without
ever experiencing a second of gut-wrenching existential panic. I'm not talking about
impostors who just do a good job of hiding it, though these may exist. I mean there are those
who bestride our narrow world triumphantly, experiencing absolutely no
downside from being born this way. Get over it. Landau's log scale applies to all manner of
fortune, and the portion awarded you is likely somewhere in the middle. Count your blessings.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:others">
<p>one of them anyway <a class="footnote-backref" href="#fnref:others" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
</ol>
</div>A minimalist translation of Clojure's core.async to Scala2015-01-31T00:00:00-05:002015-01-31T00:00:00-05:00Peter Fraenkeltag:blog.podsnap.com,2015-01-31:/scasync.html<h3>Synecducers</h3>
<p>Great computer languages, like monarchs and planets, become emblems of what surrounds them.
The greatest computer languages are barely there, as nearly everything we
file under their names could be described as a library or other customization.
It's not unusual and not even absurd to find a question about <code>socket()</code> on a <code>C</code> language
forum: Linux is arguably a morphogenic implication of <code>C</code>. Clojure, too, is formally
minimal, but it induces composition.
Much of what we specifically admire about it isn't the language itself so much
as an expression of it.</p>
<p>Clojure provides Communicating Sequential Processes (CSP) via the <code>core.async</code> library. The
<code>go</code> construct is here implemented as a macro, yet the code that uses it is at least as elegant
and natural as that written in the <code>go</code> language, whose inventors obviously found the
concept rather central. The internet is stuffed to the gills with <code>core.async</code> tutorials, so I
won't go into it much today except to inform choices made in a Scala version.</p>
<h3>Goal</h3>
<p>The goal is to come as close as possible, in Scala, to the following Clojure code,</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">foo</span> <span class="p">[</span><span class="nv">n</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">c1</span> <span class="p">(</span><span class="nf">chan</span><span class="p">)</span>
<span class="nv">c2</span> <span class="p">(</span><span class="nf">chan</span><span class="p">)]</span>
<span class="p">(</span><span class="nf">async/go-loop</span> <span class="p">[</span><span class="nv">n</span> <span class="p">(</span><span class="nb">dec </span><span class="nv">n</span><span class="p">)]</span>
<span class="p">(</span><span class="nf">>!</span> <span class="nv">c1</span> <span class="s">"Fizz"</span><span class="p">)</span>
<span class="p">(</span><span class="nf"><!</span> <span class="p">(</span><span class="nf">timeout</span> <span class="p">(</span><span class="nb">rand </span><span class="mi">500</span><span class="p">)))</span>
<span class="p">(</span><span class="nb">when </span><span class="p">(</span><span class="nb">pos? </span><span class="nv">n</span><span class="p">)</span> <span class="p">(</span><span class="nf">recur</span> <span class="p">(</span><span class="nb">dec </span><span class="nv">n</span><span class="p">))))</span>
<span class="p">(</span><span class="nf">async/go-loop</span> <span class="p">[</span><span class="nv">n</span> <span class="p">(</span><span class="nb">dec </span><span class="nv">n</span><span class="p">)]</span>
<span class="p">(</span><span class="nf">>!</span> <span class="nv">c2</span> <span class="mi">41</span><span class="p">)</span>
<span class="p">(</span><span class="nf"><!</span> <span class="p">(</span><span class="nf">timeout</span> <span class="p">(</span><span class="nb">rand </span><span class="mi">500</span><span class="p">)))</span>
<span class="p">(</span><span class="nb">when </span><span class="p">(</span><span class="nb">pos? </span><span class="nv">n</span><span class="p">)</span> <span class="p">(</span><span class="nf">recur</span> <span class="p">(</span><span class="nb">dec </span><span class="nv">n</span><span class="p">))))</span>
<span class="p">(</span><span class="nf">async/go-loop</span> <span class="p">[</span><span class="nv">n</span> <span class="p">(</span><span class="nb">dec </span><span class="p">(</span><span class="nb">* </span><span class="mi">2</span> <span class="nv">n</span><span class="p">))]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[[</span><span class="nv">v</span> <span class="nv">c</span><span class="p">]</span> <span class="p">(</span><span class="nf">async/alts!</span> <span class="p">[</span><span class="nv">c1</span> <span class="nv">c2</span><span class="p">])]</span>
<span class="p">(</span><span class="nf">condp</span> <span class="nb">= </span><span class="nv">c</span>
<span class="nv">c1</span> <span class="p">(</span><span class="nb">println </span><span class="p">(</span><span class="nb">str </span><span class="nv">v</span> <span class="s">"Buzz"</span><span class="p">))</span>
<span class="nv">c2</span> <span class="p">(</span><span class="nb">println </span><span class="p">(</span><span class="nb">inc </span><span class="nv">v</span><span class="p">))))</span>
<span class="p">(</span><span class="nb">when </span><span class="p">(</span><span class="nb">pos? </span><span class="nv">n</span><span class="p">)</span> <span class="p">(</span><span class="nf">recur</span> <span class="p">(</span><span class="nb">dec </span><span class="nv">n</span><span class="p">))))))</span>
</code></pre></div>
<p>but with type safety<sup id="fnref:ctchan"><a class="footnote-ref" href="#fn:ctchan">1</a></sup> of course. What's most important is that the
channel read and write operations should not block a thread, but I would also like
to avoid significant addition of boilerplate.</p>
<h3>Philosophical Rants</h3>
<h4>Development Driven Development</h4>
<p>If you have no idea what you're doing, start typing and see what happens. If it
doesn't work, fix it. If it's unfixable, rip it apart and start over. If
the emerging structure suggests interesting generalizations, consider
relocating the goal post. If the requirements unavoidably imply ugliness, think
about changing the requirements. It's rare that anybody knows exactly what
they need.</p>
<p>I could have called this Refactoring Driven Development, or Prototype Driven Development, but
that wouldn't have been sarcastic enough
to evoke and implicitly deride Test Driven Development, whose
synergies of witless sloganeering and robotic compliance promise not just general dystopia,
but the very specific dystopia of the former Soviet Union.</p>
<h4>Curtain Driven Development</h4>
<p>When writing a library or framework that you expect to be used in
diverse ways, it's easy to get too hung up on the dire prohibitions
of our age:</p>
<ol>
<li>Mutability</li>
<li>Locks</li>
<li>Explicit type checks and casts</li>
</ol>
<p>The evil of these constructs lies in their potential to create code
that is difficult to understand ("reason about") and for that reason
may hide tricky bugs. In fact, you should probably assume that there
will be bugs, and bugs are bad in inverse proportion to the amount of
time you're willing to spend on eradication and the proving of eradication.
That amount of time should, in turn, be proportional to the aggregate
time that the code will eventually run. In an application written
for a relatively narrow purpose, you should be willing to sacrifice
flexibility and even performance to avoid spending time fixing things.</p>
<p>On the other hand, the benefit of these constructs lies in their
proximity to the hardware and the freedom to tell that hardware
what to do more or less directly. The real world is procedural,
mutable and unityped, and, when coding in the real world, the only
decision you can make is where in the abstraction stack you choose
to begin hiding that fact. Beyond that point, code should pay no
attention to the infernal machine behind the curtain, but that doesn't
diminish our need for infernal machines.</p>
<h4>Backpedaling</h4>
<p>To be honest, I was hoping that there would be a lot less curtain-worthy action
than ultimately appeared necessary. It had seemed at least plausible to
address the various
asynchronous dependencies and contingencies monadically, with vanilla promises
and futures, but this proved to be beyond me, especially in the implementation of
<code>alts</code>. For the moment, I believe that one needs extra machinery when there
are potentially both many writers and many readers on the same channel - hence,
perhaps, the rather old-school concurrency techniques used in the <code>core.async</code>
<code>ManyToManyChannel</code>. Or I'm missing something.</p>
<p>I should also emphasize that this code is not nearly well tested enough to
be offered for production use. There's a difference between eschewing test-driven
development and embracing inadequate testing, but I don't have a lot of time on
my hands at the moment... Appropriately, this code lives for the moment
in my <code>scala-playground</code> repository.</p>
<h3>Asynchronicity in Scala</h3>
<p>There isn't a standard implementation of CSP for Scala, but there are quite a few
related tools and abstractions for concurrent programming. This is a non-exhaustive
list of concepts in the space.</p>
<h4>Futures and Promises</h4>
<p>The simplest way of making something happen in the future in Scala is with a <code>Future</code>,</p>
<div class="highlight"><pre><span></span><code> <span class="kd">val</span> <span class="n">f</span> <span class="o">=</span> <span class="nc">Future</span> <span class="p">{</span><span class="nc">Thread</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">100</span><span class="p">);</span><span class="mi">2</span><span class="p">}</span>
</code></pre></div>
<p>where <code>Thread.sleep(100)</code> is a stand-in for some more reasonable time-consuming
activity.
So far, that's not much different from creating a Java <code>Runnable</code>, but the most
stupendous thing
about Scala's futures is that they're functors,</p>
<div class="highlight"><pre><span></span><code> <span class="kd">val</span> <span class="n">f2</span> <span class="o">=</span> <span class="n">f</span><span class="p">.</span><span class="n">map</span> <span class="p">{</span><span class="n">_+</span><span class="mi">1</span><span class="p">}</span>
</code></pre></div>
<p>so you can map arbitrary chains of what are essentially callbacks,
specifying them sequentially, rather than undergoing the
eponymous hell. Actually, they're monads too, with <code>flatMap</code>, so you
can use the full <code>for</code> sugar to coordinate multiple events:</p>
<div class="highlight"><pre><span></span><code><span class="kd">val</span> <span class="n">f3</span> <span class="o">=</span> <span class="nc">Future</span> <span class="p">{</span><span class="nc">Thread</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">1000</span><span class="p">);</span> <span class="mf">3.14</span><span class="p">}</span>
<span class="kd">val</span> <span class="n">f4</span> <span class="o">=</span> <span class="k">for</span> <span class="p">{</span><span class="n">i</span> <span class="o"><-</span> <span class="n">f2</span>
<span class="n">d</span> <span class="o"><-</span> <span class="n">f3</span><span class="p">}</span>
<span class="k">yield</span> <span class="p">(</span><span class="n">d</span><span class="p">.</span><span class="n">round</span><span class="o">==</span><span class="n">i</span><span class="p">)</span>
</code></pre></div>
<p>Generally one wants to avoid blocking on futures, but you can:</p>
<div class="highlight"><pre><span></span><code>  <span class="n">assert</span><span class="p">(</span> <span class="nc">Await</span><span class="p">.</span><span class="n">result</span><span class="p">(</span><span class="n">f2</span><span class="p">,</span> <span class="mi">1</span> <span class="n">second</span><span class="p">)</span><span class="o">==</span><span class="mi">3</span><span class="p">)</span>  <span class="c1">// blocking; f2 is f's 2, plus 1</span>
</code></pre></div>
<p>There are other interesting features, but that's the gist of it.</p>
<p>We also get <code>Promise</code>s, which are really souped up
<code>CountDownLatch</code>es that you use to trigger activity somewhere
else, generally via a future:</p>
<div class="highlight"><pre><span></span><code><span class="kd">val</span> <span class="n">p</span> <span class="o">=</span> <span class="nc">Promise</span><span class="p">[</span><span class="nc">Int</span><span class="p">]</span>
<span class="n">p</span><span class="p">.</span><span class="n">future</span><span class="p">.</span><span class="n">onSuccess</span> <span class="p">{</span><span class="k">case</span> <span class="n">i</span> <span class="o">=></span> <span class="n">println</span><span class="p">(</span><span class="s">s"The answer is </span><span class="si">$</span><span class="n">i</span><span class="s">"</span><span class="p">)}</span>
<span class="n">p</span><span class="p">.</span><span class="n">success</span><span class="p">(</span><span class="mi">42</span><span class="p">)</span>
</code></pre></div>
<p>A single promise can produce an arbitrary number of futures, all of which will
fire when the promise is fulfilled.</p>
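<p>As a minimal sketch of that fan-out: two independent futures hang off the same promise, and
both complete once it is fulfilled. The object name <code>PromiseFanOut</code> and the helper
<code>fanOut</code> are made up for illustration, and the syntax assumes a recent Scala 2
with the global execution context.</p>
<div class="highlight"><pre><span></span><code>import scala.concurrent.{Await, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object PromiseFanOut {
  // Hypothetical helper: fulfil one promise and collect what two derived futures see.
  def fanOut(answer: Int): (Int, String) = {
    val p = Promise[Int]()
    val doubled   = p.future.map(_ * 2)
    val described = p.future.map(i => s"The answer is $i")
    p.success(answer) // both derived futures fire from this single call
    (Await.result(doubled, 1.second), Await.result(described, 1.second))
  }

  def main(args: Array[String]): Unit =
    println(fanOut(21)) // (42,The answer is 21)
}
</code></pre></div>
<p>The blocking <code>Await.result</code> calls are only there so that the demo can print something
before it exits.</p>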
<h4>Actors</h4>
<p>The standard suggestion for event driven programming in Scala these days
seems to be Akka's Actor library. Akka has a lot to offer, but, for
present purposes, we should note that the actor model is not CSP; in
some ways, it's the opposite of CSP.
Programming with CSP style channels is akin to using higher order
functions with collections; it's a variation of a familiar FP paradigm,
where the central entity is a conduit of data, and behavior is
encapsulated in pure, transport-oblivious functions.
Programming with actors is, essentially, writing miniature servers;
it's a variation of a familiar OO paradigm, where the central entity is
an object with customizable behaviors:</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">MyActor</span> <span class="k">extends</span> <span class="nc">Actor</span> <span class="p">{</span>
<span class="k">override</span> <span class="k">def</span> <span class="nf">receive</span> <span class="o">=</span> <span class="p">{</span>
<span class="k">case</span> <span class="n">x</span><span class="p">:</span><span class="nc">Int</span> <span class="o">=></span> <span class="n">doSomething</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="k">case</span> <span class="n">_</span> <span class="o">=></span> <span class="n">doSomethingElse</span><span class="p">()</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<h4>Async</h4>
<p>As we've seen, the monadic future allows us to compose asynchronous events in
a manner that is much more intuitive than straight callbacks.
The <a href="https://github.com/scala/async">Async Project</a> takes this further, with
macros that enable an even more intuitive and efficient organization of
futures. In the example above, we can replace the <code>for</code> comprehension
with:</p>
<div class="highlight"><pre><span></span><code><span class="n">async</span> <span class="p">{</span>
<span class="kd">val</span> <span class="n">i</span> <span class="o">=</span> <span class="n">await</span><span class="p">(</span><span class="n">f2</span><span class="p">)</span>
<span class="kd">val</span> <span class="n">d</span> <span class="o">=</span> <span class="n">await</span><span class="p">(</span><span class="n">f3</span><span class="p">)</span>
<span class="n">d</span><span class="p">.</span><span class="n">round</span><span class="o">==</span><span class="n">i</span>
<span class="p">}</span>
</code></pre></div>
<p>or even</p>
<div class="highlight"><pre><span></span><code><span class="n">async</span> <span class="p">{</span><span class="n">await</span><span class="p">(</span><span class="n">f2</span><span class="p">)</span> <span class="o">==</span> <span class="n">await</span><span class="p">(</span><span class="n">f3</span><span class="p">).</span><span class="n">round</span><span class="p">}</span>
</code></pre></div>
<p>Notwithstanding its misleading name, <code>await</code> doesn't actually wait. Rather,
the state of execution is "parked" as it is for various <code>!</code> operations
inside Clojure's <code>go</code> blocks.</p>
<p>With <code>async</code> we're well on our way to <code>core.async</code>, but we still have to
bridge the divide between futures and channels.</p>
<h4>scala-gopher</h4>
<p><a href="https://github.com/rssh/scala-gopher">Gopher</a> is the most fully fledged CSP
framework for Scala that I've been able to find. It may be the wave of the
future, but (again) for the present (and contrived) purposes, I will find things to
object to. First, the author writes</p>
<blockquote>
<p>Note, which this is not an emulation of go language constructions
in scala, but rather reimplementation of key ideas in 'scala-like'
manner.</p>
</blockquote>
<p>but what I'm looking for is in fact an emulation of Clojure language
constructions, and, with the use of <code>async</code>, I think I can achieve that
in a manner that is sufficiently Scala-like as at least to be legible. Second,
while Gopher is built with Async components, it introduces its own <code>go</code>
macro, which I'm hoping isn't necessary, since it would be nice to
mix futures and channel programming.</p>
<h4>Akka Channels</h4>
<p>The Akka <code>Channel</code> class has been deprecated in favor of the
<code>akka.persistence.AtLeastOnceDelivery</code> trait, which makes it somewhat
more obvious that its purpose is reliable delivery rather than CSP.
In retrospect, it seems that one shouldn't ever use the word "channel"
except to provide opportunities for going on about what we mean by the word.</p>
<h3>Channels for Scala</h3>
<p>Like Clojure's channel, ours is essentially a buffer with asynchronous
access methods. Unlike Clojure's, ours will handle those asynchronous
operations via futures and promises, so we can re-use the machinery and idioms
for dealing with them. We should be able to do this:</p>
<div class="highlight"><pre><span></span><code><span class="kd">val</span> <span class="n">c</span> <span class="o">=</span> <span class="nc">Chan</span><span class="p">[</span><span class="nc">Int</span><span class="p">]</span>
<span class="n">async</span> <span class="p">{</span>
  <span class="n">println</span><span class="p">(</span><span class="s">s"Got </span><span class="si">${</span><span class="n">await</span><span class="p">(</span><span class="n">c</span><span class="p">.</span><span class="n">read</span><span class="p">)</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
<span class="p">}</span>
<span class="n">async</span> <span class="p">{</span>
  <span class="n">await</span><span class="p">(</span><span class="n">c</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="mi">5</span><span class="p">))</span>
  <span class="n">println</span><span class="p">(</span><span class="s">"Sent"</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div>
<p>What must be the case in order for this sort of thing to work? First,
the channel will need to support some notification mechanism, to tell
parked writers that a previously full buffer can now accommodate a write,
and parked readers that a previously empty buffer now has something to read.
Ignoring all sorts of complexity, the top of the class has to look a bit like:</p>
<div class="highlight"><pre><span></span><code><span class="c1">// v1</span>
<span class="k">class</span> <span class="nc">Chan</span><span class="p">[</span><span class="nc">T</span><span class="p">](</span><span class="kd">val</span> <span class="n">buf</span><span class="p">:</span> <span class="nc">ChanBuffer</span><span class="p">[</span><span class="nc">T</span><span class="p">])</span> <span class="p">{</span>
  <span class="kd">var</span> <span class="n">pReadyForWrite</span> <span class="o">=</span> <span class="nc">Promise</span><span class="p">[</span><span class="nc">Unit</span><span class="p">].</span><span class="n">success</span><span class="p">(())</span>  <span class="c1">// start out empty, so writable</span>
  <span class="kd">var</span> <span class="n">pReadyForRead</span> <span class="o">=</span> <span class="nc">Promise</span><span class="p">[</span><span class="nc">Unit</span><span class="p">]</span>  <span class="c1">// nothing to read yet</span>
  <span class="k">def</span> <span class="nf">write</span><span class="p">(</span><span class="n">v</span> <span class="p">:</span> <span class="nc">T</span><span class="p">)</span> <span class="p">:</span> <span class="nc">Future</span><span class="p">[</span><span class="nc">Unit</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span>
    <span class="kd">val</span> <span class="n">p</span> <span class="o">=</span> <span class="nc">Promise</span><span class="p">[</span><span class="nc">Unit</span><span class="p">]</span>
    <span class="n">pReadyForWrite</span><span class="p">.</span><span class="n">future</span><span class="p">.</span><span class="n">foreach</span> <span class="p">{</span> <span class="n">_</span> <span class="o">=></span>
      <span class="n">buf</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">v</span><span class="p">)</span>  <span class="c1">// (1) add the data</span>
      <span class="n">pReadyForWrite</span> <span class="o">=</span> <span class="nc">Promise</span><span class="p">[</span><span class="nc">Unit</span><span class="p">]</span>
      <span class="n">pReadyForRead</span><span class="p">.</span><span class="n">success</span><span class="p">(())</span>  <span class="c1">// (2) trigger pReadyForRead</span>
      <span class="n">p</span><span class="p">.</span><span class="n">success</span><span class="p">(())</span>  <span class="c1">// (3) signal the parked writer</span>
    <span class="p">}</span>
    <span class="n">p</span><span class="p">.</span><span class="n">future</span>
  <span class="p">}</span>
  <span class="c1">// ... read and the rest elided</span>
<span class="p">}</span>
</code></pre></div>
<p>When <code>pReadyForWrite</code> fires, we (1) add the data,
(2) trigger <code>pReadyForRead</code>, and (3) signal the parked writer.</p>
<p>The most significant gap in this implementation is that it won't manage
multiple readers and writers on the same channel. In fact,
when <code>pReadyForWrite</code> goes off, we know that <em>somebody</em> will be able to
perform a write, but it might not be us. If it's not us, then we need to
schedule another attempt. And how do we know if we can write? Let's assume
that the buffer's <code>take</code> and <code>put</code> methods can succeed or fail, and tell
us about the state of the buffer afterwards.</p>
<div class="highlight"><pre><span></span><code> <span class="k">abstract</span> <span class="k">class</span> <span class="nc">ChanBuffer</span><span class="p">[</span><span class="nc">T</span><span class="p">]()</span> <span class="p">{</span>
<span class="k">def</span> <span class="nf">put</span><span class="p">(</span><span class="n">v</span><span class="p">:</span> <span class="nc">T</span><span class="p">)</span> <span class="p">:</span> <span class="nc">Option</span><span class="p">[</span><span class="nc">BufferResult</span><span class="p">[</span><span class="nc">T</span><span class="p">]]</span>
<span class="k">def</span> <span class="nf">take</span> <span class="p">:</span> <span class="nc">Option</span><span class="p">[</span><span class="nc">BufferResult</span><span class="p">[</span><span class="nc">T</span><span class="p">]]</span>
<span class="p">}</span>
<span class="k">case</span> <span class="k">class</span> <span class="nc">BufferResult</span><span class="p">[</span><span class="nc">T</span><span class="p">](</span><span class="n">v</span> <span class="p">:</span> <span class="nc">T</span><span class="p">,</span>
<span class="n">noLongerEmpty</span><span class="p">:</span><span class="nc">Boolean</span><span class="o">=</span><span class="kc">false</span><span class="p">,</span>
<span class="n">noLongerFull</span><span class="p">:</span><span class="nc">Boolean</span><span class="o">=</span><span class="kc">false</span><span class="p">,</span>
<span class="n">nowEmpty</span><span class="p">:</span><span class="nc">Boolean</span><span class="o">=</span><span class="kc">false</span><span class="p">,</span>
<span class="n">nowFull</span><span class="p">:</span><span class="nc">Boolean</span><span class="o">=</span><span class="kc">false</span><span class="p">)</span>
</code></pre></div>
<p>The <code>BufferResult</code> contains
flags rather than a simple enumeration, because combinations are possible.
For example, a put of <code>v</code> to an empty buffer of length 1 will return
<code>Some(BufferResult(v,true,false,false,true))</code>, as the buffer is now
full and is no longer empty. If this were a dropping buffer,
which simply dropped new data when there wasn't room for it, then
<code>nowFull</code> would never be set.</p>
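<p>To make the flags concrete, here is a sketch of what a fixed-capacity buffer might look like.
<code>FixedBuffer</code> and <code>BufferDemo</code> are hypothetical names, not the code from
the repository; the sketch assumes Scala 2.13 and assumes that the channel only touches the buffer
while holding its own lock, so the buffer itself does no synchronization.</p>
<div class="highlight"><pre><span></span><code>import scala.collection.mutable

// Same shape as the definitions in the post.
case class BufferResult[T](v: T,
                           noLongerEmpty: Boolean = false,
                           noLongerFull: Boolean = false,
                           nowEmpty: Boolean = false,
                           nowFull: Boolean = false)

abstract class ChanBuffer[T] {
  def put(v: T): Option[BufferResult[T]]
  def take: Option[BufferResult[T]]
}

// Hypothetical FIFO buffer with a hard capacity; None means "try again later".
class FixedBuffer[T](capacity: Int) extends ChanBuffer[T] {
  require(capacity > 0, "capacity must be positive")
  private val q = mutable.Queue.empty[T]

  def put(v: T): Option[BufferResult[T]] =
    if (q.size >= capacity) None // full: nothing stored
    else {
      val wasEmpty = q.isEmpty
      q.enqueue(v)
      Some(BufferResult(v, noLongerEmpty = wasEmpty, nowFull = q.size == capacity))
    }

  def take: Option[BufferResult[T]] =
    if (q.isEmpty) None // empty: nothing to hand out
    else {
      val wasFull = q.size == capacity
      val v = q.dequeue()
      Some(BufferResult(v, noLongerFull = wasFull, nowEmpty = q.isEmpty))
    }
}

object BufferDemo {
  def main(args: Array[String]): Unit = {
    val b = new FixedBuffer[Int](1)
    println(b.put(5)) // Some(BufferResult(5,true,false,false,true))
    println(b.put(6)) // None
    println(b.take)   // Some(BufferResult(5,false,true,true,false))
    println(b.take)   // None
  }
}
</code></pre></div>
<p>With a capacity of 1, every successful operation flips two flags at once, which is why the
result carries independent booleans rather than a single enumeration.</p>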
<p>The <code>BufferResult</code> flags will imply very specific followup
behavior from the caller:</p>
<ol>
<li><code>noLongerEmpty</code> - Complete <code>pReadyForRead</code></li>
<li><code>noLongerFull</code> - Complete <code>pReadyForWrite</code></li>
<li><code>nowEmpty</code> - Replace <code>pReadyForRead</code> with an uncompleted promise.</li>
<li><code>nowFull</code> - Replace <code>pReadyForWrite</code> with an uncompleted promise.</li>
</ol>
<p>The more complete logic now notifies the parked writer if the write succeeds
but reschedules another try if it doesn't. Roughly:</p>
<div class="highlight"><pre><span></span><code>  <span class="k">def</span> <span class="nf">write</span><span class="p">(</span><span class="n">v</span> <span class="p">:</span> <span class="nc">T</span><span class="p">)</span> <span class="p">:</span> <span class="nc">Future</span><span class="p">[</span><span class="nc">Unit</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span>
    <span class="kd">val</span> <span class="n">p</span> <span class="o">=</span> <span class="nc">Promise</span><span class="p">[</span><span class="nc">Unit</span><span class="p">]</span>
    <span class="n">pReadyForWrite</span><span class="p">.</span><span class="n">future</span><span class="p">.</span><span class="n">foreach</span> <span class="p">{</span> <span class="n">_</span> <span class="o">=></span> <span class="n">tryWrite</span><span class="p">(</span><span class="n">v</span><span class="p">,</span><span class="n">p</span><span class="p">)</span> <span class="p">}</span>
    <span class="n">p</span><span class="p">.</span><span class="n">future</span>
  <span class="p">}</span>
  <span class="k">private</span><span class="p">[</span><span class="bp">this</span><span class="p">]</span> <span class="k">def</span> <span class="nf">tryWrite</span><span class="p">(</span><span class="n">v</span><span class="p">:</span> <span class="nc">T</span><span class="p">,</span> <span class="n">pNotify</span><span class="p">:</span> <span class="nc">Promise</span><span class="p">[</span><span class="nc">Unit</span><span class="p">])</span> <span class="p">:</span> <span class="nc">Unit</span> <span class="o">=</span> <span class="bp">this</span><span class="p">.</span><span class="k">synchronized</span> <span class="p">{</span>
    <span class="n">buf</span><span class="p">.</span><span class="n">put</span><span class="p">(</span><span class="n">v</span><span class="p">)</span> <span class="k">match</span> <span class="p">{</span>
      <span class="k">case</span> <span class="nc">Some</span><span class="p">(</span><span class="n">br</span><span class="p">)</span> <span class="o">=></span>
        <span class="k">if</span><span class="p">(</span><span class="n">br</span><span class="p">.</span><span class="n">noLongerEmpty</span><span class="p">)</span> <span class="n">pReadyForRead</span><span class="p">.</span><span class="n">trySuccess</span><span class="p">(())</span>  <span class="c1">// wake parked readers</span>
        <span class="k">if</span><span class="p">(</span><span class="n">br</span><span class="p">.</span><span class="n">nowFull</span><span class="p">)</span> <span class="n">pReadyForWrite</span> <span class="o">=</span> <span class="nc">Promise</span><span class="p">[</span><span class="nc">Unit</span><span class="p">]</span>  <span class="c1">// later writers must wait</span>
        <span class="n">pNotify</span><span class="p">.</span><span class="n">success</span><span class="p">(())</span>  <span class="c1">// un-park this writer</span>
      <span class="k">case</span> <span class="nc">None</span> <span class="o">=></span>  <span class="c1">// somebody else got there first: try again later</span>
        <span class="n">pReadyForWrite</span><span class="p">.</span><span class="n">future</span><span class="p">.</span><span class="n">foreach</span> <span class="p">{</span><span class="n">_</span> <span class="o">=></span> <span class="n">tryWrite</span><span class="p">(</span><span class="n">v</span><span class="p">,</span><span class="n">pNotify</span><span class="p">)}</span>
    <span class="p">}</span>
  <span class="p">}</span>
</code></pre></div>
<p>The corresponding <code>read</code> is similar:</p>
<div class="highlight"><pre><span></span><code>  <span class="k">def</span> <span class="nf">read</span><span class="p">:</span> <span class="nc">Future</span><span class="p">[</span><span class="nc">T</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span>
    <span class="kd">val</span> <span class="n">p</span> <span class="o">=</span> <span class="nc">Promise</span><span class="p">[</span><span class="nc">T</span><span class="p">]</span>
    <span class="n">pReadyForRead</span><span class="p">.</span><span class="n">future</span><span class="p">.</span><span class="n">foreach</span> <span class="p">{</span> <span class="n">_</span> <span class="o">=></span> <span class="n">tryRead</span><span class="p">(</span><span class="n">p</span><span class="p">)</span> <span class="p">}</span>
    <span class="n">p</span><span class="p">.</span><span class="n">future</span>
  <span class="p">}</span>
  <span class="k">private</span><span class="p">[</span><span class="bp">this</span><span class="p">]</span> <span class="k">def</span> <span class="nf">tryRead</span><span class="p">(</span><span class="n">pNotify</span><span class="p">:</span> <span class="nc">Promise</span><span class="p">[</span><span class="nc">T</span><span class="p">])</span> <span class="p">:</span> <span class="nc">Unit</span> <span class="o">=</span> <span class="bp">this</span><span class="p">.</span><span class="k">synchronized</span> <span class="p">{</span>
    <span class="n">buf</span><span class="p">.</span><span class="n">take</span> <span class="k">match</span> <span class="p">{</span>
      <span class="k">case</span> <span class="nc">Some</span><span class="p">(</span><span class="n">br</span><span class="p">)</span> <span class="o">=></span>
        <span class="k">if</span><span class="p">(</span><span class="n">br</span><span class="p">.</span><span class="n">noLongerFull</span><span class="p">)</span> <span class="n">pReadyForWrite</span><span class="p">.</span><span class="n">trySuccess</span><span class="p">(())</span>  <span class="c1">// wake parked writers</span>
        <span class="k">if</span><span class="p">(</span><span class="n">br</span><span class="p">.</span><span class="n">nowEmpty</span><span class="p">)</span> <span class="n">pReadyForRead</span> <span class="o">=</span> <span class="nc">Promise</span><span class="p">[</span><span class="nc">Unit</span><span class="p">]</span>  <span class="c1">// later readers must wait</span>
        <span class="n">pNotify</span><span class="p">.</span><span class="n">success</span><span class="p">(</span><span class="n">br</span><span class="p">.</span><span class="n">v</span><span class="p">)</span>  <span class="c1">// hand the value to this reader</span>
      <span class="k">case</span> <span class="nc">None</span> <span class="o">=></span>  <span class="c1">// somebody else got there first: try again later</span>
        <span class="n">pReadyForRead</span><span class="p">.</span><span class="n">future</span><span class="p">.</span><span class="n">foreach</span> <span class="p">{</span><span class="n">_</span> <span class="o">=></span> <span class="n">tryRead</span><span class="p">(</span><span class="n">pNotify</span><span class="p">)}</span>
    <span class="p">}</span>
  <span class="p">}</span>
</code></pre></div>
<p>Without a tremendous amount of overhead, what we have so far handles the basic case
of a single channel servicing multiple readers and writers.</p>
<h3>Timeout</h3>
<p>The <code>timeout</code> function in <code>core.async</code> simply creates a channel that delivers <code>nil</code>
some number of milliseconds in the future, so a delay is simply expressed by reading from
this channel. Whether you use this delay as the time limit for some operation or for some
other purpose is up to you. In Scala, the second argument of <code>Await.result</code> specifies
a timeout, after which the call gives up with a <code>TimeoutException</code>. This is crucial functionality,
but it's not quite the same thing. To replicate the <code>core.async</code> variety, it seems to
be necessary to go back to Java <code>Timer</code>s. The following could stand some optimization,
but it gets the job done:</p>
<div class="highlight"><pre><span></span><code> <span class="k">object</span> <span class="nc">Timeout</span> <span class="p">{</span>
<span class="kd">val</span> <span class="n">timer</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">Timer</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">timeout</span><span class="p">(</span><span class="n">d</span><span class="p">:</span> <span class="nc">Duration</span><span class="p">):</span> <span class="nc">Chan</span><span class="p">[</span><span class="nc">Unit</span><span class="p">]</span> <span class="o">=</span> <span class="n">timeout</span><span class="p">(</span><span class="n">d</span><span class="p">.</span><span class="n">toMillis</span><span class="p">,</span> <span class="p">())</span>
<span class="k">def</span> <span class="nf">timeout</span><span class="p">[</span><span class="nc">T</span><span class="p">](</span><span class="n">d</span><span class="p">:</span> <span class="nc">Long</span><span class="p">,</span> <span class="n">v</span><span class="p">:</span> <span class="nc">T</span><span class="p">):</span> <span class="nc">Chan</span><span class="p">[</span><span class="nc">T</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span>
<span class="kd">val</span> <span class="n">c</span> <span class="o">=</span> <span class="nc">Chan</span><span class="p">[</span><span class="nc">T</span><span class="p">]</span>
<span class="kd">val</span> <span class="n">tt</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">TimerTask</span><span class="p">()</span> <span class="p">{</span> <span class="k">def</span> <span class="nf">run</span> <span class="p">{</span> <span class="n">c</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="n">v</span><span class="p">)</span> <span class="p">}</span> <span class="p">}</span>
<span class="n">timer</span><span class="p">.</span><span class="n">schedule</span><span class="p">(</span><span class="n">tt</span><span class="p">,</span> <span class="n">d</span><span class="p">)</span>
<span class="n">c</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
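<p>For comparison, here is a minimal sketch of the same idea with no channel at all: a bare
<code>Future[Unit]</code> that completes after a delay. The names <code>FutureTimeout</code> and
<code>after</code> are made up for illustration, and making the <code>Timer</code> a daemon thread
is an assumption rather than something taken from the code above.</p>
<div class="highlight"><pre><span></span><code>import java.util.{Timer, TimerTask}
import scala.concurrent.{Future, Promise}
import scala.concurrent.duration._

// Hypothetical helper, not part of the Chan code in this post.
object FutureTimeout {
  // A daemon timer, so a forgotten timeout cannot keep the JVM alive.
  private val timer = new Timer("future-timeout", true)

  // Returns a future that completes with () once roughly `d` has elapsed.
  def after(d: FiniteDuration): Future[Unit] = {
    val p = Promise[Unit]()
    val task = new TimerTask { def run(): Unit = { p.trySuccess(()); () } }
    timer.schedule(task, d.toMillis)
    p.future
  }
}
</code></pre></div>
<p>Inside an <code>async</code> block, something like <code>await(FutureTimeout.after(100.millis))</code>
would then behave as a non-blocking sleep.</p>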
<h3>Alt</h3>
<p>Unfortunately, things get more complicated when we try to implement <code>alts!</code>.
The desired use case is along these lines:</p>
<div class="highlight"><pre><span></span><code> <span class="kd">val</span> <span class="n">c1</span> <span class="o">=</span> <span class="nc">Chan</span><span class="p">[</span><span class="nc">Int</span><span class="p">]</span>
<span class="kd">val</span> <span class="n">c2</span> <span class="o">=</span> <span class="nc">Chan</span><span class="p">[</span><span class="nc">String</span><span class="p">]</span>
<span class="c1">// Send integers at random intervals to c1</span>
<span class="n">async</span> <span class="p">{</span>
<span class="k">while</span><span class="p">(</span><span class="kc">true</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">val</span> <span class="n">i</span> <span class="o">=</span> <span class="nc">Random</span><span class="p">.</span><span class="n">nextInt</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span>
<span class="n">await</span><span class="p">{</span><span class="n">c1</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="n">i</span><span class="p">)}</span>
<span class="n">await</span><span class="p">{</span><span class="n">timeout</span><span class="p">((</span><span class="nc">Random</span><span class="p">.</span><span class="n">nextInt</span><span class="p">(</span><span class="mi">1000</span><span class="p">))</span> <span class="n">milliseconds</span><span class="p">).</span><span class="n">read</span><span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c1">// Send strings at random intervals to c2</span>
<span class="n">async</span> <span class="p">{</span>
<span class="k">while</span><span class="p">(</span><span class="kc">true</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">val</span> <span class="n">s</span> <span class="o">=</span> <span class="s">s"I am a string: </span><span class="si">${</span><span class="nc">Random</span><span class="p">.</span><span class="n">nextInt</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span><span class="si">}</span><span class="s">"</span>
<span class="n">await</span><span class="p">{</span><span class="n">c2</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="n">s</span><span class="p">)}</span>
<span class="n">await</span><span class="p">{</span><span class="n">timeout</span><span class="p">((</span><span class="nc">Random</span><span class="p">.</span><span class="n">nextInt</span><span class="p">(</span><span class="mi">1000</span><span class="p">))</span> <span class="n">milliseconds</span><span class="p">).</span><span class="n">read</span><span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c1">// Listen for the first delivery from c1, c2 and a timeout.</span>
<span class="kd">var</span> <span class="n">n</span> <span class="o">=</span> <span class="mi">100</span>
<span class="n">async</span> <span class="p">{</span>
<span class="k">while</span><span class="p">(</span><span class="n">n</span><span class="o">></span><span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="n">n</span><span class="o">=</span><span class="n">n</span><span class="o">-</span><span class="mi">1</span>
<span class="kd">val</span> <span class="n">tout</span> <span class="o">=</span> <span class="n">timeout</span><span class="p">(</span><span class="nc">Random</span><span class="p">.</span><span class="n">nextInt</span><span class="p">(</span><span class="mi">1000</span><span class="p">)</span> <span class="n">milliseconds</span><span class="p">)</span>
<span class="n">await</span><span class="p">(</span><span class="n">alts</span><span class="p">(</span><span class="n">c1</span><span class="p">,</span> <span class="n">c2</span><span class="p">,</span> <span class="n">tout</span><span class="p">))</span> <span class="k">match</span> <span class="p">{</span>
<span class="k">case</span> <span class="n">tout</span><span class="p">(</span><span class="n">_</span><span class="p">)</span> <span class="o">=></span> <span class="n">println</span><span class="p">(</span><span class="s">"Nothing this time."</span><span class="p">)</span>
<span class="k">case</span> <span class="n">c1</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="o">=></span> <span class="n">println</span><span class="p">(</span><span class="s">s"Plus one is </span><span class="si">${</span><span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
<span class="k">case</span> <span class="n">c2</span><span class="p">(</span><span class="n">s</span><span class="p">)</span> <span class="o">=></span> <span class="n">println</span><span class="p">(</span><span class="n">s</span> <span class="o">+</span> <span class="s">"score and seven"</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>The first two <code>async</code> blocks are simple enough, but it isn't clear
how <code>alts</code> is going to work. What initially comes to mind is
<code>Future.firstCompletedOf(...)</code> which does about what it says on
the tin, but if we do the obvious thing,</p>
<div class="highlight"><pre><span></span><code> <span class="n">await</span><span class="p">(</span><span class="nc">Future</span><span class="p">.</span><span class="n">firstCompletedOf</span><span class="p">(</span><span class="nc">List</span><span class="p">(</span><span class="n">c1</span><span class="p">.</span><span class="n">read</span><span class="p">,</span><span class="n">c2</span><span class="p">.</span><span class="n">read</span><span class="p">,</span><span class="n">tout</span><span class="p">.</span><span class="n">read</span><span class="p">)))</span>
</code></pre></div>
<p>we run into several problems. First, every time this line executes,
new <code>Future</code>s will be created for each channel, and we will be
ignoring all but the first to complete. The other two will now vie
for delivery on their respective channels, sucking whatever they
receive into oblivion. Second, we have no way of knowing which
channel won. Solving these two problems will require writing a
function that takes <code>Chan</code>s directly, causing a third problem: we
can't multiplex over channels of heterogeneous types. We can't even
cheat by taking <code>Chan[Any]</code> (as we could have with <code>Future[Any]</code>),
because <code>Chan[T]</code> is invariant in <code>T</code>,
which it has to be, as it supports mutating writes.</p>
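<p>To see the first two problems in miniature, here is a sketch in which two plain
<code>Promise</code>s stand in for pending channel reads. The names <code>FirstCompletedDemo</code>,
<code>read1</code> and <code>read2</code> are hypothetical, and no actual <code>Chan</code> is involved.</p>
<div class="highlight"><pre><span></span><code>import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FirstCompletedDemo {
  // Stand-ins for two outstanding reads on channels of different types.
  val read1 = Promise[Any]()
  val read2 = Promise[Any]()
  val first: Future[Any] = Future.firstCompletedOf(List(read1.future, read2.future))

  // Returns the winning value and whatever the losing read later received.
  def run(): (Any, Option[Any]) = {
    read2.trySuccess("from c2")
    // We get a value back, but nothing tells us which read produced it.
    val won = Await.result(first, 1.second)
    // The losing read is not cancelled. It still completes, and its value
    // goes to a future that nobody is looking at any more.
    read1.trySuccess(1)
    (won, read1.future.value.flatMap(_.toOption))
  }
}
</code></pre></div>
<p>Calling <code>FirstCompletedDemo.run()</code> hands back the winner as a bare <code>Any</code>,
alongside the value that the abandoned read swallowed.</p>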
<p>We'll solve these problems in reverse order:</p>
<h4>Multiplexing over heterogeneous types</h4>
<p>To solve the last problem, we're going to do something ghastly:</p>
<div class="highlight"><pre><span></span><code> <span class="k">type</span> <span class="nc">Pretender</span>
<span class="k">implicit</span> <span class="k">def</span> <span class="nf">ghastly</span><span class="p">[</span><span class="nc">T</span><span class="p">](</span><span class="n">c</span><span class="p">:</span> <span class="nc">Chan</span><span class="p">[</span><span class="nc">T</span><span class="p">]):</span> <span class="nc">Chan</span><span class="p">[</span><span class="nc">Pretender</span><span class="p">]</span> <span class="o">=</span> <span class="n">c</span><span class="p">.</span><span class="k">asInstanceOf</span><span class="p">[</span><span class="nc">Chan</span><span class="p">[</span><span class="nc">Pretender</span><span class="p">]]</span>
<span class="k">def</span> <span class="nf">alts</span><span class="p">(</span><span class="n">cs</span><span class="p">:</span> <span class="nc">Chan</span><span class="p">[</span><span class="nc">Pretender</span><span class="p">]</span><span class="o">*</span><span class="p">):</span> <span class="nc">Future</span><span class="p">[</span><span class="nc">Pretender</span><span class="p">]</span> <span class="o">=</span> <span class="o">???</span>
</code></pre></div>
<p>Now, <code>alts</code> will return a <code>Future[Pretender]</code>, which would
seem to be problematic, except that we're also planning on solving
problem 2, which means we'll know exactly how to cast it back, and
hopefully we can do that in a manner that makes mistakes unlikely. I
also contend that casting something to this made-up type is less
horrible than casting to <code>Any</code>, since we limit the number of
things it can pretend to be.</p>
<h4>Identifying the returned channel</h4>
<p>So that we can identify the winning channel, we're going to go back
and rewrite the internals of <code>Chan</code> to deal in pairs</p>
<div class="highlight"><pre><span></span><code> <span class="k">case</span> <span class="k">class</span> <span class="nc">CV</span><span class="p">[</span><span class="nc">T</span><span class="p">](</span><span class="kd">val</span> <span class="n">c</span><span class="p">:</span> <span class="nc">Chan</span><span class="p">[</span><span class="nc">T</span><span class="p">],</span> <span class="kd">val</span> <span class="n">v</span><span class="p">:</span> <span class="nc">T</span><span class="p">)</span>
</code></pre></div>
<p>We could have used plain tuples, but a named case class makes the code a little prettier,
and there will be another use for having a real class a bit later.</p>
<p>E.g.</p>
<div class="highlight"><pre><span></span><code> <span class="k">private</span><span class="p">[</span><span class="bp">this</span><span class="p">]</span> <span class="k">def</span> <span class="nf">tryRead</span><span class="p">(</span><span class="n">pNotify</span><span class="p">:</span> <span class="nc">Promise</span><span class="p">[</span><span class="nc">CV</span><span class="p">[</span><span class="nc">T</span><span class="p">]])</span> <span class="p">{</span>
<span class="n">b</span><span class="p">.</span><span class="n">read</span><span class="p">.</span><span class="n">map</span>
<span class="p">{</span> <span class="n">br</span> <span class="o">=></span>
<span class="k">if</span><span class="p">(</span><span class="n">br</span><span class="p">.</span><span class="n">noLongerEmpty</span><span class="p">)</span> <span class="n">pReadyForRead</span><span class="p">.</span><span class="n">trySuccess</span><span class="p">(</span><span class="nc">Unit</span><span class="p">)</span>
<span class="k">if</span><span class="p">(</span><span class="n">br</span><span class="p">.</span><span class="n">noLongerEmpty</span><span class="p">)</span> <span class="n">pReadyForWrite</span> <span class="o">=</span> <span class="nc">Promise</span><span class="p">[</span><span class="nc">Unit</span><span class="p">]</span>
<span class="n">pNotify</span><span class="p">.</span><span class="n">success</span><span class="p">(</span><span class="nc">CV</span><span class="p">(</span><span class="bp">this</span><span class="p">,</span><span class="n">br</span><span class="p">.</span><span class="n">v</span><span class="p">))</span>
<span class="p">}</span>
<span class="n">orElse</span>
<span class="n">pReadyForRead</span><span class="p">.</span><span class="n">future</span><span class="p">.</span><span class="n">map</span> <span class="p">{</span><span class="n">_</span> <span class="o">=></span> <span class="n">tryRead</span><span class="p">(</span><span class="n">pNotify</span><span class="p">)}</span>
<span class="p">}</span>
</code></pre></div>
<p>The regular <code>read</code> method will <code>map(_.v)</code> to return just the
value, since disambiguation won't be necessary, but <code>alts</code> will be
able to return <code>CV</code> pairs:</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">alts</span><span class="p">(</span><span class="n">cs</span><span class="p">:</span> <span class="nc">Chan</span><span class="p">[</span><span class="nc">Pretender</span><span class="p">]</span><span class="o">*</span><span class="p">):</span> <span class="nc">Future</span><span class="p">[</span><span class="nc">CV</span><span class="p">[</span><span class="nc">Pretender</span><span class="p">]]</span> <span class="o">=</span> <span class="o">???</span>
</code></pre></div>
<p>Once we can identify the channel returned by <code>alts</code>, we can use that information to cast
the value to its correct <code>T</code>. The trick is to write an <code>unapply</code> method on the
class itself (rather than on a companion object), so that each channel instance is its own extractor:</p>
<div class="highlight"><pre><span></span><code> <span class="k">class</span> <span class="nc">Chan</span><span class="p">[</span><span class="nc">T</span><span class="p">]</span> <span class="p">(...)</span> <span class="p">{</span>
<span class="p">...</span>
<span class="k">def</span> <span class="nf">unapply</span><span class="p">(</span><span class="n">cv</span><span class="p">:</span> <span class="nc">CV</span><span class="p">[</span><span class="nc">Chan</span><span class="p">.</span><span class="nc">Pretender</span><span class="p">]):</span> <span class="nc">Option</span><span class="p">[</span><span class="nc">T</span><span class="p">]</span> <span class="o">=</span>
<span class="k">if</span> <span class="p">(</span><span class="n">cv</span><span class="p">.</span><span class="n">c</span> <span class="n">eq</span> <span class="bp">this</span><span class="p">)</span> <span class="p">{</span>
<span class="nc">Some</span><span class="p">(</span><span class="n">cv</span><span class="p">.</span><span class="n">v</span><span class="p">.</span><span class="k">asInstanceOf</span><span class="p">[</span><span class="nc">T</span><span class="p">])</span>
<span class="p">}</span> <span class="k">else</span> <span class="nc">None</span>
<span class="p">...</span>
<span class="p">}</span>
</code></pre></div>
<p>such that</p>
<div class="highlight"><pre><span></span><code> <span class="kd">val</span> <span class="n">c1</span> <span class="o">=</span> <span class="nc">Chan</span><span class="p">[</span><span class="nc">TheRightType</span><span class="p">]</span>
<span class="p">...</span>
<span class="k">case</span> <span class="n">c1</span><span class="p">(</span><span class="n">v</span><span class="p">)</span> <span class="o">=></span> <span class="p">...</span>
</code></pre></div>
<p>will only match a <code>cv:CV[Pretender]</code> if <code>cv.c</code> refers to the
same object as <code>c1</code>, and then <code>cv.v</code> will be cast to
<code>TheRightType</code>. This is the equivalent of</p>
<div class="highlight"><pre><span></span><code> <span class="k">case</span> <span class="nc">CV</span><span class="p">(</span><span class="n">`c1`</span><span class="p">,</span><span class="n">_v</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span><span class="kd">val</span> <span class="n">v</span> <span class="o">=</span> <span class="n">_v</span><span class="p">.</span><span class="k">asInstanceOf</span><span class="p">[</span><span class="nc">TheRightType</span><span class="p">];</span> <span class="p">....</span> <span class="p">}</span>
</code></pre></div>
<p>except without ugly back-quotes and even uglier, error-prone casts. (Again, per the
philosophical rant, we do have a cast, it's just behind the curtain.)</p>
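<p>Stripped of channels, the per-instance extractor trick looks roughly like this. It is only a
sketch: <code>Tagged</code> and <code>Source</code> are hypothetical stand-ins for <code>CV</code>
and <code>Chan</code>, and <code>describe</code> is invented for the example.</p>
<div class="highlight"><pre><span></span><code>object InstanceExtractorDemo {
  // A value together with the object that produced it (compare CV).
  final case class Tagged(source: AnyRef, value: Any)

  // Every instance is its own extractor (compare Chan.unapply).
  final class Source[T] {
    def unapply(t: Tagged): Option[T] =
      // Match only values tagged with this very instance; the cast is
      // justified by the identity check.
      if (t.source eq this) Some(t.value.asInstanceOf[T]) else None
  }

  val ints = new Source[Int]
  val strs = new Source[String]

  def describe(t: Tagged): String = t match {
    case ints(i) => s"Plus one is ${i + 1}" // i is an Int here
    case strs(s) => s.toUpperCase           // s is a String here
    case _       => "unknown source"
  }
}
</code></pre></div>
<p>A <code>Tagged(ints, 41)</code> only ever matches the <code>ints(i)</code> case, so the cast
hidden inside <code>unapply</code> never sees a value from the wrong source.</p>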
<h4>Saving the futures</h4>
<p>The final (first) problem is the hardest.</p>
<p>Instead of one promise per channel client, we want just one, to be
completed with the result from whichever channel is ready first.
We also need to be sure that, once this promise is fulfilled, the
losing channels will not attempt to fulfill it again or lose any
data as a result. Previously, this was not a risk: any given
<code>pNotify</code> would only be fulfilled by one <code>tryXXXX</code> lineage
(i.e. the original call, or one rescheduled via <code>pReadyForXXXX</code>).</p>
<p>This leads us to the notion of a <code>TentativePromise</code>, where an
attempt to fulfill might fail in one of two distinct ways:</p>
<ol>
<li>The promise was as yet uncompleted, but the buffer operation failed.</li>
<li>The promise was already completed, so we didn't even attempt the buffer operation.</li>
</ol>
<div class="highlight"><pre><span></span><code> <span class="k">object</span> <span class="nc">OfferResult</span> <span class="k">extends</span> <span class="nc">Enumeration</span> <span class="p">{</span>
<span class="k">type</span> <span class="nc">OfferResult</span> <span class="o">=</span> <span class="nc">Value</span>
<span class="kd">val</span> <span class="nc">AlreadyCompleted</span><span class="p">,</span> <span class="nc">DidComplete</span><span class="p">,</span> <span class="nc">DidNotComplete</span> <span class="o">=</span> <span class="nc">Value</span>
<span class="p">}</span>
<span class="k">import</span> <span class="nc">OfferResult</span><span class="p">.</span><span class="n">_</span>
<span class="k">class</span> <span class="nc">TentativePromise</span><span class="p">[</span><span class="nc">T</span><span class="p">]</span> <span class="p">{</span>
<span class="kd">val</span> <span class="n">p</span> <span class="o">=</span> <span class="nc">Promise</span><span class="p">[</span><span class="nc">T</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">future</span><span class="p">:</span> <span class="n">scala</span><span class="p">.</span><span class="n">concurrent</span><span class="p">.</span><span class="nc">Future</span><span class="p">[</span><span class="nc">T</span><span class="p">]</span> <span class="o">=</span> <span class="n">p</span><span class="p">.</span><span class="n">future</span>
<span class="k">def</span> <span class="nf">tentativeOffer</span><span class="p">(</span><span class="n">o</span><span class="p">:</span> <span class="o">=></span> <span class="nc">Option</span><span class="p">[</span><span class="nc">T</span><span class="p">])</span> <span class="p">:</span> <span class="nc">OfferResult</span> <span class="o">=</span> <span class="bp">this</span><span class="p">.</span><span class="k">synchronized</span> <span class="p">{</span>
<span class="k">if</span><span class="p">(</span><span class="o">!</span><span class="n">p</span><span class="p">.</span><span class="n">isCompleted</span><span class="p">)</span> <span class="n">o</span> <span class="k">match</span> <span class="p">{</span>
<span class="k">case</span> <span class="nc">Some</span><span class="p">(</span><span class="n">t</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span><span class="n">p</span><span class="p">.</span><span class="n">success</span><span class="p">(</span><span class="n">t</span><span class="p">);</span> <span class="nc">DidComplete</span><span class="p">}</span>
<span class="k">case</span> <span class="nc">None</span> <span class="o">=></span> <span class="nc">DidNotComplete</span>
<span class="p">}</span>
<span class="k">else</span> <span class="nc">AlreadyCompleted</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>Note that the argument of <code>tentativeOffer</code> is passed by name, so it is evaluated lazily. In</p>
<div class="highlight"><pre><span></span><code> <span class="n">pNotify</span><span class="p">.</span><span class="n">tentativeOffer</span><span class="p">(</span><span class="n">b</span><span class="p">.</span><span class="n">take</span><span class="p">)</span>
</code></pre></div>
<p>the <code>b.take</code> operation won't be attempted unless the promise is
as yet uncompleted, and the promise won't be completed if
the <code>take</code> fails.</p>
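<p>The effect of the by-name argument is easy to check with a side-effecting stand-in for the
buffer. This is a simplified sketch: <code>LazyOfferDemo</code>, <code>offer</code> and the counting
<code>take</code> are hypothetical, and <code>offer</code> returns a plain <code>Boolean</code>
instead of an <code>OfferResult</code>.</p>
<div class="highlight"><pre><span></span><code>import scala.concurrent.Promise

object LazyOfferDemo {
  // Counts how many times the "buffer" was actually touched.
  var takes = 0
  def take(): Option[Int] = { takes += 1; Some(takes) }

  // `o` is by-name: it is evaluated only if `p` is still uncompleted.
  def offer[T](p: Promise[T])(o: => Option[T]): Boolean = p.synchronized {
    if (p.isCompleted) false
    else o match {
      case Some(t) => p.success(t); true
      case None    => false
    }
  }
}
</code></pre></div>
<p>Offering <code>take()</code> twice to the same promise bumps the counter only once; the second
offer bails out before the buffer is touched.</p>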
<p>It turns out we'll need one more promisey sort of thing.
Suppose we had a channel that is used to communicate a halt from
a loop of repeated <code>alts</code>:</p>
<div class="highlight"><pre><span></span><code> <span class="n">async</span> <span class="p">{</span>
<span class="k">while</span><span class="p">(</span><span class="o">!</span><span class="n">halt</span><span class="p">)</span> <span class="p">{</span>
<span class="n">await</span><span class="p">(</span><span class="n">alts</span><span class="p">(</span><span class="n">haltChannel</span><span class="p">,</span> <span class="n">stuffChannel</span><span class="p">))</span> <span class="k">match</span> <span class="p">{</span>
<span class="k">case</span> <span class="n">stuffChannel</span><span class="p">(</span><span class="n">s</span><span class="p">)</span> <span class="o">=></span> <span class="n">doStuff</span><span class="p">(</span><span class="n">s</span><span class="p">)</span>
<span class="k">case</span> <span class="n">haltChannel</span><span class="p">(</span><span class="n">_</span><span class="p">)</span> <span class="o">=></span> <span class="n">halt</span><span class="o">=</span><span class="kc">true</span>
<span class="p">}}}</span>
</code></pre></div>
<p>There are repeated attempts to read from <code>haltChannel</code>,
but, by definition, never more than one write to it.
Were we to continue scheduling with conventional <code>Promise</code>s, à la</p>
<div class="highlight"><pre><span></span><code> <span class="n">pReadyForRead</span><span class="p">.</span><span class="n">future</span><span class="p">.</span><span class="n">map</span> <span class="p">{</span><span class="n">_</span> <span class="o">=></span> <span class="n">tryRead</span><span class="p">(</span><span class="n">pNotify</span><span class="p">)}</span>
</code></pre></div>
<p>we'd accumulate a future on <code>pReadyForRead</code> with every iteration, and
they wouldn't be cleaned up until the program's end. What we want is some way
to keep track of the futures and clean them up when <code>pNotify</code> is completed,
regardless of who completed it.</p>
<p>We'll use the <code>TentativePromise</code> in conjunction with an <code>IndirectPromise</code>,
whose purpose is to fulfill <code>TentativePromise</code>s. The <code>pReadyForXXXX</code> will
be <code>IndirectPromise</code>s, on which a raft of <code>TentativePromise</code>s may be depending,
e.g. for multiple <code>async</code> blocks each competitively <code>await</code>ing a read from
the same channel.
We'll use a new method, <code>futureOffer</code>, which makes a future
fulfillment action contingent on fulfillment still being necessary:</p>
<div class="highlight"><pre><span></span><code> <span class="n">pReadyForWrite</span><span class="p">.</span><span class="n">futureOffer</span><span class="p">(</span><span class="n">pDelivery</span><span class="p">){</span><span class="n">tryWrite</span><span class="p">(</span><span class="n">v</span><span class="p">,</span> <span class="n">_</span><span class="p">)}</span>
</code></pre></div>
<p>The <code>futureOffer</code> accumulates the completion functions in a map, but schedules
cleanup should the delivery promise be completed prematurely:</p>
<div class="highlight"><pre><span></span><code> <span class="k">class</span> <span class="nc">IndirectPromise</span><span class="p">[</span><span class="nc">T</span><span class="p">,</span><span class="nc">U</span><span class="p">]</span> <span class="k">extends</span> <span class="nc">Promise</span><span class="p">[</span><span class="nc">U</span><span class="p">]</span> <span class="p">{</span>
<span class="kd">val</span> <span class="n">p</span> <span class="o">=</span> <span class="nc">Promise</span><span class="p">[</span><span class="nc">U</span><span class="p">]</span>
<span class="k">type</span> <span class="nc">TP</span> <span class="o">=</span> <span class="nc">TentativePromise</span><span class="p">[</span><span class="nc">T</span><span class="p">]</span>
<span class="kd">val</span> <span class="n">h</span><span class="p">:</span> <span class="n">mutable</span><span class="p">.</span><span class="nc">HashMap</span><span class="p">[</span><span class="nc">TP</span><span class="p">,</span> <span class="nc">TP</span> <span class="o">=></span> <span class="nc">Unit</span><span class="p">]</span> <span class="o">=</span> <span class="k">new</span> <span class="n">mutable</span><span class="p">.</span><span class="nc">HashMap</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">futureOffer</span><span class="p">(</span><span class="n">pDeliver</span> <span class="p">:</span> <span class="nc">TP</span><span class="p">)(</span><span class="n">f</span><span class="p">:</span><span class="nc">TP</span><span class="o">=></span><span class="nc">Unit</span><span class="p">)</span> <span class="p">:</span> <span class="nc">Unit</span> <span class="o">=</span> <span class="bp">this</span><span class="p">.</span><span class="k">synchronized</span> <span class="p">{</span>
<span class="k">if</span><span class="p">(</span><span class="n">p</span><span class="p">.</span><span class="n">isCompleted</span><span class="p">)</span> <span class="p">{</span>
<span class="n">f</span><span class="p">(</span><span class="n">pDeliver</span><span class="p">)</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="n">h</span> <span class="o">+=</span> <span class="p">((</span><span class="n">pDeliver</span><span class="p">,</span> <span class="n">f</span><span class="p">))</span>
<span class="n">pDeliver</span><span class="p">.</span><span class="n">future</span><span class="p">.</span><span class="n">map</span> <span class="p">{</span><span class="n">_</span> <span class="o">=></span> <span class="bp">this</span><span class="p">.</span><span class="k">synchronized</span><span class="p">{</span><span class="n">h</span> <span class="o">-=</span> <span class="n">pDeliver</span><span class="p">}}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>To extend the <code>Promise</code> trait, you need to implement <code>future</code>, <code>isCompleted</code> and
<code>tryComplete</code>. The first two are just delegated (so standard listeners are supported),
but the last will loop over any remaining promise/completion pairs and attempt to run them:</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">future</span> <span class="o">=</span> <span class="n">p</span><span class="p">.</span><span class="n">future</span>
<span class="k">def</span> <span class="nf">isCompleted</span><span class="p">:</span> <span class="nc">Boolean</span> <span class="o">=</span> <span class="n">p</span><span class="p">.</span><span class="n">isCompleted</span>
<span class="k">def</span> <span class="nf">tryComplete</span><span class="p">(</span><span class="n">result</span><span class="p">:</span> <span class="n">scala</span><span class="p">.</span><span class="n">util</span><span class="p">.</span><span class="nc">Try</span><span class="p">[</span><span class="nc">U</span><span class="p">]):</span> <span class="nc">Boolean</span> <span class="o">=</span> <span class="bp">this</span><span class="p">.</span><span class="k">synchronized</span> <span class="p">{</span>
<span class="k">if</span><span class="p">(</span><span class="n">p</span><span class="p">.</span><span class="n">tryComplete</span><span class="p">(</span><span class="n">result</span><span class="p">))</span> <span class="p">{</span> <span class="c1">// fires any standard listeners</span>
<span class="n">h</span><span class="p">.</span><span class="n">foreach</span> <span class="p">{</span><span class="k">case</span> <span class="p">(</span><span class="n">pDeliver</span><span class="p">,</span><span class="n">f</span><span class="p">)</span> <span class="o">=></span> <span class="n">f</span><span class="p">(</span><span class="n">pDeliver</span><span class="p">)</span> <span class="p">}</span>
<span class="kc">true</span>
<span class="p">}</span> <span class="k">else</span> <span class="kc">false</span><span class="p">}</span>
</code></pre></div>
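<p>The bookkeeping can be tried out in isolation with a much simpler stand-in. This sketch is not
<code>IndirectPromise</code>: the hypothetical <code>PendingActions</code> does not extend
<code>Promise</code>, uses ordinary <code>Promise</code>s as delivery targets, and prunes completed
targets synchronously on each call rather than scheduling the removal on the delivery future.</p>
<div class="highlight"><pre><span></span><code>import scala.collection.mutable
import scala.concurrent.Promise

// Hypothetical simplification: pending actions keyed by the promise they deliver to.
final class PendingActions[T] {
  private val pending =
    mutable.LinkedHashMap.empty[Promise[T], Promise[T] => Unit]
  private var fired = false

  // Forget actions whose delivery promise somebody else already completed.
  private def prune(): Unit =
    pending.keys.filter(_.isCompleted).toList.foreach(pending -= _)

  // Run `f` right away if the trigger has fired, otherwise remember it.
  def offer(target: Promise[T])(f: Promise[T] => Unit): Unit = synchronized {
    prune()
    if (fired) f(target) else pending += (target -> f)
  }

  // Fire the trigger: run every action that is still relevant, once.
  def fire(): Unit = synchronized {
    fired = true
    prune()
    val todo = pending.toList
    pending.clear()
    todo.foreach { case (target, f) => f(target) }
  }

  // Number of actions still waiting for the trigger.
  def size: Int = synchronized { prune(); pending.size }
}
</code></pre></div>
<p>If one of two registered targets is completed elsewhere before <code>fire()</code>, only the
other action runs, which is the cleanup behaviour we are after.</p>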
<p>Now, we'll rejigger <code>tryRead</code> and <code>tryWrite</code> to use our new promises.</p>
<div class="highlight"><pre><span></span><code> <span class="k">class</span> <span class="nc">Chan</span><span class="p">[</span><span class="nc">T</span><span class="p">](</span><span class="kd">val</span> <span class="n">b</span><span class="p">:</span> <span class="nc">ChanBuffer</span><span class="p">[</span><span class="nc">T</span><span class="p">])</span> <span class="p">{</span>
<span class="k">private</span> <span class="p">[</span><span class="bp">this</span><span class="p">]</span> <span class="kd">var</span> <span class="n">pReadyForWrite</span> <span class="o">=</span> <span class="nc">IndirectPromise</span><span class="p">.</span><span class="n">successful</span><span class="p">[</span><span class="nc">CV</span><span class="p">[</span><span class="nc">T</span><span class="p">],</span><span class="nc">Unit</span><span class="p">](</span><span class="nc">Unit</span><span class="p">)</span>
<span class="k">private</span> <span class="p">[</span><span class="bp">this</span> <span class="p">]</span> <span class="kd">var</span> <span class="n">pReadyForRead</span> <span class="o">=</span> <span class="nc">IndirectPromise</span><span class="p">[</span><span class="nc">CV</span><span class="p">[</span><span class="nc">T</span><span class="p">],</span><span class="nc">Unit</span><span class="p">]</span>
<span class="k">private</span><span class="p">[</span><span class="bp">this</span><span class="p">]</span> <span class="k">def</span> <span class="nf">tryWrite</span><span class="p">(</span><span class="n">v</span><span class="p">:</span> <span class="nc">T</span><span class="p">,</span> <span class="n">pNotify</span><span class="p">:</span> <span class="nc">TentativePromise</span><span class="p">[</span><span class="nc">CV</span><span class="p">[</span><span class="nc">T</span><span class="p">]])</span> <span class="p">:</span> <span class="nc">Unit</span> <span class="o">=</span> <span class="bp">this</span><span class="p">.</span><span class="k">synchronized</span> <span class="p">{</span>
<span class="kd">var</span> <span class="n">trigger</span> <span class="o">=</span> <span class="kc">false</span>
<span class="n">pNotify</span><span class="p">.</span><span class="n">tentativeOffer</span><span class="p">(</span><span class="n">b</span><span class="p">.</span><span class="n">put</span><span class="p">(</span><span class="n">v</span><span class="p">).</span><span class="n">map</span> <span class="p">{</span> <span class="n">br</span> <span class="o">=></span>
<span class="k">if</span> <span class="p">(</span><span class="n">br</span><span class="p">.</span><span class="n">noLongerEmpty</span><span class="p">)</span> <span class="n">trigger</span> <span class="o">=</span> <span class="kc">true</span>
<span class="k">if</span> <span class="p">(</span><span class="n">br</span><span class="p">.</span><span class="n">nowFull</span><span class="p">)</span> <span class="n">pReadyForWrite</span> <span class="o">=</span> <span class="nc">IndirectPromise</span><span class="p">[</span><span class="nc">CV</span><span class="p">[</span><span class="nc">T</span><span class="p">],</span><span class="nc">Unit</span><span class="p">]</span>
<span class="nc">CV</span><span class="p">(</span><span class="bp">this</span><span class="p">,</span><span class="n">v</span><span class="p">)</span>
<span class="p">})</span> <span class="k">match</span> <span class="p">{</span>
<span class="k">case</span> <span class="nc">DidNotComplete</span> <span class="o">=></span> <span class="n">pReadyForWrite</span><span class="p">.</span><span class="n">futureOffer</span><span class="p">(</span><span class="n">pNotify</span><span class="p">){</span><span class="n">tryWrite</span><span class="p">(</span><span class="n">v</span><span class="p">,</span><span class="n">_</span><span class="p">)}</span>
<span class="k">case</span> <span class="nc">DidComplete</span> <span class="o">=></span> <span class="p">()</span>
<span class="k">case</span> <span class="nc">AlreadyCompleted</span> <span class="o">=></span> <span class="p">()</span>
<span class="p">}</span>
<span class="k">if</span><span class="p">(</span><span class="n">trigger</span><span class="p">)</span> <span class="n">pReadyForRead</span><span class="p">.</span><span class="n">trySuccess</span><span class="p">(</span><span class="nc">Unit</span><span class="p">)</span>
<span class="p">}</span>
<span class="k">private</span><span class="p">[</span><span class="bp">this</span><span class="p">]</span> <span class="k">def</span> <span class="nf">tryRead</span><span class="p">(</span><span class="n">pNotify</span><span class="p">:</span> <span class="nc">TentativePromise</span><span class="p">[</span><span class="nc">CV</span><span class="p">[</span><span class="nc">T</span><span class="p">]]):</span> <span class="nc">Unit</span> <span class="o">=</span> <span class="bp">this</span><span class="p">.</span><span class="k">synchronized</span> <span class="p">{</span>
<span class="kd">var</span> <span class="n">trigger</span> <span class="o">=</span> <span class="kc">false</span>
<span class="n">pNotify</span><span class="p">.</span><span class="n">tentativeOffer</span><span class="p">(</span><span class="n">b</span><span class="p">.</span><span class="n">take</span><span class="p">.</span><span class="n">map</span> <span class="p">{</span><span class="n">br</span> <span class="o">=></span>
<span class="k">if</span><span class="p">(</span><span class="n">br</span><span class="p">.</span><span class="n">noLongerFull</span><span class="p">)</span> <span class="n">trigger</span> <span class="o">=</span> <span class="kc">true</span>
<span class="k">if</span><span class="p">(</span><span class="n">br</span><span class="p">.</span><span class="n">nowEmpty</span><span class="p">)</span> <span class="n">pReadyForRead</span> <span class="o">=</span> <span class="nc">IndirectPromise</span><span class="p">[</span><span class="nc">CV</span><span class="p">[</span><span class="nc">T</span><span class="p">],</span><span class="nc">Unit</span><span class="p">]</span>
<span class="nc">CV</span><span class="p">(</span><span class="bp">this</span><span class="p">,</span><span class="n">br</span><span class="p">.</span><span class="n">v</span><span class="p">)</span>
<span class="p">})</span> <span class="k">match</span> <span class="p">{</span>
<span class="k">case</span> <span class="nc">DidNotComplete</span> <span class="o">=></span> <span class="n">pReadyForRead</span><span class="p">.</span><span class="n">future</span> <span class="n">map</span> <span class="p">{</span><span class="n">_</span> <span class="o">=></span> <span class="n">tryRead</span><span class="p">(</span><span class="n">pNotify</span><span class="p">)}</span>
<span class="k">case</span> <span class="nc">DidComplete</span> <span class="o">=></span> <span class="p">()</span>
<span class="k">case</span> <span class="nc">AlreadyCompleted</span> <span class="o">=></span> <span class="p">()</span>
<span class="p">}</span>
<span class="k">if</span><span class="p">(</span><span class="n">trigger</span><span class="p">)</span> <span class="n">pReadyForWrite</span><span class="p">.</span><span class="n">trySuccess</span><span class="p">(</span><span class="nc">Unit</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">...</span>
<span class="p">}</span>
</code></pre></div>
<h4>The two R's</h4>
<p>One last item is that we'd like <code>alts</code> to allow both reads and writes, just as Clojure's does, e.g.</p>
<div class="highlight"><pre><span></span><code> <span class="n">alts</span><span class="p">(</span><span class="n">c1</span><span class="p">,</span><span class="nc">CV</span><span class="p">(</span><span class="n">c2</span><span class="p">,</span><span class="s">"output"</span><span class="p">))</span> <span class="k">match</span> <span class="p">{...}</span>
</code></pre></div>
<p>This requires yet another redefinition of <code>alts</code>, which now takes instances of a new <code>ChanHolder</code> trait,
implemented by both <code>Chan</code> and <code>CV</code>.</p>
<div class="highlight"><pre><span></span><code> <span class="k">sealed</span> <span class="k">trait</span> <span class="nc">ChanHolder</span><span class="p">[</span><span class="nc">T</span><span class="p">]</span> <span class="p">{</span>
<span class="k">def</span> <span class="nf">chan</span> <span class="p">:</span> <span class="nc">Chan</span><span class="p">[</span><span class="nc">T</span><span class="p">]</span>
<span class="p">}</span>
<span class="k">case</span> <span class="k">class</span> <span class="nc">CV</span><span class="p">[</span><span class="nc">T</span><span class="p">](</span><span class="kd">val</span> <span class="n">c</span><span class="p">:</span> <span class="nc">Chan</span><span class="p">[</span><span class="nc">T</span><span class="p">],</span> <span class="kd">val</span> <span class="n">v</span><span class="p">:</span> <span class="nc">T</span><span class="p">)</span> <span class="k">extends</span> <span class="nc">ChanHolder</span><span class="p">[</span><span class="nc">T</span><span class="p">]</span> <span class="p">{</span>
<span class="k">def</span> <span class="nf">chan</span> <span class="o">=</span> <span class="n">c</span>
<span class="p">}</span>
<span class="k">class</span> <span class="nc">Chan</span><span class="p">[</span><span class="nc">T</span><span class="p">](...)</span> <span class="k">extends</span> <span class="nc">ChanHolder</span><span class="p">[</span><span class="nc">T</span><span class="p">]</span> <span class="p">{</span>
<span class="k">def</span> <span class="nf">chan</span> <span class="o">=</span> <span class="bp">this</span>
<span class="p">...</span>
<span class="p">}</span>
<span class="k">def</span> <span class="nf">alts</span><span class="p">(</span><span class="n">cs</span><span class="p">:</span> <span class="nc">ChanHolder</span><span class="p">[</span><span class="nc">Pretender</span><span class="p">]</span><span class="o">*</span><span class="p">):</span> <span class="nc">Future</span><span class="p">[</span><span class="nc">CV</span><span class="p">[</span><span class="nc">Pretender</span><span class="p">]]</span>
</code></pre></div>
<h3>alt-together now</h3>
<p>We're finally in a position to write the new and improved <code>Chan.alts</code>:</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">alts</span><span class="p">(</span><span class="n">cs</span><span class="p">:</span> <span class="nc">ChanHolder</span><span class="p">[</span><span class="nc">Pretender</span><span class="p">]</span><span class="o">*</span><span class="p">):</span> <span class="nc">Future</span><span class="p">[</span><span class="nc">CV</span><span class="p">[</span><span class="nc">Pretender</span><span class="p">]]</span> <span class="o">=</span> <span class="p">{</span>
<span class="kd">val</span> <span class="n">p</span> <span class="o">=</span> <span class="nc">Promise</span><span class="p">[</span><span class="nc">CV</span><span class="p">[</span><span class="nc">Pretender</span><span class="p">]]</span>
<span class="n">cs</span><span class="p">.</span><span class="n">foreach</span> <span class="p">{</span> <span class="n">_</span> <span class="k">match</span> <span class="p">{</span>
<span class="k">case</span> <span class="n">c</span> <span class="p">:</span> <span class="nc">Chan</span><span class="p">[</span><span class="nc">Pretender</span><span class="p">]</span> <span class="o">=></span> <span class="n">c</span><span class="p">.</span><span class="n">chan</span><span class="p">.</span><span class="n">read</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
<span class="k">case</span> <span class="nc">CV</span><span class="p">(</span><span class="n">c</span><span class="p">,</span><span class="n">v</span><span class="p">)</span> <span class="o">=></span> <span class="n">c</span><span class="p">.</span><span class="n">chan</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="n">v</span><span class="p">,</span><span class="n">p</span><span class="p">)</span>
<span class="p">}}</span>
<span class="n">p</span><span class="p">.</span><span class="n">future</span>
<span class="p">}</span>
</code></pre></div>
<h3>What's next</h3>
<p>As previously backpedaled,
there's a bit too much <code>if</code>, <code>synchronized</code> and <code>var</code> for comfort.
I want to continue noodling around with different chaining techniques.
With or without stylistic improvements, the code would benefit from a
concurrency torture test, which even sounds like fun.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:ctchan">
<p>Note that <code>core.typed</code> does provide a polymorphic <code>Chan</code> annotation, but it cannot handle heterogeneous channel types in <code>alts!</code>. <a class="footnote-backref" href="#fnref:ctchan" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
</ol>
</div>Look at the Pie. Or my new, crappy watch I made myself.2015-01-05T00:00:00-05:002015-01-05T00:00:00-05:00Peter Fraenkeltag:blog.podsnap.com,2015-01-05:/pebblepie.html<p><img src="images/pie.jpg" width=400/> <img src="images/exactitude.jpg" width=340/> </p>
<p>Mutt: What will be the appropriate designation for the upcoming period of solar rotation in the Anno Domini system as devised by St. Dionysius Exiguus?</p>
<p>Jeff: Why don't you just look at the pie?</p>
<p>Mutt: Couldn't you just answer the question? Why do you always have to be a sarcastic asshole?</p>
<p>Jeff: Seriously, look at the pie. </p>
<p>Mutt: Huh. Yeah. I see what you mean. Sorry dude. </p>
<p>Jeff: No problem, my tinhorn peer. It is after all New Years. </p>
<p>Ens.: And to all a good night.</p>
<h3>TL;DR</h3>
<p>Consciously uncouple from the yoke of technological precision, using nothing
but precision technology and simple calculus.</p>
<p>Also, some quick pointers on getting started with the Pebble SDK.</p>
<h3>Chronotourism and the Modern Condition</h3>
<p><img src="images/leapsecond.png" width=50%/></p>
<p>Leap seconds sure are something, right?
Kith and kin, pauper and prince, all glued to their
cesium synced devices, each of course
making a personal decision about
when to begin bating his or her breath, but all sharing equally in the glory
of that soon-to-be-slightly-improved coherence
between Coordinated Universal Time and Mean Solar Time.</p>
<p>If you're a certain kind of computer program, the business of
<a href="https://en.wikipedia.org/wiki/Leap_second">when and if</a> leap seconds
occur can't be ignored, but, if you're a nightowl with state-of-the-art
chronometry astride your wrist, or NTP packets fluttering on bands of LTE in the vicinity
of your buttocks, it's all about
the thrill. With no suitable target for our temporal attentions since June of 2012, it's no
wonder that some of us have turned to drink.</p>
<h3>Exactitude</h3>
<p>The era of big game may be over, but at least, should the trains at some point run
on time, we'll be ready. For one thing, the Swiss sell us more than $20B worth of wristwatches
every year. I used to think it was odd that, Switzerland being one of the only places where
the trains actually do run on time, they export 95% of their production, but I've been told
that the trains run not just on time but frequently,
so it really doesn't matter when you show up. When you're flying United Airlines operated by
ExpressJet - DBA United Express, every second counts, or at least you
count every second, so remember not to leave that extremely expensive hunk of exotic alloy
in the gunk at the bottom of a TSA tray.</p>
<p>I won't have that problem, because I am too cool for words.
Well, I will have the United-Airlines-operated-by-ExpressJet-DBA-United-Express problem,
but my TSA gunk will be adhering to the handsome, tomato-red plastic exterior of a refurbished
first-generation Pebble... that just happens maybe to be displaying a convincing simulacrum of
one of those fine Swiss things.</p>
<p><img src="images/rolex.png" width=100/><img src="images/pebbles.png" width=100><img src="images/pebble-swiss.png" width=100><img src="images/pebble_red.jpg" width=110/></p>
<h3>I want my SDK</h3>
<p>Maybe. You see, the Pebble also comes with a so-called SDK. That is a technical term, in
this case best translated as the reason well enough will not be let alone. There's also a whole
<a href="https://github.com/pebble/pebble-sdk-examples">repo</a> of example watch faces and apps, just begging
to be contorted inanely by people who are supposed to have better things to do.</p>
<p>Depending on how trusting a soul you are, it will take between 0.5 and 5 minutes to
<a href="http://developer.getpebble.com/sdk/">download</a> and install the tools. Their
<code>curl | sh</code> maneuver is pretty sane overall, even using <code>virtualenv</code> to avoid
polluting your python installation, but it does try to slip in an
<code>echo $HEINOUSNESS >> "$HOME/.bash_profile"</code>, so I preferred to download the script
and follow along manually.</p>
<p>Then you just cd to one of the sample directories and</p>
<div class="highlight"><pre><span></span><code> pebble build
</code></pre></div>
<p>Even to do conventional things with the watch, you'll already have had to install the Pebble
app on your phone. It has a "developer mode" option that listens on a port for things
like</p>
<div class="highlight"><pre><span></span><code> pebble install --phone <span class="si">${</span><span class="nv">PHONE_IP_ADDRESS</span><span class="si">}</span>
</code></pre></div>
<p>Nb.</p>
<ol>
<li>You probably don't want to do this on a public network.</li>
<li>Also, there are other ways to skin this here onion, including
an entirely cloud based IDE, but I didn't want my CPU to get cold and lonely, and I can't
imagine that a nice person like you does either.</li>
<li>Other than an abandoned private effort, there's no emulator available.</li>
</ol>
<h3>The Clunky Old Watch Face</h3>
<blockquote>
<p>Be transported to magical olden times when you strap on this beauty. This watch loses or gains several minutes a day, depending on how much it's been wound, where winding is accomplished by switching out and back into the face. Additionally, it believes that every month has 31 days.</p>
</blockquote>
<p>You can actually get it <a href="https://apps.getpebble.com/applications/54a708610cd93626c90000da">at the pebble app store</a>, and it looks like this:</p>
<p><img src="images/crappy-watch.png" width=150px></p>
<p>Little <em>ab initio</em> coding was involved, just a few tweaks of the <a href="https://github.com/pebble/pebble-sdk-examples/tree/master/watchfaces/simple_analog">simple_analog</a>
watchface from Pebble's SDK examples.</p>
<h3>Rules of disengagement</h3>
<ol>
<li>The winding level of the watch $w$ varies between 0 and 1. </li>
<li>Other things being equal, the watch unwinds at a constant rate $dw/dt = -\upsilon$ until it hits $w=0$.</li>
<li>While $w>0$, the watch displays $t+s(t)$, where $t$ is the actual time, and $s$ is a time-varying skew.
When $w=0$, it displays $t_z+s(t_z)$, where $t_z$ is the moment $w$ hit zero.</li>
<li>When $w>0$, skew changes at a rate proportional to the winding level in excess of a bias-point: $ds/dt = \alpha (w - w_b)$;
when $w=0$, $ds/dt=0$.</li>
<li>The minute hand should be a little crooked.</li>
<li>$w$ increases by $\Delta w$ each time the watch face is unloaded and reloaded, subject to $w\leq 1$.</li>
<li>At time $t_f$, when the watch is fully wound and $w=1$, we reset $s=0$.</li>
<li>The displayed day of the month shall be $d_f + (t-t_f)/\tau_d \mod 31$, where $d_f$ is the correct
date at $t_f$ and $\tau_d$ is the length of a day.</li>
<li>The day of the week should be a random sequence of letters.</li>
<li>The apparent brand of the watch shall be <strong>BOFFO</strong>.</li>
<li>The sweep second hand does not change continuously but jumps every $\tau_j$ to $t + \tau_j + U([0,\tau_j])$, where $U$ denotes a uniform random draw.</li>
</ol>
<p>Qualitatively, the watch gains time when the mainspring is tight, but loses time as the spring loosens.</p>
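<p>Rules 8 and 11 are compact enough to sketch in plain C. This is only an illustration, not the
code on the watch: the function names are made up, $t \ge t_f$ is assumed, and the day is wrapped
into 1..31 rather than 0..30, which is one reading of the $\mod 31$ above.</p>

```c
#include <stdlib.h>
#include <time.h>

#define SECONDS_PER_DAY 86400L /* tau_d, ignoring leap seconds, of all things */

/* Rule 8: the day of the month to display at time t, given the correct
   day d_f (1..31) at the moment t_f the watch was last fully wound.
   Assumptions: t >= t_f, and the result wraps into 1..31. */
int displayed_day(int d_f, time_t t, time_t t_f) {
  long elapsed_days = (long)((t - t_f) / SECONDS_PER_DAY);
  return (int)(((d_f - 1) + elapsed_days) % 31) + 1;
}

/* Rule 11: where the second hand lands on its next jump,
   t + tau_j + U([0, tau_j]), with rand() standing in for U. */
double next_second_hand_target(double t, double tau_j) {
  double u = (double)rand() / (double)RAND_MAX; /* uniform on [0, 1] */
  return t + tau_j + u * tau_j;
}
```

<p>So a watch fully wound on the 30th will, two days later, announce the 1st, whatever the
calendar thinks.</p>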
<h3>Precision inaccuracy</h3>
<p>Now, we could simulate the evolution of skew by incrementing the various state variables at small time
intervals, but this post is about precision, and, goddamnit, our inaccuracy is going to be precise.
Even more importantly (if you can imagine that), the user might switch to some other, inferior watch face
for a spell, and it could potentially take them hours to realize the error of their ways and switch back.</p>
<p>Given the state $s_0$, $w_0$ as of $t_0$, we can integrate directly:</p>
<p>$$\begin{eqnarray}
s_t &=& s_0 + \int_{t_0}^t \alpha (w-w_b) dt' \\
    &=& s_0 + \int_{t_0}^t \alpha (w_0 - \upsilon (t'-t_0) -w_b) dt' \\
    &=& s_0 + \alpha [w_0 + \upsilon t_0 - w_b] (t-t_0) - \frac{1}{2} \alpha \upsilon (t-t_0)(t+t_0) \\
    &=& s_0 + \alpha (t-t_0) [(w_0-w_b) - \frac{1}{2} \upsilon (t-t_0)]
\end{eqnarray}$$</p>
<p>Of course, the clock stops when $w$ hits zero, at</p>
<p>$$t_1 = t_0 + w_0/\upsilon$$</p>
<p>so if $t_1 < t$, we'll use it instead of $t$ in the expression for skew.</p>
<p>With the winding bias $w_b=0.6$, unwind rate $\upsilon=0.04/{hr}$ and
skew/winding coefficient $\alpha=2 {min}/{day}$, skew evolves like this:</p>
<p><img src="images/time-skew.png"></p>
<h3>Skeleton code</h3>
<p>In the basic watch face you are required to have a <code>main</code> function, and it
should probably call <code>app_event_loop()</code>:</p>
<div class="highlight"><pre><span></span><code><span class="cm">/* Lots</span>
<span class="cm"> of</span>
<span class="cm"> file</span>
<span class="cm"> static</span>
<span class="cm"> variables */</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
<span class="cm">/* Initialize stuff. */</span>
<span class="n">app_event_loop</span><span class="p">();</span> <span class="cm">/* built-in function */</span>
<span class="cm">/* Clean up stuff */</span>
<span class="p">}</span>
</code></pre></div>
<p>The initialization/clean-up will comprise:</p>
<ol>
<li>Recovering/persisting the various $x_0$ values.</li>
<li>Setting up and tearing down the graphics objects.</li>
<li>Scheduling/canceling callbacks.</li>
</ol>
<p>None of the examples set a return value, so it's apparently discarded.</p>
<h3>Workflow</h3>
<p>Since it's not possible to run watch code except on the watch itself, I recommend an incremental approach,
with frequent re-installations and lots of logging. There's a handy macro for the latter,</p>
<div class="highlight"><pre><span></span><code> <span class="n">APP_LOG</span><span class="p">(</span><span class="n">APP_LOG_LEVEL_DEBUG</span><span class="p">,</span> <span class="s">"Minute tick. Setting skew."</span><span class="p">);</span>
</code></pre></div>
<p>messages from which can be viewed with </p>
<div class="highlight"><pre><span></span><code> pebble logs --phone <span class="si">${</span><span class="nv">PHONE_IP_ADDRESS</span><span class="si">}</span>
</code></pre></div>
<p>The whole process is pleasantly retro. I pity the style-conscious fops who will
find it necessary to design apps for the iWatch.</p>
<h3>State</h3>
<p>Our watch face "app" is not going to be running continuously; indeed, absent any
better metaphor for winding, we will be requiring the user to exit and re-launch
the face using the little up/down buttons on the side of the watch. During the
painful hiatuses, we'll need to store the various $0$ subscripted values persistently.
The SDK provides for this sort of thing via</p>
<div class="highlight"><pre><span></span><code> <span class="kt">void</span> <span class="nf">persist_write_data</span><span class="p">(</span><span class="kt">int32_t</span> <span class="n">key</span><span class="p">,</span> <span class="k">const</span> <span class="kt">void</span> <span class="o">*</span><span class="n">data</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">len</span><span class="p">);</span>
<span class="kt">void</span> <span class="nf">persist_read_data</span><span class="p">(</span><span class="kt">int32_t</span> <span class="n">key</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">data</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">len</span><span class="p">);</span>
</code></pre></div>
<p>with some specialized forms, for what are supposed to be the most
commonly stored data types. The <code>key</code> is arbitrary, but must obviously be
unique within our watchface. Since, at least once (more, if the user
deletes us but subsequently repents of this hasty decision), the face
will have no stored data, we must detect and deal with that situation.
Thus, for example:</p>
<div class="highlight"><pre><span></span><code><span class="k">static</span> <span class="k">const</span> <span class="kt">uint32_t</span> <span class="n">WLEVEL_KEY</span> <span class="o">=</span> <span class="mi">4</span><span class="p">;</span>
<span class="k">static</span> <span class="kt">double</span> <span class="n">w0</span><span class="p">;</span> <span class="c1">// initial winding level</span>
<span class="k">static</span> <span class="kt">double</span> <span class="n">w</span><span class="p">;</span> <span class="c1">// evolving winding level</span>
<span class="c1">// ...</span>
<span class="k">if</span><span class="p">(</span><span class="n">persist_exists</span><span class="p">(</span><span class="n">WLEVEL_KEY</span><span class="p">))</span> <span class="p">{</span>
<span class="n">persist_read_data</span><span class="p">(</span><span class="n">WLEVEL_KEY</span><span class="p">,</span><span class="o">&</span><span class="n">w0</span><span class="p">,</span><span class="k">sizeof</span><span class="p">(</span><span class="n">w0</span><span class="p">));</span>
<span class="n">w</span> <span class="o">=</span> <span class="n">w0</span><span class="p">;</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="n">w</span> <span class="o">=</span> <span class="n">w0</span> <span class="o">=</span> <span class="mf">1.0</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">// ...</span>
<span class="n">persist_write_data</span><span class="p">(</span><span class="n">WLEVEL_KEY</span><span class="p">,</span><span class="o">&</span><span class="n">w</span><span class="p">,</span><span class="k">sizeof</span><span class="p">(</span><span class="n">w</span><span class="p">));</span>
</code></pre></div>
<p>We'll deal with the gradual unwinding of $w$ in callback code below.</p>
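<p>For orientation, the bookkeeping that has to happen on each relaunch (rules 2, 6 and 7) might
look roughly like this in plain C. It is a sketch under assumptions: <code>rewind_on_launch</code>
is a made-up name, the elapsed time is presumed to come from a persisted timestamp, and the value
of $\Delta w$ is invented for illustration.</p>

```c
#include <stdbool.h>

#define UNWIND_PER_HOUR 0.04 /* upsilon, as quoted earlier */
#define DELTA_W         0.25 /* hypothetical winding gained per reload */

/* Unwind the stored level for the time spent away (rule 2), add one
   reload's worth of winding capped at 1 (rule 6), and report whether the
   watch is now fully wound, which is the cue to reset the skew (rule 7). */
bool rewind_on_launch(double *w, double hours_since_saved) {
  double level = *w - UNWIND_PER_HOUR * hours_since_saved;
  if (level < 0.0) level = 0.0; /* the spring cannot be less than slack */
  level += DELTA_W;
  if (level > 1.0) level = 1.0; /* nor more than fully wound */
  *w = level;
  return level >= 1.0;
}
```

<p>The caller would read <code>w0</code> as shown above, pass in the hours since it was written,
and zero the skew whenever the function returns <code>true</code>.</p>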
<h3>Graphics</h3>
<p>The graphics code is pretty standard, though somewhat more verbose than is necessary in
fancy languages. You start by creating a window and setting callbacks for loading and
unloading it:</p>
<div class="highlight"><pre><span></span><code> <span class="n">window</span> <span class="o">=</span> <span class="n">window_create</span><span class="p">();</span>
<span class="n">window_set_window_handlers</span><span class="p">(</span><span class="n">window</span><span class="p">,</span> <span class="p">(</span><span class="n">WindowHandlers</span><span class="p">)</span> <span class="p">{</span>
<span class="p">.</span><span class="n">load</span> <span class="o">=</span> <span class="n">window_load</span><span class="p">,</span>
<span class="p">.</span><span class="n">unload</span> <span class="o">=</span> <span class="n">window_unload</span><span class="p">,</span>
<span class="p">});</span>
</code></pre></div>
<p>We'll also build up little graphics objects, like a pointy minute hand,</p>
<div class="highlight"><pre><span></span><code><span class="k">const</span> <span class="n">GPathInfo</span> <span class="n">MINUTE_HAND_POINTS</span> <span class="o">=</span>
<span class="p">{</span> <span class="mi">5</span><span class="p">,</span> <span class="p">(</span><span class="n">GPoint</span> <span class="p">[])</span> <span class="p">{{</span> <span class="mi">-8</span><span class="p">,</span> <span class="mi">20</span> <span class="p">},</span> <span class="p">{</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">20</span> <span class="p">},</span> <span class="p">{</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">-30</span><span class="p">},</span> <span class="p">{</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">-80</span> <span class="p">},</span> <span class="p">{</span> <span class="mi">-8</span><span class="p">,</span> <span class="mi">-30</span><span class="p">}</span> <span class="p">}};</span>
<span class="n">minute_arrow</span> <span class="o">=</span> <span class="n">gpath_create</span><span class="p">(</span><span class="o">&</span><span class="n">MINUTE_HAND_POINTS</span><span class="p">);</span>
</code></pre></div>
<p>which would be even simpler if we hadn't added extra points to make it jagged.</p>
<h4>Set up a tree of layers</h4>
<p>In <code>window_load</code>, we define a tree of "layers", which at this point have no content
other than further callbacks that will render them when they, or a parent layer, are
marked dirty.</p>
<div class="highlight"><pre><span></span><code><span class="n">Window</span> <span class="o">*</span><span class="n">window</span><span class="p">;</span>
<span class="n">Layer</span> <span class="o">*</span><span class="n">simple_bg_layer</span><span class="p">;</span>
<span class="n">Layer</span> <span class="o">*</span><span class="n">hands_layer</span><span class="p">;</span>
<span class="n">Layer</span> <span class="o">*</span><span class="n">date_layer</span><span class="p">;</span>
<span class="n">TextLayer</span> <span class="o">*</span><span class="n">num_label</span><span class="p">;</span>
<span class="kt">char</span> <span class="n">num_buffer</span><span class="p">[</span><span class="mi">4</span><span class="p">];</span>
<span class="k">static</span> <span class="kt">void</span> <span class="nf">window_load</span><span class="p">(</span><span class="n">Window</span> <span class="o">*</span><span class="n">window</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// Root layer</span>
<span class="n">Layer</span> <span class="o">*</span><span class="n">window_layer</span> <span class="o">=</span> <span class="n">window_get_root_layer</span><span class="p">(</span><span class="n">window</span><span class="p">);</span>
<span class="n">GRect</span> <span class="n">bounds</span> <span class="o">=</span> <span class="n">layer_get_bounds</span><span class="p">(</span><span class="n">window_layer</span><span class="p">);</span>
<span class="c1">// Child 1: the watchface background</span>
<span class="n">simple_bg_layer</span> <span class="o">=</span> <span class="n">layer_create</span><span class="p">(</span><span class="n">bounds</span><span class="p">);</span>
<span class="n">layer_set_update_proc</span><span class="p">(</span><span class="n">simple_bg_layer</span><span class="p">,</span> <span class="n">bg_update_proc</span><span class="p">);</span>
<span class="n">layer_add_child</span><span class="p">(</span><span class="n">window_layer</span><span class="p">,</span> <span class="n">simple_bg_layer</span><span class="p">);</span>
<span class="c1">// Child 2: the hands</span>
<span class="n">hands_layer</span> <span class="o">=</span> <span class="n">layer_create</span><span class="p">(</span><span class="n">bounds</span><span class="p">);</span>
<span class="n">layer_set_update_proc</span><span class="p">(</span><span class="n">hands_layer</span><span class="p">,</span> <span class="n">hands_update_proc</span><span class="p">);</span>
<span class="n">layer_add_child</span><span class="p">(</span><span class="n">window_layer</span><span class="p">,</span> <span class="n">hands_layer</span><span class="p">);</span>
<span class="c1">// Child 3: the date</span>
<span class="n">date_layer</span> <span class="o">=</span> <span class="n">layer_create</span><span class="p">(</span><span class="n">bounds</span><span class="p">);</span>
<span class="n">layer_set_update_proc</span><span class="p">(</span><span class="n">date_layer</span><span class="p">,</span> <span class="n">date_update_proc</span><span class="p">);</span>
<span class="n">layer_add_child</span><span class="p">(</span><span class="n">window_layer</span><span class="p">,</span> <span class="n">date_layer</span><span class="p">);</span>
<span class="c1">// Child 1 of date: the day of the month</span>
<span class="n">num_label</span> <span class="o">=</span> <span class="n">text_layer_create</span><span class="p">(</span><span class="n">GRect</span><span class="p">(</span><span class="mi">73</span><span class="p">,</span> <span class="mi">114</span><span class="p">,</span> <span class="mi">18</span><span class="p">,</span> <span class="mi">20</span><span class="p">));</span>
<span class="n">text_layer_set_fonts_and_stuff_like_that</span><span class="p">(</span><span class="n">num_label</span> <span class="p">...)</span>
<span class="n">layer_add_child</span><span class="p">(</span><span class="n">date_layer</span><span class="p">,</span> <span class="n">text_layer_get_layer</span><span class="p">(</span><span class="n">num_label</span><span class="p">));</span>
<span class="c1">// Child 2 of date: the day of the week</span>
<span class="c1">// ... you can imagine this</span>
<span class="c1">// more stuff</span>
<span class="p">}</span>
</code></pre></div>
<h4>Individual layer callbacks</h4>
<p>The main business happens in the various <code>...update_proc</code> functions. Here, for example,
we can maintain the date layer and its children, using familiar <code>stdlib</code> utilities:</p>
<div class="highlight"><pre><span></span><code><span class="k">static</span> <span class="kt">void</span> <span class="nf">date_update_proc</span><span class="p">(</span><span class="n">Layer</span> <span class="o">*</span><span class="n">layer</span><span class="p">,</span> <span class="n">GContext</span> <span class="o">*</span><span class="n">ctx</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">time_t</span> <span class="n">now</span> <span class="o">=</span> <span class="n">time</span><span class="p">(</span><span class="nb">NULL</span><span class="p">);</span>
<span class="k">struct</span> <span class="nc">tm</span> <span class="o">*</span><span class="n">t</span> <span class="o">=</span> <span class="n">localtime</span><span class="p">(</span><span class="o">&</span><span class="n">now</span><span class="p">);</span>
<span class="n">strftime</span><span class="p">(</span><span class="n">day_buffer</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">day_buffer</span><span class="p">),</span> <span class="s">"%a"</span><span class="p">,</span> <span class="n">t</span><span class="p">);</span>
<span class="n">text_layer_set_text</span><span class="p">(</span><span class="n">day_label</span><span class="p">,</span> <span class="n">day_buffer</span><span class="p">);</span>
<span class="n">strftime</span><span class="p">(</span><span class="n">num_buffer</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">num_buffer</span><span class="p">),</span> <span class="s">"%d"</span><span class="p">,</span> <span class="n">t</span><span class="p">);</span>
<span class="n">text_layer_set_text</span><span class="p">(</span><span class="n">num_label</span><span class="p">,</span> <span class="n">num_buffer</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div>
<p>Of course, that's really boring. On our watch, the date simply advances every
24 hours, modulo 31, which is usually the number of days in the month:</p>
<div class="highlight"><pre><span></span><code><span class="k">static</span> <span class="kt">void</span> <span class="nf">date_update_proc</span><span class="p">(</span><span class="n">Layer</span> <span class="o">*</span><span class="n">layer</span><span class="p">,</span> <span class="n">GContext</span> <span class="o">*</span><span class="n">ctx</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">time_t</span> <span class="n">now</span> <span class="o">=</span> <span class="n">time</span><span class="p">(</span><span class="nb">NULL</span><span class="p">);</span>
  <span class="kt">int</span> <span class="n">d</span> <span class="o">=</span> <span class="p">(</span><span class="n">dwound</span> <span class="o">-</span> <span class="mi">1</span> <span class="o">+</span> <span class="p">(</span><span class="n">now</span><span class="o">-</span><span class="n">twound</span><span class="p">)</span><span class="o">/</span><span class="p">(</span><span class="mi">24</span><span class="o">*</span><span class="mi">3600</span><span class="p">))</span> <span class="o">%</span> <span class="mi">31</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span> <span class="c1">// wrap into 1..31</span>
<span class="n">snprintf</span><span class="p">(</span><span class="n">num_buffer</span><span class="p">,</span><span class="k">sizeof</span><span class="p">(</span><span class="n">num_buffer</span><span class="p">),</span><span class="s">"%d"</span><span class="p">,</span><span class="n">d</span><span class="p">);</span>
<span class="p">...</span>
<span class="p">}</span>
</code></pre></div>
<p>This <code>twound</code> is the time that the watch was last fully wound and, we assume, set to the
correct day of the month (<code>dwound</code>). It's persisted along with other state variables.</p>
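<p>One way to keep the displayed day inside the dial's 1..31 range is to shift to a zero-based day
before taking the remainder. This is a minimal sketch, assuming <code>dwound</code> is the day of the
month at winding and that <code>now</code> is never earlier than <code>twound</code>; the helper name
<code>dial_day</code> is made up for illustration:</p>

```c
#include <time.h>

#define SECONDS_PER_DAY (24 * 3600)
#define DAYS_ON_DIAL 31 /* the dial always shows 1..31, whatever the month */

/* Hypothetical helper: day shown on the dial, given the day (1..31) and the
 * time at which the watch was last wound and set. Assumes now >= twound. */
static int dial_day(int dwound, time_t twound, time_t now) {
  long elapsed_days = (long)((now - twound) / SECONDS_PER_DAY);
  /* Shift to 0..30 before taking the remainder, then back to 1..31. */
  return (int)((dwound - 1 + elapsed_days) % DAYS_ON_DIAL) + 1;
}
```

<p>Wound on the 30th, the dial would show 31 a day later and 1 the day after that, regardless of
how long the real month is.</p>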
<p>The hands update is more graphically intense. We start by getting the incorrect time</p>
<div class="highlight"><pre><span></span><code><span class="k">static</span> <span class="kt">void</span> <span class="nf">hands_update_proc</span><span class="p">(</span><span class="n">Layer</span> <span class="o">*</span><span class="n">layer</span><span class="p">,</span> <span class="n">GContext</span> <span class="o">*</span><span class="n">ctx</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">time_t</span> <span class="n">now</span> <span class="o">=</span> <span class="n">time</span><span class="p">(</span><span class="nb">NULL</span><span class="p">)</span> <span class="o">+</span> <span class="n">s</span><span class="p">;</span>
<span class="k">struct</span> <span class="nc">tm</span> <span class="o">*</span><span class="n">t</span> <span class="o">=</span> <span class="n">localtime</span><span class="p">(</span><span class="o">&</span><span class="n">now</span><span class="p">);</span>
</code></pre></div>
<p>where the skew <code>s</code> is going to be updated separately in a different callback. We rotate the
hands into place and draw them:</p>
<div class="highlight"><pre><span></span><code> <span class="n">gpath_rotate_to</span><span class="p">(</span><span class="n">minute_arrow</span><span class="p">,</span> <span class="n">TRIG_MAX_ANGLE</span> <span class="o">*</span> <span class="n">t</span><span class="o">-></span><span class="n">tm_min</span> <span class="o">/</span> <span class="mi">60</span><span class="p">);</span>
<span class="n">gpath_draw_filled</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">minute_arrow</span><span class="p">);</span>
<span class="n">gpath_draw_outline</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">minute_arrow</span><span class="p">);</span>
</code></pre></div>
<p>In addition to simple skew, the second hand also suffers from a clunking twitch,</p>
<div class="highlight"><pre><span></span><code> <span class="kt">int</span> <span class="n">jsec</span> <span class="o">=</span> <span class="n">t</span><span class="o">-></span><span class="n">tm_sec</span> <span class="o">+</span> <span class="p">(</span><span class="n">rand</span><span class="p">()</span> <span class="o">%</span> <span class="n">JUMP_SEC</span><span class="p">);</span>
</code></pre></div>
<p>but it's still just a straight line and very easy to draw:</p>
<div class="highlight"><pre><span></span><code> <span class="kt">int32_t</span> <span class="n">second_angle</span> <span class="o">=</span> <span class="n">TRIG_MAX_ANGLE</span> <span class="o">*</span> <span class="n">jsec</span> <span class="o">/</span> <span class="mi">60</span><span class="p">;</span>
<span class="n">secondHand</span><span class="p">.</span><span class="n">y</span> <span class="o">=</span> <span class="p">(</span><span class="kt">int16_t</span><span class="p">)(</span><span class="o">-</span><span class="n">cos_lookup</span><span class="p">(</span><span class="n">second_angle</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="kt">int32_t</span><span class="p">)</span><span class="n">secondHandLength</span> <span class="o">/</span> <span class="n">TRIG_MAX_RATIO</span><span class="p">)</span> <span class="o">+</span> <span class="n">center</span><span class="p">.</span><span class="n">y</span><span class="p">;</span>
<span class="n">secondHand</span><span class="p">.</span><span class="n">x</span> <span class="o">=</span> <span class="p">(</span><span class="kt">int16_t</span><span class="p">)(</span><span class="n">sin_lookup</span><span class="p">(</span><span class="n">second_angle</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="kt">int32_t</span><span class="p">)</span><span class="n">secondHandLength</span> <span class="o">/</span> <span class="n">TRIG_MAX_RATIO</span><span class="p">)</span> <span class="o">+</span> <span class="n">center</span><span class="p">.</span><span class="n">x</span><span class="p">;</span>
<span class="n">graphics_draw_line</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">secondHand</span><span class="p">,</span> <span class="n">center</span><span class="p">);</span>
</code></pre></div>
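<p>Because the twitch is added on top of <code>tm_sec</code>, <code>jsec</code> can run past 59.
The sketch below shows the same fixed-point angle arithmetic in isolation, assuming a full turn of
<code>0x10000</code> angle units. The modulo-60 reduction and the names <code>FULL_TURN</code> and
<code>second_hand_angle</code> are additions of this sketch, not part of the watchface code:</p>

```c
#include <stdint.h>

/* Assumption: a full turn is 0x10000 angle units, as in the fixed-point
 * trigonometry used above. */
#define FULL_TURN 0x10000

/* Hypothetical helper: dial angle for a (possibly jittered) second count. */
static int32_t second_hand_angle(int jsec) {
  /* jsec may exceed 59 once the twitch is added, so reduce it first. */
  return (int32_t)(FULL_TURN * (jsec % 60) / 60);
}
```

<p>With this reduction, 62 jittered seconds land on the same angle as 2 seconds, and 15 seconds map to
exactly a quarter turn.</p>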
<h4>Temporal callbacks</h4>
<p>The graphics callbacks are triggered through the cascade of layers. To respond simply to the
passage of time, we set up a handler explicitly:</p>
<div class="highlight"><pre><span></span><code> <span class="n">tick_timer_service_subscribe</span><span class="p">(</span><span class="n">SECOND_UNIT</span><span class="o">|</span><span class="n">HOUR_UNIT</span><span class="o">|</span><span class="n">MINUTE_UNIT</span><span class="p">,</span> <span class="n">handle_tick</span><span class="p">);</span>
</code></pre></div>
<p>It wasn't clear to me at first that there could be only one timer subscription, so I originally
had separate callbacks for hours, minutes and seconds. This produced very odd results, as repeated calls
to <code>tick_timer_service_subscribe</code> only overwrite some of the previously set information. The one
callback can still detect why it was called. Once a minute, we'll update the winding level and skew:</p>
<div class="highlight"><pre><span></span><code><span class="k">static</span> <span class="kt">void</span> <span class="nf">handle_tick</span><span class="p">(</span><span class="k">struct</span> <span class="nc">tm</span> <span class="o">*</span><span class="n">tick_time</span><span class="p">,</span> <span class="n">TimeUnits</span> <span class="n">units_changed</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span><span class="p">(</span><span class="n">units_changed</span> <span class="o">&</span> <span class="n">MINUTE_UNIT</span><span class="p">)</span> <span class="p">{</span>
<span class="n">t1</span> <span class="o">=</span> <span class="n">time</span><span class="p">(</span><span class="nb">NULL</span><span class="p">);</span>
<span class="n">w</span> <span class="o">=</span> <span class="n">w0</span> <span class="o">-</span> <span class="n">upsilon</span> <span class="o">*</span> <span class="p">(</span><span class="n">t1</span><span class="o">-</span><span class="n">t0</span><span class="p">);</span>
<span class="k">if</span><span class="p">(</span><span class="n">w</span><span class="o"><=</span><span class="mf">0.0</span><span class="p">)</span> <span class="p">{</span>
<span class="n">t1</span> <span class="o">=</span> <span class="n">w0</span><span class="o">/</span><span class="n">upsilon</span> <span class="o">+</span> <span class="n">t0</span><span class="p">;</span> <span class="c1">// time the clock stopped</span>
      <span class="n">w</span> <span class="o">=</span> <span class="mf">0.0</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">s</span> <span class="o">=</span> <span class="n">s0</span> <span class="o">+</span> <span class="n">alpha</span> <span class="o">*</span> <span class="p">(</span><span class="n">t1</span><span class="o">-</span><span class="n">t0</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="n">w0</span> <span class="o">-</span> <span class="n">W0</span> <span class="o">-</span> <span class="mf">0.5</span> <span class="o">*</span> <span class="n">upsilon</span> <span class="o">*</span> <span class="p">(</span><span class="n">t1</span><span class="o">-</span><span class="n">t0</span><span class="p">));</span>
<span class="p">}</span>
<span class="p">...</span>
</code></pre></div>
<p>Note that we don't count on actually being called every minute. The level and skew are calculated
from scratch, and we keep track of the time <code>t1</code> we calculated them so we can later persist it.</p>
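<p>The "from scratch" calculation can be sketched as a pure function of the state saved at
<code>t0</code>. The struct and function names here are hypothetical, and times are taken as
<code>double</code> seconds for simplicity; the formulas are the ones in the handler above:</p>

```c
/* Hypothetical container for the values the minute tick recomputes. */
typedef struct {
  double w;  /* winding level */
  double s;  /* accumulated skew, in seconds */
  double t1; /* time the values refer to, or the time the watch stopped */
} WindState;

/* Recompute level and skew from the state (w0, s0) saved at time t0. */
static WindState wind_state_at(double w0, double s0, double t0, double now,
                               double upsilon, double alpha, double W0) {
  WindState st;
  st.t1 = now;
  st.w = w0 - upsilon * (st.t1 - t0);
  if (st.w <= 0.0) {
    st.t1 = w0 / upsilon + t0; /* the spring ran out at this moment */
    st.w = 0.0;
  }
  double dt = st.t1 - t0;
  st.s = s0 + alpha * dt * (w0 - W0 - 0.5 * upsilon * dt);
  return st;
}
```

<p>Since nothing is accumulated between calls, a missed minute tick costs nothing: the next call
gives the same answer it would have given anyway.</p>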
<p>On the per-second callback, we'd typically just mark the root layer dirty and let the graphics callbacks
take care of the rest,</p>
<div class="highlight"><pre><span></span><code> <span class="k">if</span><span class="p">(</span><span class="n">units_changed</span> <span class="o">&</span> <span class="n">SECOND_UNIT</span><span class="p">)</span>
<span class="n">layer_mark_dirty</span><span class="p">(</span><span class="n">window_get_root_layer</span><span class="p">(</span><span class="n">window</span><span class="p">));</span>
</code></pre></div>
<p>but our watch stops working when $w=0.0$, and it only ticks every seven seconds:</p>
<div class="highlight"><pre><span></span><code> <span class="k">if</span><span class="p">(</span><span class="n">units_changed</span> <span class="o">&</span> <span class="n">SECOND_UNIT</span> <span class="o">&&</span>
<span class="p">(</span><span class="n">tick_time</span><span class="o">-></span><span class="n">tm_sec</span> <span class="o">%</span> <span class="n">JUMP_SEC</span><span class="p">)</span><span class="o">==</span><span class="mi">0</span> <span class="o">&&</span> <span class="n">w</span><span class="o">></span><span class="mf">0.0</span><span class="p">)</span> <span class="p">{</span>
<span class="n">layer_mark_dirty</span><span class="p">(</span><span class="n">window_get_root_layer</span><span class="p">(</span><span class="n">window</span><span class="p">));</span>
<span class="p">}</span>
</code></pre></div>
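<p>The redraw condition is easy to check in isolation. In this sketch the flag values are stand-ins
for the SDK's <code>TimeUnits</code> constants (the bit positions are an assumption),
<code>JUMP_SEC</code> is taken to be 7 to match the seven-second tick, and <code>should_redraw</code>
is a made-up name:</p>

```c
#include <stdbool.h>

/* Stand-ins for the SDK's TimeUnits flags; the bit values are assumptions. */
enum { SECOND_UNIT = 1 << 0, MINUTE_UNIT = 1 << 1, HOUR_UNIT = 1 << 2 };

#define JUMP_SEC 7 /* assumption: the watch ticks every seven seconds */

/* Hypothetical predicate: redraw only on a seconds tick that falls on a
 * multiple of JUMP_SEC, and only while the spring still has tension. */
static bool should_redraw(unsigned units_changed, int tm_sec, double w) {
  return (units_changed & SECOND_UNIT) != 0
      && (tm_sec % JUMP_SEC) == 0
      && w > 0.0;
}
```

<p>Because the test is on <code>tm_sec</code> itself, the hand moves at 0, 7, ..., 56 and then again at
0, so the last jump of each minute comes after only four seconds.</p>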
<h3>Low standards</h3>
<p>Unlike the famously picky iPhone app store, it seems that you can publish anything to Pebble.
Once you set up a developer account, you create a new app or watchface by uploading the <code>.pbw</code>
file created during the build and clicking "Publish." You wait about 5 seconds for approval, the
main criterion for which seems to
be having set a previously unused UUID for the app in a JSON file. What's more, despite the
total lack of oversight and a ridiculously bad search facility, more than zero people might
actually install your stuff. Someone even hearted mine!</p>
<p><img src="images/pebble-dev-dash.png"></p>
<p>And if the <strong>BOFFO</strong> is not quite clunky enough, you can either
<a href="http://github.com/pnf/pebble-sdk-examples">fork it</a> or look at the pie.<sup id="fnref:pie"><a class="footnote-ref" href="#fn:pie">1</a></sup></p>
<div class="footnote">
<hr>
<ol>
<li id="fn:pie">
<p>From <a href="http://www.amazon.com/The-Silver-Spoon-New-Edition/dp/0714862568">The Silver Spoon</a>, also
transcribed <a href="http://denvertruffle.com/recipes/vegetable-pie">here</a>. <a class="footnote-backref" href="#fnref:pie" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
</ol>
</div>
<h1>Squiggly Lines: Background lint- and type-checking for Clojure in emacs.</h1>
<p>Peter Fraenkel, 2014-12-12</p>
<p>In the alternate universe where I am very honest, my résumé contains a line item
for the aggregate year I've spent fiddling with <code>init.el</code>. It does not,
however, list emacs lisp among the languages I know, because (in this alternate
universe) I have too much self respect to flaunt cut and paste skills.</p>
<p>In the present universe, this is going to be awkward, because I will have to present
really crappy elisp code without apology.</p>
<h3>What this is about.</h3>
<ol>
<li>One of the things you get in an industrial strength IDE, in exchange for giving up your
favorite editor, is a sort of ghost-pairing assistant who draws squiggly red lines
under your coding infelicities without requiring you to trawl through batch compilation
output.</li>
<li>There are a number of Clojure IDEs in various stages of development, but none of them
really perform this kind of magic. Cursive is moving in that direction, but in a
way that may limit itself, by not taking advantage of existing checking tools and those
inherent in the compiler. And it's closed source. Boo hiss.</li>
<li>Cider is open source and lives in emacs, but
gets its Clojure insights from a sideboard Clojure process, for which it is
essentially a presentation layer. This is a sensible approach that fuses
efforts across many projects rather than trying to duplicate all the functionality
in one parallel code base.</li>
<li>But it doesn't have squiggly lines.</li>
<li>While googling about, I realized it would not be that hard to add them.</li>
<li>And the general technique is sort of interesting.</li>
<li>But my solution needs work, is probably duplicative, and I need advice.</li>
</ol>
<p>The code below is of course on <a href="https://github.com/pnf/squiggly-clojure">github</a>.</p>
<h3>flycheck</h3>
<p><a href="https://flycheck.readthedocs.org/en/latest/guide/introduction.html">flycheck</a>
is an extension that gives emacs the ability to highlight coding errors
and warnings, in a wide variety of languages, in near real time, just as we've come
to expect from fancy IDEs. Importantly,</p>
<ol>
<li>the error checking runs in the background, without you having to request it, and</li>
<li>the errors show up as discreet annotations in the source buffer, with elaboration
available through mouse/cursor-over.</li>
</ol>
<p>The fancy IDEs, of course, are generally written in
modern languages closely related to the languages they support, so, while IntelliJ
and Eclipse are incredibly polished and impressive, they do start out with some
advantages over flycheck, which has a much broader language support remit and a
significantly more grizzled platform to build upon.</p>
<p>Unlike its predecessor, flymake, flycheck is not distributed with the base
distribution of emacs, but it is easily installed from MELPA. While you're at it,
it will also be helpful to install <code>flycheck-pos-tip</code>, which displays errors and
warnings as tool tips rather than in the rather overburdened minibuffer.</p>
<p>Since roughly the beginning of December, flycheck has allowed relatively
straightforward
<a href="http://www.lunaryorn.com/2014/12/03/generic-syntax-checkers-in-flycheck.html">customization</a>,
letting you extend it beyond the mere 43 languages
that it natively supports.</p>
<h3>flycheck and cider</h3>
<p>Most flycheck checkers are implemented using an external command that checks the code
in the current buffer and emits its complaints in some parsable form. Due to the
long startup times of JVM programs, and the tendency of JVM language linters to be
written in the languages they lint (3 times fast please), there hasn't been any support
in flycheck for such languages. To get acceptable performance, we'd really need a
persistent JVM process, with ongoing two-way communication.</p>
<p>That sort of thing is a pain in the neck to get right, but,
fortunately for us, it is also a popular Bulgarian pastime, so we have Cider.
Cider uses its persistent connection to the REPL to provide a
<a href="https://github.com/clojure-emacs/cider">vast set of features</a>, including
code completion, documentation lookup, code browsing, etc.</p>
<p>It also makes
Clojure errors a bit more palatable, formatting them nicely and highlighting
the offending code in the source buffer. Still, the error checking
experience is not quite as smooth as what one has for Java or Scala in Eclipse or IntelliJ.
The main difference is that Cider doesn't disguise the batch nature of error
checking operations. Errors emerge in a punctuated fashion, when you
throw an exception, and they show up one at a time in a popup buffer.</p>
<p>Using cider's utilities for asynchronous communication with a persistent Clojure process
together with flycheck's asynchronous error handling, we can now get a little closer to
the IDE experience.</p>
<h3>linting and type-checking</h3>
<p>In a charitable mood, you might call me a <a href="https://github.com/clojure/core.typed">core.typed</a>
enthusiast,
rather than complaining that I never shut up about it. Whatever your approach,
I'm probably not going to stop talking about it, because I strongly believe that
strongly believing in unmitigated dynamic typing does a disservice to a computer
language I have come to love. Enthusiasm aside, I don't exactly love the ritual of
getting up the nerve to run <code>(check-ns)</code> and then bracing for an eruption of
text. If I could run the check in the background periodically and gently flag
the type transgressions for later review, that would be a fine thing.</p>
<p>Another tool I've discovered recently is <a href="https://github.com/jonase/eastwood">eastwood</a>,
a more general purpose linter for Clojure. It does quite a lot, from detecting typos
that will crash immediately at runtime</p>
<div class="highlight"><pre><span></span><code><span class="n">wrong</span><span class="o">-</span><span class="nl">arity</span><span class="p">:</span> <span class="n">Function</span> <span class="n">on</span> <span class="n">var</span> <span class="err">#'</span><span class="n">clojure</span><span class="p">.</span><span class="n">core</span><span class="o">/</span><span class="n">map</span> <span class="n">called</span> <span class="n">with</span> <span class="mi">1</span> <span class="n">args</span><span class="p">,</span>
<span class="n">but</span> <span class="n">it</span> <span class="n">is</span> <span class="n">only</span> <span class="n">known</span> <span class="n">to</span> <span class="n">take</span> <span class="n">one</span> <span class="n">of</span> <span class="n">the</span> <span class="n">following</span> <span class="nl">args</span><span class="p">:</span>
<span class="p">[</span><span class="n">f</span> <span class="n">coll</span><span class="p">]</span> <span class="p">[</span><span class="n">f</span> <span class="n">c1</span> <span class="n">c2</span><span class="p">]</span> <span class="p">[</span><span class="n">f</span> <span class="n">c1</span> <span class="n">c2</span> <span class="n">c3</span><span class="p">]</span> <span class="p">[</span><span class="n">f</span> <span class="n">c1</span> <span class="n">c2</span> <span class="n">c3</span> <span class="o">&</span> <span class="n">colls</span><span class="p">]...</span>
</code></pre></div>
<p>to subtler indications that you probably made a mistake</p>
<div class="highlight"><pre><span></span><code><span class="nv">unused</span><span class="o">-</span><span class="nv">ret</span><span class="o">-</span><span class="nv">vals</span>: <span class="nv">Lazy</span> <span class="nv">function</span> <span class="k">call</span> <span class="nl">return</span> <span class="nv">value</span> <span class="nv">is</span> <span class="nv">discarded</span>:
</code></pre></div>
<p>to complaints that will probably feel a bit pedantic at first</p>
<div class="highlight"><pre><span></span><code> unlimited-use: Unlimited use of (clojure.walk clojure.pprint)
</code></pre></div>
<p>(because these namespaces show up in a <code>use</code> without <code>:only</code> or <code>:refer</code>).</p>
<p>In a recent release of <code>eastwood</code>, it became possible to invoke the linter from the REPL,
which means it can be invoked by cider, and thus by flycheck.</p>
<h3>Asynchronous everything</h3>
<p>Emacs lisp code runs in a single thread, which means we can't just ask Clojure (or any process) to
perform a task and then block on its response. Since both cider and flycheck make a living by
communicating with external processes, they work with and provide tools for asynchronous callbacks.</p>
<h4>cider callbacks</h4>
<p>The canonical cider expression evaluator looks like this,</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="n">cider</span><span class="o">-</span><span class="n">tooling</span><span class="o">-</span><span class="n">eval</span> <span class="s2">"(do (println </span><span class="se">\"</span><span class="s2">hello, how do you do?</span><span class="se">\"</span><span class="s2">) 42)"</span>
<span class="p">(</span><span class="n">nrepl</span><span class="o">-</span><span class="n">make</span><span class="o">-</span><span class="n">response</span><span class="o">-</span><span class="n">handler</span> <span class="n">buffer</span>
<span class="p">(</span><span class="n">lambda</span> <span class="p">(</span><span class="n">buffer</span> <span class="n">value</span><span class="p">)</span>
<span class="p">(</span><span class="n">message</span> <span class="p">(</span><span class="n">format</span> <span class="s2">"The final value should be </span><span class="se">\"</span><span class="s2">42</span><span class="se">\"</span><span class="s2"> == </span><span class="si">%s</span><span class="s2">"</span> <span class="n">value</span><span class="p">)))</span>
<span class="p">(</span><span class="n">lambda</span> <span class="p">(</span><span class="n">buffer</span> <span class="n">stdout</span><span class="p">)</span>
       <span class="p">(</span><span class="n">message</span> <span class="s2">"Very well, thank you."</span><span class="p">))</span>
     <span class="p">(</span><span class="n">lambda</span> <span class="p">(</span><span class="n">buffer</span> <span class="n">stderr</span><span class="p">)</span>
       <span class="p">(</span><span class="n">message</span> <span class="s2">"Nothing here. Move along."</span><span class="p">))</span>
     <span class="s1">'()))</span>
</code></pre></div>
<p>with hooks to capture, respectively, the final return value of the expression,
standard out and standard error.
Since <code>eastwood</code> communicates via formatted output lines, while <code>check-ns-info</code>
returns type errors in structured form, we'll get to use two of these three.</p>
<h4>flycheck callbacks</h4>
<p>The core of a canonical flycheck checker looks like this,</p>
<div class="highlight"><pre><span></span><code> (defun my-flycheck-checker-start (checker callback)
(let ((buffer (current-buffer))
(errors ()))
(push (flycheck-error-new-at 37 ;; line
1 ;; column
'error
"Something terrible happened."
:checker checker
:buffer buffer
:filename "foo.clj")
errors)
(funcall callback 'finished errors)))
</code></pre></div>
<p>packing infelicities into flycheck error structures and passing them on to the
provided callback.</p>
<h4>flycheck + cider</h4>
<p>Since all our error information is coming from evaluations of Clojure code,
both the accumulation of error structures and the final invocation of
the flycheck callback will occur within cider response handler callbacks,
something like this,</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="n">defun</span> <span class="n">my</span><span class="o">-</span><span class="n">flycheck</span><span class="o">-</span><span class="n">checker</span><span class="o">-</span><span class="n">start</span> <span class="p">(</span><span class="n">checker</span> <span class="n">callback</span><span class="p">)</span>
<span class="p">(</span><span class="n">let</span> <span class="p">((</span><span class="n">buffer</span> <span class="p">(</span><span class="n">current</span><span class="o">-</span><span class="n">buffer</span><span class="p">))</span>
<span class="p">(</span><span class="n">errors</span> <span class="p">()))</span>
<span class="p">(</span><span class="n">cider</span><span class="o">-</span><span class="n">tooling</span><span class="o">-</span><span class="n">eval</span> <span class="s2">"(check-something)"</span>
<span class="p">(</span><span class="n">nrepl</span><span class="o">-</span><span class="n">make</span><span class="o">-</span><span class="n">response</span><span class="o">-</span><span class="n">handler</span> <span class="n">buffer</span>
<span class="p">(</span><span class="n">lambda</span> <span class="p">(</span><span class="n">buffer</span> <span class="n">value</span><span class="p">)</span>
<span class="p">(</span><span class="n">mapc</span> <span class="p">(</span><span class="n">lambda</span> <span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="p">(</span><span class="n">push</span> <span class="n">e</span> <span class="n">errors</span><span class="p">))</span> <span class="p">(</span><span class="n">parse</span><span class="o">-</span><span class="n">the</span><span class="o">-</span><span class="k">return</span><span class="o">-</span><span class="n">value</span> <span class="n">value</span><span class="p">))</span>
<span class="p">(</span><span class="n">funcall</span> <span class="n">callback</span> <span class="s1">'finished errors))</span>
<span class="p">(</span><span class="n">lambda</span> <span class="p">(</span><span class="n">buffer</span> <span class="n">stdout</span><span class="p">)</span>
           <span class="p">(</span><span class="n">mapc</span> <span class="p">(</span><span class="n">lambda</span> <span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="p">(</span><span class="n">push</span> <span class="n">e</span> <span class="n">errors</span><span class="p">))</span> <span class="p">(</span><span class="n">parse</span><span class="o">-</span><span class="n">some</span><span class="o">-</span><span class="n">output</span> <span class="n">stdout</span><span class="p">)))</span>
<span class="p">(</span><span class="n">lambda</span> <span class="p">(</span><span class="n">buffer</span> <span class="n">stderr</span><span class="p">)</span>
<span class="p">(</span><span class="n">message</span> <span class="s2">"whoops"</span><span class="p">))</span>
<span class="s1">'()))))</span>
</code></pre></div>
<p>assuming we have written <code>parse-something-or-other</code> functions to convert
the clojure responses into error structures.</p>
<h4>things I learned and haven't learned about callbacks</h4>
<ul>
<li>
<p>All of these anonymous callback functions would be a bit silly if they couldn't
close over variables in their lexical scope, so be sure to <code>(setq lexical-binding t)</code>
(or include <code>-*- lexical-binding: t; -*-</code> on the first line).</p>
</li>
<li>
<p>The cider stdout callback will likely be invoked more than once. I <em>believe</em> it gets
called on every <code>fflush</code>, as it sometimes receives multiple lines.</p>
</li>
<li>
<p>The cider value callback seems to be invoked only once, so one may invoke the
flycheck callback from within it.</p>
</li>
<li>
<p><code>cider-tooling-eval</code>s are queued, so I can do a whole bunch of them and
count on them all having completed when I receive a value from the last.</p>
</li>
</ul>
<h3>Invoking typed clojure and parsing its output.</h3>
<p>Normally, one types <code>(check-ns)</code> in the REPL and sits back to enjoy the
printed output. One can, alternatively, call <code>check-ns-info</code>, which returns
the same information as a vector of <code>ex-info</code> exceptions, where the <code>ex-data</code>
is of the form <code>{:env {:file "foo.clj" :column 1 :line 73}}</code> and the usual
multiline output
is in the exception message, which we can just merge into the <code>:env</code>.
Cider seems to use
<a href="https://en.wikipedia.org/wiki/Bencode">bencode</a> to pass data between clojure and
elisp, but I got impatient trying to figure out how, so I used JSON instead:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="n">setq</span><span class="w"> </span><span class="n">cmdf</span><span class="o">-</span><span class="n">tc</span><span class="w"> </span><span class="ss">"(do (require 'clojure.core.typed)</span>
<span class="ss"> (require 'clojure.data.json)</span>
<span class="ss"> (clojure.data.json/write-str</span>
<span class="ss"> (map (fn [e] (assoc (:env (ex-data e)) :msg (.getMessage e)))</span>
<span class="ss"> (:delayed-errors (clojure.core.typed/check-ns-info '%s)))))"</span><span class="p">)</span><span class="w"></span>
</code></pre></div>
<p>With the appropriate namespace <code>format</code>ed in, this string will form the
second argument to <code>cider-tooling-eval</code>.
In the value callback, we
decode the JSON -- maps become alists and keywords become symbols --
and turn the entire thing into a list of tuples <code>(file line column msg)</code>,</p>
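<div class="highlight"><pre><span></span><code>(require 'json)
(defun get-rec-from-alist (al ks)
  "Return the values in alist AL for each key in KS, in order."
  (mapcar (lambda (k) (cdr (assoc k al))) ks))

(defun parse-tc-json (s)
  "Decode the doubly quoted JSON string S into (file line column msg) tuples."
  (let ((ws (json-read-from-string (json-read-from-string s))))
    (mapcar (lambda (w) (get-rec-from-alist w '(file line column msg))) ws)))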
</code></pre></div>
<p>for further conversion into flycheck errors:</p>
<div class="highlight"><pre><span></span><code>(defun tuple-to-error (w checker buffer fname)
  "Convert W of form '(file, line, column, message) to flycheck error object.
Uses CHECKER, BUFFER and FNAME unmodified."
  (pcase-let* ((`(,file ,line ,column ,msg) w))
    (flycheck-error-new-at line column 'error msg
                           :checker checker
                           :buffer buffer
                           :filename fname)))
</code></pre></div>
<p>Two things of which I am not proud: First, there really must be a better way to extract a cross-section
of alist values. Second, the JSON string comes to us EDN quoted, which means we must
remove a lot of extra <code>\\</code>s; hence the double <code>json-read-from-string</code>.</p>
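<p>The Clojure half of this can be tried at a REPL without core.typed on the classpath. What follows is only a sketch: <code>sample-error</code> is a made-up stand-in for one entry of <code>:delayed-errors</code>, and <code>error-&gt;map</code> is a name invented here for the anonymous function inside <code>cmdf-tc</code>. The <code>pr-str</code> call illustrates one plausible reason for the double decoding, on the assumption that the value reaches elisp in printed form, so that a string result arrives wrapped in a second layer of quotes.</p>

```clojure
;; Hypothetical stand-in for one element of :delayed-errors.
(def sample-error
  (ex-info "Expected Number, found String"
           {:env {:file "foo.clj" :line 73 :column 1}}))

;; The anonymous function from cmdf-tc, given a name for illustration.
(defn error->map [e]
  (assoc (:env (ex-data e)) :msg (.getMessage e)))

(error->map sample-error)
;; => {:file "foo.clj", :line 73, :column 1, :msg "Expected Number, found String"}

;; Printing a string adds a layer of quoting and escaping; a second
;; json-read-from-string on the elisp side would peel it off again.
(println (pr-str "{\"line\":73}"))
;; prints "{\"line\":73}"
```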
<p>The entire type-check wraps up to,</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="n">cider</span><span class="o">-</span><span class="n">tooling</span><span class="o">-</span><span class="n">eval</span> <span class="p">(</span><span class="n">format</span> <span class="n">cmdf</span><span class="o">-</span><span class="n">tc</span> <span class="n">ns</span><span class="p">)</span>
<span class="p">(</span><span class="n">nrepl</span><span class="o">-</span><span class="n">make</span><span class="o">-</span><span class="n">response</span><span class="o">-</span><span class="n">handler</span> <span class="n">buffer</span>
<span class="p">(</span><span class="n">lambda</span> <span class="p">(</span><span class="n">_buffer</span> <span class="n">value</span><span class="p">)</span>
<span class="p">(</span><span class="n">message</span> <span class="s2">"Finished core.typed check."</span><span class="p">)</span>
<span class="p">(</span><span class="n">mapc</span> <span class="p">(</span><span class="n">lambda</span> <span class="p">(</span><span class="n">w</span><span class="p">)</span> <span class="p">(</span><span class="n">push</span> <span class="p">(</span><span class="n">tuple</span><span class="o">-</span><span class="n">to</span><span class="o">-</span><span class="n">error</span> <span class="n">w</span> <span class="n">checker</span> <span class="n">buffer</span> <span class="n">fname</span><span class="p">)</span> <span class="n">errors</span><span class="p">))</span> <span class="p">(</span><span class="n">parse</span><span class="o">-</span><span class="n">tc</span><span class="o">-</span><span class="n">json</span> <span class="n">value</span><span class="p">)))</span>
<span class="p">(</span><span class="n">lambda</span> <span class="p">(</span><span class="n">_buffer</span> <span class="n">out</span><span class="p">))</span>
<span class="p">(</span><span class="n">lambda</span> <span class="p">(</span><span class="n">_buffer</span> <span class="n">err</span><span class="p">))</span>
<span class="s1">'()))</span>
</code></pre></div>
<p>where we're using only one cider hook, so most of the code is do-nothing boilerplate.</p>
<p>At this point, we haven't sent the error list to flycheck, because this can only be done
once, and there are more checks to be done.</p>
<h3>Invoking eastwood and parsing its output.</h3>
<p>Eastwood speaks in formatted output lines like</p>
<div class="highlight"><pre><span></span><code>the/file/name.clj:472:10:a message of some sort
</code></pre></div>
<p>They're easy to parse, at least if one
makes the lazy assumption that file names never contain colons.</p>
<div class="highlight"><pre><span></span><code>(setq cmdf-ew "(do (require 'eastwood.lint)
    (eastwood.lint/eastwood {:source-paths [\"src\"] :namespaces ['%s] } ))")
(defun parse-ew (out)
  (delq nil
        (mapcar (lambda (s)
                  (let ((r "^\\([^[:space:]]+\\)\\:\\([[:digit:]]+\\)\\:\\([[:digit:]]+\\)\\:[[:space:]]*\\(.*\\)"))
                    (if (string-match r s)
                        (list
                         (match-string 1 s)                    ;; file
                         (string-to-number (match-string 2 s)) ;; line
                         (string-to-number (match-string 3 s)) ;; col
                         (match-string 4 s)                    ;; msg
                         ))))
                (split-string out "\n"))))
</code></pre></div>
<p>Once the errors have been parsed into tuples, we do almost the same thing as
we did for <code>core.typed</code>,</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="n">cider</span><span class="o">-</span><span class="n">tooling</span><span class="o">-</span><span class="n">eval</span> <span class="p">(</span><span class="n">format</span> <span class="n">cmdf</span><span class="o">-</span><span class="n">ew</span> <span class="n">ns</span><span class="p">)</span>
<span class="p">(</span><span class="n">nrepl</span><span class="o">-</span><span class="n">make</span><span class="o">-</span><span class="n">response</span><span class="o">-</span><span class="n">handler</span>
<span class="n">buffer</span>
<span class="p">(</span><span class="n">lambda</span> <span class="p">(</span><span class="n">_buffer</span> <span class="n">_value</span><span class="p">)</span> <span class="p">(</span><span class="n">message</span> <span class="s2">"Finished eastwood check."</span><span class="p">))</span>
<span class="p">(</span><span class="n">lambda</span> <span class="p">(</span><span class="n">_buffer</span> <span class="n">out</span><span class="p">)</span>
<span class="p">(</span><span class="n">mapc</span> <span class="p">(</span><span class="n">lambda</span> <span class="p">(</span><span class="n">w</span><span class="p">)</span> <span class="p">(</span><span class="n">push</span> <span class="p">(</span><span class="n">tuple</span><span class="o">-</span><span class="n">to</span><span class="o">-</span><span class="n">error</span> <span class="n">w</span> <span class="n">checker</span> <span class="n">buffer</span> <span class="n">fname</span><span class="p">)</span> <span class="n">errors</span><span class="p">))</span>
<span class="p">(</span><span class="n">parse</span><span class="o">-</span><span class="n">ew</span> <span class="n">out</span><span class="p">)))</span>
<span class="p">(</span><span class="n">lambda</span> <span class="p">(</span><span class="n">_buffer</span> <span class="n">err</span><span class="p">))</span>
<span class="s1">'()))</span>
</code></pre></div>
<p>with the main difference being that the action occurs in the stdout callback.</p>
<h3>etc</h3>
<p>The final <code>cider-tooling-eval</code> does nothing but detect that the others have
run and then passes the errors back to flycheck for display.</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="n">cider</span><span class="o">-</span><span class="n">tooling</span><span class="o">-</span><span class="n">eval</span> <span class="s2">"true"</span>
<span class="p">(</span><span class="n">nrepl</span><span class="o">-</span><span class="n">make</span><span class="o">-</span><span class="n">response</span><span class="o">-</span><span class="n">handler</span>
<span class="n">buffer</span>
<span class="p">(</span><span class="n">lambda</span> <span class="p">(</span><span class="n">_buffer</span> <span class="n">_value</span><span class="p">)</span>
<span class="p">(</span><span class="n">message</span> <span class="s2">"Finished all clj checks."</span><span class="p">)</span>
<span class="p">(</span><span class="n">funcall</span> <span class="n">callback</span> <span class="s1">'finished errors))</span>
<span class="p">(</span><span class="n">lambda</span> <span class="p">(</span><span class="n">_buffer</span> <span class="n">out</span><span class="p">))</span>
<span class="p">(</span><span class="n">lambda</span> <span class="p">(</span><span class="n">_buffer</span> <span class="n">err</span><span class="p">))</span>
<span class="s1">'()))</span>
</code></pre></div>
<h3>What it looks like</h3>
<p>The following is actual production code from a well-known Fortune 500 company:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">fly-tests</span> <span class="p">[]</span>
<span class="p">(</span><span class="nb">inc </span><span class="s">"foo"</span><span class="p">)</span>
<span class="p">(</span><span class="nb">map inc </span><span class="p">[</span><span class="mi">1</span> <span class="mi">2</span> <span class="mi">3</span><span class="p">])</span>
<span class="p">(</span><span class="nb">+ </span><span class="mi">3</span><span class="p">))</span>
</code></pre></div>
<p>In emacs, it looks like this:</p>
<p><img alt="squiggle" src="images/squiggle0.png"></p>
<p>If we place the cursor right before <code>inc</code>, we quickly see the first
error of our ways, courtesy of <code>core.typed</code>:</p>
<p><img alt="squiggle" src="images/squiggle1.png"></p>
<p>The next complaint is from <code>eastwood</code>, which observes that we're setting
up a lazy <code>map</code> calculation and then never using it.</p>
<p><img alt="squiggle" src="images/squiggle2.png"></p>
<p>The last is also from <code>eastwood</code>, which, again, highlights something that is
not an error per se but probably indicates a mistake:</p>
<p><img alt="squiggle" src="images/squiggle3.png"></p>
<h3>To do:</h3>
<ol>
<li>Find out either that someone has done this already, or that there's a good reason not to do it, or both.</li>
<li>Deal with the not-so-rare circumstance that either <code>eastwood</code> or <code>core.typed</code> fails
catastrophically. It might be helpful to run <code>lein check</code> or the equivalent first, but ideally
not in a one-time process.</li>
<li>Add configuration options, in case someone doesn't have all the linters installed and on the classpath.</li>
<li>General error handling. Under circumstances I don't entirely understand, it has been necessary to turn
flycheck on and off to restore sanity.</li>
<li>Performance optimizations - perhaps throttling and/or narrowing.</li>
</ol>NYC Clojure Meetup slides on lenses and appropriate typing2014-12-09T00:00:00-05:002014-12-09T00:00:00-05:00Peter Fraenkeltag:blog.podsnap.com,2014-12-09:/lens-pun-goes-here.html<p>To everyone who attended yesterday's NYC Clojure Meetup: thanks for listening to me, asking good questions
and providing some pretty great answers as well.</p>
<p>Here are the slides. For more detail on nearly everything, see previous posts.</p>
<iframe src="http://pnf.github.io/talks/lenses/#/" width=800 height=500> </iframe>
<p>(Navigate using the compass arrows. Up/Down within a section; Left/Right between sections; ESC for overview.)</p>Lost in Transduction - Heresy and ingratitude edition2014-11-26T00:00:00-05:002014-11-26T00:00:00-05:00Peter Fraenkeltag:blog.podsnap.com,2014-11-26:/lost-in-translation.html<p>Were you at Clojure/conj in Washington last week? If so, hello again. Wasn't that a great conference?
If not, head to <a href="https://www.youtube.com/channel/UCaLlzGqiPE2QRj6sSOawJRg">Clojure TV</a>, where all the talks
are ready for streaming. Assuming some moderate level of Clojure obsession on your part, I couldn't
recommend skipping any of them, so the full catch-up might take you a while, but there are two in particular
that I strongly recommend.</p>
<h2>Avoiding altercations</h2>
<p>The first is actually the very last talk of the conference. Brian Goetz, whom you may have encountered
previously as the author of <a href="https://github.com/jcip/jcip.github.com">Java Concurrency in Practice</a>
or heard of as the Java Language Architect at Oracle, spoke about
<a href="https://www.youtube.com/watch?v=2y5Pv4yN0b0">Stewardship: The Sobering Parts</a>. To talk about Java
before an audience known to derive a certain amount of enjoyment from deriding the language takes
mettle, nuance and wit, all of which he displayed in abundance. Of course, he wasn't trying to
convert an audience of Lispers to worship at the altar of curly braces,
but to convey some sense of the responsibility you have when 9 million programmers
use your language and a good number of the world's production systems are running in it.</p>
<p>Clojure isn't (yet) in that position, but one language that was,
at least in proportion to the total amount of code at the time, is COBOL. I'm not sure you can
find anyone to defend it from an aesthetic or theoretical standpoint, but it was a very good fit
for the computers of the day and for the purposes to which they were put.</p>
<p>If you feel faint at the sight of <code>GO TO</code> statements, it might be time for a stiff drink, because
we're going to talk about something even worse. That might be hard to imagine from the vantage point
of our enlightened age, but it's true. Sometime in the 1980s, the ANSI standards committee for COBOL
introduced the <code>ALTER</code> statement:</p>
<p><img alt="alter" src="images/alter.png"></p>
<p>The thinking apparently was that if you have a good working memory and like puzzles, ordinary spaghetti
code isn't going to be challenging enough, so you need self-modifying spaghetti code. What
<code>ALTER</code> did was modify the destination of a specified <code>GO TO</code> statement, so that, from now on,
it would go somewhere else.</p>
<p>Brian thinks that this was the precise moment when COBOL jumped the shark.<sup id="fnref:shark"><a class="footnote-ref" href="#fn:shark">1</a></sup> It had had a pretty good run
and a decent remaining cadre of developers, but the <code>ALTER</code> statement pretty much guaranteed that
any codebase under active development would eventually become unmaintainable.</p>
<p>Which brought him to this immortal line:</p>
<blockquote>
<p>Java's ALTER statement is only one bad decision away.</p>
</blockquote>
<p>Although it's hard to type with meat-axes instead of hands, I'm going to attempt a minor refactoring:</p>
<blockquote>
<p>${LANG}'s ALTER statement is only one bad decision away.</p>
</blockquote>
<h2>Always be Composing</h2>
<p>Does this section title seem too sensible and eloquent for me to have thought of myself? It is! Actually, it's
the title of
<a href="http://www.youtube.com/watch?v=3oQTSP4FngY">another great talk</a>,
one by Zach Tellman, and the title isn't even the best part. There's
this:</p>
<blockquote>
<p>Composability isn't whether people can put two things together, but whether they are willing to.</p>
</blockquote>
<p>I'm tempted to do that thing people do in blogs where they say something, and then say they're going to
repeat it because it's so great, and then repeat it. But shucks, I've never had the panache to pull off
something like that, and, anyway, I haven't explained yet what it means.</p>
<p>Composition, broadly speaking, is what emerges when you can combine simple pieces in
different ways to produce complex and interesting results. Zach illustrated this with
Sierpinski gaskets,<sup id="fnref:sier"><a class="footnote-ref" href="#fn:sier">2</a></sup></p>
<p><img src="images/sierpinski-gasket.png" width=200px><img src="images/sierpinski-real.jpg" width=200px>
<img src="images/sierpinski.png" width=200px></p>
<p>exploring the tradeoffs as you model them as macros, code or data.
Other things you can compose are Lego, phonemes and cold-cuts.
For example, with Lego and a bit of discipline,
you can make really complicated things.</p>
<p><img src="images/lego.jpg" width=150px>
<img src="images/LongRightArrow_L.gif" width=120px>
<img src="images/lego-dogs.jpg" width=150px></p>
<p>Lego is much more popular than this other thing called
<a href="https://en.wikipedia.org/wiki/Soma_cube">Soma</a>,
which Soma programmers say is unfair,
because they can make a dog too:</p>
<p><img src="images/soma.jpg" width=150px>
<img src="images/LongRightArrow_L.gif" width=120px>
<img src="images/soma-dog.jpg" width=150px></p>
<p>And, if they could get hold of enough pieces, plus some glue, an
instruction manual in English, and a
dependency management system, they could probably make great big,
complicated dogs. Unfortunately, some people think of Soma as an
academic block system and complain that making things with it is like
solving a puzzle.</p>
<p>One of the best-known FP abstractions is of course the chaining of sequence operations,</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">->></span> <span class="p">[</span><span class="mi">1</span> <span class="mi">2</span> <span class="mi">3</span> <span class="mi">4</span><span class="p">]</span> <span class="p">(</span><span class="nb">map </span><span class="nv">inc</span><span class="p">)</span> <span class="p">(</span><span class="nb">filter </span><span class="nv">even?</span><span class="p">)</span> <span class="p">(</span><span class="nb">mapcat </span><span class="o">#</span><span class="p">(</span><span class="nb">range </span><span class="nv">%</span> <span class="p">(</span><span class="nb">+ </span><span class="mi">5</span> <span class="nv">%</span><span class="p">)))</span> <span class="nv">sort</span><span class="p">)</span>
</code></pre></div>
<p>which clicks at such a deeply intuitive level that you can easily understand the
same algorithm in languages you ostensibly don't know.</p>
<p>Which brings us to transducers. It would be extremely unfair to leave Zach open
to charges of unsubtlety, so I'm going to quote the relevant section in full:</p>
<blockquote>
<p>Transducers are a very specific sort of composition. They're not a
higher order of abstraction. They're actually very narrowly targeted
at a certain kind of operation we're trying to do. You can't compose
a transducer with any function. And you can't even transduce with
every kind of sequence operation. Certain operations such as sort,
which require us to realize the sequence are not represented as
transducers, so if we're looking at our code and we have some sort of
great big chain of operations, one of which is one of these
non-transducible things, we now need to separate that out, have
certain things applied as a transducer and apply them, then apply
these other operations separately. This is not to say that
transducers are bad. I think they're a really interesting and very,
very useful tool, but I do think it's interesting to look at how these
are going to shape the tutorials for Clojure, because there's a nice
sort of simplicity and immediacy to be able to say, map a function
over my sequence. I think that it would be a little bit difficult for
someone who's entirely new to Clojure to have their first operation over a sequence
to be defined as a transducer.</p>
</blockquote>
<p>Chronologically, the first quote comes after this one. There was some space in between them,
but, for me, they resonated together.</p>
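<p>To make the quoted point about <code>sort</code> concrete, here is a minimal sketch of what happens to the earlier chain when it is rewritten with transducers. The names <code>xf-steps</code>, <code>transduced</code> and <code>threaded</code> are mine, chosen for illustration. The three transducible steps compose into one transformation, while <code>sort</code>, which has to see the whole collection, gets applied separately at the end.</p>

```clojure
;; The transducible part of the chain. Note that comp applies
;; transducers left to right: inc first, then the filter, then mapcat.
(def xf-steps
  (comp (map inc)
        (filter even?)
        (mapcat #(range % (+ 5 %)))))

;; sort is not a transducer, so it is applied to the realized result.
(def transduced (sort (into [] xf-steps [1 2 3 4])))

;; The original threaded version, for comparison.
(def threaded
  (->> [1 2 3 4] (map inc) (filter even?) (mapcat #(range % (+ 5 %))) sort))

transduced ;; => (2 3 4 4 5 5 6 6 7 8)
threaded   ;; => (2 3 4 4 5 5 6 6 7 8)
```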
<h2>History, Nomenclature and Ambition</h2>
<p>Transducers are an ambitious concept with an interesting history. Well, it's history in the sense of 2 1/2 exciting years since
the original <a href="http://clojure.com/blog/2012/05/08/reducers-a-library-and-model-for-collection-processing.html">reducers post</a>,
but we live in interesting times.
That post used transducers but called them by other names, including "reducer fn" and "reducing fn", while the
usual binary function passed to <code>reduce</code> was (and, in
<a href="https://github.com/clojure/clojure/blob/7d84a9f6f35a503cddf98487b6544d18937c669e/src/clj/clojure/core/reducers.clj#289">reducers.clj</a>,
still is) referred to as a "combining function."
In any case, transducers are not yet the main event.</p>
<p>The next installment arrived roughly two years later, with
<a href="http://blog.cognitect.com/blog/2014/8/6/transducers-are-coming">Transducers are Coming</a>.
I think the word "important" can legitimately be applied to that post, but perhaps not the word "clear."
Among other things, it introduced the word "transducer" as a synonym for one of the few terms
for them that hadn't actually been used,
viz. "reducing function transformer." At some point, I attempted
a <a href="http://blog.podsnap.com/ducers.html">glossary</a>, which seems to have been consulted quite a few times
without critical comments, so there's a possibility that it might have been accurate. I use the past
perfect, because today, at the very least, it is no longer complete, as it doesn't mention the
latest innovation, <code>educer</code>s.</p>
<h3>Clarity and hospitality</h3>
<p>It is true that I enjoy making fun of other people's prose, but, in my defense (1) I always introduce a solecism
or two of my own, to keep credibility down to unintimidating levels and (2) it doesn't really matter what I say.
As an Unimportant Person, I have the luxury of expressing myself in a manner that amuses me, and if the barrage
of flippancy causes someone to stop reading, the world will continue to turn.
When we're talking about core features of an important computer language, the stakes are higher.
It is a problem that many experienced Clojure programmers are demonstrably confused by transducers;
it would be a bigger problem if we didn't care; it would be truly tragic if it got to the point where
newcomers to Clojure were
welcomed -- as they are to certain other languages --
with the belittling advice to come back after training their brains.</p>
<h3>The expressivity-ink ratio</h3>
<p>The compositional power of a system is inversely proportional to the complexity of its description.
Kernighan and Ritchie in its first edition comprised a mere 228 pages, and they were actually all you needed to
start coding. The successors of C may have had more expressive power, but that increased expressivity came at a
huge cost in confusion and verbiage. Clojure, of course, cheats by being a lisp variant, but even among lisps it
stands out for its crystalline internal coherence. It simply makes so much sense that the necessary textual documentation
can be contained in mouseovers (or <code>C-c C-d d</code> or what have you). As with C, a small amount of information lets
you start coding; better than C, the resulting code has a good chance of being correct.</p>
<p>Whatever you think of transducers, it's hard to argue that they don't require a lot of explanation. At the very
least, they spawn tutorials at a pretty good clip.<sup id="fnref:tutorials"><a class="footnote-ref" href="#fn:tutorials">3</a></sup> We're not quite at monadic levels, but the level of ambient
perplexity represents a shift for Clojure. Perhaps some of the confusion will dissipate once the professional
scribes pump out their next editions, but it seems possible to me that some is in the nature of the
transductional beast. For example, there are quite a few functions involved in the transducer --</p>
<ol>
<li>The transducer is itself a function. (Not really a complaint.)</li>
<li>It's a function of a function. (Not a complaint either. This is a functional language after all.)</li>
<li>The function returns a function.</li>
<li><code>map</code>-like transducers are functions of functions returning a function of a function that returns a function.</li>
</ol>
<p>-- of which the last is required to treat its <code>result</code> argument in a somewhat ritualistic fashion, passing it about in
certain ways while never looking at it.</p>
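<p>The layers are easier to count when written out. The following is a sketch of a hand-rolled <code>map</code>-like transducer; <code>my-map</code> is a hypothetical name, and this is not claimed to be the code in <code>clojure.core</code>, only something with the same shape.</p>

```clojure
(defn my-map [f]                ; a function of a function ...
  (fn [rf]                      ; ... returning a function of a (reducing) function ...
    (fn                         ; ... that returns a function
      ([] (rf))                 ; init: defer to the downstream function
      ([result] (rf result))    ; completion: pass result along, unexamined
      ([result input]           ; step: transform the input, never look at result
       (rf result (f input))))))

(transduce (my-map inc) conj [] [1 2 3])
;; => [2 3 4]
```

<p>In all three arities, <code>result</code> is only ever handed on to <code>rf</code>, which is the ritual referred to above.</p>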
<h3>The sides of our intent</h3>
<p>Transducers address two weaknesses of the standard approach to chaining
operations over collections. First, they allow composition of sequence operations in a manner
independent of the collection or delivery mechanism holding the sequence. Second, they are
more efficient than the usual methods, because execution does not require reifying multiple
(possibly lazy) collections to feed each other, and the transformations are in sufficient proximity
that an optimizer might be able to do something with them.</p>
<p>There are also at least two features that could be seen as advantageous but may not be as fundamental.
First, the composition can be accomplished, literally, with <code>comp</code>, because transducers are
functions of arity 1. Second, transducers are optimally suited for use in reduction operations,
since what a transducer does is transform a reducing (or combining) function.</p>
<p>Is it important to do all these things at the same time? Is there, at some mathematical level, profundity
in that we could do so? Perhaps, to both questions. But I don't think the answer is obvious.</p>
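<p>For concreteness, here is a small sketch of the first weakness addressed and of the <code>comp</code> convenience. The name <code>xf</code> is arbitrary. A single composed transformation is defined once and then handed to several different consumers, none of which it knows anything about.</p>

```clojure
;; Composed with plain comp, because each transducer is a function of arity 1.
(def xf (comp (map inc) (filter even?)))

(into [] xf [1 2 3 4])        ;; => [2 4]    into a vector
(into #{} xf '(1 2 3 4))      ;; => #{2 4}   into a set
(sequence xf (range 1 5))     ;; => (2 4)    as a lazy sequence
(transduce xf + 0 [1 2 3 4])  ;; => 6        straight into a reduction
```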
<h2>Modest proposal</h2>
<p>Actually two proposals, the more specific of which is indeed modest, offered only as an example
of the sort of thing one might think about. The broader proposal, that we should take a step back and
think about things, takes a bit of gall. To make up for that, I promise not to talk about type systems.</p>
<p>Rewind a couple years, and ask yourself what, to a functional programmer, is the most obvious way to
transform a sequence of values into another sequence, containing zero or more elements for each original value.
Wait, I know.
It's this:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">one-to-many</span> <span class="p">[</span><span class="nv">x</span><span class="p">]</span> <span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nf">something</span> <span class="nv">x</span><span class="p">)</span> <span class="p">[(</span><span class="nf">foo</span> <span class="nv">x</span><span class="p">)</span> <span class="p">(</span><span class="nf">bar</span> <span class="nv">x</span><span class="p">)</span> <span class="s">"whiffle"</span><span class="p">]</span> <span class="p">[]))</span>
<span class="p">(</span><span class="nb">mapcat </span><span class="nv">one-to-many</span> <span class="nv">my-seq</span><span class="p">)</span>
<span class="c1">;; composition</span>
<span class="p">(</span><span class="nf">->></span> <span class="nv">my-seq</span>
<span class="p">(</span><span class="nb">mapcat </span><span class="nv">one-to-many</span><span class="p">)</span>
<span class="p">(</span><span class="nb">mapcat </span><span class="nv">another-to-many</span><span class="p">))</span>
</code></pre></div>
<p>The pattern is pretty well enshrined; it is not controversial. Now, suppose you had the additional requirement
that the transformations might depend on previous values. One obvious possibility is a function that
takes a state as well as an input and returns both the output values and the new state, e.g.
<code>[ [out1 out2 ...] new-state]</code>:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">one-to-many-with-state</span> <span class="p">[</span><span class="nv">x</span> <span class="nv">s</span><span class="p">]</span>
<span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nf">something</span> <span class="nv">x</span> <span class="nv">s</span><span class="p">)</span>
<span class="p">[[(</span><span class="nf">foo</span> <span class="nv">x</span><span class="p">)]</span> <span class="p">(</span><span class="nb">inc </span><span class="nv">s</span><span class="p">)]</span>
<span class="p">[[(</span><span class="nf">bar</span> <span class="nv">x</span><span class="p">)</span> <span class="p">(</span><span class="nf">bar</span> <span class="nv">x</span><span class="p">)]</span> <span class="p">(</span><span class="nb">dec </span><span class="nv">s</span><span class="p">)]))</span>
</code></pre></div>
<p>So far, this is all in the category of the obvious, and, happily, everything that follows
is in the category of internal implementation details that need not be obvious.</p>
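<p>To make the shape concrete, here is a minimal sketch of threading state by hand. The bodies of
<code>something</code>, <code>foo</code> and <code>bar</code> are made-up stand-ins, chosen only for
illustration; the point is just the <code>[outputs new-state]</code> return convention.</p>

```clojure
;; Hypothetical stand-ins for something/foo/bar, chosen only for illustration.
(defn something [x s] (> x s))   ; compare the input to the running state
(defn foo [x] (* 10 x))
(defn bar [x] (- x))

(defn one-to-many-with-state [x s]
  (if (something x s)
    [[(foo x)] (inc s)]
    [[(bar x) (bar x)] (dec s)]))

;; Thread the state through two calls by hand.
(let [[out1 s1] (one-to-many-with-state 5 0)    ; 5 > 0: one output, state becomes 1
      [out2 s2] (one-to-many-with-state 1 s1)]  ; 1 > 1 is false: two outputs, state back to 0
  [(concat out1 out2) s2])
;; => [(50 -1 -1) 0]
```

<p>Nobody wants to write that <code>let</code> by hand for every element, which is the whole reason for the next section.</p>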
<h3>mapcatreduce</h3>
<p>At this point, we're going to need something other than <code>mapcat</code> to apply our function.
This something is going to be a hybrid of <code>mapcat</code> and <code>reduce</code>, where the former
gets applied iteratively to each input, while the latter accumulates state.
This will do for now:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">translate-seqable</span> <span class="p">[</span><span class="nv">tl</span> <span class="nv">state</span> <span class="nv">inputs</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[[</span><span class="nv">input</span> <span class="o">&</span> <span class="nv">inputs</span><span class="p">]</span> <span class="nv">inputs</span>
<span class="p">[</span><span class="nv">outputs</span> <span class="nv">state</span><span class="p">]</span> <span class="p">(</span><span class="nf">tl</span> <span class="nv">input</span> <span class="nv">state</span><span class="p">)]</span>
<span class="p">(</span><span class="k">if </span><span class="nv">inputs</span>
<span class="p">(</span><span class="nb">concat </span><span class="nv">outputs</span> <span class="p">(</span><span class="nf">lazy-seq</span> <span class="p">(</span><span class="nf">translate-seqable</span> <span class="nv">tl</span> <span class="nv">state</span> <span class="nv">inputs</span><span class="p">)))</span>
<span class="p">(</span><span class="nb">concat </span><span class="nv">outputs</span> <span class="p">(</span><span class="nb">first </span><span class="p">(</span><span class="nf">tl</span> <span class="nv">nil</span> <span class="nv">state</span><span class="p">))))))</span>
</code></pre></div>
<p>The last line introduces a small twist: we're going to invoke <code>tl</code> one last time with a <code>nil</code>
value, so it can clear out any residual state.</p>
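<p>As a quick illustration of why that final <code>nil</code> call matters, here is a sketch with a
hypothetical translator, <code>pairs</code>, that is not part of the code above. It emits its inputs two at a
time and holds back an odd one until finishing time. <code>translate-seqable</code> is repeated verbatim only
so the snippet stands alone.</p>

```clojure
;; translate-seqable as defined above, repeated so the sketch stands alone.
(defn translate-seqable [tl state inputs]
  (let [[input & inputs] inputs
        [outputs state] (tl input state)]
    (if inputs
      (concat outputs (lazy-seq (translate-seqable tl state inputs)))
      (concat outputs (first (tl nil state))))))

;; Hypothetical translator: emit inputs in pairs, holding back an odd one.
;; The state is the held-back value (or nil); a nil input means "finish up".
(defn pairs [v held]
  (cond
    (nil? v)    [(if held [[held]] []) nil]   ; flush whatever is left over
    (nil? held) [[] v]                        ; nothing held yet: hold this one
    :else       [[[held v]] nil]))            ; complete a pair and emit it

(translate-seqable pairs nil [1 2 3 4 5])
;; => ([1 2] [3 4] [5])
```

<p>Without the last call, the trailing <code>5</code> would simply vanish along with the state that was holding it.</p>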
<p>We can write a similar <code>mapcat</code>/<code>reduce</code> to translate the contents of a <code>core.async</code> channel:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn- </span><span class="nv">translate-channel</span> <span class="p">[</span><span class="nv">tl</span> <span class="nv">s0</span> <span class="nv">c-in</span> <span class="o">&</span> <span class="p">[</span><span class="nv">buf-or-n</span><span class="p">]]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">c-out</span> <span class="p">(</span><span class="nf">chan</span> <span class="nv">buf-or-n</span><span class="p">)]</span>
<span class="p">(</span><span class="nf">async/go-loop</span> <span class="p">[</span><span class="nv">state</span> <span class="nv">s0</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">v</span> <span class="p">(</span><span class="nf"><!</span> <span class="nv">c-in</span><span class="p">)]</span>
<span class="p">(</span><span class="nb">if-not </span><span class="p">(</span><span class="nb">nil? </span><span class="nv">v</span><span class="p">)</span>
<span class="p">(</span><span class="k">let </span><span class="p">[[</span><span class="nv">vs</span> <span class="nv">state</span><span class="p">]</span> <span class="p">(</span><span class="nf">tl</span> <span class="nv">v</span> <span class="nv">state</span><span class="p">)]</span>
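<span class="p">(</span><span class="nf"><!</span> <span class="p">(</span><span class="nf">async/onto-chan</span> <span class="nv">c-out</span> <span class="nv">vs</span> <span class="nv">false</span><span class="p">))</span> <span class="c1">;; wait for the copy, so outputs stay in order</span>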
<span class="p">(</span><span class="nf">recur</span> <span class="nv">state</span><span class="p">))</span>
<span class="p">(</span><span class="k">let </span><span class="p">[[</span><span class="nv">vs</span> <span class="nv">_</span><span class="p">]</span> <span class="p">(</span><span class="nf">tl</span> <span class="nv">nil</span> <span class="nv">state</span><span class="p">)]</span>
<span class="p">(</span><span class="nf">async/onto-chan</span> <span class="nv">c-out</span> <span class="nv">vs</span> <span class="nv">true</span><span class="p">)))))</span>
<span class="nv">c-out</span><span class="p">))</span>
</code></pre></div>
<p>Both of these could probably be written more efficiently, perhaps directly in Java,
but let's not bother with that yet.</p>
<h3>Example translators</h3>
<p>The finishing logic requires every translator to check for
a <code>nil</code> input, which in most cases just means wrapping its body in <code>(if-not (nil? v) ...)</code>.</p>
<p>The standard mapping and predicate filtering ignore state,</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">tmap</span> <span class="p">[</span><span class="nv">f</span><span class="p">]</span>
<span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">v</span> <span class="nv">_</span><span class="p">]</span> <span class="p">(</span><span class="nb">if-not </span><span class="p">(</span><span class="nb">nil? </span><span class="nv">v</span><span class="p">)</span> <span class="p">[[(</span><span class="nf">f</span> <span class="nv">v</span><span class="p">)]</span> <span class="nv">nil</span><span class="p">])))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">tfilter</span> <span class="p">[</span><span class="nv">p</span><span class="p">]</span>
<span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">v</span> <span class="nv">_</span><span class="p">]</span>
<span class="p">(</span><span class="nb">if-not </span><span class="p">(</span><span class="nb">nil? </span><span class="nv">v</span><span class="p">)</span>
<span class="p">[(</span><span class="k">if </span><span class="p">(</span><span class="nf">p</span> <span class="nv">v</span><span class="p">)</span> <span class="p">[</span><span class="nv">v</span><span class="p">]</span> <span class="p">[])</span> <span class="nv">nil</span><span class="p">])))</span>
</code></pre></div>
<p>while the canonical deduplicator uses it to hold a scalar:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">dedup</span> <span class="p">[</span><span class="nv">v</span> <span class="nv">state</span><span class="p">]</span>
<span class="p">(</span><span class="nb">if-not </span><span class="p">(</span><span class="nb">nil? </span><span class="nv">v</span><span class="p">)</span>
<span class="p">[(</span><span class="k">if </span><span class="p">(</span><span class="nb">not= </span><span class="nv">v</span> <span class="nv">state</span><span class="p">)</span> <span class="p">[</span><span class="nv">v</span><span class="p">]</span> <span class="p">[])</span> <span class="nv">v</span><span class="p">]))</span>
</code></pre></div>
<p>Duplication does not use state, but does make use of the ability to return multiple values:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">dup</span> <span class="p">[</span><span class="nv">v</span> <span class="nv">_</span><span class="p">]</span>
<span class="p">(</span><span class="nb">if-not </span><span class="p">(</span><span class="nb">nil? </span><span class="nv">v</span><span class="p">)</span> <span class="p">[[</span><span class="nv">v</span> <span class="nv">v</span><span class="p">]</span> <span class="nv">nil</span><span class="p">]))</span>
</code></pre></div>
<p>For fun, we can implement a sorting translator that just
swallows input until finishing time, when it releases the
sorted input in one go:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">tsort</span> <span class="p">[</span><span class="nv">v</span> <span class="nv">s</span><span class="p">]</span>
<span class="p">(</span><span class="nb">if-not </span><span class="p">(</span><span class="nb">nil? </span><span class="nv">v</span><span class="p">)</span> <span class="p">[[]</span> <span class="p">(</span><span class="nb">cons </span><span class="nv">v</span> <span class="nv">s</span><span class="p">)]</span> <span class="p">[(</span><span class="nb">sort </span><span class="nv">s</span><span class="p">)</span> <span class="nv">nil</span><span class="p">]))</span>
</code></pre></div>
<p>(To be fair, one can do this with transducers too.)</p>
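<p>Since translators are ordinary pure functions, they can be poked at directly, with no machinery at all.
The sketch below repeats <code>dedup</code> and <code>tsort</code> so it stands alone, and adds a hypothetical
helper, <code>run-steps</code> (my name, not part of the code above), that folds a translator over a collection
and returns <code>[all-outputs final-state]</code> without the final <code>nil</code> call.</p>

```clojure
;; Definitions repeated from above so the sketch stands alone.
(defn dedup [v state]
  (if-not (nil? v)
    [(if (not= v state) [v] []) v]))

(defn tsort [v s]
  (if-not (nil? v) [[] (cons v s)] [(sort s) nil]))

;; Hypothetical helper: fold a translator over a collection, collecting
;; [all-outputs final-state]. It deliberately omits the final nil flush.
(defn run-steps [tl state inputs]
  (reduce (fn [[outs s] v]
            (let [[o s'] (tl v s)]
              [(into outs o) s']))
          [[] state]
          inputs))

(run-steps dedup nil [1 1 2 2 1])   ;; => [[1 2 1] 1]
(run-steps tsort nil [3 1 2])       ;; => [[] (2 1 3)]
(tsort nil '(2 1 3))                ;; => [(1 2 3) nil]
```

<p>Note that <code>tsort</code> emits nothing at all until it sees the <code>nil</code>; everything it has swallowed is sitting in the state.</p>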
<h3>composition</h3>
<p>Suppose, next, that we would like to be able to compose multiple
<code>one-to-many-with-state</code>-like functions into a single function to pass to
one of these <code>translate</code>s. This single function will consume a state that is
actually a vector of the states of the functions that compose it. The only
comments I'll make about the following proof-of-concept implementation are
that (1) it can undoubtedly be done more efficiently, (2) its complexity is
not something the user needs to think about:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">tcomp</span>
<span class="s">"Compose stateful translators; state in the returned translator is a vector of composed states."</span>
<span class="p">[</span><span class="o">&</span> <span class="nv">tls</span><span class="p">]</span>
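<span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">input</span> <span class="nv">states</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">flush?</span> <span class="p">(</span><span class="nb">nil? </span><span class="nv">input</span><span class="p">)</span> <span class="c1">;; a nil input means "finish up"</span>
<span class="nv">step</span> <span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">tl</span> <span class="p">[</span><span class="nv">out</span> <span class="nv">s</span><span class="p">]</span> <span class="nv">in</span><span class="p">]</span> <span class="c1">;; feed one value to tl, appending its outputs</span>
<span class="p">(</span><span class="k">let </span><span class="p">[[</span><span class="nv">o</span> <span class="nv">s</span><span class="p">]</span> <span class="p">(</span><span class="nf">tl</span> <span class="nv">in</span> <span class="nv">s</span><span class="p">)]</span> <span class="p">[(</span><span class="nb">into </span><span class="nv">out</span> <span class="nv">o</span><span class="p">)</span> <span class="nv">s</span><span class="p">]))]</span>
<span class="p">(</span><span class="k">loop </span><span class="p">[[</span><span class="nv">tl</span> <span class="o">&</span> <span class="nv">tls</span><span class="p">]</span> <span class="nv">tls</span> <span class="c1">;; loop over translators</span>
<span class="p">[</span><span class="nv">s</span> <span class="o">&</span> <span class="nv">ss</span><span class="p">]</span> <span class="nv">states</span> <span class="c1">;; and states</span>
<span class="nv">input</span> <span class="p">(</span><span class="k">if </span><span class="nv">flush?</span> <span class="p">[]</span> <span class="p">[</span><span class="nv">input</span><span class="p">])</span> <span class="c1">;; accrue flattened inputs</span>
<span class="nv">s-acc</span> <span class="p">[]]</span> <span class="c1">;; new states</span>
<span class="p">(</span><span class="nb">if-not </span><span class="nv">tl</span>
<span class="p">[</span><span class="nv">input</span> <span class="nv">s-acc</span> <span class="p">]</span> <span class="c1">;; done</span>
<span class="p">(</span><span class="k">let </span><span class="p">[[</span><span class="nv">input</span> <span class="nv">s</span><span class="p">]</span> <span class="p">(</span><span class="nb">reduce </span><span class="p">(</span><span class="nb">partial </span><span class="nv">step</span> <span class="nv">tl</span><span class="p">)</span> <span class="p">[[]</span> <span class="nv">s</span><span class="p">]</span> <span class="nv">input</span><span class="p">)</span>
<span class="p">[</span><span class="nv">input</span> <span class="nv">s</span><span class="p">]</span> <span class="p">(</span><span class="k">if </span><span class="nv">flush?</span> <span class="p">(</span><span class="nf">step</span> <span class="nv">tl</span> <span class="p">[</span><span class="nv">input</span> <span class="nv">s</span><span class="p">]</span> <span class="nv">nil</span><span class="p">)</span> <span class="p">[</span><span class="nv">input</span> <span class="nv">s</span><span class="p">])]</span> <span class="c1">;; on flush, let tl release its state too</span>
<span class="p">(</span><span class="nf">recur</span> <span class="nv">tls</span> <span class="nv">ss</span> <span class="nv">input</span> <span class="p">(</span><span class="nb">conj </span><span class="nv">s-acc</span> <span class="nv">s</span><span class="p">))))))))</span>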
</code></pre></div>
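<p>If you want to watch the vector of states evolve, here is a small self-contained sketch. It is a variant
written for illustration, not a replacement for the function above: <code>step</code>,
<code>compose-translators</code> and <code>dedup-then-sort</code> are hypothetical names, and I'm assuming that
on a <code>nil</code> input each stage should first see whatever the previous stage released and only then get
its own <code>nil</code>.</p>

```clojure
;; Feed one value to translator tl, appending its outputs to `out`.
(defn- step [tl [out s] in]
  (let [[o s'] (tl in s)]
    [(into out o) s']))

;; Sketch of composition: each stage's outputs become the next stage's inputs,
;; and the composite state is a vector with one slot per stage.
(defn compose-translators [& tls]
  (fn [input states]
    (let [flush? (nil? input)]
      (loop [tls tls, ss (seq states), ins (if flush? [] [input]), acc []]
        (if-let [[tl & more] (seq tls)]
          (let [s       (first ss)
                [out s] (reduce (partial step tl) [[] s] ins)
                ;; on flush, each stage also gets the nil after its ordinary inputs
                [out s] (if flush? (step tl [out s] nil) [out s])]
            (recur more (next ss) out (conj acc s)))
          [ins acc])))))

;; Two translators from the previous section, repeated so the sketch stands alone.
(defn dedup [v state]
  (if-not (nil? v) [(if (not= v state) [v] []) v]))
(defn tsort [v s]
  (if-not (nil? v) [[] (cons v s)] [(sort s) nil]))

(def dedup-then-sort (compose-translators dedup tsort))

(let [[_ s1]  (dedup-then-sort 2 nil)   ; nothing emitted yet
      [_ s2]  (dedup-then-sort 2 s1)    ; duplicate dropped by dedup, tsort untouched
      [_ s3]  (dedup-then-sort 1 s2)
      [out _] (dedup-then-sort nil s3)] ; flush
  [s1 s2 s3 out])
;; => [[2 (2)] [2 (2)] [1 (1 2)] [1 2]]
```

<p>Each composite state has one slot per stage: the first is <code>dedup</code>'s last-seen value, the second is <code>tsort</code>'s pile of swallowed inputs.</p>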
<h3>protocols</h3>
<p>Finally, we wrap our <code>translate-whatever</code>s in a protocol, so we can just call <code>translate</code>, irrespective
of the container:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defprotocol </span><span class="nv">ITranslatable</span>
<span class="p">(</span><span class="nf">translate*</span> <span class="p">[</span><span class="nv">this</span> <span class="nv">s0</span> <span class="nv">tl</span><span class="p">]))</span>
<span class="p">(</span><span class="nf">extend-protocol</span> <span class="nv">ITranslatable</span>
<span class="nv">clojure.lang.Seqable</span>
<span class="p">(</span><span class="nf">translate*</span> <span class="p">[</span><span class="nv">this</span> <span class="nv">s0</span> <span class="nv">tl</span><span class="p">]</span> <span class="p">(</span><span class="nf">translate-seqable</span> <span class="nv">tl</span> <span class="nv">s0</span> <span class="nv">this</span><span class="p">))</span>
<span class="nv">clojure.core.async.impl.protocols.Channel</span>
<span class="p">(</span><span class="nf">translate*</span> <span class="p">[</span><span class="nv">this</span> <span class="nv">s0</span> <span class="nv">tl</span><span class="p">]</span> <span class="p">(</span><span class="nf">translate-channel</span> <span class="nv">tl</span> <span class="nv">s0</span> <span class="nv">this</span><span class="p">)))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">translate</span> <span class="p">[</span><span class="nv">translator</span> <span class="nv">input</span><span class="p">]</span> <span class="p">(</span><span class="nf">translate*</span> <span class="nv">input</span> <span class="nv">nil</span> <span class="nv">translator</span><span class="p">))</span>
</code></pre></div>
<h3>rah</h3>
<p>For completeness, </p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="k">def </span><span class="nv">blort</span> <span class="p">(</span><span class="nf">tcomp</span> <span class="p">(</span><span class="nf">tmap</span> <span class="nv">inc</span><span class="p">)</span> <span class="p">(</span><span class="nf">tfilter</span> <span class="nv">even?</span><span class="p">)</span> <span class="nv">dedup</span> <span class="p">(</span><span class="nf">tmap</span> <span class="nv">inc</span><span class="p">)</span> <span class="nv">tsort</span><span class="p">))</span>
<span class="p">(</span><span class="nf">translate</span> <span class="nv">blort</span> <span class="p">[</span><span class="mi">3</span> <span class="mi">1</span> <span class="mi">1</span> <span class="mi">2</span> <span class="mi">5</span> <span class="mi">1</span><span class="p">])</span>
<span class="c1">;;(3 3 5 7)</span>
<span class="p">(</span><span class="k">def </span><span class="nv">c</span> <span class="p">(</span><span class="nf">chan</span><span class="p">))</span>
<span class="p">(</span><span class="k">def </span><span class="nv">ct</span> <span class="p">(</span><span class="nf">translate</span> <span class="nv">blort</span> <span class="nv">c</span><span class="p">))</span>
<span class="p">(</span><span class="nf">async/onto-chan</span> <span class="nv">c</span> <span class="p">[</span><span class="mi">3</span> <span class="mi">1</span> <span class="mi">1</span> <span class="mi">2</span> <span class="mi">5</span> <span class="mi">1</span><span class="p">])</span>
<span class="p">(</span><span class="nf"><!!</span> <span class="p">(</span><span class="nf">async/into</span> <span class="p">[]</span> <span class="nv">ct</span><span class="p">))</span>
<span class="c1">;; [3 3 5 7]</span>
</code></pre></div>
<p>As bonuses, we never have to deal with mutability, and we don't have to take
any special care to do exactly the right thing with those <code>result</code> values we're not
supposed to look at.</p>
<h2>Question and Answer</h2>
<h3>Do you hate transducers?</h3>
<p>No, I think they're cool.</p>
<h3>So, what are you complaining about?</h3>
<p>Maybe they're too complicated. Yes, I can wrap my head around that complexity, but that's the
sort of thing I do for fun. Most aspects of Clojure not only do not require a love of complexity
but appeal to people who loathe it. Transducers stand out as an exception.</p>
<h3>What's so complicated?</h3>
<p>I answered this above, but, to summarize, the <code>reduce</code> heritage seems to stick out more than necessary.</p>
<h3>Didn't Christophe Grand already make these points?</h3>
<p>He made some of them. As I recall (his site is down at the moment),
he too objected to the visibility of <code>reduce</code> semantics in contexts that were really about transformation.
However, he reached somewhat different conclusions from mine, in the end proposing that every
transducer-like thing be intrinsically stateful and mutative, in the interests of simplicity and efficiency.
The arguments for doing this could be applied nearly anywhere we use a functional style, which implies
strong arguments against it that are familiar enough not to require rehearsal here.</p>
<h3>Your solution is inefficient.</h3>
<p>True, but it's not supposed to be a solution. It's somewhere between
a rumination and a proof of concept. Something to think about. If we were
to go in a similar direction, there are many possible optimizations. For example,
I don't see anything wrong with violating functional purity within core language
facilities; rather, what bothers me is when users are regularly required to do so.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:shark">
<p>Does anyone have a clip of the scene in Battlestar Galactica where someone actually shouts, "Jump the shark!"? Google is failing me here. <a class="footnote-backref" href="#fnref:shark" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:sier">
<p>He didn't use any of these illustrations, particularly not the third one. <a class="footnote-backref" href="#fnref:sier" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:tutorials">
<p>Guilty as charged. <a class="footnote-backref" href="#fnref:tutorials" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
</ol>
</div>Stateless but state-aware types for transducers in Scala, using what seems to be magic2014-11-11T00:00:00-05:002014-11-11T00:00:00-05:00Peter Fraenkeltag:blog.podsnap.com,2014-11-11:/ducers4.html<p>How strange to think that, a mere week ago, the world had not yet heard
my public pronouncement that
<a href="http://blog.podsnap.com/ducers3.html">transducers ought to be stateless</a>.
True, the media frenzy has died down a bit, but in its stead comes the quiet
awareness that life will never be the same. Anyway, that's the way I like
to think about it.</p>
<h4>TL;DR</h4>
<ol>
<li>Storing state in the transducer makes it mutable, which
might be unfortunate on general principles. In any event, it interferes with
the metaphor of transducers as recipes for transformation and arguably makes
code more difficult to understand.</li>
<li>A natural place to store state, such as the previous value, is in the accumulating
reduction. After all, the reduction is already changing.</li>
<li>So we make the reduction value a wrapper around the original reduction of interest
(e.g. a sum) and the internal state being maintained by the transducer, e.g. <code>RH[Double,Option[S]]</code>.</li>
<li>Since state-aware transducers can be composed, there may be multiple layers of nesting, with
different types of state at each level, so</li>
<li>the <strong>type of the reduction value</strong> is also nested, e.g. <code>RH[RH[RH[R,Option[S1]],Option[S2]],Option[S3]]</code>.</li>
<li>The <strong>type of the transducer</strong> must encode not the type of the reduction value, but
the <strong>way it will modify</strong> the type of the reduction.</li>
<li>The composition of multiple modification encodings seems like it might be nontrivial.</li>
<li>BUT: Using parameterized type projections in Scala, we can write what are effectively functional
programs to <strong>compute the new type of the reduction given the type of the transducer</strong> and
to <strong>compute the new type of a transducer composed from others</strong>.</li>
<li>The gussying up process will involve type constraints (nice!), implicitous fiddling (uh huh ...) and manifests (hmmf).</li>
</ol>
<p>We will use funny symbols!</p>
<h1>=:= ⟐ # ∘</h1>
<p>(Only half of them gratuitously.)</p>
<h4>Language Warnings</h4>
<ol>
<li>Other than the traditional association of transducers with Clojure and the plea implicit in
this post for help in doing what I do here in <code>core.typed</code>, there is no further Clojure
to be found here.</li>
<li><a href="http://blog.podsnap.com/ducers2.html">Several posts ago</a>, I kind of suggested
using <strike>existential</strike> universal types for transducers in Scala. Doing so would actually be a mistake. I'll
instead be generalizing from one of the much improved implementations suggested to me by thoughtful readers.</li>
</ol>
<h4>Overture</h4>
<p>But really, they shouldn't have state.</p>
<p>The bulk of this post is going to be about the combined ramifications of
state-aware but stateless transducers and static typing, but I wanted to
back up a bit and spend a moment reinforcing the argument against stateful
transducers.</p>
<p>In his
<a href="https://www.youtube.com/watch?v=6mTbuzafcII&noredirect=1">Strangeloop talk</a>,
Rich Hickey used the metaphor of luggage handling. A transducer is akin to
instructions to unbundle pallets containing multiple bags (mapcatting), or
remove bags that smell like food (filtering), or wrap up ones that look like
they'll break (mapping). Each instruction should be independent of whether
the cargo arrives on a conveyor belt or the back of a tractor, and, furthermore,
the combination of instructions should also be deliverable independent of
transport.</p>
<p>In this light, he notes, it seems strange that we're ok with <code>map</code>, <code>filter</code> and <code>mapcat</code>/<code>flatMap</code>
methods specific to <code>core.async</code> channels or streams or lists. It's especially
bizarre that we tolerate transport-specific <em>composition</em> of such operations. I.e.,
in Scala-esque, we could have textually similar lines like</p>
<div class="highlight"><pre><span></span><code> <span class="n">theList</span><span class="p">.</span><span class="n">map</span><span class="p">(</span><span class="n">wrapWithTape</span><span class="p">).</span><span class="n">filter</span><span class="p">(</span><span class="n">isSmelly</span><span class="p">)</span>
<span class="n">theChan</span><span class="p">.</span><span class="n">map</span><span class="p">(</span><span class="n">wrapWithTape</span><span class="p">).</span><span class="n">filter</span><span class="p">(</span><span class="n">isSmelly</span><span class="p">)</span>
</code></pre></div>
<p>where you cannot possibly factor out the stuff after the first <code>.</code> because it
embeds container methods.</p>
<p>With lovely transducers, it would be more like this:</p>
<div class="highlight"><pre><span></span><code> <span class="kd">val</span> <span class="n">td</span> <span class="o">=</span> <span class="n">mapping</span><span class="p">(</span><span class="n">wrapWithTape</span><span class="p">)</span> <span class="n">compose</span> <span class="n">filtering</span><span class="p">(</span><span class="n">isSmelly</span><span class="p">)</span>
<span class="n">theList</span><span class="p">.</span><span class="n">sequence</span><span class="p">(</span><span class="n">td</span><span class="p">)</span>
<span class="n">theChan</span><span class="p">.</span><span class="n">sequence</span><span class="p">(</span><span class="n">td</span><span class="p">)</span>
</code></pre></div>
<p>where <code>td</code> and the things that compose it are just transformations of reducing functions
<code>(R,A)=>R</code>. So far, so good.</p>
<p>Here are two more luggage examples:</p>
<ol>
<li>If you find consecutive bags belonging to the same person, wrap them up together.</li>
<li>If you find a bag that's ticking, stop the entire operation.</li>
</ol>
<p>Both of these involve state of different sorts - keeping track, respectively, of the previous bag's owner or
of whether a tick has ever been heard - but, when we transition back to computer-land,
the suggested location of that state differs. The halt order is conveyed by wrapping the
reduction value in a <code>Reduced</code> object, while bundling state is kept in an atom
inside the transducer itself.</p>
<p>Back to our story.</p>
<p>Last week I discussed applying the <code>Reduced</code> approach to state of all kinds and
illustrated how it might be done in Clojure - conventional, untyped Clojure.
This week, it's Scala and static typing. Things get interesting.</p>
<h4>Baseline transducer</h4>
<p>Here. This is <a href="https://gist.github.com/paulp/7c69c7ba268686402b97">stolen outright</a> from Paul Phillips and then
slightly mangled:</p>
<div class="highlight"><pre><span></span><code> <span class="k">type</span> <span class="nc">RFn</span><span class="p">[</span><span class="o">-</span><span class="nc">A</span><span class="p">,</span> <span class="nc">R</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="nc">R</span><span class="p">,</span> <span class="nc">A</span><span class="p">)</span> <span class="o">=></span> <span class="nc">R</span>
<span class="k">trait</span> <span class="nc">Transducer</span><span class="p">[</span><span class="o">+</span><span class="nc">A</span><span class="p">,</span> <span class="o">-</span><span class="nc">B</span><span class="p">]</span> <span class="p">{</span>
<span class="k">def</span> <span class="nf">apply</span><span class="p">[</span><span class="nc">R</span><span class="p">](</span><span class="n">f</span><span class="p">:</span> <span class="nc">RFn</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span> <span class="nc">R</span><span class="p">]):</span> <span class="nc">RFn</span><span class="p">[</span><span class="nc">B</span><span class="p">,</span> <span class="nc">R</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">compose</span><span class="p">[</span><span class="nc">C</span><span class="p">](</span><span class="n">t2</span> <span class="p">:</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">C</span><span class="p">,</span> <span class="nc">A</span><span class="p">]):</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">C</span><span class="p">,</span> <span class="nc">B</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span>
<span class="kd">val</span> <span class="n">t1</span> <span class="o">=</span> <span class="bp">this</span>
<span class="k">new</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">C</span><span class="p">,</span> <span class="nc">B</span><span class="p">]</span> <span class="p">{</span> <span class="k">def</span> <span class="nf">apply</span><span class="p">[</span><span class="nc">R</span><span class="p">](</span><span class="n">rf</span><span class="p">:</span> <span class="nc">RFn</span><span class="p">[</span><span class="nc">C</span><span class="p">,</span> <span class="nc">R</span><span class="p">])</span> <span class="o">=</span> <span class="n">t1</span><span class="p">(</span><span class="n">t2</span><span class="p">(</span><span class="n">rf</span><span class="p">))</span> <span class="p">}}</span>
<span class="k">def</span> <span class="nf">∘</span><span class="p">[</span><span class="nc">C</span><span class="p">]</span> <span class="o">=</span> <span class="n">compose</span><span class="p">[</span><span class="nc">C</span><span class="p">]</span> <span class="n">_</span>
<span class="k">def</span> <span class="nf">⟐</span><span class="p">[</span><span class="nc">R</span><span class="p">]</span> <span class="o">=</span> <span class="n">apply</span><span class="p">[</span><span class="nc">R</span><span class="p">]</span> <span class="n">_</span>
<span class="p">}</span>
<span class="k">def</span> <span class="nf">map</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span> <span class="nc">B</span><span class="p">](</span><span class="n">f</span><span class="p">:</span> <span class="nc">A</span> <span class="o">=></span> <span class="nc">B</span><span class="p">):</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">B</span><span class="p">,</span> <span class="nc">A</span><span class="p">]</span> <span class="o">=</span>
<span class="k">new</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">B</span><span class="p">,</span> <span class="nc">A</span><span class="p">]</span> <span class="p">{</span> <span class="k">def</span> <span class="nf">apply</span><span class="p">[</span><span class="nc">R</span><span class="p">](</span><span class="n">rf</span><span class="p">:</span> <span class="nc">RFn</span><span class="p">[</span><span class="nc">B</span><span class="p">,</span> <span class="nc">R</span><span class="p">])</span> <span class="o">=</span> <span class="p">(</span><span class="n">r</span><span class="p">,</span> <span class="n">a</span><span class="p">)</span> <span class="o">=></span> <span class="n">rf</span><span class="p">(</span><span class="n">r</span><span class="p">,</span> <span class="n">f</span><span class="p">(</span><span class="n">a</span><span class="p">))</span> <span class="p">}</span>
<span class="k">def</span> <span class="nf">sequence</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span> <span class="nc">B</span><span class="p">](</span><span class="n">t</span><span class="p">:</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">B</span><span class="p">,</span> <span class="nc">A</span><span class="p">],</span> <span class="n">data</span><span class="p">:</span> <span class="nc">Seq</span><span class="p">[</span><span class="nc">A</span><span class="p">])</span> <span class="o">=</span> <span class="n">data</span><span class="p">.</span><span class="n">foldLeft</span><span class="p">(</span><span class="nc">Seq</span><span class="p">[</span><span class="nc">B</span><span class="p">]())(</span><span class="n">t</span><span class="p">(</span><span class="n">_</span> <span class="o">:+</span> <span class="n">_</span><span class="p">))</span>
</code></pre></div>
<p>This is clean, straightforward, and uses rather than abuses type inference.
I'm having trouble reconstructing the reasons I originally objected to this approach.</p>
<p>We can perform the usual tricks:</p>
<div class="highlight"><pre><span></span><code> <span class="kd">val</span> <span class="n">t_parsei</span><span class="p">:</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">Int</span><span class="p">,</span> <span class="nc">String</span><span class="p">]</span> <span class="o">=</span> <span class="n">map</span><span class="p">(</span><span class="n">_</span><span class="p">.</span><span class="n">toInt</span><span class="p">)</span>
<span class="kd">val</span> <span class="n">t_root2</span> <span class="p">:</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">Double</span><span class="p">,</span> <span class="nc">Int</span><span class="p">]</span> <span class="o">=</span> <span class="n">map</span><span class="p">(</span><span class="n">i</span> <span class="o">=></span> <span class="nc">Math</span><span class="p">.</span><span class="n">pow</span><span class="p">(</span><span class="mf">2.0</span><span class="p">,</span> <span class="mf">1.0</span> <span class="o">/</span> <span class="n">i</span><span class="p">))</span>
<span class="k">def</span> <span class="nf">t_repeat</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span> <span class="nc">R</span><span class="p">]</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span> <span class="nc">A</span><span class="p">]</span> <span class="p">{</span> <span class="k">def</span> <span class="nf">apply</span><span class="p">[</span><span class="nc">R</span><span class="p">](</span><span class="n">rf</span><span class="p">:</span> <span class="nc">RFn</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span> <span class="nc">R</span><span class="p">])</span> <span class="o">=</span> <span class="p">(</span><span class="n">r</span><span class="p">,</span> <span class="n">a</span><span class="p">)</span> <span class="o">=></span> <span class="n">rf</span><span class="p">(</span><span class="n">rf</span><span class="p">(</span><span class="n">r</span><span class="p">,</span> <span class="n">a</span><span class="p">),</span> <span class="n">a</span><span class="p">)</span> <span class="p">}</span>
<span class="n">println</span><span class="p">(</span><span class="n">sequence</span><span class="p">(</span><span class="n">t_parsei</span> <span class="n">∘</span> <span class="n">t_repeat</span> <span class="n">∘</span> <span class="n">t_root2</span><span class="p">,</span> <span class="nc">List</span><span class="p">(</span><span class="s">"1"</span><span class="p">,</span> <span class="s">"2"</span><span class="p">,</span> <span class="s">"3"</span><span class="p">)))</span>
<span class="n">println</span><span class="p">(</span><span class="nc">List</span><span class="p">(</span><span class="s">"1"</span><span class="p">,</span> <span class="s">"2"</span><span class="p">,</span> <span class="s">"3"</span><span class="p">).</span><span class="n">foldLeft</span><span class="p">(</span><span class="mf">0.0d</span><span class="p">)(</span><span class="n">t_parsei</span> <span class="n">∘</span> <span class="n">t_repeat</span> <span class="n">∘</span> <span class="n">t_root2</span> <span class="n">⟐</span> <span class="p">(</span><span class="n">_</span> <span class="o">+</span> <span class="n">_</span><span class="p">)))</span>
</code></pre></div>
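<p>For anyone who wants to run the stateless version in isolation, here is a minimal, self-contained sketch.
The definitions of <code>RFn</code>, <code>Transducer</code>, <code>map</code> and <code>sequence</code> are reconstructed
from the fragments shown here and should be read as assumptions; variance annotations are left out, a plain
<code>compose</code> method and direct application stand in for the <code>∘</code> and <code>⟐</code> operators, and
<code>parse</code>, <code>twice</code> and <code>both</code> are made-up names.</p>
<div class="highlight"><pre><span></span><code>object StatelessSketch {
  // Assumed alias: a reducing function folds one A into an accumulated R.
  type RFn[A, R] = (R, A) => R

  // Assumed shape (variance omitted): turns a reducer over A into a reducer over B.
  trait Transducer[A, B] { self =>
    def apply[R](rf: RFn[A, R]): RFn[B, R]

    // Stand-in for the ∘ operator: this transducer sees the input first, then t2.
    def compose[C](t2: Transducer[C, A]): Transducer[C, B] =
      new Transducer[C, B] {
        def apply[R](rf: RFn[C, R]): RFn[B, R] = self(t2(rf))
      }
  }

  def map[A, B](f: A => B): Transducer[B, A] = new Transducer[B, A] {
    def apply[R](rf: RFn[B, R]): RFn[A, R] = (r, a) => rf(r, f(a))
  }

  // Run a transducer over a sequence by folding with "append" as the reducer.
  def sequence[A, B](t: Transducer[B, A], data: Seq[A]): Seq[B] =
    data.foldLeft(Seq.empty[B])(t((acc: Seq[B], b: B) => acc :+ b))

  val parse: Transducer[Int, String] = map((s: String) => s.toInt)
  val twice: Transducer[Int, Int] = map((i: Int) => i * 2)
  val both: Transducer[Int, String] = parse.compose(twice)

  def main(args: Array[String]): Unit = {
    println(sequence(both, List("1", "2", "3")))
    println(List("1", "2", "3").foldLeft(0)(both((x: Int, y: Int) => x + y)))
  }
}
</code></pre></div>
<p>The first call collects the parsed and doubled values 2, 4 and 6; the second folds the same values into their sum, 12.
Nothing here holds state, so <code>both</code> can be reused freely.</p>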
<p>Now let's introduce state. Recalling the canonical Clojure example of a deduplicator,</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="kd">defn </span><span class="nv">t-dedup</span> <span class="p">[]</span>
<span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">xf</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">prev</span> <span class="p">(</span><span class="nf">volatile!</span> <span class="ss">::none</span><span class="p">)]</span>
<span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">result</span> <span class="nv">input</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">prior</span> <span class="o">@</span><span class="nv">prev</span><span class="p">]</span>
<span class="p">(</span><span class="nf">vreset!</span> <span class="nv">prev</span> <span class="nv">input</span><span class="p">)</span>
<span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nb">= </span><span class="nv">prior</span> <span class="nv">input</span><span class="p">)</span>
<span class="nv">result</span>
<span class="p">(</span><span class="nf">xf</span> <span class="nv">result</span> <span class="nv">input</span><span class="p">)))))))</span>
</code></pre></div>
<p>which stores state in the volatile <code>prev</code> variable, let's write a Scala version in the obvious way,</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">t_dedup</span><span class="p">[</span><span class="nc">A</span><span class="p">]</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span><span class="nc">A</span><span class="p">]</span> <span class="p">{</span>
<span class="kd">var</span> <span class="n">aPrev</span> <span class="p">:</span> <span class="nc">Option</span><span class="p">[</span><span class="nc">A</span><span class="p">]</span> <span class="o">=</span> <span class="nc">None</span>
<span class="k">def</span> <span class="nf">apply</span><span class="p">[</span><span class="nc">R</span><span class="p">](</span><span class="n">rf</span><span class="p">:</span><span class="nc">RFn</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span><span class="nc">R</span><span class="p">])</span> <span class="p">:</span> <span class="nc">RFn</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span><span class="nc">R</span><span class="p">]</span> <span class="o">=</span> <span class="p">{(</span><span class="n">r</span><span class="p">:</span><span class="nc">R</span><span class="p">,</span><span class="n">a</span><span class="p">:</span><span class="nc">A</span><span class="p">)</span> <span class="o">=></span> <span class="n">aPrev</span> <span class="k">match</span> <span class="p">{</span>
<span class="k">case</span> <span class="nc">Some</span><span class="p">(</span><span class="n">a0</span><span class="p">)</span> <span class="k">if</span> <span class="n">a</span><span class="o">==</span><span class="n">a0</span> <span class="o">=></span> <span class="n">r</span>
<span class="k">case</span> <span class="n">_</span> <span class="o">=></span> <span class="p">{</span>
<span class="n">aPrev</span> <span class="o">=</span> <span class="nc">Some</span><span class="p">(</span><span class="n">a</span><span class="p">)</span>
<span class="n">rf</span><span class="p">(</span><span class="n">r</span><span class="p">,</span><span class="n">a</span><span class="p">)</span>
<span class="p">}}}}</span>
</code></pre></div>
<p>by storing the previous value in the <code>aPrev</code> option. That looks OK, but there's trouble abrewing:</p>
<div class="highlight"><pre><span></span><code> <span class="n">scala</span><span class="o">></span> <span class="kd">val</span> <span class="n">t</span> <span class="o">=</span> <span class="n">t_parsei</span> <span class="n">∘</span> <span class="n">t_dedup</span><span class="p">[</span><span class="nc">Int</span><span class="p">]</span>
<span class="n">scala</span><span class="o">></span> <span class="n">println</span><span class="p">(</span><span class="n">sequence</span><span class="p">(</span><span class="n">t</span><span class="p">,</span><span class="nc">List</span><span class="p">(</span><span class="s">"1"</span><span class="p">,</span><span class="s">"2"</span><span class="p">,</span><span class="s">"2"</span><span class="p">,</span><span class="s">"3"</span><span class="p">)))</span>
<span class="nc">List</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
<span class="n">scala</span><span class="o">></span> <span class="n">println</span><span class="p">(</span><span class="n">sequence</span><span class="p">(</span><span class="n">t</span><span class="p">,</span><span class="nc">List</span><span class="p">(</span><span class="s">"3"</span><span class="p">,</span><span class="s">"2"</span><span class="p">,</span><span class="s">"1"</span><span class="p">)))</span>
<span class="nc">List</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
</code></pre></div>
<p>Eek. The transducer thought the first "3" in the second sequence was a repeat, because it still
held state from the first sequence.</p>
<p>We can kick the can down the road a bit by initializing the option in the <code>apply</code> method,
thus allowing repeated uses of <code>t</code>:</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">t_dedup</span><span class="p">[</span><span class="nc">A</span><span class="p">]</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span><span class="nc">A</span><span class="p">]</span> <span class="p">{</span>
<span class="k">def</span> <span class="nf">apply</span><span class="p">[</span><span class="nc">R</span><span class="p">](</span><span class="n">rf</span><span class="p">:</span><span class="nc">RFn</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span><span class="nc">R</span><span class="p">])</span> <span class="p">:</span> <span class="nc">RFn</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span><span class="nc">R</span><span class="p">]</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">Function2</span><span class="p">[</span><span class="nc">R</span><span class="p">,</span><span class="nc">A</span><span class="p">,</span><span class="nc">R</span><span class="p">]</span> <span class="p">{</span>
<span class="kd">var</span> <span class="n">aPrev</span> <span class="p">:</span> <span class="nc">Option</span><span class="p">[</span><span class="nc">A</span><span class="p">]</span> <span class="o">=</span> <span class="nc">None</span>
<span class="k">def</span> <span class="nf">apply</span><span class="p">(</span><span class="n">r</span><span class="p">:</span><span class="nc">R</span><span class="p">,</span><span class="n">a</span><span class="p">:</span><span class="nc">A</span><span class="p">)</span> <span class="o">=</span> <span class="n">aPrev</span> <span class="k">match</span> <span class="p">{</span>
<span class="k">case</span> <span class="nc">Some</span><span class="p">(</span><span class="n">a0</span><span class="p">)</span> <span class="k">if</span> <span class="n">a</span><span class="o">==</span><span class="n">a0</span> <span class="o">=></span> <span class="n">r</span>
<span class="k">case</span> <span class="n">_</span> <span class="o">=></span> <span class="p">{</span>
<span class="n">aPrev</span> <span class="o">=</span> <span class="nc">Some</span><span class="p">(</span><span class="n">a</span><span class="p">)</span>
<span class="n">rf</span><span class="p">(</span><span class="n">r</span><span class="p">,</span><span class="n">a</span><span class="p">)</span>
<span class="p">}}}}}</span>
<span class="kd">val</span> <span class="n">t</span> <span class="o">=</span> <span class="n">t_parsei</span> <span class="n">∘</span> <span class="n">t_dedup</span><span class="p">[</span><span class="nc">Int</span><span class="p">]</span>
<span class="n">println</span><span class="p">(</span><span class="n">sequence</span><span class="p">(</span><span class="n">t</span><span class="p">,</span><span class="nc">List</span><span class="p">(</span><span class="s">"1"</span><span class="p">,</span><span class="s">"2"</span><span class="p">,</span><span class="s">"2"</span><span class="p">,</span><span class="s">"3"</span><span class="p">)))</span> <span class="c1">// List (1,2,3)</span>
<span class="n">println</span><span class="p">(</span><span class="n">sequence</span><span class="p">(</span><span class="n">t</span><span class="p">,</span><span class="nc">List</span><span class="p">(</span><span class="s">"3"</span><span class="p">,</span><span class="s">"2"</span><span class="p">,</span><span class="s">"1"</span><span class="p">)))</span> <span class="c1">// List (3,2,1)</span>
</code></pre></div>
<p>but we'd still run into trouble if we had the nerve to re-use a transformed <code>RFn</code>:</p>
<div class="highlight"><pre><span></span><code> <span class="kd">val</span> <span class="n">r</span> <span class="o">=</span> <span class="p">(</span><span class="n">t_parsei</span> <span class="n">∘</span> <span class="n">t_dedup</span><span class="p">[</span><span class="nc">Int</span><span class="p">])</span> <span class="n">⟐</span> <span class="p">{(</span><span class="n">x</span><span class="p">:</span><span class="nc">Int</span><span class="p">,</span> <span class="n">y</span><span class="p">:</span><span class="nc">Int</span><span class="p">)</span> <span class="o">=></span> <span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="p">}</span>
<span class="n">println</span><span class="p">(</span><span class="nc">List</span><span class="p">(</span><span class="s">"1"</span><span class="p">,</span><span class="s">"2"</span><span class="p">,</span><span class="s">"2"</span><span class="p">,</span><span class="s">"3"</span><span class="p">).</span><span class="n">foldLeft</span><span class="p">(</span><span class="mi">0</span><span class="p">)(</span><span class="n">r</span><span class="p">))</span>
<span class="n">println</span><span class="p">(</span><span class="nc">List</span><span class="p">(</span><span class="s">"3"</span><span class="p">,</span><span class="s">"2"</span><span class="p">,</span><span class="s">"1"</span><span class="p">).</span><span class="n">foldLeft</span><span class="p">(</span><span class="mi">0</span><span class="p">)(</span><span class="n">r</span><span class="p">))</span>
</code></pre></div>
<p>gives us <code>6</code> and then <code>3</code>, although both lists sum to 6 once adjacent duplicates are dropped. That won't do.</p>
<h4>Wrapping state</h4>
<p>Let's introduce a reduction holder, so we can store state alongside the reduction value,</p>
<div class="highlight"><pre><span></span><code> <span class="k">case</span> <span class="k">class</span> <span class="nc">RH</span><span class="p">[</span><span class="nc">R</span><span class="p">,</span><span class="nc">S</span><span class="p">](</span><span class="n">r</span><span class="p">:</span><span class="nc">R</span><span class="p">,</span> <span class="n">s</span><span class="p">:</span> <span class="nc">Option</span><span class="p">[</span><span class="nc">S</span><span class="p">])</span>
</code></pre></div>
<p>and rewrite <code>t_dedup</code> to use it:</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">t_dedup</span><span class="p">[</span><span class="nc">A</span><span class="p">]</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span><span class="nc">A</span><span class="p">]</span> <span class="p">{</span>
<span class="k">def</span> <span class="nf">apply</span><span class="p">[</span><span class="nc">R</span><span class="p">](</span><span class="n">rf</span> <span class="p">:</span> <span class="nc">RFn</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span><span class="nc">R</span><span class="p">])</span> <span class="p">:</span> <span class="nc">RFn</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span><span class="nc">RH</span><span class="p">[</span><span class="nc">R</span><span class="p">,</span><span class="nc">A</span><span class="p">]]</span><span class="o">=</span> <span class="p">{(</span><span class="n">rw</span><span class="p">,</span><span class="n">a</span><span class="p">)</span> <span class="o">=></span>
<span class="n">rw</span> <span class="k">match</span> <span class="p">{</span>
<span class="k">case</span> <span class="nc">RH</span><span class="p">(</span><span class="n">r</span><span class="p">,</span><span class="nc">None</span><span class="p">)</span> <span class="o">=></span> <span class="nc">RH</span><span class="p">(</span><span class="n">rf</span><span class="p">(</span><span class="n">r</span><span class="p">,</span><span class="n">a</span><span class="p">),</span><span class="nc">Some</span><span class="p">(</span><span class="n">a</span><span class="p">))</span>
<span class="k">case</span> <span class="nc">RH</span><span class="p">(</span><span class="n">r</span><span class="p">,</span><span class="nc">Some</span><span class="p">(</span><span class="n">a0</span><span class="p">))</span> <span class="o">=></span> <span class="k">if</span> <span class="p">(</span><span class="n">a</span><span class="o">==</span><span class="n">a0</span><span class="p">)</span> <span class="n">rw</span> <span class="k">else</span> <span class="nc">RH</span><span class="p">(</span><span class="n">rf</span><span class="p">(</span><span class="n">r</span><span class="p">,</span><span class="n">a</span><span class="p">),</span><span class="nc">Some</span><span class="p">(</span><span class="n">a</span><span class="p">))</span>
<span class="p">}</span>
<span class="p">}}</span>
</code></pre></div>
<p>Now, we match against the accumulated reduction rather than against a mutable variable, which is
totally awesome... except that it won't compile, because <code>apply</code> has the wrong return type.</p>
<p>No big deal, we'll redefine the transducer</p>
<div class="highlight"><pre><span></span><code> <span class="k">trait</span> <span class="nc">Transducer</span><span class="p">[</span><span class="o">+</span><span class="nc">A</span><span class="p">,</span> <span class="o">-</span><span class="nc">B</span><span class="p">,</span> <span class="nc">S</span><span class="p">]</span> <span class="p">{</span>
<span class="k">def</span> <span class="nf">apply</span><span class="p">[</span><span class="nc">R</span><span class="p">](</span><span class="n">f</span><span class="p">:</span> <span class="nc">RFn</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span> <span class="nc">R</span><span class="p">]):</span> <span class="nc">RFn</span><span class="p">[</span><span class="nc">B</span><span class="p">,</span> <span class="nc">RH</span><span class="p">[</span><span class="nc">R</span><span class="p">,</span><span class="nc">S</span><span class="p">]]</span>
<span class="n">...</span>
<span class="p">}</span>
</code></pre></div>
<p>so that function application <em>always</em> wraps up state. Sometimes, we won't need state, but we can always
store something and then not use it.</p>
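<p>To make the idea concrete, here is a sketch of a single stateful transducer under this scheme. It assumes the
three-parameter trait above (without variance annotations) and the <code>RH</code> holder; <code>dedup</code> and
<code>sum</code> are hypothetical names, and the fold seed has to be wrapped by hand.</p>
<div class="highlight"><pre><span></span><code>object StatefulSketch {
  type RFn[A, R] = (R, A) => R

  // Reduction holder: the running reduction plus optional state.
  case class RH[R, S](r: R, s: Option[S])

  // Three-parameter transducer: apply always returns a reducer over RH[R, S].
  trait Transducer[A, B, S] {
    def apply[R](rf: RFn[A, R]): RFn[B, RH[R, S]]
  }

  // Deduplicator whose "previous element" lives in the RH, not in a var.
  def dedup[A]: Transducer[A, A, A] = new Transducer[A, A, A] {
    def apply[R](rf: RFn[A, R]): RFn[A, RH[R, A]] = (rw, a) =>
      rw match {
        case RH(_, Some(a0)) if a0 == a => rw
        case RH(r, _)                   => RH(rf(r, a), Some(a))
      }
  }

  // A single transformed reducer, deliberately shared between two folds.
  val sum: RFn[Int, RH[Int, Int]] = dedup[Int].apply((x: Int, y: Int) => x + y)

  def main(args: Array[String]): Unit = {
    // The seed is an RH with empty state; unwrap .r at the end.
    println(List(1, 2, 2, 3).foldLeft(RH(0, Option.empty[Int]))(sum).r)
    println(List(3, 2, 1).foldLeft(RH(0, Option.empty[Int]))(sum).r)
  }
}
</code></pre></div>
<p>Because each fold starts from its own <code>RH(0, None)</code>, both folds produce 6 even though they share
<code>sum</code>: the state travels with the reduction instead of hiding inside the function.</p>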
<p>A much more serious problem is that we no longer know how to write <code>compose</code>.
We know that the composition of <code>Transducer[A,B,S]</code> and <code>Transducer[C,A,T]</code> will
need to have an <code>apply</code> method with return type <code>RFn[B,RH[RH[R,T],S]]</code>, as
the reduction is first wrapped with a <code>T</code> and then again with an <code>S</code>:</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">compose</span><span class="p">[</span><span class="nc">C</span><span class="p">](</span><span class="n">t2</span> <span class="p">:</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">C</span><span class="p">,</span> <span class="nc">A</span><span class="p">,</span> <span class="nc">T</span><span class="p">]):</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">C</span><span class="p">,</span> <span class="nc">B</span><span class="p">]</span> <span class="o">=</span>
<span class="k">new</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">C</span><span class="p">,</span> <span class="nc">B</span><span class="p">,</span> <span class="nc">DRAT_DRAT_AND_DOUBLE_DRAT</span> <span class="p">]</span> <span class="p">{</span>
<span class="k">def</span> <span class="nf">apply</span><span class="p">[</span><span class="nc">R</span><span class="p">](</span><span class="n">rf</span><span class="p">:</span> <span class="nc">RFn</span><span class="p">[</span><span class="nc">C</span><span class="p">,</span> <span class="nc">R</span><span class="p">])</span> <span class="p">:</span> <span class="nc">RFn</span><span class="p">[</span><span class="nc">B</span><span class="p">,</span> <span class="nc">RH</span><span class="p">[</span><span class="nc">RH</span><span class="p">[</span><span class="nc">R</span><span class="p">,</span><span class="nc">T</span><span class="p">],</span><span class="nc">S</span><span class="p">]]</span> <span class="o">=</span> <span class="n">apply</span><span class="p">(</span><span class="n">t2</span><span class="p">(</span><span class="n">rf</span><span class="p">))</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>but the exact type of the <code>Transducer</code> thus produced is not just unclear but, as yet, inexpressible.
Somehow, its type will need to embed <code>S</code> and <code>T</code> in a manner that lets us deduce the
proper wrapped return type for <code>apply</code>. Furthermore, this should work no matter how many compositions
we put together.</p>
<h4>Lists of Types</h4>
<p>Having poked around in the <a href="https://github.com/milessabin/shapeless">shapeless</a> codebase, we are... well, overall, we
are sad and confused, because it doesn't seem possible to ever be smart enough to understand this. But, we can
sort of pick up the idea that it's possible to build data structures with types:</p>
<div class="highlight"><pre><span></span><code> <span class="k">sealed</span> <span class="k">trait</span> <span class="nc">SList</span>
<span class="k">sealed</span> <span class="k">class</span> <span class="nc">SCons</span><span class="p">[</span><span class="nc">S</span><span class="p">,</span><span class="nc">T</span><span class="o"><:</span><span class="nc">SList</span><span class="p">]</span> <span class="k">extends</span> <span class="nc">SList</span>
<span class="k">sealed</span> <span class="k">class</span> <span class="nc">SNil</span> <span class="k">extends</span> <span class="nc">SList</span>
</code></pre></div>
<p>We can even write little programs, by declaring parameterized type members within the traits
and having them "call" each other recursively:</p>
<div class="highlight"><pre><span></span><code> <span class="k">sealed</span> <span class="k">trait</span> <span class="nc">SList</span> <span class="p">{</span>
<span class="cm">/* abstract */</span> <span class="k">type</span> <span class="nc">Concat</span><span class="p">[</span><span class="nc">S2</span> <span class="o"><:</span> <span class="nc">SList</span><span class="p">]</span> <span class="o"><:</span> <span class="nc">SList</span>
<span class="p">}</span>
<span class="k">sealed</span> <span class="k">class</span> <span class="nc">SCons</span><span class="p">[</span><span class="nc">S</span><span class="p">,</span><span class="nc">T</span><span class="o"><:</span><span class="nc">SList</span><span class="p">]</span> <span class="k">extends</span> <span class="nc">SList</span> <span class="p">{</span>
<span class="k">type</span> <span class="nc">Concat</span><span class="p">[</span><span class="nc">S2</span> <span class="o"><:</span> <span class="nc">SList</span><span class="p">]</span> <span class="o">=</span> <span class="nc">SCons</span><span class="p">[</span><span class="nc">S</span><span class="p">,</span><span class="nc">T</span><span class="n">#</span><span class="nc">Concat</span><span class="p">[</span><span class="nc">S2</span><span class="p">]]</span>
<span class="p">}</span>
<span class="k">sealed</span> <span class="k">class</span> <span class="nc">SNil</span> <span class="k">extends</span> <span class="nc">SList</span> <span class="p">{</span>
<span class="k">type</span> <span class="nc">Concat</span><span class="p">[</span><span class="nc">S2</span> <span class="o"><:</span> <span class="nc">SList</span><span class="p">]</span> <span class="o">=</span> <span class="nc">S2</span>
<span class="p">}</span>
</code></pre></div>
<p>As long as all the types are known, the compiler will merrily expand <code>SCons#Concat</code> recursively,
stopping when it hits an <code>SNil</code>:</p>
<div class="highlight"><pre><span></span><code> <span class="k">type</span> <span class="nc">S1</span> <span class="o">=</span> <span class="nc">SCons</span><span class="p">[</span><span class="nc">Int</span><span class="p">,</span><span class="nc">SNil</span><span class="p">]</span>
<span class="k">type</span> <span class="nc">S2</span> <span class="o">=</span> <span class="nc">SCons</span><span class="p">[</span><span class="nc">Double</span><span class="p">,</span><span class="nc">SCons</span><span class="p">[</span><span class="nc">String</span><span class="p">,</span><span class="nc">SNil</span><span class="p">]]</span>
<span class="k">type</span> <span class="nc">SS</span> <span class="o">=</span> <span class="nc">S2</span><span class="n">#</span><span class="nc">Concat</span><span class="p">[</span><span class="nc">S1</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">x0</span> <span class="p">:</span> <span class="nc">SS</span> <span class="o">=</span> <span class="kc">null</span><span class="p">.</span><span class="k">asInstanceOf</span><span class="p">[</span><span class="nc">SCons</span><span class="p">[</span><span class="nc">Double</span><span class="p">,</span><span class="nc">SCons</span><span class="p">[</span><span class="nc">String</span><span class="p">,</span> <span class="nc">SCons</span><span class="p">[</span><span class="nc">Int</span><span class="p">,</span><span class="nc">SNil</span><span class="p">]]]]</span>
</code></pre></div>
<p>(The last line shows a handy trick for verifying the resolution of complicated types. If the
cast didn't match the declaration, it wouldn't have compiled. There's a better trick, which
we'll use later.)</p>
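<p>One way to make the same check without a null or a cast, not necessarily the trick alluded to above, is to ask
the compiler for evidence that the two types are equal. The sketch below assumes a Scala 2 compiler (projections
on an abstract type such as <code>T#Concat</code> are not accepted by Scala 3); <code>ConcatCheck</code> and
<code>concatOk</code> are made-up names.</p>
<div class="highlight"><pre><span></span><code>object ConcatCheck {
  sealed trait SList { type Concat[S2 <: SList] <: SList }
  sealed class SCons[S, T <: SList] extends SList {
    type Concat[S2 <: SList] = SCons[S, T#Concat[S2]]
  }
  sealed class SNil extends SList {
    type Concat[S2 <: SList] = S2
  }

  type S1 = SCons[Int, SNil]
  type S2 = SCons[Double, SCons[String, SNil]]

  // Compiles only if the concatenation resolves to exactly the expected list of types.
  val concatOk =
    implicitly[S2#Concat[S1] =:= SCons[Double, SCons[String, SCons[Int, SNil]]]]
}
</code></pre></div>
<p>If the right-hand side is changed to any other list of types, the <code>implicitly</code> call fails to compile,
so the check costs nothing at runtime.</p>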
<p>So now we have half our answer. If we parameterize <code>Transducer</code> with <code>SList</code>, then
<code>compose</code> can simply return the concatenation:</p>
<div class="highlight"><pre><span></span><code> <span class="k">trait</span> <span class="nc">Transducer</span><span class="p">[</span><span class="o">+</span><span class="nc">A</span><span class="p">,</span> <span class="o">-</span><span class="nc">B</span><span class="p">,</span> <span class="nc">S</span> <span class="o"><:</span> <span class="nc">SList</span><span class="p">]</span> <span class="p">{</span>
<span class="k">def</span> <span class="nf">compose</span><span class="p">[</span><span class="nc">C</span><span class="p">,</span><span class="nc">T</span> <span class="o"><:</span> <span class="nc">SList</span><span class="p">](</span><span class="n">t2</span> <span class="p">:</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">C</span><span class="p">,</span><span class="nc">A</span><span class="p">,</span><span class="nc">T</span><span class="p">])</span> <span class="p">:</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">C</span><span class="p">,</span><span class="nc">B</span><span class="p">,</span><span class="nc">T</span><span class="n">#</span><span class="nc">Concat</span><span class="p">[</span><span class="nc">S</span><span class="p">]]</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">...}}</span>
</code></pre></div>
<p>Now we have to figure out what the various <code>apply</code> methods return. Can we infer the proper
<code>RH</code> nesting from the <code>SList</code>? We can define another type member to do the wrapping,</p>
<div class="highlight"><pre><span></span><code> <span class="k">sealed</span> <span class="k">class</span> <span class="nc">SCons</span><span class="p">[</span><span class="nc">S</span><span class="p">,</span><span class="nc">T</span><span class="o"><:</span><span class="nc">SList</span><span class="p">]</span> <span class="k">extends</span> <span class="nc">SList</span> <span class="p">{</span>
<span class="k">type</span> <span class="nc">Wrapped</span><span class="p">[</span><span class="nc">R</span><span class="p">]</span> <span class="o">=</span> <span class="nc">RH</span><span class="p">[</span><span class="nc">T</span><span class="n">#</span><span class="nc">Wrapped</span><span class="p">[</span><span class="nc">R</span><span class="p">],</span><span class="nc">S</span><span class="p">]</span>
<span class="k">type</span> <span class="nc">Concat</span><span class="p">[</span><span class="nc">S2</span> <span class="o"><:</span> <span class="nc">SList</span><span class="p">]</span> <span class="o">=</span> <span class="nc">SCons</span><span class="p">[</span><span class="nc">S</span><span class="p">,</span><span class="nc">T</span><span class="n">#</span><span class="nc">Concat</span><span class="p">[</span><span class="nc">S2</span><span class="p">]]</span>
<span class="p">}</span>
<span class="k">sealed</span> <span class="k">class</span> <span class="nc">SNil</span> <span class="k">extends</span> <span class="nc">SList</span> <span class="p">{</span>
<span class="k">type</span> <span class="nc">Wrapped</span><span class="p">[</span><span class="nc">R</span><span class="p">]</span> <span class="o">=</span> <span class="nc">R</span>
<span class="k">type</span> <span class="nc">Concat</span><span class="p">[</span><span class="nc">S2</span> <span class="o"><:</span> <span class="nc">SList</span><span class="p">]</span> <span class="o">=</span> <span class="nc">S2</span>
<span class="p">}</span>
</code></pre></div>
<p>so <code>SCons[S1,SCons[S2,SCons[S3,SNil]]]#Wrapped[R]</code> expands to</p>
<ol>
<li><code>RH[SCons[S2,SCons[S3,SNil]]#Wrapped[R], S1]</code> to</li>
<li><code>RH[RH[SCons[S3,SNil]#Wrapped[R],S2],S1]</code> to</li>
<li><code>RH[RH[RH[SNil#Wrapped[R],S3],S2],S1]</code> to</li>
<li><code>RH[RH[RH[R,S3],S2],S1]</code></li>
</ol>
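<p>The expansion can be checked mechanically, and it also tells us what an initial reduction value has to look like.
This is a sketch under the same Scala 2 assumption as the type projections above, with <code>Int</code>,
<code>String</code> and <code>Boolean</code> as arbitrary stand-ins for <code>S1</code>, <code>S2</code> and
<code>S3</code>; the names <code>Three</code>, <code>wrappedOk</code> and <code>seed</code> are illustrative only.</p>
<div class="highlight"><pre><span></span><code>object WrappedCheck {
  case class RH[R, S](r: R, s: Option[S])

  sealed trait SList { type Wrapped[R] }
  sealed class SCons[S, T <: SList] extends SList {
    type Wrapped[R] = RH[T#Wrapped[R], S]
  }
  sealed class SNil extends SList {
    type Wrapped[R] = R
  }

  // Arbitrary state types standing in for S1, S2, S3.
  type Three = SCons[Int, SCons[String, SCons[Boolean, SNil]]]

  // Compile-time check of the four-step expansion.
  val wrappedOk =
    implicitly[Three#Wrapped[Double] =:= RH[RH[RH[Double, Boolean], String], Int]]

  // The matching runtime value: a reduction of 0.0 with every layer of state empty.
  val seed: Three#Wrapped[Double] =
    RH(RH(RH(0.0, Option.empty[Boolean]), Option.empty[String]), Option.empty[Int])
}
</code></pre></div>
<p>The innermost holder carries the state of the last list element and the outermost carries the state of the first,
which is the nesting order spelled out in the list above.</p>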
<h4>Building the state-aware Transducer</h4>
<p>Thus armed, we can now write out the <code>compose</code> method, breaking it up to reassure ourselves that
the types line up:</p>
<div class="highlight"><pre><span></span><code> <span class="k">trait</span> <span class="nc">Transducer</span><span class="p">[</span><span class="o">+</span><span class="nc">A</span><span class="p">,</span> <span class="o">-</span><span class="nc">B</span><span class="p">,</span> <span class="nc">S</span> <span class="o"><:</span> <span class="nc">SList</span><span class="p">]</span> <span class="p">{</span>
<span class="k">def</span> <span class="nf">apply</span><span class="p">[</span><span class="nc">R</span><span class="p">](</span><span class="n">rf</span> <span class="p">:</span> <span class="nc">RFn</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span><span class="nc">R</span><span class="p">])</span> <span class="p">:</span> <span class="nc">RFn</span> <span class="p">[</span><span class="nc">B</span><span class="p">,</span><span class="nc">S</span><span class="n">#</span><span class="nc">Wrapped</span><span class="p">[</span><span class="nc">R</span><span class="p">]]</span>
<span class="k">def</span> <span class="nf">compose</span><span class="p">[</span><span class="nc">C</span><span class="p">,</span><span class="nc">T</span> <span class="o"><:</span> <span class="nc">SList</span><span class="p">](</span><span class="n">t2</span> <span class="p">:</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">C</span><span class="p">,</span><span class="nc">A</span><span class="p">,</span><span class="nc">T</span><span class="p">])</span> <span class="p">:</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">C</span><span class="p">,</span><span class="nc">B</span><span class="p">,</span><span class="nc">T</span><span class="n">#</span><span class="nc">Concat</span><span class="p">[</span><span class="nc">S</span><span class="p">]]</span> <span class="o">=</span> <span class="p">{</span>
<span class="kd">val</span> <span class="n">t1</span> <span class="o">=</span> <span class="bp">this</span>
<span class="k">new</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">C</span><span class="p">,</span><span class="nc">B</span><span class="p">,</span><span class="nc">T</span><span class="n">#</span><span class="nc">Concat</span><span class="p">[</span><span class="nc">S</span><span class="p">]]</span> <span class="p">{</span>
<span class="k">def</span> <span class="nf">apply</span><span class="p">[</span><span class="nc">R</span><span class="p">](</span><span class="n">rf</span><span class="p">:</span> <span class="nc">RFn</span><span class="p">[</span><span class="nc">C</span><span class="p">,</span><span class="nc">R</span><span class="p">])</span> <span class="p">:</span> <span class="nc">RFn</span><span class="p">[</span><span class="nc">B</span><span class="p">,(</span><span class="nc">T</span><span class="n">#</span><span class="nc">Concat</span><span class="p">[</span><span class="nc">S</span><span class="p">])</span><span class="n">#</span><span class="nc">Wrapped</span><span class="p">[</span><span class="nc">R</span><span class="p">]]</span> <span class="o">=</span> <span class="p">{</span>
<span class="kd">val</span> <span class="n">rf2</span> <span class="p">:</span> <span class="nc">RFn</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span><span class="nc">T</span><span class="n">#</span><span class="nc">Wrapped</span><span class="p">[</span><span class="nc">R</span><span class="p">]]</span> <span class="o">=</span> <span class="n">t2</span><span class="p">(</span><span class="n">rf</span><span class="p">)</span>
<span class="kd">val</span> <span class="n">rf3</span> <span class="p">:</span> <span class="nc">RFn</span><span class="p">[</span><span class="nc">B</span><span class="p">,</span> <span class="nc">S</span><span class="n">#</span><span class="nc">Wrapped</span><span class="p">[</span><span class="nc">T</span><span class="n">#</span><span class="nc">Wrapped</span><span class="p">[</span><span class="nc">R</span><span class="p">]]]</span> <span class="o">=</span> <span class="n">t1</span><span class="p">(</span><span class="n">rf2</span><span class="p">)</span>
<span class="n">rf3</span><span class="p">.</span><span class="k">asInstanceOf</span><span class="p">[</span><span class="nc">RFn</span><span class="p">[</span><span class="nc">B</span><span class="p">,</span><span class="nc">T</span><span class="n">#</span><span class="nc">Concat</span><span class="p">[</span><span class="nc">S</span><span class="p">]</span><span class="n">#</span><span class="nc">Wrapped</span><span class="p">[</span><span class="nc">R</span><span class="p">]]]</span>
<span class="p">}}</span>
<span class="p">}</span>
</code></pre></div>
<p>That reassurance stops at the last line: the equivalence of <code>T#Concat[S]#Wrapped[R]</code> and <code>S#Wrapped[T#Wrapped[R]]</code> seems to
be beyond the capability of the type engine to prove. Given concrete types for <code>R</code>, <code>S</code>
and <code>T</code>, it could expand both expressions to the same nest of <code>RH</code>s, but, here in the depths of our transducer,
it doesn't know the concrete types.</p>
<p>They are, however, generally known at the time that <code>compose</code> is called, where
we can force the compiler to check our math by using a type constraint.
We'll add an implicit instance of the <code>=:=</code> class to the <code>compose</code> declaration,
along the lines of</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="k">implicit</span> <span class="n">ok</span> <span class="p">:</span> <span class="nc">T</span><span class="n">#</span><span class="nc">Concat</span><span class="p">[</span><span class="nc">S</span><span class="p">]</span><span class="n">#</span><span class="nc">Wrapped</span><span class="p">[</span><span class="nc">R</span><span class="p">]</span> <span class="o">=:=</span> <span class="nc">S</span><span class="n">#</span><span class="nc">Wrapped</span><span class="p">[</span><span class="nc">T</span><span class="n">#</span><span class="nc">Wrapped</span><span class="p">[</span><span class="nc">R</span><span class="p">]])</span>
</code></pre></div>
<p>which is especially good fun as we can use infix form. The compiler will now
alert us if these two types for some reason aren't identical in a particular concrete
instance.</p>
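<p>As a rough illustration of the mechanism (not the transducer code itself), here is a minimal sketch
of how <code>=:=</code> evidence gets resolved at a call site where the types are concrete. The names
<code>Plain</code>, <code>Boxed</code> and <code>sameType</code> are made up for this example, and it assumes
Scala 2, where projections like <code>Plain#Wrapped[Int]</code> are allowed:</p>
<div class="highlight"><pre><span></span><code>object EvidenceSketch {
  // Hypothetical stand-ins for the wrapping directives: each one says how to wrap an R.
  trait Plain { type Wrapped[R] = R }
  trait Boxed { type Wrapped[R] = Option[R] }

  // Compiles only at call sites where the compiler can prove that A and B are the same type.
  def sameType[A, B](implicit ev: A =:= B): String = "types agree"

  def main(args: Array[String]): Unit = {
    // With concrete types, both sides expand to the same thing, so evidence is found.
    println(sameType[Plain#Wrapped[Int], Int])
    println(sameType[Boxed#Wrapped[Plain#Wrapped[Int]], Option[Int]])
    // sameType[Boxed#Wrapped[Int], Int]   // no evidence: this line would not compile
  }
}
</code></pre></div>
<p>The point is only that the check happens where the type arguments are known, which is the same
trick the constraint on <code>compose</code> relies on.</p>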
<p>Of course, we won't know <code>R</code> until <code>apply</code> time, but our primary concern is that all the
fiddly wrapping works out, which is really independent of <code>R</code>, so we'll use <code>Any</code> instead.</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">compose</span><span class="p">[</span><span class="nc">C</span><span class="p">,</span><span class="nc">T</span> <span class="o"><:</span> <span class="nc">SList</span><span class="p">](</span><span class="n">t2</span> <span class="p">:</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">C</span><span class="p">,</span><span class="nc">A</span><span class="p">,</span><span class="nc">T</span><span class="p">])</span>
<span class="p">(</span><span class="k">implicit</span> <span class="n">ok</span> <span class="p">:</span> <span class="nc">T</span><span class="n">#</span><span class="nc">Concat</span><span class="p">[</span><span class="nc">S</span><span class="p">]</span><span class="n">#</span><span class="nc">Wrapped</span><span class="p">[</span><span class="nc">Any</span><span class="p">]</span> <span class="o">=:=</span> <span class="nc">S</span><span class="n">#</span><span class="nc">Wrapped</span><span class="p">[</span><span class="nc">T</span><span class="n">#</span><span class="nc">Wrapped</span><span class="p">[</span><span class="nc">Any</span><span class="p">]])</span> <span class="p">:</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">C</span><span class="p">,</span><span class="nc">B</span><span class="p">,</span><span class="nc">T</span><span class="n">#</span><span class="nc">Concat</span><span class="p">[</span><span class="nc">S</span><span class="p">]]</span> <span class="o">=</span> <span class="p">{</span>
<span class="kd">val</span> <span class="n">t1</span> <span class="o">=</span> <span class="bp">this</span>
<span class="k">new</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">C</span><span class="p">,</span><span class="nc">B</span><span class="p">,</span><span class="nc">T</span><span class="n">#</span><span class="nc">Concat</span><span class="p">[</span><span class="nc">S</span><span class="p">]]</span> <span class="p">{</span>
<span class="k">def</span> <span class="nf">apply</span><span class="p">[</span><span class="nc">R</span><span class="p">](</span><span class="n">rf</span><span class="p">:</span> <span class="nc">RFn</span><span class="p">[</span><span class="nc">C</span><span class="p">,</span><span class="nc">R</span><span class="p">])</span> <span class="p">:</span> <span class="nc">RFn</span><span class="p">[</span><span class="nc">B</span><span class="p">,(</span><span class="nc">T</span><span class="n">#</span><span class="nc">Concat</span><span class="p">[</span><span class="nc">S</span><span class="p">])</span><span class="n">#</span><span class="nc">Wrapped</span><span class="p">[</span><span class="nc">R</span><span class="p">]]</span> <span class="o">=</span>
<span class="n">t1</span><span class="p">(</span><span class="n">t2</span><span class="p">(</span><span class="n">rf</span><span class="p">)).</span><span class="k">asInstanceOf</span><span class="p">[</span><span class="nc">RFn</span><span class="p">[</span><span class="nc">B</span><span class="p">,</span><span class="nc">T</span><span class="n">#</span><span class="nc">Concat</span><span class="p">[</span><span class="nc">S</span><span class="p">]</span><span class="n">#</span><span class="nc">Wrapped</span><span class="p">[</span><span class="nc">R</span><span class="p">]]]</span>
<span class="p">}}</span>
<span class="k">def</span> <span class="nf">∘</span><span class="p">[</span><span class="nc">C</span><span class="p">,</span><span class="nc">T</span> <span class="o"><:</span> <span class="nc">SList</span><span class="p">](</span><span class="n">t2</span> <span class="p">:</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">C</span><span class="p">,</span><span class="nc">A</span><span class="p">,</span><span class="nc">T</span><span class="p">])(</span><span class="k">implicit</span> <span class="n">ok</span> <span class="p">:</span> <span class="nc">T</span><span class="n">#</span><span class="nc">Concat</span><span class="p">[</span><span class="nc">S</span><span class="p">]</span><span class="n">#</span><span class="nc">Wrapped</span><span class="p">[</span><span class="nc">Any</span><span class="p">]</span> <span class="o">=:=</span> <span class="nc">S</span><span class="n">#</span><span class="nc">Wrapped</span><span class="p">[</span><span class="nc">T</span><span class="n">#</span><span class="nc">Wrapped</span><span class="p">[</span><span class="nc">Any</span><span class="p">]])</span> <span class="o">=</span> <span class="n">compose</span><span class="p">[</span><span class="nc">C</span><span class="p">,</span><span class="nc">T</span><span class="p">]</span> <span class="p">(</span><span class="n">t2</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">⟐</span><span class="p">[</span><span class="nc">R</span><span class="p">]</span> <span class="o">=</span> <span class="n">apply</span><span class="p">[</span><span class="nc">R</span><span class="p">]</span> <span class="n">_</span>
</code></pre></div>
<p>The cast is still necessary, but it feels a bit safer now.
(If anyone knows how to avoid the cast entirely, I'd love to hear it...)</p>
<p>We can now write a new deduplicator, totally free of <code>var</code>s and side-effects,
that passes along state to itself by
wrapping it up with the succeeding reduction value and declares its
intention to do so with the wrapping directive type <code>SCons[A,SNil]</code>:</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">t_dedup</span><span class="p">[</span><span class="nc">A</span><span class="p">]</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span><span class="nc">A</span><span class="p">,</span><span class="nc">SCons</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span><span class="nc">SNil</span><span class="p">]]</span> <span class="p">{</span>
<span class="k">def</span> <span class="nf">apply</span><span class="p">[</span><span class="nc">R</span><span class="p">](</span><span class="n">rf</span> <span class="p">:</span> <span class="nc">RFn</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span><span class="nc">R</span><span class="p">])</span> <span class="p">:</span> <span class="nc">RFn</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span><span class="nc">RH</span><span class="p">[</span><span class="nc">R</span><span class="p">,</span><span class="nc">A</span><span class="p">]]</span> <span class="o">=</span> <span class="p">{(</span><span class="n">rw</span><span class="p">,</span><span class="n">a</span><span class="p">)</span> <span class="o">=></span>
<span class="n">rw</span> <span class="k">match</span> <span class="p">{</span>
<span class="k">case</span> <span class="nc">RH</span><span class="p">(</span><span class="n">r</span><span class="p">,</span><span class="nc">None</span><span class="p">)</span> <span class="o">=></span> <span class="nc">RH</span><span class="p">(</span><span class="n">rf</span><span class="p">(</span><span class="n">r</span><span class="p">,</span><span class="n">a</span><span class="p">),</span><span class="nc">Some</span><span class="p">(</span><span class="n">a</span><span class="p">))</span>
<span class="k">case</span> <span class="nc">RH</span><span class="p">(</span><span class="n">r</span><span class="p">,</span><span class="nc">Some</span><span class="p">(</span><span class="n">a0</span><span class="p">))</span> <span class="o">=></span> <span class="k">if</span> <span class="p">(</span><span class="n">a</span><span class="o">==</span><span class="n">a0</span><span class="p">)</span> <span class="n">rw</span> <span class="k">else</span> <span class="nc">RH</span><span class="p">(</span><span class="n">rf</span><span class="p">(</span><span class="n">r</span><span class="p">,</span><span class="n">a</span><span class="p">),</span><span class="nc">Some</span><span class="p">(</span><span class="n">a</span><span class="p">))</span>
<span class="p">}</span>
<span class="p">}}</span>
</code></pre></div>
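<p>Stripped of the type-level bookkeeping, the idea is just a reducing function whose accumulator carries
the previous element along with the reduction value. The following is a self-contained sketch under that
assumption; <code>Holder</code> and <code>dedupStep</code> are hypothetical names standing in for <code>RH</code>
and the body of <code>t_dedup</code>:</p>
<div class="highlight"><pre><span></span><code>object DedupFoldSketch {
  // Hypothetical stand-in for RH: a reduction value plus the last element seen, if any.
  final case class Holder[R, A](r: R, last: Option[A])

  // Turn a plain reducing function into one that skips consecutive duplicates,
  // keeping the "previous element" state inside the accumulator rather than in a var.
  def dedupStep[R, A](rf: (R, A) => R): (Holder[R, A], A) => Holder[R, A] =
    (h, a) => h.last match {
      case Some(a0) if a0 == a => h                     // repeat: pass the accumulator through
      case _ => Holder(rf(h.r, a), Some(a))  // new value: reduce and remember it
    }

  def main(args: Array[String]): Unit = {
    val step = dedupStep[Vector[Int], Int](_ :+ _)
    val start = Holder(Vector.empty[Int], Option.empty[Int])
    val out = List(1, 2, 2, 3, 3, 1).foldLeft(start)(step)
    println(out.r) // Vector(1, 2, 3, 1)
  }
}
</code></pre></div>
<p>Because the state lives in the value being folded, reusing <code>step</code> for a second fold starts
from whatever initial <code>Holder</code> is supplied, with nothing left over from the first run.</p>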
<p>The standard factories for simple mappings and filterings are essentially
identical to the state-unaware versions, because <code>SNil#Wrapped[R]</code> is just <code>R</code>:</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">map</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span> <span class="nc">B</span><span class="p">](</span><span class="n">f</span><span class="p">:</span> <span class="nc">A</span> <span class="o">=></span> <span class="nc">B</span><span class="p">):</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">B</span><span class="p">,</span> <span class="nc">A</span><span class="p">,</span> <span class="nc">SNil</span><span class="p">]</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">B</span><span class="p">,</span> <span class="nc">A</span><span class="p">,</span> <span class="nc">SNil</span><span class="p">]</span> <span class="p">{</span>
<span class="k">def</span> <span class="nf">apply</span><span class="p">[</span><span class="nc">R</span><span class="p">](</span><span class="n">rf</span><span class="p">:</span> <span class="nc">RFn</span><span class="p">[</span><span class="nc">B</span><span class="p">,</span> <span class="nc">R</span><span class="p">])</span> <span class="p">:</span> <span class="nc">RFn</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span><span class="nc">R</span><span class="p">]</span><span class="o">=</span> <span class="p">(</span><span class="n">r</span><span class="p">,</span> <span class="n">a</span><span class="p">)</span> <span class="o">=></span> <span class="n">rf</span><span class="p">(</span><span class="n">r</span><span class="p">,</span><span class="n">f</span><span class="p">(</span><span class="n">a</span><span class="p">))</span> <span class="p">}</span>
<span class="k">def</span> <span class="nf">filter</span><span class="p">[</span><span class="nc">A</span><span class="p">](</span><span class="n">p</span><span class="p">:</span> <span class="nc">A</span> <span class="o">=></span> <span class="nc">Boolean</span><span class="p">):</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span> <span class="nc">A</span><span class="p">,</span> <span class="nc">SNil</span><span class="p">]</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span> <span class="nc">A</span><span class="p">,</span> <span class="nc">SNil</span><span class="p">]</span> <span class="p">{</span>
<span class="k">def</span> <span class="nf">apply</span><span class="p">[</span><span class="nc">R</span><span class="p">](</span><span class="n">rf</span><span class="p">:</span> <span class="nc">RFn</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span> <span class="nc">R</span><span class="p">])</span> <span class="p">:</span> <span class="nc">RFn</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span><span class="nc">R</span><span class="p">]</span><span class="o">=</span> <span class="p">(</span><span class="n">r</span><span class="p">,</span> <span class="n">a</span><span class="p">)</span> <span class="o">=></span> <span class="k">if</span> <span class="p">(</span><span class="n">p</span><span class="p">(</span><span class="n">a</span><span class="p">))</span> <span class="n">rf</span><span class="p">(</span><span class="n">r</span><span class="p">,</span><span class="n">a</span><span class="p">)</span> <span class="k">else</span> <span class="n">r</span> <span class="p">}</span>
</code></pre></div>
<h4>The state of statelessness</h4>
<p>The last, somewhat annoying, piece we're going to need is a way to wrap up the initial
reduction value in layers of <code>RH(...,None)</code> and then to unwrap the final reduction.</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">wrap</span><span class="p">[</span><span class="nc">R</span><span class="p">,</span><span class="nc">S</span><span class="o"><:</span><span class="nc">SList</span><span class="p">](</span><span class="n">r</span><span class="p">:</span><span class="nc">R</span><span class="p">):</span> <span class="nc">S</span><span class="n">#</span><span class="nc">Wrapped</span><span class="p">[</span><span class="nc">R</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">unwrap</span><span class="p">[</span><span class="nc">R</span><span class="p">,</span><span class="nc">S</span><span class="o"><:</span><span class="nc">SList</span><span class="p">](</span><span class="n">fr</span> <span class="p">:</span> <span class="nc">S</span><span class="n">#</span><span class="nc">Wrapped</span><span class="p">[</span><span class="nc">R</span><span class="p">])</span> <span class="p">:</span> <span class="nc">R</span>
</code></pre></div>
<p>The reason these are annoying is that the object wrapping naturally follows the
type wrapping, but we don't have a particularly beautiful way of accessing the wrapped
type at runtime. Instead, I think, we have to use the unbeautiful workaround of manifests,
which are always introduced with the warning that they're experimental and might be
removed from Scala some day but for now you should look at the source file.</p>
<p>If we add to <code>wrap</code> an <code>(implicit ms : Manifest[S])</code>
then an instance of <code>ms</code> will be available at runtime, built according
to a recipe captured at the time of compilation. From the manifest, you
can extract quite a lot of information, but for our purposes, the most important
bit is that <code>typeArguments</code> will return a list of manifests corresponding
to type parameters.</p>
<p>We expect <code>Manifest[SCons]</code> to have two type parameters, of which the second
is <code>Manifest[SList]</code>, and <code>Manifest[SNil]</code> to have no type parameters.
When the following function is called as <code>sdepth[S]</code>, it will implicitly receive
a complete manifest of the nested <code>SList</code>; then it will call itself recursively
with successively peeled back manifests, counting up the number of layers:</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">sdepth</span><span class="p">[</span><span class="nc">S</span> <span class="o"><:</span> <span class="nc">SList</span><span class="p">](</span><span class="k">implicit</span> <span class="n">ms</span> <span class="p">:</span> <span class="nc">Manifest</span><span class="p">[</span><span class="nc">S</span><span class="p">])</span> <span class="p">:</span> <span class="nc">Int</span> <span class="o">=</span> <span class="n">ms</span><span class="p">.</span><span class="n">typeArguments</span> <span class="k">match</span> <span class="p">{</span>
<span class="k">case</span> <span class="nc">Nil</span> <span class="o">=></span> <span class="mi">0</span>
<span class="k">case</span> <span class="n">_</span> <span class="o">=></span> <span class="n">sdepth</span><span class="p">(</span><span class="n">ms</span><span class="p">.</span><span class="n">typeArguments</span><span class="p">(</span><span class="mi">1</span><span class="p">).</span><span class="k">asInstanceOf</span><span class="p">[</span><span class="nc">Manifest</span><span class="p">[</span><span class="nc">S</span><span class="p">]])</span> <span class="o">+</span> <span class="mi">1</span>
<span class="p">}</span>
</code></pre></div>
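<p>To see the peeling in isolation, here is a small sketch that counts nesting depth on an ordinary
standard-library type instead of an <code>SList</code>. It assumes Scala 2 (where the compiler supplies
<code>Manifest</code> instances via <code>Predef.manifest</code>), and <code>depth</code> is a name invented
for the example:</p>
<div class="highlight"><pre><span></span><code>object ManifestSketch {
  // Follow the last type argument inward, counting layers until there are no arguments left.
  def depth(m: Manifest[_]): Int = m.typeArguments match {
    case Nil  => 0
    case args => 1 + depth(args.last)
  }

  def main(args: Array[String]): Unit = {
    println(depth(manifest[Int]))                                 // 0
    println(depth(manifest[Either[String, Either[String, Int]]])) // 2
  }
}
</code></pre></div>
<p>Taking the manifest as an explicit argument sidesteps the cast in the recursive call, at the cost
of losing the <code>SList</code> bound on what can be passed in.</p>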
<p>We can now use <code>sdepth</code> to direct the wrapping and unwrapping festivities. There
are still casts that I wish we could get rid of, but I don't see how:</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">wrap</span><span class="p">[</span><span class="nc">R</span><span class="p">,</span><span class="nc">S</span><span class="o"><:</span><span class="nc">SList</span><span class="p">](</span><span class="n">r</span><span class="p">:</span><span class="nc">R</span><span class="p">)(</span><span class="k">implicit</span> <span class="n">ms</span> <span class="p">:</span> <span class="nc">Manifest</span><span class="p">[</span><span class="nc">S</span><span class="p">]):</span> <span class="nc">S</span><span class="n">#</span><span class="nc">Wrapped</span><span class="p">[</span><span class="nc">R</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span>
<span class="n">sdepth</span><span class="p">[</span><span class="nc">S</span><span class="p">]</span> <span class="k">match</span> <span class="p">{</span>
<span class="k">case</span> <span class="mi">0</span> <span class="o">=></span> <span class="n">r</span><span class="p">.</span><span class="k">asInstanceOf</span><span class="p">[</span><span class="nc">S</span><span class="n">#</span><span class="nc">Wrapped</span><span class="p">[</span><span class="nc">R</span><span class="p">]]</span>
<span class="k">case</span> <span class="n">n</span> <span class="o">=></span> <span class="p">{</span>
<span class="kd">var</span> <span class="n">x</span> <span class="p">:</span> <span class="nc">Object</span> <span class="o">=</span> <span class="nc">RH</span><span class="p">(</span><span class="n">r</span><span class="p">,</span><span class="nc">None</span><span class="p">)</span>
<span class="k">for</span><span class="p">(</span><span class="n">i</span> <span class="o"><-</span> <span class="mi">1</span> <span class="n">until</span> <span class="n">n</span><span class="p">)</span> <span class="p">{</span><span class="n">x</span> <span class="o">=</span> <span class="nc">RH</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="nc">None</span><span class="p">)}</span>
<span class="n">x</span><span class="p">.</span><span class="k">asInstanceOf</span><span class="p">[</span><span class="nc">S</span><span class="n">#</span><span class="nc">Wrapped</span><span class="p">[</span><span class="nc">R</span><span class="p">]]</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">def</span> <span class="nf">unwrap</span><span class="p">[</span><span class="nc">R</span><span class="p">,</span><span class="nc">S</span><span class="o"><:</span><span class="nc">SList</span><span class="p">](</span><span class="n">fd</span> <span class="p">:</span> <span class="nc">S</span><span class="n">#</span><span class="nc">Wrapped</span><span class="p">[</span><span class="nc">R</span><span class="p">])(</span><span class="k">implicit</span> <span class="n">ms</span> <span class="p">:</span> <span class="nc">Manifest</span><span class="p">[</span><span class="nc">S</span><span class="p">])</span> <span class="p">:</span> <span class="nc">R</span> <span class="o">=</span> <span class="p">{</span>
<span class="kd">var</span> <span class="n">x</span> <span class="p">:</span> <span class="nc">Any</span> <span class="o">=</span> <span class="n">fd</span>
<span class="k">for</span><span class="p">(</span><span class="n">i</span> <span class="o"><-</span> <span class="mi">1</span> <span class="n">to</span> <span class="n">sdepth</span><span class="p">[</span><span class="nc">S</span><span class="p">])</span> <span class="p">{</span><span class="n">x</span> <span class="o">=</span> <span class="n">x</span><span class="p">.</span><span class="k">asInstanceOf</span><span class="p">[</span><span class="nc">RH</span><span class="p">[</span><span class="nc">AnyRef</span><span class="p">,</span><span class="nc">AnyRef</span><span class="p">]].</span><span class="n">r</span><span class="p">}</span>
<span class="n">x</span><span class="p">.</span><span class="k">asInstanceOf</span><span class="p">[</span><span class="nc">R</span><span class="p">]</span>
<span class="p">}</span>
</code></pre></div>
<p>The standard <code>sequence</code> function is now a one-liner (assuming that type declarations
don't count, which would in fact be a bit silly in a discussion of type programming):</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">sequence</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span><span class="nc">B</span><span class="p">,</span><span class="nc">S</span> <span class="o"><:</span> <span class="nc">SList</span> <span class="p">:</span> <span class="nc">Manifest</span><span class="p">](</span><span class="n">tr</span><span class="p">:</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">B</span><span class="p">,</span><span class="nc">A</span><span class="p">,</span><span class="nc">S</span><span class="p">],</span> <span class="n">data</span><span class="p">:</span> <span class="nc">Seq</span><span class="p">[</span><span class="nc">A</span><span class="p">])</span> <span class="p">:</span> <span class="nc">Seq</span><span class="p">[</span><span class="nc">B</span><span class="p">]</span> <span class="o">=</span>
<span class="n">unwrap</span> <span class="p">(</span><span class="n">data</span><span class="p">.</span><span class="n">foldLeft</span><span class="p">(</span><span class="n">wrap</span><span class="p">(</span><span class="n">data</span><span class="p">.</span><span class="n">companion</span><span class="p">.</span><span class="n">empty</span><span class="p">[</span><span class="nc">B</span><span class="p">]))(</span><span class="n">tr</span> <span class="p">{</span><span class="n">_</span> <span class="o">:+</span> <span class="n">_</span><span class="p">}))</span>
</code></pre></div>
<p>It does quite a lot:</p>
<ol>
<li>It initializes an empty <code>Seq[B]</code> of the same container type as the input <code>Seq[A]</code>,</li>
<li><code>wrap</code>s it up as an <code>S#Wrapped[Seq[B]]</code>,</li>
<li>transduces the <code>(Seq[B],B) => Seq[B]</code> append function <code>_ :+ _</code></li>
<li>into a function <code>(S#Wrapped[Seq[B]],A) => S#Wrapped[Seq[B]]</code>,</li>
<li>reduces (folds) into an <code>S#Wrapped[Seq[B]]</code>, which it</li>
<li>unwraps into a <code>Seq[B]</code>.</li>
</ol>
<p>Along the same lines, it will be handy to have a state-aware <code>foldLeft</code> to do
the same sort of prep work for general reductions:</p>
<div class="highlight"><pre><span></span><code> <span class="k">class</span> <span class="nc">FoldableSA</span><span class="p">[</span><span class="nc">A</span><span class="p">](</span><span class="n">as</span><span class="p">:</span><span class="nc">Seq</span><span class="p">[</span><span class="nc">A</span><span class="p">])</span> <span class="p">{</span>
<span class="k">def</span> <span class="nf">foldLeftSA</span><span class="p">[</span><span class="nc">B</span><span class="p">,</span><span class="nc">S</span><span class="o"><:</span><span class="nc">SList</span><span class="p">:</span><span class="nc">Manifest</span><span class="p">](</span><span class="n">z</span><span class="p">:</span><span class="nc">B</span><span class="p">)(</span><span class="n">rf</span><span class="p">:</span> <span class="nc">RFn</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span> <span class="nc">S</span><span class="n">#</span><span class="nc">Wrapped</span><span class="p">[</span><span class="nc">B</span><span class="p">]])</span> <span class="o">=</span>
<span class="n">unwrap</span><span class="p">(</span><span class="n">as</span><span class="p">.</span><span class="n">foldLeft</span><span class="p">(</span><span class="n">wrap</span><span class="p">(</span><span class="n">z</span><span class="p">))(</span><span class="n">rf</span><span class="p">))</span>
<span class="p">}</span>
<span class="k">implicit</span> <span class="k">def</span> <span class="nf">toFoldableSA</span><span class="p">[</span><span class="nc">A</span><span class="p">](</span><span class="n">as</span><span class="p">:</span><span class="nc">Seq</span><span class="p">[</span><span class="nc">A</span><span class="p">])</span> <span class="p">:</span> <span class="nc">FoldableSA</span><span class="p">[</span><span class="nc">A</span><span class="p">]</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">FoldableSA</span><span class="p">[</span><span class="nc">A</span><span class="p">](</span><span class="n">as</span><span class="p">)</span>
</code></pre></div>
<h4>Rock and Roll</h4>
<p>Let's verify that we don't have the same hidden state problems as the previous implementations:</p>
<div class="highlight"><pre><span></span><code> <span class="kd">val</span> <span class="n">t_parsei</span><span class="p">:</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">Int</span><span class="p">,</span> <span class="nc">String</span><span class="p">,</span> <span class="nc">SNil</span><span class="p">]</span> <span class="o">=</span> <span class="n">map</span><span class="p">(</span><span class="n">_</span><span class="p">.</span><span class="n">toInt</span><span class="p">)</span>
<span class="kd">val</span> <span class="n">t_root2</span> <span class="p">:</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">Double</span><span class="p">,</span> <span class="nc">Int</span><span class="p">,</span> <span class="nc">SNil</span><span class="p">]</span> <span class="o">=</span> <span class="n">map</span><span class="p">(</span><span class="n">i</span> <span class="o">=></span> <span class="nc">Math</span><span class="p">.</span><span class="n">pow</span><span class="p">(</span><span class="mf">2.0</span><span class="p">,</span> <span class="mf">1.0</span> <span class="o">/</span> <span class="n">i</span><span class="p">))</span>
<span class="kd">val</span> <span class="n">t</span> <span class="o">=</span> <span class="n">t_parsei</span> <span class="n">∘</span> <span class="n">t_dedup</span><span class="p">[</span><span class="nc">Int</span><span class="p">]</span>
<span class="kd">val</span> <span class="n">r</span> <span class="o">=</span> <span class="p">(</span><span class="n">t_parsei</span> <span class="n">∘</span> <span class="n">t_dedup</span><span class="p">[</span><span class="nc">Int</span><span class="p">])</span> <span class="n">⟐</span> <span class="p">{(</span><span class="n">x</span><span class="p">:</span><span class="nc">Int</span><span class="p">,</span> <span class="n">y</span><span class="p">:</span><span class="nc">Int</span><span class="p">)</span> <span class="o">=></span> <span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="p">}</span>
</code></pre></div>
<p>Hovering over <code>r</code> in Eclipse shows the expected function type:</p>
<p><img alt="rfn" src="images/ducers4-rfnstate.png"></p>
<p>And,</p>
<div class="highlight"><pre><span></span><code> <span class="n">scala</span><span class="o">></span> <span class="n">println</span><span class="p">(</span><span class="n">sequence</span><span class="p">(</span><span class="n">t</span><span class="p">,</span><span class="nc">List</span><span class="p">(</span><span class="s">"1"</span><span class="p">,</span><span class="s">"2"</span><span class="p">,</span><span class="s">"2"</span><span class="p">,</span><span class="s">"3"</span><span class="p">)))</span>
<span class="nc">List</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
<span class="n">scala</span><span class="o">></span> <span class="n">println</span><span class="p">(</span><span class="nc">List</span><span class="p">(</span><span class="s">"1"</span><span class="p">,</span><span class="s">"2"</span><span class="p">,</span><span class="s">"2"</span><span class="p">,</span><span class="s">"3"</span><span class="p">).</span><span class="n">foldLeftSA</span><span class="p">(</span><span class="mi">0</span><span class="p">)(</span><span class="n">r</span><span class="p">))</span>
<span class="mi">6</span>
<span class="n">scala</span><span class="o">></span> <span class="n">println</span><span class="p">(</span><span class="n">sequence</span><span class="p">(</span><span class="n">t</span><span class="p">,</span><span class="nc">List</span><span class="p">(</span><span class="s">"3"</span><span class="p">,</span><span class="s">"2"</span><span class="p">,</span><span class="s">"1"</span><span class="p">)))</span>
<span class="nc">List</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="n">scala</span><span class="o">></span> <span class="n">println</span><span class="p">(</span><span class="nc">List</span><span class="p">(</span><span class="s">"3"</span><span class="p">,</span><span class="s">"2"</span><span class="p">,</span><span class="s">"1"</span><span class="p">).</span><span class="n">foldLeftSA</span><span class="p">(</span><span class="mi">0</span><span class="p">)(</span><span class="n">r</span><span class="p">))</span>
<span class="mi">6</span>
</code></pre></div>
<p>Tada!</p>
<h4>Well then</h4>
<p><img alt="lene" src="images/StatelessCover.jpg"></p>
<p>I think this came out pretty well. State is where it should be. Type of state, type of reduction and type of transduction are
specified strongly and, while I'm no Scala stylist, the complexity from a user's standpoint is not overbearing.</p>
<p>It is satisfying that the conceptual manner in which transduction type <em>operates on</em> reduction type
can be realized in Scala, and the techniques employed seem quite general. On the other hand, they also feel a bit
tricky, as if we're getting away with something the language doesn't naturally support, and I don't understand the
bounds of their applicability. (It's clearly time to delve into
<a href="https://github.com/milessabin/shapeless">shapeless</a>, which is fortunate because I was
running out of excuses for avoiding Haskell.)</p>
<p>It does not seem advisable to attempt this sort of type transmutation in <code>core.typed</code>;
on the other hand, it appears alluringly simple in <a href="https://github.com/Prismatic/schema">schema</a>, where
you get an entire Turing-complete language for type analysis.</p>Purely functional transducers - where does state belong?2014-11-02T00:00:00-04:002014-11-02T00:00:00-04:00Peter Fraenkeltag:blog.podsnap.com,2014-11-02:/ducers3.html<p>After my <a href="http://blog.podsnap.com/ducers2.html">recent attempt</a> to
provide type annotations for transducers, several people pointed out
that I wasn't accounting for state. The signature of a pure
function transformation, whether in Clojure</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">t/defalias</span> <span class="nv">Transducer</span> <span class="p">(</span><span class="nf">t/TFn</span> <span class="p">[[</span><span class="nv">a</span> <span class="ss">:variance</span> <span class="ss">:covariant</span><span class="p">]</span>
<span class="p">[</span><span class="nv">b</span> <span class="ss">:variance</span> <span class="ss">:contravariant</span><span class="p">]]</span>
<span class="p">(</span><span class="nf">t/All</span> <span class="p">[</span><span class="nv">r</span><span class="p">]</span> <span class="p">[[</span><span class="nv">r</span> <span class="nv">a</span> <span class="nb">-> </span><span class="nv">r</span><span class="p">]</span> <span class="nb">-> </span><span class="p">[</span><span class="nv">r</span> <span class="nv">b</span> <span class="nb">-> </span><span class="nv">r</span><span class="p">]])))</span>
</code></pre></div>
<p>or Haskell</p>
<div class="highlight"><pre><span></span><code> <span class="kr">type</span> <span class="kt">Transducer</span> <span class="n">a</span> <span class="n">b</span> <span class="ow">=</span> <span class="n">forall</span> <span class="n">r</span> <span class="o">.</span> <span class="p">(</span><span class="n">r</span> <span class="ow">-></span> <span class="n">a</span> <span class="ow">-></span> <span class="n">r</span><span class="p">)</span> <span class="ow">-></span> <span class="p">(</span><span class="n">r</span> <span class="ow">-></span> <span class="n">b</span> <span class="ow">-></span> <span class="n">r</span><span class="p">)</span>
</code></pre></div>
<p>nowhere acknowledges that the transducer might need to maintain, for example,
the previous value in the series, in order to remove duplicates.</p>
<p>The failure is most obvious to hardcore Haskell programmers, who, as a rule, would
rather eat broken glass than modify a hidden state variable in place.
State is kept explicitly, though one might use monad sugar to avoid having
to look at it all the time.</p>
<p>Hardness of core is a relative thing, and Clojure programmers profess more than
a little reverence for referential transparency, so it was a bit surprising to
see this:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">t-dedup</span> <span class="p">[]</span>
<span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">xf</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">prev</span> <span class="p">(</span><span class="nf">volatile!</span> <span class="ss">::none</span><span class="p">)]</span>
<span class="p">(</span><span class="nf">fn</span>
<span class="p">([</span><span class="nv">result</span> <span class="nv">input</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">prior</span> <span class="o">@</span><span class="nv">prev</span><span class="p">]</span>
<span class="p">(</span><span class="nf">vreset!</span> <span class="nv">prev</span> <span class="nv">input</span><span class="p">)</span> <span class="c1">;; <= gack!</span>
<span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nb">= </span><span class="nv">prior</span> <span class="nv">input</span><span class="p">)</span>
<span class="nv">result</span>
<span class="p">(</span><span class="nf">xf</span> <span class="nv">result</span> <span class="nv">input</span><span class="p">))))))))</span>
</code></pre></div>
<p>In case my ostensibly clarifying excisions make it not obvious, this de-duplicator
is the suggested exemplar of a stateful transducer, taken straight from
<a href="http://clojure.org/transducers">http://clojure.org/transducers</a> (not including the comment).<sup id="fnref:arity"><a class="footnote-ref" href="#fn:arity">1</a></sup> When applied to a reducing function, it will
produce a new function, closing over a volatile <code>prev</code> variable. Not only is the
resulting function impure, but it is cryptically so. Having no way to detect whether
the "stateful function" has been contaminated by prior use, one must take special care to ensure
that it is not used in more than one reduction, to say nothing of more than one thread.</p>
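<p>To make the hazard concrete, here is a minimal sketch that reuses one transformed reducing function across two reductions. The names <code>t-dedup-volatile</code>, <code>contaminated-rf</code>, <code>first-run</code> and <code>second-run</code>, and the sample inputs, are mine rather than anything from clojure.org. The second run silently inherits <code>prev</code> from the first:</p>

```clojure
;; The volatile-based deduplicator from above, renamed for this sketch.
(defn t-dedup-volatile []
  (fn [xf]
    (let [prev (volatile! ::none)]
      (fn [result input]
        (let [prior @prev]
          (vreset! prev input)
          (if (= prior input)
            result
            (xf result input)))))))

;; One transformed reducing function, closing over one hidden prev.
(def contaminated-rf ((t-dedup-volatile) conj))

;; First use behaves as advertised: [1 2]
(def first-run (reduce contaminated-rf [] [1 1 2]))

;; Second use starts with prev still holding 2, so the leading 2 vanishes: [3]
(def second-run (reduce contaminated-rf [] [2 3]))
```

<p>Nothing about <code>contaminated-rf</code> itself reveals that it has been used before, which is exactly the complaint.</p>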
<p>Some of my best friends are Python programmers, but there have to be limits.</p>
<h4>The answer is right in front of us</h4>
<p>Reduction may already be the world's best example of state done right! Novelty
accrues in the reduced quantity, successive versions of which are passed as the left
argument to the reduction function. The quantity is never modified; rather, as usual in functional
programming, new versions are created. If that reduction function is <code>+</code>, state is accumulated
in the form of a sum; that the sum also happens to be the value we're interested in at
the end of the reduction is a useful coincidence.</p>
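<p>One way to watch that state accumulate is <code>reductions</code>, which returns every intermediate version of the reduced quantity rather than just the last one. A small sketch, with made-up inputs and names:</p>

```clojure
;; Each element is the accumulated state after one more step.
(def running-sums (reductions + 0 [1 2 3 4]))   ; (0 1 3 6 10)

;; reduce hands back only the final version of that state.
(def total (reduce + 0 [1 2 3 4]))              ; 10
```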
<p>In fact, Clojure already uses the reduction value to hold state other than result, via the
<code>reduced</code> function, used to indicate that reduction may terminate early (as in
<code>take</code>). What <code>reduced</code> does is wrap the value up in a container class</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="kd">defn </span><span class="nv">reduced</span> <span class="p">[</span><span class="nv">x</span><span class="p">]</span> <span class="p">(</span><span class="nf">clojure.lang.Reduced.</span> <span class="nv">x</span><span class="p">))</span>
</code></pre></div>
<p>Subsequently, we find out whether to bail by calling <code>reduced?</code>, which
ultimately does nothing more than check what class we have:</p>
<div class="highlight"><pre><span></span><code> <span class="kd">static</span> <span class="kd">public</span> <span class="kt">boolean</span> <span class="nf">isReduced</span><span class="p">(</span><span class="n">Object</span> <span class="n">r</span><span class="p">){</span> <span class="k">return</span> <span class="n">r</span> <span class="k">instanceof</span> <span class="n">Reduced</span><span class="p">;</span> <span class="p">}</span>
</code></pre></div>
<p>As <code>Reduced</code> implements <code>IDeref</code>, the wrapped value can ultimately be extracted
with <code>@</code>, which just calls the class's <code>deref</code> method.</p>
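<p>A quick REPL-style sketch of those three pieces working together. The early-exit reducing function and the names <code>wrapped</code> and <code>partial-sum</code> are illustrations of mine, not anything from the Clojure sources:</p>

```clojure
(def wrapped (reduced 42))

(reduced? wrapped)   ; true
(reduced? 42)        ; false
@wrapped             ; 42

;; Hypothetical early exit: stop summing as soon as we see a 4.
;; reduce notices the Reduced wrapper and derefs it for us.
(def partial-sum
  (reduce (fn [acc x] (if (= x 4) (reduced acc) (+ acc x)))
          0
          [1 2 3 4 5]))   ; 6
```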
<p>This happens to be the one and only way in which state is currently embedded in the reduction,
but if I were king, things would be different.</p>
<h4>L'etat, c'est moi</h4>
<p>In Clojure, the obvious way to attach state to an arbitrary value, such as the accumulated
reduction, is with the inbuilt metadata
facility. The object returned by <code>(with-meta obj {:boffo "foo"})</code>
is identical to <code>obj</code> from all perspectives other than that of
silly individuals who put <code>(:boffo (meta obj))</code> in their code,
so we could have our state, without the <code>reduce</code> consumer
having to eat it. The only problem is that you can't attach metadata
to scalar primitives (like numbers), which tend to figure prominently in numerical
reductions. This, presumably, is why <code>reduced</code> didn't take that route.</p>
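<p>Both halves of that claim are easy to check on the JVM. In this sketch the names <code>tagged</code> and <code>number-attempt</code> and the <code>:no-metadata-for-numbers</code> keyword are just labels I made up:</p>

```clojure
(def tagged (with-meta [1 2 3] {:boffo "foo"}))

(= tagged [1 2 3])      ; true: metadata does not affect equality
(:boffo (meta tagged))  ; "foo"

;; A number is not an IObj, so with-meta throws on it.
(def number-attempt
  (try
    (with-meta 42 {:boffo "foo"})
    (catch ClassCastException _ :no-metadata-for-numbers)))
```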
<p>Let's define a type that holds state for the deduplicator. It should hold the previous value,
and of course we still have to keep the accumulated reduction value:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">deftype </span><span class="nv">DedupStateWrapper</span> <span class="p">[</span><span class="nv">acc</span> <span class="nv">prev</span><span class="p">])</span>
</code></pre></div>
<p>In the transducer, we'll check to see whether the result is already
wrapped</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">t-dedup1</span> <span class="p">[</span><span class="nv">rf</span><span class="p">]</span>
<span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">result</span> <span class="nv">input</span><span class="p">]</span>
<span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nb">instance? </span><span class="nv">DedupStateWrapper</span> <span class="nv">result</span><span class="p">)</span>
<span class="nv">...</span>
</code></pre></div>
<p>and, if so, check whether the new <code>input</code> matches the previous:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nb">= </span><span class="nv">input</span> <span class="p">(</span><span class="nf">.prev</span> <span class="nv">result</span><span class="p">))</span>
<span class="nv">...</span>
</code></pre></div>
<p>If it's a duplicate, the <code>result</code> shouldn't change; otherwise, call the
original reduction function and wrap its return value, setting the state
to new <code>input</code>:</p>
<div class="highlight"><pre><span></span><code> <span class="nv">result</span>
<span class="p">(</span><span class="nf">DedupStateWrapper.</span> <span class="p">(</span><span class="nf">rf</span> <span class="p">(</span><span class="nf">.acc</span> <span class="nv">result</span><span class="p">)</span> <span class="nv">input</span><span class="p">)</span> <span class="nv">input</span><span class="p">))</span>
<span class="nv">...</span>
</code></pre></div>
<p>Finally, if the <code>result</code> hasn't been wrapped yet, <code>input</code>
can't possibly be a duplicate, so invoke the reduction function
and wrap it up:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">DedupStateWrapper.</span> <span class="p">(</span><span class="nf">rf</span> <span class="nv">result</span> <span class="nv">input</span><span class="p">)</span> <span class="nv">input</span><span class="p">))))</span>
</code></pre></div>
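<p>Stitched together, the fragments above should amount to something like the following. Only the comments are additions; the field accesses go through reflection, which is fine for a sketch:</p>

```clojure
(deftype DedupStateWrapper [acc prev])

(defn t-dedup1 [rf]
  (fn [result input]
    (if (instance? DedupStateWrapper result)
      ;; Already wrapped: compare against the previous input.
      (if (= input (.prev result))
        result
        (DedupStateWrapper. (rf (.acc result) input) input))
      ;; First step: nothing to compare against yet.
      (DedupStateWrapper. (rf result input) input))))
```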
<p>Let's try it out:</p>
<div class="highlight"><pre><span></span><code><span class="nv">playground.transducers></span> <span class="p">(</span><span class="nb">reduce </span><span class="p">(</span><span class="nf">t-dedup1</span> <span class="nv">+</span><span class="p">)</span> <span class="mi">0</span> <span class="p">[</span><span class="mi">1</span> <span class="mi">1</span> <span class="mi">2</span> <span class="mi">2</span><span class="p">])</span>
<span class="o">#</span><span class="nv"><DedupStateWrapper</span> <span class="nv">playground.transducers.DedupStateWrapper</span><span class="o">@</span><span class="mi">1</span><span class="nv">bf0ed5c></span>
</code></pre></div>
<p>Oh right.</p>
<div class="highlight"><pre><span></span><code><span class="nv">playground.transducers></span> <span class="p">(</span><span class="nf">.acc</span> <span class="p">(</span><span class="nb">reduce </span><span class="p">(</span><span class="nf">t-dedup1</span> <span class="nv">+</span><span class="p">)</span> <span class="mi">0</span> <span class="p">[</span><span class="mi">1</span> <span class="mi">1</span> <span class="mi">2</span> <span class="mi">2</span><span class="p">]))</span>
<span class="mi">3</span>
</code></pre></div>
<p>I'd hoped to avoid the manual unwrapping -
before realizing that metadata wouldn't work - but we can argue that
it's a good thing, forcing some acknowledgment of recent statefulness.</p>
<p>In any
case, without modifying <code>reduce</code> to do our unwrapping (as it already does for
<code>Reduced</code>), this is something we just have to live with.</p>
<p>So let's clean up. First, we might as well make a more general-sounding <code>StateWrapper</code>,
since all transducers basically want the same thing. We'll make it print all pretty
like,</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">deftype </span><span class="nv">StateWrapper</span> <span class="p">[</span><span class="nv">value</span> <span class="nv">state</span><span class="p">]</span>
<span class="nv">Object</span>
<span class="p">(</span><span class="nf">toString</span> <span class="p">[</span><span class="nv">this</span><span class="p">]</span> <span class="p">(</span><span class="nb">str </span><span class="s">"("</span> <span class="p">(</span><span class="nf">.value</span> <span class="nv">this</span><span class="p">)</span> <span class="s">","</span> <span class="p">(</span><span class="nf">.state</span> <span class="nv">this</span><span class="p">)</span> <span class="s">")"</span><span class="p">)))</span>
</code></pre></div>
<p>and write some convenience functions to extract its innards:<sup id="fnref:arity1"><a class="footnote-ref" href="#fn:arity1">2</a></sup></p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">get-state</span> <span class="p">[</span><span class="nv">r</span><span class="p">]</span>
<span class="p">(</span><span class="nb">when </span><span class="p">(</span><span class="nb">instance? </span><span class="nv">StateWrapper</span> <span class="nv">r</span><span class="p">)</span>
<span class="p">[(</span><span class="nf">.value</span> <span class="nv">r</span><span class="p">)</span> <span class="p">(</span><span class="nf">.state</span> <span class="nv">r</span><span class="p">)]))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">unwrap</span> <span class="p">[</span><span class="nv">r</span><span class="p">]</span>
<span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nb">instance? </span><span class="nv">StateWrapper</span> <span class="nv">r</span><span class="p">)</span> <span class="p">(</span><span class="nf">unwrap</span> <span class="p">(</span><span class="nf">.value</span> <span class="nv">r</span><span class="p">))</span> <span class="nv">r</span><span class="p">))</span>
</code></pre></div>
<p>The deduplicator is now:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">t-dedup</span> <span class="p">[</span><span class="nv">rf</span><span class="p">]</span>
<span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">result</span> <span class="nv">input</span><span class="p">]</span>
<span class="p">(</span><span class="nf">debug</span> <span class="nv">result</span> <span class="nv">input</span><span class="p">)</span>
<span class="p">(</span><span class="nb">if-let </span><span class="p">[[</span><span class="nv">result-val</span> <span class="nv">input-prev</span><span class="p">]</span> <span class="p">(</span><span class="nf">get-state</span> <span class="nv">result</span><span class="p">)]</span>
<span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nb">= </span><span class="nv">input</span> <span class="nv">input-prev</span><span class="p">)</span>
<span class="nv">result</span>
<span class="p">(</span><span class="nf">StateWrapper.</span> <span class="p">(</span><span class="nf">rf</span> <span class="nv">result-val</span> <span class="nv">input</span><span class="p">)</span> <span class="nv">input</span><span class="p">))</span>
<span class="p">(</span><span class="nf">StateWrapper.</span> <span class="p">(</span><span class="nf">rf</span> <span class="nv">result</span> <span class="nv">input</span><span class="p">)</span> <span class="nv">input</span><span class="p">))))</span>
</code></pre></div>
<p>and we'll also write an uncontroversially stateless transducer that
truncates numbers to integers, which will be useful for testing.</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">t-trunc</span> <span class="p">[</span><span class="nv">reduction-function</span><span class="p">]</span>
<span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">result</span> <span class="nv">input</span><span class="p">]</span>
<span class="p">(</span><span class="nf">debug</span> <span class="nv">result</span> <span class="nv">input</span><span class="p">)</span>
<span class="p">(</span><span class="nf">reduction-function</span> <span class="nv">result</span> <span class="p">(</span><span class="nb">int </span><span class="nv">input</span><span class="p">))))</span>
</code></pre></div>
<h4>Composition</h4>
<p>We'll now take a series of floating point numbers <code>[1.0 2.0 2.5 2.5 3.0]</code> and run them through a stack
of composed transducers to deduplicate, truncate and deduplicate again:</p>
<div class="highlight"><pre><span></span><code><span class="nv">playground.transducers></span> <span class="p">(</span><span class="nf">unwrap</span> <span class="p">(</span><span class="nb">reduce </span><span class="p">((</span><span class="nb">comp </span><span class="nv">t-dedup</span> <span class="nv">t-trunc</span> <span class="nv">t-dedup</span><span class="p">)</span> <span class="nv">conj</span><span class="p">)</span> <span class="p">[]</span> <span class="p">[</span><span class="mf">1.0</span> <span class="mf">2.0</span> <span class="mf">2.5</span> <span class="mf">2.5</span> <span class="mf">3.0</span><span class="p">]))</span>
<span class="p">[</span><span class="mi">1</span> <span class="mi">2</span> <span class="mi">3</span><span class="p">]</span>
</code></pre></div>
<p>Had we applied corresponding transformations to the entire series, rather than as composed transducers,
we would see the following steps:</p>
<div class="highlight"><pre><span></span><code><span class="p">[</span><span class="mf">1.0</span> <span class="mf">2.0</span> <span class="mf">2.5</span> <span class="mf">2.5</span> <span class="mf">3.0</span><span class="p">]</span>
<span class="p">[</span><span class="mf">1.0</span> <span class="mf">2.0</span> <span class="mf">2.5</span> <span class="mf">3.0</span><span class="p">]</span> <span class="c1">; duplicates removed</span>
<span class="p">[</span><span class="mi">1</span> <span class="mi">2</span> <span class="mi">2</span> <span class="mi">3</span><span class="p">]</span> <span class="c1">; truncated, revealing more duplication</span>
<span class="p">[</span><span class="mi">1</span> <span class="mi">2</span> <span class="mi">3</span><span class="p">]</span> <span class="c1">; duplicates resulting from truncation removed</span>
</code></pre></div>
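<p>Those whole-series steps can be reproduced with <code>clojure.core</code>'s own <code>dedupe</code> and <code>map</code>, which gives us something to compare the transducer version against. The <code>series</code> and <code>step-N</code> names are mine:</p>

```clojure
(def series [1.0 2.0 2.5 2.5 3.0])

(def step-1 (dedupe series))    ; (1.0 2.0 2.5 3.0)  duplicates removed
(def step-2 (map int step-1))   ; (1 2 2 3)          truncated
(def step-3 (dedupe step-2))    ; (1 2 3)            deduplicated again
```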
<p>In transducer land, we're going to get to this result by a different route, having composed the
transformation to run on each input iteratively. With so much wrapping and unwrapping going on,
it's helpful to turn on debugging. Note that we can
tell whether we're in the first or second deduplication by whether the input is, respectively,
a double or an integer.</p>
<p>On the first round,</p>
<div class="highlight"><pre><span></span><code>t-dedup [] 1.0 ;; first t-dedup receives empty vector and 1.0; passes down to t-trunc
t-trunc [] 1.0 ;; t-trunc gets empty vector and 1.0; truncates and passes to t-dedup
t-dedup [] 1 ;; second t-dedup receives empty vector and truncated value; passes to conj
test-conj [] 1 ;; performs conj, returns [1]
</code></pre></div>
<p>On the way back up the composed function stack, the second <code>t-dedup</code> wraps the conj'd result and
the current input as <code>([1],1)</code> and passes it up to <code>t-trunc</code>, which passes it up, unmodified,
to the first <code>t-dedup</code>, which wraps it again, together with its first input. The current result,
which will be passed to the next iteration is thus <code>(([1],1),1.0)</code>:</p>
<div class="highlight"><pre><span></span><code>t-dedup (([1],1),1.0) 2.0 ;; First t-dedup compares 1.0 and 2.0;
unwraps and invokes t-trunc
t-trunc ([1],1) 2.0 ;; t-trunc converts to int; invokes 2nd t-dedup
t-dedup ([1],1) 2 ;; 2nd t-dedup compares 1 and 2; unwraps and invokes conj
test-conj [1] 2
t-dedup (([1 2],2),2.0) 2.5 ;; First t-dedup compares 2.0, 2.5;
unwraps and invokes t-trunc
t-trunc ([1 2],2) 2.5 ;; t-trunc converts to int; invokes 2nd t-dedup
t-dedup ([1 2],2) 2 ;; 2nd t-dedup finds 2==2; DOESN'T invoke conj
</code></pre></div>
<p>Skipping over <code>conj</code>, we move forward:</p>
<div class="highlight"><pre><span></span><code>t-dedup (([1 2],2),2.5) 2.5 ;; First t-dedup finds 2.5==2.5; DOESN'T invoke t-trunc
t-dedup (([1 2],2),2.5) 3.0 ;; First t-dedup compares 2.5, 3.0; invokes t-trunc
t-trunc ([1 2],2) 3.0 ;; t-trunc converts to int; invokes 2nd t-dedup
t-dedup ([1 2],2) 3 ;; 2nd dedup compares 2,3; invokes conj
test-conj [1 2] 3
</code></pre></div>
<p>The final <code>[1 2 3]</code> passes up through two layers of wrapping, so the value returned by
<code>reduce</code> is <code>(([1 2 3],3),3.0)</code>, which <code>unwrap</code> peels apart recursively.</p>
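<p>For anyone who wants to replay the walkthrough, here is a self-contained sketch of the definitions above with the <code>debug</code> calls simply dropped (the helper itself isn't shown in this post, so I'm not guessing at it). The name <code>final-state</code> is mine:</p>

```clojure
(deftype StateWrapper [value state]
  Object
  (toString [this] (str "(" (.value this) "," (.state this) ")")))

(defn get-state [r]
  (when (instance? StateWrapper r)
    [(.value r) (.state r)]))

(defn unwrap [r]
  (if (instance? StateWrapper r) (unwrap (.value r)) r))

;; Stateful deduplicator: previous input rides along with the result.
(defn t-dedup [rf]
  (fn [result input]
    (if-let [[result-val input-prev] (get-state result)]
      (if (= input input-prev)
        result
        (StateWrapper. (rf result-val input) input))
      (StateWrapper. (rf result input) input))))

;; Stateless truncation to integers.
(defn t-trunc [reduction-function]
  (fn [result input]
    (reduction-function result (int input))))

(def final-state
  (reduce ((comp t-dedup t-trunc t-dedup) conj) [] [1.0 2.0 2.5 2.5 3.0]))

(str final-state)      ; "(([1 2 3],3),3.0)"
(unwrap final-state)   ; [1 2 3]
```

<p>The printed form shows the two layers of wrapping, one per <code>t-dedup</code> in the stack, and <code>unwrap</code> strips them both.</p>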
<h4>Unsolved problems</h4>
<ol>
<li>
<p>All this wrapping and unwrapping doesn't come for free, computationally
speaking. It seems possible that the JIT will optimize some of it away,
but I haven't checked yet.</p>
</li>
<li>
<p>Having abolished crypto-state, we're a little closer to solving the type
problem, but not quite there yet.</p>
</li>
</ol>
<p>To be continued...</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:arity">
<p>Yes, I know I've left off the 0 and 1-arity forms for the returned reduction function. While necessary, in real life, they don't affect the state discussion. <a class="footnote-backref" href="#fnref:arity" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:arity1">
<p>The arity-1 form could in principle embed <code>unwrap</code>, but that wouldn't help for non-terminating processes. <a class="footnote-backref" href="#fnref:arity1" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
</ol>
</div>Type-safe transducers in Clojure. And Scala. And Haskell.2014-10-29T00:00:00-04:002014-10-29T00:00:00-04:00Peter Fraenkeltag:blog.podsnap.com,2014-10-29:/ducers2.html<h4>TL;DR</h4>
<ol>
<li>As noted earlier, transducers can be properly annotated in Clojure using <code>core.typed</code> and they probably should be.</li>
<li>But... there are a few tricks necessary to make it work.</li>
<li>Transducers in Scala require tricks too, but different ones.</li>
<li>Oh, but they're so lovely in Haskell.</li>
</ol>
<h4>Update 2015-01-12</h4>
<p>Were you led here by Clojure Gazette? Eric Normand is usually more discriminating, but don't worry, this
will only waste a little of your time. Per the previous batch of updates, just below, and various subsequent
posts on more or less the same topic, it should be clear this wee bagatelle is not meant to be
authoritative. In particular, nobody should try the approach I use with Scala here; the more
obvious and better one is at the very beginning of the fourth post in the series; a somewhat zanier one
occupies the remainder of that post.</p>
<p>This series of transducer posts has helped me clarify some
thoughts on
referential transparency, bug detection, type systems and language evolution.
If you are morbidly curious about all the possible ways a person can be wrong
on these topics, you might enjoy reading them:</p>
<ol>
<li>A <a href="http://blog.podsnap.com/ducers.html">glossary</a> of transducerish terms.</li>
<li>(You are here.)</li>
<li><a href="http://blog.podsnap.com/ducers3.html">Initial fretting</a> about stateful transducers.</li>
<li><a href="http://blog.podsnap.com/ducers4.html">State+type</a> for transducers (in Scala).</li>
<li><a href="http://blog.podsnap.com/lost-in-translation.html">Rue and despond</a>.</li>
</ol>
<h4>Updates 10-29 20:00 (thanks, Twittersphere)</h4>
<ol>
<li>
<p>I am apparently confused about the difference between universal and existential types. Happily, I don't
seem to be alone in this, but I promise to figure it out anyway...</p>
</li>
<li>
<p>It would probably be more natural (and certainly more concise)
to stick to the trait/apply solution in Scala than to try to emulate a Haskell style, interesting
though the attempt may have been.
Under the hood, the complicated functions are still classes with an apply method anyway.</p>
</li>
<li>
<p>My Haskell type doesn't acknowledge that transducers might have state. The Scala and Clojure
versions don't either, but that's more acceptable in their cultures.</p>
</li>
</ol>
<h4>Transducers</h4>
<p>I won't explain transducers here. The canonical introduction is
Rich Hickey's <a href="http://blog.cognitect.com/blog/2014/8/6/transducers-are-coming">blog post</a>, with
further explanation in his <a href="https://www.youtube.com/watch?v=6mTbuzafcII&noredirect=1">Strange Loop talk</a>.
I contributed a <a href="http://blog.podsnap.com/ducers.html">brief glossary</a>, which may possibly be helpful.</p>
<h4>Why bother with typed transducers in Clojure</h4>
<p>At the end of an <a href="http://blog.podsnap.com/vanhole.html">earlier post</a>, I noted that, despite
some <a href="http://conscientiousprogrammer.com/blog/2014/08/07/understanding-cloure-transducers-through-types/">controversy</a>
on the subject, a transducer's type can be defined with <code>core.typed</code> (I'll walk through this a bit further down,
so don't panic...)</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">t/defalias</span> <span class="nv">ReducingFn</span>
<span class="p">(</span><span class="nf">t/TFn</span> <span class="p">[[</span><span class="nv">a</span> <span class="ss">:variance</span> <span class="ss">:contravariant</span><span class="p">]</span>
<span class="p">[</span><span class="nv">r</span> <span class="ss">:variance</span> <span class="ss">:invariant</span><span class="p">]]</span>
<span class="p">[</span><span class="nv">r</span> <span class="nv">a</span> <span class="nb">-> </span><span class="nv">r</span><span class="p">]))</span>
<span class="p">(</span><span class="nf">t/defalias</span> <span class="nv">Transducer</span> <span class="p">(</span><span class="nf">t/TFn</span> <span class="p">[[</span><span class="nv">a</span> <span class="ss">:variance</span> <span class="ss">:covariant</span><span class="p">]</span>
<span class="p">[</span><span class="nv">b</span> <span class="ss">:variance</span> <span class="ss">:contravariant</span><span class="p">]]</span>
<span class="p">(</span><span class="nf">t/All</span> <span class="p">[</span><span class="nv">r</span><span class="p">]</span> <span class="p">[(</span><span class="nf">ReducingFn</span> <span class="nv">a</span> <span class="nv">r</span><span class="p">)</span> <span class="nb">-> </span><span class="p">(</span><span class="nf">ReducingFn</span> <span class="nv">b</span> <span class="nv">r</span><span class="p">)])))</span>
</code></pre></div>
<p>in a manner fairly evocative of the way you'd do it in Haskell:</p>
<div class="highlight"><pre><span></span><code> <span class="kr">type</span> <span class="kt">ReducingFn</span> <span class="n">a</span> <span class="n">r</span> <span class="ow">=</span> <span class="n">r</span> <span class="ow">-></span> <span class="n">a</span> <span class="ow">-></span> <span class="n">r</span>
<span class="kr">type</span> <span class="kt">Transducer</span> <span class="n">a</span> <span class="n">b</span> <span class="ow">=</span> <span class="n">forall</span> <span class="n">r</span> <span class="o">.</span> <span class="kt">ReducingFn</span> <span class="n">a</span> <span class="n">r</span> <span class="ow">-></span> <span class="kt">ReducingFn</span> <span class="n">b</span> <span class="n">r</span>
</code></pre></div>
<p>While these representations may be more explanatory (to some, anyway) than the graphical illustration</p>
<p><img alt="little boxes" src="images/transducer-graphic-type.png"></p>
<p>in Rich's talk,
explanation is not the main point. Neither is the triumphal riposte that transducers are yet
another thing that isn't a good example of the superiority of dynamic typing.<sup id="fnref:riposte"><a class="footnote-ref" href="#fn:riposte">1</a></sup></p>
<p>With or without types, you're going to figure out transducers
eventually, and I doubt you're going to understand them by types
alone. It might even be better to go untyped, since a good flailing
of trial and error can have educational value.</p>
<p>That is a less attractive option when writing code that's meant to do something real,
and that's where a type system can be helpful. If you use transducers - and you will, because they're
incredibly powerful - you will at some point be confounded by mysterious bugs of your own creation. You will
get confused by the funny reversed order of composition. And then you will stare, despairingly,
at long stack traces containing multiple anonymous functions. Then you'll festoon your code with more
and more <code>println</code>s (or, if you're fancy, logging macros) until the head-slap moment occurs.</p>
<h4>Slapless</h4>
<p>I'll get into the details of the above annotations in a bit, but for now just take them as given. Accept
also that, for some reason, there's a special composition function <code>compt</code> just for transducers.</p>
<p>Our artificial goal is going to be to take a sequence of strings representing integers, like
<code>["1" "2" "3"]</code>, parse them, repeat each one, calculate
$\sqrt[n]{2}$ for each resulting integer $n$, and finally add those roots up. Here are my three transducers (ignoring, for simplicity,
the zero- and one-argument alternatives for the returned function):</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">t/ann</span> <span class="nv">t-parsei</span> <span class="p">(</span><span class="nf">Transducer</span> <span class="nv">t/Int</span> <span class="nv">t/Str</span><span class="p">))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">t-parsei</span> <span class="p">[</span><span class="nv">rf</span><span class="p">]</span>
<span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">result</span> <span class="nv">input</span><span class="p">]</span>
<span class="p">(</span><span class="nf">rf</span> <span class="nv">result</span> <span class="p">(</span><span class="nf">Integer/parseInt</span> <span class="nv">input</span><span class="p">))))</span>
<span class="p">(</span><span class="nf">t/ann</span> <span class="nv">t-repn</span> <span class="p">(</span><span class="nf">Transducer</span> <span class="nv">Number</span> <span class="nv">Number</span><span class="p">))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">t-repn</span> <span class="p">[</span><span class="nv">rf</span><span class="p">]</span>
<span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">result</span> <span class="nv">input</span><span class="p">]</span>
<span class="p">(</span><span class="nf">rf</span> <span class="p">(</span><span class="nf">rf</span> <span class="nv">result</span> <span class="nv">input</span><span class="p">)</span> <span class="nv">input</span><span class="p">)))</span>
<span class="p">(</span><span class="nf">t/ann</span> <span class="nv">t-root</span> <span class="p">(</span><span class="nf">Transducer</span> <span class="nv">Double</span> <span class="nv">Number</span><span class="p">))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">t-root</span> <span class="p">[</span><span class="nv">rf</span><span class="p">]</span>
<span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">acc</span> <span class="nv">in</span><span class="p">]</span>
<span class="p">(</span><span class="nf">rf</span> <span class="nv">acc</span> <span class="p">(</span><span class="nf">pow</span> <span class="mf">2.0</span> <span class="p">(</span><span class="nb">/ </span><span class="mf">1.0</span> <span class="p">(</span><span class="nb">double </span><span class="nv">in</span><span class="p">))))))</span>
</code></pre></div>
<p>Taking the <code>Transducer</code> type function as given, these annotations make sense. The
first transducer transforms a function that reduces over integers to one that reduces over
strings; the last transforms a function that reduces over doubles to one that reduces over
integers; and the one in the middle doesn't change the type at all.</p>
<p>If all goes well, I should be able to compose the transducers, apply them to the
<code>+</code> reducing function and reduce,</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nb">reduce </span><span class="p">((</span><span class="nf">compt</span> <span class="nv">t-root</span> <span class="nv">t-repn</span> <span class="nv">t-parsei</span><span class="p">)</span> <span class="nv">+</span><span class="p">)</span> <span class="mi">0</span> <span class="p">[</span><span class="s">"1"</span> <span class="s">"2"</span> <span class="s">"3"</span><span class="p">])</span>
</code></pre></div>
<p>but this doesn't get past type-checking:</p>
<div class="highlight"><pre><span></span><code> <span class="n">Domains</span><span class="o">:</span><span class="p">[</span><span class="n">x</span> <span class="o">-></span> <span class="n">y</span><span class="p">]</span> <span class="p">[</span><span class="n">b</span> <span class="p">...</span> <span class="n">b</span> <span class="o">-></span> <span class="n">x</span><span class="p">]</span>
<span class="n">Arguments</span><span class="o">:</span>
<span class="p">[[</span><span class="n">t</span><span class="o">/</span><span class="kr">Any</span> <span class="n">Number</span> <span class="o">-></span> <span class="n">t</span><span class="o">/</span><span class="kr">Any</span><span class="p">]</span> <span class="o">-></span> <span class="p">[</span><span class="n">t</span><span class="o">/</span><span class="kr">Any</span> <span class="n">Number</span> <span class="o">-></span> <span class="n">t</span><span class="o">/</span><span class="kr">Any</span><span class="p">]]</span> <span class="p">[[</span><span class="n">t</span><span class="o">/</span><span class="kr">Any</span> <span class="n">t</span><span class="o">/</span><span class="n">Int</span> <span class="o">-></span> <span class="n">t</span><span class="o">/</span><span class="kr">Any</span><span class="p">]</span> <span class="o">-></span> <span class="p">[</span><span class="n">t</span><span class="o">/</span><span class="kr">Any</span> <span class="n">t</span><span class="o">/</span><span class="n">Str</span> <span class="o">-></span> <span class="n">t</span><span class="o">/</span><span class="kr">Any</span><span class="p">]]</span>
</code></pre></div>
<p>Squinting at the last line slightly,</p>
<div class="highlight"><pre><span></span><code> <span class="p">[</span> <span class="n">Number</span> <span class="p">]</span> <span class="o">-></span> <span class="p">[</span> <span class="n">Number</span> <span class="p">]]</span> <span class="p">[[</span> <span class="n">t</span><span class="o">/</span><span class="n">Int</span> <span class="p">]</span> <span class="o">-></span> <span class="p">[</span> <span class="n">t</span><span class="o">/</span><span class="n">Str</span> <span class="p">]]</span>
</code></pre></div>
<p>we see the problem: the transducers are reversed. That's an easy mistake to make, with all those functions of functions strewn
about, but it's also easy to fix, once we have a timely and specific error. (I won't pretend that it's a particularly elegant error, but,
once you get used to reading it, it's a hell of a lot more timely and specific than an exception and stack trace at runtime.)</p>
<p>Back on the straight and narrow, we get the result we wanted:</p>
<div class="highlight"><pre><span></span><code><span class="nv">user></span> <span class="p">(</span><span class="nf">t/cf</span> <span class="p">(</span><span class="nf">compt</span> <span class="nv">t-parsei</span> <span class="nv">t-repn</span> <span class="nv">t-root</span><span class="p">))</span>
<span class="p">(</span><span class="nf">t/All</span> <span class="p">[</span><span class="nv">r</span><span class="p">]</span> <span class="p">[[</span><span class="nv">r</span> <span class="nv">Double</span> <span class="nb">-> </span><span class="nv">r</span><span class="p">]</span> <span class="nb">-> </span><span class="p">[</span><span class="nv">r</span> <span class="nv">String</span> <span class="nb">-> </span><span class="nv">r</span><span class="p">]])</span>
<span class="nv">user></span> <span class="p">(</span><span class="nb">reduce </span><span class="p">((</span><span class="nf">compt</span> <span class="nv">t-parsei</span> <span class="nv">t-repn</span> <span class="nv">t-root</span><span class="p">)</span> <span class="nv">+</span><span class="p">)</span> <span class="mi">0</span> <span class="p">[</span><span class="s">"1"</span> <span class="s">"2"</span> <span class="s">"3"</span><span class="p">])</span>
<span class="mf">9.348269224535935</span>
</code></pre></div>
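<p>For anyone who thinks better in Haskell, here is a rough transliteration of the same pipeline, built on the
two Haskell aliases from earlier. It's a sketch rather than a port: the names <code>tParsei</code>, <code>tRepn</code>,
<code>tRoot</code>, <code>pipeline</code> and <code>sumRoots</code> are mine, <code>read</code> is standing in for
<code>Integer/parseInt</code>, and ordinary <code>(.)</code> is playing the part of <code>compt</code>.</p>
<div class="highlight"><pre><span></span><code>{-# LANGUAGE RankNTypes #-}

type ReducingFn a r = r -> a -> r
type Transducer a b = forall r. ReducingFn a r -> ReducingFn b r

-- A reducer over Ints becomes a reducer over Strings.
-- (read is an assumption here; it throws on malformed input.)
tParsei :: Transducer Int String
tParsei rf = \result input -> rf result (read input)

-- Passes every input to the downstream reducer twice.
tRepn :: Transducer Int Int
tRepn rf = \result input -> rf (rf result input) input

-- A reducer over Doubles becomes a reducer over Ints, via the n-th root of 2.
tRoot :: Transducer Double Int
tRoot rf = \acc n -> rf acc (2 ** (1 / fromIntegral n))

-- Same left-to-right order as (compt t-parsei t-repn t-root).
pipeline :: Transducer Double String
pipeline = tParsei . tRepn . tRoot

-- Apply the composed transducer to (+) and fold from 0.
sumRoots :: [String] -> Double
sumRoots = foldl (pipeline (+)) 0

main :: IO ()
main = print (sumRoots ["1", "2", "3"])
</code></pre></div>
<p>Writing the composition the wrong way round, as <code>tRoot . tRepn . tParsei</code>, is rejected at compile time,
because <code>tRepn</code> would be handed a reducer over strings where it wants one over integers. The number itself is
easy to check by hand: every $n$ is seen twice, so the total should be
$2\,(2 + \sqrt{2} + \sqrt[3]{2}) \approx 2 \times 4.67413 \approx 9.34827$, which agrees with the REPL output above.</p>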
<h4>Type function definitions</h4>
<p>So, <code>ReducingFn</code> and <code>Transducer</code> seem pretty useful. How did we make them?</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">t/defalias</span> <span class="nv">ReducingFn</span>
<span class="p">(</span><span class="nf">t/TFn</span> <span class="p">[[</span><span class="nv">a</span> <span class="ss">:variance</span> <span class="ss">:contravariant</span><span class="p">]</span>
<span class="p">[</span><span class="nv">r</span> <span class="ss">:variance</span> <span class="ss">:invariant</span><span class="p">]]</span>
<span class="p">[</span><span class="nv">r</span> <span class="nv">a</span> <span class="nb">-> </span><span class="nv">r</span><span class="p">]))</span>
</code></pre></div>
<p>The <code>TFn</code> indicates that we're making a type function, i.e. a function of types that returns another type. The two types it takes are <code>a</code> (the type we are
reducing over) and <code>r</code> (the type we're reducing to). Since we ought to be able to substitute a function that knows how to consume <code>Number</code>s in general
for a function that
will encounter only <code>Int</code>s, the <code>ReducingFn</code> is <strong>contravariant</strong> in <code>a</code>, by the Liskov substitution principle. On the other hand, the exact opposite is true for
the value returned by a function: if the recipient wants <code>Int</code>, it's not going to be happy with any old <code>Number</code>, but it could handle a <code>Short</code> or some other
subtype. As <code>r</code> appears both as argument (suggesting contravariance) and return type (suggesting covariance), it has to be <strong>invariant</strong>.</p>
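<p>Haskell has no subtyping, so it can't state this variance argument directly, but there is a loose analogue that
may help: contravariance in <code>a</code> means that a conversion <em>into</em> <code>a</code> adapts a reducer over <code>a</code>
into a reducer over something else. In the sketch below, <code>premap</code>, <code>sumDoubles</code> and <code>sumInts</code> are
made-up names, and an explicit <code>fromIntegral</code> stands in for the implicit "every Int is a Number" coercion.</p>
<div class="highlight"><pre><span></span><code>type ReducingFn a r = r -> a -> r

-- Contravariance in a: a function b -> a turns a reducer over a
-- into a reducer over b. Note that the direction flips.
premap :: (b -> a) -> ReducingFn a r -> ReducingFn b r
premap f rf = \acc x -> rf acc (f x)

-- A reducer that knows how to consume any Double...
sumDoubles :: ReducingFn Double Double
sumDoubles = (+)

-- ...can be put to work where only Ints will ever show up.
sumInts :: ReducingFn Int Double
sumInts = premap fromIntegral sumDoubles

main :: IO ()
main = print (foldl sumInts 0 [1, 2, 3])  -- prints 6.0
</code></pre></div>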
<p>The <code>Transducer</code> type function returns the type of a function that consumes one <code>ReducingFn</code> and returns another.</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">t/defalias</span> <span class="nv">Transducer</span> <span class="p">(</span><span class="nf">t/TFn</span> <span class="p">[[</span><span class="nv">a</span> <span class="ss">:variance</span> <span class="ss">:covariant</span><span class="p">]</span>
<span class="p">[</span><span class="nv">b</span> <span class="ss">:variance</span> <span class="ss">:contravariant</span><span class="p">]]</span>
<span class="p">(</span><span class="nf">t/All</span> <span class="p">[</span><span class="nv">r</span><span class="p">]</span> <span class="p">[(</span><span class="nf">ReducingFn</span> <span class="nv">a</span> <span class="nv">r</span><span class="p">)</span> <span class="nb">-> </span><span class="p">(</span><span class="nf">ReducingFn</span> <span class="nv">b</span> <span class="nv">r</span><span class="p">)])))</span>
</code></pre></div>
<p>If someone is expecting a <code>Transducer</code> that
consumes a particular kind of <code>ReducingFn</code>, they should be happy with a <code>Transducer</code> that consumes a supertype of that <code>ReducingFn</code>, i.e. <code>Transducer</code> is
contravariant in the type <code>ReducingFn</code> used as its argument.
But, since <code>ReducingFn</code>s are themselves contravariant in the type they reduce over, the <code>Transducer</code> must be <strong>covariant</strong> in <code>a</code>.
By contrast, the <code>Transducer</code> is covariant in the type of <code>ReducingFn</code> it returns, but since the <code>ReducingFn</code> is contravariant in the
type it consumes, the <code>Transducer</code> must be <strong>contravariant</strong> in <code>b</code>.</p>
<p>Phew. It might come as a relief that the <code>Transducer</code> doesn't give a damn about the type <code>r</code> being reduced to. To advertise our apathy,
while at the same time promising that we won't mess with <code>r</code>, we need the <code>All</code> keyword, indicating a universally quantified (parametrically polymorphic) type, the counterpart of Haskell's <code>forall</code>.</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">t/defalias</span> <span class="nv">Transducer</span> <span class="p">(</span><span class="nf">t/TFn</span> <span class="p">[[</span><span class="nv">a</span> <span class="ss">:variance</span> <span class="ss">:covariant</span><span class="p">]</span>
<span class="p">[</span><span class="nv">b</span> <span class="ss">:variance</span> <span class="ss">:contravariant</span><span class="p">]]</span>
<span class="p">(</span><span class="nf">t/All</span> <span class="p">[</span><span class="nv">r</span><span class="p">]</span> <span class="p">[(</span><span class="nf">ReducingFn</span> <span class="nv">a</span> <span class="nv">r</span><span class="p">)</span> <span class="nb">-> </span><span class="p">(</span><span class="nf">ReducingFn</span> <span class="nv">b</span> <span class="nv">r</span><span class="p">)])))</span>
</code></pre></div>
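<p>To make the role of that quantifier a little more concrete, here is a small Haskell sketch in which <code>forall r</code>
does the job of <code>All [r]</code>. Because the transducer promises to work for every <code>r</code>, whoever receives it gets to
choose <code>r</code>, and can even choose it twice. The names <code>mapping</code> and <code>sumAndCollect</code> are invented for
the example, and the <code>RankNTypes</code> extension is assumed.</p>
<div class="highlight"><pre><span></span><code>{-# LANGUAGE RankNTypes #-}

type ReducingFn a r = r -> a -> r
type Transducer a b = forall r. ReducingFn a r -> ReducingFn b r

-- Builds a transducer from a plain function. It never inspects r.
-- (With the argument order used in this post, the function goes b -> a.)
mapping :: (b -> a) -> ReducingFn a r -> ReducingFn b r
mapping f rf = \acc x -> rf acc (f x)

-- The caller picks r: once as Int (to sum), once as [Int] (to collect).
sumAndCollect :: Transducer Int b -> [b] -> (Int, [Int])
sumAndCollect t xs =
  ( foldl (t (+)) 0 xs
  , reverse (foldl (t (flip (:))) [] xs)
  )

main :: IO ()
main = print (sumAndCollect (mapping length) ["a", "bb", "ccc"])  -- prints (6,[1,2,3])
</code></pre></div>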
<h4>Tricks and compromises with typed Clojure</h4>
<p>You may have wondered why we had to define a special <code>t-repn</code> for
repeating numbers. We could in fact have created a more general
version</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">t/ann</span> <span class="o">^</span><span class="ss">:no-check</span> <span class="nv">t-rep</span> <span class="p">(</span><span class="nf">t/All</span> <span class="p">[</span><span class="nv">a</span><span class="p">]</span> <span class="p">(</span><span class="nf">Transducer</span> <span class="nv">a</span> <span class="nv">a</span><span class="p">)))</span>
</code></pre></div>
<p>with (since Clojure is still dynamically typed underneath our
annotations) exactly the same definition. However, when
we actually use <code>t-rep</code>, we need to tell typed Clojure exactly which
instance of the polymorphic type we really want, by <code>inst</code>antiating it:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">t/cf</span> <span class="p">(</span><span class="nf">compt</span> <span class="nv">t-parsei</span> <span class="p">(</span><span class="nf">t/inst</span> <span class="nv">t-rep</span> <span class="nv">t/Int</span><span class="p">)</span> <span class="nv">t-root</span><span class="p">))</span>
</code></pre></div>
<p>This is because typed Clojure only performs <strong>local type inference</strong>. You
can read more about the limitation in
<a href="http://frenchy64.github.io/typed/clojure,/core.typed,/clojure/2013/09/02/polymorphic-hof.html">this post</a> and in the references
it contains, but the gist is that nothing is ever inferred by working backwards
from the return type of a function, so you need to provide a crutch. Most
languages with some kind of automatic type inference perform the local variety;
a few, like OCaml and Haskell, do a much fuller job; and of course the vast
majority of languages do none whatsoever.</p>
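<p>For what it's worth, Haskell has a loosely comparable escape hatch for the rare occasions when its inference needs
a nudge: the <code>TypeApplications</code> extension. The sketch below is only an analogy to <code>t/inst</code>, not a claim
that the two mechanisms work the same way, and <code>tRep</code> and <code>tRepInt</code> are names I've made up.</p>
<div class="highlight"><pre><span></span><code>{-# LANGUAGE ExplicitForAll, TypeApplications #-}

type ReducingFn a r = r -> a -> r

-- Polymorphic in the element type, like the general t-rep.
tRep :: forall a r. ReducingFn a r -> ReducingFn a r
tRep rf = \acc x -> rf (rf acc x) x

-- Explicitly choosing a = Int, in the spirit of (t/inst t-rep t/Int).
tRepInt :: ReducingFn Int r -> ReducingFn Int r
tRepInt = tRep @Int

main :: IO ()
main = print (foldl (tRepInt (+)) 0 [1, 2, 3])  -- prints 12
</code></pre></div>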
<p>The other oddity is one I mentioned earlier: we're not using Clojure's
normal <code>comp</code>. Why? Well, consider the type of a simple composition
function:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">All</span> <span class="p">[</span><span class="nv">a</span> <span class="nv">b</span> <span class="nv">c</span><span class="p">]</span> <span class="p">[[</span><span class="nv">b</span> <span class="nb">-> </span><span class="nv">c</span><span class="p">]</span> <span class="p">[</span><span class="nv">a</span> <span class="nb">-> </span><span class="nv">b</span><span class="p">]</span> <span class="nb">-> </span><span class="p">[</span><span class="nv">a</span> <span class="nb">-> </span><span class="nv">c</span><span class="p">]])</span>
</code></pre></div>
<p>That makes sense. The first function to be applied converts from <code>a</code> to <code>b</code>, and then
the second converts the <code>b</code> to a <code>c</code>. Now, let's compose 3 and 4 functions:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">All</span> <span class="p">[</span><span class="nv">a</span> <span class="nv">b</span> <span class="nv">c</span> <span class="nv">d</span><span class="p">]</span> <span class="p">[[</span><span class="nv">c</span> <span class="nb">-> </span><span class="nv">d</span><span class="p">]</span> <span class="p">[</span><span class="nv">b</span> <span class="nb">-> </span><span class="nv">c</span><span class="p">]</span> <span class="p">[</span><span class="nv">a</span> <span class="nb">-> </span><span class="nv">b</span><span class="p">]</span> <span class="nb">-> </span><span class="p">[</span><span class="nv">a</span> <span class="nb">-> </span><span class="nv">d</span><span class="p">]])</span>
<span class="p">(</span><span class="nf">All</span> <span class="p">[</span><span class="nv">a</span> <span class="nv">b</span> <span class="nv">c</span> <span class="nv">d</span> <span class="nv">e</span><span class="p">]</span> <span class="p">[[</span><span class="nv">d</span> <span class="nb">-> </span><span class="nv">e</span><span class="p">]</span> <span class="p">[</span><span class="nv">c</span> <span class="nb">-> </span><span class="nv">d</span><span class="p">]</span> <span class="p">[</span><span class="nv">b</span> <span class="nb">-> </span><span class="nv">c</span><span class="p">]</span> <span class="p">[</span><span class="nv">a</span> <span class="nb">-> </span><span class="nv">b</span><span class="p">]</span> <span class="nb">-> </span><span class="p">[</span><span class="nv">a</span> <span class="nb">-> </span><span class="nv">e</span><span class="p">]])</span>
</code></pre></div>
<p>The pattern is pretty clear, but there isn't an obvious annotation that would capture the type
of all variadic possibilities. Instead, <code>core.typed</code> suggests</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">All</span> <span class="p">[</span><span class="nv">x</span> <span class="nv">y</span> <span class="nv">b</span> <span class="nv">...</span><span class="p">]</span> <span class="p">[[</span><span class="nv">x</span> <span class="nb">-> </span><span class="nv">y</span><span class="p">]</span> <span class="p">[</span><span class="nv">b</span> <span class="nv">...</span> <span class="nv">b</span> <span class="nb">-> </span><span class="nv">x</span><span class="p">]</span> <span class="nb">-> </span><span class="p">[</span><span class="nv">b</span> <span class="nv">...</span> <span class="nv">b</span> <span class="nb">-> </span><span class="nv">y</span><span class="p">]])</span>
</code></pre></div>
<p>which covers exactly two functions, the second of which may take any number of arguments (that is what the
dotted <code>b ... b</code> stands for). Even this limited composition
type challenges <code>core.typed</code> if the functions are even slightly polymorphic. E.g.</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">t/cf</span> <span class="p">(</span><span class="nb">comp identity </span><span class="nv">identity</span><span class="p">))</span>
</code></pre></div>
<p>will fail with an error, roughly like:</p>
<div class="highlight"><pre><span></span><code> <span class="nv">user></span> <span class="p">(</span><span class="nf">t/cf</span> <span class="p">(</span><span class="nb">comp identity </span><span class="nv">identity</span><span class="p">))</span>
<span class="nv">Type</span> <span class="nv">Error</span> <span class="nv">polymorphic</span> <span class="nv">function</span> <span class="nb">comp </span><span class="nv">could</span> <span class="nb">not </span><span class="nv">be</span> <span class="nv">applied</span> <span class="nv">to</span> <span class="nv">arguments</span><span class="err">:</span>
<span class="nv">Polymorphic</span> <span class="nv">Variables</span><span class="err">:</span> <span class="nv">a</span> <span class="nv">b</span> <span class="nv">c</span>
<span class="nv">Domains</span><span class="err">:</span> <span class="p">[</span><span class="nv">b</span> <span class="nb">-> </span><span class="nv">c</span><span class="p">]</span> <span class="p">[</span><span class="nv">a</span> <span class="nb">-> </span><span class="nv">b</span><span class="p">]</span>
<span class="nv">Arguments</span><span class="err">:</span> <span class="p">(</span><span class="nf">t/All</span> <span class="p">[</span><span class="nv">x</span><span class="p">]</span> <span class="p">[</span><span class="nv">x</span> <span class="nb">-> </span><span class="nv">x</span><span class="p">])</span> <span class="p">(</span><span class="nf">t/All</span> <span class="p">[</span><span class="nv">x</span><span class="p">]</span> <span class="p">[</span><span class="nv">x</span> <span class="nb">-> </span><span class="nv">x</span><span class="p">])</span>
</code></pre></div>
<p>As noted above, we are allowed to instantiate a specific version of the polymorphic type, so</p>
<div class="highlight"><pre><span></span><code> <span class="nv">user></span> <span class="p">(</span><span class="nf">t/cf</span> <span class="p">(</span><span class="nf">t/inst</span> <span class="nb">identity </span><span class="nv">Long</span><span class="p">))</span>
<span class="p">[</span><span class="nv">Long</span> <span class="nb">-> </span><span class="nv">Long</span><span class="p">]</span>
<span class="nv">user></span> <span class="p">(</span><span class="nf">t/cf</span> <span class="p">(</span><span class="nb">comp </span><span class="p">(</span><span class="nf">t/inst</span> <span class="nb">identity </span><span class="nv">Long</span><span class="p">)</span> <span class="p">(</span><span class="nf">t/inst</span> <span class="nb">identity </span><span class="nv">Long</span><span class="p">)))</span>
<span class="p">[</span><span class="nv">Long</span> <span class="nb">-> </span><span class="nv">Long</span><span class="p">]</span>
</code></pre></div>
<p>In summary:</p>
<ol>
<li><code>(comp identity identity)</code> fails, because identity is polymorphic</li>
<li><code>(comp (t/inst identity Long) (t/inst identity Long))</code> succeeds, because we have
instantiated a specific type.</li>
<li><code>(comp (t/inst identity Long) (t/inst identity Long) (t/inst identity Long))</code>
fails again, because comp is called with three arguments.</li>
</ol>
<p>Haskell's type inference is of course more sophisticated, but it also makes the problem easier
by eschewing variadics in favor of currying. There's one composition function, which takes
one argument and happens to return another function:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="o">.</span><span class="p">)</span> <span class="ow">::</span> <span class="p">(</span><span class="n">b</span> <span class="ow">-></span> <span class="n">c</span><span class="p">)</span> <span class="ow">-></span> <span class="p">(</span><span class="n">a</span> <span class="ow">-></span> <span class="n">b</span><span class="p">)</span> <span class="ow">-></span> <span class="n">a</span> <span class="ow">-></span> <span class="n">c</span>
</code></pre></div>
<p>There are thus at least two reasons why Haskell can easily deduce:</p>
<div class="highlight"><pre><span></span><code> <span class="n">id</span> <span class="ow">::</span> <span class="n">a</span> <span class="ow">-></span> <span class="n">a</span>
<span class="p">(</span><span class="n">id</span> <span class="o">.</span> <span class="n">id</span> <span class="o">.</span> <span class="n">id</span><span class="p">)</span> <span class="ow">::</span> <span class="n">c</span> <span class="ow">-></span> <span class="n">c</span>
</code></pre></div>
<p>First, it does non-local type inference; second, it doesn't have to deal with variadic functions.</p>
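<p>A small sketch of the second point: because <code>(.)</code> takes two functions and associates to the right, a chain of
any length is just nested two-function compositions, each of which can be checked on its own. The functions
<code>parse</code>, <code>double</code>, <code>root</code>, <code>chain</code> and <code>idChain</code> are made up for the illustration.</p>
<div class="highlight"><pre><span></span><code>-- Three functions with three different types.
parse :: String -> Int
parse = read

double :: Int -> Int
double = (* 2)

root :: Int -> Double
root n = 2 ** (1 / fromIntegral n)

-- Parsed as root . (double . parse): two binary compositions, no variadics.
chain :: String -> Double
chain = root . double . parse

-- The polymorphic case from the text; no instantiation is needed at the use sites.
idChain :: c -> c
idChain = id . id . id

main :: IO ()
main = do
  print (chain "2")
  print (idChain "unchanged")
</code></pre></div>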
<h4>A slightly better variadic comp</h4>
<p>We can't do much about local type inference, but we can write a comp that lets <code>core.typed</code> check
an arbitrary series of composed transformations. The trick, as usual when we need to go easy on the
type checker, is to use a macro to simplify what it needs to check:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defmacro </span><span class="nv">comp*</span> <span class="p">[</span><span class="o">&</span> <span class="p">[</span><span class="nv">f1</span> <span class="nv">f2</span> <span class="o">&</span> <span class="nv">fs</span><span class="p">]]</span>
<span class="p">(</span><span class="nb">if-not </span><span class="nv">fs</span>
<span class="o">`</span><span class="p">(</span><span class="nb">comp </span><span class="o">~</span><span class="nv">f1</span> <span class="o">~</span><span class="nv">f2</span><span class="p">)</span>
<span class="o">`</span><span class="p">(</span><span class="nb">comp </span><span class="o">~</span><span class="nv">f1</span> <span class="p">(</span><span class="nf">comp*</span> <span class="o">~</span><span class="nv">f2</span> <span class="o">~@</span><span class="nv">fs</span><span class="p">))))</span>
</code></pre></div>
<p>so <code>(comp* c->d b->c a->b)</code> unwinds to <code>(comp c->d (comp b->c a->b))</code>, and failure #3
now succeeds:</p>
<div class="highlight"><pre><span></span><code> <span class="n">user</span><span class="o">></span> <span class="p">(</span><span class="n">t</span><span class="o">/</span><span class="n">cf</span> <span class="p">(</span><span class="n">comp</span><span class="o">*</span> <span class="p">(</span><span class="n">t</span><span class="o">/</span><span class="n">inst</span> <span class="n">identity</span> <span class="n">Long</span><span class="p">)</span> <span class="p">(</span><span class="n">t</span><span class="o">/</span><span class="n">inst</span> <span class="n">identity</span> <span class="n">Long</span><span class="p">)</span> <span class="p">(</span><span class="n">t</span><span class="o">/</span><span class="n">inst</span> <span class="n">identity</span> <span class="n">Long</span><span class="p">)))</span>
<span class="p">[</span><span class="n">Long</span> <span class="o">-></span> <span class="n">Long</span><span class="p">]</span>
</code></pre></div>
<p>Now, the general transducer <code>(Transducer a b)</code> is of course polymorphic, but even a
specific-seeming one like <code>t-repn</code> (which is <code>(Transducer Number Number)</code>) still has that
<code>(All [r] ...)</code> polymorphism in the type being reduced to.
Thus, <code>(comp t-repn t-repn)</code> will fail with the now-familiar "could not be applied to arguments" error.</p>
<p>Fortunately, we know that the transducer doesn't care at all about <code>r</code>, so, without loss of
actual generality, we can lie:</p>
<div class="highlight"><pre><span></span><code> <span class="nv">user></span> <span class="p">(</span><span class="nf">t/cf</span> <span class="p">(</span><span class="nb">comp </span><span class="p">(</span><span class="nf">t/inst</span> <span class="nv">t-repn</span> <span class="nv">Any</span><span class="p">)</span> <span class="p">(</span><span class="nf">t/inst</span> <span class="nv">t-repn</span> <span class="nv">Any</span><span class="p">)))</span>
<span class="p">[[</span><span class="nv">Any</span> <span class="nv">Number</span> <span class="nb">-> </span><span class="nv">Any</span><span class="p">]</span> <span class="nb">-> </span><span class="p">[</span><span class="nv">Any</span> <span class="nv">Number</span> <span class="nb">-> </span><span class="nv">Any</span><span class="p">]]</span>
</code></pre></div>
<p>Having lied, we can make it right again by casting the polymorphism back in:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">t/ann</span> <span class="o">^</span><span class="ss">:no-check</span> <span class="nv">lie-again</span>
<span class="p">(</span><span class="nf">t/All</span> <span class="p">[</span><span class="nv">a</span> <span class="nv">b</span><span class="p">]</span> <span class="p">[[[</span><span class="nv">t/Any</span> <span class="nv">a</span> <span class="nb">-> </span><span class="nv">t/Any</span><span class="p">]</span> <span class="nb">-> </span><span class="p">[</span><span class="nv">t/Any</span> <span class="nv">b</span> <span class="nb">-> </span><span class="nv">t/Any</span><span class="p">]]</span> <span class="nv">-></span>
<span class="p">(</span><span class="nf">t/All</span> <span class="p">[</span><span class="nv">r</span><span class="p">]</span> <span class="p">[[</span><span class="nv">r</span> <span class="nv">a</span> <span class="nb">-> </span><span class="nv">r</span><span class="p">]</span> <span class="nb">-> </span><span class="p">[</span><span class="nv">r</span> <span class="nv">b</span> <span class="nb">-> </span><span class="nv">r</span><span class="p">]])]))</span>
<span class="p">(</span><span class="k">def </span><span class="nv">lie-again</span> <span class="nv">identity</span><span class="p">)</span>
</code></pre></div>
<p>so that:</p>
<div class="highlight"><pre><span></span><code> <span class="nv">user></span> <span class="p">(</span><span class="nf">t/cf</span> <span class="p">(</span><span class="nf">lie-again</span><span class="p">(</span><span class="nb">comp </span><span class="p">(</span><span class="nf">t/inst</span> <span class="nv">t-repn</span> <span class="nv">t/Any</span><span class="p">)</span> <span class="p">(</span><span class="nf">t/inst</span> <span class="nv">t-repn</span> <span class="nv">t/Any</span><span class="p">))))</span>
<span class="p">(</span><span class="nf">t/All</span> <span class="p">[</span><span class="nv">r</span><span class="p">]</span> <span class="p">[[</span><span class="nv">r</span> <span class="nv">Number</span> <span class="nb">-> </span><span class="nv">r</span><span class="p">]</span> <span class="nb">-> </span><span class="p">[</span><span class="nv">r</span> <span class="nv">Number</span> <span class="nb">-> </span><span class="nv">r</span><span class="p">]])</span>
</code></pre></div>
<p>Now we combine the two lies and the de-variadification into a single macro:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defmacro </span><span class="nv">compt</span> <span class="p">[</span><span class="o">&</span> <span class="nv">tds</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">its</span> <span class="p">(</span><span class="nb">map </span><span class="o">#</span><span class="p">(</span><span class="nb">list </span><span class="ss">'t/inst</span> <span class="nv">%</span> <span class="ss">'t/Any</span><span class="p">)</span> <span class="nv">tds</span><span class="p">)]</span>
<span class="o">`</span><span class="p">(</span><span class="nf">lie-again</span> <span class="p">(</span><span class="nf">comp*</span> <span class="o">~@</span><span class="nv">its</span><span class="p">))))</span>
</code></pre></div>
<p>and, as demonstrated way above, we can compose transducers. Now you know why we need <code>compt</code>.</p>
<h4>It's far prettier in Haskell</h4>
<p>There's not too much to say about this. While the <code>Transducer</code> type definition</p>
<div class="highlight"><pre><span></span><code> <span class="kr">type</span> <span class="kt">ReducingFn</span> <span class="n">a</span> <span class="n">r</span> <span class="ow">=</span> <span class="n">r</span> <span class="ow">-></span> <span class="n">a</span> <span class="ow">-></span> <span class="n">r</span>
<span class="kr">type</span> <span class="kt">Transducer</span> <span class="n">a</span> <span class="n">b</span> <span class="ow">=</span> <span class="n">forall</span> <span class="n">r</span> <span class="o">.</span> <span class="kt">ReducingFn</span> <span class="n">a</span> <span class="n">r</span> <span class="ow">-></span> <span class="kt">ReducingFn</span> <span class="n">b</span> <span class="n">r</span>
</code></pre></div>
<p>is essentially the same as in Clojure, everything else is easier. We can write fully general transducers</p>
<div class="highlight"><pre><span></span><code> <span class="n">t_dub</span> <span class="ow">::</span> <span class="kt">Num</span> <span class="n">a</span> <span class="ow">=></span> <span class="kt">Transducer</span> <span class="n">a</span> <span class="n">a</span>
<span class="n">t_dub</span> <span class="n">f</span> <span class="n">r</span> <span class="n">b</span> <span class="ow">=</span> <span class="n">f</span> <span class="n">r</span> <span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="n">b</span><span class="p">)</span>
<span class="n">t_rep</span> <span class="ow">::</span> <span class="kt">Transducer</span> <span class="n">a</span> <span class="n">a</span>
<span class="n">t_rep</span> <span class="n">f</span> <span class="n">r</span> <span class="n">b</span> <span class="ow">=</span> <span class="n">f</span> <span class="p">(</span><span class="n">f</span> <span class="n">r</span> <span class="n">b</span><span class="p">)</span> <span class="n">b</span>
<span class="n">t_parse</span> <span class="ow">::</span> <span class="kt">Read</span> <span class="n">a</span> <span class="ow">=></span> <span class="kt">Transducer</span> <span class="n">a</span> <span class="kt">String</span>
<span class="n">t_parse</span> <span class="n">f</span> <span class="n">r</span> <span class="n">s</span> <span class="ow">=</span> <span class="n">f</span> <span class="n">r</span> <span class="o">$</span> <span class="n">read</span> <span class="n">s</span>
<span class="n">t_root</span> <span class="ow">::</span> <span class="kt">Transducer</span> <span class="kt">Double</span> <span class="kt">Integer</span>
<span class="n">t_root</span> <span class="n">f</span> <span class="n">r</span> <span class="n">i</span> <span class="ow">=</span> <span class="n">f</span> <span class="n">r</span> <span class="o">$</span> <span class="mf">2.0</span> <span class="o">**</span> <span class="p">(</span><span class="mf">1.0</span><span class="o">/</span><span class="p">(</span><span class="n">fromInteger</span> <span class="n">i</span><span class="p">))</span>
</code></pre></div>
<p>and compose them with no special effort.</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="n">t_parse</span> <span class="o">.</span> <span class="n">t_rep</span> <span class="o">.</span> <span class="n">t_dub</span> <span class="o">.</span> <span class="n">t_root</span><span class="p">)</span> <span class="ow">::</span> <span class="kt">ReducingFn</span> <span class="kt">Double</span> <span class="n">r</span> <span class="ow">-></span> <span class="kt">ReducingFn</span> <span class="kt">String</span> <span class="n">r</span>
<span class="p">(</span><span class="n">foldl</span> <span class="p">((</span><span class="n">t_parse</span> <span class="o">.</span> <span class="n">t_rep</span> <span class="o">.</span> <span class="n">t_dub</span> <span class="o">.</span> <span class="n">t_root</span><span class="p">)</span> <span class="p">(</span><span class="o">+</span><span class="p">))</span> <span class="mf">0.0</span> <span class="p">[</span><span class="s">"1"</span><span class="p">,</span><span class="s">"2"</span><span class="p">,</span><span class="s">"3"</span><span class="p">])</span> <span class="ow">::</span> <span class="kt">Double</span>
</code></pre></div>
<h4>Scala is not Haskell either</h4>
<p>Let's start out unambitiously. Trying to compose the <code>identity</code> function in Scala
seems to run into the same problem as in Clojure</p>
<div class="highlight"><pre><span></span><code> <span class="n">scala</span><span class="o">></span> <span class="n">identity</span> <span class="n">_</span> <span class="n">compose</span> <span class="n">identity</span>
<span class="o"><</span><span class="n">console</span><span class="o">>:</span> <span class="n">error</span><span class="p">:</span> <span class="k">type</span> <span class="nc">mismatch</span><span class="p">;</span>
<span class="n">found</span> <span class="p">:</span> <span class="nc">Nothing</span> <span class="o">=></span> <span class="nc">Nothing</span>
<span class="n">required</span><span class="p">:</span> <span class="nc">A</span> <span class="o">=></span> <span class="nc">Nothing</span>
<span class="n">identity</span> <span class="n">_</span> <span class="n">compose</span> <span class="n">identity</span>
<span class="n">^</span>
</code></pre></div>
<p>but what's going on here is a slightly different problem. While <code>identity</code> is defined
polymorphically as <code>identity[A](a:A):A</code>, by the time we see it in the REPL, all type
information has been erased. (We deliberately erased it, by instantiating the function with
<code>_</code> in a context where no other type information is available.)</p>
<p>If we put it back explicitly, composition works, and the composed
function can itself be used polymorphically:</p>
<div class="highlight"><pre><span></span><code> <span class="n">scala</span><span class="o">></span> <span class="k">def</span> <span class="nf">ia</span><span class="p">[</span><span class="nc">A</span><span class="p">]</span> <span class="o">=</span> <span class="n">identity</span><span class="p">[</span><span class="nc">A</span><span class="p">]</span> <span class="n">_</span> <span class="n">compose</span> <span class="n">identity</span><span class="p">[</span><span class="nc">A</span><span class="p">]</span>
<span class="n">ia</span><span class="p">:</span> <span class="p">[</span><span class="nc">A</span><span class="p">]</span><span class="o">=></span> <span class="nc">A</span> <span class="o">=></span> <span class="nc">A</span>
<span class="n">scala</span><span class="o">></span> <span class="n">ia</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span>
<span class="n">res39</span><span class="p">:</span> <span class="nc">Int</span> <span class="o">=</span> <span class="mi">3</span>
<span class="n">scala</span><span class="o">></span> <span class="n">ia</span><span class="p">(</span><span class="mf">3.0</span><span class="p">)</span>
<span class="n">res40</span><span class="p">:</span> <span class="nc">Double</span> <span class="o">=</span> <span class="mf">3.0</span>
</code></pre></div>
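<p>The composed function stays polymorphic here because <code>ia</code> is a method with its own type parameter, whereas a function value has to commit to a single type when it is created. A minimal sketch of the distinction, in which the names <code>PolyCompose</code> and <code>iaInt</code> are made up for illustration:</p>

```scala
object PolyCompose {
  // A method can carry a type parameter, so every call site chooses its own A.
  def ia[A]: A => A = {
    val id: A => A = identity[A] // eta-expansion, guided by the expected type
    id.compose(id)
  }

  // A function value is monomorphic: A must be fixed when the value is created.
  val iaInt: Int => Int = ia[Int]
}
```

<p>Calling <code>PolyCompose.ia[String]</code> and <code>PolyCompose.ia[Int]</code> yields two independent instantiations, while <code>iaInt</code> will only ever accept an <code>Int</code>.</p>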
<p>We can chain compositions in a manner that looks a bit like Haskell</p>
<div class="highlight"><pre><span></span><code> <span class="nv">scala></span> <span class="nv">identity</span><span class="p">[</span><span class="nv">Int</span><span class="p">]</span> <span class="nv">_</span> <span class="nv">compose</span> <span class="nv">identity</span><span class="p">[</span><span class="nv">Int</span><span class="p">]</span> <span class="nv">compose</span> <span class="nv">identity</span><span class="p">[</span><span class="nv">Int</span><span class="p">]</span>
<span class="nv">res33</span><span class="err">:</span> <span class="nv">Int</span> <span class="nv">=></span> <span class="nv">Int</span> <span class="nb">= </span><span class="nv"><function1></span>
</code></pre></div>
<p>but is really quite different. Scala's <code>compose</code> is a method of the <code>Function1</code> class rather than
a standalone function, as this less sugary rendition makes clear:</p>
<div class="highlight"><pre><span></span><code><span class="nv">scala></span> <span class="p">(</span><span class="nf">identity</span><span class="p">[</span><span class="nv">Int</span><span class="p">]</span> <span class="nv">_</span><span class="p">)</span><span class="nv">.compose</span><span class="p">(</span><span class="nf">identity</span><span class="p">[</span><span class="nv">Int</span><span class="p">]</span> <span class="nv">_</span><span class="p">)</span><span class="nv">.compose</span><span class="p">(</span><span class="nf">identity</span><span class="p">[</span><span class="nv">Int</span><span class="p">]</span> <span class="nv">_</span><span class="p">)</span>
<span class="nv">res36</span><span class="err">:</span> <span class="nv">Int</span> <span class="nv">=></span> <span class="nv">Int</span> <span class="nb">= </span><span class="nv"><function1></span>
</code></pre></div>
<p>That's OK. Scala's OO nature gives us a set of tools completely different from those we
got from Clojure's homoiconicity, but they can be deployed for qualitatively similar purposes -
in this case, safe and reasonably attractive transducers.</p>
<p>In fact,
I've seen transducers in Scala implemented as a trait, which then delegates to a virtual <code>transform</code> method, e.g.</p>
<div class="highlight"><pre><span></span><code> <span class="k">type</span> <span class="nc">ReducingFn</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span> <span class="nc">R</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="nc">R</span><span class="p">,</span><span class="nc">A</span><span class="p">)</span> <span class="o">=></span> <span class="nc">R</span>
<span class="k">trait</span> <span class="nc">TransducerT</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span> <span class="nc">B</span><span class="p">]</span> <span class="p">{</span>
<span class="k">def</span> <span class="nf">transform</span><span class="p">[</span><span class="nc">R</span><span class="p">]:</span> <span class="nc">ReducingFn</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span> <span class="nc">R</span><span class="p">]</span> <span class="o">=></span> <span class="nc">ReducingFn</span><span class="p">[</span><span class="nc">B</span><span class="p">,</span> <span class="nc">R</span><span class="p">]</span>
<span class="p">...</span>
<span class="p">}</span>
</code></pre></div>
<p>To make <code>TransducerT</code> act more like a function, we would add an <code>apply</code> method, and to
make chained composition pretty, a <code>compose</code> method:</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">apply</span><span class="p">[</span><span class="nc">R</span><span class="p">](</span><span class="n">rf</span><span class="p">:</span> <span class="nc">ReducingFn</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span> <span class="nc">R</span><span class="p">]):</span> <span class="nc">ReducingFn</span><span class="p">[</span><span class="nc">B</span><span class="p">,</span> <span class="nc">R</span><span class="p">]</span> <span class="o">=</span> <span class="n">transform</span><span class="p">[</span><span class="nc">R</span><span class="p">](</span><span class="n">rf</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">compose</span><span class="p">[</span><span class="nc">C</span><span class="p">](</span><span class="n">t2</span><span class="p">:</span> <span class="nc">TransducerT</span><span class="p">[</span><span class="nc">C</span><span class="p">,</span> <span class="nc">A</span><span class="p">]):</span> <span class="nc">TransducerT</span><span class="p">[</span><span class="nc">C</span><span class="p">,</span> <span class="nc">B</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span>
<span class="kd">val</span> <span class="n">t1</span> <span class="o">=</span> <span class="bp">this</span>
<span class="k">new</span> <span class="nc">TransducerT</span><span class="p">[</span><span class="nc">C</span><span class="p">,</span> <span class="nc">B</span><span class="p">]</span> <span class="p">{</span>
<span class="k">override</span> <span class="k">def</span> <span class="nf">transform</span><span class="p">[</span><span class="nc">R</span><span class="p">]:</span> <span class="p">(</span><span class="nc">ReducingFn</span><span class="p">[</span><span class="nc">C</span><span class="p">,</span> <span class="nc">R</span><span class="p">])</span> <span class="o">=></span> <span class="nc">ReducingFn</span><span class="p">[</span><span class="nc">B</span><span class="p">,</span> <span class="nc">R</span><span class="p">]</span> <span class="o">=</span> <span class="n">rf</span> <span class="o">=></span> <span class="n">t1</span><span class="p">(</span><span class="n">t2</span><span class="p">(</span><span class="n">rf</span><span class="p">))</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
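<p>Pieced together, the trait-based approach might look roughly like the sketch below. The wrapper object <code>TraitTransducers</code>, the helper <code>mappingT</code> and the sample transducers <code>parseInt</code> and <code>double</code> are hypothetical names introduced only for this illustration, and <code>apply</code> is written with an explicit parameter list so that <code>t1(t2(rf))</code> resolves the way we want.</p>

```scala
object TraitTransducers {
  type ReducingFn[A, R] = (R, A) => R

  trait TransducerT[A, B] {
    // Turn a reducer over A into a reducer over B, for any accumulator type R.
    def transform[R]: ReducingFn[A, R] => ReducingFn[B, R]

    // Lets a transducer be applied like a function: t(rf).
    def apply[R](rf: ReducingFn[A, R]): ReducingFn[B, R] = transform[R](rf)

    // this.compose(t2) feeds the output of this transducer into t2.
    def compose[C](t2: TransducerT[C, A]): TransducerT[C, B] = {
      val t1 = this
      new TransducerT[C, B] {
        def transform[R]: ReducingFn[C, R] => ReducingFn[B, R] = rf => t1(t2(rf))
      }
    }
  }

  // Hypothetical helper: lift a plain function on elements into a transducer.
  def mappingT[A, B](f: B => A): TransducerT[A, B] = new TransducerT[A, B] {
    def transform[R]: ReducingFn[A, R] => ReducingFn[B, R] =
      rf => (r, b) => rf(r, f(b))
  }

  val parseInt: TransducerT[Int, String] = mappingT((s: String) => s.toInt)
  val double: TransducerT[Int, Int] = mappingT((i: Int) => 2 * i)

  // Strings are parsed first and then doubled before reaching the reducer.
  val parseThenDouble: TransducerT[Int, String] = parseInt.compose(double)

  val total: Int =
    List("1", "2", "3").foldLeft(0)(parseThenDouble((acc: Int, i: Int) => acc + i))
}
```

<p>Because <code>R</code> is a type parameter of the method rather than of the trait, each call picks its own accumulator type and no casting is involved.</p>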
<p>This will work, but it's more amusing to try to define transducers as existential types,
using the semi-mystical <code>forSome</code> annotation, which Scala uses for the same purpose as
Haskell's <code>forall</code> and typed Clojure's <code>All</code>:</p>
<div class="highlight"><pre><span></span><code> <span class="k">type</span> <span class="nc">ReducingFn</span><span class="p">[</span><span class="o">-</span><span class="nc">A</span><span class="p">,</span> <span class="nc">R</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="nc">R</span><span class="p">,</span><span class="nc">A</span><span class="p">)</span> <span class="o">=></span> <span class="nc">R</span>
<span class="k">type</span> <span class="nc">Transducer3</span><span class="p">[</span><span class="o">+</span><span class="nc">A</span><span class="p">,</span><span class="o">-</span><span class="nc">B</span><span class="p">,</span><span class="nc">R</span><span class="p">]</span> <span class="o">=</span> <span class="nc">ReducingFn</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span><span class="nc">R</span><span class="p">]</span> <span class="o">=></span> <span class="nc">ReducingFn</span><span class="p">[</span><span class="nc">B</span><span class="p">,</span><span class="nc">R</span><span class="p">]</span>
<span class="k">type</span> <span class="nc">Transducer</span><span class="p">[</span><span class="o">+</span><span class="nc">A</span><span class="p">,</span><span class="o">-</span><span class="nc">B</span><span class="p">]</span> <span class="o">=</span> <span class="nc">Transducer3</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span><span class="nc">B</span><span class="p">,</span><span class="nc">R</span> <span class="n">forSome</span> <span class="p">{</span><span class="k">type</span> <span class="nc">R</span><span class="p">}]</span>
</code></pre></div>
<p>(To be honest, I don't know if it's possible to do this without the intermediate ternary type.)</p>
<p>To assist in creating simple transducers that just modify individual elements of cargo, we
write <code>mapping</code>, again with an intermediate ternary type,</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">mapping3</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span><span class="nc">B</span><span class="p">,</span><span class="nc">R</span><span class="p">](</span><span class="n">f</span> <span class="p">:</span> <span class="nc">A</span> <span class="o">=></span> <span class="nc">B</span><span class="p">)</span> <span class="p">:</span> <span class="nc">Transducer3</span><span class="p">[</span><span class="nc">B</span><span class="p">,</span><span class="nc">A</span><span class="p">,</span><span class="nc">R</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="n">rf</span> <span class="p">:</span> <span class="nc">ReducingFn</span><span class="p">[</span><span class="nc">B</span><span class="p">,</span><span class="nc">R</span><span class="p">]</span> <span class="o">=></span>
<span class="p">(</span><span class="n">r</span> <span class="p">:</span> <span class="nc">R</span> <span class="p">,</span><span class="n">a</span><span class="p">:</span><span class="nc">A</span><span class="p">)</span> <span class="o">=></span> <span class="n">rf</span><span class="p">(</span><span class="n">r</span><span class="p">,</span><span class="n">f</span><span class="p">(</span><span class="n">a</span><span class="p">))}</span>
<span class="k">def</span> <span class="nf">mapping</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span><span class="nc">B</span><span class="p">]</span> <span class="o">=</span> <span class="n">mapping3</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span><span class="nc">B</span><span class="p">,</span><span class="nc">R</span> <span class="n">forSome</span> <span class="p">{</span><span class="k">type</span> <span class="nc">R</span><span class="p">}]</span> <span class="n">_</span>
</code></pre></div>
<p>which we use like this:</p>
<div class="highlight"><pre><span></span><code> <span class="kd">val</span> <span class="n">t_parsei</span><span class="p">:</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">Int</span><span class="p">,</span> <span class="nc">String</span><span class="p">]</span> <span class="o">=</span> <span class="n">mapping</span> <span class="p">{</span> <span class="n">s</span><span class="p">:</span> <span class="nc">String</span> <span class="o">=></span> <span class="n">s</span><span class="p">.</span><span class="n">toInt</span><span class="p">}</span>
<span class="k">def</span> <span class="nf">t_root2</span> <span class="p">:</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">Double</span><span class="p">,</span><span class="nc">Int</span><span class="p">]</span> <span class="o">=</span> <span class="n">mapping</span> <span class="p">{</span> <span class="n">i</span> <span class="p">:</span> <span class="nc">Int</span> <span class="o">=></span> <span class="nc">Math</span><span class="p">.</span><span class="n">pow</span><span class="p">(</span><span class="mf">2.0</span><span class="p">,</span><span class="mf">1.0</span><span class="o">/</span><span class="n">i</span><span class="p">)}</span>
</code></pre></div>
<p>Nice, so far, but let's try reducing something easy:</p>
<div class="highlight"><pre><span></span><code> <span class="n">scala</span><span class="o">></span> <span class="n">println</span><span class="p">(</span><span class="nc">List</span><span class="p">(</span><span class="s">"1"</span><span class="p">,</span><span class="s">"2"</span><span class="p">,</span><span class="s">"3"</span><span class="p">).</span><span class="n">foldLeft</span><span class="p">[</span><span class="nc">Int</span><span class="p">](</span><span class="mi">0</span><span class="p">)(</span><span class="n">t_parsei</span> <span class="p">(</span><span class="n">_+_</span><span class="p">)))</span>
<span class="o"><</span><span class="n">console</span><span class="o">>:</span><span class="mi">12</span><span class="p">:</span> <span class="n">error</span><span class="p">:</span> <span class="k">type</span> <span class="nc">mismatch</span><span class="p">;</span>
<span class="n">found</span> <span class="p">:</span> <span class="nc">Int</span>
<span class="n">required</span><span class="p">:</span> <span class="nc">String</span>
</code></pre></div>
<p>Huh? Maybe it's having trouble understanding <code>_+_</code>:</p>
<div class="highlight"><pre><span></span><code> <span class="n">scala</span><span class="o">></span> <span class="n">println</span><span class="p">(</span><span class="nc">List</span><span class="p">(</span><span class="s">"1"</span><span class="p">,</span><span class="s">"2"</span><span class="p">,</span><span class="s">"3"</span><span class="p">).</span><span class="n">foldLeft</span><span class="p">[</span><span class="nc">Int</span><span class="p">](</span><span class="mi">0</span><span class="p">)(</span><span class="n">t_parsei</span> <span class="p">{(</span><span class="n">i</span><span class="p">:</span><span class="nc">Int</span><span class="p">,</span><span class="n">j</span><span class="p">:</span><span class="nc">Int</span><span class="p">)</span><span class="o">=></span><span class="n">i</span><span class="o">+</span><span class="n">j</span><span class="p">}))</span>
<span class="o"><</span><span class="n">console</span><span class="o">>:</span><span class="mi">12</span><span class="p">:</span> <span class="n">error</span><span class="p">:</span> <span class="k">type</span> <span class="nc">mismatch</span><span class="p">;</span>
<span class="n">found</span> <span class="p">:</span> <span class="p">(</span><span class="nc">Int</span><span class="p">,</span> <span class="nc">Int</span><span class="p">)</span> <span class="o">=></span> <span class="nc">Int</span>
<span class="n">required</span><span class="p">:</span> <span class="nc">TransducerExistential</span><span class="p">.</span><span class="nc">ReducingFn</span><span class="p">[</span><span class="nc">Int</span><span class="p">,</span><span class="nc">R</span> <span class="n">forSome</span> <span class="p">{</span> <span class="k">type</span> <span class="nc">R</span> <span class="p">}]</span>
<span class="p">(</span><span class="n">which</span> <span class="n">expands</span> <span class="n">to</span><span class="p">)</span> <span class="p">(</span><span class="nc">R</span> <span class="n">forSome</span> <span class="p">{</span> <span class="k">type</span> <span class="nc">R</span> <span class="p">},</span> <span class="nc">Int</span><span class="p">)</span> <span class="o">=></span> <span class="nc">R</span> <span class="n">forSome</span> <span class="p">{</span> <span class="k">type</span> <span class="nc">R</span> <span class="p">}</span>
</code></pre></div>
<p>Different but not better. Maybe it will work to cast explicitly to the ternary type:</p>
<div class="highlight"><pre><span></span><code> <span class="n">scala</span><span class="o">></span> <span class="n">println</span><span class="p">(</span><span class="nc">List</span><span class="p">(</span><span class="s">"1"</span><span class="p">,</span><span class="s">"2"</span><span class="p">,</span><span class="s">"3"</span><span class="p">).</span><span class="n">foldLeft</span><span class="p">[</span><span class="nc">Int</span><span class="p">](</span><span class="mi">0</span><span class="p">)(</span><span class="n">t_parsei</span><span class="p">.</span><span class="k">asInstanceOf</span><span class="p">[</span><span class="nc">Transducer3</span><span class="p">[</span><span class="nc">Int</span><span class="p">,</span><span class="nc">String</span><span class="p">,</span><span class="nc">Int</span><span class="p">]]</span> <span class="p">(</span><span class="n">_+_</span><span class="p">)))</span>
<span class="mi">6</span>
</code></pre></div>
<p>But that's a little ugly, and whenever something is even slightly ugly in Scala, you introduce an <code>implicit</code> to
make it confusing instead. Hence</p>
<div class="highlight"><pre><span></span><code> <span class="k">implicit</span> <span class="k">class</span> <span class="nc">TransducerOps</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span><span class="nc">B</span><span class="p">](</span><span class="n">t1</span> <span class="p">:</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span><span class="nc">B</span><span class="p">])</span> <span class="p">{</span>
<span class="k">def</span> <span class="nf">transform</span><span class="p">[</span><span class="nc">R</span><span class="p">](</span><span class="n">rf</span> <span class="p">:</span> <span class="nc">ReducingFn</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span><span class="nc">R</span><span class="p">])</span> <span class="o">=</span> <span class="n">t1</span><span class="p">.</span><span class="k">asInstanceOf</span><span class="p">[</span><span class="nc">Transducer3</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span><span class="nc">B</span><span class="p">,</span><span class="nc">R</span><span class="p">]](</span><span class="n">rf</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div>
<p>to coerce automatically after hoisting the <code>Transducer</code> into
the <code>TransducerOps</code> container class.</p>
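<p>It's worth remembering what <code>asInstanceOf</code> does and doesn't promise. Type arguments are erased on the JVM, so a cast between function types only checks that the value is a <code>Function1</code>, and any mismatch surfaces later, when the function is applied. A small sketch of that behaviour, unrelated to transducers and using made-up names (<code>UncheckedCast</code>, <code>parse</code>, <code>wrong</code>):</p>

```scala
import scala.util.Try

object UncheckedCast {
  val parse: String => Int = _.toInt

  // The cast itself succeeds: at runtime both types are just Function1.
  val wrong: Int => Int = parse.asInstanceOf[Int => Int]

  // The mismatch only shows up on application, as a ClassCastException,
  // which Try captures as a Failure.
  val result: Try[Int] = Try(wrong(3))
}
```

<p>So the implicit class is only as safe as our own reasoning about which <code>R</code> belongs where; the compiler has been told to stop checking.</p>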
<p>Since we've already crossed the Rubicon,
let's bring some slick Unicode along for the ride:</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">⟐</span><span class="p">[</span><span class="nc">R</span><span class="p">]</span> <span class="o">=</span> <span class="n">transform</span><span class="p">[</span><span class="nc">R</span><span class="p">]</span> <span class="n">_</span>
</code></pre></div>
<p>Now</p>
<div class="highlight"><pre><span></span><code> <span class="n">scala</span><span class="o">></span> <span class="n">println</span><span class="p">(</span><span class="nc">List</span><span class="p">(</span><span class="s">"1"</span><span class="p">,</span><span class="s">"2"</span><span class="p">,</span><span class="s">"3"</span><span class="p">).</span><span class="n">foldLeft</span><span class="p">[</span><span class="nc">Int</span><span class="p">](</span><span class="mi">0</span><span class="p">)(</span><span class="n">t_parsei</span> <span class="n">⟐</span> <span class="p">(</span><span class="n">_+_</span><span class="p">)))</span>
<span class="mi">6</span>
</code></pre></div>
<p>Finally, we're going to want chained function composition, so let's put a method
for that, plus a nifty symbol, into the implicit class:</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">compose</span><span class="p">[</span><span class="nc">C</span><span class="p">](</span><span class="n">t2</span> <span class="p">:</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">C</span><span class="p">,</span><span class="nc">A</span><span class="p">])</span> <span class="p">:</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">C</span><span class="p">,</span><span class="nc">B</span><span class="p">]</span> <span class="o">=</span> <span class="n">comp</span><span class="p">(</span><span class="n">t1</span><span class="p">,</span><span class="n">t2</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">∘</span><span class="p">[</span><span class="nc">C</span><span class="p">](</span><span class="n">t2</span><span class="p">:</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">C</span><span class="p">,</span> <span class="nc">A</span><span class="p">]):</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">C</span><span class="p">,</span> <span class="nc">B</span><span class="p">]</span> <span class="o">=</span> <span class="n">compose</span><span class="p">(</span><span class="n">t2</span><span class="p">)</span>
</code></pre></div>
<p>so that:</p>
<div class="highlight"><pre><span></span><code> <span class="n">scala</span><span class="o">></span> <span class="n">println</span><span class="p">(</span><span class="nc">List</span><span class="p">(</span><span class="s">"1"</span><span class="p">,</span> <span class="s">"2"</span><span class="p">,</span> <span class="s">"3"</span><span class="p">).</span><span class="n">foldLeft</span><span class="p">[</span><span class="nc">Double</span><span class="p">](</span><span class="mf">0.0</span><span class="p">)((</span><span class="n">t_parsei</span> <span class="n">∘</span> <span class="n">t_repeat</span> <span class="n">∘</span> <span class="n">t_root2</span><span class="p">)</span> <span class="n">⟐</span> <span class="p">{(</span><span class="n">x</span><span class="p">:</span><span class="nc">Double</span><span class="p">,</span><span class="n">y</span><span class="p">:</span><span class="nc">Double</span><span class="p">)</span> <span class="o">=></span> <span class="n">x</span><span class="o">+</span><span class="n">y</span><span class="p">}))</span>
<span class="mf">9.348269224535935</span>
</code></pre></div>
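<p>As a sanity check on that number, the same pipeline can be spelled out with ordinary collection methods. This sketch assumes that <code>t_repeat</code> emits every element twice, like the Haskell <code>t_rep</code> above; the object name <code>ComposedCheck</code> is made up.</p>

```scala
object ComposedCheck {
  // Assumption: t_repeat duplicates each element, as the Haskell t_rep does.
  val direct: Double =
    List("1", "2", "3")
      .map(_.toInt)                      // t_parsei
      .flatMap(i => List(i, i))          // t_repeat (assumed behaviour)
      .map(i => math.pow(2.0, 1.0 / i))  // t_root2
      .sum                               // the reducing function, x + y
}
```

<p>The sum is 2 × (2 + √2 + ∛2), which should agree with the transducer result up to floating-point rounding.</p>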
<p>I suspect that there will be more trickery further down the road, as we flesh out the standard
library of transducer functions. To get <code>sequence</code> to work, I ended up performing multiple
coercions:</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">sequence</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span> <span class="nc">B</span><span class="p">](</span><span class="n">t</span><span class="p">:</span> <span class="nc">Transducer</span><span class="p">[</span><span class="nc">B</span><span class="p">,</span> <span class="nc">A</span><span class="p">],</span> <span class="n">data</span><span class="p">:</span> <span class="nc">Seq</span><span class="p">[</span><span class="nc">A</span><span class="p">]):</span> <span class="nc">Seq</span><span class="p">[</span><span class="nc">B</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span>
<span class="kd">val</span> <span class="n">rf1</span><span class="p">:</span> <span class="nc">ReducingFn</span><span class="p">[</span><span class="nc">B</span><span class="p">,</span> <span class="nc">Seq</span><span class="p">[</span><span class="nc">B</span><span class="p">]]</span> <span class="o">=</span> <span class="p">{</span> <span class="p">(</span><span class="n">r</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span> <span class="o">=></span> <span class="n">r</span> <span class="o">:+</span> <span class="n">b</span><span class="p">}</span>
<span class="kd">val</span> <span class="n">rf2</span><span class="p">:</span> <span class="nc">ReducingFn</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span> <span class="nc">Seq</span><span class="p">[</span><span class="nc">B</span><span class="p">]]</span> <span class="o">=</span> <span class="n">t</span><span class="p">(</span><span class="n">rf1</span><span class="p">.</span><span class="k">asInstanceOf</span><span class="p">[</span><span class="nc">ReducingFn</span><span class="p">[</span><span class="nc">B</span><span class="p">,</span> <span class="nc">R</span> <span class="n">forSome</span> <span class="p">{</span><span class="k">type</span> <span class="nc">R</span><span class="p">}]]).</span><span class="k">asInstanceOf</span><span class="p">[</span><span class="nc">ReducingFn</span><span class="p">[</span><span class="nc">A</span><span class="p">,</span> <span class="nc">Seq</span><span class="p">[</span><span class="nc">B</span><span class="p">]]]</span>
<span class="n">data</span><span class="p">.</span><span class="n">foldLeft</span><span class="p">[</span><span class="nc">Seq</span><span class="p">[</span><span class="nc">B</span><span class="p">]](</span><span class="n">data</span><span class="p">.</span><span class="n">companion</span><span class="p">.</span><span class="n">empty</span><span class="p">.</span><span class="k">asInstanceOf</span><span class="p">[</span><span class="nc">Seq</span><span class="p">[</span><span class="nc">B</span><span class="p">]])(</span><span class="n">rf2</span><span class="p">)</span>
<span class="p">}</span>
<span class="n">scala</span><span class="o">></span> <span class="n">println</span><span class="p">(</span><span class="n">sequence</span><span class="p">(</span><span class="n">t_parsei</span> <span class="n">∘</span> <span class="n">t_repeat</span> <span class="n">∘</span> <span class="n">t_root2</span><span class="p">,</span> <span class="nc">List</span><span class="p">(</span><span class="s">"1"</span><span class="p">,</span> <span class="s">"2"</span><span class="p">,</span> <span class="s">"3"</span><span class="p">)));</span>
<span class="nc">List</span><span class="p">(</span><span class="mf">2.0</span><span class="p">,</span> <span class="mf">2.0</span><span class="p">,</span> <span class="mf">1.4142135623730951</span><span class="p">,</span> <span class="mf">1.4142135623730951</span><span class="p">,</span> <span class="mf">1.2599210498948732</span><span class="p">,</span> <span class="mf">1.2599210498948732</span><span class="p">)</span>
</code></pre></div>
<h4>Conclusions</h4>
<p>The type of one transducer is not obscure, and
it's not much harder to understand than a callback. However,
once you combine several transducers into a working program, the business of
reconciling and checking their types can be challenging. Of the languages I
know, only Haskell handles it gracefully. Building an entire system in Haskell
might be intimidating, but the transducer bits will be - bracing ourselves for
a word not normally applied to Haskell - easy.</p>
<p>Transducers were invented for and clearly work in unityped Clojure,
but I find myself wondering if they'll be one function abstraction too
far for projects large enough to require many developers, and the
argument that I find them beautiful might not carry the day. I do
believe that a capable type framework would at least reduce the
frequency of bugs, but typed Clojure is not at the point where
telling someone to use it for transducers will obviously improve her or his life.
It does not seem to be the case that a little macro cleverness can
nudge the problem into
the <code>core.typed</code> sweet spot.</p>
<p>It was interesting to play with transducers in Scala, if only because not
many people have. Given the industrial efforts that have gone into Scala and the
centrality of type checking to the language, it's hardly surprising that it does a
better job than typed Clojure. But the margin of victory is slimmer than I would
have expected. Even with the latest release of the IntelliJ plugin, many type errors
didn't show up until a complete compilation. In general, once you get to
<code>forSome</code> and its ilk, there isn't a wealth of straightforward advice available.
(Hie thee, of course, to the Twitter-curated
<a href="https://twitter.github.io/scala_school/">tutorials</a>, which are about as good as it gets.)</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:riposte">
<p>I'm pretty sure I heard someone say this. <a class="footnote-backref" href="#fnref:riposte" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
</ol>
</div>Yak herding for misers - wrangling hundreds of AWS Instances with Clojure2014-10-16T00:00:00-04:002014-10-16T00:00:00-04:00Peter Fraenkeltag:blog.podsnap.com,2014-10-16:/yakherd.html<p><img alt="yak herding" src="http://gallery.photo.net/photo/10889291-md.jpg"></p>
<p>Back in August, I wrote
<a href="http://blog.podsnap.com/girder.html">two</a>
<a href="http://blog.podsnap.com/girder2.html">posts</a>
about an experimental framework for distributed functional programming
called <a href="http://github.com/pnf/girder">Girder</a>. The idea, in summary,
was to make distributed code look as much as possible like ordinary
Clojure, as opposed to structuring it explicitly around message passing
(as in the actor model) or data flows (as in map/reduce, storm, et al).
As I say, it was an experiment, but it was also something of a <em>cri de coeur</em>
(a mini-manifesto, if you will)
against extraneous impingements on my code. Anyway, if
it sounds interesting, go back and read the posts, but you don't need to
understand them in depth to understand today's post.</p>
<p>While it was illuminating to test this supposedly distributed computation framework
on my little MacBook, I knew that, at some point, I would have to run it in the large.
One of the few advantages of corporate serfdom is that the right supplications will
often win you frolicking privileges on the baronial server fields, but we itinerant friars
are stuck with Amazon Web Services. I know it's not fair to complain, but AWS is not really
optimized for my purposes, or, in general, for situations where the following apply:</p>
<ul>
<li>Your distributed system can only be tested realistically on a large number of machines.</li>
<li>Turnkey solutions do not already exist for the system. I.e. not Hadoop or a web application.</li>
<li>Full-on HA automation would be overkill, since you're still experimenting.</li>
<li>You want to pay as little as possible, which means bidding in the spot auction market and killing
instances the second you're done with them.</li>
<li>You insist on working in Clojure to the greatest extent possible.</li>
</ul>
<p>If none of these are true, my cockamamie scheme is certainly not the best approach, but, if all or most
of them are, this is definitely superior to the more obvious mix of web console tools and ssh.</p>
<p>I <a href="http://blog.podsnap.com/girder3.html">previously described</a> the necessary AWS setup.
Other than a few short bash scripts, that
phase didn't require any actual code, and much of the work was unintuitive clicking about on the web
console. (Unavoidably. I tried boiling it down to invocations of the AWS command line utility, but,
since Amazon documents everything in terms of their web interface, you would have been completely lost
were anything to go wrong.)</p>
<p>Today is a Clojure day:</p>
<ol>
<li>We lever the previously discussed AWS setup with a Clojure framework to launch instances and spot auction requests
asynchronously, run specific software on the machines and provide clean notification when they're
up and ready for business.</li>
<li>We use <code>core.async</code> to contain and tame the synchronous interface that Amazon gives us.</li>
<li>We use lens-like constructs to simplify interaction with complex configuration data.</li>
<li>We build a utility wrapper that turns most stand-alone Clojure functions into a CLI app that handles
the most common arguments, logging (including an option to log to Redis) and exceptions,
bundling the result into an EDN compliant structure.</li>
</ol>
<p>Step 3 turned out to be a rather lengthy and obsessive digression for me, leading to
<a href="http://blog.podsnap.com/pinhole.html">the</a>
<a href="http://blog.podsnap.com/pinhole2.html">four</a>
<a href="http://blog.podsnap.com/tinhole.html">posts</a>
<a href="http://blog.podsnap.com/vanhole.html">previous</a>
to this one.</p>
<p>The code mentioned below can be found in one of several places on github:</p>
<ul>
<li><a href="https://github.com/pnf/awstools/blob/master/src/acyclic/awstools/core.clj">The Clojure tooling itself</a>.</li>
<li><a href="https://github.com/pnf/clj-utils/blob/master/src/acyclic/utils/pinhole.clj">Lens functionality</a></li>
<li><a href="https://github.com/pnf/clj-utils/blob/master/src/acyclic/utils/cli.clj">The CLI utility</a></li>
</ul>
<h4>Amazonica</h4>
<p><img alt="penthesileia" src="http://www.theoi.com/image/img_penthesileia.jpg"></p>
<p>The main route to AWS is via their Java SDK, around
which <a href="https://github.com/mcohen01/amazonica">amazonica</a> provides a complete Clojure wrapper.
In fact, it's better than a wrapper,
because the plain SDK is mind-bogglingly tedious - a model citizen in
the <a href="http://steve-yegge.blogspot.com/2006/03/execution-in-kingdom-of-nouns.html">kingdom of nouns</a>.</p>
<p>For example, to bid on an instance in the spot auction market, you call a method of the
<code>AmazonEC2</code> client:</p>
<div class="highlight"><pre><span></span><code><span class="n">RequestSpotInstancesResult</span>
<span class="nf">requestSpotInstances</span><span class="p">(</span><span class="n">RequestSpotInstancesRequest</span> <span class="n">requestSpotInstancesRequest</span><span class="p">)</span>
</code></pre></div>
<p>The <code>RequestSpotInstancesRequest</code> class has a do-nothing constructor and lots of <code>.setXYZ</code> methods,
the most important of which is</p>
<div class="highlight"><pre><span></span><code><span class="kt">void</span> <span class="nf">setLaunchSpecification</span><span class="p">(</span><span class="n">LaunchSpecification</span> <span class="n">launchSpecification</span><span class="p">)</span>
</code></pre></div>
<p>where <code>LaunchSpecification</code> has yet more <code>.set</code> methods, including</p>
<div class="highlight"><pre><span></span><code><span class="kt">void</span> <span class="nf">setNetworkInterfaces</span><span class="p">(</span><span class="n">Collection</span><span class="o"><</span><span class="n">InstanceNetworkInterfaceSpecification</span><span class="o">></span> <span class="n">networkInterfaces</span><span class="p">)</span>
</code></pre></div>
<p>and the <code>InstanceNetworkInterfaceSpecification</code> is where <code>setSubnetId(String subnetId)</code> lives, so you really
do end up needing all of these classes.</p>
<p>Amazonica, by contrast, turns this logically nested data into explicitly nested hash-maps. So an entire request
can be constructed like this:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="k">def </span><span class="nv">my-req</span>
<span class="p">{</span><span class="ss">:spot-price</span> <span class="mf">0.01</span>,
<span class="ss">:instance-count</span> <span class="mi">1</span>,
<span class="ss">:type</span> <span class="s">"one-time"</span>,
<span class="ss">:launch-specification</span>
<span class="p">{</span><span class="ss">:image-id</span> <span class="s">"ami-something"</span>,
<span class="ss">:instance-type</span> <span class="s">"t1.micro"</span>,
<span class="ss">:placement</span> <span class="p">{</span><span class="ss">:availability-zone</span> <span class="s">"us-east-1a"</span><span class="p">}</span>,
<span class="ss">:key-name</span> <span class="s">"your-key"</span>
<span class="ss">:user-data</span> <span class="s">"THE BASH COMMAND WE WANT TO RUN, IN BASE 64"</span>
<span class="ss">:network-interfaces</span>
<span class="p">[{</span><span class="ss">:device-index</span> <span class="mi">0</span>
<span class="ss">:subnet-id</span> <span class="s">"subnet-yowsa"</span>
<span class="ss">:groups</span> <span class="p">[</span><span class="s">"sg-hubba"</span><span class="p">]}]</span>
<span class="ss">:iam-instance-profile</span>
<span class="p">{</span><span class="ss">:arn</span> <span class="s">"arn:aws:iam::123456789:instance-profile/name-you-chose"</span><span class="p">}}})</span>
<span class="p">(</span><span class="nb">apply </span><span class="nv">request-spot-instances</span> <span class="p">(</span><span class="nb">apply concat </span><span class="p">(</span><span class="nb">seq </span><span class="nv">my-req</span><span class="p">)))</span>
</code></pre></div>
<p>Note:</p>
<ul>
<li>The <code>:user-data</code> field isn't always a bash command, but we set it up that way.</li>
<li>The <code>(apply concat (seq ...))</code> business is necessary because <code>amazonica</code> functions are declared to take an
arbitrary number of arguments,
i.e. it really wants <code>(request-spot-instances :spot-price 0.01 :instance-count 1 ...)</code>.</li>
</ul>
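<p>To see what that flattening does without touching AWS, here is a minimal sketch. The function <code>fake-request</code> is a hypothetical stand-in, which I'm assuming takes keyword arguments in the same style as the <code>amazonica</code> functions; it simply hands back whatever options it received.</p>
<div class="highlight"><pre><span></span><code>;; Hypothetical stand-in for an amazonica-style function that takes
;; alternating keyword/value arguments; it returns them as a map.
(defn fake-request [& {:as opts}] opts)

(def small-req
  {:spot-price 0.01
   :instance-count 1
   :launch-specification {:instance-type "t1.micro"}})

;; (seq small-req) yields [key value] pairs. Concatenating them gives one
;; flat sequence of alternating keys and values, which apply then spreads
;; out as individual arguments. Nested maps are passed through untouched.
(def flat-args (apply concat (seq small-req)))

(apply fake-request flat-args) ; gives back a map equal to small-req
</code></pre></div>
<p>Only the top level is flattened, which is what we want here: the nested <code>:launch-specification</code> map travels as a single argument value.</p>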
<p>This is a great improvement over pages of Java code, but we're not entirely free from the yoke of Amazon's SDK:</p>
<ol>
<li><strong>Complex nested data</strong>: While nested, persistent data structures are easier and safer to deal with than nested,
mutable special-purpose container classes, they're still nested in a complicated way; we would like to encapsulate
the complexity in such a way that we can set and access the data we need, without distributing knowledge of the
entire structure throughout our code.</li>
<li><strong>Synchronous interface</strong>: The Amazon SDK is in fact doubly synchronous, (a) in that calls to its
methods are blocking, and (b) in that what these blocking methods do is initiate some action in an Amazon
data center and return, with no simple way to track these actions except to repeatedly call other blocking methods
that request status.<sup id="fnref:async"><a class="footnote-ref" href="#fn:async">1</a></sup></li>
</ol>
<h4>Lenses lite, for complex nested data</h4>
<p>It would be nice if, rather than sprinkling our code with long, hard-coded <code>assoc-in</code> paths,
we could specify path aliases in one place, like</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="k">def </span><span class="nv">path-aliases</span>
<span class="p">{</span><span class="ss">:zone</span> <span class="p">[</span><span class="ss">:launch-specification</span> <span class="ss">:placement</span> <span class="ss">:availability-zone</span><span class="p">]</span>
<span class="ss">:itype</span> <span class="p">[</span><span class="ss">:launch-specification</span> <span class="ss">:instance-type</span><span class="p">]</span>
<span class="ss">:subnet</span> <span class="p">[</span><span class="ss">:launch-specification</span> <span class="ss">:network-interfaces</span> <span class="mi">0</span> <span class="ss">:subnet-id</span><span class="p">]</span>
<span class="ss">:group</span> <span class="p">[</span><span class="ss">:launch-specification</span> <span class="ss">:network-interfaces</span> <span class="mi">0</span> <span class="ss">:groups</span> <span class="mi">0</span><span class="p">]</span>
<span class="ss">:public?</span> <span class="p">[</span><span class="ss">:launch-specification</span> <span class="ss">:network-interfaces</span> <span class="mi">0</span> <span class="ss">:associate-public-ip-address</span><span class="p">]</span>
<span class="ss">:price</span> <span class="p">[</span><span class="ss">:spot-price</span><span class="p">]</span>
<span class="ss">:n</span> <span class="p">[</span><span class="ss">:instance-count</span><span class="p">]</span>
<span class="ss">:key</span> <span class="p">[</span><span class="ss">:launch-specification</span> <span class="ss">:key-name</span><span class="p">]</span>
<span class="nv">...</span>
</code></pre></div>
<p>and even nicer if those paths could specify incoming and outgoing transformation functions
when we're required to mess around with encoding:</p>
<div class="highlight"><pre><span></span><code> <span class="ss">:udata</span> <span class="p">[</span><span class="ss">:launch-specification</span> <span class="ss">:user-data</span> <span class="p">[</span><span class="nv">s->b64</span> <span class="nv">b64->s</span><span class="p">]]})</span>
</code></pre></div>
<p>The previously mentioned
<a href="http://blog.podsnap.com/pinhole.html">lens</a>
<a href="http://blog.podsnap.com/tinhole.html">posts</a>
describe tools to do just this. To set, for example, the command we want to run on startup, we
can write</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">ph-assoc</span> <span class="nv">path-aliases</span> <span class="nv">my-dict</span> <span class="ss">:udata</span> <span class="s">"echo howdy"</span><span class="p">)</span>
</code></pre></div>
<p>instead of</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">assoc-in</span> <span class="nv">my-dict</span> <span class="p">[</span><span class="ss">:launch-specification</span> <span class="ss">:user-data</span><span class="p">]</span> <span class="p">(</span><span class="nf">s->b64</span> <span class="s">"echo howdy"</span><span class="p">))</span>
</code></pre></div>
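<p>For a rough idea of how such aliases could work, here is a much-reduced sketch. It is not the code from the lens posts: the names <code>alias-assoc</code>, <code>alias-get</code> and <code>split-path</code> are made up for illustration, and I'm assuming <code>java.util.Base64</code> (Java 8 or later) for the encoding helpers so that the example has no dependencies.</p>
<div class="highlight"><pre><span></span><code>(import 'java.util.Base64)

;; Encoding helpers, assumed here to be built on java.util.Base64.
(defn s->b64 [^String s]
  (.encodeToString (Base64/getEncoder) (.getBytes s "UTF-8")))

(defn b64->s [^String s]
  (String. (.decode (Base64/getDecoder) s) "UTF-8"))

(def aliases
  {:itype [:launch-specification :instance-type]
   :udata [:launch-specification :user-data [s->b64 b64->s]]})

(defn- split-path
  "Separates a trailing [in-fn out-fn] pair, if present, from the plain path."
  [path]
  (if (vector? (peek path))
    [(pop path) (peek path)]
    [path [identity identity]]))

(defn alias-assoc
  "Sets the value at the aliased path, applying the incoming transformation."
  [aliases m k v]
  (let [[path [in-fn _]] (split-path (aliases k))]
    (assoc-in m path (in-fn v))))

(defn alias-get
  "Reads the value at the aliased path, applying the outgoing transformation."
  [aliases m k]
  (let [[path [_ out-fn]] (split-path (aliases k))]
    (out-fn (get-in m path))))

;; The stored value is base 64, but callers only ever see plain strings.
(def req (alias-assoc aliases {} :udata "echo howdy"))
(alias-get aliases req :udata) ; decodes back to the original command
</code></pre></div>
<p>The point of the exercise is that the knowledge of where <code>:user-data</code> lives, and of how it is encoded, sits in the alias table and nowhere else.</p>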
<h4>Taming the synchronous interface</h4>
<p><img alt="lion taming" src="http://upload.wikimedia.org/wikipedia/commons/thumb/2/2e/Lion_tamer_%28LOC_pga.03749%29.jpg/375px-Lion_tamer_%28LOC_pga.03749%29.jpg"></p>
<h5>Sending notifications</h5>
<p>Remember that an authorized user on EC2 can publish a simple-notification-service message from the
command-line</p>
<div class="highlight"><pre><span></span><code>aws --region us-east-1 sns publish --topic-arn arn:aws:sns:us-east-1:yourtopic <span class="se">\</span>
--message yowsa
</code></pre></div>
<p>and that we can configure things such that these messages in turn get published to SQS, the Simple Queue Service,
which in turn we can tap from inside and outside of EC2.</p>
<p>Remembering also that we've set up our instances so that, on boot, they'll run commands that have been
stuck into user-data, you may guess the basic strategy. The command we'll really run
uses the <code>ec2-metadata</code> utility that we set up earlier to extract information about the
host</p>
<div class="highlight"><pre><span></span><code>aws --region us-east-1 sns publish --topic-arn arn:aws:sns:us-east-1:12345678:instance-up <span class="se">\</span>
--message <span class="se">\</span>
<span class="sb">`</span><span class="o">(</span><span class="nb">echo</span> <span class="s2">"id: 549a49cd-a64c-44c9-b567-4901c373dc0b"</span><span class="p">;</span>bin/ec2-metadata -i -p -o<span class="o">)</span> <span class="p">|</span> base64 -w <span class="m">0</span><span class="sb">`</span>
</code></pre></div>
<p>resulting in key-value pairs like</p>
<div class="highlight"><pre><span></span><code> id: 549a49cd-a64c-44c9-b567-4901c373dc0b
instance-id: i-123456
public-hostname: ec2-12-345-67-890.compute-1.amazonaws.com
local-ipv4: 10.0.0.23
</code></pre></div>
<p>crammed into a base 64 encoded string. We will later
decode this message with the help of <code>clojure.data.codec.base64</code> and some
hideous regular expressions</p>
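<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">s->b64</span> <span class="p">[</span><span class="nv">s</span><span class="p">]</span> <span class="p">(</span><span class="nf">String.</span> <span class="p">(</span><span class="nf">b64/encode</span> <span class="p">(</span><span class="nf">.getBytes</span> <span class="nv">s</span><span class="p">))))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">b64->s</span> <span class="p">[</span><span class="nv">s</span><span class="p">]</span> <span class="p">(</span><span class="nf">String.</span> <span class="p">(</span><span class="nf">b64/decode</span> <span class="p">(</span><span class="nf">.getBytes</span> <span class="nv">s</span><span class="p">))))</span>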
<span class="p">(</span><span class="kd">defn- </span><span class="nv">matcher->pair</span> <span class="p">[[</span><span class="nv">_</span> <span class="nv">k</span> <span class="nv">v</span><span class="p">]]</span> <span class="p">[(</span><span class="nb">keyword </span><span class="nv">k</span><span class="p">)</span> <span class="nv">v</span><span class="p">])</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">b64->ec2-data</span> <span class="p">[</span><span class="nv">s</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">xs</span> <span class="p">(</span><span class="nb">-> </span><span class="nv">s</span> <span class="nv">b64->s</span> <span class="nv">clojure.string/split-lines</span><span class="p">)</span>
<span class="nv">es</span> <span class="p">(</span><span class="nb">map </span><span class="o">#</span><span class="p">(</span><span class="nf">->></span> <span class="nv">%</span>
<span class="p">(</span><span class="nb">re-matches </span><span class="o">#</span><span class="s">"^([\w\-]+):\s*(.*)"</span><span class="p">)</span>
<span class="nv">matcher->pair</span><span class="p">)</span> <span class="nv">xs</span><span class="p">)]</span>
<span class="p">(</span><span class="nb">into </span><span class="p">{}</span> <span class="nv">es</span><span class="p">)))</span>
</code></pre></div>
<p>into a map of keywords to strings:</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span><span class="ss">:id</span> <span class="s">"549a49cd-a64c-44c9-b567-4901c373dc0b"</span>
<span class="ss">:instance-id</span> <span class="s">"i-123456"</span>
<span class="ss">:public-hostname</span> <span class="s">"ec2-12-345-67-890.compute-1.amazonaws.com"</span>
<span class="ss">:local-ipv4</span> <span class="s">"10.0.0.23"</span><span class="p">}</span>
</code></pre></div>
<p>The <code>:id</code> is a UUID we'll create when we request the instance, allowing us to
track the results of that request via notifications.</p>
<h5>Receiving notifications</h5>
<p><img alt="gotmail" src="http://starringthecomputer.com/snapshots/youve_got_mail_powerbook_3400.jpg"></p>
<p>Back at the home front (i.e. the trusty laptop), we set up a single listener that transfers
all messages from an SQS endpoint (as we previously set up) onto a <code>core.async</code> channel:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">sqs-listen</span> <span class="p">[</span><span class="nv">url</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">c</span> <span class="p">(</span><span class="nf">chan</span> <span class="mi">100</span><span class="p">)]</span>
<span class="p">(</span><span class="nf">async/go-loop</span> <span class="p">[]</span>
<span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nf">pimpl/closed?</span> <span class="nv">c</span><span class="p">)</span>
<span class="p">(</span><span class="nf">debug</span> <span class="s">"Shutting down sqs-listen"</span> <span class="nv">url</span><span class="p">)</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">messages</span> <span class="p">(</span><span class="ss">:messages</span> <span class="p">(</span><span class="nf">receive-message</span> <span class="ss">:queue-url</span> <span class="nv">url</span> <span class="ss">:wait-time-seconds</span> <span class="mi">20</span><span class="p">))]</span>
<span class="p">(</span><span class="nb">doseq </span><span class="p">[{</span><span class="nv">r</span> <span class="ss">:receipt-handle</span> <span class="nv">b</span> <span class="ss">:body</span><span class="p">}</span> <span class="nv">messages</span><span class="p">]</span>
<span class="p">(</span><span class="nf">try</span> <span class="p">(</span><span class="nf">>!</span> <span class="nv">c</span> <span class="p">(</span><span class="nb">get </span><span class="p">(</span><span class="nf">json/read-str</span> <span class="nv">b</span><span class="p">)</span> <span class="s">"Message"</span><span class="p">))</span>
<span class="p">(</span><span class="nf">catch</span> <span class="nv">Exception</span> <span class="nv">e</span> <span class="p">(</span><span class="nf">info</span> <span class="s">"sqs-listen"</span> <span class="p">(</span><span class="nf">stack-trace</span> <span class="nv">e</span><span class="p">))))</span>
<span class="p">(</span><span class="nf">try</span> <span class="p">(</span><span class="nf">delete-message</span> <span class="ss">:queue-url</span> <span class="nv">url</span> <span class="ss">:receipt-handle</span> <span class="nv">r</span><span class="p">)</span>
<span class="p">(</span><span class="nf">catch</span> <span class="nv">Exception</span> <span class="nv">e</span> <span class="p">(</span><span class="nf">info</span> <span class="s">"sqs-listen"</span> <span class="p">(</span><span class="nf">stack-trace</span> <span class="nv">e</span><span class="p">)))))</span>
<span class="p">(</span><span class="nf">recur</span><span class="p">))))</span>
<span class="nv">c</span><span class="p">))</span>
</code></pre></div>
<p>The heart of this is <code>(receive-message :queue-url url :wait-time-seconds 20)</code>, which is a blocking
call to AWS that eventually returns one or more messages, or times out (20 seconds being the maximum
timeout you're allowed to specify).
While we can't force Amazon to give us an asynchronous alternative, we can isolate the synchronicity
in this one function and not have to worry about it elsewhere. This pattern seems to come up a lot when
interfacing with synchronous communication libraries.</p>
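<p>The same isolation trick can be sketched in general form. This is a variation rather than the code above: it assumes <code>core.async</code> is on the classpath and uses <code>async/thread</code> instead of a <code>go-loop</code>, so that the blocking call sits on its own thread rather than tying up one from the go-block pool. The names <code>blocking->chan</code>, <code>items</code> and <code>next-item</code> are hypothetical.</p>
<div class="highlight"><pre><span></span><code>(require '[clojure.core.async :as async])

(defn blocking->chan
  "Calls the blocking, zero-argument function f repeatedly on a dedicated
  thread, putting each result on the returned channel. Stops, and closes
  the channel, when f returns nil or when the channel has been closed."
  [f]
  (let [c (async/chan 100)]
    (async/thread
      (loop []
        (let [v (f)]
          ;; >!! returns false once the channel is closed
          (if (and (some? v) (async/>!! c v))
            (recur)
            (async/close! c)))))
    c))

;; A fake "blocking" source that hands out three items and then nil.
(def items (atom [:a :b :c]))

(defn next-item []
  (Thread/sleep 10) ; pretend to wait on the network
  (let [[old _] (swap-vals! items rest)]
    (first old)))

;; results is a channel that will deliver one vector holding every item.
(def results (async/into [] (blocking->chan next-item)))
</code></pre></div>
<p>Everything downstream of the returned channel can then be written with ordinary channel operations, oblivious to the blocking that goes on behind it.</p>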
<p>We're handed messages in a rather funny form. The hash map over <code>:receipt-handle</code> and
<code>:body</code> is pleasant enough, except that the <code>:body</code> value is a JSON-encoded string
containing the real message under the key <code>"Message"</code>, and of course that
"Message" is itself the base-64 encoded list of name/value pairs we sent ourselves.
Malformed messages occur frequently enough that it's worth doing the decoding under <code>try/catch</code>.
The <code>stack-trace</code> function used when logging the exception is not particularly deep, but still quite useful:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="kd">defn </span><span class="nv">stack-trace</span> <span class="p">[</span><span class="nv">e</span><span class="p">]</span> <span class="p">(</span><span class="nb">map str </span><span class="p">(</span><span class="nb">into </span><span class="p">[]</span> <span class="p">(</span><span class="nf">.getStackTrace</span> <span class="nv">e</span><span class="p">))))</span>
</code></pre></div>
<h5>Demultiplexing notifications</h5>
<p>All notifications, from every host, come back over the <code>sqs-listen</code> channel
in the form of base-64 encoded name/value pairs created by the remote
<code>aws sns</code> command. One of those values is a unique <code>:id</code> code, which,
as we'll see below, is created at the time of the original request, so
we can distribute messages out to specifically interested listeners.</p>
<p>We maintain a map of ids to channels in an atom</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defonce </span><span class="nv">ids->chs</span> <span class="p">(</span><span class="nf">atom</span> <span class="p">{}))</span>
</code></pre></div>
<p>and subscribe by associating a new channel to each new id:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">notify-chan</span>
<span class="p">[</span><span class="nv">id</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">c</span> <span class="p">(</span><span class="nf">chan</span><span class="p">)]</span>
<span class="p">(</span><span class="nf">swap!</span> <span class="nv">ids->chs</span> <span class="nb">assoc </span><span class="nv">id</span> <span class="nv">c</span><span class="p">)</span>
<span class="nv">c</span><span class="p">))</span>
</code></pre></div>
<p>Meanwhile, there's a thread listening on our SQS channel, decoding the messages
and dispatching them onward:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">start-up-id-listener</span> <span class="p">[</span><span class="nv">url</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">kill-switch</span> <span class="p">(</span><span class="nf">chan</span><span class="p">)</span>
<span class="nv">cl</span> <span class="p">(</span><span class="nf">sqs-listen</span> <span class="nv">url</span><span class="p">)]</span>
<span class="p">(</span><span class="nf">async/go-loop</span> <span class="p">[]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[[</span><span class="nv">v</span> <span class="nv">c</span><span class="p">]</span> <span class="p">(</span><span class="nf">async/alts!</span> <span class="p">[</span><span class="nv">kill-switch</span> <span class="nv">cl</span><span class="p">])]</span>
<span class="p">(</span><span class="nf">cond</span>
<span class="p">(</span><span class="nb">= </span><span class="nv">c</span> <span class="nv">kill-switch</span><span class="p">)</span>
<span class="p">(</span><span class="k">do </span><span class="p">(</span><span class="nf">reset!</span> <span class="nv">ids->chs</span> <span class="p">{})</span> <span class="p">(</span><span class="nf">close!</span> <span class="nv">cl</span><span class="p">))</span>
<span class="p">(</span><span class="nb">= </span><span class="nv">c</span> <span class="nv">cl</span><span class="p">)</span>
<span class="p">(</span><span class="k">do </span><span class="p">(</span><span class="nf">try</span> <span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">s</span> <span class="p">(</span><span class="nf">b64->ec2-data</span> <span class="nv">v</span><span class="p">)</span>
<span class="nv">id</span> <span class="p">(</span><span class="ss">:id</span> <span class="nv">s</span><span class="p">)</span>
<span class="nv">c</span> <span class="p">(</span><span class="nb">get </span><span class="o">@</span><span class="nv">ids->chs</span> <span class="nv">id</span><span class="p">)]</span>
<span class="p">(</span><span class="nb">when </span><span class="nv">c</span> <span class="p">(</span><span class="nf">>!</span> <span class="nv">c</span> <span class="nv">s</span><span class="p">)))</span>
<span class="p">(</span><span class="nf">catch</span> <span class="nv">Exception</span> <span class="nv">e</span> <span class="p">(</span><span class="nf">info</span> <span class="nv">e</span> <span class="p">{</span><span class="ss">:stack-trace</span> <span class="p">(</span><span class="nf">stack-trace</span> <span class="nv">e</span><span class="p">)})))</span>
<span class="p">(</span><span class="nf">recur</span><span class="p">)))))</span>
<span class="nv">kill-switch</span><span class="p">))</span>
</code></pre></div>
<p>Typically, we would use this machinery like</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">id</span> <span class="p">(</span><span class="nf">.toString</span> <span class="p">(</span><span class="nf">java.util.UUID/randomUUID</span><span class="p">))]</span>
<span class="p">(</span><span class="nf">start-up-something-on-aws-using</span> <span class="nv">id</span><span class="p">)</span>
<span class="p">(</span><span class="nf">notify-chan</span> <span class="nv">id</span><span class="p">))</span>
</code></pre></div>
<h4>Actually running something</h4>
<p>Let's start by bringing up a single medium-sized host and running the Redis
server on it.</p>
<p>Here are the instructions that masochists use:</p>
<ol>
<li>Go to the <a href="https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#Images:">list of images</a>
including the AMI we set up last time.</li>
<li>Select "Spot Request" from the Action dropdown.</li>
<li>Select the <code>m3.medium</code> instance type.</li>
<li>Type in the maximum price we'll pay per hour.</li>
<li>Override the default networking to the VPC we created.</li>
<li>Override the default IAM user to the one we created.</li>
<li>Override the default security group to the one we created.</li>
<li>Click Launch.</li>
<li>Repeatedly refresh the <a href="https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#SpotInstances:">Spot Request monitor</a>
until it shows an instance id.</li>
<li>When it does, after 1 to 10 minutes, examine the instance on the <a href="https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#Instances:sort=desc:tag:Name">Instances Page</a>.</li>
<li>Copy its public IP address and ssh to it from a terminal.</li>
<li>Run your program.</li>
<li>Repeat as necessary.</li>
</ol>
<p>What we would like instead is a simple function, to which we provide our request map
(in which most of the manually entered values above are already present), a
few extra stipulations about the machine, and the command we want to run</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">bring-up-one-spot</span> <span class="nv">my-req</span> <span class="p">[</span><span class="s">"bin/redis-server"</span><span class="p">]</span> <span class="ss">:subnet</span> <span class="nv">my-sub-public</span> <span class="ss">:itype</span> <span class="s">"m3.medium"</span> <span class="ss">:price</span> <span class="mf">0.05</span><span class="p">)</span>
</code></pre></div>
<p>and get back a channel that will deliver information about the instance once it's up. If the
instance doesn't come up properly, we want our function to eradicate any bits of it that might still
be costing us money.</p>
<p>Let's build that function. For brevity I've stripped out
configuration details and optional arguments present in the real code,
so there will be some hard-coding, as well as references throughout to
global variables like <code>my-sns-topic</code>, which you'll have to take
on faith are defined somewhere.</p>
<p>The function will start by <code>ph-assoc</code>-ing any specified stipulations (such as the subnet,
type and maximum hourly price, in this case) into the default request map:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">bring-up-one-spot</span>
<span class="p">[</span><span class="nv">req</span> <span class="nv">cmds</span> <span class="o">&</span> <span class="nv">opts</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">req</span> <span class="p">(</span><span class="nb">apply </span><span class="nv">ph-assoc</span> <span class="nv">req</span> <span class="nv">path-aliases</span> <span class="nv">opts</span><span class="p">)</span>
</code></pre></div>
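<p>The real <code>ph-assoc</code> isn't reproduced here. As a rough stand-in, the following
sketch shows the general idea with nothing but <code>assoc-in</code>: short keys are looked up in
an alias table and the values are planted at the corresponding nested paths. The names
<code>assoc-aliased</code> and <code>example-paths</code>, and the shape of the request map, are
hypothetical and chosen only for illustration.</p>
<div class="highlight"><pre><span></span><code>;; Hypothetical alias table: short keys mapped to paths in a nested request map.
(def example-paths
  {:itype  [:launch-specification :instance-type]
   :price  [:spot-price]
   :subnet [:launch-specification :subnet-id]})

;; Apply alternating key/value stipulations to req, following each alias path.
;; Keys without an alias are treated as top-level keys.
(defn assoc-aliased [req paths kvs]
  (reduce (fn [m [k v]] (assoc-in m (get paths k [k]) v))
          req
          (partition 2 kvs)))

;; Override the default price and add an instance type.
(def example-req
  (assoc-aliased {:spot-price "0.01"} example-paths
                 [:itype "m3.medium" :price "0.05"]))
</code></pre></div>
<p>Because the stipulations are applied in order with <code>reduce</code>, later ones win over
earlier ones and over whatever was in the default map.</p>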
<p>Then it constructs the full command for the spot to run and uses <code>ph-assoc</code> to encode it
as user data</p>
<div class="highlight"><pre><span></span><code> <span class="nv">id</span> <span class="p">(</span><span class="nf">.toString</span> <span class="p">(</span><span class="nf">UUID/randomUUID</span><span class="p">))</span>
<span class="nv">cmd</span> <span class="p">(</span><span class="nb">str </span><span class="p">(</span><span class="nf">send-up</span> <span class="nv">id</span><span class="p">)</span> <span class="p">(</span><span class="nf">clojure.string/join</span> <span class="s">"\n"</span> <span class="nv">cmds</span><span class="p">)</span> <span class="s">"\n"</span><span class="p">)</span>
<span class="nv">req</span> <span class="p">(</span><span class="nf">ph-assoc</span> <span class="nv">req</span> <span class="nv">paths</span> <span class="ss">:udata</span> <span class="nv">cmd</span><span class="p">)</span>
</code></pre></div>
<p>where <code>send-up</code> constructs the previously described <code>aws sns publish</code> command embedding <code>id</code>.</p>
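<p>For a feel of what ends up in the user data, here is a minimal sketch that joins a list of
commands into a newline-terminated script and base64-encodes it, which is the form the EC2 API
expects for user data. Whether the real <code>:udata</code> path does the encoding itself is not
shown in this post, so treat <code>encode-user-data</code> and <code>decode-user-data</code> as
hypothetical helpers; <code>send-up</code> is left out entirely.</p>
<div class="highlight"><pre><span></span><code>(require '[clojure.string :as string])
(import 'java.util.Base64)

;; Join shell commands into one newline-terminated script.
(defn script [cmds]
  (str (string/join "\n" cmds) "\n"))

;; Base64-encode the script (assumes Java 8 or later for java.util.Base64).
(defn encode-user-data [cmds]
  (.encodeToString (Base64/getEncoder) (.getBytes (script cmds) "UTF-8")))

;; Inverse, handy for checking what an instance will actually receive.
(defn decode-user-data [s]
  (String. (.decode (Base64/getDecoder) s) "UTF-8"))
</code></pre></div>
<p>Round-tripping a request through <code>decode-user-data</code> at the REPL is a cheap way to
catch a missing newline before paying for an instance that does nothing.</p>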
<p>Next, we register for notifications on that <code>id</code>,
make the call to initiate the spot auction request, and extract the identifier
that EC2 gives to the request (it will be "sir-" something or other, which consistently
makes me giggle):</p>
<div class="highlight"><pre><span></span><code> <span class="nv">cl</span> <span class="p">(</span><span class="nf">notify-chan</span> <span class="nv">id</span><span class="p">)</span>
<span class="nv">rs</span> <span class="p">(</span><span class="nb">apply </span><span class="nv">request-spot-instances</span> <span class="p">(</span><span class="nb">apply concat </span><span class="p">(</span><span class="nb">seq </span><span class="nv">req</span><span class="p">)))</span>
<span class="nv">rs</span> <span class="p">(</span><span class="nb">map </span><span class="ss">:spot-instance-request-id</span> <span class="p">(</span><span class="ss">:spot-instance-requests</span> <span class="nv">rs</span><span class="p">))]</span>
</code></pre></div>
<p>Now we wait for either notification or a timeout:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">go</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">to</span> <span class="p">(</span><span class="nf">timeout</span> <span class="p">(</span><span class="nb">* </span><span class="mi">10</span> <span class="mi">60</span> <span class="mi">1000</span><span class="p">))</span>
<span class="p">[</span><span class="nv">v</span> <span class="nv">c</span><span class="p">]</span> <span class="p">(</span><span class="nf">async/alts!</span> <span class="p">[</span><span class="nv">cl</span> <span class="nv">to</span><span class="p">])</span>
</code></pre></div>
<p>Either way, we're going to want some information about the request, including the instance id
if an instance was actually created:</p>
<div class="highlight"><pre><span></span><code> <span class="nv">ds</span> <span class="p">(</span><span class="ss">:spot-instance-requests</span>
<span class="p">(</span><span class="nf">describe-spot-instance-requests</span> <span class="ss">:spot-instance-request-ids</span> <span class="nv">rs</span><span class="p">))</span>
<span class="nv">is</span> <span class="p">(</span><span class="nb">filter </span><span class="p">(</span><span class="nb">complement </span><span class="nv">nil?</span><span class="p">)</span> <span class="p">(</span><span class="nb">map </span><span class="ss">:instance-ids</span> <span class="nv">ds</span><span class="p">))]</span>
</code></pre></div>
<p>In the event of a timeout, it's still possible that the request was fulfilled, but for some reason or another
our notification didn't occur, so we should cancel the requests as well as terminate
the instance, if it exists:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">condp</span> <span class="nb">= </span><span class="nv">c</span>
<span class="nv">to</span> <span class="p">(</span><span class="nf">do</span>
<span class="p">(</span><span class="nf">cancel-spot-instance-requests</span> <span class="ss">:spot-instance-request-ids</span> <span class="nv">rs</span><span class="p">)</span>
<span class="p">(</span><span class="nb">when </span><span class="p">(</span><span class="nb">seq </span><span class="nv">is</span><span class="p">)</span> <span class="p">(</span><span class="nf">terminate-instances</span> <span class="ss">:instance-ids</span> <span class="p">(</span><span class="nf">vec</span> <span class="nv">is</span><span class="p">)))</span>
<span class="nv">nil</span><span class="p">)</span>
</code></pre></div>
<p>If we do receive notification, we can just return the request and instance ids</p>
<div class="highlight"><pre><span></span><code> <span class="nv">cl</span> <span class="p">[</span><span class="nv">rs</span> <span class="nv">is</span><span class="p">]))))</span>
</code></pre></div>
<p>so they'll be delivered on the channel returned by the <code>go</code> block.</p>
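<p>The wait-or-clean-up pattern can be tried out without AWS or even <code>core.async</code>.
The sketch below swaps the notification channel for a plain <code>promise</code> and uses
<code>deref</code> with a timeout, so unlike the <code>go</code> block it blocks the calling
thread. <code>await-or-clean-up</code> and its <code>cleanup!</code> argument are hypothetical
names; in the real function the cleanup would cancel requests and terminate instances.</p>
<div class="highlight"><pre><span></span><code>;; Wait up to timeout-ms for a promise standing in for the notification channel.
;; On timeout, run cleanup! and return nil; otherwise return the delivered value.
(defn await-or-clean-up [notification timeout-ms cleanup!]
  (let [v (deref notification timeout-ms ::timed-out)]
    (if (= v ::timed-out)
      (do (cleanup!) nil)
      v)))

;; Records whether the cleanup function was ever called.
(def cleaned (atom false))

;; A notification that has already arrived: returned as-is, no cleanup.
(def arrived (promise))
(deliver arrived ["sir-12345" "abc123"])
(def ok-result (await-or-clean-up arrived 1000 #(reset! cleaned true)))

;; A notification that never arrives: cleanup runs after 50 ms, result is nil.
(def late-result (await-or-clean-up (promise) 50 #(reset! cleaned true)))
</code></pre></div>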
<h5>Refactoring and generalization</h5>
<p>The code above makes a few compromises for clarity. I won't go over all the details, but
the real <code>bring-up-spots</code> function is a bit fancier:</p>
<ol>
<li>As per its plural name, you can request an arbitrary number of identically configured instances,
along with a minimum that you require to have come up before the timeout. If fewer than the
minimum come up, we cancel the whole lot of them.</li>
<li>All live spot request and instance ids are stored in a global atom, so we can quickly tear down
the entire session with a single call if we need to go home.</li>
<li>The API boilerplate is pulled into functions that simplify the interface and clean up the output.</li>
<li>The channel eventually returns not just the request and instance ids, but a map from request id to
maps containing more information, e.g.:</li>
</ol>
<div class="highlight"><pre><span></span><code> <span class="p">{</span><span class="s">"sir-12345"</span> <span class="p">{</span><span class="ss">:instance-id</span> <span class="s">"abc123"</span>
<span class="ss">:ip</span> <span class="s">"10.0.7.100"</span> <span class="c1">;; private IP on the VPC</span>
<span class="ss">:host</span> <span class="s">"ec2-54-101-165-9-98-compute1.amazonaws.com"</span> <span class="c1">;; public</span>
<span class="ss">:state</span> <span class="s">"running"</span>
<span class="ss">:request-id</span> <span class="s">"sir-12345"</span><span class="p">}</span> <span class="nv">...</span> <span class="p">}</span>
</code></pre></div>
<h4>A generic CLI runnable for Clojure functions</h4>
<p><img alt="eden-cranach" src="http://upload.wikimedia.org/wikipedia/commons/thumb/6/65/Lucas_Cranach_d._%C3%84._035.jpg/450px-Lucas_Cranach_d._%C3%84._035.jpg"></p>
<p>So far, the only thing we've run is the Redis server, whose executable we pre-loaded earlier onto
the image. To run Clojure code, we need to wrap it in a class with a <code>-main</code>, bundle it up
with <code>lein-uberjar</code>, copy it to S3 from our external host, and copy it from S3 down to all the
internal instances that need it.<sup id="fnref:s3"><a class="footnote-ref" href="#fn:s3">2</a></sup></p>
<p>In the case of Girder, the guts of the runnable will be <code>launch-worker</code>, <code>launch-distributor</code>,
etc., surrounded by a tremendous amount of common functionality. It is useful to have a generic
wrapper that, given a function to invoke and an option specification in <code>clojure.tools.cli</code> form,
provides several core niceties:</p>
<ol>
<li>Augments the option specification for the common functionality.</li>
<li>Sets up <code>timbre</code> logging, including an option to log to Redis.</li>
<li>Wraps the function call in a try/catch.</li>
<li>Invokes the function, passing it a map of parsed options to values.</li>
<li>Allows that map to be specified fully in a single EDN string (useful when constructing the
command from within Clojure).</li>
<li>Packages the return value in a map containing either the <code>:result</code> or an <code>:exception</code>.</li>
<li>Possibly includes a vector of <code>:log</code> entries in the output.</li>
<li>Includes timing information.</li>
<li>Dumps it all to stdout in machine-readable <code>pr-str</code> form.</li>
<li>Optionally hangs around for a while before exiting.</li>
</ol>
<p>None of that is particularly difficult, but it is tedious to do over and over again, so I wrote
a <a href="https://github.com/pnf/clj-utils/blob/master/src/acyclic/utils/cli.clj">utility</a>
function, allowing one-line definition of <code>-main</code>:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">-main</span> <span class="p">[</span><span class="o">&</span> <span class="nv">args</span><span class="p">]</span> <span class="p">(</span><span class="nf">cli/edn-app</span> <span class="nv">args</span> <span class="nv">cli-options</span> <span class="nv">doit</span><span class="p">))</span>
</code></pre></div>
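<p>To make a few of those niceties concrete, here is a minimal sketch (not the actual
<code>cli/edn-app</code>) covering just the EDN options string, the try/catch, the
<code>:result</code>-or-<code>:exception</code> packaging, timing and <code>pr-str</code> output.
The name <code>run-wrapped</code> and the <code>:msecs</code> key are assumptions made for the
example; logging, option parsing and hanging around are left out.</p>
<div class="highlight"><pre><span></span><code>(require '[clojure.edn :as edn])

;; Call f on an options map read from an EDN string and package the outcome
;; as a machine-readable string containing :result or :exception, plus timing.
(defn run-wrapped [f edn-opts]
  (let [opts  (edn/read-string edn-opts)
        start (System/nanoTime)
        out   (try {:result (f opts)}
                   (catch Exception e {:exception (str e)}))
        msecs (/ (- (System/nanoTime) start) 1e6)]
    (pr-str (assoc out :msecs msecs))))

;; Example: a "job" that doubles the :n option.
(def wrapped-output (run-wrapped (fn [opts] (* 2 (:n opts))) "{:n 21}"))
</code></pre></div>
<p>Since the output is <code>pr-str</code> form, whoever launched the process can turn it back
into data with <code>clojure.edn/read-string</code>, which is the whole point of item 9.</p>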
<p>For Girder, the <code>doit</code> starts by extracting a few options:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">doit</span> <span class="p">[</span><span class="nv">opts</span><span class="p">]</span>
<span class="p">(</span><span class="nf">acyclic.girder.grid.redis/init!</span> <span class="p">(</span><span class="ss">:host</span> <span class="nv">opts</span><span class="p">)</span> <span class="p">(</span><span class="ss">:port</span> <span class="nv">opts</span><span class="p">))</span>
<span class="p">(</span><span class="k">let </span><span class="p">[{</span><span class="ss">:keys</span> <span class="p">[</span><span class="nv">numrecjobs</span> <span class="nv">reclevel</span> <span class="nv">cmt</span> <span class="nv">help</span> <span class="nv">pool</span> <span class="nv">worker</span> <span class="nv">distributor</span> <span class="nv">host</span> <span class="nv">port</span>
<span class="nv">jobmsecs</span> <span class="nv">jobs</span> <span class="nv">jobtimeout</span> <span class="nv">helper</span> <span class="nv">cleanup</span><span class="p">]}</span> <span class="nv">opts</span><span class="p">]</span>
</code></pre></div>
<p>and returns a map, containing entries for one or more services we've chosen to launch
in this instance. (The specific services won't make much sense if you didn't
read the Girder posts, but you get the idea.)</p>
<p>We optionally launch one or more distributors:</p>
<div class="highlight"><pre><span></span><code> <span class="p">{</span><span class="ss">:distributor</span> <span class="c1">;; --distributor POOLID1[,POOLID2,...]</span>
<span class="p">(</span><span class="nb">when </span><span class="nv">distributor</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">ds</span> <span class="p">(</span><span class="nf">clojure.string/split</span> <span class="nv">distributor</span> <span class="o">#</span><span class="s">","</span><span class="p">)]</span>
<span class="p">(</span><span class="nb">for </span><span class="p">[</span><span class="nv">d</span> <span class="nv">ds</span><span class="p">]</span>
<span class="p">(</span><span class="nf">grid/launch-distributor</span> <span class="nv">d</span> <span class="nv">pool</span><span class="p">))))</span>
</code></pre></div>
<p>A helper thread stealing work every <code>MSECS</code>:</p>
<div class="highlight"><pre><span></span><code> <span class="ss">:helper</span> <span class="c1">;; --helper MSECS</span>
<span class="p">(</span><span class="nb">when </span><span class="nv">helper</span>
<span class="p">(</span><span class="nf">grid/launch-helper</span> <span class="nv">distributor</span> <span class="nv">helper</span><span class="p">))</span>
</code></pre></div>
<p>One or more workers (in the same process), either identified explicitly or automatically named:</p>
<div class="highlight"><pre><span></span><code> <span class="ss">:worker</span> <span class="c1">;; --worker (N-WORKERS | WID1[,WID2,...])</span>
<span class="p">(</span><span class="nb">when </span><span class="nv">worker</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">u</span> <span class="p">(</span><span class="nf">java.util.UUID/randomUUID</span><span class="p">)</span>
<span class="nv">n</span> <span class="p">(</span><span class="nf">try</span> <span class="p">(</span><span class="nf">Integer/parseInt</span> <span class="nv">worker</span><span class="p">)</span> <span class="p">(</span><span class="nf">catch</span> <span class="nv">Exception</span> <span class="nv">e</span> <span class="nv">nil</span><span class="p">))</span>
<span class="nv">ws</span> <span class="p">(</span><span class="k">if </span><span class="nv">n</span>
<span class="p">(</span><span class="nb">map </span><span class="o">#</span><span class="p">(</span><span class="nb">str </span><span class="s">"w"</span> <span class="nv">%</span> <span class="s">"-"</span> <span class="nv">u</span><span class="p">)</span> <span class="p">(</span><span class="nb">range </span><span class="nv">n</span><span class="p">))</span>
<span class="p">(</span><span class="nf">clojure.string/split</span> <span class="nv">worker</span> <span class="o">#</span><span class="s">","</span><span class="p">))]</span>
<span class="p">(</span><span class="nb">for </span><span class="p">[</span><span class="nv">w</span> <span class="nv">ws</span><span class="p">]</span>
<span class="p">(</span><span class="nf">grid/launch-worker</span> <span class="nv">w</span> <span class="nv">pool</span><span class="p">))))</span>
</code></pre></div>
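<p>The two readings of the <code>--worker</code> argument are easier to see in isolation. This
sketch pulls the naming logic into a standalone function; <code>worker-ids</code> is a hypothetical
name, and the suffix is passed in rather than generated so that the output is predictable.</p>
<div class="highlight"><pre><span></span><code>(require '[clojure.string :as string])

;; Interpret a --worker argument: a count yields generated names, anything
;; else is treated as a comma-separated list of explicit ids.
(defn worker-ids [worker suffix]
  (let [n (try (Integer/parseInt worker)
               (catch NumberFormatException _ nil))]
    (if n
      (mapv #(str "w" % "-" suffix) (range n))
      (string/split worker #","))))

(def generated (worker-ids "2" "x"))   ;; two generated names
(def explicit  (worker-ids "a,b" "x")) ;; two explicit ids
</code></pre></div>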
<p>The results from <code>N</code> parallel job requests:</p>
<div class="highlight"><pre><span></span><code> <span class="ss">:jobs</span> <span class="c1">;; --jobs N</span>
<span class="p">(</span><span class="nb">when </span><span class="p">(</span><span class="nb">and </span><span class="nv">pool</span> <span class="nv">jobs</span> <span class="p">(</span><span class="nb">pos? </span><span class="nv">jobs</span><span class="p">))</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">f</span> <span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">i</span><span class="p">]</span> <span class="p">(</span><span class="nf">grid/enqueue</span> <span class="nv">pool</span> <span class="p">[</span><span class="nv">recbog</span> <span class="nv">jobmsecs</span> <span class="nv">i</span> <span class="nv">reclevel</span> <span class="nv">numrecjobs</span> <span class="nv">cmt</span><span class="p">]</span> <span class="nv">false</span> <span class="s">"cli"</span><span class="p">))</span>
<span class="nv">c</span> <span class="p">(</span><span class="nf">async/map</span> <span class="nb">vector </span><span class="p">(</span><span class="nb">map </span><span class="nv">f</span> <span class="p">(</span><span class="nb">range </span><span class="nv">jobs</span><span class="p">)))</span>
<span class="nv">t</span> <span class="p">(</span><span class="nf">async/timeout</span> <span class="p">(</span><span class="nb">* </span><span class="mi">1000</span> <span class="nv">jobtimeout</span><span class="p">))]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[[</span><span class="nv">v</span> <span class="nv">ch</span><span class="p">]</span> <span class="p">(</span><span class="nf">async/alts!!</span> <span class="p">[</span><span class="nv">c</span> <span class="nv">t</span><span class="p">])]</span>
<span class="p">(</span><span class="nb">or </span><span class="nv">v</span> <span class="s">"timeout"</span><span class="p">))))}</span>
<span class="p">))</span>
</code></pre></div>
<h4>Putting it all together</h4>
<p><img src="http://ecmbloggen.files.wordpress.com/2013/05/ikea-assembly.jpg" width=450px></p>
<p>We can run an experiment by running a "script" in the REPL.</p>
<p>Since all the Girder services work via Redis, we start by bringing up the Redis server and extracting
some information about it. It will be handy to have the uberjar on this machine, so we download
it first:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">cmd-getjar</span> <span class="p">[]</span> <span class="p">(</span><span class="nb">str </span><span class="s">"aws s3 cp s3://dist-ec2/girder.jar ."</span><span class="p">))</span>
<span class="p">(</span><span class="k">def </span><span class="nv">c-redis</span> <span class="p">(</span><span class="nf">bring-up-spots</span> <span class="nv">my-req</span> <span class="mi">1</span> <span class="p">[(</span><span class="nf">cmd-getjar</span><span class="p">)</span> <span class="s">"bin/redis"</span><span class="p">]</span> <span class="ss">:subnet</span> <span class="nv">my-sub-public</span> <span class="ss">:itype</span> <span class="s">"m3.medium"</span> <span class="ss">:price</span> <span class="mf">0.05</span> <span class="ss">:log</span> <span class="s">"debug"</span><span class="p">))</span>
</code></pre></div>
<p>Since we need to extract some information about the instance before we can start
other instances, it's necessary to wait:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="k">def </span><span class="nv">r-redis</span> <span class="p">(</span><span class="nf"><!!</span> <span class="nv">c-redis</span><span class="p">)</span>
<span class="p">(</span><span class="k">def </span><span class="nv">redis</span> <span class="p">(</span><span class="nb">first </span><span class="p">(</span><span class="nb">vals </span><span class="nv">r-redis</span><span class="p">)))</span>
<span class="p">(</span><span class="k">def </span><span class="nv">redis-ip</span> <span class="p">(</span><span class="ss">:ip</span> <span class="nv">redis</span><span class="p">))</span>
<span class="p">(</span><span class="k">def </span><span class="nv">redis-host</span> <span class="p">(</span><span class="ss">:host</span> <span class="nv">redis</span><span class="p">))</span>
<span class="p">(</span><span class="k">def </span><span class="nv">redis-log</span> <span class="p">(</span><span class="nb">str </span><span class="s">"debug:"</span> <span class="nv">redis-ip</span> <span class="s">":6379"</span><span class="p">))</span>
</code></pre></div>
<p>We're going to pass the private <code>redis-ip</code> address
and the log specification in which it's embedded
into the CLI invocations for the various server processes.
With the public <code>redis-host</code>, we can, if we want, use <code>clj-ssh</code> and
<code>carmine</code> to interrogate the server:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="k">def </span><span class="nv">s-redis</span> <span class="p">(</span><span class="nf">ssh/session</span> <span class="nb">agent </span><span class="nv">redis-host</span><span class="p">))</span>
<span class="p">(</span><span class="nf">ssh/forward-local-port</span> <span class="nv">s-redis</span> <span class="mi">8379</span> <span class="mi">6379</span><span class="p">)</span>
<span class="p">(</span><span class="k">def </span><span class="nv">car-redis</span> <span class="p">{</span><span class="ss">:pool</span> <span class="p">{}</span> <span class="ss">:spec</span> <span class="p">{</span><span class="ss">:host</span> <span class="s">"localhost"</span> <span class="ss">:port</span> <span class="mi">8379</span><span class="p">}})</span>
<span class="p">(</span><span class="nf">wcar</span> <span class="nv">car-redis</span> <span class="p">(</span><span class="nf">car/keys</span> <span class="s">"*"</span><span class="p">))</span>
</code></pre></div>
<p>Next, we bring up a distributor and a helper in the same process, starting by downloading
the uberjar:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">cmds-dist</span> <span class="p">[</span><span class="nv">pool</span> <span class="nv">ip</span> <span class="nv">msec</span> <span class="nv">log</span><span class="p">]</span>
<span class="p">[(</span><span class="nf">cmd-getjar</span><span class="p">)</span>
<span class="p">(</span><span class="nb">str </span><span class="s">"java -cp girder.jar acyclic.girder.testutils.grid --distributor "</span> <span class="nv">pool</span>
<span class="s">" --helper "</span> <span class="nv">msec</span>
<span class="s">" --host "</span> <span class="nv">ip</span> <span class="s">" --hang 1000 --log "</span> <span class="nv">log</span><span class="p">)])</span>
<span class="p">(</span><span class="k">def </span><span class="nv">c-dist</span>
<span class="p">(</span><span class="nf">bring-up-spots</span> <span class="nv">my-req</span> <span class="mi">1</span> <span class="p">(</span><span class="nf">cmds-dist</span> <span class="s">"pool"</span> <span class="nv">redis-ip</span> <span class="mi">100</span> <span class="nv">redis-log</span><span class="p">)</span>
<span class="ss">:subnet</span> <span class="nv">my-sub-private</span> <span class="ss">:itype</span> <span class="s">"t1.micro"</span> <span class="ss">:price</span> <span class="mf">0.01</span> <span class="ss">:key</span> <span class="s">"girder"</span> <span class="ss">:minutes</span> <span class="mi">100</span><span class="p">))</span>
</code></pre></div>
<p>Now, the fun part. Spin up 100 hosts, running a worker on each:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">cmds-workers</span> <span class="p">[</span><span class="nv">n</span> <span class="nv">pool</span> <span class="nv">ip</span> <span class="nv">log</span><span class="p">]</span>
<span class="p">[(</span><span class="nf">cmd-getjar</span><span class="p">)</span>
<span class="p">(</span><span class="nb">str </span><span class="s">"java -cp girder.jar acyclic.girder.testutils.grid --worker "</span> <span class="nv">n</span>
<span class="s">" --pool "</span> <span class="nv">pool</span> <span class="s">" --host "</span> <span class="nv">ip</span> <span class="s">" --hang 1000 --log "</span> <span class="nv">log</span><span class="p">)])</span>
<span class="p">(</span><span class="k">def </span><span class="nv">c-workers</span>
<span class="p">(</span><span class="nf">bring-up-spots</span> <span class="nv">my-req</span> <span class="mi">100</span> <span class="p">(</span><span class="nf">cmds-workers</span> <span class="mi">1</span> <span class="s">"pool"</span> <span class="nv">redis-ip</span> <span class="nv">redis-log</span><span class="p">)</span>
<span class="ss">:subnet</span> <span class="nv">my-sub-private</span> <span class="ss">:itype</span> <span class="s">"t1.micro"</span> <span class="ss">:price</span> <span class="mf">0.01</span> <span class="ss">:key</span> <span class="s">"girder"</span> <span class="ss">:minutes</span> <span class="mi">100</span><span class="p">))</span>
</code></pre></div>
<p>Wait until they're all up:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="k">def </span><span class="nv">r-dist</span> <span class="p">(</span><span class="nf"><!!</span> <span class="nv">c-dist</span><span class="p">))</span>
<span class="p">(</span><span class="k">def </span><span class="nv">r-workers</span> <span class="p">(</span><span class="nf"><!!</span> <span class="nv">c-workers</span><span class="p">))</span>
</code></pre></div>
<p>Since the <code>log</code> argument is of the form <code>(str "debug:" redis-ip ":6379")</code>, messages
at <code>:debug</code> level will be sent to Redis, where we may peruse them at leisure:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">car-appender/query-entries</span> <span class="nv">car-redis</span> <span class="ss">:debug</span><span class="p">)</span>
</code></pre></div>
<p>We're about ready to launch jobs. We'll do so from the Redis server, since it has a public IP, to
which we can ssh.</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">cmd-job</span> <span class="p">[</span><span class="nv">pool</span> <span class="nv">ip</span> <span class="nv">jobs</span> <span class="nv">reclevel</span> <span class="nv">log</span><span class="p">]</span>
<span class="p">(</span><span class="nb">str </span> <span class="s">"java -cp girder.jar acyclic.girder.testutils.grid --pool "</span> <span class="nv">pool</span>
<span class="s">" --host "</span> <span class="nv">ip</span> <span class="s">" --jobs "</span> <span class="nv">jobs</span> <span class="s">" --reclevel "</span> <span class="nv">reclevel</span> <span class="s">" --log "</span> <span class="nv">log</span><span class="p">))</span>
<span class="p">(</span><span class="nf">ssh/ssh-exec</span> <span class="nv">s-redis</span> <span class="p">(</span><span class="nf">cmd-job</span> <span class="s">"pool"</span> <span class="nv">redis-ip</span> <span class="mi">50</span> <span class="mi">3</span> <span class="nv">redis-log</span><span class="p">))</span>
</code></pre></div>
<p>If all goes well (and it did), this returns the same sort of array of "Bogosity" strings that
we got locally in the first Girder post, except a lot more of them. Since not much computation is
going on, the job of the workers is really to harass the distributor and Redis server as relentlessly
as possible, so we get an idea of what they can take. And that seems to be quite a lot. Without
any time spent on network optimization (and using <code>t1.micro</code> instances advertised for their
cruddy network performance) system latency seems to be about 4-5ms per job, which is about 50 times
better than I've seen on commercial systems running on much better (and better controlled) hardware.</p>
<h4>Teardown</h4>
<p>Cashflow from Girder is strictly negative, in the form of payments to Amazon, and those are
largely proportional to the number of machines and the time that they're up. As noted,
we keep track of <code>all-requests</code> in an atom, so they can be canceled en masse in a hurry
with a single function call:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">clean-up</span><span class="p">)</span>
</code></pre></div>
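<p>The bookkeeping behind that call can be sketched in a few lines. This is only an illustration
of the atom-as-registry idea, not the real <code>clean-up</code>: <code>live-requests</code>,
<code>track!</code> and <code>clean-up-sketch</code> are hypothetical names, and
<code>cancel!</code> stands in for the AWS calls that cancel requests and terminate instances.</p>
<div class="highlight"><pre><span></span><code>;; Hypothetical registry of live spot request ids.
(def live-requests (atom #{}))

;; Remember some request ids as soon as they are issued.
(defn track! [request-ids]
  (swap! live-requests into request-ids))

;; Cancel everything we know about, then forget it. cancel! receives a vector
;; of ids and is only called when there is something to cancel.
(defn clean-up-sketch [cancel!]
  (let [ids (vec @live-requests)]
    (when (seq ids) (cancel! ids))
    (reset! live-requests #{})
    ids))

;; Example session: track two requests, then tear everything down.
(def cancelled (atom []))
(track! ["sir-1" "sir-2"])
(clean-up-sketch #(reset! cancelled %))
</code></pre></div>
<p>Reading and then resetting the atom in two steps leaves a small window in which a newly
tracked request could be forgotten without being cancelled, which is tolerable at the REPL but
worth tightening up in anything unattended.</p>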
<h3>Clojure vs Bash</h3>
<p><img alt="orcs" src="http://img1.wikia.nocookie.net/__cb20130418141001/merp/images/7/7a/Orcs_attacking.jpg"></p>
<p>I know they're fairy tales, but still I am haunted by stories my
grandmother used to tell me, about dirty, evil monsters who do not
recognize the superiority of the Clojure REPL for nearly any purpose.
If only they would listen, or read blogs, surely the surging armies
of darkness would lay down their arms and turn on <code>paredit</code> mode.</p>
<p>For interactive control of many server instances, the Clojure REPL has
distinct advantages over the Bourne Shell. In addition to a boundless
library of tools to prepare and analyze data, we have vastly better
concurrency control via <code>core.async</code>. Plus it's more fun.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:async">
<p>Note: AWS does make a stab at (a) with various classes ending in <code>Async</code>, whose methods return <code>Future</code>s. Unfortunately, these are the silly sort of futures, which allow you to poll and block on completion but don't support callbacks or any way of fmap-ing on followup behavior. Anyway, type (b) is much more important, as the timescales are much longer. <a class="footnote-backref" href="#fnref:async" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:s3">
<p>Technically, we could have ssh'd onto every instance separately, but that would have taken longer, cost more and required each instance to have a public IP address. <a class="footnote-backref" href="#fnref:s3" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
</ol>
</div>vanholes - Van Laarhoven Lenses in Clojure2014-10-08T00:00:00-04:002014-10-08T00:00:00-04:00Peter Fraenkeltag:blog.podsnap.com,2014-10-08:/vanhole.html<p>In two previous posts, I went on about lenses in Clojure.
<a href="http://blog.podsnap.com/pinhole.html">Pinholes</a> comprised
a small library of higher order functions to formalize and simplify
the viewing and manipulation of complex nested structures.
<a href="http://blog.podsnap.com/tinhole.html">Tinholes</a> did essentially the same
thing, but with macros instead. In both cases, there's recursion going on,
as we burrow through layers of nesting, but macros had the advantage
of doing it all before compilation, giving <code>core.typed</code> a chance to
check our work.</p>
<p>The macro post was inexplicably popular, catapulting me to levels of
fame I never expected to achieve without consciously emulating an
early De Niro character. It even made the low 20's
on Hacker News,
where <a href="https://news.ycombinator.com/item?id=8381463">this comment</a>
was made:</p>
<div class="highlight"><pre><span></span><code>Using macros to pre-compile the lenses is clever, but
</code></pre></div>
<p>Punch drunk on Google Analytics, I was inclined to stop reading at "clever,"
but, reminding myself that self doubt is the mother of all virtue, I pressed on:</p>
<div class="highlight"><pre><span></span><code>feels like a hack around typed-clojure instead of being
aligned to it. All of the information needed to determine a lens'
action is available at compile time even prior to expansion. Can a
van Laarhoven representation be made in typed-clojure that
recovers this information?
</code></pre></div>
<p>Interesting. (The bit further down, where I'm advised to show a "little
humility," was less interesting. Mea culpa, already, mea maxima culpa.)
Can the van Laarhoven formulation, in fact, be made to work in
typed Clojure?</p>
<h4>TL;DR</h4>
<ol>
<li>Yes we can!</li>
<li>Though with less than perfect grace.</li>
<li>Typed clojure is amazing and important.</li>
<li>But it's a work in progress.</li>
<li>Even when it's finished, coding styles that work well in Haskell
will still be awkward in Clojure.</li>
<li>And vice versa. </li>
<li>Either way, strong typing is both crucial and achievable.</li>
</ol>
<p>While not originally intended as such, the latter points may constitute a feeble response
to</p>
<p><img alt="" src="images/chiusano-tweet.png"></p>
<p>and the
<a href="http://existentialtype.wordpress.com/2011/03/19/dynamic-languages-are-static-languages/">post</a>
to which it refers.</p>
<p>Continuing the tradition of awesome nomenclature, I am honored to present <strong>vanholes</strong>.</p>
<h4>The van Laarhoven representation of lenses</h4>
<p>If you don't know much about van Laarhoven lenses, but do know a little Haskell,
I strongly recommend
<a href="http://blog.jakubarnold.cz/2014/07/14/lens-tutorial-introduction-part-1.html">this tutorial</a>,
with which I would never try to compete.
There's also a
<a href="https://skillsmatter.com/skillscasts/4251-lenses-compositional-data-access-and-manipulation">great talk by Simon Peyton Jones</a>,
but, as you'll discover if you try the link, its permissions were recently
locked down, roughly coincident with my post going up,
making last Monday a banner day for the forces of ignorance regarding lenses.</p>
<p>The basic, mind-blowing idea of the van Laarhoven representation is that,
rather than specifying separate getter and setter functions for some piece
of data within a structure, you can write just one function, with a, seemingly,
very weird type. In Haskell,</p>
<div class="highlight"><pre><span></span><code> <span class="kr">type</span> <span class="kt">Lens</span> <span class="n">s</span> <span class="n">a</span> <span class="ow">=</span> <span class="kt">Functor</span> <span class="n">f</span> <span class="ow">=></span> <span class="p">(</span><span class="n">a</span> <span class="ow">-></span> <span class="n">f</span> <span class="n">a</span><span class="p">)</span> <span class="ow">-></span> <span class="n">s</span> <span class="ow">-></span> <span class="n">f</span> <span class="n">s</span>
</code></pre></div>
<p>i.e. a function that takes two arguments</p>
<ul>
<li>first a function from some type <code>a</code> to a <code>f</code>unctor over that type</li>
<li>the second a <code>s</code>tructure</li>
</ul>
<p>and returns the same <code>f</code>unctor, but over the <code>s</code>tructure type.</p>
<h4>Untyped van Laarhoven lenses</h4>
<p>To some, this section is totally backwards and probably sacrilege, since the
representation is usually derived by reasoning about types, but exploring the
mechanics in conventional Clojure can be interesting.</p>
<p>The basic "trick" is that, since lenses will be written to take arbitrary functors,
we will be able to pick specific ones that twist the lens function into
the right sort of accessor function.</p>
<p>A really simplistic functor can be built in Clojure using protocols. Since
protocol methods are dispatched based on their first argument, we'll need
to implement a backwards <code>p-fmap</code> method
that takes the container-like thing
first, and then wrap the method call in an <code>fmap</code> function to reverse the
arguments.</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="kd">defprotocol </span><span class="nv">IFunctor</span>
<span class="p">(</span><span class="nf">p-fmap</span> <span class="p">[</span><span class="nv">this</span> <span class="nv">fun</span><span class="p">]))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">fmap</span> <span class="p">[</span><span class="nv">fun</span> <span class="nv">c</span><span class="p">]</span> <span class="p">(</span><span class="nf">p-fmap</span> <span class="nv">c</span> <span class="nv">fun</span><span class="p">))</span>
</code></pre></div>
<p>We could implement a very boring functor,</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="kd">defrecord </span><span class="nv">Holder</span> <span class="p">[</span><span class="nv">thing</span><span class="p">]</span>
<span class="nv">IFunctor</span>
<span class="p">(</span><span class="nf">p-fmap</span> <span class="p">[</span><span class="nv">_</span> <span class="nv">fun</span><span class="p">]</span> <span class="p">(</span><span class="nf">->Holder</span> <span class="p">(</span><span class="nf">fun</span> <span class="nv">thing</span><span class="p">))))</span>
</code></pre></div>
<p>or, for the ultimate in boring,</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="kd">defrecord </span><span class="nv">Const</span> <span class="p">[</span><span class="nv">getConst</span><span class="p">]</span>
<span class="nv">IFunctor</span>
<span class="p">(</span><span class="nf">p-fmap</span> <span class="p">[</span><span class="nv">this</span> <span class="nv">_</span><span class="p">]</span> <span class="nv">this</span><span class="p">))</span>
</code></pre></div>
<p>which totally ignores the function, leaving the <code>getConst</code> field unchanged.</p>
<p>While boring, <code>Const</code> is not completely useless, because we can define</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="kd">defn </span><span class="nv">view</span> <span class="p">[</span><span class="nv">lens</span> <span class="nv">s</span><span class="p">]</span> <span class="p">(</span><span class="ss">:getConst</span> <span class="p">(</span><span class="nf">lens</span> <span class="nv">->Const</span> <span class="nv">s</span><span class="p">)))</span>
</code></pre></div>
<p>where, happily, the lens is being passed exactly the right arguments for
a <code>Lens</code>:</p>
<ol>
<li><code>->Const</code>, which is a function from something to a <code>Const</code> functor over it,</li>
<li>a structure, <code>s</code>,</li>
</ol>
<p>and we expect to get back a <code>Const</code> functor over the structure, which we can
extract with <code>:getConst</code>.</p>
<p>For concreteness, let's use as our structure a vector pair, like <code>[1 2]</code>,
and write a lens to get at the first element:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">l-1</span> <span class="p">[</span><span class="nv">x->Fx</span> <span class="nv">pair</span><span class="p">]</span> <span class="p">(</span><span class="nf">fmap</span> <span class="o">#</span><span class="p">(</span><span class="nb">vector </span><span class="nv">%</span> <span class="p">(</span><span class="nb">second </span><span class="nv">pair</span><span class="p">))</span>
<span class="p">(</span><span class="nf">x->Fx</span> <span class="p">(</span><span class="nb">first </span><span class="nv">pair</span><span class="p">))))</span>
</code></pre></div>
<p>Then <code>(view l-1 [3 4])</code> evaluates as:</p>
<ol>
<li><code>(:getConst (l-1 ->Const [3 4]))</code></li>
<li><code>(:getConst (fmap #(vector % 4) (->Const 3)))</code></li>
<li><code>(:getConst (p-fmap (->Const 3) #(vector % 4)))</code></li>
<li><code>(:getConst (->Const 3))</code></li>
<li><code>3</code></li>
</ol>
<p>If we were only going to use <code>l-1</code> with the <code>Const</code> functor, you'd wonder
why we bothered typing out <code>#(vector % (second pair))</code>,
as it was destined to be ignored. Fortunately, there's another functor we can
throw at it:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="kd">defrecord </span><span class="nv">Identity</span> <span class="p">[</span><span class="nv">runIdentity</span><span class="p">]</span>
<span class="nv">IFunctor</span>
<span class="p">(</span><span class="nf">p-fmap</span> <span class="p">[</span><span class="nv">_</span> <span class="nv">fun</span><span class="p">]</span> <span class="p">(</span><span class="nf">->Identity</span> <span class="p">(</span><span class="nf">fun</span> <span class="nv">runIdentity</span><span class="p">))))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">lset</span> <span class="p">[</span><span class="nv">lens</span> <span class="nv">x</span> <span class="nv">s</span><span class="p">]</span> <span class="p">(</span><span class="ss">:runIdentity</span> <span class="p">(</span><span class="nf">lens</span> <span class="p">(</span><span class="nb">constantly </span><span class="p">(</span><span class="nf">->Identity</span> <span class="nv">x</span><span class="p">))</span> <span class="nv">s</span><span class="p">)))</span>
</code></pre></div>
<p>(We can't use <code>set</code>, because of the existing <code>clojure.core/set</code>.)
Here, <code>constantly</code> has the effect of ignoring the original occupant, while
we no longer ignore the mapping function.
<code>(lset l-1 5 [3 4])</code> evaluates as:</p>
<ol>
<li><code>(:runIdentity (l-1 (constantly (->Identity 5)) [3 4]))</code></li>
<li><code>(:runIdentity (fmap #(vector % 4) ((constantly (->Identity 5)) 3)))</code></li>
<li><code>(:runIdentity (p-fmap (->Identity 5) #(vector % 4)))</code></li>
<li><code>(:runIdentity (->Identity [5 4]))</code></li>
<li><code>[5 4]</code></li>
</ol>
<p>In fact, the set operation is usually defined in terms of something called <code>over</code>,
which lets you apply a function to the focal point:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="kd">defn </span><span class="nv">over</span> <span class="p">[</span><span class="nv">ln</span> <span class="nv">f</span> <span class="nv">s</span><span class="p">]</span> <span class="p">(</span><span class="ss">:runIdentity</span> <span class="p">(</span><span class="nf">ln</span> <span class="o">#</span><span class="p">(</span><span class="nf">->Identity</span> <span class="p">(</span><span class="nf">f</span> <span class="nv">%</span><span class="p">))</span> <span class="nv">s</span><span class="p">)))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">lset</span> <span class="p">[</span><span class="nv">ln</span> <span class="nv">x</span> <span class="nv">s</span><span class="p">]</span> <span class="p">(</span><span class="nf">over</span> <span class="nv">ln</span> <span class="p">(</span><span class="nb">constantly </span><span class="nv">x</span><span class="p">)</span> <span class="nv">s</span><span class="p">))</span>
</code></pre></div>
<p>You can verify that <code>(over l-1 inc [3 4])</code> returns <code>[4 4]</code>.</p>
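<p>As a rough sanity check on the definitions so far, the usual lens laws can be tested directly.
The sketch below restates the protocol and functors so that it runs on its own. The names
<code>l-2</code> and <code>lens-laws-hold?</code> are invented here for illustration, and the record
methods refer to their fields directly rather than destructuring <code>this</code>.</p>
<div class="highlight"><pre><span></span><code>;; Self-contained restatement of the untyped definitions.
(defprotocol IFunctor
  (p-fmap [this fun]))
(defn fmap [fun c] (p-fmap c fun))

(defrecord Const [getConst]
  IFunctor
  (p-fmap [this _] this))
(defrecord Identity [runIdentity]
  IFunctor
  (p-fmap [_ fun] (->Identity (fun runIdentity))))

(defn view [lens s] (:getConst (lens ->Const s)))
(defn over [ln f s] (:runIdentity (ln #(->Identity (f %)) s)))
(defn lset [ln x s] (over ln (constantly x) s))

(defn l-1 [x->Fx pair]
  (fmap #(vector % (second pair)) (x->Fx (first pair))))

;; Hypothetical companion lens focusing on the second element of a pair.
(defn l-2 [x->Fx pair]
  (fmap #(vector (first pair) %) (x->Fx (second pair))))

;; Checks the three usual lens laws for one lens, one structure and two values.
(defn lens-laws-hold? [lens s x y]
  (and (= x (view lens (lset lens x s)))                    ; you get back what you set
       (= s (lset lens (view lens s) s))                    ; setting what you got changes nothing
       (= (lset lens y s) (lset lens y (lset lens x s)))))  ; the last set wins

(lens-laws-hold? l-1 [3 4] 5 6) ; => true
(lens-laws-hold? l-2 [3 4] 5 6) ; => true
(over l-2 inc [3 4])            ; => [3 5]
</code></pre></div>
<p>Passing for a handful of values proves nothing in general, of course, but it does catch a lens
whose implanting function and extraction disagree about which element they mean.</p>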
<h4>Composition of lenses</h4>
<p>One nice aspect of this representation is that, the lenses being ordinary
functions, they can be composed. Let's say we have another lens, for
accessing the <code>:foo</code> element of a hashmap. It's the same pattern as before.
The first argument of <code>fmap</code> is a function that implants an element in the
structure, and the second argument is the extraction of that element, wrapped
in the input function:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="kd">defn </span><span class="nv">l</span><span class="ss">:foo</span> <span class="p">[</span><span class="nv">x->Fx</span> <span class="nv">hm</span><span class="p">]</span>
<span class="p">(</span><span class="nf">fmap</span> <span class="o">#</span><span class="p">(</span><span class="nb">assoc </span><span class="nv">hm</span> <span class="ss">:foo</span> <span class="nv">%</span><span class="p">)</span>
<span class="p">(</span><span class="nf">x->Fx</span> <span class="p">(</span><span class="ss">:foo</span> <span class="nv">hm</span><span class="p">))))</span>
</code></pre></div>
<p>It's tempting to write a macro for building these things out of traditional
getters and setters</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defmacro </span><span class="nv">deflens</span> <span class="p">[</span><span class="nv">lname</span> <span class="nv">implant</span> <span class="nv">extract</span><span class="p">]</span>
<span class="o">`</span><span class="p">(</span><span class="kd">defn </span><span class="o">~</span><span class="nv">lname</span> <span class="p">[</span><span class="nv">x->Fx#</span> <span class="nv">s#</span><span class="p">]</span>
<span class="p">(</span><span class="nf">fmap</span> <span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">x#</span><span class="p">]</span> <span class="p">(</span><span class="o">~</span><span class="nv">implant</span> <span class="nv">s#</span> <span class="nv">x#</span><span class="p">))</span>
<span class="p">(</span><span class="nf">x->Fx#</span> <span class="p">(</span><span class="o">~</span><span class="nv">extract</span> <span class="nv">s#</span><span class="p">)))))</span>
</code></pre></div>
<p>with which we might have written:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">deflens</span> <span class="nv">l</span><span class="ss">:foo</span> <span class="o">#</span><span class="p">(</span><span class="nb">assoc </span><span class="nv">%1</span> <span class="ss">:foo</span> <span class="nv">%2</span><span class="p">)</span> <span class="ss">:foo</span><span class="p">)</span>
</code></pre></div>
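<p>To see the macro doing its job, here is a minimal, self-contained sketch that uses it to build two
lenses from ordinary getters and setters. The lens names <code>l-first</code> and <code>l-bar</code>
are hypothetical, chosen only for this example.</p>
<div class="highlight"><pre><span></span><code>(defprotocol IFunctor
  (p-fmap [this fun]))
(defn fmap [fun c] (p-fmap c fun))

(defrecord Const [getConst]
  IFunctor
  (p-fmap [this _] this))
(defrecord Identity [runIdentity]
  IFunctor
  (p-fmap [_ fun] (->Identity (fun runIdentity))))

(defn view [lens s] (:getConst (lens ->Const s)))
(defn over [ln f s] (:runIdentity (ln #(->Identity (f %)) s)))

;; implant is a two-argument setter (structure, new value);
;; extract is a one-argument getter.
(defmacro deflens [lname implant extract]
  `(defn ~lname [x->Fx# s#]
     (fmap (fn [x#] (~implant s# x#))
           (x->Fx# (~extract s#)))))

;; Hypothetical lenses: first slot of a vector, and the :bar key of a map.
(deflens l-first #(assoc %1 0 %2) first)
(deflens l-bar   #(assoc %1 :bar %2) :bar)

(view l-first [1 2])      ; => 1
(over l-first inc [1 2])  ; => [2 2]
(over l-bar inc {:bar 7}) ; => {:bar 8}
</code></pre></div>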
<p>The alert smartass will now be asking why, having made such a big deal about
representing the bidirectional lens in a single function, rather than as
separate functions for each direction, we're now writing convenience tools
for building the single function out of the separate functions. The riposte
to this question is that lenses, being functions, can be composed.</p>
<p>That is, they could be composed if they were unary rather than binary
functions, which they would be in Haskell, since everything is curried
there: <code>(a -> f a) -> s -> f s</code> is equivalent to
<code>(a -> f a) -> (s -> f s)</code>. To get the same effect in Clojure, we need
to curry and uncurry explicitly, e.g.</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="kd">defn </span><span class="nv">curry</span> <span class="p">[</span><span class="nv">f2</span><span class="p">]</span>
<span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">x</span><span class="p">]</span> <span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">y</span><span class="p">]</span> <span class="p">(</span><span class="nf">f2</span> <span class="nv">x</span> <span class="nv">y</span><span class="p">))))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">uncurry</span> <span class="p">[</span><span class="nv">f1</span><span class="p">]</span>
<span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">x</span> <span class="nv">y</span><span class="p">]</span> <span class="p">((</span><span class="nf">f1</span> <span class="nv">x</span><span class="p">)</span> <span class="nv">y</span><span class="p">)))</span>
</code></pre></div>
<p>with which we can now compose our two lenses like this:</p>
<div class="highlight"><pre><span></span><code> <span class="nv">user></span> <span class="p">(</span><span class="nf">lset</span> <span class="p">(</span><span class="nf">uncurry</span> <span class="p">(</span><span class="nb">comp </span><span class="p">(</span><span class="nf">curry</span> <span class="nv">l</span><span class="ss">:foo</span><span class="p">)</span> <span class="p">(</span><span class="nf">curry</span> <span class="nv">l-1</span><span class="p">)))</span> <span class="mi">9</span> <span class="p">{</span><span class="ss">:foo</span> <span class="p">[</span><span class="mi">1</span> <span class="mi">2</span><span class="p">]})</span>
<span class="p">{</span><span class="ss">:foo</span> <span class="p">[</span><span class="mi">9</span> <span class="mi">2</span><span class="p">]}</span>
</code></pre></div>
<p>The process can be streamlined in all sorts of ways, like</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="kd">defn </span><span class="nv">lcomp</span> <span class="p">[</span><span class="o">&</span> <span class="nv">ls</span><span class="p">]</span> <span class="p">(</span><span class="nf">uncurry</span> <span class="p">(</span><span class="nb">apply comp </span><span class="p">(</span><span class="nb">map </span><span class="nv">curry</span> <span class="nv">ls</span><span class="p">))))</span>
</code></pre></div>
<p>so we can just do <code>(lset (lcomp l:foo l-1) ...)</code>. One might be (or maybe was)
tempted to write more general utilities along these lines, but that would
needlessly complicate the next section of this post.</p>
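<p>For what it's worth, <code>lcomp</code> handles more than two lenses, and the composite works for
viewing as well as setting. A self-contained sketch, in which the nested structure <code>s</code> and
the name <code>deep</code> are made up for the occasion:</p>
<div class="highlight"><pre><span></span><code>(defprotocol IFunctor
  (p-fmap [this fun]))
(defn fmap [fun c] (p-fmap c fun))

(defrecord Const [getConst]
  IFunctor
  (p-fmap [this _] this))
(defrecord Identity [runIdentity]
  IFunctor
  (p-fmap [_ fun] (->Identity (fun runIdentity))))

(defn view [lens s] (:getConst (lens ->Const s)))
(defn over [ln f s] (:runIdentity (ln #(->Identity (f %)) s)))

(defn l-1 [x->Fx pair]
  (fmap #(vector % (second pair)) (x->Fx (first pair))))
(defn l:foo [x->Fx hm]
  (fmap #(assoc hm :foo %) (x->Fx (:foo hm))))

(defn curry [f2] (fn [x] (fn [y] (f2 x y))))
(defn uncurry [f1] (fn [x y] ((f1 x) y)))
(defn lcomp [& ls] (uncurry (apply comp (map curry ls))))

;; Hypothetical structure: a map whose :foo is a pair whose first element is a map.
(def s {:foo [{:foo 1} 2]})

;; Leftmost lens is outermost, just as with comp on the curried forms.
(def deep (lcomp l:foo l-1 l:foo))

(view deep s)     ; => 1
(over deep inc s) ; => {:foo [{:foo 2} 2]}
</code></pre></div>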
<h4>Typed van Laarhoven lenses</h4>
<p>The original mission was to explore lenses in typed Clojure. The mission would be easier
were <code>core.typed</code> fully implemented, documented and tested, but it's sort of not,
especially in the mad interzone of protocols and higher kinded types.
(<strong>Important:</strong> This isn't to disparage the project in any way. It's a colossal achievement for
<a href="https://github.com/clojure/core.typed/graphs/contributors">about 1.03 people</a>, who
are the first to admit that it's not done yet.)</p>
<p>There are at least a few examples of people not quite getting functors to work. The protocol declaration is
lifted from a <a href="https://groups.google.com/d/msg/clojure-core-typed/H_1mHmB7sc8/NsekVwkb2HsJ">Google groups response by Ambrose</a> to one of them:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">t/ann-protocol</span> <span class="p">[</span> <span class="p">[</span><span class="nv">a</span> <span class="ss">:variance</span> <span class="ss">:covariant</span><span class="p">]</span> <span class="p">]</span> <span class="nv">IFunctor</span>
<span class="nv">p-fmap</span>
<span class="p">(</span><span class="nf">t/All</span> <span class="p">[</span><span class="nv">b</span><span class="p">]</span> <span class="p">(</span><span class="nf">t/IFn</span> <span class="p">[(</span><span class="nf">IFunctor</span> <span class="nv">a</span><span class="p">)</span> <span class="p">[</span><span class="nv">a</span> <span class="nb">-> </span><span class="nv">b</span><span class="p">]</span> <span class="nb">-> </span><span class="p">(</span><span class="nf">IFunctor</span> <span class="nv">b</span><span class="p">)])))</span>
<span class="p">(</span><span class="nf">t/defprotocol</span> <span class="nv">IFunctor</span>
<span class="p">(</span><span class="nf">p-fmap</span> <span class="p">[</span><span class="nv">this</span> <span class="nv">fun</span><span class="p">]))</span>
</code></pre></div>
<p>This is reassuringly similar to the Haskell declaration</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">t/All</span> <span class="p">[</span><span class="nv">b</span><span class="p">]</span> <span class="p">(</span><span class="nf">t/IFn</span> <span class="p">[(</span><span class="nf">IFunctor</span> <span class="nv">a</span><span class="p">)</span> <span class="p">[</span><span class="nv">a</span> <span class="nb">-> </span><span class="nv">b</span><span class="p">]</span> <span class="nb">-> </span> <span class="p">(</span><span class="nf">IFunctor</span> <span class="nv">b</span><span class="p">)]))</span>
<span class="c1">;; fmap :: Functor f => (a -> b) -> (f a) -> (f b)</span>
</code></pre></div>
<p>but the differences are important.</p>
<ol>
<li>Foremost is that the specific implementation
of <code>fmap</code> will be determined at runtime by dynamic dispatch<sup id="fnref:dispatch"><a class="footnote-ref" href="#fn:dispatch">1</a></sup> on the subtype,
rather than chosen by matching type at compile time. We <em>cannot possibly</em> do the latter,
because Clojure typing is completely separate from compilation. This is why we
get/have to specify variance; subtyping and therefore variance are innate to JVM
languages but not to Haskell.</li>
<li>Dynamic dispatch is also behind the reversal of the arguments to <code>fmap</code>, as noted earlier.</li>
<li>And by arguments we mean actual multiple arguments to a single function, rather than multiple
functions, each of one argument, returning another function of one argument: by conscious design
choice, Clojure does not automatically curry, so we have one less <code>-></code>.</li>
<li>We don't take malicious pleasure in using the symbol <code>f</code> to mean both
function and functor.</li>
</ol>
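<p>The first point is easy to see in untyped Clojure. In the sketch below (plain
<code>defprotocol</code>, no annotations), a single call site ends up running two different
<code>p-fmap</code> implementations, chosen purely by the runtime class of each element. The name
<code>results</code> is hypothetical.</p>
<div class="highlight"><pre><span></span><code>(defprotocol IFunctor
  (p-fmap [this fun]))
(defn fmap [fun c] (p-fmap c fun))

(defrecord Holder [thing]
  IFunctor
  (p-fmap [_ fun] (->Holder (fun thing))))
(defrecord Const [getConst]
  IFunctor
  (p-fmap [this _] this))

;; Nothing at this call site says which functor is involved;
;; the protocol dispatches on the class of each element when it runs.
(def results (mapv #(fmap inc %) [(->Holder 1) (->Const 1)]))

results ; => [#user.Holder{:thing 2} #user.Const{:getConst 1}] when evaluated in the user namespace
</code></pre></div>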
<p>Ambrose warned in the aforementioned response that it "gets messier if you want to abstract over the Functor,"
but, that being the whole point of this exercise, we soldier on. <code>Const</code> is not that hard</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">t/ann-record</span> <span class="p">[</span> <span class="p">[</span><span class="nv">a</span> <span class="ss">:variance</span> <span class="ss">:covariant</span><span class="p">]</span> <span class="p">]</span> <span class="nv">Const</span> <span class="p">[</span><span class="nv">getConst</span> <span class="ss">:-</span> <span class="nv">a</span><span class="p">])</span>
<span class="p">(</span><span class="kd">defrecord </span><span class="nv">Const</span> <span class="p">[</span><span class="nv">getConst</span><span class="p">]</span>
<span class="nv">IFunctor</span>
<span class="p">(</span><span class="nf">p-fmap</span> <span class="p">[</span><span class="nv">this</span> <span class="nv">fun</span><span class="p">]</span> <span class="nv">this</span><span class="p">))</span>
</code></pre></div>
<p>because, as we completely ignore the <code>fun</code>ction, we don't have to worry about specifying its type.</p>
<p>But in <code>Identity</code> we do use the function,</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">t/ann-record</span> <span class="p">[</span> <span class="p">[</span><span class="nv">a</span> <span class="ss">:variance</span> <span class="ss">:covariant</span><span class="p">]</span> <span class="p">]</span> <span class="nv">Identity</span> <span class="p">[</span><span class="nv">runIdentity</span> <span class="ss">:-</span> <span class="nv">a</span><span class="p">])</span>
<span class="p">(</span><span class="kd">defrecord </span><span class="nv">Identity</span> <span class="p">[</span><span class="nv">runIdentity</span><span class="p">]</span>
<span class="nv">IFunctor</span>
<span class="c1">;; Can't actually do this, because fun hasn't been annotated yet</span>
<span class="p">(</span><span class="nf">p-fmap</span> <span class="p">[</span><span class="nv">this</span> <span class="nv">fun</span><span class="p">]</span> <span class="p">(</span><span class="nf">->Identity</span> <span class="p">(</span><span class="nf">fun</span> <span class="p">(</span><span class="ss">:runIdentity</span> <span class="nv">this</span><span class="p">)))))</span>
</code></pre></div>
<p>so the above won't typecheck.</p>
<p>We need to define and annotate <code>fun</code> before it's used, but after
<code>->Identity</code> has been defined, which requires extending <code>Identity</code> explicitly, rather than inline:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">t/ann-record</span> <span class="p">[</span> <span class="p">[</span><span class="nv">a</span> <span class="ss">:variance</span> <span class="ss">:covariant</span><span class="p">]</span> <span class="p">]</span> <span class="nv">Identity</span> <span class="p">[</span><span class="nv">runIdentity</span> <span class="ss">:-</span> <span class="nv">a</span><span class="p">])</span>
<span class="p">(</span><span class="kd">defrecord </span><span class="nv">Identity</span> <span class="p">[</span><span class="nv">runIdentity</span><span class="p">])</span>
<span class="c1">;; Now ->Identity exists</span>
<span class="p">(</span><span class="nf">t/ann</span> <span class="nv">identity-fmap</span> <span class="p">(</span><span class="nf">t/All</span> <span class="p">[</span><span class="nv">a</span> <span class="nv">b</span><span class="p">]</span> <span class="p">(</span><span class="nf">t/IFn</span> <span class="p">[(</span><span class="nf">Identity</span> <span class="nv">a</span><span class="p">)</span> <span class="p">[</span><span class="nv">a</span> <span class="nb">-> </span><span class="nv">b</span><span class="p">]</span> <span class="nb">-> </span><span class="p">(</span><span class="nf">Identity</span> <span class="nv">b</span><span class="p">)])))</span>
<span class="p">(</span><span class="kd">defn </span> <span class="nv">identity-fmap</span> <span class="p">[</span><span class="nv">this</span> <span class="nv">f</span><span class="p">]</span> <span class="p">(</span><span class="nf">->Identity</span> <span class="p">(</span><span class="nf">f</span> <span class="p">(</span><span class="ss">:runIdentity</span> <span class="nv">this</span><span class="p">))))</span>
<span class="c1">;; Now we have an fmap with a type.</span>
<span class="p">(</span><span class="nf">extend</span> <span class="nv">Identity</span>
<span class="nv">IFunctor</span>
<span class="p">{</span><span class="ss">:p-fmap</span> <span class="nv">identity-fmap</span><span class="p">}</span> <span class="p">)</span>
</code></pre></div>
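<p>Types aside, the runtime half of this arrangement can be checked in plain Clojure. The following
sketch drops all <code>core.typed</code> annotations, so it says nothing about whether the code
typechecks; it only shows that extending the record after the fact behaves like the inline version.</p>
<div class="highlight"><pre><span></span><code>(defprotocol IFunctor
  (p-fmap [this fun]))
(defn fmap [fun c] (p-fmap c fun))

;; Define the record first, with no inline protocol implementation.
(defrecord Identity [runIdentity])

;; An ordinary function, defined once ->Identity exists.
(defn identity-fmap [this f] (->Identity (f (:runIdentity this))))

;; Attach it to the protocol via a map from method keyword to function.
(extend Identity
  IFunctor
  {:p-fmap identity-fmap})

(fmap inc (->Identity 1)) ; => an Identity record holding 2
</code></pre></div>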
<p>The annotation for <code>identity-fmap</code> is pretty much the same as for <code>p-fmap</code> in the <code>IFunctor</code>
declaration, except specific to <code>Identity</code>.</p>
<p>Unfortunately, it still doesn't work:</p>
<div class="highlight"><pre><span></span><code><span class="nl">Expected</span><span class="p">:</span> <span class="p">(</span><span class="n">clojure</span><span class="p">.</span><span class="n">core</span><span class="p">.</span><span class="n">typed</span><span class="o">/</span><span class="nl">HMap</span> <span class="p">:</span><span class="n">optional</span> <span class="p">{</span><span class="o">:</span><span class="n">p</span><span class="o">-</span><span class="n">fmap</span> <span class="p">[(</span><span class="n">Identity</span> <span class="n">clojure</span><span class="p">.</span><span class="n">core</span><span class="p">.</span><span class="n">typed</span><span class="o">/</span><span class="n">Any</span><span class="p">)</span> <span class="n">clojure</span><span class="p">.</span><span class="n">core</span><span class="p">.</span><span class="n">typed</span><span class="o">/</span><span class="n">Any</span> <span class="o">-></span> <span class="n">clojure</span><span class="p">.</span><span class="n">core</span><span class="p">.</span><span class="n">typed</span><span class="o">/</span><span class="n">Any</span><span class="p">]})</span>
<span class="nl">Actual</span><span class="p">:</span> <span class="p">(</span><span class="n">clojure</span><span class="p">.</span><span class="n">core</span><span class="p">.</span><span class="n">typed</span><span class="o">/</span><span class="nl">HMap</span> <span class="p">:</span><span class="n">mandatory</span> <span class="p">{</span><span class="o">:</span><span class="n">p</span><span class="o">-</span><span class="n">fmap</span> <span class="p">(</span><span class="n">clojure</span><span class="p">.</span><span class="n">core</span><span class="p">.</span><span class="n">typed</span><span class="o">/</span><span class="n">All</span> <span class="p">[</span><span class="n">a</span> <span class="n">b</span><span class="p">]</span> <span class="p">[(</span><span class="n">Identity</span> <span class="n">a</span><span class="p">)</span> <span class="p">[</span><span class="n">a</span> <span class="o">-></span> <span class="n">b</span><span class="p">]</span> <span class="o">-></span> <span class="p">(</span><span class="n">Identity</span> <span class="n">b</span><span class="p">)])}</span> <span class="o">:</span><span class="n">complete</span><span class="o">?</span> <span class="nb">true</span><span class="p">)</span>
</code></pre></div>
<p>This is frightening but, I believe, spurious. The <code>:mandatory</code> vs <code>:optional</code> shouldn't be important; that
just means that we weren't required to have implemented <code>p-fmap</code> in this <code>extend</code> but could have done so
later. More worryingly,</p>
<ol>
<li>The type-checker is expecting four <code>Any</code>s, rather than a constrained arrangement of two types,
as if we had declared <code>p-fmap</code> to be
<code>(t/IFn [(IFunctor a) [b -> c] -> (IFunctor d)])</code> rather than
<code>(t/IFn [(IFunctor a) [a -> b] -> (IFunctor b)])</code>.</li>
<li>Even so, the more specific case ought, I think, to be palatable to the more general one.</li>
</ol>
<p>So, we take the batteries out of the smoke detector and go back to sleep</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">t/ann</span> <span class="o">^</span><span class="ss">:no-check</span> <span class="nv">identity-fmap</span> <span class="p">(</span><span class="nf">t/All</span> <span class="p">[</span><span class="nv">a</span><span class="p">]</span> <span class="p">(</span><span class="nf">t/IFn</span> <span class="p">[(</span><span class="nf">Identity</span> <span class="nv">a</span><span class="p">)</span> <span class="nv">t/Any</span> <span class="nb">-> </span><span class="nv">t/Any</span><span class="p">])))</span>
</code></pre></div>
<p>only to be rudely awakened when we try to annotate <code>fmap</code>:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">t/ann</span> <span class="nv">fmap</span> <span class="p">(</span><span class="nf">t/All</span> <span class="p">[</span><span class="nv">a</span> <span class="nv">b</span><span class="p">]</span> <span class="p">(</span><span class="nf">t/IFn</span> <span class="p">[[</span><span class="nv">a</span> <span class="nb">-> </span><span class="nv">b</span><span class="p">]</span> <span class="p">(</span><span class="nf">IFunctor</span> <span class="nv">a</span><span class="p">)</span> <span class="nb">-> </span><span class="p">(</span><span class="nf">IFunctor</span> <span class="nv">b</span><span class="p">)])</span> <span class="p">))</span>
<span class="p">(</span><span class="kd">defn </span> <span class="nv">fmap</span> <span class="p">[</span><span class="nv">fun</span> <span class="nv">c</span><span class="p">]</span> <span class="p">(</span><span class="nf">p-fmap</span> <span class="nv">c</span> <span class="nv">fun</span><span class="p">))</span>
</code></pre></div>
<p>The first problem is that protocols cannot, apparently, be used as type functions, and we get
an error to this effect. It's a little surprising, since we were able to use <code>(IFunctor a)</code>
within the definition of the <code>IFunctor</code> protocol itself. Trawling the mailing list,
we somehow find a posting containing a link to a
<a href="https://gist.githubusercontent.com/leonardoborges/7347cfb0684d5b76a264/raw/1d0b26dae0d06e1e76dbfa7fc9abb4319ccee6bd/example_typed.clj">gist</a> that defines a type function explicitly</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">t/defalias</span> <span class="nv">Functor</span> <span class="p">(</span><span class="nf">t/TFn</span> <span class="p">[</span> <span class="p">[</span><span class="nv">a</span> <span class="ss">:variance</span> <span class="ss">:covariant</span><span class="p">]</span> <span class="p">]</span> <span class="p">(</span><span class="nf">Extends</span> <span class="p">[(</span><span class="nf">IFunctor</span> <span class="nv">a</span><span class="p">)])</span> <span class="p">))</span>
</code></pre></div>
<p>using the <code>Extends</code> keyword, which is <code>ack</code>able in the <code>core.typed</code> source but not otherwise
advertised. With the legitimate <code>Functor</code> type function we make it past another few lines of code</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">t/ann</span> <span class="nv">fmap</span> <span class="p">(</span><span class="nf">t/All</span> <span class="p">[</span><span class="nv">a</span> <span class="nv">b</span><span class="p">]</span> <span class="p">(</span><span class="nf">t/IFn</span> <span class="p">[[</span><span class="nv">a</span> <span class="nb">-> </span><span class="nv">b</span><span class="p">]</span> <span class="p">(</span><span class="nf">Functor</span> <span class="nv">a</span><span class="p">)</span> <span class="nb">-> </span><span class="p">(</span><span class="nf">Functor</span> <span class="nv">b</span><span class="p">)])))</span>
<span class="p">(</span><span class="kd">defn </span> <span class="nv">fmap</span> <span class="p">[</span><span class="nv">f</span> <span class="nv">c</span><span class="p">]</span> <span class="p">(</span><span class="nf">p-fmap</span> <span class="nv">c</span> <span class="nv">f</span><span class="p">))</span>
</code></pre></div>
<p>and define the <code>Lens</code> type</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">t/defalias</span> <span class="nv">Lens</span> <span class="p">(</span><span class="nf">t/TFn</span> <span class="p">[[</span><span class="nv">s</span> <span class="ss">:variance</span> <span class="ss">:invariant</span><span class="p">]</span>
<span class="p">[</span><span class="nv">a</span> <span class="ss">:variance</span> <span class="ss">:invariant</span><span class="p">]]</span>
<span class="p">[[</span><span class="nv">a</span> <span class="nb">-> </span><span class="p">(</span><span class="nf">Functor</span> <span class="nv">a</span><span class="p">)]</span> <span class="nv">s</span> <span class="nb">-> </span><span class="p">(</span><span class="nf">Functor</span> <span class="nv">s</span><span class="p">)]</span> <span class="p">))</span>
</code></pre></div>
<p>and write the <code>l-1</code> lens</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="s s-Atom">t</span><span class="o">/</span><span class="s s-Atom">ann</span> <span class="s s-Atom">l</span><span class="o">-</span><span class="mi">1</span> <span class="p">(</span><span class="nv">Lens</span> <span class="p">(</span><span class="s s-Atom">t</span><span class="o">/</span><span class="nv">HVec</span> <span class="p">[</span><span class="s s-Atom">t</span><span class="o">/</span><span class="nv">Int</span> <span class="s s-Atom">t</span><span class="o">/</span><span class="nv">Int</span><span class="p">])</span> <span class="s s-Atom">t</span><span class="o">/</span><span class="nv">Int</span><span class="p">))</span>
<span class="p">(</span><span class="s s-Atom">defn</span> <span class="s s-Atom">l</span><span class="o">-</span><span class="mi">1</span> <span class="p">[</span><span class="s s-Atom">f</span> <span class="s s-Atom">xy</span><span class="p">]</span>
<span class="p">(</span><span class="nf">fmap</span> <span class="p">(</span><span class="s s-Atom">t</span><span class="o">/</span><span class="s s-Atom">fn</span> <span class="p">[</span><span class="nf">x</span> <span class="o">:-</span> <span class="s s-Atom">t</span><span class="o">/</span><span class="nv">Int</span><span class="p">]</span> <span class="p">:-</span> <span class="p">(</span><span class="s s-Atom">t</span><span class="o">/</span><span class="nv">HVec</span> <span class="p">[</span><span class="s s-Atom">t</span><span class="o">/</span><span class="nv">Int</span> <span class="s s-Atom">t</span><span class="o">/</span><span class="nv">Int</span><span class="p">])</span>
<span class="p">(</span><span class="s s-Atom">vector</span> <span class="nf">x</span> <span class="p">(</span><span class="s s-Atom">isecond</span> <span class="s s-Atom">xy</span><span class="p">)))</span>
<span class="p">(</span><span class="nf">f</span> <span class="p">(</span><span class="s s-Atom">ifirst</span> <span class="s s-Atom">xy</span><span class="p">))))</span>
</code></pre></div>
<p>where <code>ifirst</code> and <code>isecond</code> are specialized to pairs of integers:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">t/ann</span> <span class="o">^</span><span class="ss">:no-check</span> <span class="nv">isecond</span> <span class="p">[(</span><span class="nf">t/HVec</span> <span class="p">[</span><span class="nv">t/Int</span> <span class="nv">t/Int</span><span class="p">])</span> <span class="nb">-> </span><span class="nv">t/Int</span><span class="p">]</span> <span class="p">)</span>
<span class="p">(</span><span class="k">def </span><span class="nv">isecond</span> <span class="nv">second</span><span class="p">)</span>
<span class="p">(</span><span class="nf">t/ann</span> <span class="o">^</span><span class="ss">:no-check</span> <span class="nv">ifirst</span> <span class="p">[(</span><span class="nf">t/HVec</span> <span class="p">[</span><span class="nv">t/Int</span> <span class="nv">t/Int</span><span class="p">])</span> <span class="nb">-> </span><span class="nv">t/Int</span><span class="p">])</span>
<span class="p">(</span><span class="k">def </span><span class="nv">ifirst</span> <span class="nv">first</span><span class="p">)</span>
</code></pre></div>
<p>Getting back to our story,</p>
<div class="highlight"><pre><span></span><code> A few speeches in this vein - and evil counsels carried the day.
They undid the bag, the Winds all rushed out, and in an instant
the tempest was upon them, carrying them headlong out to sea.
They had good reason for their tears: Ithaca was vanishing
astern. As for myself, when I awoke to this, my spirit failed me
and I had half a mind to jump overboard and drown myself in
the sea rather than stay alive and quietly accept such a calamity.
However, I steeled myself to bear it, and covering my head with
my cloak I lay where I was in the ship. So the whole fleet was
driven back again to the Aeolian Isle by that accursed storm, and
in it my repentant crews.
</code></pre></div>
<p>or, in more modern translation,</p>
<div class="highlight"><pre><span></span><code>Type Error ... Invalid operator to type application: (IFunctor a)
ExceptionInfo Type Checker: Found 1 error clojure.core/ex-info (core.clj:4403)
</code></pre></div>
<p>I thought we had well and truly buried <code>IFunctor</code>, but since</p>
<ol>
<li>it's still causing trouble,</li>
<li>we know in advance that <code>Identity</code> and <code>Const</code> are the only functors we'll actually use,</li>
<li>typed clojure supports union types,</li>
<li>and I have no shame,</li>
</ol>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">t/defalias</span> <span class="nv">DumbFunctor</span>
<span class="p">(</span><span class="nf">t/TFn</span> <span class="p">[</span> <span class="p">[</span><span class="nv">a</span> <span class="ss">:variance</span> <span class="ss">:covariant</span><span class="p">]</span> <span class="p">]</span> <span class="p">(</span><span class="nf">t/U</span> <span class="p">(</span><span class="nf">Identity</span> <span class="nv">a</span><span class="p">)</span> <span class="p">(</span><span class="nf">Const</span> <span class="nv">a</span><span class="p">))))</span>
</code></pre></div>
<p>With <code>Functor</code> so redefined, everything type-checks, even</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">t/ann</span> <span class="nv">l</span><span class="ss">:foo</span> <span class="p">(</span><span class="nf">Lens</span> <span class="p">(</span><span class="nf">t/HMap</span> <span class="ss">:mandatory</span> <span class="p">{</span><span class="ss">:foo</span> <span class="p">(</span><span class="nf">t/HVec</span> <span class="p">[</span><span class="nv">t/Int</span> <span class="nv">t/Int</span><span class="p">])})</span>
<span class="p">(</span><span class="nf">t/HVec</span> <span class="p">[</span><span class="nv">t/Int</span> <span class="nv">t/Int</span><span class="p">])))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">l</span><span class="ss">:foo</span> <span class="p">[</span><span class="nv">f</span> <span class="nv">m</span><span class="p">]</span>
<span class="p">(</span><span class="nf">fmap</span>
<span class="p">(</span><span class="nf">t/fn</span> <span class="p">[</span><span class="nv">x</span> <span class="ss">:-</span> <span class="p">(</span><span class="nf">t/HVec</span> <span class="p">[</span><span class="nv">t/Int</span> <span class="nv">t/Int</span><span class="p">])]</span> <span class="ss">:-</span> <span class="p">(</span><span class="nf">t/HMap</span> <span class="ss">:mandatory</span> <span class="p">{</span><span class="ss">:foo</span> <span class="p">(</span><span class="nf">t/HVec</span> <span class="p">[</span><span class="nv">t/Int</span> <span class="nv">t/Int</span><span class="p">])})</span>
<span class="p">(</span><span class="nb">assoc </span><span class="nv">m</span> <span class="ss">:foo</span> <span class="nv">x</span><span class="p">))</span>
<span class="p">(</span><span class="nf">f</span> <span class="p">(</span><span class="ss">:foo</span> <span class="nv">m</span><span class="p">))))</span>
</code></pre></div>
<p>Thanks be to Athena, the annotations of <code>curry</code>, <code>uncurry</code> and <code>lcomp</code>, while verbose,
work perfectly</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">t/ann</span> <span class="nv">curry</span> <span class="p">(</span><span class="nf">t/All</span> <span class="p">[</span><span class="nv">a</span> <span class="nv">b</span> <span class="nv">c</span><span class="p">]</span>
<span class="p">[[</span><span class="nv">a</span> <span class="nv">b</span> <span class="nb">-> </span><span class="nv">c</span><span class="p">]</span> <span class="nb">-> </span><span class="p">[</span><span class="nv">a</span> <span class="nb">-> </span><span class="p">[</span><span class="nv">b</span> <span class="nb">-> </span><span class="nv">c</span><span class="p">]]]))</span>
<span class="p">(</span><span class="nf">t/ann</span> <span class="nv">uncurry</span> <span class="p">(</span><span class="nf">t/All</span> <span class="p">[</span><span class="nv">a</span> <span class="nv">b</span> <span class="nv">c</span><span class="p">]</span>
<span class="p">[</span> <span class="p">[</span><span class="nv">a</span> <span class="nb">-> </span><span class="p">[</span><span class="nv">b</span> <span class="nb">-> </span><span class="nv">c</span><span class="p">]]</span> <span class="nb">-> </span><span class="p">[</span><span class="nv">a</span> <span class="nv">b</span> <span class="nb">-> </span><span class="nv">c</span><span class="p">]]))</span>
<span class="p">(</span><span class="nf">t/ann</span> <span class="nv">lcomp</span> <span class="p">(</span><span class="nf">t/All</span> <span class="p">[</span><span class="nv">a</span> <span class="nv">b</span> <span class="nv">c</span> <span class="nv">d</span> <span class="nv">e</span><span class="p">]</span>
<span class="p">[[[</span><span class="nv">b</span> <span class="nb">-> </span><span class="nv">c</span><span class="p">]</span> <span class="nv">d</span> <span class="nb">-> </span><span class="nv">e</span><span class="p">]</span>
<span class="p">[</span><span class="nv">a</span> <span class="nv">b</span> <span class="nb">-> </span><span class="nv">c</span><span class="p">]</span>
<span class="nv">-></span>
<span class="p">[</span><span class="nv">a</span> <span class="nv">d</span> <span class="nb">-> </span><span class="nv">e</span><span class="p">]]))</span>
</code></pre></div>
<p>and, drumroll please,</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">view</span> <span class="p">(</span><span class="nf">lcomp</span> <span class="nv">l</span><span class="ss">:foo</span> <span class="nv">l-1</span><span class="p">)</span> <span class="p">{</span><span class="ss">:foo</span> <span class="p">[</span><span class="mi">1</span> <span class="mi">2</span><span class="p">]})</span>
</code></pre></div>
<p>typechecks! More importantly, infelicitous modifications of this do not, which suggests that,
notwithstanding compromises along the way, we can provide assurances of type safety to future
van Laarhoveners.</p>
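<p>For anyone who wants to poke at the runtime behaviour without the type checker in the way, here is an untyped sketch of the same composition. The names follow the post, but the bodies of <code>Identity</code>, <code>Const</code>, <code>lcomp</code> and <code>view</code>, and the extra helper <code>over</code>, are assumptions made for illustration rather than the annotated definitions discussed above.</p>
<div class="highlight"><pre><span></span><code>;; Untyped sketch; bodies are assumptions, only the names come from the post.
(defprotocol IFunctor
  (p-fmap [this f]))

;; Identity applies the function to its payload.
(defrecord Identity [value]
  IFunctor
  (p-fmap [_ f] (Identity. (f value))))

;; Const ignores the function and keeps its payload.
(defrecord Const [value]
  IFunctor
  (p-fmap [this _] this))

(defn fmap [f c] (p-fmap c f))

;; Lens onto the first element of a pair.
(defn l-1 [f xy]
  (fmap (fn [x] (vector x (second xy)))
        (f (first xy))))

;; Lens onto the :foo entry of a map.
(defn l:foo [f m]
  (fmap (fn [x] (assoc m :foo x))
        (f (:foo m))))

;; Compose two-argument lenses: outer focuses first, inner second.
(defn lcomp [outer inner]
  (fn [f s]
    (outer (fn [b] (inner f b)) s)))

;; Reading goes through Const, updating goes through Identity.
(defn view [lens s]
  (:value (lens ->Const s)))

(defn over [lens g s]
  (:value (lens (fn [a] (->Identity (g a))) s)))

(view (lcomp l:foo l-1) {:foo [1 2]})
;; 1
(over (lcomp l:foo l-1) inc {:foo [1 2]})
;; {:foo [2 2]}
</code></pre></div>
<p>Nothing here is checked statically, of course; the point is only that the same lens function serves for both reading and writing, depending on which functor is fed in.</p>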
<h4>But was it worth the trouble?</h4>
<p>Yes, for me. I learned a few things, and I'll probably learn more when you correct
my errors.</p>
<p>From a practical standpoint, I couldn't possibly recommend that anyone take this route in
real, production code today, and I'm not sure I would recommend it even after the
<code>core.typed</code> kinks have been worked out. The <code>tinhole</code> approach is, for
Clojure, more intuitive and concise, while offering no less type safety.</p>
<p>Quite simply, <code>Monad</code> and company are not natural fits for any language that
doesn't have sophisticated static typing fully integrated with its compiler. By
"fully integrated," I mean that the compiler produces different code for different type
signatures and arguments.</p>
<p>On the other hand, fancy macros <em>are</em> a natural fit for languages that are
meaningfully homoiconic. By "meaningfully homoiconic," I mean that it's
practical for normal people to write code-generating code, that such code is concise and
that it doesn't appear to be in a different language from the code it generates.</p>
<p>Homoiconicity does not substitute for type-checking, but, if it is innate to the language,
the type checker may not need to be, thus enabling the sinful pleasures of
dynamic typing while still allowing a stricter regimen to be enforced.</p>
<h4>P.S.</h4>
<p>Transducers can easily be described polymorphically with <code>core.typed</code> as</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">t/defalias</span> <span class="nv">ReducingFn</span>
<span class="p">(</span><span class="nf">t/TFn</span> <span class="p">[[</span><span class="nv">a</span> <span class="ss">:variance</span> <span class="ss">:contravariant</span><span class="p">]</span>
<span class="p">[</span><span class="nv">r</span> <span class="ss">:variance</span> <span class="ss">:invariant</span><span class="p">]]</span>
<span class="p">[</span><span class="nv">r</span> <span class="nv">a</span> <span class="nb">-> </span><span class="nv">r</span><span class="p">]))</span>
<span class="p">(</span><span class="nf">t/defalias</span> <span class="nv">Transducer</span> <span class="p">(</span><span class="nf">t/TFn</span> <span class="p">[[</span><span class="nv">a</span> <span class="ss">:variance</span> <span class="ss">:covariant</span><span class="p">]</span>
<span class="p">[</span><span class="nv">b</span> <span class="ss">:variance</span> <span class="ss">:contravariant</span><span class="p">]]</span>
<span class="p">(</span><span class="nf">t/All</span> <span class="p">[</span><span class="nv">r</span><span class="p">]</span> <span class="p">[(</span><span class="nf">ReducingFn</span> <span class="nv">a</span> <span class="nv">r</span><span class="p">)</span> <span class="nb">-> </span><span class="p">(</span><span class="nf">ReducingFn</span> <span class="nv">b</span> <span class="nv">r</span><span class="p">)])))</span>
</code></pre></div>
<p>which is almost literally transcribed from
<a href="http://conscientiousprogrammer.com/blog/2014/08/07/understanding-cloure-transducers-through-types/">a blog post</a>
explaining transducers with Haskell.
(Note, however, the use of <code>ReducingFn</code> rather than <code>Reducer</code>, as in the post.
The reducer is the collection-like thing, not the function with which it is reduced.
The beauty of transducers is that they can be defined independently of the reducer.)</p>
<p>And they work:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">t/ann</span> <span class="nv">double-me</span> <span class="p">(</span><span class="nf">Transducer</span> <span class="nv">t/Int</span> <span class="nv">t/Int</span><span class="p">))</span>
<span class="p">(</span><span class="kd">defn </span> <span class="nv">double-me</span> <span class="p">[</span><span class="nv">reduction-function</span><span class="p">]</span>
<span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">result</span> <span class="nv">input</span><span class="p">]</span>
<span class="p">(</span><span class="nf">reduction-function</span> <span class="nv">result</span> <span class="p">(</span><span class="nb">* </span><span class="mi">2</span> <span class="nv">input</span><span class="p">))))</span>
<span class="p">(</span><span class="nf">t/ann</span> <span class="nv">plus</span> <span class="p">(</span><span class="nf">ReducingFn</span> <span class="nv">t/Int</span> <span class="nv">t/Int</span><span class="p">))</span>
<span class="p">(</span><span class="k">def </span><span class="nv">plus</span> <span class="nv">+</span><span class="p">)</span>
<span class="p">(</span><span class="nb">println </span> <span class="p">(</span><span class="nb">reduce </span><span class="p">(</span><span class="nf">double-me</span> <span class="nv">plus</span><span class="p">)</span> <span class="mi">0</span> <span class="p">[</span><span class="mi">1</span> <span class="mi">2</span> <span class="mi">3</span><span class="p">]))</span>
<span class="c1">;; 12</span>
</code></pre></div>
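<p>Because a transducer in this style is just a function from reducing function to reducing function, two of them compose with plain <code>comp</code>. The following is an untyped sketch; <code>keep-odd</code> and <code>odd-then-double</code> are hypothetical names introduced here for illustration, not something from the post above.</p>
<div class="highlight"><pre><span></span><code>;; Untyped sketch: keep-odd is a hypothetical transducer, written in the
;; same reducing-function-in, reducing-function-out style as double-me.
(defn double-me [reduction-function]
  (fn [result input]
    (reduction-function result (* 2 input))))

(defn keep-odd [reduction-function]
  (fn [result input]
    (if (odd? input)
      (reduction-function result input)
      result)))

;; double-me wraps the reducing function first and keep-odd wraps the
;; result, so each input is filtered by keep-odd and then doubled.
(def odd-then-double (comp keep-odd double-me))

(println (reduce (odd-then-double +) 0 [1 2 3]))
;; 8
</code></pre></div>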
<div class="footnote">
<hr>
<ol>
<li id="fn:dispatch">
<p>Single dispatch, specifically. <a class="footnote-backref" href="#fnref:dispatch" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
</ol>
</div>
<h1>tinholes - performant, strongly typed lenses in Clojure</h1>
<p>Peter Fraenkel, 2014-09-28</p>
<p>In a <a href="http://blog.podsnap.com/pinhole.html">previous post</a>, I built up a framework
for lens-like constructs in Clojure: essentially some fancified versions of
<code>assoc-in</code> and <code>get-in</code> to allow for bidirectional transformations along the
nesting path and some utilities to generate special-purpose getter/setter functions.
The name, "pinhole," is supposed to suggest a more primitive, utilitarian mechanism for
achieving focus.</p>
<p>While still ruing (sort of)
<a href="http://blog.podsnap.com/pinhole2.html">other mistakes</a>, I found myself worrying
that a triumphal sentence near the end of the piece</p>
<div class="highlight"><pre><span></span><code>What's more, thanks to the expressive power of dynamic Clojure,
and higher order functions, these lenses are not just simple to
use but simple to create.
</code></pre></div>
<p>was somewhat off the mark. Thanks to shoddy writing, one can't be sure, but
if "dynamic Clojure" was referring to "dynamically typed Clojure," the sentence is
not just vague, but precisely wrong. Evidence that this is indeed what I meant is provided
by comparative references elsewhere in the piece to "type goodness" in the
Scalaz implementation.</p>
<p>The fact is that dynamic typing is not at all necessary for lens operations.
Moreover, it probably isn't
even necessary for most uses of the core <code>-in</code> functions.</p>
<h4>Want, want, want, want, want, want</h4>
<p>Last week, what I wanted was:</p>
<div class="highlight"><pre><span></span><code>1. Paths that allow arbitrary transformations along the lookup/retrieval path.
2. A convenient way to specify a dictionary of aliases to such paths.
3. The usual lensy guff of special-purpose getters, setters and updaters.
</code></pre></div>
<p>Let's add a fourth and fifth:</p>
<div class="highlight"><pre><span></span><code>4. Compile time type-checking with core.typed.
5. Better performance than corresponding pinhole and core functions.
</code></pre></div>
<p>Those sound hard, so let's add an easy one.</p>
<div class="highlight"><pre><span></span><code>6. A really stupid name.
</code></pre></div>
<p>So, "tinholes." The "t" is for type safety, and people make pinhole cameras with tin foil,
so it kind of makes sense. I was a little worried that it might be an obscenity in
some corner of the internet, but this seems not to be the case. It's merely stupid.</p>
<h4>Statically typed Clojure</h4>
<p>I'm a huge fan of
<a href="https://github.com/clojure/core.typed">core.typed</a>,
with which, for the purposes of this post, I will assume you're familiar. If you're not,
there are links to resources at
<a href="http://typedclojure.org/">typedclojure.org</a>, and
I once wrote a tour/introduction in
<a href="http://blog.podsnap.com/hollywood-typecasting.html">two</a>
<a href="http://blog.podsnap.com/imdb-part2.html">posts</a> that some claim to have found
helpful.</p>
<p>If you're not familiar with <code>core.typed</code> and you don't have the time to
make yourself so right now, the main things to know are</p>
<ol>
<li>That it's an <strong>optional</strong> type
system that works outside the language, using <strong>annotations</strong> that have absolutely no
impact on the compiled code.</li>
<li>You can check out the legality of a namespace with <code>(t/check-ns)</code> or of
an individual form with <code>t/cf</code>.</li>
<li>Also, if it's not obvious, I've <code>:require</code>d <code>[clojure.core.typed :as t]</code>.</li>
</ol>
<p>Type-checking is somewhat difficult with the current implementations of <code>ph-whatever</code>,
as, intuitively, it would be with anything
implemented as a recursive function that
might consume and return values of different
types at different levels of the stack.
For example, <code>core/assoc-in</code> has a classically recursive definition</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="kd">defn </span><span class="nv">assoc-in</span> <span class="p">[</span><span class="nv">m</span> <span class="p">[</span><span class="nv">k</span> <span class="o">&</span> <span class="nv">ks</span><span class="p">]</span> <span class="nv">v</span><span class="p">]</span>
<span class="p">(</span><span class="k">if </span><span class="nv">ks</span> <span class="p">(</span><span class="nb">assoc </span><span class="nv">m</span> <span class="nv">k</span> <span class="p">(</span><span class="nf">assoc-in</span> <span class="p">(</span><span class="nb">get </span><span class="nv">m</span> <span class="nv">k</span><span class="p">)</span> <span class="nv">ks</span> <span class="nv">v</span><span class="p">))</span>
<span class="p">(</span><span class="nb">assoc </span><span class="nv">m</span> <span class="nv">k</span> <span class="nv">v</span><span class="p">)))</span>
</code></pre></div>
<p>and a conventional type annotation of </p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">t/IFn</span> <span class="p">[(</span><span class="nf">t/U</span> <span class="p">(</span><span class="nf">clojure.lang.Associative</span> <span class="nv">t/Any</span> <span class="nv">t/Any</span><span class="p">)</span> <span class="nv">nil</span><span class="p">)</span>
<span class="p">(</span><span class="nf">clojure.lang.Seqable</span> <span class="nv">t/Any</span><span class="p">)</span>
<span class="nv">t/Any</span>
<span class="nb">-> </span><span class="nv">t/Any</span><span class="p">])</span>
</code></pre></div>
<p>which is sub-microns away from total uselessness.
It might be of some help if you were at risk of providing utterly
random arguments (say, scalars or functions), but truly it's a fiesta
of type <code>Any</code>, and there isn't anything to be done about it.</p>
<p>Let's recall the <code>Turtle</code> example from last time, but be pious little children and
add some type annotations:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">t/ann-record</span> <span class="nv">Point</span> <span class="p">[</span><span class="nv">x</span> <span class="ss">:-</span> <span class="nv">Number</span>, <span class="nv">y</span> <span class="ss">:-</span> <span class="nv">Number</span><span class="p">])</span>
<span class="p">(</span><span class="nf">t/ann-record</span> <span class="nv">Color</span> <span class="p">[</span><span class="nv">r</span> <span class="ss">:-</span> <span class="nv">Short</span>, <span class="nv">g</span> <span class="ss">:-</span> <span class="nv">Short</span>, <span class="nv">b</span> <span class="ss">:-</span> <span class="nv">Short</span><span class="p">])</span>
<span class="p">(</span><span class="nf">t/ann-record</span> <span class="nv">Turtle</span> <span class="p">[</span><span class="nv">position</span> <span class="ss">:-</span> <span class="nv">Point</span>,
<span class="nv">heading</span> <span class="ss">:-</span> <span class="nv">Number</span>,
<span class="nv">color</span> <span class="ss">:-</span> <span class="nv">Color</span><span class="p">])</span>
<span class="p">(</span><span class="kd">defrecord </span><span class="nv">Point</span> <span class="p">[</span><span class="nv">x</span> <span class="nv">y</span><span class="p">])</span>
<span class="p">(</span><span class="kd">defrecord </span><span class="nv">Color</span> <span class="p">[</span><span class="nv">r</span> <span class="nv">g</span> <span class="nv">b</span><span class="p">])</span>
<span class="p">(</span><span class="kd">defrecord </span><span class="nv">Turtle</span> <span class="p">[</span><span class="nv">position</span> <span class="nv">heading</span> <span class="nv">color</span><span class="p">])</span>
<span class="p">(</span><span class="k">def </span><span class="nv">bruce</span> <span class="p">(</span><span class="nf">->Turtle</span> <span class="p">(</span><span class="nf">->Point</span> <span class="mf">1.0</span> <span class="mf">2.0</span><span class="p">)</span> <span class="p">(</span><span class="nb">/ </span><span class="nv">Math/PI</span> <span class="mi">4</span><span class="p">)</span> <span class="p">(</span><span class="nf">->Color</span> <span class="mi">255</span> <span class="mi">0</span> <span class="mi">0</span><span class="p">)))</span>
</code></pre></div>
<p>All of this fastidious typing is unfortunately of limited use once we peek beneath
the shell:</p>
<div class="highlight"><pre><span></span><code> <span class="nv">user></span> <span class="p">(</span><span class="nf">t/cf</span> <span class="p">(</span><span class="nf">get-in</span> <span class="nv">bruce</span> <span class="p">[</span><span class="ss">:position</span> <span class="ss">:x</span><span class="p">]))</span>
<span class="nv">t/Any</span>
</code></pre></div>
<p>That's disappointing; <code>core.typed</code> can't figure out that we're going to get back a <code>Number</code>.
But it gets even worse. The type checker will let us get away with horrors like this:</p>
<div class="highlight"><pre><span></span><code> <span class="nv">user></span> <span class="p">(</span><span class="nf">t/cf</span> <span class="p">(</span><span class="nf">get-in</span> <span class="nv">bruce</span> <span class="p">[</span><span class="ss">:hey</span> <span class="ss">:ho</span><span class="p">]))</span>
<span class="nv">t/Any</span>
</code></pre></div>
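<p>To see why that second check is a horror, it helps to look at what happens at runtime. The following untyped sketch uses a cut-down <code>bruce</code> with the color omitted (an assumption, purely to keep the example short): a nonsense path raises no error, it just quietly yields <code>nil</code>.</p>
<div class="highlight"><pre><span></span><code>;; Untyped sketch with a simplified turtle; the nil color is a placeholder.
(defrecord Point [x y])
(defrecord Turtle [position heading color])
(def bruce (->Turtle (->Point 1.0 2.0) (/ Math/PI 4) nil))

(get-in bruce [:position :x])
;; 1.0

;; No such keys, no complaint.
(get-in bruce [:hey :ho])
;; nil
</code></pre></div>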
<p>The more complicated pinhole lenses are just as bad.
(If the following example makes no sense,
you really might want to go back and read the <a href="http://blog.podsnap.com/pinhole.html">pinhole post</a>.)
Following standard
advice, we could annotate a generated function</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="k">def </span><span class="nv">turtle-forward</span> <span class="p">(</span><span class="nf">mk-ph-mod</span> <span class="nv">movexy</span> <span class="p">[</span><span class="ss">:position</span> <span class="ss">:x</span><span class="p">]</span> <span class="p">[</span><span class="ss">:position</span> <span class="ss">:y</span><span class="p">]</span> <span class="p">[</span><span class="ss">:heading</span><span class="p">]))</span>
</code></pre></div>
<p>using the <code>^:no-check</code> provision</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="n">t</span><span class="o">/</span><span class="n">ann</span> <span class="o">^:</span><span class="n">no</span><span class="o">-</span><span class="n">check</span> <span class="n">turtle</span><span class="o">-</span><span class="n">forward</span> <span class="p">(</span><span class="n">t</span><span class="o">/</span><span class="n">IFn</span> <span class="p">[</span><span class="n">Turtle</span> <span class="n">Number</span> <span class="o">-></span> <span class="n">Turtle</span><span class="p">]))</span>
</code></pre></div>
<p>meaning that misuse of <code>turtle-forward</code> in subsequent code will be preventable,
but there's no assurance that we got it right in the first place.</p>
<h4>Macros to the rescue</h4>
<p>If the problem is that <code>core.typed</code> has no visibility into types that are determined
at runtime, let's try to determine them at compile time.
Were we to rewrite
<code>(get-in bruce [:position :x])</code> explicitly as <code>(get (get bruce :position) :x)</code>,
then the type inference engine would have no trouble at all:</p>
<div class="highlight"><pre><span></span><code><span class="nv">user></span> <span class="p">(</span><span class="nf">t/cf</span> <span class="p">(</span><span class="nb">get </span><span class="p">(</span><span class="nb">get </span><span class="nv">bruce</span> <span class="ss">:position</span><span class="p">)</span> <span class="ss">:x</span><span class="p">))</span>
<span class="nv">java.lang.Number</span>
</code></pre></div>
<p>It would be irritating to lose the convenience provided by <code>get-in</code>, but fortunately
we don't have to. A macro<sup id="fnref:others"><a class="footnote-ref" href="#fn:others">1</a></sup> can do the rewriting for us,</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="kd">defmacro </span><span class="nv">th-get-in</span> <span class="p">[</span><span class="nv">m</span> <span class="nv">path</span><span class="p">]</span>
<span class="p">(</span><span class="nb">reduce </span><span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">acc</span> <span class="nv">k</span><span class="p">]</span> <span class="p">(</span><span class="nb">concat </span><span class="nv">acc</span> <span class="p">(</span><span class="nb">list </span><span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nb">vector? </span><span class="nv">k</span><span class="p">)</span>
<span class="o">`</span><span class="p">(</span><span class="o">~</span><span class="p">(</span><span class="nb">second </span><span class="nv">k</span><span class="p">))</span>
<span class="o">`</span><span class="p">(</span><span class="nb">get </span><span class="o">~</span><span class="nv">k</span><span class="p">)))))</span>
<span class="o">`</span><span class="p">(</span><span class="nb">-> </span><span class="o">~</span><span class="nv">m</span><span class="p">)</span> <span class="nv">path</span><span class="p">))</span>
</code></pre></div>
<p>trivially throwing in bidirectional transforms as well:</p>
<div class="highlight"><pre><span></span><code><span class="nv">user></span> <span class="p">(</span><span class="nb">macroexpand-1 </span><span class="o">'</span><span class="p">(</span><span class="nf">th-get-in</span> <span class="nv">bruce</span> <span class="p">[</span><span class="ss">:position</span> <span class="ss">:x</span> <span class="p">[</span><span class="nb">inc </span><span class="nv">dec</span><span class="p">]]))</span>
<span class="p">(</span><span class="nf">clojure.core/-></span> <span class="nv">bruce</span> <span class="p">(</span><span class="nf">clojure.core/get</span> <span class="ss">:position</span><span class="p">)</span> <span class="p">(</span><span class="nf">clojure.core/get</span> <span class="ss">:x</span><span class="p">)</span> <span class="p">(</span><span class="nf">dec</span><span class="p">))</span>
</code></pre></div>
<p>Now, we know what we're dealing with:</p>
<div class="highlight"><pre><span></span><code> <span class="nv">user></span> <span class="p">(</span><span class="nf">t/cf</span> <span class="p">(</span><span class="nf">th-get-in</span> <span class="nv">bruce</span> <span class="p">[</span><span class="ss">:position</span> <span class="ss">:x</span> <span class="p">[</span><span class="nb">inc </span><span class="nv">dec</span><span class="p">]]))</span>
<span class="nv">java.lang.Number</span>
</code></pre></div>
<p>and if we try any funny stuff</p>
<div class="highlight"><pre><span></span><code> <span class="nv">user></span> <span class="p">(</span><span class="nf">t/cf</span> <span class="p">(</span><span class="nf">th-get-in</span> <span class="nv">bruce</span> <span class="p">[</span><span class="ss">:posn</span> <span class="ss">:x</span> <span class="p">[</span><span class="nb">inc </span><span class="nv">dec</span><span class="p">]]))</span>
<span class="nv">Type</span> <span class="nv">Error</span> <span class="p">(</span><span class="nf">acyclic/utils/tinhole.clj</span><span class="ss">:1:7</span><span class="p">)</span> <span class="nv">Static</span> <span class="nv">method</span> <span class="nv">clojure.lang.Numbers/dec</span> <span class="nv">could</span> <span class="nb">not </span><span class="nv">be</span> <span class="nv">applied</span> <span class="nv">to</span> <span class="nv">arguments</span><span class="err">:</span>
<span class="nv">...</span>
</code></pre></div>
<p>we get totally smacked. What's more, there are not insignificant performance gains from
expanding nested gets at compile time:</p>
<div class="highlight"><pre><span></span><code> <span class="nv">user></span> <span class="p">(</span><span class="nb">time </span><span class="p">(</span><span class="nb">dotimes </span><span class="p">[</span><span class="nv">n</span> <span class="mi">10000000</span><span class="p">]</span> <span class="p">(</span><span class="nf">th-get-in</span> <span class="nv">bruce</span> <span class="p">[</span><span class="ss">:position</span> <span class="ss">:x</span> <span class="p">[</span><span class="nb">inc </span><span class="nv">dec</span><span class="p">]])))</span>
<span class="s">"Elapsed time: 1237.301 msecs"</span>
<span class="nv">nil</span>
<span class="nv">user></span> <span class="p">(</span><span class="nb">time </span><span class="p">(</span><span class="nb">dotimes </span><span class="p">[</span><span class="nv">n</span> <span class="mi">10000000</span><span class="p">]</span> <span class="p">(</span><span class="nf">get-in</span> <span class="nv">bruce</span> <span class="p">[</span><span class="ss">:position</span> <span class="ss">:x</span><span class="p">])))</span>
<span class="s">"Elapsed time: 2076.322 msecs"</span>
<span class="nv">nil</span>
</code></pre></div>
<p>The better performance is related to a trade-off in flexibility, but it's a trade-off that
you probably don't mind. You could in principle want to pass in
a different path every time you call <code>get-in</code>, but with <code>th-get-in</code>, the path
is burned in as constants at compile time. This also means you'll run into trouble if
you try</p>
<div class="highlight"><pre><span></span><code><span class="nv">user></span> <span class="p">(</span><span class="k">def </span><span class="nv">p</span> <span class="p">[</span><span class="ss">:position</span> <span class="ss">:x</span><span class="p">])</span>
<span class="nv">user></span> <span class="p">(</span><span class="nf">th-get-in</span> <span class="nv">bruce</span> <span class="nv">p</span><span class="p">)</span>
<span class="nv">IllegalArgumentException</span> <span class="nv">Don</span><span class="ss">'t</span> <span class="nv">know</span> <span class="nv">how</span> <span class="nv">to</span> <span class="nv">create</span> <span class="nv">ISeq</span> <span class="nv">from</span><span class="err">:</span> <span class="nv">clojure.lang.Symbol</span> <span class="nv">clojure.lang.RT.seqFrom</span> <span class="p">(</span><span class="nf">RT.java</span><span class="ss">:505</span><span class="p">)</span>
</code></pre></div>
<p>because the macro is receiving the symbol <code>p</code>
instead of the expected vector of stuff and has no idea what to do with it.</p>
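<p>A minimal sketch of why this happens, using a hypothetical macro <code>arg-kind</code> that is not part of the library:
a macro is handed the unevaluated form, so a literal vector and a symbol that merely names a vector
look entirely different to it.</p>
<div class="highlight"><pre><span></span><code>;; arg-kind is a made-up name, for illustration only.
;; It reports what kind of form the macro received, before any evaluation.
(defmacro arg-kind [form]
  (cond
    (vector? form) :vector-literal
    (symbol? form) :symbol
    :else          :something-else))

(def p [:position :x])

(arg-kind [:position :x]) ; returns :vector-literal, the vector reaches the macro intact
(arg-kind p)              ; returns :symbol, only the name p reaches the macro
</code></pre></div>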
<h4>th-assoc-in is more complicated</h4>
<p>As in pinhole-land, the related bidirectional, transforming association is more complicated,
because we need to apply the outbound transformation functions while unwrapping a
structure, before calling the inbound transformations when putting it all back
together.</p>
<p>In this case, it was a bit more pleasant to implement the code-emitting within
an actual
recursive <em>function</em>, which is then invoked by a macro. (As opposed, critically,
to being invoked by code generated by a macro; all recursion here takes place at
compile time and runs only once.) Among other things, I could
avail myself of at least a little type checking while working:
the <code>Any</code>s are unavoidable,
but at least I know I won't try to recur with the wrong number of arguments.</p>
<p>The code generation code,</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">t/ann</span> <span class="nv">th-assoc-in-gen</span> <span class="p">(</span><span class="nf">t/IFn</span> <span class="p">[</span><span class="nv">t/Any</span> <span class="p">(</span><span class="nf">t/NonEmptySeq</span> <span class="nv">t/Any</span><span class="p">)</span> <span class="nv">t/Any</span> <span class="nb">-> </span><span class="nv">t/Any</span><span class="p">]))</span>
<span class="p">(</span><span class="kd">defn- </span><span class="nv">th-assoc-in-gen</span> <span class="p">[</span><span class="nv">m</span> <span class="nv">ks</span> <span class="nv">v</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">k</span> <span class="p">(</span><span class="nb">first </span><span class="nv">ks</span><span class="p">)</span>
<span class="nv">ks</span> <span class="p">(</span><span class="nb">next </span><span class="nv">ks</span><span class="p">)]</span>
<span class="p">(</span><span class="nf">cond</span>
<span class="p">(</span><span class="nb">vector? </span><span class="nv">k</span><span class="p">)</span> <span class="p">(</span><span class="k">let </span><span class="p">[[</span><span class="nv">f-in</span> <span class="nv">f-out</span><span class="p">]</span> <span class="nv">k</span><span class="p">]</span>
<span class="p">(</span><span class="nb">list </span><span class="nv">f-in</span>
<span class="p">(</span><span class="nb">if-not </span><span class="nv">ks</span> <span class="nv">v</span> <span class="p">(</span><span class="nf">th-assoc-in-gen</span> <span class="p">(</span><span class="nb">list </span><span class="nv">f-out</span> <span class="nv">m</span><span class="p">)</span> <span class="nv">ks</span> <span class="nv">v</span><span class="p">))))</span>
<span class="nv">ks</span> <span class="p">(</span><span class="nb">list </span><span class="ss">'assoc</span> <span class="nv">m</span> <span class="nv">k</span> <span class="p">(</span><span class="nf">th-assoc-in-gen</span> <span class="p">(</span><span class="nb">list </span><span class="ss">'get</span> <span class="nv">m</span> <span class="nv">k</span><span class="p">)</span> <span class="nv">ks</span> <span class="nv">v</span> <span class="p">))</span>
<span class="ss">:else</span> <span class="p">(</span><span class="nb">list </span><span class="ss">'assoc</span> <span class="nv">m</span> <span class="nv">k</span> <span class="nv">v</span><span class="p">))))</span>
<span class="p">(</span><span class="kd">defmacro </span><span class="nv">th-assoc-in</span> <span class="p">[</span><span class="nv">m</span> <span class="nv">ks</span> <span class="nv">v</span><span class="p">]</span> <span class="p">(</span><span class="nf">th-assoc-in-gen</span> <span class="nv">m</span> <span class="nv">ks</span> <span class="nv">v</span><span class="p">))</span>
</code></pre></div>
<p><em>almost</em> parallels <code>ph-assoc-in</code>,</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">ph-assoc-in</span> <span class="p">[</span><span class="nv">m</span> <span class="p">[</span><span class="nv">k</span> <span class="o">&</span> <span class="nv">ks</span><span class="p">]</span> <span class="nv">v</span><span class="p">]</span>
<span class="p">(</span><span class="nf">cond</span>
<span class="p">(</span><span class="nb">vector? </span><span class="nv">k</span><span class="p">)</span> <span class="p">(</span><span class="k">let </span><span class="p">[[</span><span class="nv">f-in</span> <span class="nv">f-out</span><span class="p">]</span> <span class="nv">k</span><span class="p">]</span>
<span class="p">(</span><span class="nf">f-in</span> <span class="p">(</span><span class="nb">if-not </span><span class="nv">ks</span> <span class="nv">v</span> <span class="p">(</span><span class="nf">ph-assoc-in</span> <span class="p">(</span><span class="nf">f-out</span> <span class="nv">m</span><span class="p">)</span> <span class="nv">ks</span> <span class="nv">v</span><span class="p">))))</span>
<span class="nv">ks</span> <span class="p">(</span><span class="nb">assoc </span><span class="nv">m</span> <span class="nv">k</span> <span class="p">(</span><span class="nf">ph-assoc-in</span> <span class="p">(</span><span class="nb">get </span><span class="nv">m</span> <span class="nv">k</span><span class="p">)</span> <span class="nv">ks</span> <span class="nv">v</span><span class="p">))</span>
<span class="ss">:else</span> <span class="p">(</span><span class="nb">assoc </span><span class="nv">m</span> <span class="nv">k</span> <span class="nv">v</span><span class="p">)))</span>
</code></pre></div>
<p>except that s-expressions of the form <code>(something ...)</code> are now <code>(list 'something ...)</code>,
so the output is unexecuted code, e.g.</p>
<div class="highlight"><pre><span></span><code><span class="nv">user></span> <span class="p">(</span><span class="nb">macroexpand </span><span class="o">'</span><span class="p">(</span><span class="nf">th-assoc-in</span> <span class="nv">bruce</span> <span class="p">[</span><span class="ss">:position</span> <span class="ss">:x</span> <span class="p">[</span><span class="nb">inc </span><span class="nv">dec</span><span class="p">]]</span> <span class="mi">5</span><span class="p">))</span>
<span class="p">(</span><span class="nb">assoc </span><span class="nv">bruce</span> <span class="ss">:position</span> <span class="p">(</span><span class="nb">assoc </span><span class="p">(</span><span class="nb">get </span><span class="nv">bruce</span> <span class="ss">:position</span><span class="p">)</span> <span class="ss">:x</span> <span class="p">(</span><span class="nb">inc </span><span class="mi">5</span><span class="p">)))</span>
</code></pre></div>
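<p>To watch the outbound and inbound transforms land in the right order, here is a sketch that restates
the generator without its annotation, under the assumed name <code>assoc-in-gen</code> so that it runs
without <code>core.typed</code>, and applies it to a path with a transform in the middle.
The symbol <code>m</code> and the sample map are placeholders chosen for the example.</p>
<div class="highlight"><pre><span></span><code>;; Same logic as th-assoc-in-gen above, minus the t/ann, under an assumed name.
(defn assoc-in-gen [m ks v]
  (let [k  (first ks)
        ks (next ks)]
    (cond
      (vector? k) (let [[f-in f-out] k]
                    (list f-in
                          (if-not ks v (assoc-in-gen (list f-out m) ks v))))
      ks          (list 'assoc m k (assoc-in-gen (list 'get m k) ks v))
      :else       (list 'assoc m k v))))

;; The outbound read-string is applied while unwrapping,
;; the inbound pr-str while putting things back together.
(def generated (assoc-in-gen 'm '[:b [pr-str read-string] :c] 5))
;; generated is (assoc m :b (pr-str (assoc (read-string (get m :b)) :c 5)))

;; Evaluating the generated form against a concrete map:
(def result (eval (assoc-in-gen '{:b "{:c 3}"} '[:b [pr-str read-string] :c] 5)))
;; result is {:b "{:c 5}"}
</code></pre></div>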
<h4>Complications with the dictionary of aliases</h4>
<p>Naively coding the macros that take a dictionary argument,
we will run into the same problem we
saw when passing path as a variable rather than as a literal vector.
We can get around the problem
by forcibly <code>eval</code>ing the <code>path-dict</code> within the macro, thus, during pre-compilation,
expanding the symbol into (presumably) a map of aliases:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defmacro </span><span class="nv">th-get</span> <span class="p">[</span><span class="nv">path-dict</span> <span class="nv">m</span> <span class="nv">k</span><span class="p">]</span>
<span class="o">`</span><span class="p">(</span><span class="nf">th-get-in</span> <span class="o">~</span><span class="nv">m</span> <span class="o">~</span><span class="p">(</span><span class="nf">ph/condition-key</span> <span class="p">(</span><span class="nb">eval </span><span class="nv">path-dict</span><span class="p">)</span> <span class="nv">k</span><span class="p">)))</span>
</code></pre></div>
<p>In the function-based implementation, <code>eval</code> is unnecessary, because arguments are not passed as symbolic literals.</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">ph-get</span> <span class="p">[</span><span class="nv">path-dict</span> <span class="nv">m</span> <span class="nv">k</span><span class="p">]</span>
<span class="p">(</span><span class="nf">ph-get-in</span> <span class="nv">m</span> <span class="p">(</span><span class="nf">condition-key</span> <span class="nv">path-dict</span> <span class="nv">k</span><span class="p">)))</span>
</code></pre></div>
<p>This is a trick you want to use conservatively, since many times the arguments that get passed to
macros can't possibly be evaluated at compile time. So, for example, while this</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="k">def </span><span class="nv">p</span> <span class="p">{</span><span class="ss">:bar</span> <span class="p">[</span><span class="ss">:position</span> <span class="ss">:x</span><span class="p">]})</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">foo</span> <span class="p">[</span><span class="nv">m</span><span class="p">]</span> <span class="p">(</span><span class="nf">th-get</span> <span class="nv">p</span> <span class="nv">m</span> <span class="ss">:bar</span><span class="p">))</span>
</code></pre></div>
<p>works, this</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">foo</span> <span class="p">[</span><span class="nv">m</span><span class="p">]</span> <span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">p</span> <span class="p">(</span><span class="nb">assoc </span><span class="p">{}</span> <span class="ss">:bar</span> <span class="p">[</span><span class="ss">:position</span> <span class="ss">:x</span><span class="p">])]</span> <span class="p">(</span><span class="nf">th-get</span> <span class="nv">p</span> <span class="nv">m</span> <span class="ss">:bar</span><span class="p">)))</span>
</code></pre></div>
<p>will bomb, with the message that you "Can't eval locals."</p>
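<p>A stripped-down sketch of the same trick, with hypothetical names <code>aliases</code> and <code>alias-path</code>
and a plain <code>get</code> standing in for <code>ph/condition-key</code>: the macro evaluates its dictionary argument
during expansion, so that argument has to name something that already exists at that point, such as a top-level var.</p>
<div class="highlight"><pre><span></span><code>;; aliases and alias-path are illustrative names, not part of the library.
(def aliases {:bar [:position :x]})

;; eval runs at macro-expansion time, turning the symbol into the map it names.
(defmacro alias-path [dict k]
  (get (eval dict) k))

(alias-path aliases :bar) ; expands to the literal vector [:position :x]
</code></pre></div>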
<h4>Getters, setters and modifiers</h4>
<p>The macros for creating getters and setters are very straightforward,</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defmacro </span><span class="nv">mk-th-set</span> <span class="p">([</span><span class="nv">ks</span><span class="p">]</span> <span class="o">`</span><span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">o#</span> <span class="nv">v#</span><span class="p">]</span> <span class="p">(</span><span class="nf">th-assoc-in</span> <span class="nv">o#</span> <span class="o">~</span><span class="nv">ks</span> <span class="nv">v#</span><span class="p">))))</span>
<span class="p">(</span><span class="kd">defmacro </span><span class="nv">mk-th-get</span> <span class="p">[</span><span class="nv">ks</span><span class="p">]</span> <span class="o">`</span><span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">o#</span><span class="p">]</span> <span class="p">(</span><span class="nf">th-get-in</span> <span class="nv">o#</span> <span class="o">~</span><span class="nv">ks</span><span class="p">)))</span>
</code></pre></div>
<p>but it's interesting to think about how type checking works with very complex nesting.
For example,</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">th-assoc-in</span> <span class="p">{</span><span class="ss">:a</span> <span class="p">{</span><span class="ss">:b</span> <span class="s">"{:c 3}"</span><span class="p">}}</span> <span class="p">[</span><span class="ss">:a</span> <span class="ss">:b</span> <span class="p">[</span><span class="nb">pr-str </span><span class="nv">read-string</span><span class="p">]</span> <span class="ss">:c</span><span class="p">]</span> <span class="mi">5</span><span class="p">)</span>
</code></pre></div>
<p>will correctly return <code>{:a {:b "{:c 5}"}}</code>, but it doesn't type check, because
<code>read-string</code> returns <code>t/Any</code>, to which there's no guarantee that one
can <code>assoc</code> anything.</p>
<p>In a case like this, the idea is to confine the unprovable type assertions to
as small a domain as possible, which in this case means swearing up and down that
the string transformations will behave properly:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">t/defalias</span> <span class="nv">Silly</span> <span class="s">"silly map"</span> <span class="p">(</span><span class="nf">t/HMap</span> <span class="ss">:mandatory</span> <span class="p">{</span><span class="ss">:a</span> <span class="p">(</span><span class="nf">t/HMap</span> <span class="ss">:mandatory</span> <span class="p">{</span><span class="ss">:b</span> <span class="nv">t/Str</span><span class="p">})}))</span>
<span class="p">(</span><span class="nf">t/defalias</span> <span class="nv">Billy</span> <span class="s">"stuff in b"</span> <span class="p">(</span><span class="nf">t/HMap</span> <span class="ss">:mandatory</span> <span class="p">{</span><span class="ss">:c</span> <span class="nv">t/Num</span><span class="p">}))</span>
<span class="p">(</span><span class="nf">t/ann</span> <span class="o">^</span><span class="ss">:no-check</span> <span class="nv">s->billy</span> <span class="p">(</span><span class="nf">t/IFn</span> <span class="p">[</span><span class="nv">t/Str</span> <span class="nb">-> </span><span class="nv">Billy</span><span class="p">]))</span>
<span class="p">(</span><span class="nf">t/ann</span> <span class="o">^</span><span class="ss">:no-check</span> <span class="nv">billy->s</span> <span class="p">(</span><span class="nf">t/IFn</span> <span class="p">[</span><span class="nv">Billy</span> <span class="nb">-> </span><span class="nv">t/Str</span><span class="p">]))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">s->billy</span> <span class="p">[</span><span class="nv">s</span><span class="p">]</span> <span class="p">(</span><span class="nf">read-string</span> <span class="nv">s</span><span class="p">))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">billy->s</span> <span class="p">[</span><span class="nv">b</span><span class="p">]</span> <span class="p">(</span><span class="nb">pr-str </span><span class="nv">b</span><span class="p">))</span>
<span class="p">(</span><span class="nf">t/ann</span> <span class="nv">x</span> <span class="nv">Silly</span><span class="p">)</span>
<span class="p">(</span><span class="k">def </span><span class="nv">x</span> <span class="p">{</span><span class="ss">:a</span> <span class="p">{</span> <span class="ss">:b</span> <span class="s">"{:c 3}"</span><span class="p">}})</span>
</code></pre></div>
<p>We can create a getter that passes <code>(t/check-ns)</code></p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">t/ann</span> <span class="nv">g</span> <span class="p">(</span><span class="nf">t/IFn</span> <span class="p">[</span><span class="nv">Silly</span> <span class="nb">-> </span><span class="nv">t/Num</span><span class="p">]))</span>
<span class="p">(</span><span class="k">def </span><span class="nv">g</span> <span class="p">(</span><span class="nf">mk-th-get</span> <span class="p">[</span><span class="ss">:a</span> <span class="ss">:b</span> <span class="p">[</span><span class="nv">billy->s</span> <span class="nv">s->billy</span><span class="p">]</span> <span class="ss">:c</span><span class="p">]))</span>
</code></pre></div>
<p>More importantly,</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="k">def </span><span class="nv">h</span> <span class="p">(</span><span class="nf">mk-th-get</span> <span class="p">[</span><span class="ss">:a</span> <span class="ss">:b</span> <span class="p">[</span><span class="nv">billy->s</span> <span class="nv">s->billy</span><span class="p">]</span> <span class="ss">:goat</span><span class="p">]))</span>
</code></pre></div>
<p>does not.</p>
<h4>Turtles on the march</h4>
<p>Now let's build a type-checkable <code>turtle-forward</code>.</p>
<p>As before, we need a function to take some x and y position, heading and distance, and
to return new values of x and y, but this time, it should be properly annotated.
Trigonometry is an annoyance, since the static methods in <code>Math</code> aren't
proper <code>IFn</code>s, so we have to wrap them with <code>^:no-check</code>ed functions. Again,
this is a compromise, but it's a tightly contained compromise, where visual inspection
is, if not guaranteed to succeed, at least likely to do so:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">t/ann</span> <span class="o">^</span><span class="ss">:no-check</span> <span class="nv">Cos</span> <span class="p">(</span><span class="nf">t/IFn</span> <span class="p">[</span><span class="nv">Number</span> <span class="nb">-> </span><span class="nv">Number</span><span class="p">]))</span>
<span class="p">(</span><span class="nf">t/ann</span> <span class="o">^</span><span class="ss">:no-check</span> <span class="nv">Sin</span> <span class="p">(</span><span class="nf">t/IFn</span> <span class="p">[</span><span class="nv">Number</span> <span class="nb">-> </span><span class="nv">Number</span><span class="p">]))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">Cos</span> <span class="p">[</span><span class="nv">a</span><span class="p">]</span> <span class="p">(</span><span class="nf">Math/cos</span> <span class="nv">a</span><span class="p">))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">Sin</span> <span class="p">[</span><span class="nv">a</span><span class="p">]</span> <span class="p">(</span><span class="nf">Math/sin</span> <span class="nv">a</span><span class="p">))</span>
<span class="p">(</span><span class="nf">t/defn</span> <span class="nv">movexy</span> <span class="p">[</span><span class="nv">x</span> <span class="ss">:-</span> <span class="nv">Number</span>
<span class="nv">y</span> <span class="ss">:-</span> <span class="nv">Number</span>
<span class="nv">dir</span> <span class="ss">:-</span> <span class="nv">Number</span>
<span class="nv">dist</span> <span class="ss">:-</span> <span class="nv">Number</span><span class="p">]</span> <span class="ss">:-</span> <span class="p">(</span><span class="nf">t/HVec</span> <span class="p">[</span><span class="nv">Number</span> <span class="nv">Number</span><span class="p">])</span>
<span class="p">[(</span><span class="nb">+ </span><span class="nv">x</span> <span class="p">(</span><span class="nb">* </span><span class="nv">dist</span> <span class="p">(</span><span class="nf">Cos</span> <span class="nv">dir</span><span class="p">)))</span>
<span class="p">(</span><span class="nb">+ </span><span class="nv">y</span> <span class="p">(</span><span class="nb">* </span><span class="nv">dist</span> <span class="p">(</span><span class="nf">Sin</span> <span class="nv">dir</span><span class="p">)))])</span>
</code></pre></div>
<p>Now, instead of</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="k">def </span><span class="nv">turtle-forward</span> <span class="p">(</span><span class="nf">mk-ph-mod</span> <span class="nv">movexy</span> <span class="p">[</span><span class="ss">:position</span> <span class="ss">:x</span><span class="p">]</span> <span class="p">[</span><span class="ss">:position</span> <span class="ss">:y</span><span class="p">]</span> <span class="p">[</span><span class="ss">:heading</span><span class="p">]))</span>
</code></pre></div>
<p>we'll call a new macro version</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="k">def </span><span class="nv">turtle-forward</span> <span class="p">(</span><span class="nf">mk-th-mod</span> <span class="nv">movexy</span> <span class="mi">2</span> <span class="mi">1</span> <span class="p">[</span><span class="ss">:position</span> <span class="ss">:x</span><span class="p">]</span> <span class="p">[</span><span class="ss">:position</span> <span class="ss">:y</span><span class="p">]</span> <span class="p">[</span><span class="ss">:heading</span><span class="p">]))</span>
</code></pre></div>
<p>which, a little awkwardly, requires specifying the number of return values from <code>movexy</code>
as well as the number of arguments in addition to the turtle that it will expect.
The macro will need these numbers at expansion time, before <code>movexy</code> has had a chance to run.
I suspect it's possible to extract the information from the type declaration, but this doesn't seem to be
very straightforward.</p>
<p>Now, <code>mk-th-mod</code> is not the world's most complicated macro, but it is hefty, and I wanted to
take pains that its output be legible by humans, especially as those humans may be called upon
to interpret type errors referring to it. The pretty-printed macro expansion for <code>turtle-forward</code>
is:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">fn*</span>
<span class="p">([</span><span class="nv">obj-26666</span> <span class="nv">more-arg-26668-0</span><span class="p">]</span>
<span class="p">(</span><span class="nf">clojure.core/let</span>
<span class="p">[</span><span class="nv">arg-26667-0</span>
<span class="p">(</span><span class="nf">th-get-in</span> <span class="nv">obj-26666</span> <span class="p">[</span><span class="ss">:position</span> <span class="ss">:x</span><span class="p">])</span>
<span class="nv">arg-26667-1</span>
<span class="p">(</span><span class="nf">th-get-in</span> <span class="nv">obj-26666</span> <span class="p">[</span><span class="ss">:position</span> <span class="ss">:y</span><span class="p">])</span>
<span class="nv">arg-26667-2</span>
<span class="p">(</span><span class="nf">th-get-in</span> <span class="nv">obj-26666</span> <span class="p">[</span><span class="ss">:heading</span><span class="p">])</span>
<span class="p">[</span><span class="nv">fv-26669-0</span> <span class="nv">fv-26669-1</span><span class="p">]</span>
<span class="p">(</span><span class="nf">movexy</span> <span class="nv">arg-26667-0</span> <span class="nv">arg-26667-1</span> <span class="nv">arg-26667-2</span> <span class="nv">more-arg-26668-0</span><span class="p">)]</span>
<span class="p">(</span><span class="nf">clojure.core/-></span>
<span class="nv">obj-26666</span>
<span class="p">(</span><span class="nf">th-assoc-in</span> <span class="p">[</span><span class="ss">:position</span> <span class="ss">:x</span><span class="p">]</span> <span class="nv">fv-26669-0</span><span class="p">)</span>
<span class="p">(</span><span class="nf">th-assoc-in</span> <span class="p">[</span><span class="ss">:position</span> <span class="ss">:y</span><span class="p">]</span> <span class="nv">fv-26669-1</span><span class="p">)))))</span>
</code></pre></div>
<p>The 5-digit numbers are courtesy of <code>gensym</code>. Removing them and reformatting only slightly, we see
an incredibly straightforward function that barely requires explanation:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">fn*</span>
<span class="p">([</span><span class="nv">obj</span> <span class="nv">more-arg-0</span><span class="p">]</span>
<span class="p">(</span><span class="nf">clojure.core/let</span>
<span class="p">[</span><span class="nv">arg-0</span> <span class="p">(</span><span class="nf">th-get-in</span> <span class="nv">obj</span> <span class="p">[</span><span class="ss">:position</span> <span class="ss">:x</span><span class="p">])</span>
<span class="nv">arg-1</span> <span class="p">(</span><span class="nf">th-get-in</span> <span class="nv">obj</span> <span class="p">[</span><span class="ss">:position</span> <span class="ss">:y</span><span class="p">])</span>
<span class="nv">arg-2</span> <span class="p">(</span><span class="nf">th-get-in</span> <span class="nv">obj</span> <span class="p">[</span><span class="ss">:heading</span><span class="p">])</span>
<span class="p">[</span><span class="nv">fv-0</span> <span class="nv">fv-1</span><span class="p">]</span> <span class="p">(</span><span class="nf">movexy</span> <span class="nv">arg-0</span> <span class="nv">arg-1</span> <span class="nv">arg-2</span> <span class="nv">more-arg-0</span><span class="p">)]</span>
<span class="p">(</span><span class="nf">clojure.core/-></span>
<span class="nv">obj</span>
<span class="p">(</span><span class="nf">th-assoc-in</span> <span class="p">[</span><span class="ss">:position</span> <span class="ss">:x</span><span class="p">]</span> <span class="nv">fv-0</span><span class="p">)</span>
<span class="p">(</span><span class="nf">th-assoc-in</span> <span class="p">[</span><span class="ss">:position</span> <span class="ss">:y</span><span class="p">]</span> <span class="nv">fv-1</span><span class="p">)))))</span>
</code></pre></div>
<p>To achieve this legibility, it's necessary to go a little beyond the <code>name#</code> sugar and call <code>gensym</code> directly.
To generate a series of unique symbols that look like a series, we have</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">t/defn</span> <span class="nv">gensyms</span> <span class="p">[</span><span class="nv">s</span> <span class="ss">:-</span> <span class="nv">t/Str</span> <span class="nv">n</span> <span class="ss">:-</span> <span class="nv">t/Int</span><span class="p">]</span> <span class="ss">:-</span> <span class="p">(</span><span class="nf">t/Seq</span> <span class="nv">t/Sym</span><span class="p">)</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">s</span> <span class="p">(</span><span class="nb">gensym </span><span class="p">(</span><span class="nb">str </span><span class="nv">s</span> <span class="s">"-"</span><span class="p">))]</span>
<span class="p">(</span><span class="nb">map </span><span class="o">#</span><span class="p">(</span><span class="nb">symbol </span><span class="p">(</span><span class="nb">str </span><span class="nv">s</span> <span class="s">"-"</span> <span class="nv">%</span><span class="p">))</span> <span class="p">(</span><span class="nb">range </span><span class="nv">n</span><span class="p">))))</span>
</code></pre></div>
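<p>As a rough illustration of why the shared prefix matters, here is an untyped sketch of the same helper (plain <code>defn</code> instead of <code>t/defn</code>, and the local name <code>base</code> is my own), compared with calling <code>gensym</code> once per symbol. The numbers will differ on every run.</p>
<div class="highlight"><pre><span></span><code>;; Untyped sketch of gensyms: one gensym call supplies a shared unique prefix.
(defn gensyms [s n]
  (let [base (gensym (str s "-"))]
    (map #(symbol (str base "-" %)) (range n))))

;; Symbols share a prefix and differ only in the trailing index,
;; e.g. (fv-26669-0 fv-26669-1); the number varies per call.
(gensyms "fv" 2)

;; Calling gensym repeatedly is just as unique, but each symbol
;; gets its own unrelated number, which is harder to read.
(repeatedly 2 #(gensym "fv-"))
</code></pre></div>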
<p>Our macro</p>
<ol>
<li>
<p>Generates some symbols to hold the extracted arguments, user-provided arguments and function results:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defmacro </span><span class="nv">mk-th-mod</span> <span class="p">[</span><span class="nv">f</span> <span class="nv">n-out</span> <span class="nv">n-more</span> <span class="o">&</span> <span class="nv">arg-paths</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">o</span> <span class="p">(</span><span class="nb">symbol </span><span class="p">(</span><span class="nb">name </span><span class="p">(</span><span class="nb">gensym </span><span class="s">"obj-"</span><span class="p">)))</span>
<span class="nv">n-args</span> <span class="p">(</span><span class="nb">count </span><span class="nv">arg-paths</span><span class="p">)</span>
<span class="nv">args</span> <span class="p">(</span><span class="nf">gensyms</span> <span class="s">"arg"</span> <span class="nv">n-args</span><span class="p">)</span>
<span class="nv">margs</span> <span class="p">(</span><span class="nf">gensyms</span> <span class="s">"more-arg"</span> <span class="nv">n-more</span><span class="p">)</span>
<span class="nv">fnvals</span> <span class="p">(</span><span class="nf">gensyms</span> <span class="s">"fv"</span> <span class="nv">n-out</span><span class="p">)]</span>
</code></pre></div>
</li>
<li>
<p>generates code to extract arguments from the structure,</p>
<div class="highlight"><pre><span></span><code> <span class="nv">argvals</span> <span class="p">(</span><span class="nb">map </span><span class="o">#</span><span class="p">(</span><span class="nb">list </span><span class="ss">'th-get-in</span> <span class="nv">o</span> <span class="nv">%</span><span class="p">)</span> <span class="nv">arg-paths</span><span class="p">)</span>
</code></pre></div>
</li>
<li>
<p>builds up a function that takes a structure and the <code>more-arg-</code> arguments,</p>
<div class="highlight"><pre><span></span><code> <span class="o">`</span><span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="o">~</span><span class="nv">o</span> <span class="o">~@</span><span class="nv">margs</span><span class="p">]</span>
</code></pre></div>
</li>
<li>
<p>evaluates the extraction code and let-binds its results to the <code>arg-</code> variables,</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="o">~@</span><span class="p">(</span><span class="nb">interleave </span><span class="nv">args</span> <span class="nv">argvals</span><span class="p">)</span>
</code></pre></div>
</li>
<li>
<p>evaluates the user function (<code>movexy</code>) and destructures the results into the <code>fv-</code> variables,</p>
<div class="highlight"><pre><span></span><code> <span class="p">[</span><span class="o">~@</span><span class="nv">fnvals</span><span class="p">]</span> <span class="p">(</span><span class="o">~</span><span class="nv">f</span> <span class="o">~@</span><span class="nv">args</span> <span class="o">~@</span><span class="nv">margs</span><span class="p">)]</span>
</code></pre></div>
</li>
<li>
<p>threads the original structure through calls to <code>th-assoc-in</code> to place the <code>fv-</code> values where they belong:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nb">-> </span><span class="o">~</span><span class="nv">o</span>
<span class="o">~@</span><span class="p">(</span><span class="nb">map </span><span class="o">#</span><span class="p">(</span><span class="nb">list </span><span class="ss">'th-assoc-in</span> <span class="p">(</span><span class="nb">nth </span><span class="nv">arg-paths</span> <span class="nv">%1</span><span class="p">)</span> <span class="p">(</span><span class="nb">nth </span><span class="nv">fnvals</span> <span class="nv">%1</span><span class="p">))</span>
<span class="p">(</span><span class="nb">range </span><span class="nv">n-out</span><span class="p">)</span> <span class="p">))))))</span>
</code></pre></div>
</li>
</ol>
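<p>Stitched together, the fragments above might look like the following sketch. To keep it self-contained I've made some assumptions: <code>th-get-in</code> and <code>th-assoc-in</code> are replaced by plain stand-in functions that delegate to <code>get-in</code> and <code>assoc-in</code>, <code>gensyms</code> is untyped, and this <code>movexy</code> is a hypothetical version that moves a point a given distance along a heading.</p>
<div class="highlight"><pre><span></span><code>;; Stand-ins (assumptions), not the typed versions used in this post.
(defn th-get-in [m path] (get-in m path))
(defn th-assoc-in [m path v] (assoc-in m path v))

(defn gensyms [s n]
  (let [base (gensym (str s "-"))]
    (map #(symbol (str base "-" %)) (range n))))

(defmacro mk-th-mod [f n-out n-more & arg-paths]
  (let [o       (gensym "obj-")
        args    (gensyms "arg" (count arg-paths))
        margs   (gensyms "more-arg" n-more)
        fnvals  (gensyms "fv" n-out)
        ;; code that extracts each argument from the structure
        argvals (map #(list 'th-get-in o %) arg-paths)]
    `(fn [~o ~@margs]
       (let [~@(interleave args argvals)
             [~@fnvals] (~f ~@args ~@margs)]
         ;; write the first n-out results back to the first n-out paths
         (-> ~o
             ~@(map #(list 'th-assoc-in (nth arg-paths %) (nth fnvals %))
                    (range n-out)))))))

;; Hypothetical user function: returns the new [x y].
(defn movexy [x y dir dist]
  [(+ x (* dist (Math/cos dir)))
   (+ y (* dist (Math/sin dir)))])

(def turtle-forward
  (mk-th-mod movexy 2 1 [:position :x] [:position :y] [:heading]))

(turtle-forward {:position {:x 1.0 :y 2.0} :heading 0.0} 3)
;; => {:position {:x 4.0, :y 2.0}, :heading 0.0}
</code></pre></div>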
<p>The function built by this macro behaves as expected</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">turtle-forward</span> <span class="nv">bruce</span> <span class="mi">3</span><span class="p">)</span>
<span class="o">#</span><span class="nv">user.Turtle</span><span class="p">{</span><span class="ss">:position</span> <span class="o">#</span><span class="nv">user.Point</span><span class="p">{</span><span class="ss">:x</span> <span class="mf">3.121320343559643</span>, <span class="ss">:y</span> <span class="mf">4.121320343559642</span><span class="p">}</span>, <span class="ss">:color</span> <span class="o">#</span><span class="nv">user.Color</span><span class="p">{</span><span class="ss">:r</span> <span class="mi">255</span>, <span class="ss">:g</span> <span class="mi">0</span>, <span class="ss">:b</span> <span class="mi">0</span><span class="p">}</span>, <span class="ss">:heading</span> <span class="mf">0.7853981633974483</span><span class="p">}</span>
</code></pre></div>
<p>and type-checks, but innocent-looking perturbations like</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="k">def </span><span class="nv">turtle-forward2</span> <span class="p">(</span><span class="nf">mk-th-mod</span> <span class="nv">movexy</span> <span class="mi">2</span> <span class="mi">1</span> <span class="p">[</span><span class="ss">:position</span> <span class="ss">:x</span><span class="p">]</span> <span class="p">[</span><span class="ss">:position</span> <span class="ss">:y</span><span class="p">]</span> <span class="p">[</span><span class="ss">:angle</span><span class="p">]))</span>
</code></pre></div>
<p>or</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">t/defn</span> <span class="nv">movexy</span> <span class="p">[</span><span class="nv">x</span> <span class="ss">:-</span> <span class="nv">Long</span>
<span class="nv">y</span> <span class="ss">:-</span> <span class="nv">Long</span>
<span class="nv">...</span>
</code></pre></div>
<p>explode spectacularly during <code>check-ns</code>.</p>
<p>To be fair, the explosions that <code>core.typed</code> enjoys are not all that user friendly,
but that's excusable under the cruel-to-be-kind doctrine.</p>
<h4>We can have it all</h4>
<p>We wanted (in so many words) type-safe, efficient, concise, idiomatic and flexible handling
of deeply nested data structures, and that's what we got.</p>
<p>Prima facie, these were unreasonable requests - it's not as though other languages
were lining up to answer them - but Clojure and its ecosystem keep living up to their
advertised aptitude for making difficult sounding things simple, and sometimes
even easy. It's worth enumerating some of the machinery from which we benefited:</p>
<ol>
<li>
<p>Immutable, persistent data structures. Without these, the whole conversation might
have ended very early in frustration,
because a mutable nested hash table may be the most dangerous
programmatic construct ever conceived.</p>
</li>
<li>
<p>Homoiconicity, i.e. macros that can be written in some semblance of the
language they emit.</p>
</li>
<li>
<p>Optional typing via <code>core.typed</code>. It was both necessary and natural in
this work to rely alternately on dynamic and static typing.</p>
</li>
</ol>
<p>Regarding the last point, I should admit that I was guilted into this iteration
of the project by <a href="https://www.youtube.com/watch?v=a0gT0syAXsY">Ambrose's Strangeloop talk</a>.
You take motivation where you find it.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:others">
<p>I don't mean to imply that nobody has used macros in Clojure lens constructs before; several comments in the previous post provide counterexamples. What I haven't seen, however, is the use of macros to inline recursive association and expose type. <a class="footnote-backref" href="#fnref:others" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:destructuring">
<p><code>core.typed</code> does sometimes get confused by destructuring, hence the <code>let</code> instead of a more idiomatic <code>[k & ks]</code> in the function declaration <a class="footnote-backref" href="#fnref:destructuring" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
</ol>
</div>Pinholes 2 - Ignorance, misrepresentation and prejudice edition2014-09-22T00:00:00-04:002014-09-22T00:00:00-04:00Peter Fraenkeltag:blog.podsnap.com,2014-09-22:/pinhole2.html<p>Several comments regarding the <a href="http://blog.podsnap.com/pinhole.md">pinholes post</a> have forced me,
against the deepest elements of my nature, to engage in thought. Since that might never happen again,
I thought it meet to record the event.</p>
<p>I'm going to say "I" a lot, because these are mostly my opinions.</p>
<h4>Bidirectional programming</h4>
<p>As pointed out by Christian Schuhegger in a comment on the original post,
lenses were originally introduced to computer science in the context
of bidirectional programming, rather than as a tool for dealing with deeply nested immutable structures.
He points to a good list of <a href="http://www.cis.upenn.edu/~bcpierce/papers/index.shtml#Lenses">papers on the subject</a>.
I was, it seems, excessively influenced by the use case from the Scalaz tutorial (if not by
the exact details).</p>
<p>The original metaphor was, I suppose, that light rays traced out the
same path through a lens, irrespective of direction. My take on the
metaphor - that a lens is so called because it focuses on small or distant
things - is,
it seems to me, compelling, but it is not the original intent. Within
the context of the original definition (well not the original,
original definition, which would be anything in the shape of a
lentil), it seems like the things I create with <code>mk-ph-set</code> and
<code>mk-ph-get</code> are acceptably the ADT equivalent of lenses, but the
<code>mk-ph-mod</code> artifacts are really state transformers. Irrespective
of name, however, a tool for dealing with deeply and weirdly
nested immutable data structures is demonstrably important to have, and I'm not
penitent or creative enough to come up with a completely different metaphor.</p>
<h4>Fresnel and protocols and aesthetics</h4>
<p>Someone (whose name I'll publish if he asks me to) on twitter mentioned <a href="https://github.com/ckirkendall/fresnel">fresnel</a>, which is
also a lens library. I <em>did</em> know about this before blogging, but I didn't really want to argue about the differences,
because it's such a nice and polished piece of work.</p>
<p>Now that the subject has come up, I'll cop to being aesthetically opposed to
creating a protocol for anything that could potentially show up as (or in)
the second argument to <code>assoc</code> (or <code>assoc-in</code>), especially since, if
the first argument is a hash-map, <code>Lens</code> becomes an incomplete proxy for
<code>Object</code>.</p>
<p>Clojure and Clojurescript protocols pay homage to the object-oriented
nature of their virtual machines, but do so in moderation, providing
abstractions over fundamentally different implementations of
fundamental objects that are used in essentially the same way.
Hence <code>IPersistentMap</code> being
implemented by both <code>(hash-map)</code> and <code>(sorted-map)</code>, or
<code>core.async</code> having different <code>impl</code> namespaces for Clojure
and Clojurescript.
In Java code reviews, the detection of
<code>cond</code>-like logic immediately results in a prescription for
a new <code>interface</code>. Not so in Clojure, which recognizes the limitations
of virtual function dispatch and so provides rich semantics for computed
dispatch. In Java, the <code>interface</code> <strong>is</strong> the interface: instructions
for using a new library generally involve telling you to implement one.<sup id="fnref:lambda"><a class="footnote-ref" href="#fn:lambda">1</a></sup>
Clojure, I think, avoided the word "interface" in conscious rejection of this paradigm.</p>
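<p>For readers who haven't met computed dispatch, here is a minimal multimethod sketch. The <code>area</code> function and the <code>:kind</code> key are illustrative names of my own, not anything from pinhole or fresnel. The dispatch value comes from an arbitrary function of the argument, so no interface has to be implemented by the data.</p>
<div class="highlight"><pre><span></span><code>;; Dispatch on a value computed from the argument, not on its class.
(defmulti area (fn [shape] (:kind shape)))

(defmethod area :circle [{:keys [r]}]
  (* Math/PI r r))

(defmethod area :rect [{:keys [w h]}]
  (* w h))

(area {:kind :rect :w 2 :h 3})
;; => 6
</code></pre></div>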
<p>Pinhole relies on implementations of <code>clojure.lang.Associative</code> to do the right thing
when <code>assoc</code> and <code>get</code> are ultimately called, but beyond that differentiates only between keys that are
vectors and keys that are not vectors, and in the rare case where you want a vector as a hash key, you have
to provide an <code>[in,out]</code> function pair to do the dirty work. That feels right to me, but opinions may of course differ.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:lambda">
<p>Though, perhaps, the wind that brought lambdas to Java 8 may carry off a few of those idioms eventually. <a class="footnote-backref" href="#fnref:lambda" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
</ol>
</div>Pinholes - Idiomatic Clojure Lenses2014-09-19T00:00:00-04:002014-09-19T00:00:00-04:00Peter Fraenkeltag:blog.podsnap.com,2014-09-19:/pinhole.html<p>Lenses are a construct for getting, "setting" or "modifying" values within data structures, especially deeply nested
data structures. The quotation marks have the usual meaning when they show up in funktionsprache:<sup id="fnref:german"><a class="footnote-ref" href="#fn:german">1</a></sup> not mutating
anything <em>per se</em>, but instead producing an object, or reference thereto, that is identical except for the requested change.</p>
<p>In Scala, the need for lenses is pretty glaring, as illustrated in <a href="https://twitter.com/eed3si9n">Eugene Yokota</a>'s
<a href="http://eed3si9n.com/learning-scalaz/Lens.html">great explanation</a> of lenses in <a href="https://github.com/scalaz/scalaz">Scalaz</a>,
because of the centrality of <a href="http://www.scala-lang.org/old/node/107">case classes</a>. In his example problem, a turtle
is represented with three case classes:</p>
<div class="highlight"><pre><span></span><code> <span class="k">case</span> <span class="k">class</span> <span class="nc">Point</span><span class="p">(</span><span class="n">x</span><span class="p">:</span> <span class="nc">Double</span><span class="p">,</span> <span class="n">y</span><span class="p">:</span> <span class="nc">Double</span><span class="p">)</span>
<span class="k">case</span> <span class="k">class</span> <span class="nc">Color</span><span class="p">(</span><span class="n">r</span><span class="p">:</span> <span class="nc">Byte</span><span class="p">,</span> <span class="n">g</span><span class="p">:</span> <span class="nc">Byte</span><span class="p">,</span> <span class="n">b</span><span class="p">:</span> <span class="nc">Byte</span><span class="p">)</span>
<span class="k">case</span> <span class="k">class</span> <span class="nc">Turtle</span><span class="p">(</span><span class="n">position</span><span class="p">:</span> <span class="nc">Point</span><span class="p">,</span> <span class="n">heading</span><span class="p">:</span> <span class="nc">Double</span><span class="p">,</span> <span class="n">color</span><span class="p">:</span> <span class="nc">Color</span><span class="p">)</span>
<span class="kd">val</span> <span class="n">t</span> <span class="o">=</span> <span class="nc">Turtle</span><span class="p">(</span><span class="nc">Point</span><span class="p">(</span><span class="mf">2.0</span><span class="p">,</span> <span class="mf">3.0</span><span class="p">),</span> <span class="mf">0.0</span><span class="p">,</span> <span class="nc">Color</span><span class="p">(</span><span class="mi">255</span><span class="p">.</span><span class="n">toByte</span><span class="p">,</span> <span class="mi">255</span><span class="p">.</span><span class="n">toByte</span><span class="p">,</span> <span class="mi">255</span><span class="p">.</span><span class="n">toByte</span><span class="p">))</span>
<span class="c1">// t: Turtle = Turtle(Point(2.0,3.0),0.0,Color(-1,-1,-1))</span>
</code></pre></div>
<p>This is lovely, but if you wanted to change just the <code>x</code> position, you'd have to write</p>
<div class="highlight"><pre><span></span><code> <span class="n">t</span><span class="p">.</span><span class="n">copy</span><span class="p">(</span><span class="n">position</span><span class="o">=</span><span class="n">t</span><span class="p">.</span><span class="n">position</span><span class="p">.</span><span class="n">copy</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="mf">42.0</span><span class="p">))</span>
</code></pre></div>
<p>and you can imagine how truly awful this would look for even more deeply nested case classes.</p>
<p>In Clojure, this sort of thing is a lot easier out of the gate, because records implement
map protocols (i.e. <code>clojure.lang.Associative</code>) and because of the ultra-slick
<code>assoc-in</code>:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="kd">defrecord </span><span class="nv">Point</span> <span class="p">[</span><span class="o">^</span><span class="nb">double </span><span class="nv">x</span> <span class="o">^</span><span class="nb">double </span><span class="nv">y</span><span class="p">])</span>
<span class="p">(</span><span class="kd">defrecord </span><span class="nv">Color</span> <span class="p">[</span><span class="o">^</span><span class="nb">short </span><span class="nv">r</span> <span class="o">^</span><span class="nb">short </span><span class="nv">g</span> <span class="o">^</span><span class="nb">short </span><span class="nv">b</span><span class="p">])</span>
<span class="p">(</span><span class="kd">defrecord </span><span class="nv">Turtle</span> <span class="p">[</span><span class="o">^</span><span class="nv">Point</span> <span class="nv">position</span> <span class="o">^</span><span class="nb">double </span><span class="nv">heading</span> <span class="o">^</span><span class="nv">Color</span> <span class="nv">color</span><span class="p">])</span>
<span class="p">(</span><span class="k">def </span><span class="nv">t</span> <span class="p">(</span><span class="nf">->Turtle</span> <span class="p">(</span><span class="nf">->Point</span> <span class="mf">1.0</span> <span class="mf">2.0</span><span class="p">)</span> <span class="p">(</span><span class="nb">/ </span><span class="nv">Math/PI</span> <span class="mi">4</span><span class="p">)</span> <span class="p">(</span><span class="nf">->Color</span> <span class="mi">255</span> <span class="mi">0</span> <span class="mi">0</span><span class="p">)))</span>
<span class="p">(</span><span class="nf">assoc-in</span> <span class="nv">t</span> <span class="p">[</span><span class="ss">:position</span> <span class="ss">:x</span><span class="p">]</span> <span class="mi">42</span><span class="nv">.</span><span class="p">)</span>
<span class="c1">;; #user.Turtle{:position #user.Point{:x 42.0, :y 2.0}, :heading 0.7853981633974483, :color #user.Color{:r 255, :g 0, :b 0}}</span>
</code></pre></div>
<p>but tastiness like this just makes us hungrier.</p>
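<p>The same map-like behavior extends to <code>get-in</code> and <code>update-in</code>. A small sketch, using stripped-down records without type hints (my simplification, so that it stands alone):</p>
<div class="highlight"><pre><span></span><code>;; Simplified records, for illustration only.
(defrecord Point [x y])
(defrecord Turtle [position heading])

(def t0 (->Turtle (->Point 1.0 2.0) 0.0))

(get-in t0 [:position :x])
;; => 1.0

;; update-in applies a function at the end of the path.
(def t1 (update-in t0 [:position :x] + 41.0))

(get-in t1 [:position :x])   ;; => 42.0
(instance? Turtle t1)        ;; => true, still a record
(get-in t0 [:position :x])   ;; => 1.0, the original is untouched
</code></pre></div>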
<p>Before proceeding, I should mention a <a href="https://speakerdeck.com/markhibberd/lens-from-the-ground-up-in-clojure">very nice presentation</a>
that takes a more formal (functorish) approach to lenses than I do here.</p>
<h4>Note: 2014-09-29</h4>
<p>Since this post has been tweeted a bit (well, more than zero times, which is what
I expected), I should note that there are two follow-up posts:
<a href="http://blog.podsnap.com/pinhole2.html">One</a> responding to some of the comments, and
<a href="http://blog.podsnap.com/tinhole.html">another</a> offering an alternate implementation that,
among other things, is compatible with <code>core.typed</code>. Neither, though, will make sense
if you haven't read this one.</p>
<h4>Use case: Amazon Web Services</h4>
<p>I've recently been spending a lot of time wrangling AWS with Clojure. The main route to AWS is
Amazon's Java SDK, around which
<a href="https://github.com/mcohen01/amazonica">amazonica</a> provides a complete wrapper. In fact, it's better
than a wrapper, because the plain SDK is mind-bogglingly tedious - a model citizen in
the <a href="http://steve-yegge.blogspot.com/2006/03/execution-in-kingdom-of-nouns.html">kingdom of nouns</a>.</p>
<p>For example, to bid on an instance in the spot auction market, you call a method on an
<code>AmazonEC2</code> client:</p>
<div class="highlight"><pre><span></span><code> <span class="n">RequestSpotInstancesResult</span> <span class="nf">requestSpotInstances</span><span class="p">(</span><span class="n">RequestSpotInstancesRequest</span> <span class="n">requestSpotInstancesRequest</span><span class="p">)</span>
</code></pre></div>
<p>where the <code>RequestSpotInstancesRequest</code> class has a do-nothing constructor and a Dukedom of <code>.setXYZ</code> methods,
the most important of which is</p>
<div class="highlight"><pre><span></span><code> <span class="kt">void</span> <span class="nf">setLaunchSpecification</span><span class="p">(</span><span class="n">LaunchSpecification</span> <span class="n">launchSpecification</span><span class="p">)</span>
</code></pre></div>
<p>in which <code>LaunchSpecification</code> has yet more <code>.set</code> methods, including</p>
<div class="highlight"><pre><span></span><code> <span class="kt">void</span> <span class="nf">setNetworkInterfaces</span><span class="p">(</span><span class="n">Collection</span><span class="o"><</span><span class="n">InstanceNetworkInterfaceSpecification</span><span class="o">></span> <span class="n">networkInterfaces</span><span class="p">)</span>
</code></pre></div>
<p>and the <code>InstanceNetworkInterfaceSpecification</code> is where <code>setSubnetId(String subnetId)</code> lives, so you really
end up needing all of these classes.</p>
<p>Amazonica, by contrast, turns this logically nested data into explicitly nested hash-maps, so an entire request
can be summarized and constructed as:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="k">def </span><span class="nv">req</span> <span class="p">{</span><span class="ss">:spot-price</span> <span class="mf">0.01</span>,
<span class="ss">:instance-count</span> <span class="mi">1</span>,
<span class="ss">:type</span> <span class="s">"one-time"</span>,
<span class="ss">:launch-specification</span>
<span class="p">{</span><span class="ss">:image-id</span> <span class="s">"ami-something"</span>,
<span class="ss">:instance-type</span> <span class="s">"t1.micro"</span>,
<span class="ss">:placement</span> <span class="p">{</span><span class="ss">:availability-zone</span> <span class="s">"us-east-1a"</span><span class="p">}</span>,
<span class="ss">:key-name</span> <span class="s">"your-key"</span>
<span class="ss">:user-data</span> <span class="s">"aGVsbG8gc2FpbG9y"</span>
<span class="ss">:network-interfaces</span>
<span class="p">[{</span><span class="ss">:device-index</span> <span class="mi">0</span>
<span class="ss">:subnet-id</span> <span class="s">"subnet-yowsa"</span>
<span class="ss">:groups</span> <span class="p">[</span><span class="s">"sg-hubba"</span><span class="p">]}]</span>
<span class="ss">:iam-instance-profile</span>
<span class="p">{</span><span class="ss">:arn</span> <span class="s">"arn:aws:iam::123456789:instance-profile/name-you-chose"</span><span class="p">}}})</span>
<span class="p">(</span><span class="nf">request-spot-instances</span> <span class="nv">req</span><span class="p">)</span>
</code></pre></div>
<p>You can see where this is going. <code>(assoc-in req [:launch-specification :network-interfaces 0 :groups 0] "sg-foo")</code>
is nicer than the equivalent java jive, but it is not a thing of beauty.</p>
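<p>To see what that call does, here is a cut-down sketch. The map <code>small-req</code> is a hypothetical fragment of the request, kept short for illustration. Integer path entries index into vectors, and everything outside the path is carried over unchanged.</p>
<div class="highlight"><pre><span></span><code>;; Hypothetical fragment of the request map.
(def small-req
  {:launch-specification
   {:network-interfaces [{:device-index 0
                          :groups ["sg-hubba"]}]}})

(assoc-in small-req
          [:launch-specification :network-interfaces 0 :groups 0]
          "sg-foo")
;; => {:launch-specification
;;     {:network-interfaces [{:device-index 0, :groups ["sg-foo"]}]}}
</code></pre></div>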
<p>Also, take a look at that <code>:user-data</code> field. The erudition of my readers is such that they will immediately
recognize this as the base-64 encoding of "hello sailor", but Clojure is not that clever (yet). And it gets worse.
Using amazonica with Amazon's Simple Notification and Simple Queue Services (SNS/SQS), you may encounter a response that</p>
<ul>
<li>comes to us as a map</li>
<li>one of whose values is a JSON-encoded string</li>
<li>which in turn contains another base-64 encoded string,</li>
<li>which for me happened to contain yet more name-value pairs.</li>
</ul>
<p>Can we find a way to deal with such complexity in a way that feels simple?</p>
<h4>Want, want, want!</h4>
<p>What I want is:</p>
<ol>
<li>Paths that allow arbitrary transformations along the lookup/retrieval path.</li>
<li>A convenient way to specify a dictionary of aliases to such paths.</li>
<li>The usual lensy guff of special-purpose getters, setters and updaters.</li>
</ol>
<h4>Paths with arbitrary transformations</h4>
<p>In the example AWS request above, we would like to set and get the <code>:user-data</code> field as a plain old string, without
worrying about the encoding. Specifically, I'd like two functions:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">ph-get-in</span> <span class="nv">m</span> <span class="nv">ks</span><span class="p">)</span>
<span class="p">(</span><span class="nf">ph-assoc-in</span> <span class="nv">m</span> <span class="nv">ks</span> <span class="nv">v</span><span class="p">)</span>
</code></pre></div>
<p>which work just like their <code>ph-</code>less forebears, but accommodate special transformation entries of the form</p>
<div class="highlight"><pre><span></span><code> <span class="p">[</span><span class="nv">f-incoming</span> <span class="nv">f-outgoing</span><span class="p">]</span>
</code></pre></div>
<p>where <code>f-incoming</code> transforms data on its way into the map, and <code>f-outgoing</code> transforms entries read from the map.
For <code>:user-data</code>, we'll have<sup id="fnref:base64"><a class="footnote-ref" href="#fn:base64">2</a></sup></p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="k">def </span><span class="nv">upath</span> <span class="p">[</span><span class="ss">:launch-specification</span> <span class="ss">:user-data</span> <span class="p">[</span><span class="nv">s->b64</span> <span class="nv">b64->s</span><span class="p">]])</span>
</code></pre></div>
<p>so <code>(ph-get-in req-map upath)</code> should return "hello sailor", and
<code>(ph-assoc-in req-map upath "lord love a duck")</code> returns a map where the <code>:user-data</code>
field is "bG9yZCBsb3ZlIGEgZHVjaw==".</p>
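<p>One possible pair of conversion functions is sketched below. It assumes Java 8 or later for <code>java.util.Base64</code> and is not necessarily the definition used elsewhere in this post.</p>
<div class="highlight"><pre><span></span><code>(import 'java.util.Base64)

;; Assumed helpers: UTF-8 string to base-64 text and back.
(defn s->b64 [^String s]
  (.encodeToString (Base64/getEncoder) (.getBytes s "UTF-8")))

(defn b64->s [^String b]
  (String. (.decode (Base64/getDecoder) b) "UTF-8"))

(s->b64 "lord love a duck")
;; => "bG9yZCBsb3ZlIGEgZHVjaw=="

(b64->s "aGVsbG8gc2FpbG9y")
;; => "hello sailor"
</code></pre></div>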
<p>Let's start by looking at the recursive definition of the existing <code>clojure.core/assoc-in</code>:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">assoc-in</span> <span class="p">[</span><span class="nv">m</span> <span class="p">[</span><span class="nv">k</span> <span class="o">&</span> <span class="nv">ks</span><span class="p">]</span> <span class="nv">v</span><span class="p">]</span>
<span class="p">(</span><span class="k">if </span><span class="nv">ks</span>
<span class="p">(</span><span class="nb">assoc </span><span class="nv">m</span> <span class="nv">k</span> <span class="p">(</span><span class="nf">assoc-in</span> <span class="p">(</span><span class="nb">get </span><span class="nv">m</span> <span class="nv">k</span><span class="p">)</span> <span class="nv">ks</span> <span class="nv">v</span><span class="p">))</span>
<span class="p">(</span><span class="nb">assoc </span><span class="nv">m</span> <span class="nv">k</span> <span class="nv">v</span><span class="p">)))</span>
</code></pre></div>
<p>which we'll modify very slightly to detect a final <code>k</code> of the form <code>[f-incoming f-outgoing]</code>:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">ph-assoc-in-v1</span> <span class="p">[</span><span class="nv">m</span> <span class="p">[</span><span class="nv">k</span> <span class="o">&</span> <span class="nv">ks</span><span class="p">]</span> <span class="nv">v</span><span class="p">]</span>
<span class="p">(</span><span class="nf">cond</span>
<span class="p">(</span><span class="nb">vector? </span><span class="nv">k</span><span class="p">)</span> <span class="p">((</span><span class="nb">first </span><span class="nv">k</span><span class="p">)</span> <span class="nv">v</span><span class="p">)</span>
<span class="nv">ks</span> <span class="p">(</span><span class="nb">assoc </span><span class="nv">m</span> <span class="nv">k</span> <span class="p">(</span><span class="nf">ph-assoc-in-v1</span> <span class="p">(</span><span class="nb">get </span><span class="nv">m</span> <span class="nv">k</span><span class="p">)</span> <span class="nv">ks</span> <span class="nv">v</span><span class="p">))</span>
<span class="nv">k</span> <span class="p">(</span><span class="nb">assoc </span><span class="nv">m</span> <span class="nv">k</span> <span class="nv">v</span><span class="p">)</span>
<span class="ss">:else</span> <span class="nv">v</span><span class="p">))</span>
</code></pre></div>
<p>and just apply <code>f-incoming</code> to the value <code>v</code>.</p>
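<p>To see version 1 at work, here is a minimal self-contained sketch. The definitions of <code>s->b64</code> and <code>b64->s</code> are an assumption on my part (built on <code>java.util.Base64</code>), and <code>req-map</code> is a made-up, heavily trimmed stand-in for a real request. The recursive call goes to <code>ph-assoc-in-v1</code> itself so that the snippet runs in isolation.</p>
<div class="highlight"><pre><span></span><code>(import 'java.util.Base64)

;; Assumed helpers: UTF-8 string to base64 and back
(defn s->b64 [^String s] (.encodeToString (Base64/getEncoder) (.getBytes s "UTF-8")))
(defn b64->s [^String b] (String. (.decode (Base64/getDecoder) b) "UTF-8"))

(defn ph-assoc-in-v1 [m [k & ks] v]
  (cond
    (vector? k) ((first k) v)                           ; final [f-incoming f-outgoing]
    ks          (assoc m k (ph-assoc-in-v1 (get m k) ks v))
    k           (assoc m k v)
    :else       v))

(def upath [:launch-specification :user-data [s->b64 b64->s]])

;; Hypothetical request map, with "hello sailor" already encoded
(def req-map {:launch-specification {:user-data (s->b64 "hello sailor")}})

(ph-assoc-in-v1 req-map upath "lord love a duck")
;; => {:launch-specification {:user-data "bG9yZCBsb3ZlIGEgZHVjaw=="}}
</code></pre></div>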
<p>But... that isn't exactly what I asked for! I want to allow arbitrary transformations <em>anywhere</em> along
the path, not just at the very end. For instance, I want to be able to get at the <code>:c</code> value in
<code>{:a 1 :b "{:c 2}"}</code> so that</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">ph-assoc-in</span> <span class="p">{</span><span class="ss">:a</span> <span class="mi">1</span> <span class="ss">:b</span> <span class="s">"{:c 2}"</span><span class="p">}</span>
<span class="p">[</span><span class="ss">:b</span> <span class="p">[</span><span class="nb">pr-str </span><span class="nv">read-string</span><span class="p">]</span> <span class="ss">:c</span> <span class="p">[</span><span class="nb">inc </span><span class="nv">dec</span><span class="p">]]</span> <span class="mi">5</span><span class="p">)</span>
</code></pre></div>
<p>should return <code>{:b "{:c 6}", :a 1}</code>, both incrementing the supplied value on its way into <code>:c</code> and putting the map back into string form.</p>
<p>In version 1, the <code>(vector? k)</code> clause just assumed we were at the end of the path and applied <code>f-incoming</code>,
but if we're in the middle of the path (e.g. we just encountered <code>"{:c 2}"</code>), we need to</p>
<ol>
<li>
<p>transform whatever weird thing we found (in the example above, a string)
back into a <code>clojure.lang.Associative</code> (in the example, a map)
by calling the <em>outgoing</em> function (<code>read-string</code>, in this case),</p>
<div class="highlight"><pre><span></span><code>(let [[f-incoming f-outgoing] k
m (f-outgoing m)
</code></pre></div>
</li>
<li>
<p>insert the next inner layer by recursively calling ourselves (exactly as <code>assoc-in</code> does),</p>
<div class="highlight"><pre><span></span><code> m (ph-assoc-in m ks v)]
</code></pre></div>
</li>
<li>
<p>and transform the modified map back into its original form (here, with <code>pr-str</code>),</p>
<div class="highlight"><pre><span></span><code> (f-incoming m))
</code></pre></div>
</li>
<li>
<p>to be returned to our recursive caller.</p>
</li>
</ol>
<p>Putting that all together, along with handling the case where we <em>are</em> at the end of the path:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">ph-assoc-in</span> <span class="p">[</span><span class="nv">m</span> <span class="p">[</span><span class="nv">k</span> <span class="o">&</span> <span class="nv">ks</span><span class="p">]</span> <span class="nv">v</span><span class="p">]</span>
<span class="c1">;;(println " m=" (pr-str m) "k=" k)</span>
<span class="p">(</span><span class="nf">cond</span>
<span class="p">(</span><span class="nb">vector? </span><span class="nv">k</span><span class="p">)</span> <span class="p">(</span><span class="k">let </span><span class="p">[[</span><span class="nv">f-incoming</span> <span class="nv">f-outgoing</span><span class="p">]</span> <span class="nv">k</span><span class="p">]</span>
<span class="p">(</span><span class="nf">f-incoming</span> <span class="p">(</span><span class="nb">if-not </span><span class="nv">ks</span> <span class="nv">v</span> <span class="p">(</span><span class="nf">ph-assoc-in</span> <span class="p">(</span><span class="nf">f-outgoing</span> <span class="nv">m</span><span class="p">)</span> <span class="nv">ks</span> <span class="nv">v</span><span class="p">))))</span>
<span class="nv">ks</span> <span class="p">(</span><span class="nb">assoc </span><span class="nv">m</span> <span class="nv">k</span> <span class="p">(</span><span class="nf">ph-assoc-in</span> <span class="p">(</span><span class="nb">get </span><span class="nv">m</span> <span class="nv">k</span><span class="p">)</span> <span class="nv">ks</span> <span class="nv">v</span><span class="p">))</span>
<span class="nv">k</span> <span class="p">(</span><span class="nb">assoc </span><span class="nv">m</span> <span class="nv">k</span> <span class="nv">v</span><span class="p">)</span>
<span class="ss">:else</span> <span class="nv">v</span><span class="p">))</span>
</code></pre></div>
<p>That's better. We can now transform whatever we find in the nested map, wherever we find it.
Uncommenting the <code>println</code>, we can see what happens very clearly:</p>
<div class="highlight"><pre><span></span><code><span class="nv">acyclic.utils.pinhole></span> <span class="p">(</span><span class="nf">ph-assoc-in</span> <span class="p">{</span><span class="ss">:a</span> <span class="mi">1</span> <span class="ss">:b</span> <span class="s">"{:c 2}"</span><span class="p">}</span> <span class="p">[</span><span class="ss">:b</span> <span class="p">[</span><span class="nb">pr-str </span><span class="nv">read-string</span><span class="p">]</span> <span class="ss">:c</span> <span class="p">[</span><span class="nb">inc </span><span class="nv">dec</span><span class="p">]]</span> <span class="mi">5</span><span class="p">)</span>
<span class="nv">m=</span> <span class="p">{</span><span class="ss">:b</span> <span class="s">"{:c 2}"</span>, <span class="ss">:a</span> <span class="mi">1</span><span class="p">}</span> <span class="nv">k=</span> <span class="ss">:b</span>
<span class="nv">m=</span> <span class="s">"{:c 2}"</span> <span class="nv">k=</span> <span class="p">[</span><span class="o">#</span><span class="nv"><core$pr_str</span> <span class="nv">clojure.core$pr_str</span><span class="o">@</span><span class="mi">6006</span><span class="nv">f3fa></span> <span class="o">#</span><span class="nv"><core$read_string</span> <span class="nv">clojure.core$read_string</span><span class="o">@</span><span class="mi">246</span><span class="nv">a3dc1></span><span class="p">]</span>
<span class="nv">m=</span> <span class="p">{</span><span class="ss">:c</span> <span class="mi">2</span><span class="p">}</span> <span class="nv">k=</span> <span class="ss">:c</span>
<span class="nv">m=</span> <span class="mi">2</span> <span class="nv">k=</span> <span class="p">[</span><span class="o">#</span><span class="nv"><core$inc</span> <span class="nv">clojure.core$inc</span><span class="o">@</span><span class="mi">196</span><span class="nv">fe8b1></span> <span class="o">#</span><span class="nv"><core$dec</span> <span class="nv">clojure.core$dec</span><span class="o">@</span><span class="mi">7</span><span class="nv">c9ab3af></span><span class="p">]</span>
<span class="p">{</span><span class="ss">:b</span> <span class="s">"{:c 6}"</span>, <span class="ss">:a</span> <span class="mi">1</span><span class="p">}</span>
</code></pre></div>
<p>When we encounter <code>"{:c 2}"</code>, the next "key" in the sequence is <code>[pr-str read-string]</code>, so
<code>read-string</code> gets applied to it before it's passed recursively to <code>ph-assoc-in</code>,
which means that the <code>:c</code> that's next gets applied to <code>{:c 2}</code>. At the innermost recursion
the 5 gets <code>inc</code>remented, and then we tumble back up the stack <code>assoc</code>-ing everything
back into place.</p>
<p>The corresponding <code>get-in</code> is a bit simpler, because we have no need for <code>f-incoming</code>, but
<code>f-outgoing</code> is still used to transform weird intermediate values into <code>Associative</code>s:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">ph-get-in</span> <span class="p">[</span><span class="nv">m</span> <span class="p">[</span><span class="nv">k</span> <span class="o">&</span> <span class="nv">ks</span><span class="p">]]</span>
<span class="c1">;;(println " m=" (pr-str m) "k=" k)</span>
<span class="p">(</span><span class="nf">cond</span>
<span class="p">(</span><span class="nb">vector? </span><span class="nv">k</span><span class="p">)</span> <span class="p">(</span><span class="k">let </span><span class="p">[[</span><span class="nv">_</span> <span class="nv">f-outgoing</span><span class="p">]</span> <span class="nv">k</span><span class="p">]</span> <span class="p">(</span><span class="nf">ph-get-in</span> <span class="p">(</span><span class="nf">f-outgoing</span> <span class="nv">m</span><span class="p">)</span> <span class="nv">ks</span><span class="p">))</span>
<span class="nv">k</span> <span class="p">(</span><span class="nf">ph-get-in</span> <span class="p">(</span><span class="nb">get </span><span class="nv">m</span> <span class="nv">k</span><span class="p">)</span> <span class="nv">ks</span><span class="p">)</span>
<span class="ss">:else</span> <span class="nv">m</span><span class="p">))</span>
</code></pre></div>
<p>Turning on printing:</p>
<div class="highlight"><pre><span></span><code><span class="nv">acyclic.utils.pinhole></span> <span class="p">(</span><span class="nf">ph-get-in</span> <span class="p">{</span><span class="ss">:a</span> <span class="mi">1</span> <span class="ss">:b</span> <span class="s">"{:c 2}"</span><span class="p">}</span> <span class="p">[</span><span class="ss">:b</span> <span class="p">[</span><span class="nb">pr-str </span><span class="nv">read-string</span><span class="p">]</span> <span class="ss">:c</span> <span class="p">[</span><span class="nb">inc </span><span class="nv">dec</span><span class="p">]])</span>
<span class="nv">m=</span> <span class="p">{</span><span class="ss">:b</span> <span class="s">"{:c 2}"</span>, <span class="ss">:a</span> <span class="mi">1</span><span class="p">}</span> <span class="nv">k=</span> <span class="ss">:b</span>
<span class="nv">m=</span> <span class="s">"{:c 2}"</span> <span class="nv">k=</span> <span class="p">[</span><span class="o">#</span><span class="nv"><core$pr_str</span> <span class="nv">clojure.core$pr_str</span><span class="o">@</span><span class="mi">6006</span><span class="nv">f3fa></span> <span class="o">#</span><span class="nv"><core$read_string</span> <span class="nv">clojure.core$read_string</span><span class="o">@</span><span class="mi">246</span><span class="nv">a3dc1></span><span class="p">]</span>
<span class="nv">m=</span> <span class="p">{</span><span class="ss">:c</span> <span class="mi">2</span><span class="p">}</span> <span class="nv">k=</span> <span class="ss">:c</span>
<span class="nv">m=</span> <span class="mi">2</span> <span class="nv">k=</span> <span class="p">[</span><span class="o">#</span><span class="nv"><core$inc</span> <span class="nv">clojure.core$inc</span><span class="o">@</span><span class="mi">196</span><span class="nv">fe8b1></span> <span class="o">#</span><span class="nv"><core$dec</span> <span class="nv">clojure.core$dec</span><span class="o">@</span><span class="mi">7</span><span class="nv">c9ab3af></span><span class="p">]</span>
<span class="nv">m=</span> <span class="mi">1</span> <span class="nv">k=</span> <span class="nv">nil</span>
<span class="mi">1</span>
</code></pre></div>
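<p>As a quick sanity check, reading a value back through the same path that wrote it ought to return what was written, at least when each <code>f-incoming</code>/<code>f-outgoing</code> pair really are inverses. A small sketch, repeating the two definitions (with <code>f-outgoing</code> applied to <code>m</code> in <code>ph-get-in</code>) so that it runs by itself:</p>
<div class="highlight"><pre><span></span><code>(defn ph-assoc-in [m [k & ks] v]
  (cond
    (vector? k) (let [[f-incoming f-outgoing] k]
                  (f-incoming (if-not ks v (ph-assoc-in (f-outgoing m) ks v))))
    ks          (assoc m k (ph-assoc-in (get m k) ks v))
    k           (assoc m k v)
    :else       v))

(defn ph-get-in [m [k & ks]]
  (cond
    (vector? k) (let [[_ f-outgoing] k] (ph-get-in (f-outgoing m) ks))
    k           (ph-get-in (get m k) ks)
    :else       m))

(def m    {:a 1 :b "{:c 2}"})
(def path [:b [pr-str read-string] :c [inc dec]])

(ph-get-in m path)                       ;; => 1
(ph-assoc-in m path 5)                   ;; => {:a 1, :b "{:c 6}"}
(ph-get-in (ph-assoc-in m path 5) path)  ;; => 5
</code></pre></div>
<p>Nothing enforces the inverse relationship, of course. Pair <code>inc</code> with <code>identity</code> and the round trip quietly drifts by one.</p>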
<h4>A dictionary of aliases</h4>
<p>In my AWS example, there are several fields I will want to change frequently, and it would be handy
to have a dictionary of aliases to the complex paths to these fields, e.g.:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="k">def </span><span class="nv">path-dict</span>
<span class="p">{</span><span class="ss">:zone</span> <span class="p">[</span><span class="ss">:launch-specification</span> <span class="ss">:placement</span> <span class="ss">:availability-zone</span><span class="p">]</span>
<span class="ss">:public?</span> <span class="p">[</span><span class="ss">:launch-specification</span> <span class="ss">:network-interfaces</span> <span class="mi">0</span> <span class="ss">:associate-public-ip-address</span><span class="p">]</span>
<span class="ss">:udata</span> <span class="p">[</span><span class="ss">:launch-specification</span> <span class="ss">:user-data</span> <span class="p">[</span><span class="nv">s->b64</span> <span class="nv">b64->s</span><span class="p">]]}</span> <span class="p">)</span>
</code></pre></div>
<p>Moreover, I shouldn't have to remember when I'm using an alias, so it would be nice to have functions that work
like regular <code>get</code>/<code>assoc</code> when called with a keyword that isn't an alias for something else, or even
with a full path. Let's pull this massaging process into its own function, which returns a path,
almost irrespective of what it's fed:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">condition-key</span> <span class="p">[</span><span class="nv">path-dict</span> <span class="nv">k</span><span class="p">]</span>
<span class="p">(</span><span class="nf">cond</span>
<span class="p">(</span><span class="nf">sequential?</span> <span class="nv">k</span><span class="p">)</span> <span class="nv">k</span>
<span class="p">(</span><span class="nf">path-dict</span> <span class="nv">k</span><span class="p">)</span> <span class="p">(</span><span class="nf">path-dict</span> <span class="nv">k</span><span class="p">)</span>
<span class="ss">:else</span> <span class="p">[</span><span class="nv">k</span><span class="p">]))</span>
</code></pre></div>
<p>With this, <code>ph-get</code> is essentially trivial.</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">ph-get</span> <span class="p">[</span><span class="nv">path-dict</span> <span class="nv">o</span> <span class="nv">k</span><span class="p">]</span> <span class="p">(</span><span class="nf">ph-get-in</span> <span class="nv">o</span> <span class="p">(</span><span class="nf">condition-key</span> <span class="nv">path-dict</span> <span class="nv">k</span><span class="p">)))</span>
</code></pre></div>
<p>Now <code>(ph-get path-dict req-map :spot-price)</code> will return the unadulterated field, while
<code>(ph-get path-dict req-map :udata)</code> does a fancy base-64 transformation.</p>
<p>Since <code>clojure.core/assoc</code> can take an arbitrary number of interleaved key-value arguments, we should be able to do so as well.
We'll knead the arguments into a list of key-value pairs and reduce over <code>ph-assoc-in</code> to insert the
values sequentially.</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">ph-assoc</span> <span class="p">[</span><span class="nv">paths</span> <span class="nv">m</span> <span class="o">&</span> <span class="nv">kvs</span><span class="p">]</span>
<span class="p">(</span><span class="nb">reduce </span><span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">m</span> <span class="p">[</span><span class="nv">k</span> <span class="nv">v</span><span class="p">]]</span>
<span class="p">(</span><span class="nf">ph-assoc-in</span> <span class="nv">m</span> <span class="p">(</span><span class="nf">condition-key</span> <span class="nv">paths</span> <span class="nv">k</span><span class="p">)</span> <span class="nv">v</span><span class="p">))</span> <span class="nv">m</span> <span class="p">(</span><span class="nf">partition</span> <span class="mi">2</span> <span class="nv">kvs</span><span class="p">)))</span>
</code></pre></div>
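<p>Here is how the aliases might look in use. This is only a sketch: the base64 helpers are assumed definitions on top of <code>java.util.Base64</code>, the request map and its values are invented, and the dictionary is a trimmed copy of the one above. The other functions are repeated so the snippet stands alone.</p>
<div class="highlight"><pre><span></span><code>(import 'java.util.Base64)

;; Assumed base64 helpers
(defn s->b64 [^String s] (.encodeToString (Base64/getEncoder) (.getBytes s "UTF-8")))
(defn b64->s [^String b] (String. (.decode (Base64/getDecoder) b) "UTF-8"))

(defn ph-assoc-in [m [k & ks] v]
  (cond
    (vector? k) (let [[f-incoming f-outgoing] k]
                  (f-incoming (if-not ks v (ph-assoc-in (f-outgoing m) ks v))))
    ks          (assoc m k (ph-assoc-in (get m k) ks v))
    k           (assoc m k v)
    :else       v))

(defn ph-get-in [m [k & ks]]
  (cond
    (vector? k) (let [[_ f-outgoing] k] (ph-get-in (f-outgoing m) ks))
    k           (ph-get-in (get m k) ks)
    :else       m))

(defn condition-key [path-dict k]
  (cond
    (sequential? k) k               ; already a full path
    (path-dict k)   (path-dict k)   ; an alias
    :else           [k]))           ; an ordinary key

(defn ph-get [path-dict o k] (ph-get-in o (condition-key path-dict k)))

(defn ph-assoc [paths m & kvs]
  (reduce (fn [m [k v]] (ph-assoc-in m (condition-key paths k) v)) m (partition 2 kvs)))

(def path-dict
  {:zone  [:launch-specification :placement :availability-zone]
   :udata [:launch-specification :user-data [s->b64 b64->s]]})

;; Hypothetical, heavily trimmed request map
(def req-map
  {:spot-price "0.01"
   :launch-specification {:placement {:availability-zone "us-east-1a"}
                          :user-data (s->b64 "hello sailor")}})

(def req2 (ph-assoc path-dict req-map
                    :zone       "us-east-1b"          ; alias
                    :spot-price "0.02"                ; ordinary key
                    :udata      "lord love a duck"))  ; alias with a transformation

(ph-get path-dict req2 :zone)                     ;; => "us-east-1b"
(ph-get path-dict req2 :spot-price)               ;; => "0.02"
(ph-get path-dict req2 :udata)                    ;; => "lord love a duck"
(get-in req2 [:launch-specification :user-data])  ;; => "bG9yZCBsb3ZlIGEgZHVjaw=="
</code></pre></div>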
<h4>The lensy guff</h4>
<p>The <a href="http://eed3si9n.com/learning-scalaz/Lens.html">scalaz tutorial</a> shows how to create a <code>turtleX</code> lens, which focuses
directly on terrapinic abscissae:</p>
<div class="highlight"><pre><span></span><code><span class="n">scala</span><span class="o">></span> <span class="n">turtleX</span><span class="p">.</span><span class="n">set</span><span class="p">(</span><span class="n">t0</span><span class="p">,</span> <span class="mf">5.0</span><span class="p">)</span>
<span class="n">res17</span><span class="p">:</span> <span class="n">scalaz</span><span class="p">.</span><span class="nc">Id</span><span class="p">.</span><span class="nc">Id</span><span class="p">[</span><span class="nc">Turtle</span><span class="p">]</span> <span class="o">=</span> <span class="nc">Turtle</span><span class="p">(</span><span class="nc">Point</span><span class="p">(</span><span class="mf">5.0</span><span class="p">,</span><span class="mf">3.0</span><span class="p">),</span><span class="mf">0.0</span><span class="p">,</span><span class="nc">Color</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="o">-</span><span class="mi">1</span><span class="p">))</span>
</code></pre></div>
<p>Take a look, if you like, at how <code>turtleX</code> is built.
To my taste, it's a bit lengthy and boilerplated,<sup id="fnref:bp"><a class="footnote-ref" href="#fn:bp">3</a></sup> though admittedly you get lots of type goodness in the bargain.</p>
<p>I'd like to be able to make the lens with a one-liner:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="k">def </span><span class="nv">txget</span> <span class="p">(</span><span class="nf">mk-ph-get</span> <span class="p">[</span><span class="ss">:position</span> <span class="ss">:x</span><span class="p">]))</span>
</code></pre></div>
<p>and then use (or pass about) <code>txget</code> as a normal function <code>(txget t0)</code> for retrieving
the x coordinate of the position.</p>
<p>With what we have so far, the necessary machinery is a <em>two</em>-liner:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">mk-ph-get</span> <span class="p">[</span><span class="nv">ks</span><span class="p">]</span> <span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">o</span><span class="p">]</span> <span class="p">(</span><span class="nf">ph-get-in</span> <span class="nv">o</span> <span class="nv">ks</span><span class="p">)))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">mk-ph-set</span> <span class="p">[</span><span class="nv">ks</span><span class="p">]</span> <span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">o</span> <span class="nv">v</span><span class="p">]</span> <span class="p">(</span><span class="nf">ph-assoc-in</span> <span class="nv">o</span> <span class="nv">ks</span> <span class="nv">v</span><span class="p">)))</span>
</code></pre></div>
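<p>A short sketch of the pair in action. The <code>Point</code> and <code>Turtle</code> records are my guess at a Clojure rendering of the tutorial's case classes (colour omitted for brevity), and <code>txset</code> is just a name I made up for the setter; the path functions are repeated so the snippet runs without anything else loaded.</p>
<div class="highlight"><pre><span></span><code>;; Hypothetical records standing in for the Scala case classes
(defrecord Point [x y])
(defrecord Turtle [position heading])

(defn ph-assoc-in [m [k & ks] v]
  (cond
    (vector? k) (let [[f-incoming f-outgoing] k]
                  (f-incoming (if-not ks v (ph-assoc-in (f-outgoing m) ks v))))
    ks          (assoc m k (ph-assoc-in (get m k) ks v))
    k           (assoc m k v)
    :else       v))

(defn ph-get-in [m [k & ks]]
  (cond
    (vector? k) (let [[_ f-outgoing] k] (ph-get-in (f-outgoing m) ks))
    k           (ph-get-in (get m k) ks)
    :else       m))

(defn mk-ph-get [ks] (fn [o] (ph-get-in o ks)))
(defn mk-ph-set [ks] (fn [o v] (ph-assoc-in o ks v)))

(def t0    (->Turtle (->Point 2.0 3.0) 0.0))
(def txget (mk-ph-get [:position :x]))
(def txset (mk-ph-set [:position :x]))

(txget t0)              ;; => 2.0
(txget (txset t0 5.0))  ;; => 5.0
(txget t0)              ;; => 2.0, since t0 itself is never mutated
</code></pre></div>
<p>Because records support <code>get</code> and <code>assoc</code> with keyword keys, the setter hands back a <code>Turtle</code> rather than a plain map.</p>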
<p>What about modification in place? To start, imagine a function to turn the turtle to the right by
some amount. Our user would provide a function that takes the original angle and the amount to turn and
returns the new angle,<sup id="fnref:radians"><a class="footnote-ref" href="#fn:radians">4</a></sup></p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="kd">defn </span><span class="nv">new-heading</span> <span class="p">[</span><span class="nv">old-heading</span> <span class="nv">amt</span><span class="p">]</span> <span class="p">(</span><span class="nf">mod</span> <span class="p">(</span><span class="nb">- </span><span class="nv">old-heading</span> <span class="nv">amt</span><span class="p">)</span> <span class="p">(</span><span class="nb">* </span><span class="nv">Math/PI</span> <span class="mi">2</span><span class="p">)))</span>
</code></pre></div>
<p>and we'll provide <code>mk-ph-mod</code> such that</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="k">def </span><span class="nv">turn</span> <span class="p">(</span><span class="nf">mk-ph-mod</span> <span class="nv">new-heading</span> <span class="p">[</span><span class="ss">:heading</span><span class="p">]))</span>
</code></pre></div>
<p>so that</p>
<div class="highlight"><pre><span></span><code><span class="nv">acyclic.utils.pinhole></span> <span class="p">(</span><span class="nf">turn</span> <span class="nv">t</span> <span class="p">(</span><span class="nb">/ </span><span class="nv">Math/PI</span> <span class="mi">2</span><span class="p">))</span>
<span class="nv">m=</span> <span class="o">#</span><span class="nv">user.Turtle</span><span class="p">{</span><span class="ss">:position</span> <span class="o">#</span><span class="nv">user.Point</span><span class="p">{</span><span class="ss">:x</span> <span class="mf">1.0</span>, <span class="ss">:y</span> <span class="mf">2.0</span><span class="p">}</span>, <span class="ss">:heading</span> <span class="mf">0.7853981633974483</span>, <span class="ss">:color</span> <span class="o">#</span><span class="nv">user.Color</span><span class="p">{</span><span class="ss">:r</span> <span class="mi">255</span>, <span class="ss">:g</span> <span class="mi">0</span>, <span class="ss">:b</span> <span class="mi">0</span><span class="p">}}</span> <span class="nv">k=</span> <span class="ss">:heading</span>
<span class="o">#</span><span class="nv">user.Turtle</span><span class="p">{</span><span class="ss">:position</span> <span class="o">#</span><span class="nv">user.Point</span><span class="p">{</span><span class="ss">:x</span> <span class="mf">1.0</span>, <span class="ss">:y</span> <span class="mf">2.0</span><span class="p">}</span>, <span class="ss">:heading</span> <span class="mf">5.497787143782138</span>, <span class="ss">:color</span> <span class="o">#</span><span class="nv">user.Color</span><span class="p">{</span><span class="ss">:r</span> <span class="mi">255</span>, <span class="ss">:g</span> <span class="mi">0</span>, <span class="ss">:b</span> <span class="mi">0</span><span class="p">}}</span>
</code></pre></div>
<p>Cake city. We <code>ph-get-in</code> the heading, apply their function to it, and <code>ph-assoc-in</code> it back:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">mk-ph-mod-v1</span> <span class="p">[</span><span class="nv">f</span> <span class="nv">arg-path</span><span class="p">]</span>
<span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">o</span> <span class="o">&</span> <span class="nv">more-args</span><span class="p">]</span> <span class="p">(</span><span class="nf">ph-assoc-in</span> <span class="nv">o</span> <span class="nv">arg-path</span> <span class="p">(</span><span class="nb">apply </span><span class="nv">f</span> <span class="p">(</span><span class="nf">ph-get-in</span> <span class="nv">o</span> <span class="nv">arg-path</span><span class="p">)</span> <span class="nv">more-args</span><span class="p">))))</span>
</code></pre></div>
<p>But we can do better...</p>
<h4>Turtles all the way</h4>
<p>The scalaz lens tutorial ends with a lens for moving the turtle forward by some amount in whatever
direction it's currently pointed, so both the x and y coordinates change:</p>
<div class="highlight"><pre><span></span><code><span class="n">scala</span><span class="o">></span> <span class="n">forward</span><span class="p">(</span><span class="mf">10.0</span><span class="p">)(</span><span class="n">t0</span><span class="p">)</span>
<span class="n">res31</span><span class="p">:</span> <span class="p">(</span><span class="nc">Turtle</span><span class="p">,</span> <span class="p">(</span><span class="nc">Double</span><span class="p">,</span> <span class="nc">Double</span><span class="p">))</span> <span class="o">=</span> <span class="p">(</span><span class="nc">Turtle</span><span class="p">(</span><span class="nc">Point</span><span class="p">(</span><span class="mf">12.0</span><span class="p">,</span><span class="mf">3.0</span><span class="p">),</span><span class="mf">0.0</span><span class="p">,</span><span class="nc">Color</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="o">-</span><span class="mi">1</span><span class="p">)),(</span><span class="mf">12.0</span><span class="p">,</span><span class="mf">3.0</span><span class="p">))</span>
</code></pre></div>
<p>Let's see if we can do the same, ideally in an intuitive, Clojurey way:</p>
<p>We'll tweak <code>mk-ph-mod</code> a bit, to accept a user function that consumes and returns
an arbitrary number of path-specified values. Our turtle enthusiast will provide the
trigonometry in the form of a function that now returns a vector of values:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">movexy</span> <span class="p">[</span><span class="nv">x</span> <span class="nv">y</span> <span class="nv">dir</span> <span class="nv">dist</span><span class="p">]</span> <span class="p">[(</span><span class="nb">+ </span><span class="nv">x</span> <span class="p">(</span><span class="nb">* </span><span class="nv">dist</span> <span class="p">(</span><span class="nf">Math/cos</span> <span class="nv">dir</span><span class="p">)))</span>
<span class="p">(</span><span class="nb">+ </span><span class="nv">y</span> <span class="p">(</span><span class="nb">* </span><span class="nv">dist</span> <span class="p">(</span><span class="nf">Math/sin</span> <span class="nv">dir</span><span class="p">)))])</span>
</code></pre></div>
<p>And we'd like to be able to use it like this:</p>
<div class="highlight"><pre><span></span><code><span class="nv">user></span> <span class="p">(</span><span class="k">def </span><span class="nv">turtle-forward</span> <span class="p">(</span><span class="nf">mk-ph-mod</span> <span class="nv">movexy</span> <span class="p">[</span><span class="ss">:position</span> <span class="ss">:x</span><span class="p">]</span> <span class="p">[</span><span class="ss">:position</span> <span class="ss">:y</span><span class="p">]</span> <span class="p">[</span><span class="ss">:heading</span><span class="p">]))</span>
<span class="nv">user></span> <span class="p">(</span><span class="nf">turtle-forward</span> <span class="nv">t</span> <span class="mi">100</span><span class="nv">.</span><span class="p">)</span>
<span class="o">#</span><span class="nv">user.Turtle</span><span class="p">{</span><span class="ss">:position</span> <span class="o">#</span><span class="nv">user.Point</span><span class="p">{</span><span class="ss">:x</span> <span class="mf">71.71067811865476</span>, <span class="ss">:y</span> <span class="mf">72.71067811865474</span><span class="p">}</span>, <span class="ss">:heading</span> <span class="mf">0.7853981633974483</span>, <span class="ss">:color</span> <span class="o">#</span><span class="nv">user.Color</span><span class="p">{</span><span class="ss">:r</span> <span class="mi">255</span>, <span class="ss">:g</span> <span class="mi">0</span>, <span class="ss">:b</span> <span class="mi">0</span><span class="p">}}</span>
</code></pre></div>
<p>All we need is <code>mk-ph-mod</code>, which will</p>
<ol>
<li>
<p>return a function of the object and maybe some run-time arguments,</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">mk-ph-mod</span> <span class="p">[</span><span class="nv">f</span> <span class="o">&</span> <span class="nv">arg-paths</span><span class="p">]</span>
<span class="p">(</span><span class="kd">fn </span><span class="p">[</span><span class="nv">o</span> <span class="o">&</span> <span class="nv">more-args</span><span class="p">]</span>
</code></pre></div>
</li>
<li>
<p>that extracts values for all the paths specified as the 2nd and subsequent arguments to <code>mk-ph-mod</code>,</p>
<div class="highlight"><pre><span></span><code> (let [args (map (partial ph-get-in o) arg-paths)
</code></pre></div>
</li>
<li>
<p>passes them to the user function, expecting it to return a vector of new values for one or more of those paths,</p>
<div class="highlight"><pre><span></span><code> vs (apply f (concat args more-args))
</code></pre></div>
</li>
<li>
<p>creates path/value pairs for all the values returned, and</p>
<div class="highlight"><pre><span></span><code> kvs (map vector arg-paths vs)]
</code></pre></div>
</li>
<li>
<p>reduces over the pairs, tucking them into the structure with <code>ph-assoc-in</code>.</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nb">reduce </span><span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">m</span> <span class="p">[</span><span class="nv">k</span> <span class="nv">v</span><span class="p">]]</span> <span class="p">(</span><span class="nf">ph-assoc-in</span> <span class="nv">m</span> <span class="nv">k</span> <span class="nv">v</span><span class="p">))</span> <span class="nv">o</span> <span class="nv">kvs</span><span class="p">))))</span>
</code></pre></div>
</li>
</ol>
<p>All together:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">mk-ph-mod</span> <span class="p">[</span><span class="nv">f</span> <span class="o">&</span> <span class="nv">arg-paths</span><span class="p">]</span>
<span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">o</span> <span class="o">&</span> <span class="nv">more-args</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">args</span> <span class="p">(</span><span class="nb">map </span><span class="p">(</span><span class="nb">partial </span><span class="nv">ph-get-in</span> <span class="nv">o</span><span class="p">)</span> <span class="nv">arg-paths</span><span class="p">)</span>
<span class="nv">vs</span> <span class="p">(</span><span class="nb">apply </span><span class="nv">f</span> <span class="p">(</span><span class="nb">concat </span><span class="nv">args</span> <span class="nv">more-args</span><span class="p">))</span>
<span class="nv">kvs</span> <span class="p">(</span><span class="nb">map vector </span><span class="nv">arg-paths</span> <span class="nv">vs</span><span class="p">)]</span>
<span class="p">(</span><span class="nb">reduce </span><span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">m</span> <span class="p">[</span><span class="nv">k</span> <span class="nv">v</span><span class="p">]]</span> <span class="p">(</span><span class="nf">ph-assoc-in</span> <span class="nv">m</span> <span class="nv">k</span> <span class="nv">v</span><span class="p">))</span> <span class="nv">o</span> <span class="nv">kvs</span><span class="p">))))</span>
</code></pre></div>
<p>Et voilà.</p>
<p>We can now trivially build functions that do nearly anything to a structure of nested <code>Associative</code>s.</p>
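<p>One wrinkle worth spelling out: the multi-path <code>mk-ph-mod</code> expects the user function to return a <em>sequence</em> of values, so a single-value function like <code>new-heading</code> needs wrapping. A sketch, with hypothetical <code>Point</code>/<code>Turtle</code> records (colour omitted) and the earlier definitions repeated so that it runs on its own:</p>
<div class="highlight"><pre><span></span><code>;; Hypothetical records
(defrecord Point [x y])
(defrecord Turtle [position heading])

(defn ph-assoc-in [m [k & ks] v]
  (cond
    (vector? k) (let [[f-incoming f-outgoing] k]
                  (f-incoming (if-not ks v (ph-assoc-in (f-outgoing m) ks v))))
    ks          (assoc m k (ph-assoc-in (get m k) ks v))
    k           (assoc m k v)
    :else       v))

(defn ph-get-in [m [k & ks]]
  (cond
    (vector? k) (let [[_ f-outgoing] k] (ph-get-in (f-outgoing m) ks))
    k           (ph-get-in (get m k) ks)
    :else       m))

(defn mk-ph-mod [f & arg-paths]
  (fn [o & more-args]
    (let [args (map (partial ph-get-in o) arg-paths)
          vs   (apply f (concat args more-args))
          kvs  (map vector arg-paths vs)]
      (reduce (fn [m [k v]] (ph-assoc-in m k v)) o kvs))))

(defn new-heading [old-heading amt] (mod (- old-heading amt) (* Math/PI 2)))

;; new-heading returns a bare number, so wrap the result in a one-element vector
(def turn (mk-ph-mod (comp vector new-heading) [:heading]))

(def t (->Turtle (->Point 1.0 2.0) (/ Math/PI 4)))

(:heading (turn t (/ Math/PI 2)))  ; roughly 5.4978, i.e. 7π/4
</code></pre></div>
<p>The same mechanism explains why <code>movexy</code> can read three paths but write only two: <code>(map vector arg-paths vs)</code> stops at the shorter sequence, so the trailing <code>[:heading]</code> path is consulted but never reassigned.</p>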
<h4>Idiomatic Lenses for Clojure</h4>
<p>I'm anticipating complaints that these aren't <em>real lenses</em>, because
those would be defined by the Haskell and Scalaz varieties that came first,
and a huge piece of what they do is ensure type correctness. The "pinhole" sobriquet
is supposed to preempt such whiners by proudly owning the more primitive technology.</p>
<p>Another possible complaint is that Clojure already has <code>assoc-in</code> and <code>get-in</code>,
and it's true that they are quite sufficient for many cases, and if they're not lenses, then
neither is a slightly more complicated variety.</p>
<p>Nonetheless, whatever it is that we have here is useful for the thing
that lenses are useful for, irrespective of language: focusing on data
deep within a structure. What's more, thanks to the expressive power
of dynamic Clojure, and higher order functions, these lenses are not
just simple to use but simple to create.</p>
<p>The code discussed
all lives <a href="https://github.com/pnf/clj-utils">on github</a> along with some other utilities I'm still hacking at.
At some point, they'll make their way to clojars.</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">ph-get-in</span> <span class="nv">m</span> <span class="nv">path</span><span class="p">)</span>
<span class="p">(</span><span class="nf">ph-assoc-in</span> <span class="nv">m</span> <span class="nb">path </span><span class="nv">v</span><span class="p">)</span>
<span class="p">(</span><span class="nf">ph-assoc</span> <span class="nv">paths-dict</span> <span class="nv">m</span> <span class="nv">path1</span> <span class="nv">v1</span> <span class="nv">path2</span> <span class="nv">v2</span> <span class="nv">...</span><span class="p">)</span>
<span class="p">(</span><span class="nf">ph-get</span> <span class="nv">path-dict</span> <span class="nv">m</span> <span class="nv">path-or-key-or-alias</span><span class="p">)</span>
<span class="p">(</span><span class="nf">ph-assoc</span> <span class="nv">path-dict</span> <span class="nv">m</span> <span class="nv">path-or-key-or-alias</span> <span class="nv">v</span><span class="p">)</span>
<span class="p">(</span><span class="nf">mk-ph-set</span> <span class="nv">path</span><span class="p">)</span> <span class="nb">or </span> <span class="p">(</span><span class="nf">mk-ph-set</span> <span class="nv">path-dict</span> <span class="nv">path-or-key-or-alias</span><span class="p">)</span>
<span class="p">(</span><span class="nf">mk-ph-get</span> <span class="nv">path</span><span class="p">)</span> <span class="nb">or </span> <span class="p">(</span><span class="nf">mk-ph-get</span> <span class="nv">path-dict</span> <span class="nv">path-or-key-or-alias</span><span class="p">)</span>
<span class="p">(</span><span class="nf">mk-ph-mod</span> <span class="nv">f</span> <span class="nv">arg-path1</span> <span class="nv">arg-path2</span> <span class="nv">...</span><span class="p">)</span> <span class="nb">or </span> <span class="p">(</span><span class="nf">mk-ph-mod</span> <span class="nv">path-dict</span> <span class="nv">f</span> <span class="nv">arg-key1</span> <span class="nv">arg-key2</span> <span class="nv">...</span><span class="p">)</span>
</code></pre></div>
<div class="footnote">
<hr>
<ol>
<li id="fn:german">
<p>I don't know German. <a class="footnote-backref" href="#fnref:german" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:base64">
<p>The functions are defined using <code>clojure.data.codec.base64</code> as <code>(defn s->b64 [s] (String. (b64/encode (.getBytes s))))</code> and <code>(defn b64->s [s] (String. (b64/decode (.getBytes s))))</code>. <a class="footnote-backref" href="#fnref:base64" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:bp">
<p>An etymologically naive back-formation, but I like it. <a class="footnote-backref" href="#fnref:bp" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
<li id="fn:radians">
<p>Subtraction because, by convention, zero radians points to 3:00, with increases moving clockwise. Modulus, because the circle is 2π around. <a class="footnote-backref" href="#fnref:radians" title="Jump back to footnote 4 in the text">↩</a></p>
</li>
</ol>
</div>'Ducers Wild -- a concise guide to the menagerie2014-09-14T00:00:00-04:002014-09-14T00:00:00-04:00Peter Fraenkeltag:blog.podsnap.com,2014-09-14:/ducers.html<h3>TL;DR</h3>
<p>It's not too long, but, to summarize the summary, if you read
Rich Hickey's 2014 blog post on
<a href="http://blog.cognitect.com/blog/2014/8/6/transducers-are-coming">transducers</a>
first, his 2012 post on
<a href="http://clojure.com/blog/2012/05/08/reducers-a-library-and-model-for-collection-processing.html">reducers</a>
will be easier to understand.</p>
<h3>Brief Definitions</h3>
<p>Herewith, all in one place, are Clojuresque definitions of:</p>
<ul>
<li>reducible</li>
<li>reducing function</li>
<li>transducer</li>
<li>reducer</li>
<li>folder</li>
<li>decomplected</li>
</ul>
<p>Longer elaborations of these definitions follow in the subsequent section.</p>
<h4>reducing function</h4>
<p>Anything that can be used as the <em>first</em> argument of the <code>reduce</code> function, e.g.
<code>+</code> or <code>conj</code>. Generally, it's a binary function that returns something of
the type of its first argument, which is supposed to be a kind of accumulation.</p>
<h4>reducible</h4>
<p>Anything that can be used as the <em>second</em> argument of the <code>reduce</code> function.
For the purposes of the next few column inches, it's a collection - a vector, list, etc.</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nb">reduce </span><span class="nv">reducing-function</span> <span class="nv">reducible</span><span class="p">)</span>
</code></pre></div>
<h4>transducer</h4>
<p>A function that turns one <strong>reducing function</strong> into another, by modifying the
second argument or invoking the input function a number of times other than one.</p>
<h4>reducer</h4>
<p>I lied a little when I defined <strong>reducible</strong>; actually, it's anything that implements
the <code>CollReduce</code> protocol, whose <code>coll-reduce</code> method is invoked by <code>reduce</code>.
Thus, the exact behavior of <code>reduce</code> is a property of the specific thing being reduced.
A <strong>reducer</strong> is just a special kind of <strong>reducible</strong> that has a <strong>transducer</strong>
embedded in it, ready to be invoked whenever <code>reduce</code> is finally called.</p>
<h4>folder</h4>
<p>A <strong>reducible</strong> that also implements <code>CollFold</code>, enabling you to use
<code>fold</code> in lieu of <code>reduce</code>; the two functions yield
the same answer, but <code>fold</code> can be parallelized.</p>
<h4>decomplected</h4>
<p>Basically, a synonym for "separated," as in "separation of concerns."</p>
<h3>Elaborations</h3>
<h4>Transducer</h4>
<p>First off, don't be misled by the
<a href="https://en.wikipedia.org/wiki/Transducer">real meaning</a>, which isn't
even a good metaphor for the way we're using it here.
For my sins, I
<a href="http://newcatalog.library.cornell.edu/catalog/1766822">know a little</a> about
transducers.</p>
<p>Here's a transducer that doubles all incoming elements before passing them
to the input reducing function:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">double1</span> <span class="p">[</span><span class="nv">reduction-function</span><span class="p">]</span>
<span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">result</span> <span class="nv">input</span><span class="p">]</span>
<span class="p">(</span><span class="nf">reduction-function</span> <span class="nv">result</span> <span class="p">(</span><span class="nb">* </span><span class="mi">2</span> <span class="nv">input</span><span class="p">))))</span>
</code></pre></div>
<p>So that <code>(reduce (double1 +) 0 c)</code> would produce $2\sum_i c_i$
while <code>(reduce (double1 *) 1 c)</code> would return
$\prod^n_i 2 c_i = 2^n \prod^n_i c_i$.
And here's one that doubles incoming elements in a different way, by repeating them twice:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">double2</span> <span class="p">[</span><span class="nv">reduction-function</span><span class="p">]</span>
<span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">result</span> <span class="nv">input</span><span class="p">]</span>
<span class="p">(</span><span class="nf">reduction-function</span> <span class="p">(</span><span class="nf">reduction-function</span> <span class="nv">result</span> <span class="nv">input</span><span class="p">)</span> <span class="nv">input</span><span class="p">)))</span>
</code></pre></div>
<p>So that <code>(reduce (double2 +) 0 c)</code> would return
$\sum_i (c_i + c_i) = 2 \sum_i c_i$, the same as with <code>double1</code>,
but <code>(reduce (double2 *) 1 c)</code> would return
$\prod_i c_i c_i = (\prod_i c_i)^2$.</p>
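<p>To make the formulas tangible, here is a small sketch that plugs in the collection <code>[1 2 3]</code> (an arbitrary choice of mine). The definitions are repeated so that it can be pasted into a REPL as is.</p>
<div class="highlight"><pre><span></span><code>(defn double1 [reduction-function]
  (fn [result input]
    (reduction-function result (* 2 input))))

(defn double2 [reduction-function]
  (fn [result input]
    (reduction-function (reduction-function result input) input)))

(reduce (double1 +) 0 [1 2 3])  ; 2 + 4 + 6, returns 12
(reduce (double1 *) 1 [1 2 3])  ; 2 * 4 * 6, returns 48
(reduce (double2 +) 0 [1 2 3])  ; 1 + 1 + 2 + 2 + 3 + 3, returns 12
(reduce (double2 *) 1 [1 2 3])  ; (1 * 2 * 3) squared, returns 36
</code></pre></div>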
<p>Three cool things to note:</p>
<ol>
<li>The things can be composed, e.g. <code>(reduce ((comp double1 double2) +) 0 c)</code>
gives you
$\sum_i 2(c_i +c_i) = 4\sum_i c_i$.</li>
<li>
<p>The needs of <code>reduce</code> are a superset of those of <code>map</code>.
You could create a kind of <code>map</code> that takes reducing functions
instead of unary functions, and, unlike <code>reduce</code>, it can be lazy:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="n">defn</span> <span class="n">process</span> <span class="p">[</span><span class="n">xfn</span> <span class="n">c</span><span class="p">]</span>
<span class="p">(</span><span class="n">lazy</span><span class="o">-</span><span class="n">seq</span> <span class="p">(</span><span class="n">when</span><span class="o">-</span><span class="n">let</span> <span class="p">[</span><span class="n">s</span> <span class="p">(</span><span class="n">seq</span> <span class="n">c</span><span class="p">)]</span>
<span class="p">(</span><span class="n">concat</span> <span class="p">((</span><span class="n">xfn</span> <span class="err">#</span><span class="p">(</span><span class="n">concat</span> <span class="o">%</span><span class="mi">1</span> <span class="p">(</span><span class="n">list</span> <span class="o">%</span><span class="mi">2</span><span class="p">)))</span> <span class="err">'</span><span class="p">()</span> <span class="p">(</span><span class="n">first</span> <span class="n">s</span><span class="p">))</span>
<span class="p">(</span><span class="n">process</span> <span class="n">xfn</span> <span class="p">(</span><span class="n">rest</span> <span class="n">s</span><span class="p">))))))</span>
<span class="p">;;</span> <span class="p">(</span><span class="n">process</span> <span class="n">double1</span> <span class="p">[</span><span class="mi">1</span> <span class="mi">2</span> <span class="mi">3</span><span class="p">])</span> <span class="p">;</span> <span class="n">yields</span> <span class="p">(</span><span class="mi">2</span> <span class="mi">4</span> <span class="mi">6</span><span class="p">)</span>
<span class="p">;;</span> <span class="p">(</span><span class="n">process</span> <span class="n">double2</span> <span class="p">[</span><span class="mi">1</span> <span class="mi">2</span> <span class="mi">3</span><span class="p">])</span> <span class="p">;</span> <span class="n">yields</span> <span class="p">(</span><span class="mi">1</span> <span class="mi">1</span> <span class="mi">2</span> <span class="mi">2</span> <span class="mi">3</span> <span class="mi">3</span><span class="p">)</span>
<span class="p">;;</span> <span class="p">(</span><span class="n">take</span> <span class="mi">10</span> <span class="p">(</span><span class="n">process</span> <span class="n">double2</span> <span class="p">(</span><span class="n">range</span><span class="p">)))</span> <span class="p">;;</span> <span class="n">yields</span> <span class="p">(</span><span class="mi">0</span> <span class="mi">0</span> <span class="mi">1</span> <span class="mi">1</span> <span class="mi">2</span> <span class="mi">2</span> <span class="mi">3</span> <span class="mi">3</span> <span class="mi">4</span> <span class="mi">4</span><span class="p">)</span>
</code></pre></div>
<p>In fact, the <code>sequence</code> function in Clojure 1.7 will have a two-argument
form</p>
<div class="highlight"><pre><span></span><code><span class="p">([</span><span class="n">xform</span> <span class="n">coll</span><span class="p">]</span>
<span class="p">(</span><span class="n">clojure</span><span class="p">.</span><span class="n">lang</span><span class="p">.</span><span class="n">LazyTransformer</span><span class="o">/</span><span class="n">create</span> <span class="n">xform</span> <span class="n">coll</span><span class="p">))</span>
</code></pre></div>
<p>that does the same thing as my <code>process</code> but much more efficiently.</p>
</li>
<li>
<p>As long as the transducer doesn't diddle with <code>result</code>, it
will be totally oblivious to how it is called, from what kind of a
collection, or even if it's a traditional collection at all. The
earliest compelling examples of this generality are
<code>core.async/chan</code>nels, which, in their latest incarnation, can
take transducer arguments to process data that flows through them.</p>
</li>
</ol>
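<p>One caveat, hedged accordingly: as far as I can tell, the transducer machinery that shipped in Clojure 1.7 also calls a zero-argument (init) and a one-argument (completion) arity, and it expects a transducer to stop once the downstream function returns a <code>reduced</code> value. The two-argument versions above are fine with plain <code>reduce</code>, but a sketch that also plays nicely with <code>transduce</code>, <code>sequence</code> and <code>into</code> might look like this. The names <code>double1-xf</code> and <code>double2-xf</code> are mine, not part of any library.</p>
<div class="highlight"><pre><span></span><code>(defn double1-xf [rf]
  (fn
    ([] (rf))                                   ; init: delegate downstream
    ([result] (rf result))                      ; completion: delegate downstream
    ([result input] (rf result (* 2 input)))))  ; step: double the element

(defn double2-xf [rf]
  (fn
    ([] (rf))
    ([result] (rf result))
    ([result input]
     (let [r (rf result input)]
       ;; Do not feed more input once downstream has signalled early termination.
       (if (reduced? r) r (rf r input))))))

(transduce double1-xf + 0 [1 2 3])                    ; returns 12
(sequence double2-xf [1 2 3])                         ; returns (1 1 2 2 3 3)
(into [] (comp double1-xf double2-xf) [1 2 3])        ; returns [2 2 4 4 6 6]
(transduce (comp double1-xf double2-xf) + 0 [1 2 3])  ; returns 24
(into [] (comp double2-xf (take 3)) [1 2 3])          ; returns [1 1 2]
</code></pre></div>
<p>Note that <code>comp</code> reads left to right in terms of data flow here: the element is doubled by <code>double1-xf</code> first and then repeated by <code>double2-xf</code>.</p>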
<h4>reducer</h4>
<p>For example</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="k">def </span><span class="nv">fatrange</span> <span class="p">(</span><span class="nf">reducer</span> <span class="p">(</span><span class="nb">range </span><span class="mi">10</span><span class="p">)</span> <span class="nv">double1</span><span class="p">))</span>
</code></pre></div>
<p>is a reducible with a multiplication <em>hiding out</em>, waiting to be invoked
when you <code>(reduce + fatrange)</code>. Since you can compose any number of
transducers, and they'll all get invoked at the time of reduction,
it's a lot more efficient than the conventional</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nb">reduce + </span><span class="p">(</span><span class="nb">map </span><span class="nv">f1</span> <span class="p">(</span><span class="nb">map </span><span class="nv">f2</span> <span class="p">(</span><span class="nb">map </span><span class="nv">f3</span> <span class="nv">...</span><span class="p">))))</span>
</code></pre></div>
<p>which creates a vast apparatus of lazy
sequences feeding each other. Moreover, of course, the transducers
can do more than just modify individual elements, as illustrated above
with <code>double2</code>.</p>
<p>The <code>core.reducers</code> namespace ships with a bevy of functions like <code>r/map</code>,
<code>r/filter</code> - all convenience functions creating reducers with specific
transducers already attached.</p>
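<p>Here is a short sketch of the above, assuming the namespace is required under the usual <code>r</code> alias. Nothing is multiplied or filtered until <code>reduce</code> (or something built on it, like <code>into</code>) finally runs.</p>
<div class="highlight"><pre><span></span><code>(require '[clojure.core.reducers :as r])

(defn double1 [reduction-function]
  (fn [result input]
    (reduction-function result (* 2 input))))

;; A reducible with the doubling transformation embedded in it.
(def fatrange (r/reducer (range 10) double1))
(reduce + fatrange)                      ; returns 90

;; The convenience functions build the same kind of thing.
(reduce + (r/map #(* 2 %) (range 10)))   ; returns 90

;; Several stages, one pass, no intermediate lazy sequences.
(into [] (r/filter even? (r/map inc (range 10))))
;; returns [2 4 6 8 10]
</code></pre></div>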
<h4>folder</h4>
<p>The differences between <code>reduce</code> and <code>fold</code> are that the latter:</p>
<ol>
<li>assumes that its <em>reducing function</em> is associative, so that it can</li>
<li>attack the problem in parallel by repeatedly dividing the collection in two, following
the approach famously <a href="http://vimeo.com/6624203">outlined by Guy Steele</a> and</li>
<li>using the
magic of Java's <a href="http://docs.oracle.com/javase/tutorial/essential/concurrency/forkjoin.html">fork/join framework</a>
to perform this work efficiently with a finite number of threads.</li>
</ol>
<p>If you call <code>reduce</code> on a <code>folder</code>, or <code>fold</code> on a <code>reducer</code>, the reduction
proceeds sequentially, as usual. Parallelization occurs only if you use <code>fold</code> and the collection
implements <code>CollFold</code>.</p>
<p>Some of the <code>r/whatever</code> methods actually create folders, allowing
them to be used with either <code>fold</code> or <code>reduce</code>.</p>
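<p>A minimal sketch of the distinction, again assuming the <code>r</code> alias. The vector <code>v</code> is just an example of a collection that implements <code>CollFold</code>; the answers are the same either way, and only the route taken differs.</p>
<div class="highlight"><pre><span></span><code>(require '[clojure.core.reducers :as r])

(def v (vec (range 1 1001)))   ; the numbers 1 through 1000, in a vector

(reduce + v)                   ; sequential, returns 500500
(r/fold + v)                   ; vectors are foldable, so this may run in parallel; returns 500500

;; r/map yields a folder, so it works with either function.
(reduce + (r/map inc v))       ; returns 501500
(r/fold + (r/map inc v))       ; returns 501500

;; A separate combining function (+) and reducing function (a counter).
(r/fold + (fn [acc _] (inc acc)) v)   ; returns 1000
</code></pre></div>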
<p>It's interesting (to me) to notice that, while the <strong>reducible</strong> is the 2nd argument to
<code>reduce</code> (as is the convention in functional languages), it has to be the first
argument in the internal implementations, so that protocol delegation can occur properly.
Under some circumstances, reduction is handled in Java, via the <code>.reduce</code> method
of the <code>IReduce</code> interface. This illustrates the "best of both worlds" approach that
Clojure takes to residence on the JVM. We're straight functional lisp when we want to be,
and we're OO when we need to be.</p>
<h4>decomplected</h4>
<p>Search engines chiefly come up with blog posts about transducers, plus
a few curricula vitae. It's definitely not the opposite of
<a href="http://www.merriam-webster.com/dictionary/complected">complected</a>,
so it definitely has nothing to do with <a href="http://www.skinwhitening.org/">this</a>.
One is given to believe that this repurposing in nomenclature is less confusing
than fancy math words like "monoid," but I'm not so sure.</p>Testing on Hundreds of AWS Instances - (sort of Distributed functional programming, part 3a)2014-09-08T00:00:00-04:002014-09-08T00:00:00-04:00Peter Fraenkeltag:blog.podsnap.com,2014-09-08:/girder3-aws.html<p>In <a href="http://blog.podsnap.com/girder.html">part 1</a> and <a href="http://blog.podsnap.com/girder2.html">part 2</a> of this almost
unbearably exciting
series, I outlined the concept of distributing purely functional programs and went through some implementation
details of <a href="http://github.com/pnf/girder">Girder</a>.</p>
<p>So far, however, I've only asserted that it works, so today I want to start
explaining how I used Amazon Web Services to test pretty cheaply on gobs of
machines.</p>
<p>The art of AWS wrangling was somewhat new to me, and I went through a few iterations
before deciding on the right mix of tools. Needless to say, the right mix ended up being
fairly Clojure heavy, which helps to smooth out the lack of a true asynchronous interface to AWS.</p>
<h4>Warnings and Confessions</h4>
<ol>
<li>This post isn't really about distributed functional programming at all, so if you know all
about AWS, bounce at will.</li>
<li>Moreover, tag notwithstanding, there won't be any Clojure code in today's chapter, but
the setup will be necessary for Clojure code in the next chapter.</li>
<li>I had to remove a bunch of snarky comments about how the cool kids eschew
Amazon's web interface in favor of the command line, because, the way the post had evolved, I
was no longer a cool kid. Seriously, I did in fact replicate everything using the CLI, but, except
for bragging rights and general geek calisthenics, the exercise was not useful. A big list of the
commands that I used would be both opaque and brittle, of little use to you if something went wrong
or if you wanted to do anything slightly different.</li>
</ol>
<h4>The world of Amazon Web Services</h4>
<p>AWS comprises a vast set of services. The ones I've used generally share the following qualities:</p>
<ol>
<li>Reliability</li>
<li>Reasonable price</li>
<li>Enormous feature set</li>
<li>Idiosyncratic API choices, compounded by</li>
<li>Unforgivably scattered documentation.</li>
</ol>
<p>My guess is that if you use their more fully vertical solutions, like Workspaces or even Elastic Beanstalk,
the user experience is a bit more uplifting, but wrestling with the
general problem of "running stuff on a bunch of
linux boxes" may be the single greatest risk factor for
<a href="https://en.wikipedia.org/wiki/Bruxism">bruxism</a>.</p>
<h4>General approach</h4>
<p>I want to minimize interaction with the AWS management console, and,
to the greatest extent possible, I even want to avoid ssh-ing onto
machines. So the basic strategy, not all of which we'll get through today,
is as follows:</p>
<ol>
<li>Set up all the access security, network isolation, etc. on AWS.</li>
<li>Bring up a linux instance, install a bunch of packages,
do some sysadminy stuff and
set up a mechanism for later making it do other things without logging in.</li>
<li>Bring it down and take a "snapshot," which we'll later be able to use as a template.</li>
<li>Write a bunch of Clojure tools to automate the process of bidding on, bringing up and shutting
down instances, dealing with at least the most commonly encountered non-determinism.</li>
<li>Write a general CLI application in Clojure to make it easy to slap
together specific CLI applications for running particular services.</li>
<li>Create uberjars for these applications and upload them to S3, whence
they can be downloaded cheaply and quickly onto running instances.</li>
<li>Do some light scripting around my tools to instantiate a flotilla of
appropriate instances and run the tests.</li>
<li>Giggle with delight as 100 real world computers in Virginia do nonsense on our behalf.</li>
<li>Quickly shut them all down before it gets expensive.</li>
</ol>
<h4>Really basic set-up</h4>
<p>Once you set up an account, your starting point for further fun will be
this <a href="https://console.aws.amazon.com/console/home?region=us-east-1">mind-boggling list of services</a>.
When I refer to various control panels below, you'll get to them from this link.</p>
<p>Assuming you just set up your AWS account, you'll want to start by creating an
administrative user account, from their Identity and Access Management, or
IAM panel. It's more or less self-explanatory,
and when creating a user you have an easy option to give it full administrative privileges.
You'll be presented with a set of "credentials" that you are warned not to lose.
I like to keep them in <a href="https://en.wikipedia.org/wiki/EncFS">encfs</a> mounts, with
root directories in Dropbox, but that's up to you. Create two files, with extensions,
respectively <code>sh</code> and <code>clj</code>:</p>
<div class="highlight"><pre><span></span><code><span class="nb">export</span> <span class="nv">AWS_ACCESS_KEY_ID</span><span class="o">=</span><span class="s2">"THISISMYACCESSKEY"</span>
<span class="nb">export</span> <span class="nv">AWS_SECRET_KEY</span><span class="o">=</span><span class="s2">"THISISMYSECRETKEYITSABITLONGER"</span>
<span class="nb">export</span> <span class="nv">AWS_SECRET_ACCESS_KEY</span><span class="o">=</span><span class="si">${</span><span class="nv">AWS_SECRET_KEY</span><span class="si">}</span>
<span class="nb">export</span> <span class="nv">AWS_DEFAULT_REGION</span><span class="o">=</span><span class="s2">"us-east-1"</span>
</code></pre></div>
<p>and</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span><span class="ss">:access-key</span> <span class="s">"THISISMYACCESSKEY"</span>
<span class="ss">:secret-key</span> <span class="s">"THISISMYSECRETKEYITSABITLONGER"</span>
<span class="ss">:endpoint</span> <span class="s">"us-east-1"</span><span class="p">}</span>
</code></pre></div>
<p>(Those are my actual credentials. Go ahead, give them a try.) There are various places
where you can choose a region - there's no great reason for choosing the same one I did.</p>
<h4>Setting up billing alarms</h4>
<p>Seriously. It's possible to run up huge bills without trying very hard. Go to Billing and Cost Management,
in the dropdown under your name. You'll be led by the nose to set up an alarm that goes off when your total
bill for the month exceeds some amount. Set up a few more, at varying levels. I have them ranging from 1 to 100 USD.</p>
<p>The alarms will be described with use of the phrase, "for 6 hours." That does not mean you're monitoring charges that
accrued over the last 6 hours; it means that your charges remained above the alarm level for that long. Obviously that
bit of configuration is more useful for signal values that do not increase monotonically. As far as I can tell, there is no
way to get an alarm based on accrued charges over a rolling window - only total charges for the billing month.</p>
<p>You'll also want to set up detailed cost reporting, which is helpful
when trying to figure out why your bill is so high. From billing
preferences, tick Receive Billing Reports, open the "sample policy"
link and copy its JSON contents. In a different tab, go to the S3
console and create a unique bucket. Click Edit Bucket Policy and
paste in the sample policy, then Save. Back on the billing page, the
<strong>Verify</strong> button should reward you with a green check mark. Turn on all
the reports.</p>
<p>Remember to check these reports reasonably often and to delete the old ones,
so that you don't fill up S3 with reports about how much it's costing you
to keep reports there.</p>
<p>You can also glean a lot from the bill details link. One surprise
will likely be charges in a category called EBS Storage, unless you
follow a particular instruction below.</p>
<h4>Setting up a less privileged user</h4>
<p>Then go ahead and create a second user, and opt for read-only access for it, and store its credentials similarly.
Also make a note of its "user ARN," which will be something like <code>arn:aws:iam::1234567:user/name-you-chose</code>.</p>
<h4>Create another S3 bucket</h4>
<p>We'll be uploading uberjars to S3, and then downloading from S3 to running instances. That will be much faster
and cheaper than <code>scp</code>ing to every instance individually,
and it means that we don't have to make every instance accessible to the internet. So, select the S3 service
and create a new bucket with a name you like.</p>
<h4>Setting up notification services</h4>
<p>We're going to want to use the Simple Notification Service to allow EC2 instances to alert us to
important information in an application independent fashion. We're also going to use the Simple
Queue Service to receive this information. Setting up permissions properly is a non-obvious two-step
dance, so proceed as follows:</p>
<ol>
<li>From the AWS services list, select SNS and create a new topic, with a name like "instance-status".
Accept default options, and ignore the suggestion to create a subscription, but make a note of the topic ARN.</li>
<li>From the AWS services list, select SQS and create a new queue, with a similar name. Again,
accept default options, and ignore the suggestion to create a subscription, but make a note of the queue ARN
and the queue URL (which are similar).</li>
<li>Go back to SNS and create a subscription for your topic of type SQS, entering the queue ARN.</li>
<li>Go back to SQS, find the permissions tab for your queue and click "add permission"<ol>
<li>Inexplicably, tick the "Everybody" box for principal.</li>
<li>From the drop-down, select Publish</li>
<li>Click "Add Conditions (optional)"</li>
<li>From dropdowns, select Qualifier=None, Condition=StringEquals, Key=aws:SourceArn, and for Value,
enter the SNS topic ARN.
The permissions tab for the queue will look like this:
<img alt="sqs-perms" src="images/girder3-aws-sqs-perm.png"></li>
</ol>
</li>
<li>Go back to the IAM panel and select the non-privileged user.<ol>
<li>Click "Attach User Policy"</li>
<li>Select "Policy Generator"</li>
<li>Set Effect=Allow, Service=SNS, Actions=Publish, and enter the SNS topic ARN.</li>
<li>Examine the policy you just created. It will have a name beginning with <code>policygen-</code> and
look something like this:</li>
</ol>
</li>
</ol>
<div class="highlight"><pre><span></span><code><span class="p">{</span>
<span class="nt">"Version"</span><span class="p">:</span> <span class="s2">"2012-10-17"</span><span class="p">,</span>
<span class="nt">"Statement"</span><span class="p">:</span> <span class="p">[</span>
<span class="p">{</span>
<span class="nt">"Sid"</span><span class="p">:</span> <span class="s2">"Stmt123456789"</span><span class="p">,</span>
<span class="nt">"Effect"</span><span class="p">:</span> <span class="s2">"Allow"</span><span class="p">,</span>
<span class="nt">"Action"</span><span class="p">:</span> <span class="p">[</span>
<span class="s2">"sns:Publish"</span>
<span class="p">],</span>
<span class="nt">"Resource"</span><span class="p">:</span> <span class="p">[</span>
<span class="s2">"arn:aws:sns:us-east-1:123456789:instance-up"</span>
<span class="p">]</span>
<span class="p">}</span>
<span class="p">]</span>
<span class="p">}</span>
</code></pre></div>
<p>So, we've created a topic and a queue, subscribed the queue to the topic, set the queue
to allow anybody to enqueue, as long as it's from that topic, then allowed only
one user to publish to that topic.
As far as I can tell, there isn't a simpler way.</p>
<h4>Two key pairs</h4>
<p>From EC2, select Key Pairs. First, <strong>Import</strong> the <code>id_rsa.pub</code> of an account from
which you will want to connect externally.</p>
<p>Second, <strong>Create Key Pair</strong> and choose a nice name, like "boffo". Keep track of the <code>boffo.pem</code>
file, which will download as you create the pair. This is the key that we will use to communicate
between instances.</p>
<h4>Creating a template instance</h4>
<p>We're going to create a disk image containing linux, with a bunch of
software installed, and some custom configuration. Subsequently, we can bring up as
many instances as we like, all starting from the same image.</p>
<p>From the web console, go to EC2 and click the blue <strong>Launch Instance</strong> button.
They give you a choice of linuxes, but in my experience everything is best maintained on
their custom Amazon version, so I tend to choose that. (Every time you do a web search for some
obscure error and discover that it's been experienced by dozens of AWS Ubuntu users
since they updated their distribution yesterday, you happily become slightly more
sheep-like.)
It's the first option. Continue along
the wizard, choosing your VPC, the public subnet thereof, your security group, the read-only IAM you
set up earlier. Choose shutdown behavior to be "Stop," because you'll want to be able to play
around with this machine later.</p>
<p>Keep clicking the right-most progress button (i.e. not "Review and Launch") until
you get to the security group page, at which point choose the one you just created. Click
Launch, and select the key pair you <em>uploaded</em> (not the one you created).
Then follow the link to View Instances, and watch
the progress spinner.</p>
<p>Eventually a Public IP will show up on the lower pane, at which point you should be able to ssh
to ec2-user at it (assuming you chose the proper key pair).</p>
<p>Before you forget, upload that new <code>.pem</code> file:</p>
<div class="highlight"><pre><span></span><code>scp boffo.pem ec2-user@ec2-54-164-127-5.compute-1.amazonaws.com: <span class="c1"># or whatever</span>
ssh ec2-user@ec2-54-164-127-5.compute-1.amazonaws.com
</code></pre></div>
<p>Now install everything you might like to have around.
This definitely includes Redis, since Girder needs it, but it's also handy
to have the Clojure development stack around (although I won't be making use
of it in these posts).
You might choose something
other than emacs, but I do recommend keeping my other choices.
You'll get <code>OpenJDK</code> by default, which is good enough for me; getting Oracle's
version onto AWS would involve more work.</p>
<div class="highlight"><pre><span></span><code>sudo yum -y update
sudo yum -y install git-core emacs strace
wget https://raw.githubusercontent.com/technomancy/leiningen/stable/bin/lein
chmod +x lein
mkdir bin
./lein
mv lein bin/
wget http://download.redis.io/releases/redis-2.8.14.tar.gz
zcat redis-2.8.14.tar.gz <span class="p">|</span> tar xf -
<span class="nb">cd</span> redis-2.8.14
make
<span class="nb">cd</span> ~/bin
ln -s ~/redis-2.8.14/src/redis-server
ln -s ~/redis-2.8.14/src/redis-cli
</code></pre></div>
<p>Edit <code>.bashrc</code> to put <code>${HOME}/bin</code> in the path.</p>
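<p>A minimal sketch of that <code>.bashrc</code> edit, assuming a Bourne-style shell. The guard against adding the directory twice is an optional extra, not something the rest of this setup depends on:</p>

```shell
# Put ${HOME}/bin at the front of PATH, but only if it is not there already,
# so that re-sourcing .bashrc does not keep lengthening the variable.
case ":${PATH}:" in
  *":${HOME}/bin:"*) ;;                       # already present, nothing to do
  *) export PATH="${HOME}/bin:${PATH}" ;;
esac
```

<p>Log out and back in (or <code>source ~/.bashrc</code>) for the change to take effect.</p>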
<p>Download this handy utility to extract metadata (like internal and external IP
addresses, instance id names, etc.) in an almost parseable form:</p>
<div class="highlight"><pre><span></span><code>wget http://s3.amazonaws.com/ec2metadata/ec2-metadata
chmod +x ec2-metadata
mv ec2-metadata ~/bin
</code></pre></div>
<h4>Allowing specification at launch time of user-space commands to be run by an instance</h4>
<p>Now we're going to do some really non-standard things. First, create an
executable script in the home directory called <code>robot.sh</code>:</p>
<div class="highlight"><pre><span></span><code><span class="c1">#!/bin/sh</span>
<span class="nb">set</span> -uo pipefail
<span class="nb">cd</span> <span class="nv">$HOME</span>
date > <span class="nv">$HOME</span>/ROBODATE
<span class="o">(</span>curl -f http://169.254.169.254/latest/user-data <span class="p">|</span> sh -v<span class="o">)</span> ><span class="p">&</span> <span class="nv">$HOME</span>/ROBOSCRIPT
</code></pre></div>
<p>That magic URL is, cross my heart, the official way to get access to "user data" specified when
an instance is launched. It's inaccessible from outside AWS, and of course it generally
disgorges different information on different instances within AWS. We'll talk about populating
this user data more next time, as it's not possible to do so (pre-launch anyway) from the
web interface.</p>
<p>As <code>root</code> create <code>/etc/init.d/userdatarobot</code>, also executable, containing:</p>
<div class="highlight"><pre><span></span><code><span class="c1">#! /bin/sh</span>
<span class="c1">###</span>
<span class="c1"># chkconfig: 235 98 55</span>
<span class="c1"># description: Manages the services you are controlling with the chkconfig command</span>
<span class="c1">###</span>
. /etc/init.d/functions
<span class="nv">USER</span><span class="o">=</span>ec2-user
<span class="nv">DIR</span><span class="o">=</span>/home/<span class="si">${</span><span class="nv">USER</span><span class="si">}</span>
<span class="nv">CMD</span><span class="o">=</span><span class="si">${</span><span class="nv">DIR</span><span class="si">}</span>/robot.sh
<span class="nv">PIDFILE</span><span class="o">=</span><span class="si">${</span><span class="nv">DIR</span><span class="si">}</span>/ROBOPID
<span class="k">case</span> <span class="s2">"</span><span class="nv">$1</span><span class="s2">"</span> <span class="k">in</span>
start<span class="o">)</span>
<span class="nb">echo</span> -n <span class="s2">"Starting </span><span class="si">${</span><span class="nv">CMD</span><span class="si">}</span><span class="s2"> as </span><span class="si">${</span><span class="nv">USER</span><span class="si">}</span><span class="s2">"</span>
daemon --user<span class="o">=</span><span class="si">${</span><span class="nv">USER</span><span class="si">}</span> --pidfile<span class="o">=</span><span class="si">${</span><span class="nv">PIDFILE</span><span class="si">}</span> <span class="si">${</span><span class="nv">CMD</span><span class="si">}</span> <span class="p">&</span>>/dev/null <span class="p">&</span>
<span class="nb">echo</span> <span class="s2">"."</span>
<span class="p">;;</span>
stop<span class="o">)</span>
<span class="nb">echo</span> -n <span class="s2">"Stopping </span><span class="si">${</span><span class="nv">CMD</span><span class="si">}</span><span class="s2">"</span>
killproc -p <span class="si">${</span><span class="nv">PIDFILE</span><span class="si">}</span> <span class="si">${</span><span class="nv">CMD</span><span class="si">}</span>
<span class="nb">echo</span> <span class="s2">"."</span>
<span class="p">;;</span>
*<span class="o">)</span>
<span class="nb">echo</span> <span class="s2">"Usage: /sbin/service userdatarobot {start|stop}"</span>
<span class="nb">exit</span> <span class="m">1</span>
<span class="k">esac</span>
<span class="nb">exit</span> <span class="m">0</span>
</code></pre></div>
<p>and run <code>/sbin/chkconfig --add userdatarobot</code>. The <code>robot.sh</code> script will now run, as <code>ec2-user</code>, when
the machine boots.</p>
<h4>Testing out the instance</h4>
<p>On the EC2 machine, run <code>aws configure</code>, entering the credentials and region for the non-privileged user you created,
not those of the privileged user.</p>
<p>On your home machine,</p>
<div class="highlight"><pre><span></span><code>pip install awscli
</code></pre></div>
<p>Then run <code>aws configure</code>. This will, among other things, ask for your credentials and then put them
in the directory <code>${HOME}/.aws</code>, which it creates. I'd recommend moving this directory into the <code>encfs</code>'d
mount, since otherwise it's yet another place for your plain-text credentials to be found.</p>
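<p>One way to do that move is sketched below, under the assumption that the encrypted mount is already in place. <code>relocate_aws_dir</code> is a hypothetical helper name, and the mount path in the usage comment is a placeholder. It leaves a symlink behind so that <code>aws</code> still finds its files in the usual place:</p>

```shell
# Move <home>/.aws into <secure>/aws and leave a symlink at the old location.
# Does nothing if .aws is missing or has already been replaced by a symlink.
relocate_aws_dir() {
  home_dir=$1
  secure_dir=$2
  [ -d "${home_dir}/.aws" ] && [ ! -L "${home_dir}/.aws" ] || return 0
  mv "${home_dir}/.aws" "${secure_dir}/aws" &&
    ln -s "${secure_dir}/aws" "${home_dir}/.aws"
}

# Usage (placeholder path, substitute your own mount point):
#   relocate_aws_dir "${HOME}" /path/to/encfs/mount
```

<p>The credentials are then readable only while the encrypted filesystem is mounted.</p>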
<p>If all works correctly, you should be able to do this on the EC2 instance:</p>
<div class="highlight"><pre><span></span><code>aws --region us-east-1 sns publish --topic-arn arn:aws:sns:us-east-1:yourtopic --message yowsa
</code></pre></div>
<p>And this back home:</p>
<div class="highlight"><pre><span></span><code>aws sqs receive-message --queue-url https://sqs.us-east-1.amazonaws.com/your-queue-url --wait-time-seconds <span class="m">20</span>
</code></pre></div>
<p>You'll get back a majestic blob of JSON, in which "yowsa" is discreetly buried somewhere.</p>
<p>As noted, we'll check out the <code>roboscript</code> business next time.</p>
<p>Now log out, and, under Actions, stop (DON'T TERMINATE!) the instance. When it's stopped, again
under Actions, enable Termination Protection, so it will be harder to terminate by mistake:</p>
<p><img alt="termprot" src="images/girder3-aws-term-prot.png"></p>
<p>Next, under Actions, choose Create Image and accept all the default options. This will
give you another magic keyword <code>ami-something</code> to remember.</p>
<h4>Creating a Virtual Private Cloud</h4>
<p>We're going to want a whole lot of machines that can chat without too much fuss, but
we don't want them to be easily accessible from the outside world. Typically, we'll
launch one or two machines to which we want to be able to <code>ssh</code>, and then a whole bunch more that
can only be reached from within the VPC.</p>
<p>The whole thing will look like this:</p>
<p><img alt="vpc" src="http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/images/nat-instance-diagram.png"></p>
<p>There's one VPC, with two subnets: one public, and accessible from without via a gateway; one
private, with access to the outside world via NAT. (Annoyingly, <code>aws</code> commands require
access to the outside world, so you really do need the NAT. The magic user-data
URL, which is not in the private subnet address space, shows that this shouldn't have to be so, but it is.)</p>
<p>From the <a href="https://console.aws.amazon.com/vpc/home?region=us-east-1">VPC Dashboard</a>, click the
embarrassing blue <strong>Start VPC Wizard</strong> button. Choose the second configuration, with two subnets.</p>
<p>Keep the IP CIDR block of <code>10.0.0.0/16</code> and <code>/24</code> for the subnets. That gives you
251 possible IP addresses in each subnet, of the 65k you have available altogether, so you'll
be able to create more subnets in the future if you like.</p>
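<p>The arithmetic behind those numbers can be checked in the shell. This is only a sketch: <code>block_size</code> is a hypothetical helper name, and the figure of five reserved addresses reflects AWS holding back the first four addresses and the last one in every subnet.</p>

```shell
# Number of addresses in an IPv4 block with the given prefix length: 2^(32 - prefix).
block_size() { echo $(( 1 << (32 - $1) )); }

# AWS keeps five addresses per subnet for itself (the first four and the last).
AWS_RESERVED=5

vpc_total=$(block_size 16)                              # the whole 10.0.0.0/16 block
subnet_usable=$(( $(block_size 24) - AWS_RESERVED ))    # what is left in one /24
subnets_possible=$(( vpc_total / $(block_size 24) ))    # how many /24s fit in the /16

echo "addresses in the VPC:   ${vpc_total}"
echo "usable per /24 subnet:  ${subnet_usable}"
echo "possible /24 subnets:   ${subnets_possible}"
```

<p>So two subnets use up only a small corner of the address space, which is why there is room to add more later.</p>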
<p>For both subnets, change the availability zone from no-preference to a specific one. (<code>us-east-1a</code>
is usually cheapest, for some reason.)
Later, we might create new subnets for different availability zones,
but we don't want them to choose for us at this point.</p>
<p>Keep all other defaults and press "Create VPC". You'll
notice that it creates and then launches an instance to run the NAT. Head to the EC2 dashboard
to STOP it now, so we don't have to pay for it to be up while continuing our configuration. Note:
don't TERMINATE, as that will make it disappear forever. In fact, set up termination protection for it too.</p>
<p>Now we need to make some changes. Go to "Your VPCs", and start by giving your new VPC a name.
Then go to "Security Groups". Find the one that's associated with the VPC you just created and
give it a name. Click the "Inbound Rules" tab for this security group, and then Edit. Add a new rule,
for port 22, with source <code>0.0.0.0/0</code>, i.e. anywhere. If you don't do this, you'll never be
able to get to instances even in your publicly accessible subnet. Leave Outbound Rules alone.
The inbound rules should show that all traffic is allowed from within the security group, but only
SSH is allowed from everywhere else:</p>
<p><img alt="sgrules" src="images/girder3-aws-sg-rules.png"></p>
<p>Go to Subnets. You'll see subnets named Public and Private (unless you overrode them in the
Wizard), both associated with the new VPC. For each successively, click Modify Auto-Assign Public IP, and
make sure it's turned <strong>on</strong> for the public subnet</p>
<p><img alt="assignip" src="images/girder3-aws-assign-ip.png"> </p>
<p>and <strong>off</strong> for the private subnet. It will be wrong for at
least one of them.</p>
<h4>Testing out the VPC</h4>
<p>Go to the EC2 console and the AMIs page. Select the AMI you saved earlier and
Launch it. Then</p>
<ol>
<li>Select a <code>t1.micro</code> instance, and then "Next: Configuration Details" (not "Review and Launch").</li>
<li>Tick "Request Spot Instances" and enter something like 0.01 as maximum price per hour.
You won't actually pay this, but if the spot price should float above it, your request will be denied.</li>
<li>Under the Network dropdown, select your new VPC, then under Subnet, the <strong>public</strong> one.</li>
<li>For IAM role, select the unprivileged one you set up earlier.</li>
<li>Click "Add Storage" (but don't add any), then "Tag Instance" (entering a name if you want), then "Configure Security Group".</li>
<li>Choose "Select an existing security group". You should see only one group listed, but if for some
reason there are more, choose the one associated with your VPC. Then "Review and Launch".</li>
<li>You'll see a warning about your instance being open to the world. It isn't really, only port 22, so ignore it.
Finally launch. You'll be given a list of key pairs; select the home key you <strong>uploaded</strong>.</li>
</ol>
<p>You'll be directed to the Spot Request screen, where you'll see your request as "open", "pending fulfillment".</p>
<p>Now repeat the whole process for a second instance,
except choose the <strong>private</strong> subnet and the <code>boffo</code> key you created.
The warning will be even more irrelevant, because this
instance won't even have a public IP address.</p>
<p>Finally, on the EC2 Instances page, Actions->Start up the NAT. (Not the Launch button, that starts from scratch.)</p>
<p>Now keep clicking back and forth between Instances and Spot Requests, until you see that all three instances (NAT, public,
private) are running. There's a small chance that the spot requests will fail with an obscure error about insufficient
capacity, even though the spot price never exceeded your maximum. In that case try again.</p>
<p>At this point, you should be able to ssh to the public instance, at the address shown on the Instances page.</p>
<p>From the public instance, issue the <code>aws sns</code> command you tried earlier, and then from home, the <code>aws sqs receive-message</code>
command. Hopefully that works.</p>
<p>Now, on the public instance, verify that you can <code>ssh -i boffo.pem</code> to the private instance.</p>
<p>From the private instance, issue the <code>aws sns</code> command you tried earlier, and then from home, the <code>aws sqs receive-message</code>
command. This time, the AWS command is being transmitted to Amazon via the NAT.</p>
<p>You might also look at the contents of ROBODATE and ROBOSCRIPT. The latter should show an error from <code>curl</code>, because
there wasn't any user data specified.</p>
<p>You might also verify that it is impossible to connect to the public instance from outside the VPC via any mechanism other
than ssh. For example, try telnet, or bring up the redis-server and try to redis-cli to it.</p>
<h4>Shutting everything down</h4>
<p>AWS is very good at accruing charges, so you want to be especially sure you shut everything down properly. We'll
be automating some of this in Clojure later, but for now, just use the web page:</p>
<ol>
<li>From the Instances page, TERMINATE the instances created from spot requests.</li>
<li>STOP the NAT instance. Sometimes, this will give you a not-useful error about how this
is an instance and therefore cannot be stopped. That seems to be related to the spot instances not being fully
shut down, so try again every few seconds, pressing the mouse button harder each time.</li>
<li>From the Spot Requests page, CANCEL all open requests. Sometimes, it takes a few seconds and a refresh after cancellation,
before the state changes to "canceled". To teach Amazon a lesson, I recommend canceling over and over again during this
period.</li>
<li>From the Volumes page, filter for Detached Volumes, and delete them. This is super-duper, ultra important, because
volumes accrue like dust bunnies, and can quickly come to dominate your monthly bill.</li>
</ol>
<h4>The Clojure Next Time</h4>
<p>Next time, I'll introduce some basic tools in Clojure for managing the start-up and shutdown process, and actually get to
the point of testing Girder.</p>
<p>Comedy - A brief syllabus in honor of Joan Rivers (Peter Fraenkel, 2014-09-06, tag:blog.podsnap.com,2014-09-06:/joan.html)</p>
<p>I have a confession to make. I don't find comedy funny. There are exceptions - Gilbert
Gottfried and Louis CK can occasionally reduce me to putty - but, by and large, I appreciate
comedy from a technical rather than aesthetic standpoint, combined with an admiration for
and empathy with the drive that compels people to perfect their craft.</p>
<p>I have similar feelings about magicians. I spent a significant portion of my youth
reading about and reverse-engineering illusions, before concluding that certain qualities
necessary for success in the field are innate rather than learned, and that I would always
lack them. For example, I am pathologically bad at misdirection,
a failing which I like to spin
into an ode to my defining commitment to honesty, but which in truth is just an example of something
I do badly, despite having tried quite a bit. As a result, my only aim in watching magic
shows is to convince myself that I understand what happened at a mechanical level.
It's mildly satisfying, but not sufficiently so that I would ever seek out such performances.</p>
<p>In the same vein, while I never made a serious attempt at stand-up, I am deeply familiar with
the fundamentally manipulative nature of comedy, having consciously deployed it throughout
life to compensate for the absence of certain other social skills. I understand the process
of iterative refinement, the accrual of mental notes on the difference a pause or a
rewording makes, or the resigned conclusion that a particularly promising "bit" just doesn't
work in practice. I do this consciously. I'm just not as good as the professionals.</p>
<p>Many years ago, I read a profile of Joan Rivers. It contained quotations, none of which I
found any funnier than what I'd heard her say on television over the years, but I was deeply
struck by the description of what she did after each show. She kept a notebook, in which
she would methodically examine every aspect of her act, noting what worked better or
worse, recording any innovations that emerged spontaneously. Comedy as craft was a
revelation for me.</p>
<p>The frequent, perhaps unavoidable, offensiveness of comedy presents a conundrum, with which
I deal semi-successfully. Fortunately, the most offensive performers (say, Andrew Dice
Clay) are usually technically unimpressive - they successfully bond with an audience over
shared prejudices, but nobody without those prejudices would ever deem the jokes funny.
It's not comedy, but tribalism. Other situations (some illustrated below)
are more difficult. I like to think that the best comedy
exploits the existence of prejudices for comic effect without endorsing or enforcing them, but
in truth I don't know that. It may in fact be doing harm.</p>
<p>Herewith, a brief syllabus on comedy as craft.</p>
<ol>
<li>
<p>First, and most topically right now, is
<a href="http://www.imdb.com/title/tt1568150/">Joan Rivers: A Piece of Work</a>. The title
gives away the message: she is a consummate professional with a fanatical drive for
perfection. If you ignore everything else on this list, please watch this film.
It's on Netflix.</p>
</li>
<li>
<p><a href="http://www.imdb.com/title/tt0328962">Comedian</a> documents Jerry Seinfeld's "road
trip" (actually he traveled by private jet) after the end of his TV series. He
has committed to rebuilding his act from scratch, discarding all previous material,
and the only way to do this is to experiment on a huge variety of audiences.
The most striking moment for me is when he tells an anecdote to a much younger
(and, frankly, not all that promising) comedian, about Glenn Miller's airplane
making an emergency landing in a field, forcing the musicians to hike the rest of
the way to their next gig. At one point, they pass a farmhouse, through the
big front window of which is seen a happy, extended family eating and laughing
at a large dining table. One of the sax players turns to the guy next to him
and asks, "How can people live that way?" The story is an insiders' joke,
highlighting the difference between performers and "ordinary people."</p>
<p>The best part, however, is Jerry muttering to himself after
delivering the apocryphal story that it would have worked better
with a bus instead of the airplane. Here is the richest comedian on the
planet obsessing over his delivery in a casual conversation with someone
he'll never see again.</p>
</li>
<li>
<p><a href="http://www.imdb.com/title/tt0085794">King of Comedy</a> is also important, if only
to hear the name "Rupert Pupkin" many times over 109 minutes and to remind us that
Robert De Niro used to be an amazing actor. This is not a documentary, but it's
a relatively early depiction of comedy as something more complex than
funniness.</p>
</li>
<li>
<p><a href="http://www.amazon.com/Killed-True-Stories-Americas-Comics/dp/030738229X">I Killed</a>
is a not particularly well edited compilation of anecdotes from road comedians of
varying levels of fame and talent.
As you read, you realize how apt the book's title is.
Many of the stories are not pleasant at all, displaying the naked aggression behind
this fundamentally manipulative craft. Misogyny is on prominent display and
often horrifying.</p>
</li>
<li>
<p><a href="http://www.imdb.com/title/tt1678670/">Louie, episode 2, season 1</a>, like many episodes of the
show, has an
<a href="https://www.youtube.com/watch?v=v-55wC5dEnc">extended scene</a>
that is just a conversation among a group of professional
comedians after the night's work is done. The conversation
<a href="http://www.thebacklot.com/louis-cks-extraordinary-ten-minutes-of-gay-tv/06/2010/">becomes serious</a>, as Rick Crom, the one comedian among them who is openly gay,
is asked to comment on a
variety of homophobic riffs. There is quite a bit of nuance in the
scene, which doesn't end up explicitly denouncing such jokes, and leaves the
viewer to guess at what's going on in each comedian's head, especially Rick's.</p>
</li>
<li>
<p><a href="http://www.imdb.com/title/tt1945794/Louie">Louie, episode 9, season 2</a> reunites
Louie with an old friend, Eddie, who has, it turns out, concluded that suicide would be
preferable to the ongoing arduousness of life as a semi-successful road comic.
At one point, Eddie has himself inserted into the roster of a comedy night
(he is well known enough that the request is immediately granted) under the
name Shitty Fat Tits, and proceeds to demolish the audience - which essentially means
causing them to keep laughing, despite growing and profound discomfort.
This is comedy as expert violence. Are craft, cruelty and
self-loathing inextricably linked? We're left to piece that out on our own.</p>
</li>
<li>
<p><a href="https://www.youtube.com/watch?v=OkaJ4U4EgyU">The WTF Podcast, with Ben Stiller</a>
answers the question of what this definitively benign, audience-friendly
clown shares with the archetypal stand-up comedian. The answer is a lot.
You'd never imagine that Marc Maron and Ben Stiller would have anything to talk about,
but they're both practitioners of the same craft, differing only in details of
implementation.</p>
</li>
<li>
<p>In <a href="https://www.youtube.com/watch?v=Kh4zqBqF0KU">The WTF Podcast, with Judd Apatow</a>
we learn that the incredibly successful film producer started out as a stand-up
comedian and approaches film production with the same level of methodical,
hyper-obsessive detail that characterizes all professional comics.</p>
</li>
<li>
<p><a href="http://www.amazon.com/Seriously-Funny-Rebel-Comedians-1950s-ebook/dp/B002MHOD34/">Seriously Funny - Rebel Comedians of the 1950s</a> just landed on my Kindle, so this recommendation is
only second-hand. I'll let you know...</p>
</li>
</ol>
<p>Distributed purely functional programming, part 2 (Peter Fraenkel, 2014-08-25, tag:blog.podsnap.com,2014-08-25:/girder2.html)</p>
<h3>Update 2015-01-12</h3>
<p>The algorithm as it exists in HEAD is somewhat different from the below,
in ways that I'll describe (eventually) in another post. In some ways, it's
closer to Fork-Join, but with important differences to support reentrancy,
share results of duplicate requests and adjust for the costs of distribution.</p>
<h3>Recap of recap</h3>
<p>In a <a href="http://podsnap.com/girder.html">previous post</a>, I introduced a
framework called Girder (the code is on
<a href="https://github.com/pnf/girder">github</a>), which aims to facilitate
<strong>Plain Old Functional Programming</strong> on distributed systems. By POFP,
I mean code that, as much as possible, consists of normal looking
calls to normal looking functions, the only requirement for which is
referential transparency: there are no side effects, and calls to the
same function with the same arguments should always return the same
value.</p>
<p>Functional programming, in my view, is dramatically less functional
when you use a framework that requires you to cast your algorithm in
terms of message passing, whether in the form of key-value pairs for
map/reduce or explicitly as mailbox deliveries in the actor model.
What makes me particularly uncomfortable is an emerging conventional wisdom
that one should write in these styles even when the program will
run comfortably on one machine, because <em>someday you will need to scale out</em>,
and you might as well get the pain out of the way now.</p>
<p>To over-simplify more than a bit,
I want to distinguish between algorithms that are <em>innately about</em> large scale
data aggregation (in the case of map-reduce) or dynamic response to signals
(in the case of the actor model), and those that are most elegantly expressed in terms
of functions, but might benefit from running on more than one core. Put in reverse,
if the problem is innately about data aggregation or dynamic response to signals, then
maybe you should use one of these message-based models, <em>even if you know in
advance that scale will never be important</em>.</p>
<p>In this post, I'll explain the core machinery behind Girder, showing how
requests get dispatched, distributed and executed.</p>
<h3>Recap of examples</h3>
<p>I don't want to repeat too much, but here are a couple of the examples from earlier. The first illustrates
a sort of calculation that would obviously benefit from distributed, parallel execution.</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">cdefn</span> <span class="nv">calc-price</span> <span class="p">[</span><span class="nv">details</span><span class="p">]</span> <span class="p">(</span><span class="nf">calculate-something-expensive</span> <span class="nv">details</span><span class="p">))</span>
<span class="p">(</span><span class="nf">cdefn</span> <span class="nv">calc-portfolio</span> <span class="p">[</span><span class="nv">spec</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">reqs</span> <span class="p">[</span> <span class="p">[</span><span class="nv">f</span> <span class="p">(</span><span class="nf">munge</span> <span class="nv">spec</span><span class="p">)]</span> <span class="p">[</span><span class="nv">f</span> <span class="p">(</span><span class="nf">diddle</span> <span class="nv">spec</span><span class="p">)]]]</span>
<span class="p">(</span><span class="nb">reduce + </span><span class="p">(</span><span class="nf">requests</span> <span class="nv">reqs</span><span class="p">))))</span>
<span class="p">(</span><span class="nf">request</span> <span class="s">"pool"</span> <span class="p">(</span><span class="nf">calc-portfolio</span> <span class="s">"xyz"</span><span class="p">))</span>
</code></pre></div>
<p>Of note here is that the calculation requests look almost exactly like function calls.
(With a little more macro work, I think I can get rid of the "almost.")</p>
<p>The second example illustrates code that might generate an error that
we ought to be able to report on, irrespective of where on the grid
the code executes:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">cdefn</span> <span class="nv">divide</span> <span class="p">[</span><span class="nv">x</span> <span class="nv">y</span><span class="p">]</span> <span class="p">(</span><span class="nb">float </span><span class="p">(</span><span class="nb">/ </span><span class="nv">x</span> <span class="nv">y</span><span class="p">)))</span>
<span class="p">(</span><span class="nf">cdefn</span> <span class="nv">ratio</span> <span class="p">[</span><span class="nv">i</span><span class="p">]</span> <span class="p">(</span><span class="nf">request</span> <span class="p">(</span><span class="nf">divide</span> <span class="nv">i</span> <span class="p">(</span><span class="nb">dec </span><span class="nv">i</span><span class="p">))))</span>
</code></pre></div>
<p>And, as shown last time, the exception thrown when requesting
<code>(ratio 1)</code> contains meaningful information about where on the
"distributed stack" the error occurred.</p>
<h3>Serialized requests</h3>
<p>Requests, remember, are finite sequences that start with a function
and all of whose successive members are EDN serializable. The whole
request gets turned into a string with</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="kd">defn </span><span class="nv">->reqid</span> <span class="p">[</span> <span class="p">[</span><span class="nv">f</span> <span class="o">&</span> <span class="nv">args</span><span class="p">]</span> <span class="p">]</span>
<span class="p">(</span><span class="nb">pr-str </span><span class="p">(</span><span class="nb">concat </span><span class="p">[(</span><span class="nf">fname</span> <span class="nv">f</span><span class="p">)]</span> <span class="nv">args</span><span class="p">)))</span>
</code></pre></div>
<p>where <code>fname</code> is so ghastly I sort of wince pasting it here, but
it's inspired by an only slightly less gruesome
<a href="https://groups.google.com/forum/#!topic/clojure/ORRhWgYd2Dk">post</a> on
the Clojure mailing list, so it must be OK, right?</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="k">def </span><span class="nv">STR->OPT</span>
<span class="p">(</span><span class="nb">apply hash-map </span><span class="p">(</span><span class="nb">mapcat </span><span class="o">#</span><span class="p">(</span><span class="nb">vector </span><span class="p">(</span><span class="nb">second </span><span class="nv">%</span><span class="p">)</span> <span class="p">(</span><span class="nb">str </span><span class="p">(</span><span class="nb">first </span><span class="nv">%</span><span class="p">)))</span> <span class="nv">clojure.lang.Compiler/CHAR_MAP</span><span class="p">)))</span>
<span class="p">(</span><span class="k">def </span><span class="nv">OPTPAT</span>
<span class="p">(</span><span class="nb">re-pattern </span><span class="p">(</span><span class="nb">str </span><span class="s">"\\b"</span> <span class="p">(</span><span class="nf">clojure.string/join</span> <span class="s">"|"</span> <span class="p">(</span><span class="nb">vals </span><span class="nv">clojure.lang.Compiler/CHAR_MAP</span><span class="p">))</span> <span class="s">"\\b"</span> <span class="p">)))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">fname</span>
<span class="s">"Extract the qualified name of a clojure function as a string."</span>
<span class="p">[</span><span class="nv">f</span><span class="p">]</span>
<span class="p">(</span><span class="nb">-> </span><span class="p">(</span><span class="nb">str </span><span class="nv">f</span><span class="p">)</span>
<span class="p">(</span><span class="nf">clojure.string/replace</span> <span class="nv">OPTPAT</span> <span class="nv">STR->OPT</span><span class="p">)</span>
<span class="p">(</span><span class="nf">clojure.string/replace-first</span> <span class="s">"$"</span> <span class="s">"/"</span><span class="p">)</span>
<span class="p">(</span><span class="nf">clojure.string/replace</span> <span class="o">#</span><span class="s">"@\w+$"</span> <span class="s">""</span><span class="p">)</span>
<span class="p">(</span><span class="nf">clojure.string/replace</span> <span class="o">#</span><span class="s">"_"</span> <span class="s">"-"</span><span class="p">)))</span>
</code></pre></div>
<p>All this regexing is
responsible for turning a string like "#<core$_PLUS_ clojure.core$_PLUS_@4955aabe>" into
"clojure.core/+", which can be <code>resolve</code>d in the reverse operation:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">reqid->req</span> <span class="p">[</span><span class="nv">reqid</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[[</span><span class="nv">f</span> <span class="o">&</span> <span class="nv">args</span><span class="p">]</span> <span class="p">(</span><span class="nf">read-string</span> <span class="nv">reqid</span><span class="p">)</span>
<span class="nv">f</span> <span class="p">(</span><span class="nb">resolve </span><span class="p">(</span><span class="nb">symbol </span><span class="nv">f</span><span class="p">))</span>
<span class="nv">f</span> <span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="ss">:girded</span> <span class="p">(</span><span class="nb">meta </span><span class="nv">f</span><span class="p">))</span> <span class="nv">f</span> <span class="p">(</span><span class="nf">cfn</span> <span class="nv">f</span><span class="p">))]</span>
<span class="p">(</span><span class="nb">apply vector </span><span class="nv">f</span> <span class="nv">args</span><span class="p">)))</span>
</code></pre></div>
<p>As you see, Clojure complicates the task by maintaining its own
mangling scheme for non-alphanumeric operators, and I'm violating some sort of ethical
code by extracting it from <code>src/jvm/clojure/lang/Compiler.java</code>.</p>
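<p>For comparison, here is a minimal round-trip sketch that leans on the JVM-side helper <code>clojure.lang.Compiler/demunge</code> rather than the hand-rolled regexes. It is not Girder's code: <code>fn->name</code>, <code>req->reqid</code> and <code>reqid->req</code> are hypothetical names chosen for illustration, it only handles named top-level functions, and the <code>cfn</code> wrapping step is left out.</p>
<div class="highlight"><pre><span></span><code>(defn fn->name
  "Qualified name of a named function, e.g. \"clojure.core/+\"."
  [f]
  ;; (.getName (class +)) is "clojure.core$_PLUS_"; demunge undoes the mangling.
  (clojure.lang.Compiler/demunge (.getName (class f))))

(defn req->reqid
  "Serialize a request vector [f arg ...] into a string."
  [req]
  (pr-str (into [(fn->name (first req))] (rest req))))

(defn reqid->req
  "Parse the string back and resolve the function name to the function."
  [reqid]
  (let [parsed (read-string reqid)
        f      (deref (resolve (symbol (first parsed))))]
    (into [f] (rest parsed))))

;; Round trip: function -> string -> function again.
(def round-tripped (reqid->req (req->reqid [+ 1 2 3])))
(println (apply (first round-tripped) (rest round-tripped)))
</code></pre></div>
<p>The same caveat applies as with the regex version: the name has to resolve to the same thing on every instance that reads the string.</p>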
<p>Remember from the last post the macro <code>cdefn</code>, which defines functions
that return results via an <code>async</code> channel, wrapped up in <code>{:value ...}</code> or <code>{:error ...}</code>.
That macro also set metadata to indicate
that the function was thus <code>:girded</code>. If, as in the case of <code>+</code>, it's not girded,
the <a href="https://github.com/pnf/girder/blob/master/src/acyclic/girder/grid.clj#L41">easily imaginable</a>
<code>cfn</code> function takes care of doing so. This way, we can distribute
requests using boring old functions as well as fancy functions like <code>calc-portfolio</code> that themselves
also make requests.</p>
<p>Once again, it should be emphasized that if
<code>your.namespace.here/gorgonzola</code> cannot be resolved, or resolves to
something different in different running instances, then terrible,
terrible things will happen.</p>
<h3>Redis and core.async</h3>
<p>The central state keeper behind all the remote calculations is
<a href="http://redis.io">Redis</a>. In theory, it could be something else, but
the only implementation of <code>Girder-Backend</code> I've written so far is
<code>Redis-Backend</code>.</p>
<p>Redis is an in-memory database with many advantages. For our
purposes, the greatest two are</p>
<ol>
<li>extraordinary speed (generally hundreds
of thousands of operations, loosely defined, per second)</li>
<li>certain assurances of atomicity.</li>
</ol>
<p>Such assurances are contingent on proper
functioning of the Redis process and the platform on which it runs,
and, while Redis does generate periodic snapshots, from which it can
be recovered, there are many plausible failure scenarios where you'll
lose all your Redis data forever.</p>
<p>But, it is reasonable to ask, who cares?
Obviously there are cases you might, but there are also many cases
where you shouldn't. The primary assumption that we'll be writing
purely functional code implies that the very worst consequence of data
loss is having to run our program again. Again, while that might not always
be acceptable, it very often is.</p>
<p>I'll discuss various implementations of methods in the <code>Redis-Backend</code> implementation
of the <code>Girder-Backend</code> protocol
below. Know for now that
<code>(defrecord Redis-Backend [redis kvl]...)</code>
holds some information about the connection to Redis and
a reference to what I'll call a key-value listener that's used for publishing
things internally.</p>
<p>In a post scheduled for a few months from now, I'll discuss
integrating Girder with Datomic, with further considerations about
data locality, but, for now, let's just assume that input data is
provided "somehow."</p>
<h3>Queues</h3>
<p>To ease working with Redis, and to provide an abstraction layer that
will someday allow different back-ends, we make heavy use of
<code>core.async</code> internally. (The user of Girder won't see it.)
Here's an implementation of a typical <code>Girder-Backend</code> method in
<code>Redis-Backend</code>. This creates a channel, reading from which will
actually be right-popping a Redis queue:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">crpop</span> <span class="p">[</span><span class="nv">this</span> <span class="nb">key </span><span class="nv">queue-type</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">qkey</span> <span class="p">(</span><span class="nf">queue-key</span> <span class="nb">key </span><span class="nv">queue-type</span><span class="p">)</span>
<span class="nv">bkey</span> <span class="p">(</span><span class="nf">queue-bak-key</span> <span class="nb">key </span><span class="nv">queue-type</span><span class="p">)</span>
<span class="nv">out</span> <span class="p">(</span><span class="nf">lchan</span> <span class="p">(</span><span class="nb">str </span><span class="s">"crpop-"</span> <span class="nv">key</span><span class="p">))]</span>
<span class="p">(</span><span class="nf">async/go-loop</span> <span class="p">[]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nb">val </span><span class="p">(</span><span class="nf">wcar</span> <span class="p">(</span><span class="ss">:redis</span> <span class="nv">this</span><span class="p">)</span> <span class="p">(</span><span class="nf">car/brpoplpush</span> <span class="nv">qkey</span> <span class="nv">bkey</span> <span class="mi">60</span><span class="p">))]</span>
<span class="p">(</span><span class="nf">trace</span> <span class="s">"crpop"</span> <span class="nb">key </span><span class="s">"got"</span> <span class="nb">val </span><span class="s">"from redis list"</span> <span class="nv">qkey</span><span class="p">)</span>
<span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nf">still-open?</span> <span class="nv">out</span><span class="p">)</span>
<span class="p">(</span><span class="k">do </span><span class="p">(</span><span class="nb">when </span><span class="nv">val</span> <span class="p">(</span><span class="nf">>!</span> <span class="nv">out</span> <span class="nv">val</span><span class="p">))</span>
<span class="p">(</span><span class="nf">recur</span><span class="p">))</span>
<span class="p">(</span><span class="nf">debug</span> <span class="s">"crpop"</span> <span class="nb">key </span><span class="nv">queue-type</span> <span class="s">"shutting down"</span><span class="p">))))</span>
<span class="nv">out</span><span class="p">))</span>
</code></pre></div>
<p>In a design decision that I know some people object to, the <code>out</code> channel does
double duty as a control channel. (You'll notice that <code>still-open?</code> isn't part
of the officially exposed <code>core.async</code> interface.)</p>
<p>Taking <code>crpop</code> from the start:</p>
<p>The private functions <code>xxx-key</code> just create strings that will be
used to name Redis lists. E.g. one that came up in last post's
examples ended up getting called "requests-queue-pool". Longer strings
might get md5'd into something briefer, to save wire wear, but that's
an implementation detail.</p>
<p>The reason for defining both <code>qkey</code> and <code>bkey</code> is that we'll
be using the Redis <code>RPOPLPUSH</code> command (in its blocking form, <code>BRPOPLPUSH</code>) to atomically stash what
we pop off of one queue in a backup queue, which can be used for
recovery purposes if the process running <code>crpop</code> dies after
popping from Redis but before whatever it popped was read from its
channel. This is the
<a href="http://redis.io/commands/rpoplpush">recommended</a>
implementation of reliable queues in Redis. For different back-ends,
the second key might be ignored.</p>
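<p>To make the reliable-queue idea concrete, here is a small in-memory sketch in which an atom holding two vectors stands in for the two Redis lists. Nothing here talks to Redis; <code>rpoplpush!</code>, <code>ack!</code> and <code>recover!</code> are hypothetical helpers, and <code>swap-vals!</code> assumes Clojure 1.9 or later.</p>
<div class="highlight"><pre><span></span><code>;; In-memory stand-in for two Redis lists; the right end of each vector is its tail.
(def lists (atom {:queue ["a" "b" "c"] :backup []}))

(defn rpoplpush!
  "Atomically move the rightmost element of src to the left end of dst.
  Returns the moved element, or nil if src was empty."
  [store src dst]
  (let [[old _] (swap-vals! store
                            (fn [m]
                              (if-let [v (peek (get m src))]
                                (-> m
                                    (update src pop)
                                    (update dst #(into [v] %)))
                                m)))]
    (peek (get old src))))

(defn ack!
  "Drop a fully processed element from the backup list."
  [store dst v]
  (swap! store update dst #(vec (remove #{v} %))))

(defn recover!
  "After a crash, move everything still in the backup list back onto the queue."
  [store src dst]
  (swap! store (fn [m]
                 (-> m
                     (update src into (get m dst))
                     (assoc dst [])))))
</code></pre></div>
<p>A consumer would call <code>(rpoplpush! lists :queue :backup)</code>, do its work, and then <code>ack!</code> the element. Anything popped but never acknowledged is still sitting in <code>:backup</code> for <code>recover!</code> to put back.</p>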
<p>The <code>out</code> channel is defined with <code>lchan</code>, which was written in desperation
to name <code>async</code> channels and keep track of what's going in and out of them:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="k">def </span><span class="nv">lchans</span> <span class="p">(</span><span class="nf">atom</span> <span class="p">(</span><span class="nf">cache/weak-cache-factory</span> <span class="p">{})))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">lchan</span> <span class="p">[</span><span class="nb">name </span><span class="o">&</span> <span class="p">[</span><span class="nv">buf</span><span class="p">]]</span>
<span class="p">(</span><span class="nb">if-not </span><span class="p">(</span><span class="nf">timbre/level-sufficient?</span> <span class="ss">:trace</span> <span class="nv">nil</span><span class="p">)</span> <span class="p">(</span><span class="nf">chan</span> <span class="nv">buf</span><span class="p">)</span> <span class="c1">;; short-circuit if not at :trace level</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nb">name </span><span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nf">fn?</span> <span class="nv">name</span><span class="p">)</span> <span class="p">(</span><span class="nf">name</span><span class="p">)</span> <span class="nv">name</span><span class="p">)</span>
<span class="nv">c</span> <span class="p">(</span><span class="nf">async/map></span> <span class="o">#</span><span class="p">(</span><span class="k">do </span><span class="p">(</span><span class="nf">trace</span> <span class="s">"Channel"</span> <span class="nb">name </span><span class="s">"receiving"</span> <span class="nv">%</span><span class="p">)</span> <span class="nv">%</span><span class="p">)</span>
<span class="p">(</span><span class="nf">async/map<</span> <span class="o">#</span><span class="p">(</span><span class="k">do </span><span class="p">(</span><span class="nf">trace</span> <span class="s">"Channel"</span> <span class="nb">name </span><span class="s">"delivering"</span> <span class="nv">%</span><span class="p">)</span> <span class="nv">%</span><span class="p">)</span> <span class="p">(</span><span class="nf">chan</span> <span class="nv">buf</span><span class="p">)))]</span>
<span class="p">(</span><span class="nf">trace</span> <span class="s">"Channel"</span> <span class="nb">name </span><span class="s">"created:"</span> <span class="nv">c</span><span class="p">)</span>
<span class="p">(</span><span class="nf">swap!</span> <span class="nv">lchans</span> <span class="o">#</span><span class="p">(</span><span class="nb">assoc </span><span class="nv">%</span> <span class="nv">c</span> <span class="nv">name</span><span class="p">))</span>
<span class="nv">c</span><span class="p">)))</span>
</code></pre></div>
<p>At <code>:trace</code> level, all writes to and reads from the channel will be logged.</p>
<p>I found this extraordinarily useful during development, since I also find it
extraordinarily easy to get confused in the
<a href="https://en.wikipedia.org/wiki/Colossal_Cave_Adventure">maze of twisty</a>
<code>core.async</code> channels it's so easy to create. Obviously, one motivation
for Girder is to allow people <em>not</em> to deal with channels sometimes.</p>
<p>The actual logging comes from the marvelous
<a href="https://github.com/ptaoussanis/timbre">timbre</a> library, and the
<code>trace</code> messages are generated by sandwiching the <code>chan</code>
between <code>async/map<</code> and <code>async/map></code>, which intercept the
content on the way in and out, but otherwise ignore it.</p>
<p>Note that <code>lchans</code>, where named channels get stashed for
future examination, is an atom containing a
<a href="https://weblogs.java.net/blog/enicholas/archive/2006/05/understanding_w.html">weak</a>
hash map. That means channel entries will tend to disappear quickly after
all other references to the contained channels are gone.</p>
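<p>The weak-key behavior can be seen with a plain <code>java.util.WeakHashMap</code>, used here as a stand-in for the cache in the fork. This is only a sketch; <code>registry</code> and <code>register!</code> are made-up names, and exactly when an unreferenced entry vanishes is up to the garbage collector.</p>
<div class="highlight"><pre><span></span><code>(import '(java.util Collections WeakHashMap))

;; Keys are held weakly: once nothing else refers to a key, the garbage
;; collector is free to drop its entry.
(def registry (Collections/synchronizedMap (WeakHashMap.)))

(defn register!
  "Remember a human-readable label for obj without keeping obj alive."
  [obj label]
  (.put registry obj label)
  obj)

;; This key stays reachable through the var, so its entry survives.
(def live-thing (register! (Object.) "still referenced"))

;; Nothing holds on to this key, so its entry may disappear at any time.
(register! (Object.) "garbage as soon as this call returns")

(System/gc) ; only a hint to the JVM
(println (.get registry live-thing))
</code></pre></div>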
<p>Aside:
Unfortunately, <code>core.cache/weak-cache-factory</code> didn't actually
exist, so I had to add it in my own
<a href="https://github.com/pnf/core.cache">fork</a>.
This was an adventure in its own right, since I had to
<a href="http://podsnap.com/clj-deps.html">supplant the version</a>
of <code>core.cache</code> that <code>core.async</code> was pulling in.</p>
<p>Anyway.</p>
<p>The main <code>crpop</code> action is in <code>(wcar (:redis this)
(car/brpoplpush qkey bkey 60))</code>. To communicate with Redis, we use
the <a href="https://github.com/ptaoussanis/carmine">carmine</a> library
(coincidentally by the same author as timbre). <code>carmine</code> is very
cool. In addition to exposing <em>every</em> standard
<a href="http://redis.io/commands">Redis command</a> as a
<code>car/standard-redis-command</code>, it handles connection pooling and
serialization of values, so you don't have to pretend everything's a
string.</p>
<p>The <code>(:redis this)</code> extracts connection details from the
<code>Redis-Backend</code> record; then <code>(car/brpoplpush qkey bkey 60)</code>
blocks, trying to pop an element from the right end of <code>qkey</code>, which it
will do atomically with pushing it onto the left end of <code>bkey</code>. If we
get something, we push it onto the <code>out</code> channel and block again.
If we don't, we make sure that <code>out</code> is still open (i.e. someone
cares), and block for another 60 seconds. The synchronous, blocking
nature of the Redis call is hidden from the consumer of <code>out</code>.</p>
<p>There were other options here. First, I could have specified an
infinite timeout, which generally wouldn't change functionality, but
would have made it harder to shut down operations if necessary.
Second, I
could have used Redis pub/sub capability to avoid explicit blocking,
but pub/sub with exactly one consumer felt like overkill, and the
blocking is only moderately evil in a scheme of things where (1) Redis
itself is fully asynchronous internally, and (2) the pub/sub alternative
still requires something to block while waiting for messages.</p>
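<p>The value of a finite timeout is easier to see in a stripped-down version of the loop. The sketch below swaps Redis and <code>core.async</code> for two <code>java.util.concurrent</code> queues and a thread; <code>pump</code> and its <code>open?</code> flag are hypothetical, standing in for <code>crpop</code> and <code>still-open?</code>.</p>
<div class="highlight"><pre><span></span><code>(import '(java.util.concurrent LinkedBlockingQueue TimeUnit))

(defn pump
  "Forward items from source to out until open? becomes false.
  Because the poll times out, the loop rechecks open? regularly
  and can always be shut down."
  [^LinkedBlockingQueue source ^LinkedBlockingQueue out open? timeout-ms]
  (future
    (loop []
      ;; Blocks for at most timeout-ms, much like BRPOPLPUSH with a timeout.
      (let [v (.poll source (long timeout-ms) TimeUnit/MILLISECONDS)]
        (if @open?
          (do (when (some? v) (.put out v))
              (recur))
          :shut-down)))))
</code></pre></div>
<p>With an infinite timeout, the thread would sit inside <code>.poll</code> until the next item arrived, and flipping <code>open?</code> to <code>false</code> would have no effect until then.</p>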
<p>There's a corresponding <code>clpush</code>, which exposes pushing onto a
Redis queue as writing to an <code>async</code> channel.</p>
<h3>Listening for finished calculations</h3>
<p>I haven't actually explained how you ask for a calculation, or how it
gets done, but it will be helpful first to understand how we distribute
the results.</p>
<p>Here we <em>do</em> actually make use of Redis pub/sub, to provide
notification when a calculation has completed. As multiple running
requests may request the same calculation, having multiple listeners
is an important feature, but since <em>ongoing</em> notification of changes
to the invariant return value of a pure function is by definition
useless, it would be wasteful to build up and tear down a channel
corresponding to every request id. Accordingly, there's only <em>one</em>
actual Redis pub/sub channel called "CALCS",
but we locally keep an atom <code>subs</code>, holding a map of request ids to a
collection of channels to be notified whenever an appropriate message
comes in:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defrecord </span><span class="nv">Redis-KV-Listener</span> <span class="p">[</span><span class="nb">subs </span><span class="nv">listener</span><span class="p">])</span>
<span class="p">(</span><span class="kd">defn- </span><span class="nv">kv-message-cb</span> <span class="p">[</span><span class="nv">a</span> <span class="p">[</span><span class="nv">etype</span> <span class="nv">_</span> <span class="nb">val </span><span class="ss">:as</span> <span class="nv">msg</span><span class="p">]]</span>
<span class="p">(</span><span class="nb">when </span><span class="p">(</span><span class="nb">and </span><span class="p">(</span><span class="nb">= </span><span class="nv">etype</span> <span class="s">"message"</span><span class="p">)</span> <span class="p">(</span><span class="nb">vector? </span><span class="nv">val</span><span class="p">))</span>
<span class="p">(</span><span class="k">let </span><span class="p">[[</span><span class="nv">k</span> <span class="nv">v</span><span class="p">]</span> <span class="nv">val</span><span class="p">]</span>
<span class="p">(</span><span class="nf">swap!</span> <span class="nv">a</span> <span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">cmap</span><span class="p">]</span>
<span class="p">(</span><span class="nb">doseq </span><span class="p">[</span><span class="nv">c</span> <span class="p">(</span><span class="nb">keys </span><span class="p">(</span><span class="nb">get </span><span class="nv">cmap</span> <span class="nv">k</span><span class="p">))]</span> <span class="p">(</span><span class="nf">go</span> <span class="p">(</span><span class="nf">>!</span> <span class="nv">c</span> <span class="nv">v</span><span class="p">)</span> <span class="p">(</span><span class="nf">close!</span> <span class="nv">c</span><span class="p">)))</span>
<span class="p">(</span><span class="nb">dissoc </span><span class="nv">cmap</span> <span class="nv">k</span><span class="p">))))))</span>
<span class="p">(</span><span class="kd">defn- </span><span class="nv">kv-listener</span> <span class="p">[</span><span class="nv">redis</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nb">subs </span> <span class="p">(</span><span class="nf">atom</span> <span class="p">{})</span> <span class="c1">;; {reqid {c1 TBD, c2 TBD}}</span>
<span class="nv">redis-listener</span> <span class="p">(</span><span class="nf">car/with-new-pubsub-listener</span>
<span class="p">(</span><span class="ss">:spec</span> <span class="nv">redis</span><span class="p">)</span>
<span class="p">{</span><span class="s">"CALCS"</span> <span class="p">(</span><span class="nb">partial </span><span class="nv">kv-message-cb</span> <span class="nv">subs</span><span class="p">)}</span>
<span class="p">(</span><span class="nf">car/subscribe</span> <span class="s">"CALCS"</span><span class="p">))]</span>
<span class="p">(</span><span class="nf">->Redis-KV-Listener</span> <span class="nb">subs </span><span class="nv">redis-listener</span><span class="p">)))</span>
</code></pre></div>
<p>The <code>(car/with-new-pubsub-listener ...)</code> call creates a thread in
<code>carmine</code>, which listens on a socket for messages on the topic "CALCS" and
calls my <code>kv-message-cb</code> whenever it receives one.</p>
<p>Messages come back from Redis as tuples of
<code>[event-type channel-name value]</code>, where
<code>value</code> for us will be a further tuple, <code>[reqid, result]</code>.
The callback just forwards results to all <code>async</code> channels
registered for the <code>reqid</code> in the <code>subs</code> map.</p>
<p>A corresponding
method in <code>Redis-Backend</code> is responsible for registering interest
in the results of a particular reqid into the <code>subs</code> atom of a
<code>Redis-KV-Listener</code>:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">kv-listen</span> <span class="p">[</span><span class="nv">this</span> <span class="nv">k</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[{{</span><span class="nv">a</span> <span class="ss">:subs</span><span class="p">}</span> <span class="ss">:kvl</span><span class="p">}</span> <span class="nv">this</span>
<span class="nv">c</span> <span class="p">(</span><span class="nf">lchan</span> <span class="p">(</span><span class="nb">str </span><span class="s">"kv-listen"</span> <span class="nv">k</span><span class="p">)</span> <span class="p">(</span><span class="nf">async/dropping-buffer</span> <span class="mi">1</span><span class="p">))]</span>
<span class="p">(</span><span class="nf">swap!</span> <span class="nv">a</span> <span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">cmap</span><span class="p">]</span> <span class="p">(</span><span class="nf">assoc-in</span> <span class="nv">cmap</span> <span class="p">[</span><span class="nv">k</span> <span class="nv">c</span><span class="p">]</span> <span class="mi">1</span><span class="p">)))</span>
<span class="nv">c</span><span class="p">))</span>
</code></pre></div>
<p>Note</p>
<ol>
<li>fancy destructuring here: <code>{{a :subs} :kvl} this</code> first extracts the <code>:kvl</code>
element of the <code>Redis-Backend</code> record, from which it then extracts the <code>:subs</code> field
of the <code>Redis-KV-Listener</code>.</li>
<li>Yes, the collection of channels could have been a set, but I wanted to use <code>assoc-in</code>
'cause it's prettier, and I have a premonition that someday, the values in the map will be useful
for something.</li>
<li>Use of <code>dropping-buffer</code> is explained about 10 cm below.</li>
</ol>
<p>Publication is relatively normal, Redis-wise, but note that we take
care of informing any local listeners ASAP, without making them wait
for the message to pass through Redis:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">kv-publish</span> <span class="p">[</span><span class="nv">this</span> <span class="nv">k</span> <span class="nv">v</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[{{</span><span class="nv">a</span> <span class="ss">:subs</span><span class="p">}</span> <span class="ss">:kvl</span><span class="p">}</span> <span class="nv">this</span><span class="p">]</span> <span class="c1">;; bind the subs atom, as in kv-listen</span>
<span class="p">(</span><span class="nf">go</span> <span class="p">(</span><span class="nb">doseq </span><span class="p">[</span><span class="nv">c</span> <span class="p">(</span><span class="nb">keys </span><span class="p">(</span><span class="nb">get </span><span class="o">@</span><span class="nv">a</span> <span class="nv">k</span><span class="p">))]</span> <span class="p">(</span><span class="nf">>!</span> <span class="nv">c</span> <span class="nv">v</span><span class="p">)</span> <span class="p">(</span><span class="nf">close!</span> <span class="nv">c</span><span class="p">))</span>
<span class="p">(</span><span class="nf">swap!</span> <span class="nv">a</span> <span class="nb">dissoc </span><span class="nv">k</span><span class="p">))</span>
<span class="p">(</span><span class="nf">wcar</span> <span class="p">(</span><span class="ss">:redis</span> <span class="nv">this</span><span class="p">)</span> <span class="p">(</span><span class="nf">car/publish</span> <span class="s">"CALCS"</span> <span class="p">[</span><span class="nv">k</span> <span class="nv">v</span><span class="p">]))))</span>
</code></pre></div>
<p>Eventually, <code>kv-message-cb</code> will receive these same messages after they pass through
Redis, but this race condition is harmless, as the loser will just be whistling into <code>/dev/null</code>.
The one improbable disaster that might have occurred is that both the <code>kv-message-cb</code> and
<code>kv-publish</code> might have interleaved their <code>>!</code>s and <code>close!</code>s, but the
<code>dropping-buffer</code> makes sure that this couldn't result in a blocking condition.</p>
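<p>The register-then-notify bookkeeping, and the reason a duplicate notification is harmless, can be sketched without channels at all. Here promises play the role of the one-shot <code>async</code> channels; <code>listeners</code>, <code>listen!</code> and <code>notify!</code> are hypothetical names, and <code>swap-vals!</code> assumes Clojure 1.9 or later.</p>
<div class="highlight"><pre><span></span><code>;; {reqid {promise 1}}, mirroring the shape of the subs atom
(def listeners (atom {}))

(defn listen!
  "Register interest in reqid; returns a promise for the result."
  [reqid]
  (let [p (promise)]
    (swap! listeners assoc-in [reqid p] 1)
    p))

(defn notify!
  "Deliver v to everyone waiting on reqid, then forget them."
  [reqid v]
  ;; swap-vals! hands back the old map, so delivery happens outside the swap fn.
  (let [[old _] (swap-vals! listeners dissoc reqid)]
    (doseq [p (keys (get old reqid))]
      (deliver p v))))

(def p1 (listen! "req-1"))
(def p2 (listen! "req-1"))
(notify! "req-1" 42) ; the local fast path
(notify! "req-1" 42) ; the same message arriving later via pub/sub: nobody left to tell
(deliver p1 42)      ; even a duplicate delivery is silently ignored
(println @p1 @p2)
</code></pre></div>
<p>A promise ignores a second <code>deliver</code>, which is roughly the job the <code>dropping-buffer</code> does for the channels.</p>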
<h3>Requesting a calculation</h3>
<p>Girder provides an <code>enqueue</code> function, which, ignoring some straightforward
business with a local cache (which, by the way, uses <code>SoftReference</code>s, so it will hang on to its contents until we get
close to the heap limit)
and a bit of instrumentation,
doesn't do much more than call <code>Girder-Backend</code>'s <code>enqueue-listen</code>, which
will hand back a channel that eventually delivers the wrapped calculation result:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="k">def </span><span class="nv">local-cache</span> <span class="p">(</span><span class="nf">atom</span> <span class="p">(</span><span class="nf">cache/soft-cache-factory</span> <span class="p">{})))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">enqueue</span> <span class="p">[</span><span class="nv">nodeid</span> <span class="nv">req</span> <span class="o">&</span> <span class="p">[</span><span class="nv">deb</span><span class="p">]]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">id</span> <span class="p">(</span><span class="nf">iid</span> <span class="nv">deb</span> <span class="s">"ENQ"</span><span class="p">)</span>
<span class="nv">res</span> <span class="p">(</span><span class="nb">get </span><span class="o">@</span><span class="nv">local-cache</span> <span class="p">(</span><span class="nf">->reqid</span> <span class="nv">req</span><span class="p">))]</span>
<span class="p">(</span><span class="k">if </span><span class="nv">res</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">c</span> <span class="p">(</span><span class="nf">chan</span><span class="p">)]</span>
<span class="p">(</span><span class="nf">debug</span> <span class="s">"enqueue"</span> <span class="nv">id</span> <span class="s">" found cached value for"</span> <span class="nv">req</span><span class="p">)</span>
<span class="p">(</span><span class="nf">go</span> <span class="p">(</span><span class="nf">>!</span> <span class="nv">c</span> <span class="nv">res</span><span class="p">))</span>
<span class="nv">c</span><span class="p">)</span>
<span class="p">(</span><span class="nf">enqueue-listen</span> <span class="o">@</span><span class="nv">back-end</span>
<span class="nv">nodeid</span> <span class="p">(</span><span class="nf">->reqid</span> <span class="nv">req</span><span class="p">)</span>
<span class="ss">:requests</span> <span class="ss">:state</span>
<span class="p">(</span><span class="k">fn </span><span class="nv">not-started?</span> <span class="p">[</span><span class="nv">v</span><span class="p">]</span> <span class="p">(</span><span class="nb">nil? </span><span class="nv">v</span><span class="p">))</span>
<span class="p">(</span><span class="k">fn </span><span class="nv">done?</span> <span class="p">[</span> <span class="p">[</span><span class="nv">s</span> <span class="nv">_</span><span class="p">]</span> <span class="p">]</span> <span class="p">(</span><span class="nb">= </span><span class="nv">s</span> <span class="ss">:done</span><span class="p">))</span>
<span class="p">(</span><span class="k">fn </span><span class="nv">extract</span> <span class="p">[</span> <span class="p">[</span><span class="nv">_</span> <span class="nv">v</span><span class="p">]</span> <span class="p">]</span> <span class="nv">v</span><span class="p">)</span>
<span class="nv">id</span><span class="p">))))</span>
</code></pre></div>
<p>The <code>enqueue-listen</code> method makes use of global state values stored in Redis under <code>(str "state-val-" reqid)</code>
and comprising a vector <code>[STATE VALUE]</code> where <code>STATE</code> is <code>nil</code>, <code>:running</code> or <code>:done</code>, and
will always progress in that order. If the <code>STATE</code> is <code>:done</code>, <code>VALUE</code> will be the cached result.
The three function arguments deal with detecting whether a request is brand-new or finished, and extracting
the result. This is a case where the optional naming of $\lambda$ functions makes the code a lot easier to understand.</p>
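<p>As a quick illustration of how the three predicates carve up the possible <code>[STATE VALUE]</code> entries, here they are as top-level functions, along with a hypothetical <code>classify</code> helper that is not part of Girder:</p>
<div class="highlight"><pre><span></span><code>(defn not-started? [v] (nil? v))
(defn done? [[s _]] (= s :done))
(defn extract [[_ v]] v)

(defn classify
  "What should a requester do, given the [STATE VALUE] entry it found?"
  [entry]
  (cond
    (done? entry)        [:use-cached (extract entry)]
    (not-started? entry) [:enqueue]
    :else                [:wait]))

;; No entry yet, a running calculation, and a finished one:
(println (map classify [nil [:running nil] [:done 42]]))
</code></pre></div>
<p>Destructuring <code>nil</code> just binds <code>nil</code>, so <code>done?</code> is safe to call on a request that has no state yet.</p>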
<p>The operation of <code>enqueue-listen</code> is relatively complicated,
because it's the one place in the code that explicitly
handles a potential race condition:
a calculation starting between the time we get the request and the time
we go about handling it. Really, this is just an optimization that makes redundant calculations
less likely, since, in our lovely functional world, they would at worst be wasteful, but
in practice it's not uncommon for a raft of simultaneously issued requests to go on to request
exactly the same thing, also simultaneously, so the optimization does get used.</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">enqueue-listen</span>
<span class="p">[</span><span class="nv">this</span>
<span class="nv">nodeid</span> <span class="nv">reqid</span>
<span class="nv">queue-type</span> <span class="nv">val-type</span>
<span class="nv">enqueue-pred</span> <span class="nv">done-pred</span> <span class="nv">done-extract</span> <span class="nv">debug-info</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">redis</span> <span class="p">(</span><span class="nb">assoc </span><span class="p">(</span><span class="ss">:redis</span> <span class="nv">this</span><span class="p">)</span> <span class="ss">:reqid</span> <span class="nv">reqid</span> <span class="ss">:nodeid</span> <span class="nv">nodeid</span><span class="p">)</span> <span class="c1">; :single-conn true</span>
<span class="nv">qkey</span> <span class="p">(</span><span class="nf">queue-key</span> <span class="nv">nodeid</span> <span class="nv">queue-type</span><span class="p">)</span>
<span class="nv">vkey</span> <span class="p">(</span><span class="nf">val-key</span> <span class="nv">reqid</span> <span class="nv">val-type</span><span class="p">)</span>
<span class="nv">c</span> <span class="p">(</span><span class="nf">kv-listen</span> <span class="nv">this</span> <span class="nv">reqid</span> <span class="nv">debug-info</span><span class="p">)]</span>
<span class="p">(</span><span class="nf">wcar</span> <span class="nv">redis</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">v</span> <span class="p">(</span><span class="nb">second </span><span class="p">(</span><span class="nf">protocol/with-replies*</span> <span class="c1">; wcar redis ;</span>
<span class="p">(</span><span class="nf">car/watch</span> <span class="nv">vkey</span><span class="p">)</span>
<span class="p">(</span><span class="nf">car/get</span> <span class="nv">vkey</span><span class="p">)))</span>
<span class="nv">_</span> <span class="p">(</span><span class="nf">trace</span> <span class="s">"enqueue-listen at"</span> <span class="nv">nodeid</span> <span class="s">"found state of"</span> <span class="nv">reqid</span> <span class="s">"="</span> <span class="nv">v</span> <span class="nv">c</span><span class="p">)]</span>
<span class="p">(</span><span class="nf">cond</span>
<span class="p">(</span><span class="nf">done-pred</span> <span class="nv">v</span><span class="p">)</span> <span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">v</span> <span class="p">(</span><span class="nf">done-extract</span> <span class="nv">v</span><span class="p">)]</span>
<span class="p">(</span><span class="nf">trace</span> <span class="s">"enqueue-listen"</span> <span class="nv">reqid</span> <span class="s">"already done, publishing"</span> <span class="nv">v</span><span class="p">)</span>
<span class="p">(</span><span class="nf">go</span> <span class="p">(</span><span class="nf">>!</span> <span class="nv">c</span> <span class="nv">v</span><span class="p">)</span> <span class="p">(</span><span class="nf">close!</span> <span class="nv">c</span><span class="p">)))</span>
<span class="p">(</span><span class="nf">enqueue-pred</span> <span class="nv">v</span><span class="p">)</span> <span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">r</span> <span class="p">(</span><span class="nf">protocol/with-replies*</span> <span class="c1">; wcar redis</span>
<span class="c1">;; x'n will fail if vkey has been messed with.</span>
<span class="p">(</span><span class="nf">car/multi</span><span class="p">)</span>
<span class="p">(</span><span class="nf">car/lpush</span> <span class="nv">qkey</span> <span class="nv">reqid</span><span class="p">)</span>
<span class="p">(</span><span class="nf">car/exec</span><span class="p">))]</span>
<span class="p">(</span><span class="nf">trace</span> <span class="s">"enqueue-listen enqueueing"</span> <span class="nv">reqid</span> <span class="nv">r</span><span class="p">))</span>
<span class="ss">:else</span> <span class="p">(</span><span class="nf">trace</span> <span class="s">"enqueue-listen"</span> <span class="nv">reqid</span> <span class="s">"state already"</span> <span class="nv">v</span><span class="p">))</span>
<span class="p">(</span><span class="nf">car/unwatch</span><span class="p">)))</span>
<span class="nv">c</span><span class="p">))</span>
</code></pre></div>
<p>The first thing that happens is we <code>kv-listen</code> as explained above
for information about this <code>reqid</code>,
so no matter what happens, we'll eventually find out the result.</p>
<p>Everything else occurs within the carmine <code>wcar</code> macro. Within
this scope, anything Redis-related will use the same connection, which
is useful because the first thing that happens is the pipelined pair
of Redis commands <code>(car/watch vkey) (car/get vkey)</code>.
The <code>watch</code> will cause any subsequent transaction
(defined in Redis as commands sandwiched between <code>multi</code> and <code>exec</code>) to
fail if the watched key has been modified in the meantime. This "optimistic" approach can be a more efficient way of handling concurrency than
true locking. Here, we start <code>watch</code>ing the state and fetch its current value in the same breath.</p>
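<p>For the Redis-averse, here is a rough in-memory analogy of the same optimistic idea. It is only a sketch: a plain Clojure atom stands in for Redis, <code>compare-and-set!</code> plays the part of <code>exec</code>, and the names <code>store</code> and <code>enqueue-if-unchanged!</code> are invented for the illustration.</p>
<div class="highlight"><pre><span></span><code>;; Hypothetical in-memory stand-in for Redis: one map holding state and queue.
(def store (atom {:state nil :queue []}))

(defn enqueue-if-unchanged!
  "Append reqid to the queue, but only if the store still holds the very
   snapshot we based our decision on. Returns true on success, false if
   someone changed the store in between (like exec failing after watch)."
  [store snapshot reqid]
  (compare-and-set! store snapshot (update snapshot :queue conj reqid)))

(let [snapshot @store]                                 ; like watch + get
  (when (nil? (:state snapshot))                       ; nobody has claimed it yet
    (enqueue-if-unchanged! store snapshot "req-1")))   ; like multi/lpush/exec
</code></pre></div>
<p>A second attempt made with the same, now stale, snapshot returns <code>false</code> and leaves the queue alone, which is the moral equivalent of the transaction failing silently.</p>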
<p>If we find the state to be <code>:running</code>, then we don't have to do anything. Someone else is calculating
it somewhere, and we'll pick up the result, as we're already <code>kv-listen</code>ing for it.</p>
<p>If the state is <code>:done</code>, then it
isn't going to change any further. Most likely, it became <code>:done</code> prior to our
<code>kv-listen</code>, so we inform the output channel directly. On the off chance that it became
<code>:done</code> after our <code>kv-listen</code>, then it will be published redundantly.</p>
<p>If the state is <code>nil</code>, then we're probably encountering the <code>reqid</code> for the first time
in recorded history, so
we <code>lpush</code> it onto the Redis queue; however, since the push occurs <em>within a Redis transaction</em>, it will
fail if any <code>watch</code>ed value has changed, i.e. if the request has started
<code>:running</code> somewhere else or has passed through that stage and is now <code>:done</code>. In either case, the
<code>kv-listen</code> will take care of the result. We can let the transaction fail silently, with the same
effect as if we had determined that the request was <code>:running</code>.</p>
<h3>Acting on requests for calculations</h3>
<p>Work gets done by worker nodes, which listen on a queue, plucking off work to do.
Worker nodes are launched with the function below, and run an <code>async/go-loop</code> until told to stop.
As this is a function, rather than a protocol method, it has no access to the <code>Girder-Backend</code>
object, which we have therefore stashed in a local <code>atom</code> called <code>back-end</code>. I'll
walk through this code below, but for now just note that no assumptions are made about <code>Redis</code>
being involved.</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">unclaimed</span> <span class="p">[</span><span class="nv">reqid</span><span class="p">]</span> <span class="p">(</span><span class="nb">nil? </span><span class="p">(</span><span class="nf">get-val</span> <span class="o">@</span><span class="nv">back-end</span> <span class="nv">reqid</span> <span class="ss">:state</span><span class="p">)))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">launch-worker</span>
<span class="p">[</span><span class="nv">nodeid</span> <span class="nv">poolid</span><span class="p">]</span>
<span class="p">(</span><span class="nf">add-member</span> <span class="o">@</span><span class="nv">back-end</span> <span class="nv">poolid</span> <span class="ss">:volunteers</span> <span class="nv">nodeid</span><span class="p">)</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">ctl</span> <span class="p">(</span><span class="nf">lchan</span> <span class="p">(</span><span class="nb">str </span><span class="s">"launch-worker "</span> <span class="nv">nodeid</span><span class="p">))</span>
<span class="nv">allreqs</span> <span class="p">(</span><span class="nf">crpop</span> <span class="o">@</span><span class="nv">back-end</span> <span class="nv">nodeid</span> <span class="ss">:requests</span><span class="p">)</span>
<span class="nv">reqs</span> <span class="p">(</span><span class="nf">async/filter<</span> <span class="nv">unclaimed</span> <span class="nv">allreqs</span><span class="p">)]</span>
<span class="p">(</span><span class="nf">debug</span> <span class="s">"worker"</span> <span class="nv">nodeid</span> <span class="s">"starting"</span><span class="p">)</span>
<span class="p">(</span><span class="nf">async/go-loop</span> <span class="p">[</span><span class="nv">volunteering</span> <span class="nv">false</span><span class="p">]</span>
<span class="p">(</span><span class="nf">trace</span> <span class="s">"worker"</span> <span class="nv">nodeid</span> <span class="s">"volunteering state="</span> <span class="nv">volunteering</span><span class="p">)</span>
<span class="p">(</span><span class="k">let </span><span class="p">[[</span><span class="nv">reqid</span> <span class="nv">ch</span><span class="p">]</span> <span class="p">(</span><span class="k">if </span><span class="nv">volunteering</span>
<span class="p">(</span><span class="nf">async/alts!</span> <span class="p">[</span><span class="nv">reqs</span> <span class="nv">ctl</span><span class="p">])</span>
<span class="p">(</span><span class="nf">async/alts!</span> <span class="p">[</span><span class="nv">reqs</span> <span class="nv">ctl</span><span class="p">]</span> <span class="ss">:default</span> <span class="ss">:empty</span><span class="p">))]</span>
<span class="p">(</span><span class="nf">debug</span> <span class="s">"Worker"</span> <span class="nv">nodeid</span> <span class="s">"received"</span> <span class="nv">reqid</span> <span class="nv">ch</span><span class="p">)</span>
<span class="p">(</span><span class="nf">cond</span>
<span class="c1">;; Apparently out of work. Time to volunteer</span>
<span class="p">(</span><span class="nb">= </span><span class="nv">reqid</span> <span class="ss">:empty</span><span class="p">)</span> <span class="p">(</span><span class="nf">do</span>
<span class="p">(</span><span class="nf">debug</span> <span class="s">"worker"</span> <span class="nv">nodeid</span> <span class="s">"is bored and volunteering with"</span> <span class="nv">poolid</span><span class="p">)</span>
<span class="p">(</span><span class="nf">clear-bak</span> <span class="o">@</span><span class="nv">back-end</span> <span class="p">[</span><span class="nv">nodeid</span> <span class="ss">:requests</span><span class="p">])</span>
<span class="p">(</span><span class="nf">lpush-and-set</span> <span class="o">@</span><span class="nv">back-end</span>
<span class="nv">poolid</span> <span class="ss">:volunteers</span> <span class="nv">nodeid</span>
<span class="nv">nodeid</span> <span class="ss">:busy</span> <span class="nv">nil</span><span class="p">)</span>
<span class="p">(</span><span class="nf">recur</span> <span class="nv">true</span><span class="p">))</span>
<span class="c1">;; Control channel wants us to close.</span>
<span class="p">(</span><span class="nb">= </span><span class="nv">ch</span> <span class="nv">ctl</span><span class="p">)</span> <span class="p">(</span><span class="k">do </span><span class="p">(</span><span class="nf">debug</span> <span class="s">"Closing worker"</span> <span class="nv">nodeid</span><span class="p">)</span>
<span class="p">(</span><span class="nf">remove-member</span> <span class="o">@</span><span class="nv">back-end</span> <span class="nv">poolid</span> <span class="ss">:volunteers</span> <span class="nv">nodeid</span><span class="p">)</span>
<span class="p">(</span><span class="nf">close-all!</span> <span class="nv">reqs</span> <span class="nv">allreqs</span> <span class="nv">ctl</span><span class="p">))</span>
<span class="c1">;; Looks like real work: </span>
<span class="ss">:else</span> <span class="p">(</span><span class="nb">when </span><span class="nv">reqid</span>
<span class="p">(</span><span class="nf">debug</span> <span class="s">"Worker"</span> <span class="nv">nodeid</span> <span class="s">" nodeid will now process"</span> <span class="nv">reqid</span><span class="p">)</span>
<span class="p">(</span><span class="nf">set-val</span> <span class="o">@</span><span class="nv">back-end</span> <span class="nv">nodeid</span> <span class="ss">:busy</span> <span class="nv">true</span><span class="p">)</span>
<span class="p">(</span><span class="nf"><!</span> <span class="p">(</span><span class="nf">process-reqid</span> <span class="nv">nodeid</span> <span class="nv">reqs</span> <span class="nv">reqid</span><span class="p">))</span>
<span class="p">(</span><span class="nf">recur</span> <span class="nv">false</span><span class="p">)))))</span>
<span class="nv">ctl</span><span class="p">))</span>
</code></pre></div>
<p>In some JVM, somewhere, we'll call <code>(launch-worker "workername" "poolname")</code>, where the second argument
refers to a distributor node, which (as explained in the previous post) will probably be the source
of requests on the worker's queue.</p>
<p>The worker uses <code>crpop</code> to get requests from its queue, <code>filter<</code>ing them through <code>unclaimed</code>,
which checks the global state to see if someone else might be <code>:running</code> this task already. Continuing
a theme that has hopefully become boring by now, it doesn't matter if we fail to detect that the task
is already in progress. At worst, we'll do it twice.</p>
<p>The main go-loop uses <code>async/alts!</code> to look for work, or for a signal on <code>ctl</code> telling it
to shut down. (<code>get-val</code> and <code>set-val</code> are boring protocol methods, which in <code>Redis-Backend</code>
do a <code>GET</code> and <code>SET</code> of scalar values).</p>
<p>Workers exist in two states. Either they are "volunteering" (as
described in the previous post) in which case they expect eventually
to get work on their request queue, or they are busy, which means that,
as far as they know, there may still be work on their own sub-requests. If we're not
volunteering, then the <code>alts!</code> call sets a <code>:default</code>, so an empty queue
causes an immediate return value of <code>:empty</code>, indicating that it is time to push
our <code>nodeid</code> onto the volunteer queue.</p>
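<p>The <code>:default</code> behavior is easy to try at the REPL. The sketch below assumes only that core.async is on the classpath; <code>poll-for-work</code> is a made-up helper, and it uses the blocking <code>alts!!</code> so that it can run outside a <code>go</code> block.</p>
<div class="highlight"><pre><span></span><code>(require '[clojure.core.async :as a])

(defn poll-for-work
  "Non-blocking look at the request and control channels.
   Returns [value channel], or [:empty :default] when nothing is ready."
  [reqs ctl]
  (a/alts!! [reqs ctl] :default :empty))

(let [reqs (a/chan 1)
      ctl  (a/chan 1)]
  (println (poll-for-work reqs ctl))           ; nothing queued: [:empty :default]
  (a/put! reqs "req-1")
  (println (first (poll-for-work reqs ctl))))  ; now there is work: req-1
</code></pre></div>
<p>Note that the marker comes back in the value position, with <code>:default</code> where the channel would be, which is why the worker tests <code>reqid</code> against <code>:empty</code>.</p>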
<p>The <code>clear-bak</code> method of the <code>Redis-Backend</code> empties out the backup queue that we populated
earlier with <code>brpoplpush</code>, because we know we've handled everything we're supposed to.</p>
<h3>Actually doing work</h3>
<p>We finally get to evaluating the functions in <code>process-reqid</code>:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="kd">defn- </span><span class="nv">process-reqid</span> <span class="p">[</span><span class="nv">nodeid</span> <span class="nv">reqchan</span> <span class="nv">req-or-id</span> <span class="o">&</span> <span class="p">[</span><span class="nv">deb</span><span class="p">]]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">id</span> <span class="p">(</span><span class="nf">iid</span> <span class="nv">deb</span> <span class="s">"PRQ"</span><span class="p">)</span>
<span class="p">[</span><span class="nv">req</span> <span class="nv">reqid</span><span class="p">]</span> <span class="p">(</span><span class="nf">req+id</span> <span class="nv">req-or-id</span><span class="p">)</span>
<span class="nv">c</span> <span class="p">(</span><span class="nf">lchan</span> <span class="o">#</span><span class="p">(</span><span class="nb">str </span><span class="s">"process-reqid"</span> <span class="nv">id</span> <span class="nv">nodeid</span> <span class="nv">reqid</span><span class="p">))</span>
<span class="nv">res</span> <span class="p">(</span><span class="nb">get </span><span class="o">@</span><span class="nv">local-cache</span> <span class="nv">reqid</span><span class="p">)]</span>
<span class="p">(</span><span class="nf">go</span>
<span class="p">(</span><span class="k">if </span><span class="nv">res</span>
<span class="p">(</span><span class="k">do </span>
<span class="p">(</span><span class="nf">debug</span> <span class="s">"process-reqid found cached value"</span> <span class="nv">id</span> <span class="nv">nodeid</span> <span class="nv">reqid</span><span class="p">)</span>
<span class="p">(</span><span class="nf">>!</span> <span class="nv">c</span> <span class="p">[</span><span class="ss">:cached</span> <span class="nv">res</span><span class="p">])</span>
<span class="p">(</span><span class="nf">kv-publish</span> <span class="o">@</span><span class="nv">back-end</span> <span class="nv">reqid</span> <span class="nv">res</span><span class="p">))</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span> <span class="p">[</span><span class="nv">state</span> <span class="nv">val</span><span class="p">]</span> <span class="p">(</span><span class="nf">get-val</span> <span class="o">@</span><span class="nv">back-end</span> <span class="nv">reqid</span> <span class="ss">:state</span><span class="p">)]</span>
<span class="p">(</span><span class="nf">condp</span> <span class="nb">= </span><span class="nv">state</span>
<span class="ss">:running</span> <span class="p">(</span><span class="k">do </span>
<span class="p">(</span><span class="nf">debug</span> <span class="s">"process-reqid"</span> <span class="nv">id</span> <span class="nv">nodeid</span> <span class="nv">reqid</span> <span class="s">"already running"</span> <span class="nv">val</span><span class="p">)</span>
<span class="p">(</span><span class="nf">>!</span> <span class="nv">c</span> <span class="p">[</span><span class="ss">:running</span> <span class="nv">val</span><span class="p">]))</span>
<span class="ss">:done</span> <span class="p">(</span><span class="k">do </span><span class="p">(</span><span class="nf">debug</span> <span class="s">"process-reqid"</span> <span class="nv">id</span> <span class="nv">nodeid</span> <span class="nv">reqid</span> <span class="s">"already done"</span> <span class="nv">val</span><span class="p">)</span>
<span class="p">(</span><span class="nf">kv-publish</span> <span class="o">@</span><span class="nv">back-end</span> <span class="nv">reqid</span> <span class="nv">val</span><span class="p">)</span>
<span class="p">(</span><span class="nf">>!</span> <span class="nv">c</span> <span class="p">[</span><span class="ss">:done</span> <span class="nv">val</span><span class="p">]))</span>
<span class="nv">nil</span> <span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">state1</span> <span class="p">(</span><span class="nf">set-val</span> <span class="o">@</span><span class="nv">back-end</span> <span class="nv">reqid</span> <span class="ss">:state</span> <span class="p">[</span><span class="ss">:running</span> <span class="nv">nodeid</span><span class="p">])</span>
<span class="nv">_</span> <span class="p">(</span><span class="nf">debug</span> <span class="s">"process-reqid"</span> <span class="nv">id</span> <span class="nv">nodeid</span> <span class="nv">reqid</span> <span class="nv">state1</span> <span class="s">"->"</span> <span class="ss">:running</span><span class="p">)</span>
<span class="p">[</span><span class="nv">f</span> <span class="o">&</span> <span class="nv">args</span><span class="p">]</span> <span class="nv">req</span>
<span class="nv">cres</span> <span class="p">(</span><span class="nb">binding </span><span class="p">[</span><span class="nv">*nodeid*</span> <span class="nv">nodeid</span>
<span class="nv">*reqchan*</span> <span class="nv">reqchan</span>
<span class="nv">*current-reqid*</span> <span class="p">(</span><span class="nf">->reqid</span> <span class="nv">reqid</span><span class="p">)]</span>
<span class="p">(</span><span class="nb">apply </span><span class="nv">f</span> <span class="nv">args</span><span class="p">))</span>
<span class="nv">res</span> <span class="p">(</span><span class="nf"><!</span> <span class="nv">cres</span><span class="p">)</span>
<span class="nv">state2</span> <span class="p">(</span><span class="nf">set-val</span> <span class="o">@</span><span class="nv">back-end</span> <span class="nv">reqid</span> <span class="ss">:state</span> <span class="p">[</span><span class="ss">:done</span> <span class="nv">res</span><span class="p">])]</span>
<span class="p">(</span><span class="nf">>!</span> <span class="nv">c</span> <span class="p">[</span><span class="ss">:calcd</span> <span class="nv">res</span><span class="p">])</span>
<span class="p">(</span><span class="nf">swap!</span> <span class="nv">local-cache</span> <span class="nb">assoc </span><span class="nv">reqid</span> <span class="nv">res</span><span class="p">)</span>
<span class="p">(</span><span class="nf">debug</span> <span class="s">"process-reqid"</span> <span class="nv">id</span> <span class="nv">nodeid</span> <span class="nv">reqid</span> <span class="nv">state1</span> <span class="s">"->"</span> <span class="nv">state2</span> <span class="s">"->"</span> <span class="ss">:done</span> <span class="nv">res</span><span class="p">)</span>
<span class="p">(</span><span class="nf">kv-publish</span> <span class="o">@</span><span class="nv">back-end</span> <span class="nv">reqid</span> <span class="nv">res</span><span class="p">))))))</span>
<span class="nv">c</span><span class="p">))</span>
</code></pre></div>
<p>As elsewhere, we re-check the local cache and the global state, in a hail-Mary pass to avoid doing actual work.</p>
<p>Under the horrible circumstance where we actually have to do something:</p>
<ol>
<li>Call a protocol method to mark the state as <code>:running</code>, along with the node where it is running,
in case someone is curious. (The existence of myriad tools for inspecting Redis rewards leaving breadcrumbs like
this when it comes time to debug.)</li>
<li>Destructure the function and its arguments from the request. Remember that the function is <code>:girded</code>, so
it will wrap up the <code>:value</code> or <code>:error</code> as the case may be and send it to a channel that it returns immediately.</li>
<li>Locally bind our current <code>nodeid</code> and the <code>reqchan</code> from which we're pulling requests, in case the function
ends up making re-entrant calls.</li>
<li>Locally bind <code>*current-reqid*</code> to assist <code>format-exception</code> in generating an error stack if necessary.</li>
<li><code>apply</code> the function to the request arguments and wait.</li>
<li>Mark that we're done, and store the result.</li>
<li>Publish the result.</li>
</ol>
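<p>Stripped of channels, caching and publishing, the steps above amount to a small state machine. Here is a minimal, synchronous sketch of it, in which an atom called <code>states</code> stands in for the Redis-backed state and <code>process!</code> is a hypothetical name, not part of Girder:</p>
<div class="highlight"><pre><span></span><code>;; Hypothetical stand-ins for the Redis-backed state and the bound node id.
(def states (atom {}))
(def ^:dynamic *nodeid* nil)

(defn process!
  "Run req (a vector of a function followed by its arguments) for reqid,
   unless some node has already claimed or finished it. Returns the result,
   or nil if the request is running elsewhere."
  [nodeid reqid req]
  (let [[state v] (get @states reqid)]
    (case state
      :done    v
      :running nil
      (do (swap! states assoc reqid [:running nodeid])   ; leave a breadcrumb
          (let [res (binding [*nodeid* nodeid]           ; visible to the function
                      (apply (first req) (rest req)))]
            (swap! states assoc reqid [:done res])       ; mark done, store result
            res)))))

(process! "worker-1" "req-1" [+ 1 2])    ; computes 3
(process! "worker-2" "req-1" [* 10 10])  ; already :done, returns the stored 3
</code></pre></div>
<p>As in the real thing, the check and the update are not atomic, so two nodes could both see <code>nil</code> and both do the work.</p>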
<h3>Distributing requests</h3>
<p>Typically, one doesn't request work directly from a worker, but from a distributor that has a pool of
workers at its disposal.</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">not-busy</span> <span class="p">[</span><span class="nv">nodeid</span><span class="p">]</span> <span class="p">(</span><span class="nb">nil? </span><span class="p">(</span><span class="nf">get-val</span> <span class="o">@</span><span class="nv">back-end</span> <span class="nv">nodeid</span> <span class="ss">:busy</span><span class="p">)))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">launch-distributor</span>
<span class="s">"Listen for requests and volunteers.</span>
<span class="s"> When we find one of each, add request to volunteer's queue."</span>
<span class="p">[</span><span class="nv">nodeid</span> <span class="o">&</span> <span class="p">[</span><span class="nv">poolid</span><span class="p">]]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">ctl</span> <span class="p">(</span><span class="nf">lchan</span> <span class="p">(</span><span class="nb">str </span> <span class="s">"launch-distributor "</span> <span class="nv">nodeid</span><span class="p">))</span>
<span class="nv">allreqs</span> <span class="p">(</span><span class="nf">crpop</span> <span class="o">@</span><span class="nv">back-end</span> <span class="nv">nodeid</span> <span class="ss">:requests</span><span class="p">)</span>
<span class="nv">reqs</span> <span class="p">(</span><span class="nf">async/filter<</span> <span class="nv">unclaimed</span> <span class="nv">allreqs</span><span class="p">)</span>
<span class="nv">allvols</span> <span class="p">(</span><span class="nf">crpop</span> <span class="o">@</span><span class="nv">back-end</span> <span class="nv">nodeid</span> <span class="ss">:volunteers</span><span class="p">)</span>
<span class="nv">vols</span> <span class="p">(</span><span class="nf">async/filter<</span> <span class="nv">not-busy</span> <span class="nv">allvols</span><span class="p">)</span>
<span class="nv">reqs+vols</span> <span class="p">(</span><span class="nf">async/map</span> <span class="nb">vector </span><span class="p">[</span><span class="nv">reqs</span> <span class="nv">vols</span><span class="p">])]</span>
<span class="p">(</span><span class="nb">when </span><span class="nv">poolid</span> <span class="p">(</span><span class="nf">add-member</span> <span class="o">@</span><span class="nv">back-end</span> <span class="nv">poolid</span> <span class="ss">:volunteers</span> <span class="nv">nodeid</span><span class="p">))</span>
<span class="p">(</span><span class="nf">async/go-loop</span> <span class="p">[</span><span class="nv">volunteering</span> <span class="nv">false</span><span class="p">]</span>
<span class="p">(</span><span class="nf">trace</span> <span class="s">"distributor"</span> <span class="nv">nodeid</span> <span class="s">"waiting for requests and volunteers"</span><span class="p">)</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span> <span class="p">[</span><span class="nv">v</span> <span class="nv">c</span><span class="p">]</span> <span class="p">(</span><span class="k">if </span><span class="nv">volunteering</span>
<span class="p">(</span><span class="nf">async/alts!</span> <span class="p">[</span><span class="nv">reqs+vols</span> <span class="nv">ctl</span><span class="p">])</span>
<span class="p">(</span><span class="nf">async/alts!</span> <span class="p">[</span><span class="nv">reqs+vols</span> <span class="nv">ctl</span><span class="p">]</span> <span class="ss">:default</span> <span class="ss">:empty</span><span class="p">))]</span>
<span class="p">(</span><span class="nf">debug</span> <span class="s">"distributor"</span> <span class="nv">nodeid</span> <span class="s">"got"</span> <span class="nv">v</span> <span class="nv">c</span><span class="p">)</span>
<span class="p">(</span><span class="nf">cond</span>
<span class="p">(</span><span class="nb">= </span><span class="nv">c</span> <span class="nv">ctl</span><span class="p">)</span> <span class="p">(</span><span class="k">do </span>
<span class="p">(</span><span class="nf">un-register</span> <span class="nv">nodeid</span> <span class="nv">poolid</span><span class="p">)</span>
<span class="p">(</span><span class="nf">close-all!</span> <span class="nv">reqs</span> <span class="nv">allreqs</span> <span class="nv">vols</span> <span class="nv">reqs+vols</span> <span class="nv">ctl</span><span class="p">))</span>
<span class="p">(</span><span class="nb">= </span><span class="nv">v</span> <span class="ss">:empty</span><span class="p">)</span> <span class="p">(</span><span class="k">do </span><span class="p">(</span><span class="nb">when </span><span class="nv">poolid</span>
<span class="p">(</span><span class="nf">debug</span> <span class="s">"distributor"</span> <span class="nv">nodeid</span> <span class="s">"is bored and volunteering with"</span> <span class="nv">poolid</span><span class="p">)</span>
<span class="p">(</span><span class="nf">lpush-and-set</span> <span class="o">@</span><span class="nv">back-end</span>
<span class="nv">poolid</span> <span class="ss">:volunteers</span> <span class="nv">nodeid</span>
<span class="nv">nodeid</span> <span class="ss">:busy</span> <span class="nv">nil</span><span class="p">))</span>
<span class="p">(</span><span class="nf">trace</span> <span class="s">"distributor"</span> <span class="nv">nodeid</span> <span class="s">"recurring"</span><span class="p">)</span>
<span class="p">(</span><span class="nf">recur</span> <span class="nv">true</span><span class="p">))</span>
<span class="ss">:else</span> <span class="p">(</span><span class="k">let </span><span class="p">[</span> <span class="p">[</span><span class="nv">reqid</span> <span class="nv">volid</span><span class="p">]</span> <span class="nv">v</span><span class="p">]</span>
<span class="p">(</span><span class="nf">debug</span> <span class="s">"distributor"</span> <span class="nv">nodeid</span> <span class="s">"pushing"</span> <span class="nv">reqid</span> <span class="s">"to request queue for"</span> <span class="nv">volid</span><span class="p">)</span>
<span class="p">(</span><span class="nb">when </span><span class="nv">poolid</span> <span class="p">(</span><span class="nf">set-val</span> <span class="o">@</span><span class="nv">back-end</span> <span class="nv">nodeid</span> <span class="ss">:busy</span> <span class="nv">true</span><span class="p">))</span>
<span class="p">(</span><span class="nf">lpush</span> <span class="o">@</span><span class="nv">back-end</span> <span class="nv">volid</span> <span class="ss">:requests</span> <span class="nv">reqid</span><span class="p">)</span>
<span class="p">(</span><span class="nf">clear-bak</span> <span class="o">@</span><span class="nv">back-end</span> <span class="p">[</span><span class="nv">nodeid</span> <span class="ss">:volunteers</span> <span class="nv">nodeid</span> <span class="ss">:requests</span><span class="p">])</span>
<span class="p">(</span><span class="nf">recur</span> <span class="nv">false</span><span class="p">)))))</span>
<span class="nv">ctl</span><span class="p">))</span>
</code></pre></div>
<p>We'll launch one of these with <code>(launch-distributor "poolname")</code>. Since distributors can distribute to other distributors, there can
also be a second argument, for the pool that this distributor belongs to.</p>
<p>The main excitement here is that we're listening simultaneously on multiple channels in a complicated way.
The request channel has already-claimed requests filtered out using <code>async/filter<</code>; stale
volunteers are weeded out in a similar fashion. Finally, vetted volunteer and request pairs are
<code>async/map vector</code>ed together into a single channel that disgorges <code>[request volunteer]</code> tuples:</p>
<div class="highlight"><pre><span></span><code><span class="nv">allreqs</span> <span class="p">(</span><span class="nf">crpop</span> <span class="o">@</span><span class="nv">back-end</span> <span class="nv">nodeid</span> <span class="ss">:requests</span><span class="p">)</span> <span class="c1">;; channel to receive all requests</span>
<span class="nv">reqs</span> <span class="p">(</span><span class="nf">async/filter<</span> <span class="nv">unclaimed</span> <span class="nv">allreqs</span><span class="p">)</span> <span class="c1">;; keep only the ones that nobody is working on yet</span>
<span class="nv">allvols</span> <span class="p">(</span><span class="nf">crpop</span> <span class="o">@</span><span class="nv">back-end</span> <span class="nv">nodeid</span> <span class="ss">:volunteers</span><span class="p">)</span> <span class="c1">;; channel to receive all volunteers</span>
<span class="nv">vols</span> <span class="p">(</span><span class="nf">async/filter<</span> <span class="nv">not-busy</span> <span class="nv">allvols</span><span class="p">)</span> <span class="c1">;; filter out volunteers who became busy after volunteering</span>
<span class="nv">reqs+vols</span> <span class="p">(</span><span class="nf">async/map</span> <span class="nb">vector </span><span class="p">[</span><span class="nv">reqs</span> <span class="nv">vols</span><span class="p">])]</span> <span class="c1">;; bundle together request and volunteer tuples</span>
</code></pre></div>
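<p>To see the pairing in isolation, here is a small sketch that runs without Redis. It assumes a reasonably recent core.async, where filtering is done with a transducer on the channel rather than a separate filtering function; <code>paired-channel</code> and the sets of claimed requests and busy workers are invented for the example.</p>
<div class="highlight"><pre><span></span><code>(require '[clojure.core.async :as a])

(defn paired-channel
  "Returns request and volunteer channels that drop anything matching
   claimed? or busy?, plus a channel of [request volunteer] pairs."
  [claimed? busy?]
  (let [reqs (a/chan 10 (remove claimed?))
        vols (a/chan 10 (remove busy?))]
    {:reqs reqs :vols vols :pairs (a/map vector [reqs vols])}))

(let [{:keys [reqs vols pairs]} (paired-channel #{"req-2"} #{"worker-b"})]
  (doseq [r ["req-1" "req-2" "req-3"]] (a/put! reqs r))
  (doseq [v ["worker-a" "worker-b" "worker-c"]] (a/put! vols v))
  ;; Take with a timeout so the sketch cannot hang if a pair never shows up.
  (println (first (a/alts!! [pairs (a/timeout 1000)])))   ; [req-1 worker-a]
  (println (first (a/alts!! [pairs (a/timeout 1000)]))))  ; [req-3 worker-c]
</code></pre></div>
<p>The pairs come out in lockstep: a request waits until a vetted volunteer is available, and vice versa.</p>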
<p>Other than the fact that it receives these tuples rather than plain requests, the <code>async/alts!</code> logic is about
the same as it was in the worker. When we get a live tuple, we simply push the <code>reqid</code> onto the <code>volid</code>
queue and move along.</p>
<h3>Re-entrant requests</h3>
<p>As in the examples, we might encounter <code>requests</code> within a <code>cdefn</code> function.
In these cases, the macro will have been called without a nodeid argument, but we have one from the thread-local
binding, so, <code>requests</code> expands eventually into a call to </p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">enqueue-reentrant</span> <span class="nv">reqs</span> <span class="nv">*nodeid*</span> <span class="nv">*reqchan*</span><span class="p">)))</span>
</code></pre></div>
<p>which was shown last week, but should make more sense now:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">enqueue-reentrant</span> <span class="p">[</span><span class="nv">reqs</span> <span class="nv">nodeid</span> <span class="nv">reqchan</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">out</span> <span class="p">(</span><span class="nf">chan</span><span class="p">)]</span>
<span class="p">(</span><span class="nf">go</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">results</span> <span class="p">(</span><span class="nf">async/map</span> <span class="nb">vector </span><span class="p">(</span><span class="nb">map </span><span class="o">#</span><span class="p">(</span><span class="nf">enqueue</span> <span class="nv">nodeid</span> <span class="nv">%</span><span class="p">)</span> <span class="nv">reqs</span><span class="p">))]</span>
<span class="p">(</span><span class="nf">async/go-loop</span> <span class="p">[]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span> <span class="p">[</span><span class="nv">v</span> <span class="nv">c</span><span class="p">]</span> <span class="p">(</span><span class="nf">async/alts!</span> <span class="p">[</span><span class="nv">results</span> <span class="nv">reqchan</span><span class="p">])]</span>
<span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nb">= </span><span class="nv">c</span> <span class="nv">results</span><span class="p">)</span>
<span class="p">(</span><span class="k">do </span><span class="p">(</span><span class="nf">>!</span> <span class="nv">out</span> <span class="nv">v</span><span class="p">)</span> <span class="p">(</span><span class="nf">close!</span> <span class="nv">c</span><span class="p">)</span> <span class="p">(</span><span class="nf">close!</span> <span class="nv">out</span><span class="p">))</span>
<span class="p">(</span><span class="k">do </span><span class="p">(</span><span class="nf"><!</span> <span class="p">(</span><span class="nf">process-reqid</span> <span class="nv">nodeid</span> <span class="nv">reqchan</span> <span class="nv">v</span><span class="p">))</span>
<span class="p">(</span><span class="nf">recur</span><span class="p">)))))))</span>
<span class="nv">out</span><span class="p">))</span>
</code></pre></div>
<p>As before, it dumps the requests onto the local queue, and then begins pulling from that same queue.
The crucial thing to note is that <code>process-reqid</code> is called both here <em>and</em> by <code>launch-worker</code>,
the difference being that <code>launch-worker</code> will only get work that it's volunteered for, while, here,
we already know that there's work on the queue, because we put it there.</p>
<h3>Sharing the load</h3>
<p>As illustrated so far, a worker might end up with a very long queue indeed, due to nested re-entrant requests.
This is supposed
to be a distributed system, so we want some way to share the load. The helper process</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">launch-helper</span> <span class="p">[</span><span class="nv">nodeid</span> <span class="nv">cycle-msec</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">ctl</span> <span class="p">(</span><span class="nf">lchan</span> <span class="p">(</span><span class="nb">str </span><span class="s">"launch-helper "</span> <span class="nv">nodeid</span><span class="p">))]</span>
<span class="p">(</span><span class="nf">debug</span> <span class="s">"worker"</span> <span class="nv">nodeid</span> <span class="s">"starting"</span><span class="p">)</span>
<span class="p">(</span><span class="nf">async/go-loop</span> <span class="p">[]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">member-nodeids</span> <span class="p">(</span><span class="nf">get-members</span> <span class="o">@</span><span class="nv">back-end</span> <span class="nv">nodeid</span> <span class="ss">:volunteers</span><span class="p">)</span>
<span class="nv">in-our-queue</span> <span class="p">(</span><span class="nb">set </span><span class="p">(</span><span class="nf">qall</span> <span class="o">@</span><span class="nv">back-end</span> <span class="nv">nodeid</span> <span class="ss">:requests</span><span class="p">))</span>
<span class="nv">in-member-queues</span> <span class="p">(</span><span class="nb">apply </span><span class="nv">clojure.set/union</span>
<span class="p">(</span><span class="nb">map </span><span class="o">#</span><span class="p">(</span><span class="nb">set </span><span class="p">(</span><span class="nf">qall</span> <span class="o">@</span><span class="nv">back-end</span> <span class="nv">%</span> <span class="ss">:requests</span><span class="p">))</span> <span class="nv">member-nodeids</span><span class="p">))</span>
<span class="nv">additions</span> <span class="p">(</span><span class="nf">clojure.set/difference</span> <span class="nv">in-member-queues</span> <span class="nv">in-our-queue</span><span class="p">)]</span>
<span class="p">(</span><span class="nb">when </span><span class="p">(</span><span class="nb">seq </span><span class="nv">additions</span><span class="p">)</span>
<span class="p">(</span><span class="nf">debug</span> <span class="s">"Helper"</span> <span class="nv">nodeid</span> <span class="s">"lifting requests"</span> <span class="nv">additions</span><span class="p">)</span>
<span class="p">(</span><span class="nf">lpush-many</span> <span class="o">@</span><span class="nv">back-end</span> <span class="nv">nodeid</span> <span class="ss">:requests</span> <span class="p">(</span><span class="nf">vec</span> <span class="nv">additions</span><span class="p">))))</span>
<span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nf">closed?</span> <span class="nv">ctl</span><span class="p">)</span>
<span class="p">(</span><span class="nf">debug</span> <span class="s">"Closing helper"</span> <span class="nv">nodeid</span><span class="p">)</span>
<span class="p">(</span><span class="k">do </span>
<span class="p">(</span><span class="nf">trace</span> <span class="s">"helper"</span> <span class="nv">nodeid</span> <span class="s">"waiting"</span> <span class="nv">cycle-msec</span><span class="p">)</span>
<span class="p">(</span><span class="nf"><!</span> <span class="p">(</span><span class="nf">timeout</span> <span class="nv">cycle-msec</span><span class="p">))</span>
<span class="p">(</span><span class="nf">recur</span><span class="p">))))</span>
<span class="nv">ctl</span><span class="p">))</span>
</code></pre></div>
<p>wakes up on a configurable cycle, looks at the queues of all nodes in its pool and simply <em>copies</em>
anything it finds there onto the end of the distributor's queue, whence, as described just above,
it will dole out requests to any workers volunteering. So, basically, we're setting up a race where
the original worker has a bit of a head start. Again, the worst case is that the same calculation gets done
twice, and the main extra cost is in checking to see if a request is in process before acting on it.</p>
<p>Should the distributor belong to another pool up the hierarchy, then its queue will similarly be copied up.</p>
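<p>The set arithmetic at the heart of the helper can be tried out without Redis. What follows is only a sketch, assuming a plain map from node id to queue as a stand-in for the back end; the map, <code>additions</code> and <code>lift-requests</code> are hypothetical names for illustration, not part of girder.</p>
<div class="highlight"><pre><span></span><code>(require '[clojure.set :as set])

;; Hypothetical in-memory stand-in for the back end: node id -> request queue.
(def queues
  {"pool" ["r1"]
   "w1"   ["r1" "r2"]
   "w2"   ["r3"]})

(defn additions
  "Requests found on member queues that are not yet on the distributor's queue."
  [queues distributor members]
  (let [ours   (set (get queues distributor))
        theirs (apply set/union (map #(set (get queues %)) members))]
    (set/difference theirs ours)))

(defn lift-requests
  "Copy (not move) the missing requests onto the end of the distributor's queue.
  Sorting is only there to make the toy output deterministic."
  [queues distributor members]
  (update queues distributor into (sort (additions queues distributor members))))

(lift-requests queues "pool" ["w1" "w2"])
;; the "pool" queue becomes ["r1" "r2" "r3"]; "w1" and "w2" are left untouched
</code></pre></div>
<p>Because the worker queues are left as they were, the original worker and any volunteer are now racing for the same request, which is exactly the situation described above.</p>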
<p>The main idea here is that we generally try to do calculations locally, because</p>
<ol>
<li>there's some latency involved in distributing them (and we have logic to inform local
listeners on an express path, so results don't need to flow through Redis),</li>
<li>the next step in this project will use caching strategies such that the values Redis
holds are only unique ids.</li>
</ol>
<p>The system should be tuned such that distribution occurs roughly at the point where things are taking
about as much time as the latency that distribution will introduce.</p>
<h3>Phew!</h3>
<p>Next time, I'll talk a little bit about testing this on 100 AWS nodes. It should be a lot shorter
than this.</p>Quick tip on getting lein to recognize your new dependency2014-08-24T00:00:00-04:002014-08-24T00:00:00-04:00Peter Fraenkeltag:blog.podsnap.com,2014-08-24:/clj-deps.html<p>Ever realized you needed to modify some dependency (e.g. I needed to add weak cache support to <code>core.cache</code>), go ahead with forking on GitHub, <code>lein install</code> your new version, modify your <code>project.clj</code> to pick it up, and then... somehow... you still seem to get the old version?
That's usually because some <strong>other</strong> dependency is explicitly requesting a version of that library. If you do <code>lein classpath</code>, you'll see your library show up, in two different versions, with the one you don't want first.</p>
<p>In my case, the relevant bit of classpath shows</p>
<div class="highlight"><pre><span></span><code> /Users/pnf/.m2/repository/org/clojure/core.cache/0.6.3/core.cache-0.6.3.jar: ... :/Users/pnf/.m2/repository/core/cache/core.cache/0.6.5-pnf-SNAPSHOT/core.cache-0.6.5-pnf-SNAPSHOT.jar
</code></pre></div>
<p>So when I try to use <code>core.cache/weak-cache-factory</code>, I'm told it doesn't exist. How could this happen?</p>
<p>To further debug the problem</p>
<div class="highlight"><pre><span></span><code> lein pom
mvn dependency:tree
</code></pre></div>
<p>UPDATE: Per a comment below, you can get the hierarchy directly</p>
<div class="highlight"><pre><span></span><code> lein deps :tree
</code></pre></div>
<p>with 50% less typing. Those of us who got A's in 8th-grade typing class may have mixed
feelings about this.</p>
<p>After downloading lots of <code>pom.xml</code>s, this will print out a nice tree structure; the relevant bit for me was</p>
<div class="highlight"><pre><span></span><code><span class="o">[</span><span class="n">INFO</span><span class="o">]</span><span class="w"> </span><span class="o">+-</span><span class="w"> </span><span class="n">org</span><span class="p">.</span><span class="nl">clojure</span><span class="p">:</span><span class="n">core</span><span class="p">.</span><span class="nl">async</span><span class="p">:</span><span class="nl">jar</span><span class="p">:</span><span class="mf">0.1.303.0</span><span class="o">-</span><span class="mi">886421</span><span class="o">-</span><span class="nl">alpha</span><span class="p">:</span><span class="n">compile</span><span class="w"></span>
<span class="o">[</span><span class="n">INFO</span><span class="o">]</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="err">\</span><span class="o">-</span><span class="w"> </span><span class="n">org</span><span class="p">.</span><span class="nl">clojure</span><span class="p">:</span><span class="n">tools</span><span class="p">.</span><span class="n">analyzer</span><span class="p">.</span><span class="nl">jvm</span><span class="p">:</span><span class="nl">jar</span><span class="p">:</span><span class="mf">0.1.0</span><span class="o">-</span><span class="nl">beta12</span><span class="p">:</span><span class="n">compile</span><span class="w"></span>
<span class="o">[</span><span class="n">INFO</span><span class="o">]</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="o">+-</span><span class="w"> </span><span class="n">org</span><span class="p">.</span><span class="nl">clojure</span><span class="p">:</span><span class="n">tools</span><span class="p">.</span><span class="nl">analyzer</span><span class="p">:</span><span class="nl">jar</span><span class="p">:</span><span class="mf">0.1.0</span><span class="o">-</span><span class="nl">beta12</span><span class="p">:</span><span class="n">compile</span><span class="w"></span>
<span class="o">[</span><span class="n">INFO</span><span class="o">]</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="o">+-</span><span class="w"> </span><span class="n">org</span><span class="p">.</span><span class="nl">clojure</span><span class="p">:</span><span class="n">core</span><span class="p">.</span><span class="nl">memoize</span><span class="p">:</span><span class="nl">jar</span><span class="p">:</span><span class="mf">0.5.6</span><span class="err">:</span><span class="n">compile</span><span class="w"></span>
<span class="o">[</span><span class="n">INFO</span><span class="o">]</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="err">\</span><span class="o">-</span><span class="w"> </span><span class="n">org</span><span class="p">.</span><span class="nl">clojure</span><span class="p">:</span><span class="n">core</span><span class="p">.</span><span class="nl">cache</span><span class="p">:</span><span class="nl">jar</span><span class="p">:</span><span class="mf">0.6.3</span><span class="err">:</span><span class="n">compile</span><span class="w"></span>
<span class="o">[</span><span class="n">INFO</span><span class="o">]</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="err">\</span><span class="o">-</span><span class="w"> </span><span class="n">org</span><span class="p">.</span><span class="nl">clojure</span><span class="p">:</span><span class="k">data</span><span class="p">.</span><span class="n">priority</span><span class="o">-</span><span class="k">map</span><span class="err">:</span><span class="nl">jar</span><span class="p">:</span><span class="mf">0.0.2</span><span class="err">:</span><span class="n">compile</span><span class="w"></span>
</code></pre></div>
<p>which meant that my <code>[core.cache "0.6.5-pnf-SNAPSHOT"]</code> was made irrelevant. To fix the issue, I added an <code>:exclusions</code>
block to my <code>core.async</code> dependency:</p>
<div class="highlight"><pre><span></span><code> [org.clojure/core.async "0.1.303.0-886421-alpha"
:exclusions [[org.clojure/core.cache]]]
</code></pre></div>
<p>Now, <strong>my</strong> <code>core.cache</code> is the only one in the class-path, and all is well.</p>
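<p>Put together, the dependency section might look roughly like the sketch below. The project name, its version and the Clojure version are placeholders I've assumed for illustration; only the <code>core.cache</code> and <code>core.async</code> entries come from the discussion above.</p>
<div class="highlight"><pre><span></span><code>(defproject my-app "0.1.0-SNAPSHOT"            ; hypothetical project name and version
  :dependencies [[org.clojure/clojure "1.6.0"] ; assumed Clojure version
                 ;; the locally installed fork
                 [core.cache "0.6.5-pnf-SNAPSHOT"]
                 ;; stop core.async from dragging in its own core.cache
                 [org.clojure/core.async "0.1.303.0-886421-alpha"
                  :exclusions [[org.clojure/core.cache]]]])
</code></pre></div>
<p>Running <code>lein deps :tree</code> again should then show a single <code>core.cache</code>.</p>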
<p>Wish my <a href="http://dev.clojure.org/jira/browse/CCACHE-35">JIRA enhancement</a> good luck!</p>Beauty and the Beast - Distributed functional programming, part 12014-08-15T00:00:00-04:002014-08-15T00:00:00-04:00Peter Fraenkeltag:blog.podsnap.com,2014-08-15:/girder.html<h3>Update 2015-01-12</h3>
<p>The algorithm as it exists in HEAD is somewhat different from the below,
in ways that I'll describe (eventually) in another post. In some ways, it's
closer to Fork-Join, but with important differences to support reentrancy,
share results of duplicate requests and adjust for the costs of distribution.</p>
<h3>OSS and Commercial Grids</h3>
<p><a href="https://en.wikipedia.org/wiki/Grid_computing">Grid computing</a> has
always suffered the reputation of a buzzword that one suspects might
not actually mean anything, but it has become especially ill-defined
with the rise of open-source distributed computation frameworks like
<a href="http://spark.apache.org/">Spark</a>, <a href="http://stormcomputing.org/">Storm</a>
and Grampa <a href="https://hadoop.apache.org/">Hadoop</a>. These extensively documented
systems don't need much explanation from me, other than to note that
they all basically pump data through a pre-designed graph, whose topology
is often defined by map-reduce keys extracted from the input data.</p>
<p>Traditional, commercial grid products like
<a href="http://www.tibco.com/products/cloud">Tibco Cloud, né DataSynapse</a>,
and
<a href="http://www-03.ibm.com/systems/platformcomputing/products/symphony/">Symphony</a>
are adapting, at least in terms of marketing, but they still suffer
from a general lack of understanding about what they're supposed to do.</p>
<p>The way I've observed DataSynapse and Symphony in action is as a kind
of fancy <a href="https://en.wikipedia.org/wiki/Remote_Shell">rsh</a>. For
example,
<a href="https://en.wikipedia.org/wiki/All_persons_fictitious_disclaimer">Behemoth Incorporated</a>
has a vast collection of servers, of wide-ranging vintage, running
different operating systems, financed through complex amortization
that makes ownership difficult to establish, and in wildly-fluctuating
demand on an unsteady schedule that generally clusters around the
end-of-day in various global business centers. Departments within BI
have huge numbers of batch jobs, of hypothetically varying priority
but in practice all deemed urgent in the extreme. These batch jobs
generally involve running an executable that reads a bunch of stuff
from a database (loosely defined), chews cpu for O(10-100) minutes and puts its output somewhere. The job of
the grid scheduler is to find places where these jobs will be able to run
and do some kind of tracking of whether they actually ran. Part of finding places
to run is what's known as "cycle scavenging," which means about what you
think it would but often meets resistance from the scavengees.</p>
<p>Sarcasm notwithstanding, these products are solving problems that really
exist, even if that existence tends to reflect organizational choices, with which
it is not always entirely impossible to find fault.</p>
<h3>But what about beauty?</h3>
<p>Consider another computing paradigm, where we write pure functions
that call other pure functions. This blisteringly novel idea doesn't
yet have a name, but for now let's call it <strong>Functional
Programming</strong>. With this "FP" thing that
I, myself, invented right now,<sup id="fnref:paranoia"><a class="footnote-ref" href="#fn:paranoia">1</a></sup>
we may write beautiful
programs that do complicated work but can still be understood easily,
without having to consider a thousand possible ways in which shared
state might be mucked up by some process we forgot about. As always, there's a
"graph" beneath it all, but it emerges naturally from the stack of function calls,
and we almost don't need to think about it.</p>
<p>Complications arise, however, when our computer is insufficiently speedy, and
we find ourselves needing to distribute the load across multiple threads or
even multiple machines, and the paradigms to deal with these complications tend
to make our programs less beautiful. Leaving aside the huge cognitive dissonance
that untrained individuals
experience when trying to use the word <a href="http://www.reactivemanifesto.org/">manifesto</a>
unironically, the <a href="https://en.wikipedia.org/wiki/Actor_model">actor model</a>, even in
<a href="http://akka.io/">a truly elegant implementation</a>, is not as beautiful as
the functional model. It's just not. Similarly, but, for me, more
painful to admit,
<a href="https://en.wikipedia.org/wiki/Communicating_sequential_processes">CSP</a>,
especially as embodied in Clojure by
<a href="https://clojure.github.io/core.async/">core.async</a>, is, while dazzling, not beautiful either.</p>
<p>Under an aesthetically caring god, we would not have
to re-imagine simple nested function calls as a flurry of notes, passed among
<a href="https://en.wikipedia.org/wiki/Homunculus">mechanical</a>
<a href="https://www.youtube.com/watch?v=2eKzMwPkt04">homunculi</a>.</p>
<h3>Girder</h3>
<p>I have been playing with an approach for distributed computing that, while implemented
using a variety of the techniques I have deemed unbeautiful, exposes an interface that
comes a bit closer to Plain Old FP.
With some limitations, you write code that looks a lot like POFP except that some of the
functions you call happen to get executed remotely.</p>
<p>The main design goals and assumptions were these:</p>
<ul>
<li>Calculation requests should be built with ordinary clojure functions and values.</li>
<li>Calculations are referentially transparent and idempotent, implying that
<ul>
<li>we can cache results,</li>
<li>repeating the evaluation of one will at worst result in unnecessary work, and</li>
<li>it is valid and safe to restart the system and make the initial requests again.</li>
</ul>
</li>
<li>Support <em>emergent graphs</em>, i.e. computations running on the grid may
request and use the results of other computations on the grid,
but we don't claim to know the data-flow graph in advance or specify it any way other than through the stack of function calls. </li>
<li>We must therefore fully support re-entrant requests. When
functions make their own requests, we must not risk grid
"starvation" (where requests cannot be processed because all workers
claim to be busy, when they are in fact in a wait state),
but we must do so efficiently and with predictable load.</li>
<li>Support any emergent <em>directed acyclic</em> graph, i.e. different requests may make the
same further request, and they should be able to share a memoized result.</li>
<li>One would generally prefer that re-entrant requests do <em>not</em> get
executed very remotely, but we want the grid to "help out" if we get
overburdened.</li>
<li>The system should operate at high throughput. Aim for several ms overhead in
a potentially remote call.</li>
<li>You're responsible for distributing jar files. We're not solving the code serialization
problem.</li>
<li>It is safe for any member of the grid to die and/or be restarted (though the process of recovery
is not something I've yet automated).</li>
</ul>
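<p>The goal of sharing a memoized result between duplicate requests can be illustrated in a single process. This is a toy sketch only: the real system keeps request state and cached values in the back end, whereas here an atom stands in for it, and <code>results</code>, <code>request-once</code> and <code>slow-square</code> are hypothetical names.</p>
<div class="highlight"><pre><span></span><code>;; Hypothetical in-process cache: request vector -> delay holding its result.
(def results (atom {}))

(defn request-once
  "Evaluate the request [f & args] at most once; later callers share the result."
  [[f & args :as req]]
  (let [d (delay (apply f args))]
    ;; swap! may retry under contention, but a delay runs its body only when
    ;; first dereferenced, so only the delay that made it into the map is forced.
    @(get (swap! results #(if (contains? % req) % (assoc % req d))) req)))

;; A pure function with a side-channel counter, just to observe evaluations.
(def calls (atom 0))
(defn slow-square [x] (swap! calls inc) (* x x))

(request-once [slow-square 7]) ; evaluates slow-square
(request-once [slow-square 7]) ; served from the cache
@calls                         ; => 1
</code></pre></div>
<p>Referential transparency is what makes this safe: since the result depends only on the function and its arguments, the request itself can serve as the cache key.</p>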
<p>Implementation also embodies a few central principles.</p>
<ul>
<li>Interface is generally via <code>core.async</code>. Requests for computations
return channels, which will eventually deliver results.</li>
<li>We rely internally on <code>async</code> to coordinate data flow without directly
playing with thread pools, thus references below to "blocking" are really "parking".</li>
<li>There will be a central statekeeper (Redis, at this point),
easing reliable coordination and facilitating reporting on the state of the system.</li>
</ul>
<h3>Topology of compute nodes</h3>
<p>While the topology of data flow is determined entirely by function execution, there is a defined
topology of machinery to execute those functions. I'll define a <strong>node</strong> as a single-threaded
process that knows how to do something. Every node has a <strong>work queue</strong>, onto which <strong>requests</strong> can be
placed. There is also a central <strong>back end</strong>, which knows what's on all the queues as well as the
current status of any request.</p>
<h4>Worker Nodes</h4>
<p>unsurprisingly, do most of the work. They</p>
<ol>
<li>Have a name, like "w1".</li>
<li>Have one designated <strong>distributor node</strong>, which may be so designated by many worker nodes.</li>
<li>Pop requests only from their queue.</li>
<li>Execute these requests, which may themselves make requests reentrantly.</li>
<li>Push reentrant grid requests back onto their queue.</li>
<li>After making re-entrant requests, work through own queue (which will generally contain
the requests they just made).</li>
<li>Ignore work marked done or in progress but somehow acquire the results asynchronously. </li>
<li>When free, push themselves onto the <strong>volunteer queue</strong> of their distributor.</li>
</ol>
<h4>Distributor nodes</h4>
<ol>
<li>Act as a <strong>pool</strong> for worker nodes that designate them.</li>
<li>Have a name, like "pool".</li>
<li>Have, as just noted, a volunteer queue as well as a work queue.</li>
<li>Pop work requests from own work queue, discarding work that is already finished or in progress, and
pop volunteer requests from their volunteer queue, blocking
until they have at least one of each.</li>
<li>Push the work request down to the volunteer.</li>
<li>May have their own designated distributor node, at which they volunteer when not busy.</li>
<li>Periodically (e.g. every 100ms) <em>copy</em> the entirety of the work queues of all nodes
for which they act as distributor and push the requests onto their own work queue.</li>
</ol>
<h4>Back end</h4>
<ol>
<li>Maintains all the queues.</li>
<li>Maintains the <strong>state</strong> of every request, e.g. new, running or completed.</li>
<li>Maintains the cached value of every completed request.</li>
<li>Handles distribution of results.</li>
<li>Is, as I said, at this point Redis, but doesn't have to be.</li>
</ol>
<p>Typical use would be to make top-level requests at a distributor pool, where they would be doled out to
idle workers that had volunteered. If workers themselves make requests, their queues may elongate, in which
case the requests will be hoisted up the distribution chain, to be pushed back down should anyone else volunteer.</p>
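<p>The distributor's matching of work to volunteers can be modelled as a pure function over a small map. This is a sketch under stated assumptions rather than girder's implementation: the shape of the <code>grid</code> map and the name <code>dole-out</code> are made up here, and where the real distributor would park until both queues have something to offer, the toy version simply stops.</p>
<div class="highlight"><pre><span></span><code>(def empty-q clojure.lang.PersistentQueue/EMPTY)

;; Hypothetical state: a distributor's work and volunteer queues, one work
;; queue per worker, and the back end's record of request states.
(def grid
  {:work       (into empty-q ["r1" "r2" "r3"])
   :volunteers (into empty-q ["w1" "w2"])
   :workers    {"w1" empty-q, "w2" empty-q}
   :state      {"r2" :running}})   ; r2 is already in progress elsewhere

(defn dole-out
  "Pair work requests with volunteers until either queue runs dry,
  discarding requests that are already running or completed."
  [{:keys [work volunteers state] :as g}]
  (let [req (peek work)
        vol (peek volunteers)]
    (cond
      ;; nothing to pair up: the real distributor would park here
      (or (nil? req) (nil? vol)) g
      ;; someone else is already dealing with it: discard
      (contains? state req) (recur (update g :work pop))
      ;; otherwise push the request down to the volunteer
      :else (recur (-> g
                       (update :work pop)
                       (update :volunteers pop)
                       (update-in [:workers vol] conj req))))))

(dole-out grid)
;; w1 ends up with r1 and w2 with r3, while r2 is dropped
</code></pre></div>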
<h3>Requests</h3>
<p>A request in girder is a finite sequence, whose first element is a function and all of whose subsequent
elements must be serializable. At its simplest, a request doesn't look much different from a function call,
and, indeed,</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">request</span> <span class="s">"pool"</span> <span class="p">(</span><span class="nb">+ </span><span class="mi">1</span> <span class="mi">2</span><span class="p">))</span>
</code></pre></div>
<p>will duly return <code>3</code>, though the execution could have occurred at any worker in the pool
(or in one of the pool's pools, etc).</p>
<p>The <code>request</code> macro hides a lot of complexity, expanding the above into something like</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nf">let*</span> <span class="p">[</span><span class="nv">req</span> <span class="p">[</span><span class="nb">+ </span><span class="mi">1</span> <span class="mi">2</span><span class="p">]</span>
<span class="nv">res</span> <span class="p">(</span><span class="nf"><!!</span> <span class="p">(</span><span class="nf">grid/enqueue</span> <span class="s">"pool"</span> <span class="nv">req</span><span class="p">))]</span>
<span class="p">(</span><span class="nb">if-not </span><span class="p">(</span><span class="ss">:error</span> <span class="nv">res</span><span class="p">)</span>
<span class="p">(</span><span class="ss">:value</span> <span class="nv">res</span><span class="p">)</span>
<span class="p">(</span><span class="nf">throw</span> <span class="p">(</span><span class="nf">ex-info</span> <span class="s">"Error during grid execution"</span> <span class="nv">res</span><span class="p">))))</span>
</code></pre></div>
<p>where <code>enqueue</code> is responsible for serializing the request into something remotely
detangleable (in this case, <code>"(\"clojure.core/+\" 1 2)"</code>), pushing it onto the
appropriate queue and subscribing to results matching this request.</p>
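<p>To make the serialization step concrete, here is one way a request such as <code>[+ 1 2]</code> could be flattened to a string and revived on the other side. It is a minimal sketch, assuming the function is given as a resolvable symbol and the arguments print readably; <code>serialize-request</code> and <code>run-request</code> are hypothetical names, and girder's own code may well do this differently.</p>
<div class="highlight"><pre><span></span><code>(require '[clojure.edn :as edn])

(defn serialize-request
  "Turn a quoted request such as '(+ 1 2) into a string that names the
  function by its fully qualified name, followed by its arguments."
  [[f & args]]
  (let [m (meta (resolve f))]
    (pr-str (cons (str (ns-name (:ns m)) "/" (:name m)) args))))

(defn run-request
  "Read the string back, look the function up by name and apply it."
  [s]
  (let [[fname & args] (edn/read-string s)]
    (apply (resolve (symbol fname)) args)))

(serialize-request '(+ 1 2))               ; => "(\"clojure.core/+\" 1 2)"
(run-request (serialize-request '(+ 1 2))) ; => 3
</code></pre></div>
<p>Passing the function by name is what lets the arguments remain the only things that need to be serializable; it also means every node must already have the code on its classpath.</p>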
<p>One of Girder's main design goals is that a remotely executing function should be allowed to remotely
execute other functions. We want this to happen without gorging on threads, and we want to deal gracefully
with errors that may occur far down the call chain. To achieve this, the fundamental computational
unit is a function that</p>
<ol>
<li>Does its calculation in a <code>go</code> block.</li>
<li>Returns a channel to which results will eventually be delivered.</li>
<li>Wraps those results in a structure that allows us to differentiate success and failure, and preserve error information.</li>
</ol>
<p>In the example above, <code>+</code> gets wrapped up automatically, but in general we'll want
a macro to facilitate creation of such functions:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defmacro </span><span class="nv">cdefn</span> <span class="p">[</span><span class="nv">fun</span> <span class="nv">args</span> <span class="o">&</span> <span class="nv">forms</span><span class="p">]</span>
<span class="o">`</span><span class="p">(</span><span class="kd">defn </span><span class="o">~</span><span class="p">(</span><span class="nf">vary-meta</span> <span class="nv">fun</span> <span class="nb">assoc </span><span class="ss">:girded</span> <span class="nv">true</span><span class="p">)</span> <span class="o">~</span><span class="nv">args</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">c#</span> <span class="p">(</span><span class="nf">chan</span><span class="p">)]</span>
<span class="p">(</span><span class="nf">go</span>
<span class="p">(</span><span class="nf">>!</span> <span class="nv">c#</span>
<span class="p">(</span><span class="nf">try</span> <span class="p">{</span><span class="ss">:value</span> <span class="p">(</span><span class="k">do </span><span class="o">~@</span><span class="nv">forms</span><span class="p">)}</span>
<span class="p">(</span><span class="nf">catch</span> <span class="nv">Exception</span> <span class="nv">e#</span> <span class="p">{</span><span class="ss">:error</span> <span class="p">(</span><span class="nf">format-exception</span> <span class="nv">e#</span><span class="p">)})))</span>
<span class="p">(</span><span class="nf">close!</span> <span class="nv">c#</span><span class="p">))</span>
<span class="nv">c#</span><span class="p">)))</span>
</code></pre></div>
<p>which I can use, in this contrived example, to define several financial instruments
based on a criterion (e.g. a ticker) and then calculate their prices in parallel:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">cdefn</span> <span class="nv">calc-price</span> <span class="p">[</span><span class="nv">deets</span><span class="p">]</span> <span class="p">(</span><span class="nf">calculate-something-expensive</span> <span class="nv">deets</span><span class="p">))</span>
<span class="p">(</span><span class="nf">cdefn</span> <span class="nv">calc-portfolio</span> <span class="p">[</span><span class="nv">spec</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">reqs</span> <span class="p">[[</span><span class="nv">calc-price</span> <span class="p">(</span><span class="nf">munge</span> <span class="nv">spec</span><span class="p">)]</span> <span class="p">[</span><span class="nv">calc-price</span> <span class="p">(</span><span class="nf">diddle</span> <span class="nv">spec</span><span class="p">)]]]</span>
<span class="p">(</span><span class="nb">reduce + </span><span class="p">(</span><span class="nf">requests</span> <span class="nv">reqs</span><span class="p">))))</span>
<span class="p">(</span><span class="nf">request</span> <span class="s">"pool"</span> <span class="p">(</span><span class="nf">calc-portfolio</span> <span class="s">"xyz"</span><span class="p">))</span>
</code></pre></div>
<p>The innocuous looking <code>requests</code> in <code>calc-portfolio</code>, after passing through a few more
macros, eventually calls and waits on (with some simplifications for clarity):</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">enqueue-reentrant</span> <span class="p">[</span><span class="nv">reqs</span> <span class="nv">nodeid</span> <span class="nv">reqchan</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">out</span> <span class="p">(</span><span class="nf">chan</span><span class="p">)]</span>
<span class="p">(</span><span class="nf">go</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">results</span> <span class="p">(</span><span class="nf">async/map</span> <span class="nb">vector </span><span class="p">(</span><span class="nb">map </span><span class="o">#</span><span class="p">(</span><span class="nf">enqueue</span> <span class="nv">nodeid</span> <span class="nv">%</span><span class="p">)</span> <span class="nv">reqs</span><span class="p">))]</span>
<span class="p">(</span><span class="nf">async/go-loop</span> <span class="p">[]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[[</span><span class="nv">v</span> <span class="nv">c</span><span class="p">]</span> <span class="p">(</span><span class="nf">async/alts!</span> <span class="p">[</span><span class="nv">results</span> <span class="nv">reqchan</span><span class="p">])]</span>
<span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nb">= </span><span class="nv">c</span> <span class="nv">results</span><span class="p">)</span>
<span class="p">(</span><span class="k">do </span><span class="p">(</span><span class="nf">>!</span> <span class="nv">out</span> <span class="nv">v</span><span class="p">)</span> <span class="p">(</span><span class="nf">close!</span> <span class="nv">c</span><span class="p">)</span> <span class="p">(</span><span class="nf">close!</span> <span class="nv">out</span><span class="p">))</span>
<span class="p">(</span><span class="k">do </span><span class="p">(</span><span class="nf"><!</span> <span class="p">(</span><span class="nf">process-reqid</span> <span class="nv">nodeid</span> <span class="nv">reqchan</span> <span class="nv">v</span><span class="p">))</span>
<span class="p">(</span><span class="nf">recur</span><span class="p">)))))))</span>
<span class="nv">out</span><span class="p">))</span>
</code></pre></div>
<p>This does two main things:</p>
<ol>
<li>Enqueue the requests at our local <code>nodeid</code>, waiting on
channel <code>results</code> for a vector of answers.</li>
<li>Use <code>alts!</code> to pluck from <code>reqchan</code> any other requests that might be on
this worker's queue, and process them until we get back the <code>results</code>.</li>
</ol>
<p>Here, we benefit from the fact that the original <code>cdefn</code> put us in a <code>go</code> block,
so we can continue processing requests while waiting for an answer. These requests
may well be the ones that we just enqueued, though, depending on how long all this takes,
we may find that they've already been hoisted up by a distributor node and doled out to
someone else.</p>
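<p>The "keep working while you wait" idea can be sketched in isolation. This is a minimal illustration, not the real implementation: <code>wait-while-working</code>, <code>work</code> and <code>handle</code> are made-up names, and the <code>:priority true</code> option is an addition that makes the toy run repeatable by always preferring pending work. It also assumes <code>work</code> is never closed.</p>
<div class="highlight"><pre><span></span><code>(require '[clojure.core.async :as async])

(defn wait-while-working
  "Drain `work` through `handle` until a value shows up on `results`.
   Returns a channel that yields {:answer v :handled [...]}."
  [results work handle]
  (async/go-loop [handled []]
    ;; :priority true is specific to this sketch: pending work is tried first.
    (let [[v c] (async/alts! [work results] :priority true)]
      (if (= c results)
        {:answer v :handled handled}
        (recur (conj handled (handle v)))))))

(def results (async/chan 1))
(def work (async/chan 10))
(def out (wait-while-working results work inc))

(async/>!! work 1)            ; two unrelated jobs arrive first...
(async/>!! work 2)
(async/>!! results :done)     ; ...and then our own answer

;; Blocking take of the loop's final value.
(def answer (first (async/alts!! [out])))
;; answer => {:answer :done, :handled [2 3]}
</code></pre></div>
<p>The loop never sits idle on <code>results</code> alone: anything that turns up on the other channel gets handled in the meantime.</p>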
<p><code>process-reqid</code> is where the calculation actually occurs, but explaining it will
require going over some other internals, which will happen in part 2 of this post.</p>
<p>Here's an example of how errors get caught:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="n">cdefn</span> <span class="n">divide</span> <span class="p">[</span><span class="n">x</span> <span class="n">y</span><span class="p">]</span> <span class="p">(</span><span class="kt">float</span> <span class="p">(</span><span class="o">/</span> <span class="n">x</span> <span class="n">y</span><span class="p">)))</span>
<span class="p">(</span><span class="n">cdefn</span> <span class="n">ratio</span> <span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="p">(</span><span class="n">request</span> <span class="p">(</span><span class="n">divide</span> <span class="n">i</span> <span class="p">(</span><span class="n">dec</span> <span class="n">i</span><span class="p">))))</span>
</code></pre></div>
<p>While <code>(request "pool" (ratio 5))</code> returns 1.25,
<code>(request "pool" (ratio 1))</code> throws (somewhat redacted) <code>ex-info</code>:</p>
<div class="highlight"><pre><span></span><code> <span class="p">{</span><span class="ss">:error</span>
<span class="p">{</span><span class="ss">:info</span>
<span class="p">{</span><span class="s">"(\"acyclic.girder.testscripts.boffo/divide\" 1 0)"</span>
<span class="p">{</span><span class="ss">:req</span> <span class="s">"(\"acyclic.girder.testscripts.boffo/divide\" 1 0)"</span>,
<span class="ss">:msg</span> <span class="s">"Divide by zero"</span>,
<span class="ss">:stack</span>
<span class="p">[</span><span class="s">"java.lang.ArithmeticException: Divide by zero"</span>
<span class="s">"clojure.lang.Numbers.divide(Numbers.java:156)"</span>
<span class="nv">...</span>
<span class="s">"java.lang.Thread.run(Thread.java:722)"</span><span class="p">]}}</span>,
<span class="ss">:req</span> <span class="s">"(\"acyclic.girder.testscripts.boffo/ratio\" 1)"</span>,
<span class="ss">:msg</span> <span class="s">"Error from request"</span>,
<span class="ss">:stack</span> <span class="p">[</span><span class="nv">....</span><span class="p">]}}</span>
</code></pre></div>
<p>so we know the error happened in <code>divide</code>.</p>
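<p>The shape of that nested error can be imitated with plain functions. What follows is only a sketch of the idea, assuming a hypothetical helper <code>try-req</code> that turns exceptions into data; the real <code>cdefn</code> and <code>request</code> machinery is not shown here and surely differs.</p>
<div class="highlight"><pre><span></span><code>(defn try-req
  "Call f on args. Returns {:value v}, or {:error {...}} if f throws."
  [label f args]
  (try
    {:value (apply f args)}
    (catch Exception e
      {:error {:req (pr-str (cons label args))
               :msg (.getMessage e)}})))

(defn divide [x y] (float (/ x y)))

(defn ratio [i]
  (let [r (try-req "divide" divide [i (dec i)])]
    (if-let [err (:error r)]
      ;; Wrap the child's error, keyed by the child's request string.
      {:error {:req  (pr-str (list "ratio" i))
               :msg  "Error from request"
               :info {(:req err) err}}}
      r)))

(ratio 5)  ; a :value of 1.25
(ratio 1)  ; an :error whose :info names the failing divide request
</code></pre></div>
<p>Keying <code>:info</code> by the sub-request string is what lets a caller several levels up see which leaf actually blew up.</p>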
<p>Finally, here's an example where requests themselves make requests, some of which
might be made by other requests, and the subrequests' results just get thrown into
a result string:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">cdefn</span> <span class="nv">bogosity</span> <span class="p">[</span><span class="nv">msec</span> <span class="nv">jobnum</span> <span class="nv">reclevel</span> <span class="nv">numrecjobs</span> <span class="nv">args</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">reqs</span> <span class="p">(</span><span class="nb">map </span><span class="o">#</span><span class="p">(</span><span class="nb">vector </span><span class="nv">bogosity</span> <span class="nv">msec</span> <span class="nv">%</span> <span class="p">(</span><span class="nb">dec </span><span class="nv">reclevel</span><span class="p">)</span> <span class="nv">numrecjobs</span> <span class="nv">args</span><span class="p">)</span> <span class="p">(</span><span class="nb">range </span><span class="nv">reclevel</span><span class="p">))</span>
<span class="nv">vs</span> <span class="p">(</span><span class="nf">requests</span> <span class="nv">reqs</span><span class="p">)]</span>
<span class="p">(</span><span class="nf">Thread/sleep</span> <span class="nv">msec</span><span class="p">)</span>
<span class="p">(</span><span class="nb">str </span><span class="s">"Bogosity:"</span> <span class="nv">*nodeid*</span> <span class="s">":"</span> <span class="nv">jobnum</span> <span class="s">":"</span> <span class="nv">msec</span> <span class="s">":"</span> <span class="nv">reclevel</span> <span class="s">":"</span> <span class="nv">args</span> <span class="s">":["</span> <span class="p">(</span><span class="nf">clojure.string/join</span> <span class="s">","</span> <span class="p">(</span><span class="nb">map str </span><span class="nv">vs</span><span class="p">))</span> <span class="s">"]"</span><span class="p">)))</span>
</code></pre></div>
<p>The <code>reclevel</code> argument indicates how many levels of sub-requests to spawn.
As you see, <code>cdefn</code> requests do have access to a locally bound
variable <code>*nodeid*</code>, indicating where the execution is taking place. This should be used only for debugging, but here
I've actually broken referential transparency by putting nondeterministic values into the results. That
necessitated the use of the last argument, just to differentiate multiple tests and thwart the cache.</p>
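<p>Why an extra argument thwarts the cache is easy to see with a stand-in. This sketch uses <code>clojure.core/memoize</code> in place of the real request cache (an assumption made purely for illustration); <code>cached-job</code> and <code>call-count</code> are made-up names.</p>
<div class="highlight"><pre><span></span><code>;; Stand-in for the real cache: memoize keys on the full argument list.
(def call-count (atom 0))

(def cached-job
  (memoize
    (fn [jobnum tag]
      (swap! call-count inc)            ; count real executions
      (str "job " jobnum " run " tag))))

(cached-job 0 222)   ; executes
(cached-job 0 222)   ; same arguments, served from the cache
(cached-job 0 223)   ; new tag, so it executes again

@call-count ; => 2
</code></pre></div>
<p>Any cache keyed on the request's arguments behaves this way, which is why changing a throwaway last argument is enough to force a fresh run.</p>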
<p>Launching a whole bunch of these:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">requests</span> <span class="s">"pool"</span> <span class="p">(</span><span class="nb">map </span><span class="o">#</span><span class="p">(</span><span class="nb">vector </span><span class="nv">bogosity</span> <span class="mi">1</span> <span class="nv">%</span> <span class="mi">3</span> <span class="mi">5</span> <span class="mi">222</span><span class="p">)</span> <span class="p">(</span><span class="nb">range </span><span class="mi">10</span><span class="p">)))</span>
</code></pre></div>
<p>eventually gets me</p>
<div class="highlight"><pre><span></span><code><span class="p">[{</span><span class="ss">:value</span> <span class="s">"Bogosity:w2:0:1:3:222:[Bogosity:w2:0:1:2:222:[Bogosity:w1:0:1:1:222:[Bogosity:w2:0:1:0:222:[]],Bogosity:w2:1:1:1:222:[Bogosity:w2:0:1:0:222:[]]],Bogosity:w1:1:1:2:222:[Bogosity:w1:0:1:1:222:[Bogosity:w2:0:1:0:222:[]],Bogosity:w2:1:1:1:222:[Bogosity:w2:0:1:0:222:[]]],Bogosity:w2:2:1:2:222:[Bogosity:w1:0:1:1:222:[Bogosity:w2:0:1:0:222:[]],Bogosity:w2:1:1:1:222:[Bogosity:w2:0:1:0:222:[]]]]"</span><span class="p">}</span> <span class="p">{</span><span class="ss">:value</span> <span class="s">"Bogosity:w1:1:1:3:222:[Bogosity:w2:0:1:2:222:[Bogosity:w1:0:1:1:222:[Bogosity:w2:0:1:0:222:[]],Bogosity:w2:1:1:1:222:[Bogosity:w2:0:1:0:222:[]]],Bogosity:w1:1:1:2:222:[Bogosity:w1:0:1:1:222:[Bogosity:w2:0:1:0:222:[]],Bogosity:w2:1:1:1:222:[Bogosity:w2:0:1:0:222:[]]],Bogosity:w2:2:1:2:222:[Bogosity:w1:0:1:1:222:[Bogosity:w2:0:1:0:222:[]],Bogosity:w2:1:1:1:222:[Bogosity:w2:0:1:0:222:[]]]]"</span><span class="p">}</span> <span class="p">{</span><span class="ss">:value</span> <span class="s">"Bogosity:w2:2:1:3:222:[Bogosity:w1:0:1:2:222:[Bogosity:w1:0:1:1:222:[Bogosity:w2:0:1:0:222:[]],Bogosity:w2:1:1:1:222:[Bogosity:w2:0:1:0:222:[]]],Bogosity:w1:1:1:2:222:[Bogosity:w1:0:1:1:222:[Bogosity:w2:0:1:0:222:[]],Bogosity:w2:1:1:1:222:[Bogosity:w2:0:1:0:222:[]]],Bogosity:w2:2:1:2:222:[Bogosity:w1:0:1:1:222:[Bogosity:w2:0:1:0:222:[]],Bogosity:w2:1:1:1:222:[Bogosity:w2:0:1:0:222:[]]]]"</span><span class="p">}</span> <span class="p">{</span><span class="ss">:value</span> <span 
class="s">"Bogosity:w1:3:1:3:222:[Bogosity:w1:0:1:2:222:[Bogosity:w1:0:1:1:222:[Bogosity:w2:0:1:0:222:[]],Bogosity:w2:1:1:1:222:[Bogosity:w2:0:1:0:222:[]]],Bogosity:w1:1:1:2:222:[Bogosity:w1:0:1:1:222:[Bogosity:w2:0:1:0:222:[]],Bogosity:w2:1:1:1:222:[Bogosity:w2:0:1:0:222:[]]],Bogosity:w2:2:1:2:222:[Bogosity:w1:0:1:1:222:[Bogosity:w2:0:1:0:222:[]],Bogosity:w2:1:1:1:222:[Bogosity:w2:0:1:0:222:[]]]]"</span><span class="p">}</span> <span class="p">{</span><span class="ss">:value</span> <span class="s">"Bogosity:w2:4:1:3:222:[Bogosity:w1:0:1:2:222:[Bogosity:w1:0:1:1:222:[Bogosity:w2:0:1:0:222:[]],Bogosity:w2:1:1:1:222:[Bogosity:w2:0:1:0:222:[]]],Bogosity:w1:1:1:2:222:[Bogosity:w1:0:1:1:222:[Bogosity:w2:0:1:0:222:[]],Bogosity:w2:1:1:1:222:[Bogosity:w2:0:1:0:222:[]]],Bogosity:w2:2:1:2:222:[Bogosity:w1:0:1:1:222:[Bogosity:w2:0:1:0:222:[]],Bogosity:w2:1:1:1:222:[Bogosity:w2:0:1:0:222:[]]]]"</span><span class="p">}</span> <span class="p">{</span><span class="ss">:value</span> <span class="s">"Bogosity:w1:5:1:3:222:[Bogosity:w1:0:1:2:222:[Bogosity:w1:0:1:1:222:[Bogosity:w2:0:1:0:222:[]],Bogosity:w2:1:1:1:222:[Bogosity:w2:0:1:0:222:[]]],Bogosity:w1:1:1:2:222:[Bogosity:w1:0:1:1:222:[Bogosity:w2:0:1:0:222:[]],Bogosity:w2:1:1:1:222:[Bogosity:w2:0:1:0:222:[]]],Bogosity:w2:2:1:2:222:[Bogosity:w1:0:1:1:222:[Bogosity:w2:0:1:0:222:[]],Bogosity:w2:1:1:1:222:[Bogosity:w2:0:1:0:222:[]]]]"</span><span class="p">}</span> <span class="p">{</span><span class="ss">:value</span> <span class="s">"Bogosity:w2:6:1:3:222:[Bogosity:w1:0:1:2:222:[Bogosity:w1:0:1:1:222:[Bogosity:w2:0:1:0:222:[]],Bogosity:w2:1:1:1:222:[Bogosity:w2:0:1:0:222:[]]],Bogosity:w1:1:1:2:222:[Bogosity:w1:0:1:1:222:[Bogosity:w2:0:1:0:222:[]],Bogosity:w2:1:1:1:222:[Bogosity:w2:0:1:0:222:[]]],Bogosity:w2:2:1:2:222:[Bogosity:w1:0:1:1:222:[Bogosity:w2:0:1:0:222:[]],Bogosity:w2:1:1:1:222:[Bogosity:w2:0:1:0:222:[]]]]"</span><span class="p">}</span> <span class="p">{</span><span class="ss">:value</span> <span 
class="s">"Bogosity:w1:7:1:3:222:[Bogosity:w1:0:1:2:222:[Bogosity:w1:0:1:1:222:[Bogosity:w2:0:1:0:222:[]],Bogosity:w2:1:1:1:222:[Bogosity:w2:0:1:0:222:[]]],Bogosity:w1:1:1:2:222:[Bogosity:w1:0:1:1:222:[Bogosity:w2:0:1:0:222:[]],Bogosity:w2:1:1:1:222:[Bogosity:w2:0:1:0:222:[]]],Bogosity:w2:2:1:2:222:[Bogosity:w1:0:1:1:222:[Bogosity:w2:0:1:0:222:[]],Bogosity:w2:1:1:1:222:[Bogosity:w2:0:1:0:222:[]]]]"</span><span class="p">}</span> <span class="p">{</span><span class="ss">:value</span> <span class="s">"Bogosity:w1:8:1:3:222:[Bogosity:w1:0:1:2:222:[Bogosity:w1:0:1:1:222:[Bogosity:w2:0:1:0:222:[]],Bogosity:w2:1:1:1:222:[Bogosity:w2:0:1:0:222:[]]],Bogosity:w1:1:1:2:222:[Bogosity:w1:0:1:1:222:[Bogosity:w2:0:1:0:222:[]],Bogosity:w2:1:1:1:222:[Bogosity:w2:0:1:0:222:[]]],Bogosity:w2:2:1:2:222:[Bogosity:w1:0:1:1:222:[Bogosity:w2:0:1:0:222:[]],Bogosity:w2:1:1:1:222:[Bogosity:w2:0:1:0:222:[]]]]"</span><span class="p">}</span> <span class="p">{</span><span class="ss">:value</span> <span class="s">"Bogosity:w2:9:1:3:222:[Bogosity:w1:0:1:2:222:[Bogosity:w1:0:1:1:222:[Bogosity:w2:0:1:0:222:[]],Bogosity:w2:1:1:1:222:[Bogosity:w2:0:1:0:222:[]]],Bogosity:w1:1:1:2:222:[Bogosity:w1:0:1:1:222:[Bogosity:w2:0:1:0:222:[]],Bogosity:w2:1:1:1:222:[Bogosity:w2:0:1:0:222:[]]],Bogosity:w2:2:1:2:222:[Bogosity:w1:0:1:1:222:[Bogosity:w2:0:1:0:222:[]],Bogosity:w2:1:1:1:222:[Bogosity:w2:0:1:0:222:[]]]]"</span><span class="p">}]</span>
</code></pre></div>
<p>where you see that execution was spread across two workers, lots of recursion occurred, and somehow the two workers managed to stay busy
calculating sub-requests, even while apparently blocking on them.</p>
<h3>But how does it work?</h3>
<p>That's a secret, until my next post, but it will revolve around the mysterious <code>enqueue</code> and <code>process-reqid</code> methods.
We'll also go into the way we use <code>core.async</code> to interact with Redis, try it out on a hundred AWS machines and go into
some of the other "tricks" involved in implementing all this. That sounds like a lot, so maybe there will be three posts altogether.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:paranoia">
<p>As with most of my original ideas, this one is bound to be stolen by others, who cunningly post-date their books and papers to make it look like they came first. <a class="footnote-backref" href="#fnref:paranoia" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
</ol>
</div>Yeah, yeah, I should have used perl2014-08-12T00:00:00-04:002014-08-12T00:00:00-04:00Peter Fraenkeltag:blog.podsnap.com,2014-08-12:/slackbot2.html<h2>Overkill</h2>
<p>After reading my <a href="http://podsnap.com/slackbot.html">compelling post</a> on a clojure Slack-bot,
an astute reader<sup id="fnref:byme"><a class="footnote-ref" href="#fn:byme">1</a></sup> pointed out that using the fullblown apparatus of compojure, jetty, jvm, etc. for something this silly
is really only justified when the entire purpose of the exercise is procrastination. Well, dear astute reader, that was the
entire purpose, but you're right. Ok, here it is in perl,</p>
<div class="highlight"><pre><span></span><code> <span class="c1">#!/usr/bin/env perl</span>
<span class="k">use</span> <span class="nn">Mojolicious::Lite</span><span class="p">;</span>
<span class="k">use</span> <span class="nn">File::Slurp</span><span class="p">;</span>
<span class="k">my</span> <span class="nv">$token</span> <span class="o">=</span> <span class="n">read_file</span><span class="p">(</span><span class="s">"TOKEN"</span><span class="p">);</span>
<span class="k">my</span> <span class="nv">@lines</span> <span class="o">=</span> <span class="nb">split</span><span class="p">(</span><span class="sr">/\0/</span><span class="p">,</span><span class="n">read_file</span><span class="p">(</span><span class="s">"LINES"</span><span class="p">));</span>
<span class="n">post</span> <span class="s">'/slacks'</span> <span class="o">=></span> <span class="k">sub</span> <span class="p">{</span>
<span class="k">my</span> <span class="nv">$c</span> <span class="o">=</span> <span class="nb">shift</span><span class="p">;</span>
<span class="k">my</span> <span class="nv">$t</span> <span class="o">=</span> <span class="nv">$c</span><span class="o">-></span><span class="n">param</span><span class="p">(</span><span class="s">'token'</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="nv">$t</span> <span class="ow">eq</span> <span class="nv">$token</span><span class="p">)</span> <span class="p">{</span>
<span class="nv">$c</span><span class="o">-></span><span class="n">render</span><span class="p">(</span><span class="n">json</span> <span class="o">=></span> <span class="p">{</span><span class="n">text</span> <span class="o">=></span> <span class="nv">$lines</span><span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="nb">rand</span><span class="p">(</span><span class="nv">$#lines</span><span class="o">+</span><span class="mi">1</span><span class="p">))]})</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="nv">$c</span><span class="o">-></span><span class="n">render</span><span class="p">(</span><span class="n">text</span> <span class="o">=></span> <span class="s">'denied'</span><span class="p">,</span><span class="n">status</span> <span class="o">=></span> <span class="mi">403</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">};</span>
<span class="nn">app</span><span class="o">-></span><span class="n">start</span><span class="p">;</span>
</code></pre></div>
<p>and you run it as <code>./slackbot.pl daemon -l 'http://localhost:3001'</code></p>
<p>You shouldn't need to actually know perl to read this, but if any haters out there want to call it illegible,
I'm happy as long as it makes them happy to do so.</p>
<h2>Plump it up</h2>
<p>I could fill up a bit more space by providing counsel on the virtues of <a href="http://perlbrew.pl/">perlbrew</a> and
noting that the main thing that changes in the commando recipe is</p>
<div class="highlight"><pre><span></span><code><span class="nb">export</span> <span class="nv">PERLBREW_ROOT</span><span class="o">=</span><span class="si">${</span><span class="nv">HOME</span><span class="si">}</span>/perl5/perlbrew
<span class="nb">export</span> <span class="nv">PERLBREW_HOME</span><span class="o">=</span>/tmp/.perlbrew
<span class="nb">source</span> <span class="si">${</span><span class="nv">PERLBREW_ROOT</span><span class="si">}</span>/etc/bashrc
Daemon -D <span class="si">${</span><span class="nv">DIR</span><span class="si">}</span> -O <span class="si">${</span><span class="nv">DIR</span><span class="si">}</span>/STDOUT -E <span class="si">${</span><span class="nv">DIR</span><span class="si">}</span>/STDERR -- perlbrew <span class="nb">exec</span> --with perl-5.21.2 ./slackbot.pl daemon -l <span class="s1">'http://localhost:3001'</span>
</code></pre></div>
<p>but I can't think of much else.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:byme">
<p>Ok, it was me. <a class="footnote-backref" href="#fnref:byme" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
</ol>
</div>Procrastination with a clojure M-x yow slackbot2014-08-11T00:00:00-04:002014-08-11T00:00:00-04:00Peter Fraenkeltag:blog.podsnap.com,2014-08-11:/slackbot.html<h2>Procrastination</h2>
<p>I really should be working on this obscure distributed RDF/FRP thing, but for various
reasons my head isn't working properly right now. So I did this other stupid thing
instead.</p>
<h2>Zippy</h2>
<p>Once upon a time, <code>M-x yow</code> in emacs would deliver a nice random quote
from <a href="http://www.zippythepinhead.com/">Zippy the Pinhead</a>. Nowadays, you just get</p>
<div class="highlight"><pre><span></span><code>Yow! Legally-imposed CULTURE-reduction is CABBAGE-BRAINED!
</code></pre></div>
<p>which has something to do with copyright law. More specifically, the file <code>yow.lines</code>
in emacs' <code>data-directory</code> now contains only the opinion expressed above, rather than
the original seven-hundred or so precious epigrams, delimited by <code>\000</code>.
I have heard dark whispers about
so-called "free thinkers," who have managed to procure an original list and thereby restore
the functionality of yore. These stories may not be entirely apocryphal, as the internet
has a way of not forgetting things.</p>
<p>Of course, you can always create your own <code>yow.lines</code>, with <em>bons mots</em> of your own
device, or of incontrovertible public domain provenance,
and that probably happens too. All of the above
applies, with minor variation, to the unix <code>fortune</code> command as well.</p>
<h2>Slack</h2>
<p><a href="http://slack.com">Slack</a> is a versatile chat and messaging
system with all sorts of clever archiving and search facilities. It's
better than anything I've used in the corporate world, including
(especially) Lync, and, in truth, I'm finding it hard not to like it
better than systems in the same space written by people I know and
like personally.</p>
<p>I'd never heard of it until I read this
<a href="http://www.wired.com/2014/08/the-most-fascinating-profile-youll-ever-read-about-a-guy-and-his-boring-startup/">Wired article</a>. Slack is the brainchild of the creator of Flickr, and, like
Flickr, you can use it for free, unless you have important business needs, and then you can
pay a lot of money for it. All I wanted to do was recreate the more frivolous aspects
of a chat room at a company where I once worked. Since neither I nor most of the other
participants work there anymore, it would be necessary to set up something independent.
I'd considered Google Hangouts, but, even ignoring the increasingly obviously repellent nature
of a business model built on invasion of privacy, the whole Google+ ecosystem is a hot mess,
with every aspect subject to random deprecation when they realize it doesn't help the
bottom line of serving up advertisements to the poor and ignorant. So Slack it was.</p>
<p>Slack also has a lovely <a href="https://api.slack.com/">API</a>, supporting integration with
practically anything with technological sophistication at or above the level of
<code>curl</code>. From the perspective of the yowserati, one of the most interesting features is
<a href="https://q-t.slack.com/services/new/outgoing-webhook">outgoing webhooks</a>. Basically,
they let you define words, the use of which on chat channels will trigger <code>POST</code>
requests to a URL of your choice, which can then respond with something appropriate,
wrapped up pretty simply in JSON. The POST message includes a secret token, and the URLs
may be SSL'd, so the whole protocol is reasonably comforting.</p>
<p>Around 4pm yesterday, I decided to drop everything I was doing and set up a slackbot
on a Digital Ocean droplet, powered by a trivial clojure app. I convinced myself that,
in addition to the challenge of doing it as quickly as possible, it would give me a chance
to solidify my knowledge of a technology stack that, while not particularly complicated,
I've had a tendency to slop my way through, fiddling with this and that until it all works
and then stepping slowly away before anything breaks.</p>
<h2>Stack</h2>
<ol>
<li>Compojure</li>
<li>Clojure</li>
<li>Nginx</li>
<li>Digital Ocean</li>
<li>Commando</li>
</ol>
<h2>Clojure bits</h2>
<p>My "application" differs almost insignificantly from the one you get by typing
<code>lein new compojure-app</code>. The vaguely interesting bits have to do with
data that I don't necessarily want to commit to github, namely the secret token that
Slack will be sending and the database of <a href="http://dictionary.reference.com/browse/shavian">Shavian</a> gems.</p>
<p>So I have a small module:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">ns </span><span class="nv">slacks.quotes</span><span class="p">)</span>
<span class="p">(</span><span class="k">def </span><span class="o">^</span><span class="p">{</span><span class="ss">:private</span> <span class="nv">true</span><span class="p">}</span> <span class="nv">quotes</span> <span class="p">(</span><span class="nf">atom</span> <span class="nv">nil</span><span class="p">))</span>
<span class="p">(</span><span class="k">def </span><span class="o">^</span><span class="p">{</span><span class="ss">:private</span> <span class="nv">true</span><span class="p">}</span> <span class="nv">token</span> <span class="p">(</span><span class="nf">atom</span> <span class="nv">nil</span><span class="p">))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">init</span> <span class="p">[]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">y</span> <span class="p">(</span><span class="nb">slurp </span><span class="s">"LINES"</span><span class="p">)</span>
<span class="nv">ys</span> <span class="p">(</span><span class="nf">clojure.string/split</span> <span class="nv">y</span> <span class="o">#</span><span class="s">"\000"</span><span class="p">)</span>
<span class="nv">ys</span> <span class="p">(</span><span class="nb">drop </span><span class="mi">1</span> <span class="nv">ys</span><span class="p">)</span> <span class="c1">; by convention, documentation</span>
<span class="nv">t</span> <span class="p">(</span><span class="nb">slurp </span><span class="s">"TOKEN"</span><span class="p">)]</span>
<span class="p">(</span><span class="nb">println </span><span class="s">"Read"</span> <span class="p">(</span><span class="nb">count </span><span class="nv">ys</span><span class="p">)</span> <span class="s">"lines and token"</span> <span class="nv">t</span><span class="p">)</span>
<span class="p">(</span><span class="nf">reset!</span> <span class="nv">quotes</span> <span class="nv">ys</span><span class="p">)</span>
<span class="p">(</span><span class="nf">reset!</span> <span class="nv">token</span> <span class="nv">t</span><span class="p">)))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">get-quote</span> <span class="p">[]</span> <span class="p">(</span><span class="nf">rand-nth</span> <span class="o">@</span><span class="nv">quotes</span><span class="p">))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">request-ok?</span> <span class="p">[</span><span class="nv">params</span><span class="p">]</span> <span class="p">(</span><span class="nb">= </span><span class="p">(</span><span class="ss">:token</span> <span class="nv">params</span><span class="p">)</span> <span class="o">@</span><span class="nv">token</span><span class="p">))</span>
</code></pre></div>
<p>The quotes and top-secret token are <code>slurp</code>ed from files that are supposed to be locally available.
It's probably overkill to put them in atoms, but old habits die hard.</p>
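<p>For the atom-averse, the same loading could be done with a <code>delay</code>. This is a hypothetical alternative rather than what the app does: <code>parse-quotes</code> is a made-up helper, and the <code>delay</code> only touches the <code>LINES</code> file on first deref.</p>
<div class="highlight"><pre><span></span><code>(require '[clojure.string :as string])

(defn parse-quotes
  "Split NUL-delimited text and drop the first entry (documentation, by convention)."
  [text]
  (vec (rest (string/split text #"\000"))))

;; Hypothetical replacement for the atom: the file is read once, on first deref.
(def quotes (delay (parse-quotes (slurp "LINES"))))

(defn get-quote [] (rand-nth @quotes))

(parse-quotes "about this file\u0000first quote\u0000second quote")
;; => ["first quote" "second quote"]
</code></pre></div>
<p>The trade-off is that a <code>delay</code> can't be reloaded without restarting, whereas calling <code>init</code> again refreshes the atoms.</p>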
<p>There's one, very boring route definition:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">slacks-json</span> <span class="p">[</span><span class="nv">params</span><span class="p">]</span>
<span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nf">slacks.quotes/request-ok?</span> <span class="nv">params</span><span class="p">)</span>
<span class="p">{</span><span class="ss">:status</span> <span class="mi">200</span>
<span class="ss">:headers</span> <span class="p">{</span><span class="s">"Content-Type"</span> <span class="s">"application/json"</span><span class="p">}</span>
<span class="ss">:body</span> <span class="p">(</span><span class="nf">clj-json.core/generate-string</span> <span class="p">{</span><span class="s">"text"</span> <span class="p">(</span><span class="nf">slacks.quotes/get-quote</span><span class="p">)})}</span>
<span class="p">{</span><span class="ss">:status</span> <span class="mi">403</span>
<span class="ss">:body</span> <span class="s">"Buzz off"</span><span class="p">}))</span>
<span class="p">(</span><span class="nf">defroutes</span> <span class="nv">home-routes</span>
<span class="p">(</span><span class="nf">POST</span> <span class="s">"/slacks"</span> <span class="p">{</span><span class="nv">params</span> <span class="ss">:params</span><span class="p">}</span> <span class="p">(</span><span class="nf">slacks-json</span> <span class="nv">params</span><span class="p">)))</span>
</code></pre></div>
<p>Slack is expecting a response of the form <code>{"text" : "clever quote"}</code>, which is most safely generated
via <code>clj-json</code>, since the clever quotes may turn out to contain gruesome punctuation.</p>
<p>And that's about it. You can launch the thing as usual with <code>lein ring server-headless PORT</code>, but it seems a bit icky to
run <code>lein</code> in production, or even in "production," so I made an uberjar with <code>lein ring uberjar</code> instead.</p>
<h2>Digital Ocean</h2>
<p>I have a droplet for this sort of thing. It's the cheapo 512MB version, running Ubuntu. At some point, I
set it up for running and managing JVM apps:</p>
<div class="highlight"><pre><span></span><code>apt-get install git-core maven leiningen nginx
</code></pre></div>
<p>To keep things moderately secure, I set <code>PasswordAuthentication no</code> in <code>/etc/ssh/sshd_config</code> and
then maybe kind of subvert that with <code>echo 'pnf ALL=(ALL) NOPASSWD:ALL' > /etc/sudoers.d/pnf</code>.</p>
<p>For today's purposes, I'll start by cloning my yowsabot repo: <code>git clone https://github.com/pnf/slacks.git</code></p>
<h2>nginx</h2>
<p>Like many people these days, I'd rather deploy web apps as small programs that expose themselves on userland ports on <code>localhost</code>, and then use
<code>nginx</code> to proxy external requests to them. There's much more flexibility, in case I ever want to proxy to multiple
machines in a private network, and it's nice to have a few, small processes running, rather than one multi-threaded Goliath.</p>
<p>It's also pretty easy to configure SSL with <code>nginx</code>, and it would clearly be irresponsible to allow the Slack token to go out
in plaintext, thereby exposing my powerful apothegms to malefactors of all stripes. Digital Ocean has a
<a href="https://www.digitalocean.com/community/tutorials/how-to-create-a-ssl-certificate-on-nginx-for-ubuntu-14-04">nice tutorial</a>
for creating the needed SSL certificate, and you can blindly follow their instructions up to the point where they
say what to put in the <code>nginx</code> config, since their example is for static content.</p>
<p>In our case, we're going to redirect incoming requests to my compojure app, which for some reason or another I decided to have listen on 3001.
<code>/etc/nginx/sites-enabled/slacks</code> is a soft link to <code>/etc/nginx/sites-available/slacks</code>, which contains <em>in toto</em>:</p>
<div class="highlight"><pre><span></span><code><span class="nt">server</span> <span class="p">{</span>
<span class="err">listen</span> <span class="err">443</span><span class="p">;</span>
<span class="err">server_name</span> <span class="err">slacks.whatever.com</span><span class="p">;</span>
<span class="err">if</span> <span class="err">($host</span> <span class="err">!~</span> <span class="err">^(slacks.whatever.com)$</span> <span class="err">)</span> <span class="err">{</span>
<span class="err">return</span> <span class="err">444</span><span class="p">;</span>
<span class="p">}</span>
<span class="nt">if</span> <span class="o">($</span><span class="nt">request_method</span> <span class="o">!~</span> <span class="o">^</span><span class="nt">POST</span><span class="o">$</span> <span class="o">)</span> <span class="p">{</span>
<span class="err">return</span> <span class="err">444</span><span class="p">;</span>
<span class="p">}</span>
<span class="nt">if</span> <span class="o">($</span><span class="nt">http_user_agent</span> <span class="o">~*</span> <span class="nt">LWP</span><span class="p">::</span><span class="nd">Simple</span><span class="o">|</span><span class="nt">BBBike</span><span class="o">|</span><span class="nt">wget</span><span class="o">)</span> <span class="p">{</span>
<span class="err">return</span> <span class="err">403</span><span class="p">;</span>
<span class="p">}</span>
<span class="nt">ssl</span> <span class="nt">on</span><span class="o">;</span>
<span class="nt">ssl_certificate</span> <span class="o">/</span><span class="nt">etc</span><span class="o">/</span><span class="nt">nginx</span><span class="o">/</span><span class="nt">ssl</span><span class="o">/</span><span class="nt">server</span><span class="p">.</span><span class="nc">crt</span><span class="o">;</span>
<span class="nt">ssl_certificate_key</span> <span class="o">/</span><span class="nt">etc</span><span class="o">/</span><span class="nt">nginx</span><span class="o">/</span><span class="nt">ssl</span><span class="o">/</span><span class="nt">server</span><span class="p">.</span><span class="nc">key</span><span class="o">;</span>
<span class="nt">location</span> <span class="o">/</span> <span class="p">{</span>
<span class="err">proxy_pass</span> <span class="n">http</span><span class="p">:</span><span class="o">//</span><span class="n">localhost</span><span class="o">:</span><span class="mi">3001</span><span class="p">;</span>
<span class="err">#proxy_redirect</span> <span class="err">default</span><span class="p">;</span>
<span class="err">proxy_redirect</span> <span class="err">off</span><span class="p">;</span>
<span class="err">proxy_buffering</span> <span class="err">off</span><span class="p">;</span>
<span class="err">proxy_set_header</span> <span class="err">Host</span> <span class="err">slacks.whatever.com</span><span class="p">;</span>
<span class="err">proxy_set_header</span> <span class="err">X-Real-IP</span> <span class="err">$remote_addr</span><span class="p">;</span>
<span class="err">proxy_set_header</span> <span class="err">X-Forwarded-For</span> <span class="err">$proxy_add_x_forwarded_for</span><span class="p">;</span>
<span class="p">}</span>
<span class="err">}</span>
</code></pre></div>
<p>and <code>/etc/nginx/sites-available/default</code> has been removed.</p>
<p>As <code>pnf</code>, I can <code>sudo /etc/init.d/nginx restart</code> and all is well. There's basically nothing you can do other than post to
<code>https://slacks.whatever.com/slacks</code>, and you'll get 403'd if you don't provide the right token.</p>
<h2>commando.io</h2>
<p><a href="https://commando.io/">commando.io</a> is probably not necessary in this equation, but I've been playing with it recently and decided to
let it play. It's supposed to function as an online control panel for controlling software deployed around the cloud, which it does by
letting you define "recipes" and then executing them over ssh (which obviously requires putting their public key on the servers in
question).</p>
<p>My recipe just pulls the latest code down from git, builds an uberjar and then runs it as a daemon:</p>
<div class="highlight"><pre><span></span><code> <span class="nv">DIR</span><span class="o">=</span><span class="si">${</span><span class="nv">HOME</span><span class="si">}</span>/slacks
ps augxw <span class="p">|</span> grep slacks.jar <span class="p">|</span> grep -v grep <span class="p">|</span> awk <span class="s1">'{print "kill",$2}'</span> <span class="p">|</span> sh -v
sleep <span class="m">10</span>
<span class="nb">cd</span> <span class="si">${</span><span class="nv">DIR</span><span class="si">}</span>
git pull
lein ring uberjar
<span class="nb">export</span> <span class="nv">PORT</span><span class="o">=</span><span class="m">3001</span>
/bin/rm -f STDERR STDOUT
daemon -D <span class="si">${</span><span class="nv">DIR</span><span class="si">}</span> -O <span class="si">${</span><span class="nv">DIR</span><span class="si">}</span>/STDOUT -E <span class="si">${</span><span class="nv">DIR</span><span class="si">}</span>/STDERR -- java -jar <span class="si">${</span><span class="nv">DIR</span><span class="si">}</span>/target/slacks.jar
</code></pre></div>
<p>Occasionally, I might need to log into the server to mess with something, but most changes to the bot can be administered via git and commando.</p>
<h2>So there you are</h2>
<p>As noted above, the bot code lives at https://github.com/pnf/slacks.git.</p>
<p>I suppose I have to do something useful now.</p>Boyhood vs Little House Smackdown2014-08-05T12:00:00-04:002014-08-05T12:00:00-04:00Peter Fraenkeltag:blog.podsnap.com,2014-08-05:/boyhood.html<blockquote>
<p>She was glad that the cozy house, and Pa and Ma and the firelight and
the music, were now. They could not be forgotten, she thought, because
now is now. It can never be a long time ago.</p>
</blockquote>
<p>Or, put another way, "It's constant, the moments, it's just - it's
like it's always right now, you know?"</p>
<p>So little Laura figures it all out at the age of 5, while Mason comes to the realization only
after implanting an ear gauge, leaving for college and getting high in a location of
staggering natural beauty. But it's not like it's a competition or anything. Neither
of them is real anyway.</p>
<p>But it could be a competition! That could be fun! Let's examine other points of comparison.</p>
<h1>Watering down the Bechdel Test</h1>
<p>Maybe it isn't fair to apply the Bechdel Test to a movie called "Boyhood."
If two female characters were to talk about something other than a boy, they would
at the very least be veering off topic. While there would seem to be at least a little
time for digression in 165 minutes, benefit of the doubt is, in this case, a costless
commodity. Instead, let's consider progressively watered down versions of the test.</p>
<h3>Are there two named women who talk to each other about something other than a man?</h3>
<p>No, not in Boyhood. That one's easy. Little House, both in the original books and various
franchises, features numerous female characters and conversations. Admittedly, the topics of
choice are usually related to cookery and other domestic pursuits, but not exclusively
about men.</p>
<h3>Do two named female characters ever have a conversation that might even be about a man?</h3>
<p>Arguably, in Boyhood. Olivia escapes from her abusive professor husband with the help of a friend, with
whom some coordination is necessary, although most of it occurs off screen. Still, some conversation
about a man is implied.
From Little House, of course, the collected conversations about Pa alone would make a book of respectable length.</p>
<h3>Are two female characters ever in proximity long enough to have possibly conducted a conversation?</h3>
<p>In Boyhood, sort of, but most of the characters in question are extras, so, for all intents and purposes, no.
In Little House, most definitely, interminably even.</p>
<h3>Does any single named female character ever participate in a conversation, with a man, but not about a man?</h3>
<p>Kudos to Team Linklater! Olivia scores some brief interactions with Mason about his homework and curfew,
and the young Samantha makes some sassy remarks before her character more or less shuts down (unhelpfully
promulgating the myth that girls must shed all semblance of personality before growing up).</p>
<p>However, I think I'm going to give the point to Wilder, because the conversations are significantly longer and
deeper, touching on issues as complex as the rights of indigenous Americans.</p>
<h3>Does any single female character participate in a conversation with a man, about a man?</h3>
<p>Score! Sheena listens appreciatively to Mason's Linklateresque ramblings, at lengths that would
bring a weaker character to her knees. Nothing in Little House approaches such virtuosic abnegation, and your
humble blogger was brought to tears by Zoe Graham's portrayal of a true American hero.</p>
<h3>Do any male characters participate in a conversation about a woman?</h3>
<p>Yes, though I recall most being along the lines of, "I think she digs you." There's that father and son
bonding session after Sheena finally comes to her senses, but Mason Jr.'s depiction of an adolescent in the throes
of romantic grief is possibly the worst ever committed to film, and that may disqualify the whole conversation.</p>
<p>What about Little House? I must confess that my recall of the opus does not include the answer to this question. I am,
after all, a boy.</p>
<h1>Emotional resonance</h1>
<p>We've all been children, so why do we like to see childhood depicted in film? I think the answer is that our memories of childhood
are lossily compressed and nearly obscured by interpretation and revision. We have memories from childhood, but we've lost the memory of
<em>being</em> a child, and we mourn that loss. The best movies about childhood take us back to a mindset that we can no longer summon on our own,
tricking us for a moment into viewing the world without the lens of subsequent experience:
Antoine Doinel in <a href="http://www.imdb.com/title/tt0053198/">The 400 Blows</a>,
Scout in <a href="http://www.imdb.com/title/tt0056592">To Kill a Mockingbird</a>, even the young Jack in <a href="http://www.imdb.com/title/tt0478304">Tree of Life</a> or
Danny in <a href="http://www.imdb.com/title/tt0081505/">The Shining</a> make us
experience the unexperienceable for just a moment: not summarizing a memory but revisiting the past.</p>
<p>Boyhood, by comparison, feels like a movie about a boyhood as recalled. Mason is someone about whom we accumulate observations, but we never see the
world through his eyes. Late in the film, there's an attempt to accomplish this trick literally, by showing us the framing of photographs that Mason
takes, but this technique is as ineffective as it is shopworn. When Mason takes a well composed shot of some rusty Americana, what we learn is complimentary
to the cinematographers but nearly irrelevant to Mason's character.<sup id="fnref:photodork"><a class="footnote-ref" href="#fn:photodork">2</a></sup></p>
<p>It has been pointed out to me that the film does do a good job of capturing Texas - in a way explaining the nostalgia that expatriate Texans feel, even
expatriates who object to quite a lot about Texas.</p>
<p>I'll go out on a limb here and predict that, 50 years from now, Laura Ingalls Wilder will occupy an order of magnitude greater heartspace than Mason.</p>
<h1>Credit for originality</h1>
<p>Ethan Hawke wrote in a <a href="http://www.reddit.com/r/IAmA/comments/1fq1h6/i_am_ethan_hawke_amaa/">Reddit AMA</a>
that "Richard Linklater and I have made a short film every
year for the last 11 years, one more to go, that follows the
development of a young boy from age 6 to 18."
In making a series of films depicting the same characters, portrayed by the same actors, over a substantial period of time,
Linklater is not breaking ground. Truffaut did it with <a href="https://en.wikipedia.org/wiki/Antoine_Doinel#The_Adventures_of_Antoine_Doinel">Antoine Doinel</a>, starting
with The 400 Blows<sup id="fnref:sf"><a class="footnote-ref" href="#fn:sf">1</a></sup>, mentioned above. The Linklater gang themselves contributed to the genre in <a href="http://www.imdb.com/title/tt0112471">their</a>
<a href="http://www.imdb.com/title/tt2209418/">Before</a> <a href="http://www.imdb.com/title/tt0381681/">series</a>, and it would be unfair not to mention
Michael Apted's brilliant $7\times n$ <a href="http://www.imdb.com/title/tt0058578">Up</a> documentaries.
Boyhood's original twist, therefore, is holding on to the footage and releasing it all at once, which is really just a marketing innovation.</p>
<p>Outside of film, the technique is even less original. While Proust, Dickens and the Bible come most easily to mind,
Little House itself is an obvious example, so again, point goes to Wilder.</p>
<h1>Salvage</h1>
<p>There are a few good points.</p>
<p>First, Patricia Arquette has demonstrated an uncanny ability to bring vivid humanity to an underwritten, mechanical role, and she will have earned the
Supporting Actress Oscar that is surely coming her way.
It is one of cinema's great tragedies that we will never know the depths she might have brought to R2D2.</p>
<p>Second, Ethan Hawke does a credible job, creating a character of some complexity and evolving believably over the course of the film. It's not brilliant,
but it's not annoying either.</p>
<h1>And the winner is...</h1>
<p>Little House on the Prairie. Choose your medium.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:sf">
<p>Thanks to Stephanie Friedman for suggesting 400 Blows and making me look clever here. <a class="footnote-backref" href="#fnref:sf" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:photodork">
<p>Other than reminding us of every other pretentious dork who carries around a larger-than-necessary camera and likes to look serious. <a class="footnote-backref" href="#fnref:photodork" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
</ol>
</div>Fortran, Clojure, Haskell and Julia are not at war2014-06-02T00:00:00-04:002014-06-02T00:00:00-04:00Peter Fraenkeltag:blog.podsnap.com,2014-06-02:/arsticle.html<h2>TL;DR</h2>
<ul>
<li>There was this perplexing article in Ars Technica about replacing Fortran
for scientific computation.</li>
<li>The main performance advantage of Fortran is that its restrictive
data structures facilitate vector optimization.</li>
<li>There are many ways to match and even outperform Fortran, while using
other languages.</li>
<li>But it's OK to use Fortran if you want to.</li>
<li>Ars Technica notwithstanding, Clojure, Haskell and Julia are not
locked in competition to replace Fortran.</li>
<li>Ars Technica notwithstanding, Fibonacci sequences may be something
we learned about once, but it is difficult to disguise their
irrelevance to scientific computing.</li>
</ul>
<h2>Preamble</h2>
<p>Ten or fifteen pages into a typical Ars Technica piece on the latest
generation of graphics cards, I am forced to sigh, acknowledging that years of
grueling study would never enable me to write with the erudition and
verbosity of the author I'm reading.</p>
<p>Came along
<a href="http://arstechnica.com/science/2014/05/scientific-computings-future-can-any-coding-language-top-a-1950s-behemoth/">this</a> survey of languages for scientific computation, and I finally saw the light. The trick is that you don't always have to be right.
With that liberating thought, now it's my turn.</p>
<h2>Scientific Computing</h2>
<p>For concreteness, we'll define scientific computing as
computer programs written by physical and natural scientists (not
mathematicians or "data scientists") to
perform time-consuming numerical calculations and simulations, which
need to run as fast as possible, and where throwing more hardware at
the problem is not economically feasible.</p>
<p>These conditions do not always hold, but they provide a useful
framework for discussion, and they seem to be the implicit assumptions
made in the Ars Technica article.</p>
<h2>Not using Fortran</h2>
<p>I cut my teeth on Fortran. It's where I wrote my very first (and
only) bubble sort and (I admit this is peculiar) quite a few device
drivers. I still have an uncanny ability to judge the exact moment
when auto-repeat has emitted precisely six spaces, which accounts for
my reputation as the life of any party. Notwithstanding this jolly
history, I'm very OK charting a life course whereon I expect never to
type another line of Fortran.</p>
<p>In circumstances requiring the performance said to be delivered only
by Fortran, I would go one of three ways: First - honestly the most
likely - I would discover that someone already wrote Fortran code for
the same problem 40 years ago. Second, I could write it in C++, by
which I mostly but not quite mean C. Third, I'd notice that the core cycle
consumption of my code is in stuff that could be replaced by calls to
libraries that were customized to my hardware platform, such as
Intel's Math Kernel Library or ATLAS (discussed below),
whereupon it wouldn't matter much from a
performance perspective what I wrote the rest of the code in. These
strategies will work because the performance benefits of Fortran lie
almost entirely in keyhole optimization - maximally exploiting the
processor in the tight loops. Most of these optimizations are
performed by multi-language OSS toolchains like gcc and llvm; almost all of
them are performed by cpu vendor C/C++ compilers like icc; and absolutely all
of them are employed in the hand-tuned code of cpu vendor math
libraries.</p>
<h2>Go ahead, use Fortran</h2>
<p>The one innate advantage of Fortran from a performance perspective is
that the most complex data structures it offers are arrays of basic
numerical types. Results and inputs can never overlap, and data are
always aligned to be maximally digestible by the processor's vector instructions. In less
restrictive languages, you have to follow instructions like
<a href="https://software.intel.com/sites/products/documentation/studio/composer/en-us/2011Update/compiler_c/optaps/common/optaps_vec_keys.htm">these</a>,
which aren't particularly onerous.</p>
<p>Other than not having to read the foregoing link, the main reasons to use
Fortran are</p>
<ol>
<li>It's what you know, everyone around you is using it.</li>
<li>You don't really care about code quality and extensibility as long
as you can get it to work.</li>
</ol>
<p>For the record, <strong>these are very good reasons</strong>. They explain my
Fortran device drivers, which I wrote largely to support my own
graduate research. Most scientists are writing code that will not
be used for very long or by many other people and will ultimately be
verified by independent implementation in other labs anyway.
Training
themselves to reconceptualize programming would be a waste of time.
Personally, I found it to be an enjoyable waste of time, which is why
I'm no longer a scientist. (Also, I didn't want to live in Podunk.)</p>
<h2>"Using" Fortran</h2>
<p>If you don't love Fortran but have an existing mass of gnarly Fortran code that you don't want to rewrite, it
is very much an option to call it from another language. Fortran's calling convention
is essentially C's, except every argument has to be a pointer and 2D arrays are transposed.
Any language that can call C can call Fortran too. Julia <a href="http://julia.readthedocs.org/en/latest/manual/calling-c-and-fortran-code/">makes</a>
the process ridiculously easy, but it's never more than a few googles away in any language.</p>
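<p>The "transposed" remark comes down to memory order: C stores a 2D array row by row, Fortran column by column. Here is a minimal Python sketch of that, using plain lists rather than any real foreign-function interface; the helper names <code>flatten_row_major</code> and <code>read_column_major</code> are made up for illustration.</p>

```python
def flatten_row_major(matrix):
    """Flatten a list of rows the way C stores a 2D array."""
    return [x for row in matrix for x in row]


def read_column_major(flat, n_rows, n_cols):
    """Read a flat buffer the way Fortran does: element (i, j) lives at i + j * n_rows."""
    return [[flat[i + j * n_rows] for j in range(n_cols)] for i in range(n_rows)]


c_matrix = [[1, 2, 3],
            [4, 5, 6]]               # 2 rows x 3 columns, as C sees it
flat = flatten_row_major(c_matrix)   # the bytes that actually get passed

# Fortran, handed the same buffer, naturally sees a 3 x 2 array: the transpose.
fortran_view = read_column_major(flat, 3, 2)
```

<p>Nothing gets copied or rearranged; the two languages just disagree about which index moves fastest, so you either swap your indices or declare the dimensions in the opposite order.</p>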
<p>If you just want to be part of the club, consider using, in the language of your choice,
single letters from i through n as integer counter variables
and array subscripts. Your comp sci friends may snicker, but the code
will look a lot more like the math behind it.</p>
<h2>ATLAS shrugged - Outperforming Fortran</h2>
<p>Have you ever wondered why it takes so long to install NumPy from source?
Most of the time is spent in <a href="http://acts.nersc.gov/atlas/">ATLAS</a>,
which does a brute force search over combinations of loop
unrolling, multithreading and various other code substitutions,
in search of the optimal implementations for a host of numerical
routines (mostly matrix related). The reason they don't do this
work on their own time and hardware is that optimality is
highly dependent on exact configurations of CPU, cache and memory.</p>
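<p>The flavor of that search can be sketched in a few lines of Python. This is only a toy under loud assumptions: the "unrolling" parameter below does not behave like real loop unrolling in compiled code, and <code>dot_unrolled</code> and <code>autotune</code> are hypothetical names, not anything from ATLAS. The point is the structure: generate variants, time them on the machine at hand, keep the winner.</p>

```python
import timeit


def dot_unrolled(a, b, step):
    """Dot product that handles `step` elements per loop iteration."""
    total = 0.0
    n = len(a)
    main = n - n % step
    for i in range(0, main, step):
        total += sum(a[i + k] * b[i + k] for k in range(step))
    for i in range(main, n):          # leftover elements
        total += a[i] * b[i]
    return total


def autotune(a, b, candidates=(1, 2, 4, 8), repeats=3):
    """Time every candidate on this machine and return the fastest step."""
    timings = {
        step: min(timeit.repeat(lambda: dot_unrolled(a, b, step), number=5, repeat=repeats))
        for step in candidates
    }
    return min(timings, key=timings.get), timings


a = [float(i) for i in range(2000)]
b = [1.0] * 2000
best_step, timings = autotune(a, b)
```

<p>Which candidate wins will vary from machine to machine and run to run, which is rather the point.</p>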
<p>When you compare ATLAS-optimized to pre-compiled BLAS operations
(irrespective of the language they were pre-compiled from), the
difference can be staggering, far more than you can get by manual
fiddling and experimenting with compiler flags, and often better than
vendor-supplied libraries.</p>
<p>This, by the way, explains why JVM-based matrix math performance is so crappy.
With the hard constraint of doing optimization just-in-time, there's no way
to compete with someone who had hours to do it.</p>
<p>Another approach to linear algebra optimization is taken by the
<a href="http://eigen.tuxfamily.org/index.php?title=Main_Page">Eigen template library</a>
for C++. Here, the advantage comes from template metaprogramming,
which allows custom rearrangement and inlining, specializing to fixed
size operations, eliding temporaries by combining successive
operations and spoon-feeding the optimizing compiler.</p>
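<p>To make "eliding temporaries" concrete, here is a rough runtime analogue in Python. Eigen does this at compile time with C++ templates; the classes <code>Vec</code> and <code>Sum</code> below are invented for illustration and only mimic the idea that arithmetic builds a recipe which is later evaluated in a single loop.</p>

```python
class Vec:
    """A vector whose arithmetic builds a recipe instead of a temporary."""

    def __init__(self, data):
        self.data = list(data)

    def __len__(self):
        return len(self.data)

    def at(self, i):
        return self.data[i]

    def __add__(self, other):
        return Sum(self, other)


class Sum:
    """Unevaluated elementwise sum of two operands."""

    def __init__(self, left, right):
        self.left, self.right = left, right

    def __len__(self):
        return len(self.left)

    def at(self, i):
        return self.left.at(i) + self.right.at(i)

    def __add__(self, other):
        return Sum(self, other)

    def evaluate(self):
        # One pass over the indices, with no intermediate vectors allocated.
        return Vec(self.at(i) for i in range(len(self)))


x, y, z = Vec([1, 2, 3]), Vec([10, 20, 30]), Vec([100, 200, 300])
expr = x + y + z           # nothing computed yet, just a tree of Sum nodes
result = expr.evaluate()   # a single fused loop
```

<p>A naive implementation would have materialized <code>x + y</code> as a full vector before adding <code>z</code>; here the only vector allocated is the final one.</p>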
<p>With a lengthy template instantiation stage, compilation takes
longer, but of course not as long as ATLAS would have spent, and the
resulting optimizations, while different from ATLAS's, seem to be as
valuable. </p>
<p>Among ATLAS, Eigen and vendor libraries, which one benchmarks fastest will
depend on specific circumstances, the most crucial of which is whether
the benchmarks are being presented by, respectively, ATLAS, Eigen or
the vendor. But no matter who does the benchmarking, the results are vastly
better than what you would achieve coding the algorithms yourself in Fortran,
despite its ostensible inherent advantages.</p>
<h2>Functional Programming and Concurrency</h2>
<p>When thinking about scientific computing, we should keep in mind that
concurrency comes in three flavors:</p>
<ol>
<li>Managing a large number of asynchronous clients and events, and
keeping them from destroying each other.</li>
<li>Extracting the maximum floating-point operations per second from a
single computer.</li>
<li>Dividing a calculation into multiple, identical pieces that are
"embarrassingly parallel" in that they don't interact until their
results are harvested.</li>
</ol>
<p>When we say that a language, particularly a functional language, makes
it easy to "reason about" concurrency, we almost always mean the first
of these. It's an extremely difficult and important problem in real
world applications, but it's generally not what scientists are worried
about.</p>
<p>Maximizing FLOPS, as we've discussed, is a complicated game. The main
reason that functional programming is mentioned in this context is
aliasing: as with Fortran, you don't have to worry about result arrays
overlapping with input, but in the case of FP it's because the arrays
are immutable. Immutability is not the easiest, and certainly not the
cheapest way to deal with aliasing, and if it's achieved with
pointer-rich persistent data structures, it won't even help you
with vectorizing.</p>
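<p>If aliasing sounds abstract, a small Python sketch may help. The function <code>shift_add</code> is a made-up example, not taken from any library; it runs the same loop once with a separate output buffer and once with the output overlapping the input.</p>

```python
def shift_add(src, dst):
    """dst[i] = src[i - 1] + src[i], written as a plain sequential loop."""
    for i in range(1, len(src)):
        dst[i] = src[i - 1] + src[i]
    return dst


data = [1, 1, 1, 1, 1]

# Separate output buffer: every element depends only on the original input.
separate = shift_add(data, list(data))   # [1, 2, 2, 2, 2]

# Output aliases input: each step reads a value the previous step just wrote.
aliased = shift_add(data, data)          # [1, 2, 3, 4, 5]
```

<p>A compiler that wants to process several values of <code>i</code> at once would produce the first answer, so it may only do so when it can be sure the buffers don't overlap. Fortran's rules and immutable arrays both supply that guarantee, at rather different prices.</p>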
<p>Embarrassingly parallel computations, it goes without saying, do not
become vastly easier or more difficult as a result of language choice.</p>
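<p>For the record, this is roughly all an embarrassingly parallel job amounts to, sketched in Python with the standard library. The work function and chunking helper are invented for illustration, and a thread pool is used only to keep the sketch simple; for CPU-bound work in CPython you would more likely reach for processes or separate machines.</p>

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum_of_squares(chunk):
    """One independent piece of work; it shares nothing with the others."""
    return sum(x * x for x in chunk)


def split(values, pieces):
    """Cut `values` into `pieces` contiguous chunks of near-equal size."""
    size = -(-len(values) // pieces)   # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


values = list(range(1, 1001))
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(partial_sum_of_squares, split(values, 4)))

total = sum(partials)   # the pieces only meet here, at harvest time
```

<p>The same three steps (split, map, combine) look about the same in any language with a job queue.</p>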
<h2>The Arsticle</h2>
<p>Assuming that it wasn't all a stunt by Marcel Duchamp's geeky younger brother,
here, apparently, is what you need to know about scientific computing:</p>
<ul>
<li>There are exactly three contenders for the Fortran crown: Clojure, Haskell and Julia. </li>
<li>The best way to compare them is by using them to compute the
Fibonacci series to arbitrary precision.</li>
</ul>
<p>This is asinine. Clojure, Haskell and Julia are wonderful languages,
but they are not competing to displace Fortran. If you really need to
view scientific computing from a Game of Thrones perspective, it would
be dangerous to ignore Python and C++ (House Targaryen and White
Walkers, respectively). Whether you like them or not, both of these
languages are actually <em>used</em> by scientists, which is why
things like NumPy and Eigen even exist.</p>
<p>Of the three ostensible combatants, Julia is the most likely to appeal
to current users of Fortran, if only because they'll be able to read
it easily and because, as mentioned earlier, it's extremely compatible
with existing Fortran libraries. The REPL experience alone may be enough to
draw them in, because the typical scientist's approach to code
involves a lot of trial and error, which is vastly more pleasant
without a compile cycle. I like multiple dispatch, but I don't think
that's why scientists will like Julia.
And now I'm going to stop talking about Julia, because I
haven't actually coded in it, even though that isn't a good enough
reason to stop some people.</p>
<p>Clojure is gorgeous. I love it deeply, and we're probably moving in
together. The article notes that it was designed to make it easy to reason
about concurrency, and this is indeed true, but it's the first type of
concurrency listed above, not the kind that scientists care about. The fact most
pertinent to discussion of Clojure for scientific computing is that it
runs on a virtual machine, so if you were going to discuss it from that
perspective, you would probably cover Java (or Javascript) first.</p>
<p>Haskell is a terrific workout partner. I won't dispute that it <em>can</em> be
employed to write useful code, but most people who learn it will tell
you that it made them a better programmer - in another language.
Most of what Ars Technica has to say about Haskell is accurate, but it
does make one wonder why they chose to mention it in an article about
Fortran and scientific computing. The fact that some guy at
<a href="http://mas.lvc.edu/~walck/phy261/syl261.html">Lebanon Valley College</a>
taught a course using it is not compelling.</p>
<h2>Fibonacci Sequences</h2>
<p>It's hard to imagine a worse example to use when evaluating languages
for scientific computing. The irreducible challenge in computing
Fibonacci sequences is efficiently maintaining an arbitrary precision
integer, which has little to do with the vectorized
floating point calculations so important to the simulation of physical
systems. If, however, you decide to shove Fibonacci code examples into an
article about Fortran, and you are for some reason OK with having
all the heavy lifting done in someone else's BigInt implementation,
you should probably provide examples that don't crash within seconds.
If your program never makes it past a minute of execution, why the
hell were you worried about its computational efficiency?</p>
<p>Ars gives us</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="k">def </span><span class="nv">fibs</span> <span class="p">(</span><span class="nb">lazy-cat </span><span class="p">[</span><span class="mi">0</span><span class="nv">N</span> <span class="mi">1</span><span class="nv">N</span><span class="p">]</span> <span class="p">(</span><span class="nb">map + </span><span class="nv">fibs</span> <span class="p">(</span><span class="nb">rest </span><span class="nv">fibs</span><span class="p">))))</span>
<span class="p">(</span><span class="k">def </span><span class="nv">fib</span> <span class="p">[</span><span class="nv">n</span><span class="p">]</span> <span class="p">(</span><span class="nb">nth </span><span class="nv">fibs</span> <span class="nv">n</span><span class="p">))</span>
</code></pre></div>
<p>and</p>
<div class="highlight"><pre><span></span><code><span class="nf">fib</span> <span class="ow">::</span> <span class="kt">Integer</span> <span class="ow">-></span> <span class="kt">Integer</span>
<span class="nf">fib</span> <span class="mi">0</span> <span class="ow">=</span> <span class="mi">1</span>
<span class="nf">fib</span> <span class="mi">1</span> <span class="ow">=</span> <span class="mi">1</span>
<span class="nf">fib</span> <span class="n">n</span> <span class="ow">=</span> <span class="n">fib</span> <span class="p">(</span><span class="n">n</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="o">+</span> <span class="n">fib</span> <span class="p">(</span><span class="n">n</span><span class="o">-</span><span class="mi">2</span><span class="p">)</span>
</code></pre></div>
<p>For any challengingly large n, both of these programs will gorge on
memory until they die. The Clojure example will run out of heap,
having created a sort of snake that never relinquishes its head, while
the Haskell version will exhaust stack. Of course, when they don't die,
you end up with an approximately n log(φ)/log(10) digit number (about 0.21 n, with φ the golden ratio),
so if you try this at home be sure to do something with the result
other than printing it out.</p>
<p>While it still doesn't have anything to do with scientific computing,
you can of course find Fibonacci numbers in Clojure and Haskell without
running out of memory immediately. You're going to end up doing it
the way you would have done it procedurally, and then tying it up in a
functional bow. Off the top of my head:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">fib2</span> <span class="p">[</span><span class="nv">n</span><span class="p">]</span> <span class="p">(</span><span class="nf">second</span> <span class="p">(</span><span class="nb">nth </span><span class="p">(</span><span class="nb">iterate </span><span class="p">(</span><span class="kd">fn </span><span class="p">[[</span><span class="nv">i1</span> <span class="nv">i2</span><span class="p">]]</span> <span class="p">[</span><span class="nv">i2</span> <span class="p">(</span><span class="nf">+</span> <span class="nv">i1</span> <span class="nv">i2</span><span class="p">)])</span> <span class="p">[</span><span class="mi">0</span><span class="nv">N</span> <span class="mi">1</span><span class="nv">N</span><span class="p">])</span> <span class="nv">n</span><span class="p">)))</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code><span class="nf">fib2</span> <span class="ow">::</span> <span class="kt">Int</span> <span class="ow">-></span> <span class="kt">Integer</span>
<span class="nf">fib2</span> <span class="n">n</span> <span class="ow">=</span> <span class="n">fib'</span> <span class="n">n</span> <span class="mi">0</span> <span class="mi">1</span>
  <span class="kr">where</span> <span class="n">fib'</span> <span class="ow">::</span> <span class="kt">Int</span> <span class="ow">-></span> <span class="kt">Integer</span> <span class="ow">-></span> <span class="kt">Integer</span> <span class="ow">-></span> <span class="kt">Integer</span>
        <span class="n">fib'</span> <span class="mi">0</span> <span class="kr">_</span> <span class="n">i2</span> <span class="ow">=</span> <span class="n">i2</span>
        <span class="n">fib'</span> <span class="n">n</span> <span class="n">i1</span> <span class="n">i2</span> <span class="ow">=</span> <span class="n">fib'</span> <span class="p">(</span><span class="n">n</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="n">i2</span> <span class="p">(</span><span class="n">i1</span><span class="o">+</span><span class="n">i2</span><span class="p">)</span>
</code></pre></div>
<h2>Big Reveal</h2>
<p>The Fibonacci Sequence is an even better example of a bad example than
I thought. Recently, Intel introduced
<a href="http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/ia-large-integer-arithmetic-paper.html">three new instructions</a>
to facilitate arbitrary precision integer arithmetic. On Intel
hardware, presumably, the efficiency
of any non-pathological Fibonacci generator will depend
entirely on whether the library or classes you use to manipulate large
integers employ them. As with matrix operations, the choice of
language fades in importance compared to the choice of library.</p>
Sign of the Times - Managing inhomogeneously bitemporal data with Datomic and Clojure2014-05-22T00:00:00-04:002014-05-22T00:00:00-04:00Peter Fraenkeltag:blog.podsnap.com,2014-05-22:/bitemp.html
<p>Time is confusing. Sometimes, it's so confusing that our thinking about it changes... over time. Which makes
it triply confusing, or something. In this post, I want to talk about two aspects of temporal data, both
common, useful and often misunderstood:</p>
<ul>
<li>Homogeneous vs Inhomogeneous</li>
<li>Temporal vs Bitemporal</li>
</ul>
<h2>Homogeneous Temporal Data</h2>
<p><img style="float: left" src="./images/bitemp-milk.png" width="20%"/>
The first thing to stress is that we're talking about
<strong>homogeneous</strong> data, because "homogenous" means something <a href="http://www.collinsdictionary.com/dictionary/english/homogeny">completely different</a>.</p>
<p>In a homogeneous, temporal data set, one column will contain a known sequence of
discrete time values, and the other
columns will contain whatever information is associated with those times.</p>
<p/>
<p>You would
query for an exact match in time, e.g., in SQL</p>
<div class="highlight"><pre><span></span><code> <span class="k">select</span> <span class="o">*</span> <span class="k">where</span> <span class="k">time</span><span class="o">=</span><span class="mi">3</span>
</code></pre></div>
<p>and expect to find data exactly where you asked for it.
<img alt="" src="./images/bitemp-homog.png">
Such a data set might contain a daily closing stock price, or the maximum temperature observed in each
hour of the day -- anything, as long as the time values are discrete and known, and you expect to find
data at every one of them.</p>
<h2>Inhomogeneous Temporal Data</h2>
<p>By contrast, <strong>in</strong>homogeneous temporal data holds observations that occur at arbitrary times.
You typically query for information <em>as of</em> a point in time, expecting to get back the most
recent observation that isn't after the query time:</p>
<p><img alt="" src="./images/bitemp-inhom.png"></p>
<p>In SQL, the query is a bit awkward</p>
<div class="highlight"><pre><span></span><code> <span class="k">select</span> <span class="o">*</span> <span class="k">where</span> <span class="k">time</span> <span class="o"><=</span> <span class="mi">3</span><span class="p">.</span><span class="mi">25</span>
<span class="k">order</span> <span class="k">by</span> <span class="k">time</span> <span class="k">desc</span>
<span class="k">limit</span> <span class="mi">1</span>
</code></pre></div>
<p>but would be reasonably efficient in many databases, as long as <code>time</code> was indexed properly.
There's an O(log N) search to find the first acceptable time value, and then we just return it. (Some
databases will pathologically insist on realizing the entire set of previous times, in which case
you'll have to do something cleverer.)</p>
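<p>For intuition, here is a minimal Clojure sketch of the same as-of lookup, with an in-memory sorted map standing in for an indexed <code>time</code> column. The data and the names <code>observations</code> and <code>as-of</code> are made up for illustration.</p>

```clojure
;; Hypothetical observations, keyed by the time at which they occurred.
(def observations
  (sorted-map 0.5 "first" 1.7 "second" 3.1 "third" 4.8 "fourth"))

(defn as-of
  "Most recent [time value] entry at or before t, or nil if there is none.
  rsubseq seeks to the last key <= t and walks backwards from there,
  so we only touch the entries we actually consume."
  [obs t]
  (first (rsubseq obs <= t)))

(as-of observations 3.25) ; the entry for time 3.1
(as-of observations 0.1)  ; nil, since nothing had been observed yet
```

<p>The seek is the O(log N) part; taking the head of the resulting sequence is the "just return it" part.</p>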
<p>Such a data set might contain posted limit orders for a given stock, or someone's home address indexed by
the time they moved into it. The time values are generally not known in advance, and you certainly don't
expect to find data at every possible time.</p>
<h2>Bitemporal Data</h2>
<p>Bitemporal data is inhomogeneous data where we care not just about the time to which the data pertains, but
also about the time when we started thinking that it did.</p>
<p>True story. Earlier this year, I decided to write down the recent history of my residences:</p>
<div class="highlight"><pre><span></span><code> Move-in Address
20100608 705 Hauser Street, Queens
20111013 212B Baker Street, London
20131125 33 Windsor Gardens, London
20140101 742 Evergreen Terrace, Springfield
</code></pre></div>
<p>Last week, a friend reminded me that I was briefly the Prime Minister of England, so I had to revise the
list. Then, last night while I was falling asleep, I suddenly remembered that I had moved in with the
Browns on Hallowe'en, and that their address was actually 32. At this point, it seemed like a good idea to
record what I had thought and when, if only so I could reconstruct which lies I had told on
which credit card applications:</p>
<div class="highlight"><pre><span></span><code> TT TV Address
20140111 20100608 705 Hauser Street, Queens
20140111 20111013 212B Baker Street, London
20140512 20120714 10 Downing Street, London
20140111 20131125 33 Windsor Gardens, London
20140512 20131031 32 Windsor Gardens, London
20140111 20140101 742 Evergreen Terrace, Springfield
</code></pre></div>
<p>In the spirit of academic rigor, I labeled my columns with the standard convention of <code>TV</code> for
the time to which the <strong>value</strong> pertains and <code>TT</code> for the <strong>transaction</strong> time when it was
entered into the ledger.</p>
<h2>Purely functional queries</h2>
<p>Enough about me; let's switch to some fictional data. The TV-axis below denotes the time at which
a measurement was taken. The value of the measurement is recorded as vertical height, and the time
it was recorded is on the TT-axis.</p>
<p><img alt="" src="./images/bitemp-2.png"></p>
<p>As of TT=10, we knew about points at TV=0.6, 1.5, 2.5, 4.4 and 5.6; at TT=12, we learned about a new
measurement at TV=3.5; at TT=17, we revised the measurement at TV=5.6 to a slightly higher value.</p>
<p>Now, imagine a query for TV=4, TT=10. It's not particularly graceful to cram this into SQL, but
if we did, the query would be something like:</p>
<div class="highlight"><pre><span></span><code> <span class="k">select</span> <span class="o">*</span> <span class="k">where</span> <span class="n">TV</span><span class="o"><=</span><span class="mi">4</span> <span class="k">and</span> <span class="n">TT</span><span class="o"><=</span><span class="mi">10</span>
<span class="k">order</span> <span class="k">by</span> <span class="n">TV</span> <span class="k">desc</span><span class="p">,</span> <span class="n">TT</span> <span class="k">desc</span>
<span class="k">limit</span> <span class="mi">1</span>
</code></pre></div>
<p>Note that this won't always be efficient, for reasons we'll discuss below. Graphically,
the blue arrows below start at the point of the query and end at the resolved value for queries
as of (TV=4, TT=10) and (TV=6, TT=10):</p>
<p><img alt="" src="./images/bitemp-3.png"></p>
<p>If we executed the same queries but with TT=20, the resolutions would be different, as shown in
red:</p>
<p><img alt="" src="./images/bitemp-4.png"></p>
<p>Now, the most recent value visible as of TV=4 is from 3.5 rather than 2.5; similarly, the query at
TV=6 continues to see the point from 5.6, but now it has been "overwritten" to a different value.</p>
<p>Of course, nothing was really overwritten: if we repeated the query at TT=10, we'd still get back the
blue results. If we can promise that</p>
<ol>
<li>transactions occur at strictly increasing TTs and</li>
<li>we never query for TT in the future beyond the next transaction TT</li>
</ol>
<p>which is most easily achieved by</p>
<ol>
<li>using the physical clock time for TT in transactions and</li>
<li>never querying for TT in the future,</li>
</ol>
<p>then a query for any given (TT,TV) pair will always return the same result, and querying
is a <strong>purely functional</strong> operation.</p>
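<p>As a rough sketch of what "purely functional" means here, the following plain Clojure mimics the SQL query over an in-memory vector. The TV and TT values follow the figures above, on the assumption that the original five points were all recorded at exactly TT=10; the measured values in <code>:v</code> and the names <code>records</code> and <code>query</code> are invented for illustration.</p>

```clojure
;; Hypothetical records: TV/TT as in the figures, :v values made up.
(def records
  [{:tv 0.6 :tt 10 :v 1.0} {:tv 1.5 :tt 10 :v 2.0} {:tv 2.5 :tt 10 :v 1.5}
   {:tv 4.4 :tt 10 :v 2.5} {:tv 5.6 :tt 10 :v 3.0}
   {:tv 3.5 :tt 12 :v 2.2}    ; new measurement learned at TT=12
   {:tv 5.6 :tt 17 :v 3.3}])  ; revision recorded at TT=17

(defn query
  "Record with the latest TV at or before tv among those known by tt;
  ties on TV go to the latest TT. Mirrors
  `order by TV desc, TT desc limit 1`, and is just as naive."
  [recs tv tt]
  (->> recs
       (filter #(and (<= (:tv %) tv) (<= (:tt %) tt)))
       (sort-by (juxt :tv :tt))
       last))

(:tv (query records 4 10)) ; 2.5, the blue arrow
(:tv (query records 4 20)) ; 3.5, the red arrow
(:v  (query records 6 10)) ; 3.0, the original measurement at TV=5.6
(:v  (query records 6 20)) ; 3.3, the revised one

;; A later transaction cannot change the answer to an earlier question.
(= (query records 4 10)
   (query (conj records {:tv 3.9 :tt 21 :v 9.9}) 4 10)) ; true
```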
<h2>Asymmetry</h2>
<p>Here's a 2D view from below of a similar dataset: TV is on the x-axis;
TT is on the y-axis, and we're recording the color of a lamp, which can be green, blue, red or splotchy
purplish.
Here, the region below the diagonal line is forbidden by causality, since we can't possibly record something
before we observe it.</p>
<p><img alt="" src="./images/bitemp-causality.png"></p>
<p>As of TT=3, there are two observations recorded, green at (TV=1.9, TT=2.9) and blue at (TV=2.3, TT=2.7).
A query for (TV=3, TT=3) will find the blue point at (TV=2.3, TT=2.7).</p>
<p>Why not green at (1.9, 2.9), which
is further from TV but closer to TT? Because
TT and TV are not precisely symmetric. We are asking the question, "at TT=3.0, what did
we think the color of the lamp was at TV=3.0?".
The alternate question, "what is the most recently recorded information as of TT=3.0
concerning the lamp's status at some time at or before TV=3.0?" is poseable, but not usually what we want.</p>
<p><img src="http://upload.wikimedia.org/wikipedia/commons/thumb/a/a7/Rashomon_poster_2.jpg/83px-Rashomon_poster_2.jpg" style="float: right"/>
At 3.3, we spoke to a passing
<a href="http://www.imdb.com/name/nm0605270/">samurai</a>
who told us that, notwithstanding
<a href="http://www.imdb.com/name/nm0001536/">previous</a>
<a href="http://www.imdb.com/name/nm0477553">reports</a>,
the lamp was actually red.
Finally, at 4.25, a local
<a href="http://www.imdb.com/name/nm0793766/">woodcutter</a>
suggested that we split (get it?) the difference and call it splotchy purple.</p>
<p>Based on this revision history, queries for TV=3 as of TT=3, 4 and 5, respectively, would
return blue, red and purple.</p>
<h2>Efficiency</h2>
<p>As noted above, a single-sided SQL range query for both TV and TT is likely to be quite wasteful.
Typically, the optimizer will</p>
<ol>
<li>Pick one of the two temporal columns, say TV for concreteness, but it could be either.</li>
<li>Do an O(log N) search to find the most recent acceptable TV value.</li>
<li>Scan down through TV, assembling a realization of all M rows matching the TT and TV constraints, taking
O(N) time and O(M) space.</li>
<li>Sort them by TT, taking O(M log M) time.</li>
<li>Take the first and discard the rest of the realization.</li>
</ol>
<p>This is unacceptable for all but the tiniest data sets.</p>
<p>For many use cases, we can do better. Commonly, we can assume:</p>
<ul>
<li>TT >= TV - i.e. we are storing observations, not predictions. (As in the examples above.)</li>
<li>Revisions are infrequent.</li>
</ul>
<p>Typically, then, every datapoint will have just one observation, with TT slightly later than its TV.
This can then be nicely represented in a
<a href="http://www.postgresql.org/docs/8.2/static/indexes-multicolumn.html">multi-column index</a>, which essentially
stores a sorted structure of concatenations of TV and TT. For the address example, it would look like</p>
<div class="highlight"><pre><span></span><code><span class="n">TV</span><span class="o">:</span><span class="n">TT</span> <span class="n">Address</span>
<span class="mi">20100608</span><span class="o">:</span><span class="mi">20140111</span> <span class="mi">705</span> <span class="n">Hauser</span> <span class="n">Street</span><span class="o">,</span> <span class="n">Queens</span>
<span class="mi">20111013</span><span class="o">:</span><span class="mi">20140111</span> <span class="mi">212</span><span class="n">B</span> <span class="n">Baker</span> <span class="n">Street</span><span class="o">,</span> <span class="n">London</span>
<span class="mi">20120714</span><span class="o">:</span><span class="mi">20140512</span> <span class="mi">10</span> <span class="n">Downing</span> <span class="n">Street</span><span class="o">,</span> <span class="n">London</span>
<span class="mi">20131031</span><span class="o">:</span><span class="mi">20140512</span> <span class="mi">32</span> <span class="n">Windsor</span> <span class="n">Gardens</span><span class="o">,</span> <span class="n">London</span>
<span class="mi">20131125</span><span class="o">:</span><span class="mi">20140111</span> <span class="mi">33</span> <span class="n">Windsor</span> <span class="n">Gardens</span><span class="o">,</span> <span class="n">London</span>
<span class="mi">20140101</span><span class="o">:</span><span class="mi">20140111</span> <span class="mi">742</span> <span class="n">Evergreen</span> <span class="n">Terrace</span><span class="o">,</span> <span class="n">Springfield</span>
</code></pre></div>
<p>If I were looking, say, for what I thought on Valentine's Day 2014 that my address had been on
Christmas of 2013, the query would</p>
<ol>
<li>Perform an O(log N) search for the last TV:TT that is lexicographically less than or equal to "20131225:20140214".</li>
<li>Find "20131125:20140111".</li>
<li>Check the TT constraint; had it been violated, we would scan backwards until TT <= 20140214.</li>
<li>See that it is satisfied, so the scan ends immediately.</li>
<li>Bother Paddington's neighbor.</li>
</ol>
<p>This index and query would work even without the assumptions above, but it might be more costly. E.g.</p>
<ul>
<li>If there were a huge number of revisions, we would have to scan through them all.</li>
<li>If we queried for a TT very much earlier than the desired TV, we might well scan backwards through
the entire data set before concluding that there was no such entry.</li>
</ul>
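<p>Here is a small sketch of that seek-then-scan-backwards lookup, using a Clojure sorted map keyed by <code>[TV TT]</code> vectors as a stand-in for the multi-column index. The names <code>ledger</code> and <code>address-at</code> are mine, and a real database index will differ in the details.</p>

```clojure
;; The address ledger keyed by [TV TT]. Clojure compares equal-length
;; vectors element by element, so the map is ordered by TV, then TT.
(def ledger
  (sorted-map
   [20100608 20140111] "705 Hauser Street, Queens"
   [20111013 20140111] "212B Baker Street, London"
   [20120714 20140512] "10 Downing Street, London"
   [20131031 20140512] "32 Windsor Gardens, London"
   [20131125 20140111] "33 Windsor Gardens, London"
   [20140101 20140111] "742 Evergreen Terrace, Springfield"))

(defn address-at
  "What we believed at tt about the address as of tv, or nil."
  [idx tv tt]
  (->> (rsubseq idx <= [tv tt])                ; seek, then walk backwards
       (drop-while (fn [[[_ t] _]] (> t tt)))  ; skip entries recorded after tt
       first
       second))

;; Christmas 2013 as believed on Valentine's Day 2014: no scanning needed.
(address-at ledger 20131225 20140214) ; "33 Windsor Gardens, London"

;; Mid-November 2013 as believed on the same day: the first two candidates
;; were both recorded in May 2014, so we scan back past them.
(address-at ledger 20131115 20140214) ; "212B Baker Street, London"
```

<p>The second query shows the cost described in the bullets above: every entry skipped by <code>drop-while</code> is a row we had to visit for nothing.</p>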
<h2>Bitemporality with Datomic</h2>
<p><a href="http://www.datomic.com/">Datomic</a> is a commercial database, invented by Rich Hickey and sold by
Cognitect. It has many desirable features, but for our purposes, three are important:</p>
<ul>
<li>The database is a <a href="http://en.wikipedia.org/wiki/Persistent_data_structure">persistent data structure</a>.</li>
<li>It has built-in TT-style temporality.</li>
<li>It has a nice Clojure API.</li>
</ul>
<p>So there is a distinction between a <strong>connection</strong></p>
<div class="highlight"><pre><span></span><code> <span class="ss">(</span><span class="nv">def</span> <span class="nv">conn</span> <span class="ss">(</span><span class="nv">d</span><span class="o">/</span><span class="k">connect</span> <span class="s2">"</span><span class="s">datomic:free://localhost:4334/bitemp</span><span class="s2">"</span><span class="ss">))</span>
</code></pre></div>
<p>and a <strong>database</strong>.</p>
<div class="highlight"><pre><span></span><code> (def db (d/db conn))
</code></pre></div>
<p>The latter is a private, persistent, immutable object, and nothing anybody else does, now or in
the future, can affect the results of queries we make against it, as long as it's in scope. Of course,
if we call <code>d/db</code> sometime later, its return value will "contain" any transactions executed
in the interim. We can, if we want, be precise about exactly what interim means:</p>
<div class="highlight"><pre><span></span><code> (def db (d/as-of (d/db conn) tx-time))
</code></pre></div>
<p>All <code>db</code>s thus obtained with a particular <code>tx-time</code> will contain exactly the same data.
By implication, anything placed into Datomic is immutable. An existent value can't change, but a new
one can be added.</p>
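<p>If the connection/database distinction feels slippery, a loose analogy in plain Clojure may help: an atom plays the connection, and the immutable map it currently holds plays the database value. This is only an analogy with made-up names (<code>conn*</code>, <code>db-then</code>, <code>db-now</code>), not the Datomic API.</p>

```clojure
;; The atom is the mutable reference, like a connection.
(def conn* (atom {:lamp "green"}))

(def db-then @conn*)              ; like (d/db conn): an immutable snapshot
(swap! conn* assoc :lamp "blue")  ; somebody else "transacts"
(def db-now @conn*)               ; a later snapshot sees the change

(:lamp db-then) ; "green", since the old snapshot is unaffected
(:lamp db-now)  ; "blue"
```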
<p>For consistency, transaction time is strictly increasing across the entire system;
it is exactly what we're looking for in a TT. TV, however, we'll have to roll ourselves.
As a concrete example, let's implement a simple bitemporal store. It could hold things like
stock prices for random tickers at random times of the day,
colors of lamps, addresses - anything observed at discrete times, then recorded and
possibly revised later. The code, such as it is, is on
<a href="https://github.com/pnf/clojure-playground/blob/master/src/clj/playground/bitemp.clj">github</a>;
what I show here will be abbreviated for clarity.</p>
<p>Stripped of some boilerplate (i.e. this won't actually run), the schema defines four attributes:</p>
<div class="highlight"><pre><span></span><code> <span class="p">[{</span><span class="ss">:db/ident</span> <span class="ss">:bitemp/k</span>
<span class="ss">:db/valueType</span> <span class="ss">:db.type/string</span>
<span class="ss">:db/doc</span> <span class="s">"Non-temporal part of the key"</span><span class="p">}</span>
<span class="p">{</span><span class="ss">:db/ident</span> <span class="ss">:bitemp/tv</span>
<span class="ss">:db/valueType</span> <span class="ss">:db.type/instant</span>
<span class="ss">:db/doc</span> <span class="s">"Time for which data is relevant"</span>
<span class="ss">:db/index</span> <span class="nv">true</span><span class="p">}</span>
<span class="p">{</span><span class="ss">:db/ident</span> <span class="ss">:bitemp/index</span>
<span class="ss">:db/valueType</span> <span class="ss">:db.type/string</span>
<span class="ss">:db/doc</span> <span class="s">"This is the mysterious and exciting part."</span>
<span class="ss">:db/unique</span> <span class="ss">:db.unique/identity</span><span class="p">}</span>
<span class="p">{</span><span class="ss">:db/ident</span> <span class="ss">:bitemp/value</span>
<span class="ss">:db/valueType</span> <span class="ss">:db.type/string</span>
<span class="ss">:db/doc</span> <span class="s">"The value"</span><span class="p">}</span>
<span class="p">]</span>
</code></pre></div>
<p>The <code>k</code> field is the non-temporal identifier of whatever we're measuring. It could be a stock
ticker, the name of the person whose address we're tracking, the location of a colorful lamp, etc.</p>
<p>We explicitly keep <code>tv</code>; its <code>instant</code> type is basically a Java <code>Date</code>, which is basically
a number of milliseconds since the great epoch. Datomic will take care of TT.</p>
<p>Per the comments, the <code>:bitemp/index</code> attribute is mysterious and exciting, so ignore what it does for the
moment.</p>
<p>We can insert a value with something like:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">insert-tx</span> <span class="p">[</span><span class="nv">k</span> <span class="nv">tv</span> <span class="nv">value</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">tv</span> <span class="p">(</span><span class="nf">jd</span> <span class="nv">tv</span><span class="p">)</span>
<span class="nv">idx</span> <span class="p">(</span><span class="nf">idxid</span> <span class="nv">k</span> <span class="nv">tv</span><span class="p">)]</span>
<span class="p">{</span><span class="ss">:db/id</span> <span class="p">(</span><span class="nf">d/tempid</span> <span class="ss">:bitemp</span><span class="p">)</span>
<span class="ss">:bitemp/index</span> <span class="nv">idx</span>
<span class="ss">:bitemp/k</span> <span class="nv">k</span>
<span class="ss">:bitemp/tv</span> <span class="nv">tv</span>
<span class="ss">:bitemp/value</span> <span class="nv">value</span><span class="p">}))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">insert-value</span> <span class="p">[</span><span class="nv">conn</span> <span class="nv">k</span> <span class="nv">tv</span> <span class="nv">value</span><span class="p">]</span>
<span class="p">(</span><span class="nb">-> </span><span class="o">@</span><span class="p">(</span><span class="nf">d/transact</span> <span class="nv">conn</span> <span class="p">[(</span><span class="nf">insert-tx</span> <span class="nv">k</span> <span class="nv">tv</span> <span class="nv">value</span><span class="p">)])</span>
<span class="ss">:tx-data</span> <span class="nb">first </span> <span class="nv">.v</span><span class="p">))</span>
</code></pre></div>
<p>The return value will be the transaction time.</p>
<h2>Magic, part I</h2>
<p>If we knew the exact TV we were looking for, we could query straightforwardly.
In Datalog:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">query-exact</span> <span class="p">[</span><span class="nv">conn</span> <span class="nv">k</span> <span class="nv">tv</span> <span class="nv">tt</span><span class="p">]</span>
<span class="p">(</span><span class="nf">d/q</span> <span class="o">'</span><span class="p">[</span><span class="ss">:find</span> <span class="nv">?v</span> <span class="ss">:in</span> <span class="nv">$</span> <span class="nv">?k</span> <span class="nv">?tv</span> <span class="ss">:where</span> <span class="p">[</span><span class="nv">?e</span> <span class="ss">:bitemp/k</span> <span class="nv">?k</span><span class="p">]</span>
<span class="p">[</span><span class="nv">?e</span> <span class="ss">:bitemp/tv</span> <span class="nv">?tv</span><span class="p">]</span>
<span class="p">[</span><span class="nv">?e</span> <span class="ss">:bitemp/value</span> <span class="nv">?v</span><span class="p">]]</span>
<span class="p">(</span><span class="nf">d/as-of</span> <span class="p">(</span><span class="nf">d/db</span> <span class="nv">conn</span><span class="p">)</span> <span class="nv">tt</span><span class="p">)</span> <span class="nv">k</span> <span class="nv">tv</span><span class="p">))</span>
</code></pre></div>
<p>This finds a database as of <code>tt</code>, looks in it for an entity (as inserted above) with a
specific <code>k</code> and <code>tv</code>, binds its value to <code>?v</code> and returns it.</p>
<p>Generally, we don't know exactly what TV to look for, which is where <code>index</code> comes in.
Datomic has an <code>index-range</code> function, which, for an indexed attribute, returns a lazy sequence
starting at the first entry <strong>greater than or equal</strong> to a specified value and increasing.
We want to devise entries
for this index such that this function will quickly locate an exact match for <code>k</code> and
the first <code>tv</code> <strong>less than or equal</strong> to a target. The comparison value of our entries must
therefore <strong>decrease</strong> with increasing <code>tv</code>. We construct the field as follows:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">k-hash</span> <span class="p">[</span><span class="nv">k</span><span class="p">]</span> <span class="p">(</span><span class="nf">digest/md5</span> <span class="nv">k</span><span class="p">))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">long->str</span> <span class="p">[</span><span class="nv">l</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">li</span> <span class="p">(</span><span class="nb">- </span><span class="mi">100000000000000</span> <span class="nv">l</span><span class="p">)]</span>
<span class="p">(</span><span class="nb">apply str </span><span class="p">(</span><span class="nb">map </span><span class="o">#</span><span class="p">(</span><span class="nf">->></span> <span class="nv">%</span> <span class="p">(</span><span class="nb">bit-shift-right </span><span class="nv">li</span><span class="p">)</span> <span class="p">(</span><span class="nb">bit-and </span><span class="mi">0</span><span class="nv">xFF</span><span class="p">)</span> <span class="nv">char</span><span class="p">)</span> <span class="p">[</span><span class="mi">56</span> <span class="mi">48</span> <span class="mi">40</span> <span class="mi">32</span> <span class="mi">24</span> <span class="mi">16</span> <span class="mi">8</span> <span class="mi">0</span><span class="p">]))))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">t-format</span> <span class="p">[</span><span class="nv">t</span><span class="p">]</span> <span class="p">(</span><span class="nb">-> </span><span class="nv">t</span> <span class="nv">.getTime</span> <span class="nv">long->str</span><span class="p">))</span>
<span class="p">(</span><span class="k">def </span><span class="nv">idx-sep</span> <span class="s">"-"</span><span class="p">)</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">idxid</span> <span class="p">[</span><span class="nv">k</span> <span class="nv">tv</span><span class="p">]</span> <span class="p">(</span><span class="nb">str </span><span class="p">(</span><span class="nf">k-hash</span> <span class="nv">k</span><span class="p">)</span> <span class="nv">idx-sep</span> <span class="p">(</span><span class="nf">t-format</span> <span class="nv">tv</span><span class="p">)))</span>
</code></pre></div>
<p>The <code>index</code> will look like "e757fd4fedc4fe825bb81b1b466a0947-^@^@(#&%($".
Before the hyphen, we have
the md5 hash of the non-temporal part of the key. After the hyphen, we have TV in milliseconds,
subtracted from 10^14, and crammed into 8 characters.</p>
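<p>To see why subtracting from 10^14 does the trick, here is a toy version in plain Clojure. Vectors of <code>[k inverted-tv]</code> stand in for the hashed, byte-packed strings, and a sorted map with <code>subseq</code> stands in for Datomic's <code>index-range</code>; the names <code>big</code>, <code>index</code> and <code>latest-at-or-before</code> and the data are hypothetical.</p>

```clojure
(def big 100000000000000) ; 10^14, as in long->str above

;; Hypothetical index keyed by [k (- big tv-millis)], so that within a
;; given k, later tvs sort EARLIER.
(def index
  (sorted-map
   ["lamp"  (- big 1000)] "green"
   ["lamp"  (- big 2000)] "blue"
   ["lamp"  (- big 3000)] "red"
   ["stock" (- big 1500)] "42.0"))

(defn latest-at-or-before
  "Value for k with the greatest tv <= the target, or nil.
  A forward scan from [k (- big tv)] lands on the first key that is
  greater than or equal to it, which is the latest tv not after the target."
  [idx k tv]
  (let [[[found-k _] v] (first (subseq idx >= [k (- big tv)]))]
    (when (= found-k k) v)))

(latest-at-or-before index "lamp" 2500) ; "blue", observed at tv=2000
(latest-at-or-before index "lamp" 500)  ; nil: the scan fell through to "stock"
```

<p>The second call is the reason any real lookup has to check that the key it landed on still belongs to the <code>k</code> it asked about.</p>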
<p>For an exact <code>k</code> and a target <code>tv</code>, I
can construct the string and take the head of the sequence returned by <code>index-range</code>:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">get-at</span> <span class="p">[</span><span class="nv">conn</span> <span class="nv">k</span> <span class="nv">tv</span> <span class="nv">tt</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">tvf</span> <span class="p">(</span><span class="nf">t-format</span> <span class="nv">tv</span><span class="p">)</span>
<span class="nv">kh</span> <span class="p">(</span><span class="nf">k-hash</span> <span class="nv">k</span><span class="p">)</span>
<span class="nv">idx</span> <span class="p">(</span><span class="nb">str</span> <span class="nv">kh</span> <span class="nv">idx-sep</span> <span class="nv">tvf</span><span class="p">)</span>
<span class="nv">db</span> <span class="p">(</span><span class="k">-> </span><span class="nv">conn</span> <span class="nv">d/db</span> <span class="p">(</span><span class="nf">d/as-of</span> <span class="nv">tt</span><span class="p">))</span>
<span class="nv">e</span> <span class="p">(</span><span class="nf">some-></span> <span class="p">(</span><span class="nf">d/index-range</span> <span class="nv">db</span> <span class="ss">:bitemp/index</span> <span class="nv">idx</span> <span class="nv">nil</span><span class="p">)</span>
<span class="nv">seq</span> <span class="k">first </span><span class="nv">.v</span><span class="p">)]</span>
<span class="p">(</span><span class="nf">or</span> <span class="p">(</span><span class="nf">and</span> <span class="nv">e</span> <span class="p">(</span><span class="nf">.startsWith</span> <span class="nv">e</span> <span class="nv">kh</span><span class="p">)</span>
<span class="p">(</span><span class="k">first </span><span class="p">(</span><span class="nf">d/q</span> <span class="o">'</span><span class="p">[</span><span class="ss">:find</span> <span class="nv">?tv</span> <span class="nv">?v</span>
<span class="ss">:in</span> <span class="nv">$</span> <span class="nv">?i</span> <span class="nv">?k</span>
<span class="ss">:where</span>
<span class="p">[</span><span class="nv">?e</span> <span class="ss">:bitemp/index</span> <span class="nv">?i</span><span class="p">]</span>
<span class="p">[</span><span class="nv">?e</span> <span class="ss">:bitemp/k</span> <span class="nv">?k</span><span class="p">]</span>
<span class="p">[</span><span class="nv">?e</span> <span class="ss">:bitemp/value</span> <span class="nv">?v</span><span class="p">]</span>
<span class="p">[</span><span class="nv">?e</span> <span class="ss">:bitemp/tv</span> <span class="nv">?tv</span><span class="p">]]</span>
<span class="nv">db</span> <span class="nv">e</span> <span class="nv">k</span><span class="p">)))</span>
<span class="nv">nil</span><span class="p">)))</span>
</code></pre></div>
<p>The <code>(or (and e (.startsWith e kh)...</code> business takes care of the possibilities that
<code>index-range</code> returned nothing, or that there were no appropriate <code>tv</code>s, so it fell through
to the next <code>k</code>.</p>
<p>When we call <code>get-at</code>, the TT part of the query is handled by Datomic's <code>as-of</code>, while the
TV part uses our fancy composite index.</p>
<h2>Magic, part II (or the lack thereof)</h2>
<p>It's all very well and good to say that Datomic is "handling" the TT query, but how exactly
is it doing that? Datomic's persistent data structures are clever, but under the covers they must be
susceptible to the complexities discussed above in the <strong>Efficiency</strong> section.<br>
Datomic being closed source, it's hard to find out exactly how it executes queries, but some
information has been <a href="http://tonsky.me/blog/unofficial-guide-to-datomic-internals/">gleaned</a> from
forums and lectures. It seems that</p>
<div class="highlight"><pre><span></span><code> <span class="nv">To</span> <span class="nv">answer</span> <span class="nv">as</span><span class="o">-</span><span class="nv">of</span> <span class="nv">query</span> <span class="k">for</span> <span class="nv">moment</span> <span class="nv">T</span>, <span class="nv">current</span>, <span class="nv">in</span><span class="o">-</span><span class="nv">memory</span> <span class="nv">and</span>
<span class="nv">history</span> <span class="nv">parts</span> <span class="nv">get</span> <span class="nv">merged</span>, <span class="nv">and</span> <span class="k">then</span> <span class="nv">all</span> <span class="nv">data</span> <span class="nv">with</span> <span class="nv">timestamp</span> <span class="nv">after</span>
<span class="nv">moment</span> <span class="nv">T</span> <span class="nv">is</span> <span class="nv">ignored</span>. <span class="nv">Note</span> <span class="nv">that</span> <span class="nv">as</span><span class="o">-</span><span class="nv">of</span> <span class="nv">queries</span> <span class="k">do</span> <span class="nv">not</span> <span class="nv">require</span>
<span class="nv">older</span> <span class="nv">versions</span> <span class="nv">of</span> <span class="nv">current</span> <span class="nv">index</span>, <span class="nv">they</span> <span class="nv">use</span> <span class="nv">most</span> <span class="nv">recent</span> <span class="nv">current</span>
<span class="nv">index</span> <span class="nv">and</span> <span class="nv">filter</span> <span class="nv">it</span> <span class="nv">by</span> <span class="nv">time</span>, <span class="nv">deducing</span> <span class="nv">the</span> <span class="nv">previous</span> <span class="nv">view</span> <span class="nv">of</span> <span class="nv">the</span>
<span class="nv">database</span>.
</code></pre></div>
<p>So we're implicitly making the assumption proposed above, of infrequent revisions. If revisions for the same
query parameters are rampant, then Datomic will scan over the set of all versions and "filter it by time",
which will, as in the SQL case, be dreadful.</p>
<p>To test this implementation realistically, we'll want to insert a lot of entries, and to do so in reasonable
batch sizes. A single transaction can of course encompass multiple operations:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">insert-values</span>
<span class="s">"Insert multiple values [[k1 tv1 v1] [k2 tv2 v2] ...] in one transaction and return the single transaction time."</span>
<span class="p">[</span><span class="nv">conn</span> <span class="nv">k-tv-values</span><span class="p">]</span>
<span class="p">(</span><span class="k">-> </span><span class="o">@</span><span class="p">(</span><span class="nf">d/transact</span> <span class="nv">conn</span> <span class="p">(</span><span class="nb">map</span> <span class="p">(</span><span class="nf">partial</span> <span class="nv">apply</span> <span class="nv">insert-tx</span><span class="p">)</span> <span class="nv">k-tv-values</span><span class="p">))</span>
<span class="ss">:tx-data</span> <span class="k">first </span> <span class="nv">.v</span><span class="p">))</span>
</code></pre></div>
<p>Since I can't think of any more funny addresses, the test will involve random data.
In a single transaction, we'll insert <code>nKeys</code> different keys, of the form <code>Thing${kKey}</code>;
we'll do this over <code>nTv</code> explicitly different times; and we'll repeat the exercise <code>nTt</code> times,
implicitly picking up increasing transaction times, which we then return. The values will be of
the form <code>Thing${kKey}v${jTv}t${iTt}</code> so we can easily verify later if they're correct.</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">insert-lots</span> <span class="p">[</span><span class="nv">conn</span> <span class="nv">nKeys</span> <span class="nv">nTv</span> <span class="nv">nTt</span><span class="p">]</span>
<span class="s">"Insert lots of stuff. nKeys unique keys, nTv tv's, nTt tt's in batches of nKeys.</span>
<span class="s">Returns a list of nTt transaction times suitable for passing into query-lots."</span>
<span class="p">(</span><span class="k">for</span> <span class="p">[</span><span class="nv">iTt</span> <span class="p">(</span><span class="nb">range</span> <span class="nv">nTt</span><span class="p">)]</span>
<span class="p">(</span><span class="nf">last</span> <span class="p">(</span><span class="k">for</span> <span class="p">[</span><span class="nv">jTv</span> <span class="p">(</span><span class="nb">range</span> <span class="nv">nTv</span><span class="p">)]</span>
<span class="p">(</span><span class="k">do </span> <span class="p">(</span><span class="nf">insert-values</span> <span class="nv">conn</span>
<span class="p">(</span><span class="k">for</span> <span class="p">[</span><span class="nv">k</span> <span class="p">(</span><span class="nb">range</span> <span class="nv">nKeys</span><span class="p">)]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">k</span> <span class="p">(</span><span class="nb">str</span> <span class="s">"Thing"</span> <span class="nv">k</span><span class="p">)</span>
<span class="nv">tv</span> <span class="p">(</span><span class="nf">jd</span> <span class="p">(</span><span class="nf">*</span> <span class="mi">10</span> <span class="nv">jTv</span><span class="p">))</span>
<span class="nv">v</span> <span class="p">(</span><span class="nb">str</span> <span class="nv">k</span> <span class="s">"v"</span> <span class="nv">jTv</span> <span class="s">"t"</span> <span class="nv">iTt</span><span class="p">)]</span>
<span class="p">[</span><span class="nv">k</span> <span class="nv">tv</span> <span class="nv">v</span><span class="p">]))))))))</span>
</code></pre></div>
<p>Note that we've made explicit choices about how easy to make this for Datomic.
Remember that the index is going to be stored in an order that is first increasing in <code>kKey</code>
and then decreasing in <code>jTv</code>. For every <code>(kKey,jTv)</code>, there will be a block of differing
transaction times. So our insertions are in a raster pattern:</p>
<p><img alt="" src="./images/bitemp-raster.png"></p>
<p>Depending on our planned access patterns, we might want to restructure our index key someday, so that
temporally proximate insertions are spatially proximate as well. For example, if we were storing stock
prices that tended to arrive in increasing TV, we might put that part of the key first. Our decision
will reflect both our access patterns (which we may be able to know in advance) and Datomic's caching
strategy (which is a little less clear and, unfortunately, not directly observable).</p>
<p>For batches of 20 or so keys, on the free, local storage variant of Datomic, insertions stabilize at around
750-1000 microseconds. Random reads (note the <code>assert</code>ion that we're finding the values we looked for)</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">query-lots</span> <span class="p">[</span><span class="nv">conn</span> <span class="nv">nKeys</span> <span class="nv">nTv</span> <span class="nv">tts</span> <span class="nv">n</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">tts</span> <span class="p">(</span><span class="nf">vec</span> <span class="nv">tts</span><span class="p">)</span>
<span class="nv">nTt</span> <span class="p">(</span><span class="nf">count</span> <span class="nv">tts</span><span class="p">)]</span>
<span class="p">(</span><span class="nf">dotimes</span> <span class="p">[</span><span class="nv">_</span> <span class="nv">n</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">i</span> <span class="p">(</span><span class="nf">rand-int</span> <span class="nv">nTt</span><span class="p">)</span>
<span class="nv">tt</span> <span class="p">(</span><span class="k">get </span><span class="nv">tts</span> <span class="nv">i</span><span class="p">)</span>
<span class="nv">j</span> <span class="p">(</span><span class="nf">rand-int</span> <span class="nv">nTv</span><span class="p">)</span>
<span class="nv">tv</span> <span class="p">(</span><span class="nf">jd</span> <span class="p">(</span><span class="nf">*</span> <span class="mi">10</span> <span class="nv">j</span><span class="p">))</span>
<span class="nv">k</span> <span class="p">(</span><span class="nb">str</span> <span class="s">"Thing"</span> <span class="p">(</span><span class="nf">rand-int</span> <span class="nv">nKeys</span><span class="p">))</span>
<span class="nv">v</span> <span class="p">(</span><span class="nf">second</span> <span class="p">(</span><span class="nf">get-at</span> <span class="nv">conn</span> <span class="nv">k</span> <span class="nv">tv</span> <span class="nv">tt</span><span class="p">))</span>
<span class="nv">ve</span> <span class="p">(</span><span class="nb">str</span> <span class="nv">k</span> <span class="s">"v"</span> <span class="nv">j</span> <span class="s">"t"</span> <span class="nv">i</span><span class="p">)</span> <span class="p">]</span>
<span class="p">(</span><span class="k">assert</span> <span class="p">(</span><span class="nf">=</span> <span class="nv">v</span> <span class="nv">ve</span><span class="p">))</span>
<span class="p">))))</span>
</code></pre></div>
<p>stabilize at around 500-750 microseconds, even for data volumes in excess of JVM heap size.</p>
<p>In conclusion, Datomic is a very promising option for storing inhomogeneous bitemporal data, but we
must always think carefully about how our access will actually be implemented, to avoid cache thrashing.
The presentation of the database as a persistent data structure, with scrolling access to previous states,
naturally captures the <code>TT</code> dimension, while <code>TV</code> needs to be implemented schematically; the
asymmetrical relationship between the two times means we can't do it the other way around.</p>
Orders of Magnitude for Free!2014-03-28T00:00:00-04:002014-03-28T00:00:00-04:00Peter Fraenkeltag:blog.podsnap.com,2014-03-28:/entropy.html<p>In honor of <a href="http://bangbangcon.com">!!con</a>, I was trying to think of programming-relevant uses
of double exclamation marks. Other than denoting the end of a particularly funny programming joke
(for example, a mangled homage to a <a href="http://ttic.uchicago.edu/~dreyer/course/papers/wadler.pdf">famous paper</a>),
it seemed the best place to look might be in the world of
<a href="http://en.wikipedia.org/wiki/Double_factorial">double factorials</a>. As it turns out, that wasn't as fruitful
as I'd hoped, but I learned something interesting on the way to finding that out, and here it is.</p>
<h1>TL;DR</h1>
<p>You can learn interesting things about the order of complexity of an algorithm without knowing anything about how it works.</p>
<h1>Informational entropy</h1>
<p>In statistical mechanics, the entropy $S$ of a physical system that can exist in any of multiple states $i$,
with probability $p_i$ is</p>
<p>$$
S = - k_B \sum_i p_i \ln p_i
$$</p>
<p>where $k_B$ is the Boltzmann constant. If you're not a physics person, don't worry about $k_B$, or about
why it turns out that this is the same entropy that relates temperature $T$ to work $Q$:</p>
<p>$$
\delta Q = T \delta S
$$</p>
<p>This thermodynamic relationship tells us that
the amount of work you have to do to move a smidgen of entropy $\delta S$ around is proportional to $\delta S$.</p>
<p><a href="http://en.wikipedia.org/wiki/Claude_Shannon">Claude Shannon</a> defined informational entropy for a variable with multiple
discrete values $i$ that it can assume with probability $p_i$ in almost the same way,</p>
<p>$$ S = - \sum_i p_i \log p_i $$</p>
<p>the difference being a trivial multiplicative constant, since we're not specific about the base of the $\log$, and we've dropped
the Boltzmann constant (which is an artifact of the units we use to measure energy).</p>
<p>Note that if you have a system with exactly one state, its entropy is zero, since the probability of that state is $p=1$ and
$\log 1 = 0$. Also, if you have a system with $N$ equally probable states, its entropy is just $\log N$:</p>
<p>$$
S_N = - \sum_{i=1}^N \frac{1}{N} \log \frac{1}{N} = \frac{N}{N} \log N = \log N
$$</p>
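<p>If you'd rather see those two special cases than take them on faith, here is a minimal Python sketch. The function name <code>entropy</code> is mine, not from any library, and it uses the natural log, which only changes the answer by the multiplicative constant mentioned above.</p>

```python
import math

def entropy(probs):
    """Shannon entropy -sum(p * log p), in nats; zero-probability states contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

single_state = entropy([1.0])      # one certain state
uniform_8 = entropy([1 / 8] * 8)   # eight equally likely states
print(single_state, uniform_8, math.log(8))
```

<p>The single-state system comes out at zero, and the eight-state uniform system matches <code>math.log(8)</code> up to floating-point rounding.</p>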
<h1>Sorting Algorithms</h1>
<p>Suppose my variable is a list of $n$ comparable values, all different, in random order. The number of possible random
arrangements is $N_n = n!$, and therefore the entropy is $S = \log n!$:</p>
<p>$$
\log n! = \log n + \log (n-1) + \cdots
$$</p>
<p>This looks like it will come in close to $n \log n$. More exactly and formally, that result is known
as <a href="http://en.wikipedia.org/wiki/Stirling's_approximation">Stirling's approximation</a>,
which is not horribly difficult to derive, but we'll just take it as given that</p>
<p>$$
S_n = \log n! = n \log n - n + \mathcal{O}(\log n)
$$</p>
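<p>A quick numerical sanity check, as a sketch rather than a derivation: <code>math.lgamma(n + 1)</code> gives $\log n!$ without building the enormous integer, and the helper names are made up for this illustration. The gap between $\log n!$ and $n \log n - n$ should grow only logarithmically; the third printed column is the $\frac{1}{2}\log(2\pi n)$ term from the fuller form of Stirling's formula, which accounts for nearly all of that gap.</p>

```python
import math

def log_factorial(n):
    """log(n!) via the log-gamma function, avoiding huge intermediate integers."""
    return math.lgamma(n + 1)

def stirling_leading(n):
    """The leading terms n*log(n) - n of Stirling's approximation."""
    return n * math.log(n) - n

for n in (10, 100, 1000, 10**6):
    gap = log_factorial(n) - stirling_leading(n)
    print(n, gap, 0.5 * math.log(2 * math.pi * n))
```

<p>The gap is tiny compared with $n \log n$ itself, which is all the argument here needs.</p>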
<p>If we were to sort this list, we would reduce its entropy to $\log 1 = 0$, so we conclude that the amount of work that
must be done to sort a random list is proportional to $(n \log n - n)$, which for large $n$ is really
$n \log n$. Of course you
could expend more work sorting the list if you don't know what you're doing.</p>
<p>Knowing this, it is not surprising that the non-awful sorting algorithms are all $\mathcal{O}(n \log n)$, but I think it
<em>is</em> surprising that we could come to this conclusion without even looking at the algorithms themselves.</p>
<h1>The mystery of heapsort</h1>
<p>You know about <a href="http://en.wikipedia.org/wiki/Heapsort">heapsort</a>. Once our list of numbers is in a binary heap,
then removing the minimum entry is an $\mathcal{O}(\log n)$ operation, and doing that $n$ times takes
$\mathcal{O}(n \log n)$, which makes it a contender.</p>
<p>Something I've often wondered about is the step where we heapify the random list in the first place.
We start out with something that will take $\mathcal{O}(n \log n)$ to sort by any number of methods, then do some
non-trivial work, and end up with a heap, from which it will still take $\mathcal{O}(n \log n)$ to obtain the
sorted list. What's the point?</p>
<p>It's interesting to think about how the act of heapifying affects the entropy of the system, i.e. what is the
entropy of a heap compared to that of a totally shuffled list. I am indebted to David Blackston, who suggested
an interesting approach to calculating this entropy. Unfortunately, he's a luddite, so I can't link to him. Here
is my somewhat physicsy (i.e. despicably over-simplified) variation on his calculation:</p>
<p>First, we convince ourselves that if we can write $n=2k+1$, the total number of possible heaps for a randomized list
satisfies the recurrence relation (Dave likes recurrence relations):</p>
<p>$$
H_{2k+1} = \binom{2k}{k} H_k^2
$$</p>
<p>The way to think about this is</p>
<ol>
<li>We know the smallest number is at the top of the heap, so that's fixed.</li>
<li>The remaining $2k$ numbers can be split into a left half and a right half of $k$ each in $\binom{2k}{k}$ ("2k choose k") different ways.</li>
<li>Each of those halves can be arranged into $H_k$ possible heaps, giving the factor $H_k^2$.</li>
</ol>
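<p>The recurrence is easy to check by brute force for tiny heaps. The sketch below is my own illustration (all function names are invented for it): it counts the permutations that satisfy the array-based min-heap property and compares that with the recurrence. It assumes $n = 2^m - 1$, so that both subtrees really do hold exactly $k$ values at every level.</p>

```python
from itertools import permutations
from math import comb

def is_min_heap(a):
    """True if a, read as an implicit binary tree, satisfies the min-heap property."""
    return all(a[(i - 1) // 2] <= a[i] for i in range(1, len(a)))

def count_heaps_brute_force(n):
    """Count the arrangements of n distinct values that form a valid min-heap."""
    return sum(is_min_heap(p) for p in permutations(range(n)))

def count_heaps_recurrence(n):
    """H_{2k+1} = C(2k, k) * H_k^2, valid when n = 2^m - 1."""
    if n <= 1:
        return 1
    k = (n - 1) // 2
    return comb(2 * k, k) * count_heaps_recurrence(k) ** 2

for n in (1, 3, 7):
    print(n, count_heaps_brute_force(n), count_heaps_recurrence(n))
```

<p>For $n=7$ both columns give 80 heaps out of $7! = 5040$ arrangements. Brute force gets out of hand quickly beyond that, which is rather the point of having a recurrence.</p>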
<p>Now let's add some apparently unnecessary complications to what we already know about the number of
possible random lists of length $n=2k+1$, multiplying and dividing by the same thing:</p>
<p>$$
\begin{align*}
N_{2k+1} &= (2k+1)!\\
&= (2k+1) (2k)!\\
&= (2k+1) \frac{(2k)!}{(k!)^2}(k!)^2\\
&= (2k+1) \binom{2k}{k} N_k^2
\end{align*}
$$</p>
<p>where in the last step we use $\binom{n}{m} = \frac{n!}{(n-m)!m!}$. Define</p>
<p>$$
\begin{align*}
\alpha_n &\equiv \frac{N_n}{H_n} \\
\alpha_{2k+1} &= \frac{N_{2k+1}}{H_{2k+1}} \\
&= \frac{(2k+1) \binom{2k}{k} N_k^2}{\binom{2k}{k} H_k^2}
\end{align*}
$$</p>
<p>so</p>
<p>$$
\begin{align*}
\alpha_{2k+1} &= (2k + 1) \alpha_k^2\\
\log \alpha_{2k+1} &=
\log(2k + 1) + 2 \log \alpha_k
\end{align*}
$$</p>
<p>Since $2k+1$ is likely to be large, it's close enough to $2k$ for government work, and we'll
also drop the $\log(2k+1)$ term since it looks like $\log \alpha_k$ will be much larger.</p>
<p>$$
\log \alpha_{2k} \simeq 2 \log \alpha_k
$$</p>
<p>So it must be, therefore, that $\log\alpha$ is linear in its subscript, for some constant $a$:</p>
<p>$$
\log \alpha_n \equiv \log N_n - \log H_n \simeq a \cdot n
$$</p>
<p>That is, heapify has removed entropy proportional to $n$ from the fully randomized list. So now we know what the
point was. It did get us at least partially on the way down from the starting entropy. Furthermore, we also know
that heapifying really oughtn't cost more than $\mathcal{O}(n)$.</p>
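<p>Since I dropped a logarithm rather cavalierly up there, here is a small sketch that iterates the exact recurrence $\log \alpha_{2k+1} = \log(2k+1) + 2 \log \alpha_k$ from $\alpha_1 = 1$ and prints $\log \alpha_n / n$ for $n = 2^m - 1$. The function name <code>log_alpha</code> is invented for this illustration. If the linear claim holds, the ratio should settle down to a constant instead of growing like $\log n$.</p>

```python
import math

def log_alpha(m):
    """Return (n, log alpha_n) for n = 2^m - 1, iterating the exact recurrence."""
    value = 0.0  # log alpha_1 = log 1
    n = 1
    for _ in range(m - 1):
        n = 2 * n + 1
        value = math.log(n) + 2 * value
    return n, value

for m in (2, 4, 8, 12, 16, 20):
    n, la = log_alpha(m)
    print(n, la / n)
```

<p>The ratio creeps upward for small $n$ and then flattens out below 1, so the neglected $\log$ term shifts the constant but not the linear scaling.</p>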
<p>That last sentence might seem counter-intuitive, since adding an element to a heap takes $\mathcal{O}(\log n)$, so it
would seem that building up a heap of n elements should take $\mathcal{O}(n \log n)$. Actually, it
doesn't. You can <a href="http://www.cs.umd.edu/~meesh/351/mount/lectures/lect14-heapsort-analysis-part.pdf">analyze heapify carefully</a>,
but the gist is that, since the vast majority of elements in a heap end up living in the bottom few rows, you rarely
climb the full $\log n$ when inserting.</p>
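<p>You can also just count. The sketch below is a textbook bottom-up heapify that sifts each internal node down (not necessarily the variant the linked notes analyze, and the function names are mine), instrumented to return the number of swaps. The swap count should stay below $n$ no matter how large the list gets.</p>

```python
import random

def sift_down(a, i, n):
    """Push a[i] down until the min-heap property holds below it; return the swap count."""
    swaps = 0
    while True:
        smallest = i
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and a[child] < a[smallest]:
                smallest = child
        if smallest == i:
            return swaps
        a[i], a[smallest] = a[smallest], a[i]
        i = smallest
        swaps += 1

def heapify(a):
    """Bottom-up heap construction in place; return the total number of swaps."""
    n = len(a)
    return sum(sift_down(a, i, n) for i in range(n // 2 - 1, -1, -1))

random.seed(0)
for n in (10, 1000, 100000):
    data = random.sample(range(n), n)
    print(n, heapify(data))
```

<p>The bound follows because a node can sink at most as far as its height, and the heights of all nodes in a complete binary tree add up to less than $n$.</p>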
<p>Once again, without knowing anything about the algorithm, we've been able to pronounce authoritatively on
its order of complexity. I think this is cool.</p>
<h1>!!</h1>
<p>The double factorial is defined for odd numbers $n$ as</p>
<p>$$
n!! = n\cdot(n-2)\cdot(n-4)\cdots 3\cdot 1
$$</p>
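<p>In code, that is just a product over every second integer from $n$ down to 1. A minimal sketch (the function name is mine, and it deliberately refuses even or non-positive arguments, since only the odd case matters here):</p>

```python
def double_factorial(n):
    """n!! for odd positive n: the product n * (n-2) * (n-4) * ... * 3 * 1."""
    if n < 1 or n % 2 == 0:
        raise ValueError("this sketch only handles odd positive n")
    result = 1
    for factor in range(n, 0, -2):
        result *= factor
    return result

print([double_factorial(n) for n in (1, 3, 5, 7, 9)])  # [1, 3, 15, 105, 945]
```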
<p>As it happens, the number of possible <em>heap ordered trees</em> with $k+1$ nodes is $(2k-1)!!$. When I first learned this,
I didn't immediately realize that such trees are not just binary heaps: any node can have any number of children. So, while
good times were had calculating the entropy of an object known only to be a heap ordered tree (aka "increasing
ordered tree"), the experience didn't prove helpful in understanding heapsort.</p>Scala for Beginners - The secret information "they" don't want you to know.2014-02-03T00:00:00-05:002014-02-03T00:00:00-05:00Peter Fraenkeltag:blog.podsnap.com,2014-02-03:/scala-conspiracy.html<p><a href="http://ljungblad.nu/">Marcus Ljungblad</a> is going to give an amazing
<a href="http://scaladays.org/">Scaladays</a>
talk about the process of learning Scala, and he put out a
call to his fellow <a href="http://hackerschool.com">hackerschool</a>ers for
suggestions. I was glad to chip in, because I enjoy the opportunity
to blurt out opinions without the obligation to structure them
eloquently, and talking in front of people is scary, so it's awesome
to have somebody else do it. Still, I kind of like my suggestions,
and with Marcus' permission, I'm going to offer them here. This isn't
a getting started guide, and it certainly isn't a tutorial. I'm assuming
you have those, but that they do not sufficiently stress the following:</p>
<ol>
<li>
<p>As early as possible, learn how to use <a href="http://www.scala-sbt.org/">sbt</a>. Any OSS project you clone will have
a <code>build.sbt</code>, and anything you share yourself will be expected to have one too.</p>
</li>
<li>
<p>From a <code>build.sbt</code>, it is easy to create configuration files
for Eclipse (using
<a href="https://github.com/typesafehub/sbteclipse">sbteclipse</a>), IntelliJ
(the latest version allows you to import a <code>build.sbt</code>
directly), or
emacs/<a href="https://github.com/aemoncannon/ensime">ensime</a> (which you shouldn't use
unless you're a nut <sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup>). It is not
easy to go the other way. Whenever you change dependencies or versions,
make the changes in your <code>build.sbt</code> <strong>first</strong>, regenerate the IDE configs from there, and
bounce the IDE.</p>
<p>Otherwise, it will be hell to get back to a state that can be shared with the world.</p>
</li>
<li>
<p>Learn what <a href="http://en.wikipedia.org/wiki/Persistent_data_structure">persistent data structures</a> are, and
understand that Scala's <code>immutable</code> collections are implementations of them. Otherwise, you'll find
yourself wondering - as I did, nearly causing me to ditch the entire language in disgust - why a person would ever
want an immutable hash map. Clojure partisans justifiably make a big deal of their persistent data structures and how they
allow you to code efficiently without mutable state. Scala largely uses the same algorithms but for some reason doesn't
trumpet the fact. If you keep it in mind, you'll be far less hesitant to dive into truly functional code.</p>
</li>
<li>
<p>You'll learn about the REPL soon enough. What you probably won't learn soon
is that</p>
<div class="highlight"><pre><span></span><code><span class="n">scala</span> <span class="o">-</span><span class="n">Xprint</span><span class="o">:</span><span class="n">parser</span> <span class="o">-</span><span class="n">e</span> <span class="s">"for (i <- 1 to 5; j <- 1 to 4) yield i*j // or whatever"</span>
</code></pre></div>
<p>will show you how Scala desugars your code, revealing what's really going on in some
sophisticated code snippet you're trying to grok. In
a similar vein, <code>scala -Xshow-phases</code> lists all the other phases you can <code>-Xprint</code>.</p>
</li>
<li>
<p>The preceding example, in particular, demonstrates something else to get under your belt as
soon as possible. <code>for</code> comprehensions are not just funny <code>for</code> loops. They are
handy sugar for <code>map</code> and <code>flatMap</code>, and they get used for much more than just looping.
Delaying this realization will make the language seem easier, but in the long run it'll cost
you. You need to know about this <code>map</code>ing business, because it shows up everywhere, for
example in the <code>Option</code> that gets returned when you do something innocuous like a hash lookup.</p>
</li>
<li>
<p>Which brings us to pattern matching, destructuring and case classes.
Things like</p>
<div class="highlight"><pre><span></span><code>whatever match {
case Some(Foo(x)) => doSomethingWith(x)
case None => doSomethingElse()
}
</code></pre></div>
<p>will show up over and over again, and it's tempting to think of them as a cool feature that maybe you'll use
someday. Actually, you'll use them immediately, and again and again.</p>
</li>
<li>
<p>Stay away from DSLs as long as possible, which basically means that if you come across funny punctuation
symbols used in a weird way, you should wait until you've implemented something like that yourself.
<a href="http://akka.io">Akka</a> is amazing, but using it too early makes the language seem magical and suppresses
your deeper understanding of it.</p>
</li>
<li>
<p>Bookmark this <a href="http://docs.scala-lang.org/tutorials/FAQ/finding-symbols.html">FAQ</a> on the "standard" weird
symbols, and count on referring to the section on underscores relatively often.</p>
</li>
</ol>
<p>Bonne chance.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p>Like me. <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
</ol>
</div>FUNCTIONAL functional reactive programming, state monads and all that, in Clojure2014-01-24T16:00:00-05:002014-01-24T16:00:00-05:00Peter Fraenkeltag:blog.podsnap.com,2014-01-24:/reactive-clj.html<p>In <a href="http://blog.podsnap.com/reactive.html">my last post</a>, I talked a bit about how FRP might look if
state were maintained explicitly in persistent data structures, rather than in hidden mutable structures.
The accompanying code was in Scala, but my first implementation was actually in Clojure. I was originally
going to use the Clojure code in the post, but, having taken motivating example code from a Scala
<a href="http://lampwww.epfl.ch/~imaier/pub/DeprecatingObserversTR2010.pdf">paper</a>, it felt lazy to switch to
Clojure just because I felt like it.</p>
<p>That said, it really was more fun to write in Clojure, and in some ways I think it is clearer. Additionally,
it seems more interesting to talk about state monads in Clojure, and state monads are bound to come up
when you compose stateful operations.</p>
<p>The code is on <a href="https://github.com/pnf/clojure-playground/blob/master/src/clj/playground/reactive.clj">github</a>
with illustrative snippets below. As in the Scala post, I'll glide over a few details - some (but not all) of which are dealt
with properly in the committed code.</p>
<p>This will all be easier to follow if you've read the previous post at least through the "FRP in Finance" section.</p>
<h2>Stop!!!</h2>
<p>Have you read <a href="http://blog.podsnap.com/reactive.html">the previous post</a> up through the "FRP in Finance" section?
The Scala content is minimal, I swear.</p>
<h2>Go</h2>
<p>Then, starting from here, we'll:</p>
<ol>
<li>Set out the basic framework.</li>
<li>Pretty it up with macros.</li>
<li>Show how the state monad might fit in but then decide not to use it.</li>
<li>Illustrate the advantages of a functional approach with an example from finance.</li>
</ol>
<h2>Basic structure</h2>
<p>The structure of the directed acyclic graph (<strong>the DAG</strong>) will be represented in a nested map that looks like this:</p>
<div class="highlight"><pre><span></span><code> <span class="p">{</span><span class="ss">:leafnode</span> <span class="p">{</span><span class="ss">:deps</span> <span class="o">#</span><span class="p">{</span><span class="ss">:nodethatdepends</span> <span class="ss">:anotherone</span><span class="p">}</span>
<span class="ss">:value</span> <span class="nv">VALUE</span> <span class="p">}</span>
<span class="ss">:nodethatdepends</span> <span class="p">{</span><span class="ss">:function</span> <span class="o">#</span><span class="nv"><somefunctionthing></span>
<span class="ss">:args</span> <span class="o">#</span><span class="p">{</span><span class="ss">:leafnode</span><span class="p">}</span>
<span class="ss">:dirty</span> <span class="nv">TRUEORFALSE</span><span class="p">}}</span>
</code></pre></div>
<p>We will write functions that:</p>
<ol>
<li>Build up the DAG, keeping track of what depends on what and how.</li>
<li>Propagate the <code>:dirty</code> flag through dependents whenever we change a leaf value.</li>
<li>Evaluate the functions at the dirty nodes when we retrieve a value.</li>
</ol>
<p>The first thing we might like to do is set a leaf value:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">set-val</span> <span class="p">[</span><span class="nv">dag</span> <span class="nv">k</span> <span class="nv">v</span><span class="p">]</span>
<span class="p">(</span><span class="nb">-> </span><span class="nv">dag</span>
<span class="p">(</span><span class="nf">assoc-in</span> <span class="p">[</span><span class="nv">k</span> <span class="ss">:value</span><span class="p">]</span> <span class="nv">v</span><span class="p">)</span>
<span class="p">(</span><span class="nf">sully</span> <span class="nv">k</span><span class="p">)))</span>
</code></pre></div>
<p>The job of <code>sully</code> is to propagate <code>:dirty</code>ness. We'll get to that in a moment. The main thing
is that it returns the DAG, just like all the standard <code>clojure.core</code> "change" functions, so we can use
the <code>-></code> macro to thread the DAG through a chain of such operations.</p>
<p>Setting a function is almost as easy (though, for the moment, the
functions themselves won't be very attractive).</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nb">set</span><span class="nv">-fn</span> <span class="p">[</span><span class="nv">dag</span> <span class="nv">k</span> <span class="nv">kargs</span> <span class="nv">dag->val</span><span class="p">]</span>
<span class="p">(</span><span class="k">-> </span><span class="nv">dag</span>
<span class="p">(</span><span class="nf">assoc-in</span> <span class="p">[</span><span class="nv">k</span> <span class="ss">:args</span><span class="p">]</span> <span class="nv">kargs</span><span class="p">)</span> <span class="c1">;[1]</span>
<span class="p">(</span><span class="nf">sully</span> <span class="nv">k</span><span class="p">)</span> <span class="c1">;[2]</span>
<span class="p">(</span><span class="nf">assoc-in</span> <span class="p">[</span><span class="nv">k</span> <span class="ss">:function</span><span class="p">]</span> <span class="nv">dag->val</span><span class="p">)</span> <span class="c1">;[3]</span>
<span class="p">(</span><span class="nb">set</span><span class="nv">-deps</span> <span class="nv">k</span> <span class="nv">kargs</span><span class="p">)))</span> <span class="c1">;[4]</span>
</code></pre></div>
<p>We store the arguments <strong>[1]</strong>, sully <strong>[2]</strong>
as necessary (but, again, forget about how that might work for a moment)
and store the function <strong>[3]</strong>. The purpose of calling
<code>set-deps</code> <strong>[4]</strong> is to add our own key <code>k</code> to
the <code>:deps</code> set of every node in <code>kargs</code>, i.e. every node we depend on.</p>
<p>In <code>set-deps</code>, we iterate over the argument list, poking our id into every node on
which we depend. Of course we don't actually change any node: at each iteration we just create a new map containing
a new node containing a new set of dependents that now includes us. That's a lot of new maps, so
we're putting a lot of faith in the efficiency of the persistent data structure and the cleverness
of the JVM. (I did warn you that no attention would be paid to performance.)</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nb">set</span><span class="nv">-deps</span> <span class="p">[</span><span class="nv">dag</span> <span class="nv">kf</span> <span class="nv">kargs</span><span class="p">]</span>
<span class="p">(</span><span class="nf">reduce</span> <span class="p">(</span><span class="kd">fn </span><span class="p">[</span><span class="nv">dag</span> <span class="nv">k</span><span class="p">]</span> <span class="c1">;[1]</span>
<span class="p">(</span><span class="nf">update-in</span> <span class="nv">dag</span> <span class="p">[</span><span class="nv">k</span> <span class="ss">:deps</span><span class="p">]</span> <span class="o">#</span><span class="p">(</span><span class="k">if</span> <span class="nv">%</span> <span class="p">(</span><span class="nf">conj</span> <span class="nv">%</span> <span class="nv">kf</span><span class="p">)</span> <span class="o">#</span><span class="p">{</span><span class="nv">kf</span><span class="p">})))</span> <span class="c1">;[2]</span>
<span class="nv">dag</span> <span class="nv">kargs</span><span class="p">))</span> <span class="c1">;[3]</span>
</code></pre></div>
<p>This is compactly
accomplished by reducing <strong>[1]</strong> over the <code>kargs</code>, starting with
the original <code>dag</code> <strong>[3]</strong>
and repeatedly using <code>update-in</code> to <code>conj</code> <strong>[2]</strong> in the new dependencies.</p>
<p>With these two functions, we can create a primitive DAG:</p>
<div class="highlight"><pre><span></span><code><span class="n">playground</span><span class="p">.</span><span class="n">reactive</span><span class="o">></span> <span class="p">(</span><span class="o">-></span> <span class="p">{}</span> <span class="p">(</span><span class="n">set</span><span class="o">-</span><span class="nl">val</span> <span class="p">:</span><span class="n">a</span> <span class="mf">3.0</span><span class="p">)</span>
<span class="p">(</span><span class="n">set</span><span class="o">-</span><span class="nl">fn</span> <span class="p">:</span><span class="n">c</span> <span class="p">[</span><span class="o">:</span><span class="nl">a</span> <span class="p">:</span><span class="n">b</span><span class="p">]</span> <span class="p">(</span><span class="n">fn</span> <span class="p">[</span><span class="n">dag</span><span class="p">]</span> <span class="p">(</span><span class="o">+</span> <span class="p">(</span><span class="nl">dag</span> <span class="p">:</span><span class="n">a</span><span class="p">)</span> <span class="p">(</span><span class="nl">dag</span> <span class="p">:</span><span class="n">b</span><span class="p">)))))</span>
<span class="p">{</span><span class="o">:</span><span class="n">c</span> <span class="p">{</span><span class="o">:</span><span class="n">function</span> <span class="err">#</span><span class="o"><</span><span class="n">reactive$eval10321$fn__10322</span> <span class="n">playground</span><span class="p">.</span><span class="n">reactive$eval10321$fn__10322</span><span class="mf">@5f</span><span class="mi">3</span><span class="n">acfa8</span><span class="o">></span><span class="p">,</span> <span class="o">:</span><span class="n">dirty</span> <span class="nb">true</span><span class="p">,</span> <span class="o">:</span><span class="n">args</span> <span class="p">[</span><span class="o">:</span><span class="nl">a</span> <span class="p">:</span><span class="n">b</span><span class="p">]},</span>
<span class="o">:</span><span class="n">a</span> <span class="p">{</span><span class="o">:</span><span class="n">deps</span> <span class="err">#</span><span class="p">{</span><span class="o">:</span><span class="n">c</span><span class="p">},</span> <span class="o">:</span><span class="n">value</span> <span class="mf">3.0</span><span class="p">},</span>
<span class="o">:</span><span class="n">b</span> <span class="p">{</span><span class="o">:</span><span class="n">deps</span> <span class="err">#</span><span class="p">{</span><span class="o">:</span><span class="n">c</span><span class="p">}}}</span>
</code></pre></div>
<p>Both <code>a</code> and <code>b</code> "know" that <code>c</code> is a dependent, so if either changes, we will
know that the latter must too.</p>
<p>Of course this won't do anything yet. There are two additional important functions to add.</p>
<p>First, the aforementioned <code>sully</code>.
This function will follow the trail of <code>:deps</code>, marking every node it finds as <code>:dirty</code>, i.e.
requiring calculation. This technique is usually called "dirty bit propagation," so my nomenclature
isn't inexcusably fanciful.</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn- </span><span class="nv">sully</span> <span class="p">[</span><span class="nv">dag</span> <span class="nv">k</span><span class="p">]</span>
<span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nf">get-in</span> <span class="nv">dag</span> <span class="p">[</span><span class="nv">k</span> <span class="ss">:dirty</span><span class="p">])</span> <span class="nv">dag</span> <span class="c1">;[1]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">isfn</span> <span class="p">(</span><span class="nf">get-in</span> <span class="nv">dag</span> <span class="p">[</span><span class="nv">k</span> <span class="ss">:function</span><span class="p">])</span>
<span class="nv">dag</span> <span class="p">(</span><span class="nb">if-not </span><span class="nv">isfn</span> <span class="nv">dag</span> <span class="p">(</span><span class="nf">assoc-in</span> <span class="nv">dag</span> <span class="p">[</span><span class="nv">k</span> <span class="ss">:dirty</span><span class="p">]</span> <span class="nv">true</span><span class="p">))</span> <span class="c1">;[2]</span>
<span class="nv">deps</span> <span class="p">(</span><span class="nf">get-in</span> <span class="nv">dag</span> <span class="p">[</span><span class="nv">k</span> <span class="ss">:deps</span><span class="p">])]</span>
<span class="p">(</span><span class="nb">reduce </span><span class="nv">sully</span> <span class="nv">dag</span> <span class="nv">deps</span><span class="p">))))</span> <span class="c1">;[3]</span>
</code></pre></div>
<p>This is a classic recursive depth-first-search, except that we're using <code>reduce</code> <strong>[3]</strong> to
iterate over children, and the recursion occurs in the reduction function.
The search gets truncated <strong>[1]</strong> if the node is already dirty; otherwise,
we dirty the function nodes <strong>[2]</strong>. Our reliance on an efficient persistent
map is even greater here than it was for <code>set-deps</code>.</p>
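<p>To watch the propagation on its own, here is a small sketch that runs <code>sully</code> over a hand-built
chain <code>a -&gt; b -&gt; c</code>. The <code>chain</code> map and its placeholder <code>identity</code> functions are
assumptions for illustration only; the function is made public so it can be called directly:</p>
<div class="highlight"><pre><span></span><code>;; sully as in the listing above, minus the privacy.
(defn sully [dag k]
  (if (get-in dag [k :dirty]) dag
      (let [isfn (get-in dag [k :function])
            dag  (if-not isfn dag (assoc-in dag [k :dirty] true))
            deps (get-in dag [k :deps])]
        (reduce sully dag deps))))

;; Hand-built chain a -&gt; b -&gt; c; the :function entries are placeholders.
(def chain {:a {:value 1 :deps #{:b}}
            :b {:function identity :dirty false :deps #{:c}}
            :c {:function identity :dirty false}})

(def dirtied (sully chain :a))

[(get-in dirtied [:a :dirty])   ; value nodes are never flagged
 (get-in dirtied [:b :dirty])
 (get-in dirtied [:c :dirty])]
;; =&gt; [nil true true]

;; Starting from an already dirty node, the search stops immediately.
(= dirtied (sully dirtied :b))
;; =&gt; true
</code></pre></div>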
<p>Second, we need something to actually evaluate our functions. This is
also a depth first search, but it searches back along the <code>:args</code> trail,
rather than forward along <code>:deps</code>:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="n">defn</span><span class="o">-</span> <span class="n">ensure</span><span class="o">-</span><span class="n">val</span>
<span class="p">[</span><span class="n">dag</span> <span class="n">k</span><span class="p">]</span>
<span class="p">(</span><span class="n">let</span> <span class="p">[</span><span class="n">node</span> <span class="p">(</span><span class="n">get</span> <span class="n">dag</span> <span class="n">k</span><span class="p">)</span>
<span class="n">dag</span><span class="o">-></span><span class="n">val</span> <span class="p">(</span><span class="n">get</span> <span class="nl">node</span> <span class="p">:</span><span class="n">function</span><span class="p">)</span>
<span class="n">dirty</span> <span class="p">(</span><span class="n">get</span> <span class="nl">node</span> <span class="p">:</span><span class="n">dirty</span><span class="p">)]</span>
<span class="p">(</span><span class="k">if</span><span class="o">-</span><span class="n">not</span> <span class="p">(</span><span class="n">and</span> <span class="n">dag</span><span class="o">-></span><span class="n">val</span> <span class="n">dirty</span><span class="p">)</span> <span class="n">dag</span> <span class="p">;[</span><span class="mi">1</span><span class="p">]</span>
<span class="p">(</span><span class="o">-></span> <span class="n">dag</span> <span class="p">;[</span><span class="mi">2</span><span class="p">]</span>
<span class="p">(</span><span class="n">as</span><span class="o">-></span> <span class="o">%</span> <span class="p">(</span><span class="n">reduce</span> <span class="n">ensure</span><span class="o">-</span><span class="n">val</span> <span class="o">%</span> <span class="p">(</span><span class="nl">node</span> <span class="p">:</span><span class="n">args</span><span class="p">)))</span> <span class="p">;[</span><span class="mi">3</span><span class="p">]</span>
<span class="p">(</span><span class="n">as</span><span class="o">-></span> <span class="o">%</span> <span class="p">(</span><span class="n">assoc</span><span class="o">-</span><span class="k">in</span> <span class="o">%</span> <span class="p">[</span><span class="nl">k</span> <span class="p">:</span><span class="n">value</span><span class="p">]</span> <span class="p">(</span><span class="n">dag</span><span class="o">-></span><span class="n">val</span> <span class="o">%</span><span class="p">)))</span> <span class="p">;[</span><span class="mi">4</span><span class="p">]</span>
<span class="p">(</span><span class="n">assoc</span><span class="o">-</span><span class="k">in</span> <span class="p">[</span><span class="nl">k</span> <span class="p">:</span><span class="n">dirty</span><span class="p">]</span> <span class="nb">false</span><span class="p">)))))</span> <span class="p">;[</span><span class="mi">5</span><span class="p">]</span>
</code></pre></div>
<p>If this is not a function node or it's not dirty, there's nothing to do here, so
the search stops <strong>[1]</strong>.
Otherwise, we'll thread <strong>[2]</strong> <code>dag</code> through some accretions,
first <strong>[3]</strong> ensuring recursively that any arguments are available,
then <strong>[4]</strong> evaluating the function and setting its return value in the node,
finally <strong>[5]</strong> clearing the dirty flag. When all is said and done, we'll
have a DAG from which the value for <code>k</code> may be safely plucked.</p>
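<p>A quick sketch of the backward search on a two-level graph, <code>c = a + b</code> and <code>d = 2c</code>.
The graph <code>dag0</code> is built by hand purely for illustration, and the guard is written against
<code>dag-&gt;val</code>, the local that actually holds the node's function:</p>
<div class="highlight"><pre><span></span><code>(defn ensure-val [dag k]
  (let [node     (get dag k)
        dag-&gt;val (get node :function)
        dirty    (get node :dirty)]
    (if-not (and dag-&gt;val dirty) dag
      (-&gt; dag
          (as-&gt; % (reduce ensure-val % (node :args)))       ; arguments first
          (as-&gt; % (assoc-in % [k :value] (dag-&gt;val %)))      ; then this node
          (assoc-in [k :dirty] false)))))

;; Hypothetical hand-built graph: both function nodes start out dirty.
(def dag0
  {:a {:value 1 :deps #{:c}}
   :b {:value 2 :deps #{:c}}
   :c {:function (fn [dag] (+ (get-in dag [:a :value]) (get-in dag [:b :value])))
       :args [:a :b] :dirty true :deps #{:d}}
   :d {:function (fn [dag] (* 2 (get-in dag [:c :value])))
       :args [:c] :dirty true}})

(let [dag1 (ensure-val dag0 :d)]
  [(get-in dag1 [:c :value]) (get-in dag1 [:d :value]) (get-in dag1 [:d :dirty])])
;; =&gt; [3 6 false]
</code></pre></div>
<p>Asking for <code>:d</code> drags <code>:c</code> along with it, which is the whole point of chasing <code>:args</code>.</p>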
<p>Now we need a way to get information out of the graph.
Since the heavy lifting is done by <code>ensure-val</code>, there's not a lot to this function,
other than remembering to return the new DAG along with the value:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">get-val</span>
<span class="p">[</span><span class="nv">dag</span> <span class="nv">k</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">dag</span> <span class="p">(</span><span class="nf">ensure-val</span> <span class="nv">dag</span> <span class="nv">k</span><span class="p">)]</span>
<span class="p">[</span><span class="nv">dag</span> <span class="p">(</span><span class="nf">get-in</span> <span class="nv">dag</span> <span class="p">[</span><span class="nv">k</span> <span class="ss">:value</span><span class="p">])]))</span>
</code></pre></div>
<p>Let's play.</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">pridentity</span> <span class="p">[</span><span class="nv">x</span><span class="p">]</span> <span class="p">(</span><span class="nf">println</span> <span class="nv">x</span><span class="p">)</span> <span class="nv">x</span><span class="p">)</span> <span class="c1">; useful</span>
<span class="p">(</span><span class="k">-> </span><span class="p">{}</span>
<span class="p">(</span><span class="nb">set</span><span class="nv">-val</span> <span class="ss">:a</span> <span class="mi">1</span><span class="p">)</span>
<span class="p">(</span><span class="nb">set</span><span class="nv">-val</span> <span class="ss">:b</span> <span class="mi">2</span><span class="p">)</span>
<span class="nv">pridentity</span>
<span class="p">(</span><span class="nb">set</span><span class="nv">-fn</span> <span class="ss">:c</span> <span class="p">[</span><span class="ss">:a</span> <span class="ss">:b</span><span class="p">]</span> <span class="p">(</span><span class="kd">fn </span><span class="p">[</span><span class="nv">dag</span><span class="p">]</span> <span class="p">(</span><span class="nf">+</span> <span class="p">(</span><span class="nf">get-in</span> <span class="nv">dag</span> <span class="p">[</span><span class="ss">:a</span> <span class="ss">:value</span><span class="p">])</span> <span class="p">(</span><span class="nf">get-in</span> <span class="nv">dag</span> <span class="p">[</span><span class="ss">:b</span> <span class="ss">:value</span><span class="p">]))))</span>
<span class="nv">pridentity</span>
<span class="p">(</span><span class="nf">ensure-val</span> <span class="ss">:c</span><span class="p">))</span>
</code></pre></div>
<p>This will create our simple <code>c=a+b</code> graph, printing it out at various stages of construction,</p>
<div class="highlight"><pre><span></span><code>{:b {:value 2}, :a {:value 1}}
{:c {:dirty true, :function #<reactive$eval5874>, :args [:a :b]}, :b {:deps #{:c}, :value 2}, :a {:deps #{:c}, :value 1}}
{:c {:value 3, :dirty false, :function #<reactive$eval5874>, :args [:a :b]}, :b {:deps #{:c}, :value 2}, :a {:deps #{:c}, :value 1}}
</code></pre></div>
<p>first with only the value nodes set, then with the unevaluated/dirty function, and finally after evaluation.
(I've truncated the anonymous function name for clarity.)</p>
<h2>Pretty it up with macros</h2>
<p>I really don't like the <code>set-fn</code> line above; its complexity disguises what was really a very simple operation.
Wouldn't it be nice to write <code>(set-fn dag c [a b] (+ a b))</code> instead? Macros can help.</p>
<p>Let's rename the original version to <code>set-fn*</code> and build a macro around it:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defmacro </span><span class="nv">set-fn</span>
<span class="p">[</span><span class="nv">dagh</span> <span class="nv">k</span> <span class="nv">args</span> <span class="o">&</span> <span class="nv">forms</span><span class="p">]</span> <span class="c1">;[1]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">kargs</span> <span class="p">(</span><span class="nb">map </span><span class="o">#</span><span class="p">(</span><span class="nb">keyword </span><span class="nv">%</span><span class="p">)</span> <span class="nv">args</span><span class="p">)</span> <span class="c1">;[2]</span>
<span class="nv">vs</span> <span class="p">(</span><span class="nb">map </span><span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">k</span><span class="p">]</span> <span class="o">`</span><span class="p">(</span><span class="nf">get-in</span> <span class="o">~</span><span class="ss">'dag</span> <span class="p">[</span><span class="o">~</span><span class="nv">k</span> <span class="ss">:value</span><span class="p">]))</span> <span class="nv">kargs</span><span class="p">)</span> <span class="c1">;[3]</span>
<span class="nv">bindings</span> <span class="p">(</span><span class="nb">interleave </span><span class="nv">args</span> <span class="nv">vs</span><span class="p">)]</span> <span class="c1">;[4]</span>
<span class="o">`</span><span class="p">(</span><span class="nf">set-fn*</span> <span class="o">~</span><span class="nv">dagh</span> <span class="o">~</span><span class="p">(</span><span class="nb">keyword </span><span class="nv">k</span><span class="p">)</span> <span class="p">[</span><span class="o">~@</span><span class="nv">kargs</span><span class="p">]</span> <span class="c1">;[5]</span>
<span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="o">~</span><span class="ss">'dag</span><span class="p">]</span> <span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="o">~@</span><span class="nv">bindings</span><span class="p">]</span> <span class="o">~@</span><span class="nv">forms</span><span class="p">)))))</span> <span class="c1">;[6]</span>
</code></pre></div>
<p>We will use the Scottish sounding <code>dagh</code> <strong>[1]</strong> as the macro argument, to distinguish
it from <code>dag</code>, the argument of the function we create.</p>
<p>First <strong>[2]</strong> we convert an argument vector like <code>[a b]</code> into a list of keywords <code>'(:a :b)</code>.</p>
<p>Then <strong>[3]</strong> for each keyword in that list, we generate the <code>get-in</code> call that
will retrieve the computed value from the nested map.
Clojure macro syntax may look a bit like <a href="http://www.perl.com/pub/2000/01/10PerlMyths.html#Perl_looks_like_line_noise">line noise</a>,
but what's going on here isn't too complicated. The leading backtick in <code>`(get-in ~'dag [~k :value])</code>
quotes the entire s-expression, so <code>get-in</code> and <code>:value</code> get treated as if they appeared in
regular Clojure. The punctuation in <code>~'dag</code> prevents <code>dag</code> from getting name-space qualified, which
is important, since it's going to be used as a function argument. The tilde in <code>~k</code> just expands it to the corresponding
keyword element of <code>kargs</code>.</p>
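<p>The namespace business is easiest to see at the REPL. A minimal sketch, with the names <code>qualified</code>
and <code>bare</code> invented for the occasion:</p>
<div class="highlight"><pre><span></span><code>;; Plain syntax-quote resolves every symbol to a namespace...
(def qualified `(fn [dag] dag))
;; ...whereas ~' unquotes a quoted symbol, leaving it bare.
(def bare `(fn [~'dag] ~'dag))

;; Pull the parameter symbol out of each form and inspect it.
(namespace (first (second qualified))) ; the namespace where this was read, e.g. "user"
(namespace (first (second bare)))      ; =&gt; nil

;; Only the bare version compiles; a qualified symbol can't be a parameter name.
((eval bare) 42)
;; =&gt; 42
</code></pre></div>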
<p>Interleaving <strong>[4]</strong> the arguments and the <code>get-in</code> expressions gives us bindings suitable for
insertion into a <code>let</code> form.</p>
<p>The <code>~dagh</code> in <strong>[5]</strong> will expand to the actual argument of the macro; <code>~(keyword k)</code> turns
<code>c</code> into <code>:c</code>; and we splice in the argument keywords to form <code>[:a :b]</code>.
The function itself uses the unqualified <code>~'dag</code> as an argument, splices in our bindings
from above, and snarfs the function forms as they were written.</p>
<p>Similarly renaming and wrapping the other functions (see the
<a href="https://github.com/pnf/clojure-playground/blob/master/src/clj/playground/reactive.clj">code</a>
if you care) lets us do this:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="n">def</span> <span class="n">d1</span> <span class="p">(</span><span class="o">-></span> <span class="p">{}</span>
<span class="p">(</span><span class="n">set</span><span class="o">-</span><span class="n">val</span> <span class="n">a</span> <span class="mi">1</span><span class="p">)</span>
<span class="p">(</span><span class="n">set</span><span class="o">-</span><span class="n">val</span> <span class="n">b</span> <span class="mi">2</span><span class="p">)</span>
<span class="p">(</span><span class="n">set</span><span class="o">-</span><span class="n">fn</span> <span class="n">c</span> <span class="p">[</span><span class="n">a</span> <span class="n">b</span><span class="p">]</span> <span class="p">(</span><span class="o">+</span> <span class="n">a</span> <span class="n">b</span><span class="p">))))</span>
<span class="p">(</span><span class="n">def</span> <span class="n">d2</span> <span class="p">(</span><span class="o">-></span> <span class="n">d1</span> <span class="p">(</span><span class="n">set</span><span class="o">-</span><span class="n">val</span> <span class="n">b</span> <span class="mi">3</span><span class="p">)))</span>
<span class="p">(</span><span class="n">println</span> <span class="p">(</span><span class="n">gv</span> <span class="n">d1</span> <span class="n">c</span><span class="p">)</span> <span class="p">(</span><span class="n">gv</span> <span class="n">d2</span> <span class="n">c</span><span class="p">))</span>
</code></pre></div>
<p>and get the expected <code>3 4</code>. </p>
<h2>The state monad</h2>
<p>The conclusion of this section is going to be that I can't see much point using the state monad
here, so feel free to skip it if anticlimaxes get you down. At the same time, it's not unlikely
that I'm wrong about this, and I'd really like someone to tell me why.</p>
<p>So far, we've used the <code>-></code> threading macro to perform successive operations on the DAG.
We could accomplish the same thing by monadically binding together functions that don't refer
to the DAG explicitly in their arguments.</p>
<p>The state monad stores values in closures, specifically in functions from state to a tuple
of the value and a potentially different state. So</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="k">fn </span><span class="p">[</span><span class="nv">dag</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[[</span><span class="nv">dag</span> <span class="nv">v</span><span class="p">]</span> <span class="p">(</span><span class="nf">get-val*</span> <span class="nv">dag</span> <span class="ss">:c</span><span class="p">)]</span>
<span class="p">[</span><span class="nv">v</span> <span class="nv">dag</span><span class="p">]))</span>
</code></pre></div>
<p>is a monad holding the state-dependent value of the <code>c</code> node of some unspecified DAG.</p>
<p>Now we could write a function to generate one of these functions from an arbitrary key <code>k</code> instead of <code>:c</code>:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">get-val-s*</span> <span class="p">[</span><span class="nv">k</span><span class="p">]</span>
<span class="p">(</span><span class="kd">fn </span><span class="p">[</span><span class="nv">dag</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[[</span><span class="nv">dag</span> <span class="nv">v</span><span class="p">]</span> <span class="p">(</span><span class="nf">get-val*</span> <span class="nv">dag</span> <span class="nv">k</span><span class="p">)]</span>
<span class="p">[</span><span class="nv">v</span> <span class="nv">dag</span><span class="p">])))</span>
</code></pre></div>
<p>Other than the fact that the thing being returned is a function, this isn't <em>that</em>
much different from a function that returns a more straightforward sort of monad,
like an option, e.g.</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">parse-int</span> <span class="p">[</span><span class="nv">s</span><span class="p">]</span>
<span class="p">(</span><span class="nf">try</span> <span class="p">[</span><span class="ss">:some</span> <span class="p">(</span><span class="nf">Integer/parseInt</span> <span class="nv">s</span><span class="p">)]</span>
<span class="p">(</span><span class="nf">catch</span> <span class="nv">Exception</span> <span class="nv">e</span> <span class="p">[</span><span class="ss">:none</span> <span class="nv">nil</span><span class="p">])))</span>
</code></pre></div>
<p>Writing similar "something -> state monad" functions for our other DAG operations,</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nb">set</span><span class="nv">-val-s*</span> <span class="p">[</span><span class="nv">k</span> <span class="nv">val</span><span class="p">]</span> <span class="p">(</span><span class="kd">fn </span><span class="p">[</span><span class="nv">dag</span><span class="p">]</span> <span class="p">[</span><span class="nv">nil</span> <span class="p">(</span><span class="nb">set</span><span class="nv">-val*</span> <span class="nv">dag</span> <span class="nv">k</span> <span class="nv">val</span><span class="p">)]))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nb">set</span><span class="nv">-fn-s*</span> <span class="p">[</span><span class="nv">k</span> <span class="nv">args</span> <span class="nv">f</span><span class="p">]</span> <span class="p">(</span><span class="kd">fn </span><span class="p">[</span><span class="nv">dag</span><span class="p">]</span> <span class="p">[</span><span class="nv">nil</span> <span class="p">(</span><span class="nb">set</span><span class="nv">-fn*</span> <span class="nv">dag</span> <span class="nv">k</span> <span class="nv">args</span> <span class="nv">f</span><span class="p">)]))</span>
</code></pre></div>
<p>spiffing them up in macros,</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="n">defmacro</span> <span class="n">set</span><span class="o">-</span><span class="n">val</span><span class="o">-</span><span class="n">s</span> <span class="p">[</span><span class="n">k</span> <span class="n">v</span><span class="p">]</span> <span class="err">`</span><span class="p">(</span><span class="n">set</span><span class="o">-</span><span class="n">val</span><span class="o">-</span><span class="n">s</span><span class="o">*</span> <span class="o">~</span><span class="p">(</span><span class="n">keyword</span> <span class="n">k</span><span class="p">)</span> <span class="o">~</span><span class="n">v</span><span class="p">))</span>
<span class="p">(</span><span class="n">defmacro</span> <span class="n">set</span><span class="o">-</span><span class="n">fn</span><span class="o">-</span><span class="n">s</span> <span class="p">[</span><span class="n">k</span> <span class="n">args</span> <span class="o">&</span> <span class="n">forms</span><span class="p">]</span> <span class="err">`</span><span class="p">(</span><span class="n">set</span><span class="o">-</span><span class="n">fn</span><span class="o">-</span><span class="n">s</span><span class="o">*</span> <span class="o">~</span><span class="p">(</span><span class="n">keyword</span> <span class="n">k</span><span class="p">)</span> <span class="o">~</span><span class="p">(</span><span class="n">vec</span> <span class="p">(</span><span class="n">map</span> <span class="n">keyword</span> <span class="n">args</span><span class="p">))</span> <span class="p">(</span><span class="n">rfn</span> <span class="o">~</span><span class="n">args</span> <span class="o">~</span><span class="p">@</span><span class="n">forms</span><span class="p">)))</span>
<span class="p">(</span><span class="n">defmacro</span> <span class="n">get</span><span class="o">-</span><span class="n">val</span><span class="o">-</span><span class="n">s</span> <span class="p">[</span><span class="n">k</span><span class="p">]</span> <span class="err">`</span><span class="p">(</span><span class="n">get</span><span class="o">-</span><span class="n">val</span><span class="o">-</span><span class="n">s</span><span class="o">*</span> <span class="o">~</span><span class="p">(</span><span class="n">keyword</span> <span class="n">k</span><span class="p">)))</span>
</code></pre></div>
<p>and employing the for comprehension macro from <code>clojure.algo.monads</code>, the earlier example looks like</p>
<div class="highlight"><pre><span></span><code><span class="p">((</span><span class="nf">m/domonad</span> <span class="nv">m/state-m</span>
<span class="p">[</span><span class="nv">_</span> <span class="p">(</span><span class="nf">set-val-s</span> <span class="nv">a</span> <span class="mi">1</span><span class="p">)</span>
<span class="nv">_</span> <span class="p">(</span><span class="nf">set-val-s</span> <span class="nv">b</span> <span class="mi">2</span><span class="p">)</span>
<span class="nv">_</span> <span class="p">(</span><span class="nf">set-fn-s</span> <span class="nv">c</span> <span class="p">[</span><span class="nv">a</span> <span class="nv">b</span><span class="p">]</span> <span class="p">(</span><span class="nb">+ </span><span class="nv">a</span> <span class="nv">b</span><span class="p">))</span>
<span class="nv">v</span> <span class="p">(</span><span class="nf">get-val-s</span> <span class="nv">c</span><span class="p">)]</span>
<span class="nv">v</span><span class="p">)</span> <span class="p">{})</span>
</code></pre></div>
<p>That's a bit pointless, since everything but the final result is thrown away, but,
now that we're pretty far down the monadic contrivance hole, we could do some more
complicated manipulations:</p>
<div class="highlight"><pre><span></span><code><span class="p">((</span><span class="n">m</span><span class="o">/</span><span class="n">domonad</span> <span class="n">m</span><span class="o">/</span><span class="n">state</span><span class="o">-</span><span class="n">m</span>
<span class="p">[</span><span class="n">_</span> <span class="p">(</span><span class="n">set</span><span class="o">-</span><span class="n">val</span><span class="o">-</span><span class="n">s</span> <span class="n">a</span> <span class="mi">1</span><span class="p">)</span>
<span class="n">_</span> <span class="p">(</span><span class="n">set</span><span class="o">-</span><span class="n">val</span><span class="o">-</span><span class="n">s</span> <span class="n">b</span> <span class="mi">2</span><span class="p">)</span>
<span class="n">_</span> <span class="p">(</span><span class="n">set</span><span class="o">-</span><span class="n">fn</span><span class="o">-</span><span class="n">s</span> <span class="n">c</span> <span class="p">[</span><span class="n">a</span> <span class="n">b</span><span class="p">]</span> <span class="p">(</span><span class="o">+</span> <span class="n">a</span> <span class="n">b</span><span class="p">))</span>
<span class="n">v</span> <span class="p">(</span><span class="n">get</span><span class="o">-</span><span class="n">val</span><span class="o">-</span><span class="n">s</span> <span class="n">c</span><span class="p">)</span>
<span class="n">_</span> <span class="p">(</span><span class="n">set</span><span class="o">-</span><span class="n">val</span><span class="o">-</span><span class="n">s</span> <span class="n">a</span> <span class="p">(</span><span class="o">*</span> <span class="n">v</span> <span class="n">v</span><span class="p">))</span>
<span class="n">w</span> <span class="p">(</span><span class="n">get</span><span class="o">-</span><span class="n">val</span><span class="o">-</span><span class="n">s</span> <span class="n">c</span><span class="p">)]</span>
<span class="p">(</span><span class="n">str</span> <span class="s">"got "</span> <span class="n">w</span><span class="p">))</span> <span class="p">{})</span>
</code></pre></div>
<p>and we'll find that we "got 11". That's neat, I suppose.</p>
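<p>If the plumbing still feels mysterious, here is a hand-rolled sketch of roughly what a state monad's
bind has to do. None of this is taken from <code>clojure.algo.monads</code>; the helpers <code>result</code>,
<code>bind</code>, <code>put-s</code> and <code>get-s</code> are hypothetical, and a plain map stands in for the DAG:</p>
<div class="highlight"><pre><span></span><code>;; A state-monad value is a function: state -&gt; [value new-state].
(defn result [v] (fn [s] [v s]))

;; bind runs mv, feeds its value to f, and runs what f returns on the new state.
(defn bind [mv f]
  (fn [s]
    (let [[v s2] (mv s)]
      ((f v) s2))))

;; Toy operations on a plain map.
(defn put-s [k v] (fn [s] [nil (assoc s k v)]))
(defn get-s [k]   (fn [s] [(get s k) s]))

;; The nesting that a do-notation macro would write for us.
(def program
  (bind (put-s :a 1)
        (fn [_]
          (bind (put-s :b 2)
                (fn [_]
                  (bind (get-s :a)
                        (fn [a]
                          (bind (get-s :b)
                                (fn [b] (result (+ a b)))))))))))

(program {})
;; =&gt; [3 {:a 1, :b 2}]
</code></pre></div>
<p>Nothing happens until the final function is applied to an initial state, which is why the examples
above end with a call on <code>{}</code>.</p>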
<h2>A more complicated example, but not with monads</h2>
<p>Now a more complicated example, one that is popular when demoing secdb-like systems, but this time
with a rich-man's "diddle scope." We'll construct a graph to price a call option using the
discredited <a href="http://en.wikipedia.org/wiki/Black%E2%80%93Scholes_model">Black Scholes</a> model:</p>
<p><img alt="" src="images/bs-formula.png"></p>
<p>S is a stock price and K is the "strike" ... think of this as some random formula if you like.
We can express it in graph form as follows:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">N</span> <span class="p">[</span><span class="nv">x</span><span class="p">]</span> <span class="p">(</span><span class="nb">/ </span><span class="p">(</span><span class="nb">+ </span><span class="mf">1.0</span> <span class="p">(</span><span class="nf">Erf/erf</span> <span class="p">(</span><span class="nb">/ </span><span class="nv">x</span> <span class="p">(</span><span class="nf">Math/sqrt</span> <span class="mf">2.0</span><span class="p">))))</span> <span class="mf">2.0</span><span class="p">))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">option</span> <span class="p">[]</span>
<span class="p">(</span><span class="nb">-> </span><span class="p">{}</span>
<span class="p">(</span><span class="nf">set-val</span> <span class="nv">K</span> <span class="mf">101.0</span><span class="p">)</span>
<span class="p">(</span><span class="nf">set-val</span> <span class="nv">S</span> <span class="mf">100.0</span><span class="p">)</span>
<span class="p">(</span><span class="nf">set-val</span> <span class="nv">T</span> <span class="mf">1.0</span><span class="p">)</span>
<span class="p">(</span><span class="nf">set-val</span> <span class="nv">r</span> <span class="mf">0.01</span><span class="p">)</span>
<span class="p">(</span><span class="nf">set-val</span> <span class="nv">sigma</span> <span class="mf">0.35</span><span class="p">)</span>
<span class="p">(</span><span class="nf">set-fn</span> <span class="nv">d1</span> <span class="p">[</span><span class="nv">S</span> <span class="nv">T</span> <span class="nv">K</span> <span class="nv">r</span> <span class="nv">sigma</span><span class="p">]</span> <span class="p">(</span><span class="nb">/ </span><span class="p">(</span><span class="nb">+ </span><span class="p">(</span><span class="nf">Math/log</span> <span class="p">(</span><span class="nb">/ </span><span class="nv">S</span> <span class="nv">K</span><span class="p">))</span> <span class="p">(</span><span class="nb">* </span><span class="p">(</span><span class="nb">+ </span><span class="nv">r</span> <span class="p">(</span><span class="nb">/ </span><span class="p">(</span><span class="nb">* </span><span class="nv">sigma</span> <span class="nv">sigma</span><span class="p">)</span> <span class="mi">2</span><span class="p">))</span> <span class="nv">T</span><span class="p">))</span> <span class="p">(</span><span class="nb">* </span><span class="nv">sigma</span> <span class="p">(</span><span class="nf">Math/sqrt</span> <span class="nv">T</span><span class="p">))))</span>
<span class="p">(</span><span class="nf">set-fn</span> <span class="nv">d2</span> <span class="p">[</span><span class="nv">d1</span> <span class="nv">T</span> <span class="nv">sigma</span><span class="p">]</span> <span class="p">(</span><span class="nb">- </span><span class="nv">d1</span> <span class="p">(</span><span class="nb">* </span><span class="nv">sigma</span> <span class="p">(</span><span class="nf">Math/sqrt</span> <span class="nv">T</span><span class="p">))))</span>
<span class="p">(</span><span class="nf">set-fn</span> <span class="nv">c</span> <span class="p">[</span><span class="nv">S</span> <span class="nv">T</span> <span class="nv">K</span> <span class="nv">r</span> <span class="nv">d1</span> <span class="nv">d2</span><span class="p">]</span> <span class="p">(</span><span class="nb">- </span><span class="p">(</span><span class="nb">* </span><span class="nv">S</span> <span class="p">(</span><span class="nf">N</span> <span class="nv">d1</span><span class="p">))</span> <span class="p">(</span><span class="nb">* </span><span class="nv">K</span> <span class="p">(</span><span class="nf">Math/exp</span> <span class="p">(</span><span class="nb">* </span><span class="p">(</span><span class="nb">- </span><span class="nv">r</span><span class="p">)</span> <span class="nv">T</span><span class="p">))</span> <span class="p">(</span><span class="nf">N</span> <span class="nv">d2</span><span class="p">)))))</span>
<span class="o">#</span><span class="nv">_</span><span class="p">(</span><span class="nf">watch</span> <span class="nv">this</span> <span class="nv">spot</span><span class="p">))</span>
</code></pre></div>
<p>Not much going on here, other than the usual horror of writing arithmetic expressions in prefix notation,
but now for some fun.</p>
<p>It turns out that the derivative $\partial c/\partial S$ is useful to know, as it represents the amount of
stock you'd need in order to hedge (exactly counterbalance) fluctuations in the option price. In this simple
case, it's easy to take the derivative analytically, but, in general, we might calculate it by finite
differencing. In the diddle scope paradigm, that would be accomplished with an unsightly bump and restore,
but we can do much better. Let's add a node to the graph:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="n">set</span><span class="o">-</span><span class="n">fn</span> <span class="n">delta</span> <span class="p">[</span><span class="n">c</span> <span class="n">S</span><span class="p">]</span> <span class="p">(</span><span class="o">/</span> <span class="p">(</span><span class="o">-</span> <span class="p">(</span><span class="o">-></span> <span class="n">dag</span> <span class="p">(</span><span class="n">set</span><span class="o">-</span><span class="nl">val</span> <span class="p">:</span><span class="n">S</span> <span class="p">(</span><span class="o">+</span> <span class="n">S</span> <span class="mf">0.01</span><span class="p">))</span> <span class="p">(</span><span class="n">gv</span> <span class="n">c</span><span class="p">))</span> <span class="n">c</span><span class="p">)</span> <span class="mf">0.01</span> <span class="p">))</span>
</code></pre></div>
<p>The anonymous function is retrieving the current value of S, creating a new graph where it's been increased
by 0.01 and extracting c's value reactively. It does this perfectly safely, with no possibility of corrupting
the original graph and no need to restore it explicitly. We could even have dispatched the delta calculation
to another thread, completely without worry. We get all this, because the graph is just a value!</p>
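<p>Spelled out, and taking the bump size to be $h = 0.01$ as in the node above, the quantity being computed is just the forward difference
$$\frac{\partial c}{\partial S} \approx \frac{c(S+h) - c(S)}{h},$$
which for this particular model should land close to the analytic answer $N(d_1)$.</p>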
<h2><s>Rueful Conclusion</s> Never Graduate</h2>
<p>I believe that the foregoing could be industrialized, and I can imagine many useful applications.
However, I don't know that this is actually new, because I can't understand the literature
(for example, <a href="http://www.cs.rit.edu/~mtf/student-resources/20103_amsden_istudy.pdf">this survey</a>)
and I can't understand the literature, because it's all in Haskell. It's time to L[M]AHFGG.</p>FUNCTIONAL functional reactive programming2014-01-23T12:00:00-05:002014-01-23T12:00:00-05:00Peter Fraenkeltag:blog.podsnap.com,2014-01-23:/reactive.html<p>I totally agree with Paul Chiusano that the <a href="http://www.reactivemanifesto.org/">Reactive Manifesto</a>
is <a href="http://pchiusano.blogspot.com/2013/11/the-reactive-manifesto-is-not-even-wrong.html">not even wrong</a>.
In addition to the breezy <a href="http://en.wikipedia.org/wiki/Not_even_wrong">non-falsifiability</a> of its assertions,
I have trouble with the name itself. Manifestos seldom work out well, in the sense of
there not being a lot of corpses. (Plus which, "reactive" is actually a word, and a "reactive manifesto"
doesn't sound like it would be very proactive, like.)</p>
<p><strong>BUT</strong> reactive programming is important, it's useful, and I need to understand it better.
Herewith, then, I shall:</p>
<ol>
<li>Very briefly introduce reactive programming as I understand it.</li>
<li>Complain that I don't understand why they call …</li></ol><p>I totally agree with Paul Chiusano that the <a href="http://www.reactivemanifesto.org/">Reactive Manifesto</a>
is <a href="http://pchiusano.blogspot.com/2013/11/the-reactive-manifesto-is-not-even-wrong.html">not even wrong</a>.
In addition to the breezy <a href="http://en.wikipedia.org/wiki/Not_even_wrong">non-falsifiability</a> of its assertions,
I have trouble with the name itself. Manifestos seldom work out well, in the sense of
there not being a lot of corpses. (Plus which, "reactive" is actually a word, and a "reactive manifesto"
doesn't sound like it would be very proactive, like.)</p>
<p><strong>BUT</strong> reactive programming is important, it's useful, and I need to understand it better.
Herewith, then, I shall:</p>
<ol>
<li>Very briefly introduce reactive programming as I understand it.</li>
<li>Complain that I don't understand why they call it <em>Functional</em> Reactive Programming.</li>
<li>Hazard a guess at what the manifesterati were referring to when they mentioned Finance.</li>
<li>Build up toy frameworks in Scala and Clojure
that seem to me to be truly functional.</li>
<li><s>Encapsulate the state of the reactive graph in a state monad.</s> Defer discussion of
the state monad to the Clojure version.</li>
<li>Show some neat stuff that emerges.</li>
<li>Conclude that I need to learn Haskell in order to read the academic papers that will show me what I'm not understanding.</li>
</ol>
<h2>Reactive Programming</h2>
<p>There's been a lot written on this subject. Martin Odersky gave a whole course (which, fair warning, I did
not take), incorporating the concepts introduced in these papers from
<a href="http://lampwww.epfl.ch/~imaier/pub/DeprecatingObserversTR2010.pdf">2010</a>
and <a href="http://infoscience.epfl.ch/record/176887/files/DeprecatingObservers2012.pdf">2012</a>. (Actually, they're sort of
the same paper. The later one drops one of the authors, adds some implementation details, but seems to leave
out a few useful explanations from the earlier one.) This modified example from the 2010 paper gets the basic point across:</p>
<div class="highlight"><pre><span></span><code><span class="kd">val</span> <span class="n">a</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">Var</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="kd">val</span> <span class="n">b</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">Var</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
<span class="kd">val</span> <span class="n">c</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">Var</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span>
<span class="kd">val</span> <span class="n">sum</span> <span class="o">=</span> <span class="nc">Signal</span><span class="p">{</span> <span class="n">a</span><span class="p">()</span><span class="o">+</span><span class="n">b</span><span class="p">()</span> <span class="p">}</span>
<span class="kd">val</span> <span class="n">result</span> <span class="o">=</span> <span class="nc">Signal</span><span class="p">{</span> <span class="n">sum</span><span class="p">()</span> <span class="o">*</span> <span class="n">c</span><span class="p">()</span> <span class="p">}</span>
<span class="n">observe</span><span class="p">(</span><span class="n">result</span><span class="p">)</span> <span class="p">{</span> <span class="n">x</span> <span class="o">=></span> <span class="n">println</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="p">}</span>
<span class="n">a</span><span class="p">()</span><span class="o">=</span> <span class="mi">7</span>
<span class="n">b</span><span class="p">()</span><span class="o">=</span> <span class="mi">35</span>
<span class="n">c</span><span class="p">()</span><span class="o">=</span> <span class="mi">1</span>
</code></pre></div>
<p>Not worrying for the moment about how this might work, we expect the above code to print out</p>
<div class="highlight"><pre><span></span><code><span class="mf">27</span>
<span class="mf">126</span>
<span class="mf">42</span>
</code></pre></div>
<p>as inputs to the <code>Var</code> objects propagate into the calculation maintained by the <code>Signal</code>s.</p>
<p>Somewhere, obviously, is a hidden directed acyclic graph (DAG) structure containing directed links between
the various objects. </p>
<div class="highlight"><pre><span></span><code> <span class="n">a</span> <span class="o">-----></span> <span class="err">\</span>
<span class="o">|-----></span> <span class="nf">sum</span> <span class="o">----></span><span class="err">\</span>
<span class="n">b</span> <span class="o">-----></span> <span class="o">/</span> <span class="o">|----></span> <span class="n">result</span>
<span class="n">c</span><span class="o">---->/</span>
</code></pre></div>
<p>There's more to it than this. For example, I've completely ignored discrete <code>Events</code> in time or the ability to
"integrate" over a history of changes. Nonetheless, the illustrated concept, familiar to anyone who's ever
used a spreadsheet, is a valid example of Functional Reactive Programming.</p>
<h2>Functional Reactive Programming.</h2>
<p>That's what it's called, or FRP for short. The "functional" epithet seems to apply because you write
functions that take input and emit things, while not themselves keeping any internal state.
<strong>But there's still state!</strong> It's just hiding in a giant, global, mutable thingamie whose content we can only infer by doing
clever experiments. For concreteness, we'll call the thingamie <strong>the DAG</strong> or <strong>the graph</strong>.</p>
<p>When we evaluate <code>a()=7</code>, the <code>7</code> is stored in the DAG, along with the new values
of <code>sum</code> and <code>result</code>.
If <code>a</code>, <code>b</code> and <code>c</code> were updated in different threads, with <code>sum</code>
and <code>result</code> consumed in still others, there could be all sorts of fascinating race conditions.
We'll suppose that the authors of the DAG have designed it so as not to crash under
such circumstances, but the system as a whole is opaque and certainly stateful.</p>
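<p>To make the hidden state a bit more tangible, here is a minimal sketch of how a push-based <code>Var</code>/<code>Signal</code> pair might be wired up.
It is emphatically not the implementation from the papers: <code>Cell</code>, <code>onChange</code>, <code>fire</code> and the hand-listed dependencies are all
inventions for illustration. The point is only that the "graph" ends up living in mutable listener lists and cached values scattered across the objects.</p>
<div class="highlight"><pre><span></span><code>import scala.collection.mutable.ListBuffer

// Hypothetical names throughout; this is not the API from the papers.
trait Cell[T] {
  // The hidden, mutable state: who must be told when this cell changes.
  private val listeners = ListBuffer.empty[() => Unit]
  def apply(): T
  def onChange(f: () => Unit): Unit = listeners += f
  protected def fire(): Unit = listeners.foreach(f => f())
}

final class Var[T](private var v: T) extends Cell[T] {
  def apply(): T = v
  // Scala rewrites `a() = 7` into `a.update(7)`.
  def update(x: T): Unit = { v = x; fire() }
}

// Dependencies are listed by hand here rather than discovered automatically.
final class Signal[T](deps: Cell[_]*)(body: => T) extends Cell[T] {
  private var cached: T = body   // more mutable state
  deps.foreach(d => d.onChange(() => { cached = body; fire() }))
  def apply(): T = cached
}

object PushDemo {
  // Replays the example above, recording what an observer of `result` would see.
  def run(): List[Int] = {
    val a = new Var(1)
    val b = new Var(2)
    val c = new Var(3)
    val sum = new Signal[Int](a, b)(a() + b())
    val result = new Signal[Int](sum, c)(sum() * c())
    val seen = ListBuffer.empty[Int]
    result.onChange(() => { seen += result(); () })
    a() = 7
    b() = 35
    c() = 1
    seen.toList
  }
}
</code></pre></div>
<p>Calling <code>PushDemo.run()</code> should yield the same three values as the printout above. Nothing here is thread-safe, which is rather the point.</p>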
<h2>FRP in Finance</h2>
<p>The manifestors have written:</p>
<div class="highlight"><pre><span></span><code> Finance and telecommunication were the first to adopt new
practices to satisfy the new requirements and others have
followed.
</code></pre></div>
<p>Finance comprises many people and many technologies, and there are surely
some sophisticated examples of reactivity in the service of Mammon, but
the only <em>notable</em> early use of reactivity in the financial
sector is a system developed in the mid-90s at Goldman Sachs called secdb.
I never worked at Goldman, so everything I say about secdb is hearsay; on the other
hand, many people have worked at Goldman, so much has been said and heard
(and of course I'm not bound by any sort of confidentiality agreement).
Secdb employs a proprietary language called slang
(having nothing whatsoever to do with <a href="http://www.jedsoft.org/slang/">s-lang</a>),
which supports a reactive programming style. Since I probably don't know
slang, let's pretend its syntax was similar to that of the scala DSL
from the Odersky (et al) papers.</p>
<p>Slang had a twist that they called the "diddle scope", with scope in
the sense often delimited by curly braces. Any "diddles" to the DAG
within this scope are undone on exiting it, and the language is
single-threaded, so nobody else can have done anything to the DAG
in the meantime. Pretending that the diddling
is scoped by <code><<</code>, and that scoping rules are otherwise pythonic,</p>
<div class="highlight"><pre><span></span><code>a = Var(1)
b = Var(2)
sum = Signal{ a()+b() }
<<
b() = 3
diddled = sum()
>>
diddless = sum()
println(diddled,diddless)
</code></pre></div>
<p>would print <code>4 3</code>.</p>
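<p>Since I can't show real slang, here is a rough Scala sketch of the bump-and-restore idea, under the assumption that a diddle scope amounts to
"set, evaluate, put it back". The names <code>MVar</code> and <code>diddle</code> are made up for the occasion.</p>
<div class="highlight"><pre><span></span><code>object DiddleDemo {
  // A deliberately mutable cell, standing in for a node in the phantom DAG.
  final class MVar(var value: Int)

  // Temporarily overwrite `v`, evaluate `body`, and restore the old value
  // on the way out, even if `body` throws.
  def diddle[A](v: MVar, temp: Int)(body: => A): A = {
    val saved = v.value
    v.value = temp
    try body finally v.value = saved
  }

  def run(): (Int, Int) = {
    val a = new MVar(1)
    val b = new MVar(2)
    def sum: Int = a.value + b.value   // recomputed on every access
    val diddled = diddle(b, 3)(sum)
    val diddless = sum
    (diddled, diddless)
  }
}
</code></pre></div>
<p>This only behaves itself because nothing else touches <code>b</code> while the scope is open, which is exactly the single-threaded assumption mentioned above.</p>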
<p>The phantom DAG here is, in a way, a poor man's
<a href="https://en.wikipedia.org/wiki/Persistent_data_structure">persistent data structure</a>.
The richer man would have (a) had explicit access to the dag structure,
which he would (b) not mutate and restore but rather copy (inexpensively).
In fact, one of the key technology figures present at Goldman during the
birth of secdb was <a href="https://duckduckgo.com/?q=neil+sarnak+persistent">Neil Sarnak</a>, who,
as you can see, did more than dabble in persistent data structures while in academia.
I imagine that he would have wanted a true persistent DAG, but the implementations
at the time just weren't good enough.
Little regarding my fragile ego, Neil did not respond to a groupie-like request for
comment.</p>
<p>Today, one could imagine</p>
<div class="highlight"><pre><span></span><code>dag = Dag().set("a",1).set("b",2).set("c","a","b", _+_)
dag2 = dag.set("b",3)
println(dag2.get("c"), dag.get("c"))
</code></pre></div>
<p>printing <code>4 3</code>, but we'll do better than imagine. In the next section, we'll hack up
a simple reactive framework and show what a fully persistent DAG might buy us.</p>
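<p>As a warm-up, the copy-don't-mutate idea can be seen with nothing fancier than Scala's stock immutable <code>Map</code>. This is only a sketch:
the "signal" <code>c</code> is an ordinary function of the map rather than a node in it, and <code>PersistentDemo</code> is a name invented here.</p>
<div class="highlight"><pre><span></span><code>object PersistentDemo {
  // The "signal": recomputed from whichever version of the map it is handed.
  def c(m: Map[String, Int]): Int = m("a") + m("b")

  def run(): (Int, Int) = {
    val dag  = Map("a" -> 1, "b" -> 2)
    val dag2 = dag + ("b" -> 3)   // a new map; `dag` itself is untouched
    (c(dag2), c(dag))
  }
}
</code></pre></div>
<p>Both versions remain usable afterwards, with no restore step and nothing to race on.</p>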
<h2>Would you prefer to continue in Clojure?</h2>
<p>Click here -->
<a href="http://blog.podsnap.com/reactive-clj.html"><img alt="" src="images/clojure-icon.gif"></a></p>
<h2>A FUNCTIONAL reactive framework in Scala</h2>
<p>As usual, the code is on <a href="https://github.com/pnf/scala-playground/blob/master/src/main/scala/reactive.scala">github</a>,
with illustrative snippets below. I'll be omitting <s>a few</s> <strong>many</strong> important nuances,
like dealing with changes to the DAG after it's
built, and no consideration whatever is paid to performance. As before, there will be <code>Var</code> and <code>Signal</code>
nodes, but we will not be manipulating them directly.</p>
<p>The basic structure of the DAG will be represented by a map of <code>Id[T]</code> to <code>Node[T]</code>,
where the latter can be either <code>Var</code> or <code>Signal</code> as in the examples above.
The difference here is that both are immutable case classes with immutable members.</p>
<div class="highlight"><pre><span></span><code><span class="k">sealed</span> <span class="k">trait</span> <span class="nc">Node</span><span class="p">[</span><span class="nc">T</span><span class="p">]</span>
<span class="k">case</span> <span class="k">class</span> <span class="nc">Id</span><span class="p">[</span><span class="o">+</span><span class="nc">T</span><span class="p">](</span><span class="n">id</span><span class="p">:</span><span class="nc">String</span><span class="p">)</span>
<span class="k">case</span> <span class="k">class</span> <span class="nc">Dag</span><span class="p">(</span><span class="kd">val</span> <span class="n">dag</span><span class="p">:</span> <span class="nc">Map</span><span class="p">[</span><span class="nc">Id</span><span class="p">[</span><span class="n">_</span><span class="p">],</span><span class="nc">Node</span><span class="p">[</span><span class="n">_</span><span class="p">]])</span>
<span class="k">case</span> <span class="k">class</span> <span class="nc">Var</span><span class="p">[</span><span class="nc">T</span><span class="p">](</span><span class="n">deps</span><span class="p">:</span> <span class="nc">Set</span><span class="p">[</span><span class="nc">Id</span><span class="p">[</span><span class="nc">T</span><span class="p">]],</span> <span class="n">value</span><span class="p">:</span> <span class="nc">T</span><span class="p">)</span> <span class="k">extends</span> <span class="nc">Node</span><span class="p">[</span><span class="nc">T</span><span class="p">]</span>
<span class="k">case</span> <span class="k">class</span> <span class="nc">Signal</span><span class="p">[</span><span class="nc">T</span><span class="p">](</span><span class="n">deps</span><span class="p">:</span> <span class="nc">Set</span><span class="p">[</span><span class="nc">Id</span><span class="p">[</span><span class="n">_</span><span class="p">]],</span> <span class="n">args</span><span class="p">:</span> <span class="nc">Set</span><span class="p">[</span><span class="nc">Id</span><span class="p">[</span><span class="n">_</span><span class="p">]],</span> <span class="n">fn</span><span class="p">:</span> <span class="nc">Dag</span> <span class="o">=></span> <span class="nc">T</span><span class="p">,</span><span class="n">value</span><span class="p">:</span> <span class="nc">Option</span><span class="p">[</span><span class="nc">T</span><span class="p">])</span> <span class="k">extends</span> <span class="nc">Node</span><span class="p">[</span><span class="nc">T</span><span class="p">]</span>
</code></pre></div>
<p>In both varieties of <code>Node</code>, the <code>deps</code> member holds <code>Id</code>s of the dependents of
this node, i.e. the nodes that depend on this one. The linkages are via map lookups rather
than conventional references, because we want to hitch a free ride on the persistent
implementation of <code>immutable.HashMap</code>. Note that, while the <code>Id</code>s are merely labels, they're
still typed by what they're allowed to label.</p>
<p>We will make "assignments" with functions that take one immutable <code>Dag</code> and return another.</p>
<div class="highlight"><pre><span></span><code><span class="n">def</span><span class="w"> </span><span class="k">set</span><span class="o">[</span><span class="n">T</span><span class="o">]</span><span class="p">(</span><span class="nl">dag</span><span class="p">:</span><span class="w"> </span><span class="n">Dag</span><span class="p">,</span><span class="w"> </span><span class="nl">id</span><span class="p">:</span><span class="w"> </span><span class="n">Id</span><span class="o">[</span><span class="n">T</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="k">value</span><span class="err">:</span><span class="w"> </span><span class="n">T</span><span class="p">)</span><span class="w"> </span><span class="err">:</span><span class="w"> </span><span class="n">Dag</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nf">Var</span><span class="w"> </span><span class="n">nodes</span><span class="w"></span>
<span class="n">def</span><span class="w"> </span><span class="k">set</span><span class="o">[</span><span class="n">T</span><span class="o">]</span><span class="p">(</span><span class="nl">dag</span><span class="p">:</span><span class="w"> </span><span class="n">Dag</span><span class="p">,</span><span class="w"> </span><span class="nl">id</span><span class="p">:</span><span class="w"> </span><span class="n">Id</span><span class="o">[</span><span class="n">T</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="nl">args</span><span class="p">:</span><span class="w"> </span><span class="k">Set</span><span class="o">[</span><span class="n">Id[_</span><span class="o">]</span><span class="err">]</span><span class="p">,</span><span class="w"> </span><span class="nl">fn</span><span class="p">:</span><span class="w"> </span><span class="n">Dag</span><span class="o">=></span><span class="n">T</span><span class="p">)</span><span class="w"> </span><span class="err">:</span><span class="w"> </span><span class="n">Dag</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">Signal</span><span class="w"> </span><span class="n">nodes</span><span class="w"></span>
</code></pre></div>
<p>Actually, it will be more convenient to implement these as methods of <code>Dag</code>. Here's the first:</p>
<div class="highlight"><pre><span></span><code><span class="n">implicit</span><span class="w"> </span><span class="n">def</span><span class="w"> </span><span class="n">mapToDag</span><span class="p">(</span><span class="k">map</span><span class="err">:</span><span class="w"> </span><span class="k">Map</span><span class="o">[</span><span class="n">Id[_</span><span class="o">]</span><span class="p">,</span><span class="n">Node</span><span class="o">[</span><span class="n">_</span><span class="o">]</span><span class="err">]</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Dag</span><span class="p">(</span><span class="k">map</span><span class="p">)</span><span class="w"></span>
<span class="n">def</span><span class="w"> </span><span class="k">set</span><span class="o">[</span><span class="n">T</span><span class="o">]</span><span class="p">(</span><span class="nl">id</span><span class="p">:</span><span class="w"> </span><span class="n">Id</span><span class="o">[</span><span class="n">T</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="k">value</span><span class="w"> </span><span class="err">:</span><span class="n">T</span><span class="p">)</span><span class="w"> </span><span class="err">:</span><span class="w"> </span><span class="n">Dag</span><span class="w"> </span><span class="o">=</span><span class="w"></span>
<span class="w"> </span><span class="n">dag</span><span class="p">.</span><span class="k">get</span><span class="p">(</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="k">match</span><span class="w"> </span><span class="err">{</span><span class="w"></span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="ow">Some</span><span class="p">(</span><span class="nl">leaf</span><span class="p">:</span><span class="nf">Var</span><span class="o">[</span><span class="n">T</span><span class="o">]</span><span class="p">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">(</span><span class="n">dag</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="n">id</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">leaf</span><span class="p">.</span><span class="n">copy</span><span class="p">(</span><span class="k">value</span><span class="o">=</span><span class="k">value</span><span class="p">))).</span><span class="n">sully</span><span class="p">(</span><span class="n">id</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">dag</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="n">id</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="nf">Var</span><span class="o">[</span><span class="n">T</span><span class="o">]</span><span class="p">(</span><span class="k">Set</span><span class="o">[</span><span class="n">Id[T</span><span class="o">]</span><span class="err">]</span><span class="p">(),</span><span class="k">value</span><span class="p">))</span><span class="w"></span>
<span class="w"> </span><span class="err">}</span><span class="w"></span>
</code></pre></div>
<p>Other than <code>sully</code>, which I'll get to in a minute, the basic logic should be clear.
If there's something there already, we use <code>Map.+</code> to make a new graph, which
contains a new node that is identical to the old one other than having a different value.</p>
<p>Setting a signal node is a little more complicated,
because we need to record our dependency on the arguments of the function:</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="n">def</span><span class="w"> </span><span class="k">set</span><span class="o">[</span><span class="n">T</span><span class="o">]</span><span class="p">(</span><span class="nl">id</span><span class="p">:</span><span class="w"> </span><span class="n">Id</span><span class="o">[</span><span class="n">T</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="nl">args</span><span class="p">:</span><span class="w"> </span><span class="k">Set</span><span class="o">[</span><span class="n">Id[_</span><span class="o">]</span><span class="err">]</span><span class="p">,</span><span class="nl">fn</span><span class="p">:</span><span class="w"> </span><span class="n">Dag</span><span class="o">=></span><span class="n">T</span><span class="p">)</span><span class="w"> </span><span class="err">:</span><span class="w"> </span><span class="n">Dag</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="err">{</span><span class="w"></span>
<span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="n">dag2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">dag</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="n">id</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">Signal</span><span class="p">(</span><span class="k">Set</span><span class="o">[</span><span class="n">Id[_</span><span class="o">]</span><span class="err">]</span><span class="p">(),</span><span class="w"> </span><span class="n">args</span><span class="p">,</span><span class="w"> </span><span class="n">fn</span><span class="p">,</span><span class="w"> </span><span class="k">None</span><span class="p">))</span><span class="w"></span>
<span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">foldLeft</span><span class="p">(</span><span class="n">dag2</span><span class="p">)</span><span class="err">{</span><span class="w"> </span><span class="p">(</span><span class="n">d</span><span class="p">,</span><span class="n">i</span><span class="p">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">d</span><span class="p">.</span><span class="k">get</span><span class="p">(</span><span class="n">i</span><span class="p">)</span><span class="w"> </span><span class="k">match</span><span class="w"> </span><span class="err">{</span><span class="w"></span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="ow">Some</span><span class="p">(</span><span class="n">node</span><span class="w"> </span><span class="err">@</span><span class="w"> </span><span class="nf">Var</span><span class="p">(</span><span class="n">deps</span><span class="p">,</span><span class="n">_</span><span class="p">))</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">d</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">node</span><span class="p">.</span><span class="n">copy</span><span class="p">(</span><span class="n">deps</span><span class="o">=</span><span class="n">deps</span><span class="o">+</span><span class="n">id</span><span class="p">))</span><span class="w"></span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="ow">Some</span><span class="p">(</span><span class="n">node</span><span class="w"> </span><span class="err">@</span><span class="w"> </span><span class="n">Signal</span><span class="p">(</span><span class="n">deps</span><span class="p">,</span><span class="n">__</span><span class="p">,</span><span class="n">_</span><span class="p">,</span><span class="n">_</span><span class="p">))</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">d</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">node</span><span class="p">.</span><span class="n">copy</span><span class="p">(</span><span class="n">deps</span><span class="o">=</span><span class="n">deps</span><span class="o">+</span><span class="n">id</span><span class="p">))</span><span class="w"></span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="k">None</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="vm">???</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">abort</span><span class="err">!</span><span class="w"></span>
<span class="w"> </span><span class="err">}}</span><span class="w"></span>
<span class="w"> </span><span class="err">}</span><span class="w"></span>
</code></pre></div>
<p>The <code>foldLeft</code> operation iterates over the argument list, poking our id into every node on
which we depend. Of course we don't actually change any node: at each iteration we just create a new map containing
a new node containing a new set of dependents that now includes us. That's a lot of new maps, so
we're putting a lot of faith in the efficiency of the persistent data structure and the cleverness
of the JVM. (I did warn you that no attention would be paid to performance.)</p>
<p>The second argument of <code>foldLeft</code> is a binary function, which in this case uses some relatively fancy
pattern matching and deconstruction. The <code>@</code> notation allows us to extract both the whole <code>node</code> as well
as the <code>deps</code> within it.</p>
<p>Note also the unpleasant use of <code>???</code>. We'll get a nasty stack trace if we get to this illegal place.</p>
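<p>If the types above are getting in the way, the same fold can be seen in a stripped-down, untyped form. This is a sketch rather than the code in the repository:
<code>N</code> is a stand-in node that holds nothing but its dependents, ids are plain strings, and <code>register</code> is a name made up for the example.</p>
<div class="highlight"><pre><span></span><code>object DepsDemo {
  // Simplified stand-in for the post's Node: just the set of dependent ids.
  final case class N(deps: Set[String])

  // Add node `id`, then thread a fresh map through the fold, each step
  // replacing one argument node with a copy that lists `id` as a dependent.
  def register(dag: Map[String, N], id: String, args: Set[String]): Map[String, N] =
    args.foldLeft(dag + (id -> N(Set.empty))) { (d, arg) =>
      d.get(arg) match {
        case Some(node @ N(deps)) => d + (arg -> node.copy(deps = deps + id))
        case None                 => sys.error(s"unknown argument: $arg")
      }
    }

  def run(): (Map[String, N], Map[String, N]) = {
    val dag0 = Map("a" -> N(Set.empty), "b" -> N(Set.empty))
    val dag1 = register(dag0, "sum", Set("a", "b"))
    (dag0, dag1)
  }
}
</code></pre></div>
<p>After <code>run()</code>, the new map records <code>sum</code> as a dependent of both <code>a</code> and <code>b</code>, while the original map still knows nothing about it.</p>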
<p>Now to <code>sully</code>. This function will follow the trail of <code>deps</code>, marking every signal node it finds as
requiring calculation. This technique is often called "dirty bit propagation," so the name of the function
is moderately witty but needlessly obscure. Too late to change it though. "Dirtiness" in our case will be
indicated by the absence of a value, i.e. the <code>Option[T]</code> being <code>None</code>.</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="n">def</span><span class="w"> </span><span class="n">sully</span><span class="p">(</span><span class="nl">id</span><span class="p">:</span><span class="w"> </span><span class="n">Id</span><span class="o">[</span><span class="n">_</span><span class="o">]</span><span class="p">)</span><span class="w"> </span><span class="err">:</span><span class="w"> </span><span class="k">Map</span><span class="o">[</span><span class="n">Id[_</span><span class="o">]</span><span class="p">,</span><span class="n">Node</span><span class="o">[</span><span class="n">_</span><span class="o">]</span><span class="err">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="err">{</span><span class="w"></span>
<span class="w"> </span><span class="n">dag</span><span class="p">.</span><span class="k">get</span><span class="p">(</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="k">match</span><span class="w"> </span><span class="err">{</span><span class="w"></span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="ow">Some</span><span class="p">(</span><span class="n">Signal</span><span class="p">(</span><span class="n">_</span><span class="p">,</span><span class="n">_</span><span class="p">,</span><span class="n">_</span><span class="p">,</span><span class="k">None</span><span class="p">))</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">dag</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">node</span><span class="w"> </span><span class="k">is</span><span class="w"> </span><span class="n">already</span><span class="w"> </span><span class="n">dirty</span><span class="w"></span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="ow">Some</span><span class="p">(</span><span class="n">s</span><span class="w"> </span><span class="err">@</span><span class="w"> </span><span class="n">Signal</span><span class="p">(</span><span class="n">deps</span><span class="p">,</span><span class="n">_</span><span class="p">,</span><span class="n">_</span><span class="p">,</span><span class="ow">Some</span><span class="p">(</span><span class="n">_</span><span class="p">)))</span><span class="w"> </span><span class="o">=></span><span class="w"></span>
<span class="w"> </span><span class="n">deps</span><span class="p">.</span><span class="n">foldLeft</span><span class="p">(</span><span class="n">dag</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="n">id</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">s</span><span class="p">.</span><span class="n">copy</span><span class="p">(</span><span class="k">value</span><span class="o">=</span><span class="k">None</span><span class="p">)))</span><span class="w"> </span><span class="err">{</span><span class="p">(</span><span class="n">d</span><span class="p">,</span><span class="n">i</span><span class="p">)</span><span class="o">=></span><span class="n">d</span><span class="p">.</span><span class="n">sully</span><span class="p">(</span><span class="n">i</span><span class="p">)</span><span class="w"> </span><span class="err">}</span><span class="w"></span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="ow">Some</span><span class="p">(</span><span class="nf">Var</span><span class="p">(</span><span class="n">deps</span><span class="p">,</span><span class="n">_</span><span class="p">))</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">deps</span><span class="p">.</span><span class="n">foldLeft</span><span class="p">(</span><span class="n">this</span><span class="p">.</span><span class="n">dag</span><span class="p">)((</span><span class="n">d</span><span class="p">,</span><span class="n">i</span><span class="p">)</span><span class="o">=></span><span class="n">d</span><span class="p">.</span><span class="n">sully</span><span class="p">(</span><span class="n">i</span><span class="p">))</span><span class="w"></span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="k">None</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="vm">???</span><span class="w"></span>
<span class="w"> </span><span class="err">}</span><span class="w"></span>
<span class="w"> </span><span class="err">}</span><span class="w"></span>
</code></pre></div>
<p>This is a classic recursive depth-first search, except that we're using <code>foldLeft</code> to iterate over children,
and the recursion occurs in the folding function.
Note that the <code>Some(Var)</code> match should occur
exactly once, since only <code>Signal</code> nodes are dependents.</p>
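<p>For anyone who wants to poke at the fold-with-recursion pattern in isolation, here is a minimal sketch.
It is not the code from this post: the <code>Cell</code> type and the <code>String</code> ids are made up for the
illustration, there is no <code>Var</code>/<code>Signal</code> distinction (so the starting node is itself marked dirty),
and an unknown id simply returns the map unchanged.</p>
<div class="highlight"><pre><span></span><code>// Hypothetical, simplified node: String ids instead of Id[_], no Var/Signal split.
final case class Cell(deps: Set[String], value: Option[Double])

// Mark `id` dirty, then recurse into its dependents inside the folding function.
def sully(dag: Map[String, Cell], id: String): Map[String, Cell] =
  dag.get(id) match {
    case Some(Cell(_, None)) => dag // already dirty, so the search stops here
    case Some(cell) =>
      cell.deps.foldLeft(dag + (id -> cell.copy(value = None))) {
        (d, dep) => sully(d, dep)
      }
    case None => dag // unknown id: nothing to invalidate (an assumption of this sketch)
  }

val before = Map(
  "a" -> Cell(Set("b"), Some(1.0)),
  "b" -> Cell(Set("c"), Some(2.0)),
  "c" -> Cell(Set.empty, Some(3.0)),
  "x" -> Cell(Set.empty, Some(9.0))
)
val after = sully(before, "a") // a, b and c lose their values; x and `before` are untouched
</code></pre></div>
<p>The accumulator of the fold is the map itself, so each recursive call sees the invalidations made by its siblings,
and the early exit on an already dirty node keeps shared dependents from being walked twice.</p>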
<p>There are only a few more pieces to put in place. The following function will
ensure that a node in which we're interested has been calculated:</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="n">def</span><span class="w"> </span><span class="n">ensure</span><span class="p">(</span><span class="nl">id</span><span class="p">:</span><span class="w"> </span><span class="n">Id</span><span class="o">[</span><span class="n">_</span><span class="o">]</span><span class="p">)</span><span class="w"> </span><span class="err">:</span><span class="w"> </span><span class="k">Map</span><span class="o">[</span><span class="n">Id[_</span><span class="o">]</span><span class="p">,</span><span class="n">Node</span><span class="o">[</span><span class="n">_</span><span class="o">]</span><span class="err">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">dag</span><span class="p">.</span><span class="k">get</span><span class="p">(</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="k">match</span><span class="w"> </span><span class="err">{</span><span class="w"></span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="ow">Some</span><span class="p">(</span><span class="nf">Var</span><span class="p">(</span><span class="n">_</span><span class="p">,</span><span class="n">_</span><span class="p">))</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="ow">Some</span><span class="p">(</span><span class="n">Signal</span><span class="p">(</span><span class="n">_</span><span class="p">,</span><span class="n">_</span><span class="p">,</span><span class="n">_</span><span class="p">,</span><span class="ow">Some</span><span class="p">(</span><span class="n">_</span><span class="p">)))</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="k">None</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">dag</span><span class="w"></span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="ow">Some</span><span class="p">(</span><span class="n">node</span><span class="w"> </span><span class="err">@</span><span class="w"> </span><span class="n">Signal</span><span class="p">(</span><span class="n">_</span><span class="p">,</span><span class="n">args</span><span class="p">,</span><span class="n">_</span><span class="p">,</span><span class="k">None</span><span class="p">))</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="err">{</span><span class="w"></span>
<span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="n">dag2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">foldLeft</span><span class="p">(</span><span class="n">dag</span><span class="p">)((</span><span class="n">d</span><span class="p">,</span><span class="n">i</span><span class="p">)</span><span class="o">=></span><span class="n">d</span><span class="p">.</span><span class="n">ensure</span><span class="p">(</span><span class="n">i</span><span class="p">))</span><span class="w"></span>
<span class="w"> </span><span class="n">dag2</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="n">id</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">node</span><span class="p">.</span><span class="n">copy</span><span class="p">(</span><span class="k">value</span><span class="o">=</span><span class="ow">Some</span><span class="p">(</span><span class="n">node</span><span class="p">.</span><span class="n">fn</span><span class="p">(</span><span class="n">dag2</span><span class="p">))))</span><span class="w"></span>
<span class="w"> </span><span class="err">}}</span><span class="w"></span>
</code></pre></div>
<p>This is also a depth-first search, but it searches back along the <code>args</code> trail rather than
forward along <code>deps</code>.
When it finds a <code>Signal</code> with a <code>None</code> value, it recurs to ensure all the
arguments are available and then evaluates the function.</p>
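<p>The same idea can be tried out on its own with a stripped-down node type. This is only a sketch under stated
assumptions: <code>Calc</code>, the <code>String</code> ids and the <code>leaf</code> helper are invented for the
illustration, and values are plain <code>Double</code>s.</p>
<div class="highlight"><pre><span></span><code>// Hypothetical, simplified node: `args` are the inputs, `fn` recomputes from an evaluated map.
final case class Calc(args: Set[String], fn: Map[String, Calc] => Double, value: Option[Double])

def ensure(dag: Map[String, Calc], id: String): Map[String, Calc] =
  dag.get(id) match {
    case Some(node @ Calc(args, _, None)) =>
      // First make every argument available, threading the map through the fold ...
      val ready = args.foldLeft(dag)((d, a) => ensure(d, a))
      // ... and only then evaluate this node against the completed map.
      ready + (id -> node.copy(value = Some(node.fn(ready))))
    case _ => dag // already valued, or not present at all
  }

// Hypothetical helper for a node that already has its value.
def leaf(v: Double) = Calc(Set.empty, _ => v, Some(v))

val graph = Map(
  "a" -> leaf(2.0),
  "b" -> leaf(3.0),
  "c" -> Calc(Set("a", "b"), m => m("a").value.get + m("b").value.get, None),
  "d" -> Calc(Set("c"), m => m("c").value.get * 10.0, None)
)
val evaluated = ensure(graph, "d") // fills in c first, then d; `graph` itself is unchanged
</code></pre></div>
<p>Asking for <code>d</code> forces <code>c</code> along the way, and both results are cached in the returned map, so a
second <code>ensure</code> on it would fall straight through to the default case.</p>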
<p>Now we need a way to get information out of the graph. Since doing so may trigger a calculation
and in so doing change the graph, our method will return a tuple <code>(Dag,T)</code>:</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="n">def</span><span class="w"> </span><span class="k">get</span><span class="o">[</span><span class="n">T</span><span class="o">]</span><span class="p">(</span><span class="nl">id</span><span class="p">:</span><span class="n">Id</span><span class="o">[</span><span class="n">T</span><span class="o">]</span><span class="p">)</span><span class="w"> </span><span class="err">:</span><span class="w"> </span><span class="p">(</span><span class="n">Dag</span><span class="p">,</span><span class="n">T</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="err">{</span><span class="w"></span>
<span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="n">dag2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ensure</span><span class="p">(</span><span class="n">id</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">dag2</span><span class="p">.</span><span class="k">get</span><span class="p">(</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="k">match</span><span class="w"> </span><span class="err">{</span><span class="w"></span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="ow">Some</span><span class="p">(</span><span class="n">Signal</span><span class="p">(</span><span class="n">_</span><span class="p">,</span><span class="n">_</span><span class="p">,</span><span class="n">_</span><span class="p">,</span><span class="ow">Some</span><span class="p">(</span><span class="n">v</span><span class="p">)))</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">(</span><span class="n">dag2</span><span class="p">,</span><span class="n">v</span><span class="p">.</span><span class="n">asInstanceOf</span><span class="o">[</span><span class="n">T</span><span class="o">]</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="ow">Some</span><span class="p">(</span><span class="nf">Var</span><span class="p">(</span><span class="n">_</span><span class="p">,</span><span class="n">v</span><span class="p">))</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">(</span><span class="n">dag2</span><span class="p">,</span><span class="n">v</span><span class="p">.</span><span class="n">asInstanceOf</span><span class="o">[</span><span class="n">T</span><span class="o">]</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="vm">???</span><span class="w"></span>
<span class="w"> </span><span class="err">}</span><span class="w"></span>
<span class="w"> </span><span class="err">}</span><span class="w"></span>
<span class="w"> </span><span class="n">def</span><span class="w"> </span><span class="n">getv</span><span class="o">[</span><span class="n">T</span><span class="o">]</span><span class="p">(</span><span class="nl">id</span><span class="p">:</span><span class="n">Id</span><span class="o">[</span><span class="n">T</span><span class="o">]</span><span class="p">)</span><span class="w"> </span><span class="err">:</span><span class="w"> </span><span class="n">T</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">get</span><span class="p">(</span><span class="n">id</span><span class="p">).</span><span class="n">_2</span><span class="w"></span>
</code></pre></div>
<p>The <code>asInstanceOf[T]</code> coercion is hideous, but I can't see a way around it using
Scala's type system. The declaration of <code>Map[Id[_],Node[_]]</code> does not ensure
matching <code>Id</code> and <code>Node</code> types, and</p>
<div class="highlight"><pre><span></span><code><span class="n">type</span><span class="w"> </span><span class="n">DagMap</span><span class="o">[</span><span class="n">T</span><span class="o">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">Map</span><span class="o">[</span><span class="n">Id[T</span><span class="o">]</span><span class="p">,</span><span class="n">Node</span><span class="o">[</span><span class="n">T</span><span class="o">]</span><span class="err">]</span><span class="w"></span>
</code></pre></div>
<p>would have restricted the map to holding only one type. I will look into
<a href="https://github.com/milessabin/shapeless/wiki/Feature-overview:-shapeless-2.0.0#heterogenous-maps">shapeless' heterogeneous maps</a>,
which can enforce relationships between the types of keys and values, but I haven't seen any examples where those types
were themselves polymorphic.</p>
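<p>Short of shapeless, one way to at least quarantine the cast is to hide the untyped map behind a small facade whose
only entry point pairs a key of type <code>Key[T]</code> with a value of type <code>T</code>. The following is a sketch,
not what the post's <code>Dag</code> does; <code>Key</code> and <code>TypedMap</code> are hypothetical names, and
<code>Key</code> is invariant here, unlike the covariant <code>Id</code>.</p>
<div class="highlight"><pre><span></span><code>// Hypothetical names throughout: Key and TypedMap are not part of the Dag code.
final case class Key[T](name: String)

final class TypedMap private (underlying: Map[Key[_], Any]) {
  // The only way in pairs a Key[T] with a T ...
  def put[T](k: Key[T], v: T): TypedMap = new TypedMap(underlying.updated(k, v))
  // ... so the single cast on the way out is safe, provided nobody forges a key.
  def get[T](k: Key[T]): Option[T] = underlying.get(k).map(_.asInstanceOf[T])
}
object TypedMap { val empty: TypedMap = new TypedMap(Map.empty) }

val n = Key[Int]("n")
val s = Key[String]("s")
val m = TypedMap.empty.put(n, 42).put(s, "hello")
val total: Int = m.get(n).getOrElse(0) + 1
// m.put(n, "oops") is rejected by the compiler, because n is a Key[Int]
</code></pre></div>
<p>The hole is still there: case-class equality ignores the type parameter, so <code>Key[String]("n")</code> would
happily retrieve the <code>Int</code> and blow up later. The cast has been moved to one place, not eliminated.</p>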
<p>Before we party, it will be convenient to sugar up the <code>Id</code> class a bit.</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="k">class</span><span class="w"> </span><span class="n">Id</span><span class="o">[</span><span class="n">+T</span><span class="o">]</span><span class="p">(</span><span class="nl">id</span><span class="p">:</span><span class="n">String</span><span class="p">)</span><span class="w"> </span><span class="err">{</span><span class="w"></span>
<span class="w"> </span><span class="n">def</span><span class="w"> </span><span class="n">this</span><span class="p">()</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">this</span><span class="p">(</span><span class="n">UUID</span><span class="p">.</span><span class="n">randomUUID</span><span class="p">().</span><span class="n">toString</span><span class="p">())</span><span class="w"> </span>
<span class="w"> </span><span class="n">override</span><span class="w"> </span><span class="n">def</span><span class="w"> </span><span class="n">toString</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">id</span><span class="w"></span>
<span class="w"> </span><span class="n">def</span><span class="w"> </span><span class="n">apply</span><span class="p">(</span><span class="n">implicit</span><span class="w"> </span><span class="nl">dag</span><span class="p">:</span><span class="w"> </span><span class="n">Dag</span><span class="p">)</span><span class="w"> </span><span class="err">:</span><span class="w"> </span><span class="n">T</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">dag</span><span class="p">.</span><span class="n">dag</span><span class="p">.</span><span class="k">get</span><span class="p">(</span><span class="n">this</span><span class="p">)</span><span class="w"> </span><span class="k">match</span><span class="w"> </span><span class="err">{</span><span class="w"></span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="ow">Some</span><span class="p">(</span><span class="nf">Var</span><span class="p">(</span><span class="n">_</span><span class="p">,</span><span class="n">v</span><span class="p">))</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">v</span><span class="p">.</span><span class="n">asInstanceOf</span><span class="o">[</span><span class="n">T</span><span class="o">]</span><span class="w"></span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="ow">Some</span><span class="p">(</span><span class="n">Signal</span><span class="p">(</span><span class="n">_</span><span class="p">,</span><span class="n">_</span><span class="p">,</span><span class="n">_</span><span class="p">,</span><span class="ow">Some</span><span class="p">(</span><span class="n">v</span><span class="p">)))</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">v</span><span class="p">.</span><span class="n">asInstanceOf</span><span class="o">[</span><span class="n">T</span><span class="o">]</span><span class="w"></span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="vm">???</span><span class="w"></span>
<span class="w"> </span><span class="err">}</span><span class="w"></span>
<span class="w"> </span><span class="err">}</span><span class="w"></span>
</code></pre></div>
<p>This <code>apply</code> method for retrieving already calculated values
is meant to be used inside signal functions, making them somewhat prettier:</p>
<div class="highlight"><pre><span></span><code> set(c,Set(a,b),{<> => a(<>) + b(<>)})
</code></pre></div>
<p>The <code><></code> is a valid Scala identifier that is shorter than <code>dag</code> and stands out better.</p>
<p>Putting it all together,</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Id</span><span class="o">[</span><span class="n">Double</span><span class="o">]</span><span class="p">(</span><span class="ss">"a"</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Id</span><span class="o">[</span><span class="n">Double</span><span class="o">]</span><span class="p">(</span><span class="ss">"b"</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Id</span><span class="o">[</span><span class="n">Double</span><span class="o">]</span><span class="p">(</span><span class="ss">"c"</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="n">dag</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">Dag</span><span class="p">().</span><span class="k">set</span><span class="p">(</span><span class="n">a</span><span class="p">,</span><span class="mf">2.0</span><span class="p">).</span><span class="k">set</span><span class="p">(</span><span class="n">b</span><span class="p">,</span><span class="mf">3.0</span><span class="p">).</span><span class="w"></span>
<span class="w"> </span><span class="k">set</span><span class="p">(</span><span class="n">c</span><span class="p">,</span><span class="k">Set</span><span class="p">(</span><span class="n">a</span><span class="p">,</span><span class="n">b</span><span class="p">),</span><span class="err">{</span><span class="o"><></span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">a</span><span class="p">(</span><span class="o"><></span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">b</span><span class="p">(</span><span class="o"><></span><span class="p">)</span><span class="err">}</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="n">dag2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">dag</span><span class="p">.</span><span class="k">set</span><span class="p">(</span><span class="n">a</span><span class="p">,</span><span class="mf">5.0</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">println</span><span class="p">(</span><span class="ss">"Before valuation:\ndag="</span><span class="o">+</span><span class="n">dag</span><span class="o">+</span><span class="ss">"\ndag2="</span><span class="o">+</span><span class="n">dag2</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">println</span><span class="p">(</span><span class="ss">"After valuation:\n"</span><span class="o">+</span><span class="n">dag</span><span class="p">.</span><span class="k">get</span><span class="p">(</span><span class="n">c</span><span class="p">)</span><span class="o">+</span><span class="ss">"\n"</span><span class="o">+</span><span class="n">dag2</span><span class="p">.</span><span class="k">get</span><span class="p">(</span><span class="n">c</span><span class="p">))</span><span class="w"></span>
</code></pre></div>
<p>you can see that we've replicated the example, but without global state or randomly throwing
away information:</p>
<div class="highlight"><pre><span></span><code><span class="kr">Before</span> <span class="n">valuation</span><span class="o">:</span>
<span class="n">dag</span><span class="o">=</span><span class="n">Dag</span><span class="p">(</span><span class="nf">Map</span><span class="p">(</span><span class="n">a</span> <span class="o">-></span> <span class="n">Var</span><span class="p">(</span><span class="nf">Set</span><span class="p">(</span><span class="n">c</span><span class="p">),</span><span class="mf">2.0</span><span class="p">),</span> <span class="n">b</span> <span class="o">-></span> <span class="n">Var</span><span class="p">(</span><span class="nf">Set</span><span class="p">(</span><span class="n">c</span><span class="p">),</span><span class="mf">3.0</span><span class="p">),</span> <span class="n">c</span> <span class="o">-></span> <span class="n">Signal</span><span class="p">(</span><span class="nf">Set</span><span class="p">(),</span><span class="nf">Set</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">),</span><span class="o"><</span><span class="n">function1</span><span class="o">></span><span class="p">,</span><span class="n">None</span><span class="p">)))</span>
<span class="n">dag2</span><span class="o">=</span><span class="n">Dag</span><span class="p">(</span><span class="nf">Map</span><span class="p">(</span><span class="n">a</span> <span class="o">-></span> <span class="n">Var</span><span class="p">(</span><span class="nf">Set</span><span class="p">(</span><span class="n">c</span><span class="p">),</span><span class="mf">5.0</span><span class="p">),</span> <span class="n">b</span> <span class="o">-></span> <span class="n">Var</span><span class="p">(</span><span class="nf">Set</span><span class="p">(</span><span class="n">c</span><span class="p">),</span><span class="mf">3.0</span><span class="p">),</span> <span class="n">c</span> <span class="o">-></span> <span class="n">Signal</span><span class="p">(</span><span class="nf">Set</span><span class="p">(),</span><span class="nf">Set</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">),</span><span class="o"><</span><span class="n">function1</span><span class="o">></span><span class="p">,</span><span class="n">None</span><span class="p">)))</span>
<span class="n">After</span> <span class="n">valuation</span><span class="o">:</span>
<span class="p">(</span><span class="n">Dag</span><span class="p">(</span><span class="nf">Map</span><span class="p">(</span><span class="n">a</span> <span class="o">-></span> <span class="n">Var</span><span class="p">(</span><span class="nf">Set</span><span class="p">(</span><span class="n">c</span><span class="p">),</span><span class="mf">2.0</span><span class="p">),</span> <span class="n">b</span> <span class="o">-></span> <span class="n">Var</span><span class="p">(</span><span class="nf">Set</span><span class="p">(</span><span class="n">c</span><span class="p">),</span><span class="mf">3.0</span><span class="p">),</span> <span class="n">c</span> <span class="o">-></span> <span class="n">Signal</span><span class="p">(</span><span class="nf">Set</span><span class="p">(),</span><span class="nf">Set</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">),</span><span class="o"><</span><span class="n">function1</span><span class="o">></span><span class="p">,</span><span class="n">Some</span><span class="p">(</span><span class="mf">5.0</span><span class="p">)))),</span><span class="mf">5.0</span><span class="p">)</span>
<span class="p">(</span><span class="n">Dag</span><span class="p">(</span><span class="nf">Map</span><span class="p">(</span><span class="n">a</span> <span class="o">-></span> <span class="n">Var</span><span class="p">(</span><span class="nf">Set</span><span class="p">(</span><span class="n">c</span><span class="p">),</span><span class="mf">5.0</span><span class="p">),</span> <span class="n">b</span> <span class="o">-></span> <span class="n">Var</span><span class="p">(</span><span class="nf">Set</span><span class="p">(</span><span class="n">c</span><span class="p">),</span><span class="mf">3.0</span><span class="p">),</span> <span class="n">c</span> <span class="o">-></span> <span class="n">Signal</span><span class="p">(</span><span class="nf">Set</span><span class="p">(),</span><span class="nf">Set</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">),</span><span class="o"><</span><span class="n">function1</span><span class="o">></span><span class="p">,</span><span class="n">Some</span><span class="p">(</span><span class="mf">8.0</span><span class="p">)))),</span><span class="mf">8.0</span><span class="p">)</span>
</code></pre></div>
<h2>A more complicated example</h2>
<p>Now a more complicated example, one that is popular when demoing secdb-like systems, but this time
with a rich-man's "diddle scope." We'll construct a graph to price a call option using the
discredited <a href="http://en.wikipedia.org/wiki/Black%E2%80%93Scholes_model">Black Scholes</a> model:</p>
<p><img alt="" src="images/bs-formula.png"></p>
<p>S is a stock price and K is the "strike" ... think of this as some random formula if you like.
We can express it in graph form as follows:</p>
<div class="highlight"><pre><span></span><code> def N(x:Double) = (1.0 + Erf.erf(x/Math.sqrt(2.0)))/2.0
val opt = new Dag().
set(K,101.0).
set(S,100.0).
set(T,1.0).
set(r,0.01).
set(sigma,0.35).
set(d1,Set(S,T,K,r,sigma), <> => ((Math.log(S(<>)/K(<>))) + (r(<>)+(sigma(<>)*sigma(<>)/2.0))*T(<>)) /(sigma(<>)*Math.sqrt(T(<>)))).
set(d2,Set(d1,T,sigma), <> => d1(<>) - (sigma(<>)*Math.sqrt(T(<>)))).
set(c,Set(S,T,K,r,d1,d2), <> => S(<>)*N(d1(<>)) - K(<>)*Math.exp(-r(<>)*T(<>))*N(d2(<>)))
</code></pre></div>
<p>Not much going on here, but now for some fun.</p>
<p>It turns out that the derivative $\partial c/\partial S$ is useful to know, as it represents the amount of
stock you'd need in order to hedge (exactly counterbalance) fluctuations in the option price. In this simple
case, it's easy to take the derivative analytically, but, in general, we might calculate it by finite
differencing. In the diddle scope paradigm, that would be accomplished with an unsightly bump and restore,
but we can do much better. Let's add a node to the graph:</p>
<div class="highlight"><pre><span></span><code> <span class="sx">.set(delta,Set(c,S),</span> <span class="sx"><></span> <span class="p">=</span><span class="sx">></span> <span class="sx">(<>.set(S,S(<>)+0.01).getv(c)</span> <span class="sx">-</span> <span class="sx">c(<>))/0.01)</span>
<span class="sx">println(opt.getv(c),opt.getv(delta))</span>
<span class="sx">//</span> <span class="sx">(13.894175181425787,0.5695720583588582)</span>
</code></pre></div>
<p>The anonymous function is retrieving the current value of S, creating a new graph where it's been increased
by 0.01 and extracting c's value reactively. It does this perfectly safely, with no possibility of corrupting
the original graph and no need to restore it explicitly. We could even have dispatched the delta calculation
to another thread, completely without worry. We get all this, because the graph is just a value!</p>
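<p>As a sanity check on the number printed above, the bump is a plain forward difference with $h = 0.01$, and the
standard analytic result for this model is $\partial c/\partial S = N(d_1)$. Plugging in the inputs from the graph
(my own arithmetic, rounded):
$$ \delta \approx \frac{c(S+h) - c(S)}{h}, \qquad d_1 = \frac{\ln(S/K) + (r + \sigma^2/2)\,T}{\sigma\sqrt{T}} = \frac{\ln(100/101) + 0.07125}{0.35} \approx 0.175 $$
which gives $N(d_1) \approx 0.5695$. The finite-difference figure differs from that by less than $10^{-4}$, about what one
would expect from the truncation error of a one-sided bump of this size.</p>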
<h2><s>Rueful Conclusion</s> Never Graduate</h2>
<p>I believe that the foregoing could be industrialized, and I can imagine many useful applications.
However, I don't know that this is actually new, because I can't understand the literature
(for example, <a href="http://www.cs.rit.edu/~mtf/student-resources/20103_amsden_istudy.pdf">this survey</a>)
and I can't understand the literature, because it's all in Haskell. It's time to L[M]AHFGG.</p>
<p>(But I promise to post the clojure version first.)</p>xkcd 1313 by simulated annealing in Clojure2014-01-11T13:48:00-05:002014-01-11T13:48:00-05:00Peter Fraenkeltag:blog.podsnap.com,2014-01-11:/xkcd1313.html<p>Like any red-blooded American, I find <a href="http://xkcd.com/1313/">regex golf</a> fascinating. The idea, to
paraphrase the comic strip, is to find a regular expression that matches all members of a group of related terms, but not
any members of a different group. The hover text on the strip suggests a regex that matches all winning presidents.</p>
<p>Peter Norvig <a href="http://nbviewer.ipython.org/url/norvig.com/ipython/xkcd1313.ipynb">went to town</a> on this,
first clarifying the
problem and then building an algorithm to search for the shortest
solution that matches all mainstream presidential candidates who eventually
won, but not such candidates who never won. (So Perot, Nader, Anderson et al don't figure.) Anyway, read his post before
continuing to read this, if you haven't already.</p>
<p>Peter's algorithm is best described by him, but, briefly, it creates a pool of regex fragments, like <code>/r.e$</code>, and
repeatedly pulls from it the most effective ones, in the sense that they match as many winners as possible, don't match
any losers and contribute the least to the total length of the regex when they're all <code>|</code>'d together.</p>
<p>I started to wonder whether the problem could be approached as a more general optimization, with the specifics of the
problem abstracted out into some kind of objective function. Specifically, I was thinking of simulated annealing, which I've
always liked because it has a nice analogy in the physical world and which has been getting a lot of press recently as a
possible
application of <a href="http://en.wikipedia.org/wiki/D-Wave_Systems">quantum computing</a>. As always, it also seemed interesting to
try implementing this in Clojure, especially because one doesn't (or I didn't)
think of iterative optimizations over
complex state as natural candidates for a functional lisp.</p>
<p>All the code is on <a href="https://github.com/pnf/clojure-playground/blob/master/src/clj/playground/presidents.clj">github</a>, but
some of it will be recapitulated inline below.</p>
<h2>Simulated Annealing</h2>
<p>In simulated annealing, you have some sort of
state (e.g. the direction of magnetic spins in a crystal lattice) and a model for the energy of that state (e.g. a positive contribution
for neighboring spins that are aligned). However the state and its energy are defined, we assume the
<a href="http://en.wikipedia.org/wiki/Boltzmann_distribution">Boltzmann distribution</a> for the likelihood of any given state at any
given temperature:
$$ p \sim e^{- E(S)/k_B T} $$
where $k_B$ is the Boltzmann constant. Sliding past the detailed physics for a moment, the basic idea is that, at high
temperatures, many different fluctuating states are possible even if their energies are high, while, as $T$ gets small, there is
greater punishment of higher energies. Starting at a high $T$ and gradually lowering it, we "anneal" into the lowest possible
energy state.</p>
<p><a href="http://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm">Metropolis-Hastings</a> is a Markov-Chain
Monte Carlo algorithm for generating random states that satisfy a complicated probability distribution. It is well suited for the
Boltzmann distribution, which was its earliest application. The gist (explained in terms of Boltzmann) is that you modify the
state $S$ in some (almost) arbitrary fashion to $S' = f_r(S)$ (where the $r$ subscript indicates that this transformation
has some randomness to it) and then use the following criteria to decide whether to accept the transformed state
or to try transforming again:</p>
<ul>
<li>If $E(S') < E(S)$, definitely accept $S'$.</li>
<li>If $E(S') > E(S)$, accept $S'$ with probability $e^{-(E(S') - E(S))/k_B T}$.</li>
</ul>
<p>If you repeat this long enough, the collection of $S$ values will satisfy the Boltzmann distribution.
I said <em>almost</em> any fashion, because the $f_r$ must satisfy some criteria for the algorithm to work at all. For example,
the identity function would be terrible, as would any function that can't reach all possible states $S$. Similarly,
if $f_r$ is too close to identity, you'll explore the space very slowly.</p>
<p>The trick of simulated annealing is to lower $T$ slowly. As the $T$ in the denominator decreases,
the magnitude of the negative exponent gets large, and the probability of accepting an increase
in energy approaches zero. At large $T$, the magnitude of the exponent is reduced, thus increasing
the probability that we'll be able to explore even if we're currently in a local minimum.</p>
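<p>To make the acceptance rule concrete, here is a minimal sketch of it in isolation, taking $k_B=1$. The names <code>accept-prob</code> and <code>accept?</code> are made up for this illustration and are not taken from the code on github.</p>
<div class="highlight"><pre><span></span><code>(defn accept-prob
  "Probability of accepting a move from energy E to energy E2 at
  temperature T, with k_B taken to be 1. Downhill moves get 1.0."
  [E E2 T]
  (min 1.0 (Math/exp (double (/ (- E E2) T)))))

(defn accept?
  "Randomly decide whether to accept the move, using accept-prob."
  [E E2 T]
  (> (accept-prob E E2 T) (rand)))

;; The same uphill step of 5 is likely when hot and very unlikely when cold.
(println (accept-prob 10 15 50.0)) ; exp(-0.1)
(println (accept-prob 10 15 0.5))  ; exp(-10)
</code></pre></div>
<p>Because <code>(rand)</code> is always strictly less than 1, a probability of 1.0 means a downhill move is always accepted, which matches the first bullet above.</p>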
<h2>Energy and state for the regex problem</h2>
<p>For our purposes, we'll use a state that, echoing Peter's algorithm, is a set of regular expressions that will be or'd together
to form our answer. Instead, however, of starting with fragments, we'll be starting with the least subtle solution possible:</p>
<div class="highlight"><pre><span></span><code><span class="nv">xkcd></span> <span class="p">(</span><span class="k">def </span><span class="nv">initial</span> <span class="p">(</span><span class="nb">set </span><span class="p">(</span><span class="nb">map </span><span class="o">#</span><span class="p">(</span><span class="nb">str </span><span class="s">"^"</span> <span class="nv">%</span> <span class="s">"$"</span><span class="p">)</span> <span class="nv">winners</span><span class="p">)))</span>
<span class="nv">xkcd></span> <span class="nv">initial</span>
<span class="o">#</span><span class="p">{</span><span class="s">"^washington$"</span> <span class="s">"^adams$"</span>, <span class="nv">...</span><span class="p">}</span>
</code></pre></div>
<p>The or-concatenation of this set obviously matches the winners and <em>only</em> the winners. Again, we define winners as people
who eventually won:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="k">def </span><span class="nv">winners</span>
<span class="s">"Anyone who ever won the presidency."</span>
<span class="p">(</span><span class="nb">set </span><span class="p">(</span><span class="nf">str/split</span> <span class="s">"washington adams jefferson jefferson madison madison monroe monroe adams jackson jackson vanburen harrison polk taylor pierce buchanan lincoln lincoln grant grant hayes garfield cleveland harrison cleveland mckinley mckinley roosevelt taft wilson wilson harding coolidge hoover roosevelt roosevelt roosevelt roosevelt truman eisenhower eisenhower kennedy johnson nixon nixon carter reagan reagan bush clinton clinton bush bush obama obama"</span> <span class="o">#</span><span class="s">" "</span><span class="p">)))</span>
<span class="p">(</span><span class="k">def </span><span class="nv">losers</span>
<span class="s">"Anyone who ran as a major party candidate but never won"</span>
<span class="p">(</span><span class="nf">sets/difference</span> <span class="p">(</span><span class="nb">set </span><span class="p">(</span><span class="nf">str/split</span> <span class="s">"clinton jefferson adams pinckney pinckney clinton king adams jackson adams clay vanburen vanburen clay cass scott fremont breckinridge mcclellan seymour greeley tilden hancock blaine cleveland harrison bryan bryan parker bryan roosevelt hughes cox davis smith hoover landon wilkie dewey dewey stevenson stevenson nixon goldwater humphrey mcgovern ford carter mondale dukakis bush dole gore kerry mccain romney"</span> <span class="o">#</span><span class="s">" "</span><span class="p">))</span> <span class="nv">winners</span> <span class="p">))</span>
</code></pre></div>
<p>We can easily verify whether a candidate set of regular expressions satisfies the basic criteria:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">check</span> <span class="p">[</span><span class="nv">res</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">re</span> <span class="p">(</span><span class="nf">re-pattern</span> <span class="p">(</span><span class="nb">str</span><span class="nv">/join</span> <span class="s">"|"</span> <span class="nv">res</span><span class="p">))]</span>
<span class="p">(</span><span class="nf">and</span> <span class="p">(</span><span class="nf">every?</span> <span class="o">#</span><span class="p">(</span><span class="nf">re-find</span> <span class="nv">re</span> <span class="nv">%</span><span class="p">)</span> <span class="nv">winners</span><span class="p">)</span>
<span class="p">(</span><span class="nf">not</span> <span class="p">(</span><span class="nf">some</span> <span class="o">#</span><span class="p">(</span><span class="nf">re-find</span> <span class="nv">re</span> <span class="nv">%</span><span class="p">)</span> <span class="nv">losers</span><span class="p">)))))</span>
</code></pre></div>
<p>For lack of anything obviously cleverer, $E(S)$ will just be the length of the regex. Not actually
matching the presidents correctly corresponds to infinite energy, which we'll represent with <code>nil</code>.</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">energy</span> <span class="p">[</span><span class="nv">res</span><span class="p">]</span>
<span class="p">(</span><span class="nb">when </span><span class="p">(</span><span class="nf">check</span> <span class="nv">res</span><span class="p">)</span> <span class="p">(</span><span class="nb">dec </span> <span class="p">(</span><span class="nb">+ </span><span class="p">(</span><span class="nb">count </span><span class="nv">res</span><span class="p">)</span> <span class="p">(</span><span class="nb">reduce + </span><span class="p">(</span><span class="nb">map count </span><span class="nv">res</span><span class="p">))))))</span>
</code></pre></div>
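<p>The arithmetic in <code>energy</code> is just the length of the or'd regex: the characters in each fragment plus one <code>|</code> between each adjacent pair. A quick sanity check, leaving out the <code>check</code> step and using two helper names (<code>formula-length</code>, <code>joined-length</code>) invented for this sketch:</p>
<div class="highlight"><pre><span></span><code>(require '[clojure.string :as str])

(defn formula-length
  "Sum of the fragment lengths plus one | between each adjacent pair."
  [res]
  (dec (+ (count res) (reduce + (map count res)))))

(defn joined-length
  "Length of the regex actually built by joining the fragments with |."
  [res]
  (count (str/join "|" res)))

;; 3 + 2 + 4 characters plus 2 separators: both come to 11.
(let [res #{"a.t" "ho" "r.e$"}]
  (println (formula-length res) (joined-length res)))
</code></pre></div>
<p>The two disagree only for the empty set, where the formula gives -1, but an empty set matches no winners and so never gets past <code>check</code>.</p>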
<p>Now, what sorts of perturbations might be interesting? The first thing I thought of was chopping up a randomly chosen
regex at a randomly chosen position:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">get-rand-gt1</span> <span class="s">"Fetch a re of length at least two, as long as we have at least one"</span>
<span class="p">[</span><span class="nv">res</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">cands</span> <span class="p">(</span><span class="nb">filter </span><span class="o">#</span><span class="p">(</span><span class="nb">> </span><span class="p">(</span><span class="nb">count </span><span class="nv">%</span><span class="p">)</span> <span class="mi">1</span><span class="p">)</span> <span class="nv">res</span><span class="p">)]</span>
<span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nb">seq </span><span class="nv">cands</span><span class="p">)</span> <span class="p">(</span><span class="nf">rand-nth</span> <span class="nv">cands</span><span class="p">)</span> <span class="nv">nil</span><span class="p">)))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">chop</span>
<span class="s">"Randomly chop one regex into two with |"</span>
<span class="p">[</span><span class="nv">res</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">cand</span> <span class="p">(</span><span class="nf">get-rand-gt1</span> <span class="nv">res</span><span class="p">)]</span>
<span class="p">(</span><span class="k">if </span><span class="nv">cand</span>
<span class="p">(</span><span class="nf">sets/union</span> <span class="p">(</span><span class="nb">disj </span><span class="nv">res</span> <span class="nv">cand</span><span class="p">)</span>
<span class="p">(</span><span class="nb">set </span><span class="p">(</span><span class="nb">map </span><span class="nv">str/join</span> <span class="p">(</span><span class="nb">split-at </span><span class="p">(</span><span class="nb">inc </span><span class="p">(</span><span class="nb">rand-int </span><span class="p">(</span><span class="nb">dec </span><span class="p">(</span><span class="nb">count </span><span class="nv">cand</span><span class="p">))))</span> <span class="nv">cand</span> <span class="p">)))))))</span>
</code></pre></div>
<p>This operation should never cause us to miss a winner, but it might cause us to match
a loser - in which case, the energy will be infinite and it will be rejected.</p>
<p>Frequently, we should also try to get rid of a regex entirely, which, if it succeeds,
will definitely decrease the total length.</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">yank</span> <span class="p">[</span><span class="nv">res</span><span class="p">]</span> <span class="p">(</span><span class="nf">disj</span> <span class="nv">res</span> <span class="p">(</span><span class="nf">rand-nth</span> <span class="p">(</span><span class="nf">seq</span> <span class="nv">res</span><span class="p">))))</span>
</code></pre></div>
<p>I also wrote transformations to
insert a <code>dot</code> into the middle of one and, not completely orthogonally, <code>decapitate</code> the first
or last character. These exciting functions are on github.</p>
<p>It's especially important that we have at least one transformation that will help us back out of
dead ends. For this, I decided to randomly add back the full regexp for one of the winners, so
we can have a go at chopping it up again.</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">add-back</span> <span class="p">[</span><span class="nv">res</span><span class="p">]</span> <span class="p">(</span><span class="nf">conj</span> <span class="nv">res</span> <span class="p">(</span><span class="nb">str</span> <span class="s">"^"</span> <span class="p">(</span><span class="nf">rand-nth</span> <span class="p">(</span><span class="nf">seq</span> <span class="nv">winners</span><span class="p">))</span> <span class="s">"$"</span><span class="p">)))</span>
</code></pre></div>
<p>This operation can only increase the energy of the state (or leave it unchanged, if that regex is already present), so it
becomes less and less likely to be accepted as the temperature falls.</p>
<p>Finally, everything gets bundled into a single <code>perturb</code> function that picks one of the transformations
randomly and applies it.</p>
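<p>The real <code>perturb</code> is on github; the following is only a guess at its general shape. The helper <code>make-perturb</code> and the toy transformations are hypothetical, chosen so the snippet runs on its own.</p>
<div class="highlight"><pre><span></span><code>(defn make-perturb
  "Return a function that applies one randomly chosen transformation to a state."
  [transforms]
  (fn [state] ((rand-nth transforms) state)))

;; Stand-in transformations on a set of strings, just to exercise the dispatcher.
(def toy-perturb
  (make-perturb [#(conj % "x") #(disj % "a")]))

(println (toy-perturb #{"a" "b"}))
</code></pre></div>
<p>With the functions from this post, the vector would presumably hold <code>chop</code>, <code>yank</code>, <code>add-back</code> and friends. Repeating a function in the vector is a cheap way to make it more likely to be picked.</p>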
<h2>The optimization algorithm</h2>
<p>The strategy is to perturb the state randomly, possibly accept the change, slightly reduce the temperature and repeat.
The temperature reduction will be achieved by repeatedly multiplying by <code>1.0-dT</code>. Without loss of
generality, we take $k_B=1$.</p>
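<p>Multiplying by <code>1.0-dT</code> at every step gives a geometric cooling schedule, $T_n = T_0 (1 - dT)^n$. To get a feel for how slow that is, here is a small sketch; <code>temperature-at</code> and <code>steps-to-reach</code> are illustrative names, not part of the original code.</p>
<div class="highlight"><pre><span></span><code>(defn temperature-at
  "Temperature after n steps of multiplying by (1 - dT)."
  [T0 dT n]
  (* T0 (Math/pow (- 1.0 dT) n)))

(defn steps-to-reach
  "Approximate number of steps needed to cool from T0 down to T-target."
  [T0 dT T-target]
  (long (Math/ceil (/ (Math/log (/ T-target T0))
                      (Math/log (- 1.0 dT))))))

(println (temperature-at 5.0 0.5 1))     ; one halving step: 2.5
(println (steps-to-reach 5.0 1e-7 2.5))  ; roughly 6.9 million steps
</code></pre></div>
<p>So with a <code>dT</code> of 1e-7, merely halving the temperature takes millions of perturbations, which is the kind of patience annealing requires.</p>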
<p>This doesn't take very much code at all!
The <code>steps</code> function produces an infinite lazy sequence of
<code>[S E T]</code> vectors:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">steps</span> <span class="p">[</span><span class="nv">S0</span> <span class="nv">energy</span> <span class="nv">perturb</span> <span class="nv">T</span> <span class="nv">dT</span><span class="p">]</span>
<span class="p">(</span><span class="nf">letfn</span> <span class="p">[(</span><span class="nf">step</span> <span class="p">[[</span><span class="nv">S</span> <span class="nv">E</span> <span class="nv">T</span><span class="p">]]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">S2</span> <span class="p">(</span><span class="nf">perturb</span> <span class="nv">S</span><span class="p">)</span>
<span class="nv">E2</span> <span class="p">(</span><span class="nf">energy</span> <span class="nv">S2</span><span class="p">)</span>
<span class="p">[</span><span class="nv">S</span> <span class="nv">E</span><span class="p">]</span> <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nf">and</span> <span class="nv">E2</span> <span class="p">(</span><span class="nf">></span> <span class="p">(</span><span class="nf">Math/exp</span> <span class="p">(</span><span class="nf">/</span> <span class="p">(</span><span class="nf">-</span> <span class="nv">E</span> <span class="nv">E2</span><span class="p">)</span> <span class="nv">T</span><span class="p">))</span> <span class="p">(</span><span class="nf">rand</span><span class="p">)))</span>
<span class="p">[</span><span class="nv">S2</span> <span class="nv">E2</span><span class="p">]</span> <span class="p">[</span><span class="nv">S</span> <span class="nv">E</span><span class="p">])]</span>
<span class="p">[</span><span class="nv">S</span> <span class="nv">E</span> <span class="p">(</span><span class="nf">*</span> <span class="nv">T</span> <span class="p">(</span><span class="nf">-</span> <span class="mf">1.0</span> <span class="nv">dT</span><span class="p">))]))]</span>
<span class="p">(</span><span class="nb">iterate </span><span class="nv">step</span> <span class="p">[</span><span class="nv">S0</span> <span class="p">(</span><span class="nf">energy</span> <span class="nv">S0</span><span class="p">)</span> <span class="nv">T</span><span class="p">])))</span>
</code></pre></div>
<p>Remember that <code>iterate</code> basically does the <code>cons</code>/<code>lazy-seq</code> shuffle for the case where
the function only needs to consume its own output.</p>
<p>I think that's pretty neat. The code is compact, general, and thanks once again to persistent
data structures, it gets to be purely functional.</p>
<p>To make some sense of this, I wrote something to filter out uninteresting states, displaying every <code>dn</code>th
step, or when the energy is at least <code>dE</code> less than the previous minimum.</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">annotate</span> <span class="p">[</span><span class="nv">steps</span> <span class="nv">dn</span> <span class="nv">dE</span><span class="p">]</span>
<span class="p">(</span><span class="nf">letfn</span> <span class="p">[(</span><span class="nf">annotate*</span> <span class="p">[</span><span class="nv">steps</span> <span class="nv">n</span> <span class="nv">minE</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[[</span><span class="nv">S</span> <span class="nv">E</span> <span class="nv">T</span><span class="p">]</span> <span class="p">(</span><span class="k">first </span><span class="nv">steps</span><span class="p">)</span>
<span class="nv">newMinE</span> <span class="p">(</span><span class="k">if</span> <span class="nv">minE</span> <span class="p">(</span><span class="nb">min</span> <span class="nv">E</span> <span class="nv">minE</span><span class="p">)</span> <span class="nv">E</span><span class="p">)</span>
<span class="nv">out1</span> <span class="p">(</span><span class="k">when </span><span class="p">(</span><span class="nf">and</span> <span class="nv">dE</span> <span class="nv">minE</span> <span class="p">(</span><span class="nf"><=</span> <span class="nv">dE</span> <span class="p">(</span><span class="nf">-</span> <span class="nv">minE</span> <span class="nv">newMinE</span><span class="p">)))</span> <span class="p">(</span><span class="nb">str</span> <span class="s">"*** "</span> <span class="nv">E</span> <span class="s">" "</span> <span class="p">(</span><span class="nb">str</span><span class="nv">/join</span> <span class="s">"|"</span> <span class="nv">S</span><span class="p">)))</span>
<span class="nv">out2</span> <span class="p">(</span><span class="k">when </span><span class="p">(</span><span class="nf">and</span> <span class="nv">dn</span> <span class="p">(</span><span class="nb">zero? </span><span class="p">(</span><span class="nf">mod</span> <span class="nv">n</span> <span class="nv">dn</span><span class="p">)))</span> <span class="p">(</span><span class="nb">str</span> <span class="nv">n</span> <span class="s">" "</span> <span class="p">[</span><span class="nv">S</span> <span class="nv">E</span> <span class="nv">T</span><span class="p">]))</span>
<span class="nv">tail</span> <span class="p">(</span><span class="nf">lazy-seq</span> <span class="p">(</span><span class="nf">annotate*</span> <span class="p">(</span><span class="nb">next</span> <span class="nv">steps</span><span class="p">)</span> <span class="p">(</span><span class="nb">inc </span><span class="nv">n</span><span class="p">)</span> <span class="nv">newMinE</span><span class="p">))]</span>
<span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nf">or</span> <span class="nv">out1</span> <span class="nv">out2</span><span class="p">)</span> <span class="p">(</span><span class="nf">cons</span> <span class="p">(</span><span class="nb">str</span> <span class="nv">out1</span> <span class="nv">out2</span><span class="p">)</span> <span class="nv">tail</span><span class="p">)</span> <span class="nv">tail</span><span class="p">)))]</span>
<span class="p">(</span><span class="nf">annotate*</span> <span class="nv">steps</span> <span class="mi">0</span> <span class="nv">nil</span><span class="p">)))</span>
</code></pre></div>
<p>When and only when there's something interesting to report, we <code>cons</code> an explanatory string to the sequence,
so this sequence is far sparser than the input <code>steps</code>.</p>
<p>I'll invoke a simulation from the REPL with</p>
<div class="highlight"><pre><span></span><code>(doseq [x (annotate (steps initial energy perturb 5.0 0.0000001) nil 1)] (println x))
</code></pre></div>
<p>The <code>doseq</code> forces the lazy sequence to evaluate without holding on to the head. The starting temperature
of 5.0 is not completely arbitrary: we want something such that the probability of an add-back being
accepted is high enough to occur occasionally.</p>
<p>Here's some typical output, with <code>. . .</code> indicating elisions. </p>
<div class="highlight"><pre><span></span><code>*** 329 adams$|^reagan$|^obama$|^mckinley$|^vanburen$|^eisenhower$|^kennedy$|^lincoln$|^truman$|^taylor$|^garfield$|^harrison$|^coolidge$|^grant$|^nixon$|^roosevelt$|^polk$|^madison$|^monroe$|^taft$|^harding$|^hoover$|^cleveland$|^wilson$|^jackson$|^hayes$|^buchanan$|^washington$|^jefferson$|^carter$|^johnson$|^clinton$|^bush$|^pierce$
*** 328 adams$|^reagan$|^obama$|^mckinley$|^vanburen$|truman$|^eisenhower$|^kennedy$|^lincoln$|^taylor$|^garfield$|^harrison$|^coolidge$|^grant$|^nixon$|^roosevelt$|^polk$|^madison$|^monroe$|^taft$|^harding$|^hoover$|^cleveland$|^wilson$|^jackson$|^hayes$|^buchanan$|^washington$|^jefferson$|^carter$|^johnson$|^clinton$|^bush$|^pierce$
. . .
*** 107 rs|r.i.|ln|ru|o...$|i..n|o...e|a.t|ks|hn.|.n..y$|am|ye|..olidge$|.i.so|^c.eve.a.|ie.|bu..|t.n|r..g..|po|a.l
*** 106 rs|ln|ru|o...$|i..n|o...e|a.t|ks|hn.|.n..y$|am|ye|r.i|..olidge$|.i.so|^c.eve.a.|ie.|bu..|t.n|r..g..|po|a.l
*** 105 .olidge$|rs|ru|^taylor$|o...$|i..n|o...e|hn|i..o|a.t|ks|.n..y$|am|ye|r.i|^c.eve.a.|ie.|bu..|t.n|r..g..|po
*** 104 rs|ru|^taylor$|o...$|i..n|o...e|hn|i..o|a.t|ks|.n..y$|am|ye|r.i|olidge$|^c.eve.a.|ie.|bu..|t.n|r..g..|po
. . .
</code></pre></div>
<p>After a few minutes, one of my runs produced</p>
<div class="highlight"><pre><span></span><code>*** 52 n.e|ho|ls|a.t|a..i|j|^n|bu|v.l|ma|a.a|ay.|r.e$|li|po
</code></pre></div>
<p>Not half bad.</p>
<h2>So what?</h2>
<p>Obviously the algorithm could be improved, perhaps significantly, but I decided not to try (yet).
First, it's slightly under the par established by Peter Norvig and so is probably
not terrible.
Second, part of the appeal of this approach was that it wouldn't require too much fiddling.
I don't mean to imply that this is a "better" approach; it burns considerable CPU for essentially
the same quality answer, but it satisfied my goals of abstracting the domain specifics, allowing me
finally to use simulated annealing for something and demonstrating another sort of program that
can be handled gracefully in Clojure.</p>Hack the Wine2014-01-09T23:00:00-05:002014-01-09T23:00:00-05:00Peter Fraenkeltag:blog.podsnap.com,2014-01-09:/hacking_wine.html<p>No, not <a href="http://www.winehq.org/">that wine</a>. If you want to emulate Windows, just type <code>:(){ :|:& };:</code> in a bash session.</p>
<p>This is going to be about the consumption of moderately priced fermented grape
beverages in restaurants. Since my unrefined remarks are likely to upset a few
people, I shall start by insulting them.
There are three archetypes of wine experts:</p>
<ul>
<li>Comic-book Guy: Providing an outlet for obsessive accumulation of
arcane knowledge.</li>
<li>Barney: Assuring you that everybody’s special and nobody’s
wrong.</li>
<li>Ivan Illych: Promoting a lifestyle that demonstrates how much money
you have but presages a painful death.</li>
</ul>
<p>To this list, Mr. Podsnap would like to add his illustrious
moniker. He will approach the topic from a life-hacking perspective - a
rational, compact framework for reliably enjoying oneself - while also
contributing to the world's store of ironically ambiguous snark and
slapdash statistical reasoning.</p>
<p>Ranked in decreasing ratio of reward to effort, here's what you must do:</p>
<ul>
<li>
<p>Don't "relax." Even if the wine doesn't make you nervous,
it's a big scary world, and there must be something you should
worry about.</p>
</li>
<li>
<p>Find out what “corked” wine smells like. About 7-10% of all
bottles have been ruined by fungi that grow on the cork and
produce nasty chlorinated compounds. If the wine is corked,
you’re always allowed to send it back. The
smell of corked wine is variously described as “wet dog” or “moldy
newspaper,” but who knows what that means.
Don’t be like me and spend years consuming ruined plonk under the
impression you’re too stupid to appreciate it. The best thing to
do is ask at a restaurant where they seem nice if you can sniff a
bottle that’s been returned for cork taint. Once it clicks,
you’ll never forget the smell.</p>
</li>
<li>
<p>Hack the sommelier! Or whoever the designated wine person seems to be.
The vast majority of them are just eager to
talk to someone who seems interested in this thing they’ve spent
years studying, but instead they have to suck up to pompous
twits trying to impress their dates.
Most have no interest in judging you or even in what you spend.
If you<s>socially engineer them</s>'re friendly and engaged, they’ll fall all over themselves
trying to help, let you taste stuff without buying it and
downsell you to cheaper bottles.</p>
</li>
<li>
<p>Detect and defuse villainy. Be alert for any of the following:</p>
<ul>
<li>The list has no vintages on it.</li>
<li>Someone says, "whaddya like, dry, full bodied?"</li>
<li>You're assured that "everyone loves this one."</li>
<li>You get pressured to choose something more expensive.</li>
</ul>
<p>For the 100% markup over retail they're charging, you should be treated
very nicely. Should any of the above occur, just say no. Get a beer. Or leave.</p>
</li>
<li>
<p>Alternatively, bring your own bottle as a backup. Most restaurants
will open it for a “corkage fee” of ten to twenty dollars. There's
something enjoyably hostile about paying \$15 for them to open
something you bought for \$7, because you have so little respect for
their selection.</p>
</li>
<li>
<p>Reject entire regions or varietals out of hand, for reasons
you explain only by rolling your eyes.
It's insanely unlikely that the wine
you would have chosen from a broader slate would have delivered so much
sensate pleasure as to compete with the advantages of a
crisp reduction in mental clutter.</p>
<p>Let us suppose that the enjoyment you
will derive from a particular wine is represented by a number $q$, which
is uniformly distributed between 0 and 1, where 0 is vomitous.
Given $n$ random wines, the probability that
the best of them has quality less than $q$ is $q^n$; the expected quality
of the best of $n$ is $\int_0^1 n q q^{n-1} dq = n/(n+1)$.</p>
<p>If that malarky is working for you,
let me suggest further that the probability distribution for your
actual choice is $(1+b/n)q^{b/n}$. For $n$ significantly above a befuddlement
level $b$, your choice is no better than random, but as $n$ goes down and
you have time to think, the distribution is skewed to the right. The expected value
of your choice is $(1+b/n)/(2+b/n)$; far into the befuddlement regime, the
expectation is as if you chose randomly. So, ignoring a few unpleasant
discretization issues, we have:</p>
<p><img alt="" src="./images/wine-quality.png"></p>
<p>You really don't want more than around ten bottles to choose from. That's high
enough that there's likely to be something good left and low enough that you
aren't too confused to care.</p>
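<p>For the skeptical, a sketch of where those two expectations come from, writing $a = b/n$ for brevity ($a$ is shorthand introduced here, not part of the model above). The best of $n$ independent uniform draws has cumulative distribution $q^n$ and hence density $n q^{n-1}$, while the choice density $(1+a)q^{a}$ integrates to one by construction:
$$ \int_0^1 q \, n q^{n-1} \, dq = \frac{n}{n+1}, \qquad \int_0^1 q \, (1+a) q^{a} \, dq = \frac{1+a}{2+a}. $$
Taking $b = 10$ purely for illustration, ten bottles give $10/11 \approx 0.91$ for the best available and $2/3 \approx 0.67$ for your pick; a hundred bottles give $100/101 \approx 0.99$ and $1.1/2.1 \approx 0.52$, which is barely better than the $1/2$ you would expect from choosing at random.</p>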
</li>
<li>
<p>Forget pairing. It just complicates things and makes less of a difference
than Comic Book Guy would have you believe. Most of the time that
pairing is "difficult," the real answer is beer. As with the previous
item, expressing this with conviction makes you bad-ass.</p>
</li>
<li>
<p>Pick a few good sensory eigenvectors.
Despite the garish profusion of wine adjectives, two dimensions will get you
very close to describing what you like:</p>
<ul>
<li>A sort of outdoorsy, pleasantly decomposing smell of “earth,” which generally
mitigates obvious fruitiness.</li>
<li>The slight trace of vanilla and other cookie ingredients indicative of "oak."</li>
</ul>
<p>You can also talk about things like lavender and pencil lead, the mere mention
of which makes it likely that you'll detect them in whatever you're drinking, even Shasta.
I have
verified such suggestibility by performing unethical experiments on my friends.</p>
</li>
<li>
<p>Now look at the rough correspondence of those dimensions to regions and varietals.
Following my own advice, I've arbitrarily wiped white wines (I mean, come on!) off the map,
so here it is for red wine. I've added
the dimension of "red fruit" vs "black fruit," since it means something to me; the latter
means prunier.</p>
<table>
<thead>
<tr>
<th>Earth, Oak, Black fruit</th>
<th>$\rightarrow$ Wines</th>
</tr>
</thead>
<tbody>
<tr>
<td>0 1 0</td>
<td>Zinfandel, New World Pinot/Grenache/Pinotage</td>
</tr>
<tr>
<td>0 1 1</td>
<td>Shiraz, South-American Cabernet, Petite Syrah, US Syrah/Merlot/Cabernet</td>
</tr>
<tr>
<td>1 0 0</td>
<td>Nebbiolo (incl Barolo), Sangiovese (incl Brunello, Chianti), some French/Spanish Grenache, Gamay (Beaujolais), Trousseau (Arbois)</td>
</tr>
<tr>
<td>1 0 1</td>
<td>Dolcetto, Barbera, French Malbec, Valpolicella, Sagrantino, Nero d'Avola</td>
</tr>
<tr>
<td>1 1 0</td>
<td>French Pinot (Burgundy), some French/Spanish Grenache</td>
</tr>
<tr>
<td>1 1 1</td>
<td>French/Italian/Spanish Cabernet/Merlot incl Bordeaux, Cabernet Franc, French Syrah, Syrah+Grenache+Carignan, Mourvèdre, Priorat, Garnacha+Syrah+Carignan</td>
</tr>
</tbody>
</table>
<p>If you like or dislike something, noting the row it's in will give you an idea of why and
what you might like in the future. Among other
incredibly reductionist lessons to draw are:</p>
<ul>
<li>The European stuff is funkier and not as fruity.</li>
<li>The French like oak more than the Italians.</li>
<li>Grenache seems to be a bit diffuse in this basis. Maybe think about something else.</li>
</ul>
</li>
<li>
<p>Remember the bad times. There are many ways that regional weather can trash
a year's batch of wine. That doesn't mean that everything from a bad year is awful,
but unless your sommelier buddy has a good story for why it's an exception,
don't take the chance.</p>
<p>Here are recent bad years, by region:</p>
<ul>
<li>France: 2011, 2006</li>
<li>Northern Italy: 2008, 2005</li>
<li>Oregon/Washington: 2011, 2010</li>
<li>New Zealand: 2010</li>
<li>Argentina: 2011</li>
</ul>
<p>There are "good years" of course, but they're not all <em>that</em> good, and
the prices get bid way up.</p>
</li>
<li>
<p>As I said, some flavor "notes" are a bit silly, and some, while
less silly, are unlikely to be the determining factor in
your enjoyment.</p>
<ul>
<li>Syrah, Mourvèdre and Sangiovese may have a whiff of cured pork product about them.</li>
<li>Pinot and Nebbiolo can smell like flowers. Pretty flowers.</li>
</ul>
</li>
</ul>
<p>There you are. Go drink something.</p>
<!--
set grid
plot [0:100] x/(1+x) with lines title "Expected value of best wine" , (1+10/x)/(2+10/x) with lines title "Expected value of choice with befuddlement level of 10", (1+20/x)/(2+20/x) with lines title "Expected value of choice with befuddlement level of 20"
-->Notes on Arrogance2013-12-29T16:47:00-05:002013-12-29T16:47:00-05:00Peter Fraenkeltag:blog.podsnap.com,2013-12-29:/arrogance.html<h1>What is Arrogance</h1>
<p>Recently, while reviewing some code, I found myself using the phrase,
"breathtaking arrogance" and later wondering exactly what I
meant. Well, I knew what I meant, but I did wonder what I might be
implying.</p>
<p>I feel reasonably confident that arrogance involves an offensive level
of self-assuredness and claims of superiority. I am less sure</p>
<ol>
<li>about whether privately held (but possibly inferable) belief in one's own superiority counts as arrogance,</li>
<li>of the degree to which arrogance can be collective, or achieved by association with those one believes to be superior,</li>
<li>of the relevance of measurable or generally accepted expertise in
the area where superiority is claimed, and</li>
<li>of the importance of the general prevalence of such expertise.</li>
</ol>
<p>That I did not bother to look up "arrogant" in a dictionary
indicates a certain level of confidence on my part. If my definition
turns out to be correct, or to reasonably approximate what seems to be
the consensus view in this matter, would that make me less arrogant?
Is there such a thing as legitimate arrogance, or does arrogance
reflect only misplaced confidence?</p>
<p>If, on the other hand, I had begun this article with the phrase,
"According to Webster's Dictionary,"
would that be a kind of ostentatious affiliation
with the Genteel Society of Learned Persons?
From there, I could apologize for opening with such a tired
construction, refer archly to the legal insignificance of the name
Webster and daintily confess a weakness for Victorian Capitalization.
All this in a slight Southern accent to set you at ease.</p>
<p>What if I did look it up and lied about it? What if everyone knows
what arrogance is anyway?</p>
<h1>Notes on arrogance</h1>
<h2>Arrogance in software</h2>
<ol>
<li>Unnecessary abstraction. Calling it a monad when you don't have to, especially if it is not a monad but even if it is.</li>
<li>Categorizing other people's abstractions as unnecessary.</li>
<li>Unnecessary nomenclature, especially design patterns, most especially the visitor pattern, which it really
isn't.</li>
<li>Comments advising the reader to go off and study.</li>
<li>Comments advising the reader to study specific documents that are likely to be intimidating.</li>
<li>Comments advising the reader to read an unpublished monograph by Simon Peyton Jones that presupposes knowledge of Haskell.</li>
<li>Comments advising the reader to learn something that the comment writer doesn't know, usually the CAP theorem.</li>
<li>Comments advising the reader to learn something that is not necessary to learn right now, usually the CAP theorem.</li>
<li>Referring to the "so-called" CAP theorem, or never calling it anything but Brewer's theorem.</li>
<li>Any comment with a wikipedia URL in it.</li>
<li>Comments pointing out that there is probably a bug here but not elaborating.</li>
<li>Comments that are initialed by their author, to distinguish them from lesser comments.</li>
<li>Comments containing the word "dubious."</li>
<li>Unexplained, possibly clever one-liners.</li>
<li>One-liners that are explained but as a consequence are no longer one-liners.</li>
<li>Noting that there is no such thing as a stupid question and illustrating this with a cartoon drawing of a
stupid looking person with question marks buzzing about his head.</li>
<li>Noting that there is indeed such a thing as a stupid question but that the stupidity can be mitigated by bothering
someone other than the author.</li>
</ol>
<h2>Arrogance in general competitive endeavor</h2>
<ol>
<li>Perhaps arrogance is necessary to achieve success, subsequent
arrogance concerning which is another matter entirely.</li>
<li>I invent a new programming language, or something I
call a framework. Or something. In many ways, it sucks, but in the
fullness of time it may attain some kind of critical mass, the dangling bugs
will get fixed, an ecosystem will emerge, and the world will be its
oyster.</li>
<li>Much of the criticism of my language (or the
thing that I call a framework) is justified; moreover, the
critics can point to concrete flaws, while I can only put forward
abstract arguments and hypotheticals. On the other hand, I don't
have to play fair. I can lie. I can magnify my qualifications. I
can deride my critics as incompetents. I can bandy about computer
science terminology that makes others feel inadequate. I can
assert that certain bugs are already fixed in the release
candidate.</li>
<li>Years go by. I am considered ornery and arrogant, but this thing I
made is now popular and established, while competing technologies
have faded away. My arrogance allowed me to retain confidence in
the face of difficulty and skepticism, and the proof of the pudding
is in my enormous user base.</li>
<li>More years go by. I command minions. My minions have a reputation
for arrogance, but it serves them well. We collectively
ignore criticisms while honing our revenue model. Bad things seem
to happen to people who criticize us, so very few do.</li>
<li>More years, more minions. At this point, we've ignored so many
criticisms that entire industries have arisen to address them.
Increasingly, malcontents criticize with impunity. It has been
pointed out that we don't know what functors are. The wall has
writing on it.</li>
<li>Assume that the language (or framework) was a good thing, and that
arrogance was instrumental in its coming about. Does this justify
arrogance?</li>
<li>Does the employment of arrogance in establishing a construct
imply that arrogance will play a role in its demise, or is that
just a coincidence?</li>
</ol>
<h2>Perceptions and expressions of arrogance</h2>
<ol>
<li>
<p>The pretense of humility can be effective as arrogance, but
not if it is too convincing or too unconvincing.</p>
</li>
<li>
<p>Ironic self-deprecation may be taken as a literal admission of
weakness, genuine self-doubt as a layering strategy for larger arrogance.</p>
</li>
<li>
<p>More people are impressed by arrogance - in any form - than you might think.</p>
</li>
<li>
<p>Arrogance by proxy, with the expectation that it will redound to
the speaker. E.g. on behalf of one's offspring or colleagues.</p>
</li>
<li>
<p>Deprecation of others' competence can be equivalent to inflation of
one's own. This is obvious.</p>
</li>
<li>
<p>Flattering others' competence in an area that is then derided as
unimportant. Almost as obvious.</p>
</li>
<li>
<p>Deriding as unimportant abilities possessed but not valued by oneself,
knowing that others are proud to have them.</p>
</li>
<li>
<p>Pretending to have only feeble abilities in an area for which one
is renowned. "I'm a simple person, and I like simple code."</p>
</li>
<li>
<p>Praising another person for their perceptive praise of you. E.g.
Joan Didion eulogizing her husband for his love of her writing.</p>
</li>
<li>
<p>Shock and awe derision of a figure nobody else would dare
criticize, e.g. Sontag asserting that Mozart is camp.</p>
</li>
<li>
<p>Glib reference to primary sources, and not inferior translations.</p>
</li>
<li>
<p>Feigned ignorance of popular culture or practical skills. </p>
</li>
<li>
<p>Apologizing for brazen misconduct is a form of arrogance.</p>
</li>
<li>
<p>Treatment of hard-wired skills as a matter of trying harder.</p>
</li>
</ol>
<h2>Circumstance of arrogance</h2>
<ol>
<li>
<p>Which is worse, arrogance justified by actual superiority, or
arrogance as part of feigning superiority?</p>
</li>
<li>
<p>Temporary humility to facilitate attaining skills that will later be used to justify
arrogance.</p>
</li>
<li>
<p>Arrogance about one's humility, vs arrogance expressed through
feigned humility that is not expected to be believed.</p>
</li>
<li>
<p>Talismanic self-criticism that goes not before a fall but is otherwise unfelt.</p>
</li>
<li>
<p>Juvenile arrogance due to limited exposure to actual greatness and
bad statistical intuition.</p>
</li>
<li>
<p>Smug confidence that the young will someday settle for
mediocrity, or at least are statistically likely to do so.</p>
</li>
<li>
<p>Arrogance after achieving measurable success. Conflating ordering
and causation.</p>
</li>
<li>
<p>Arrogant dismissal of measurable success on the grounds that an
unknown part of it was luck.</p>
</li>
</ol>
<h2>Plain old arrogance</h2>
<ol>
<li>Overblown self-promotion. There is a man who calls
himself "America's Life Transformation Dentist."</li>
</ol>Headfirst Search2013-11-22T10:00:00-05:002013-11-22T10:00:00-05:00Peter Fraenkeltag:blog.podsnap.com,2013-11-22:/functional-dfs.html<p>In <a href="http://blog.podsnap.com/game-set-match.html">another post</a>, I prattled on at some length about the scala <code>Set</code>
class. To understand its nuances, it was helpful to print out a graph of class and trait inheritance.
Here's a contrived example that's simpler than <code>Set</code>:</p>
<div class="highlight"><pre><span></span><code><span class="k">trait</span> <span class="nc">C1</span> <span class="p">{}</span>
<span class="k">trait</span> <span class="nc">C2</span> <span class="p">{}</span>
<span class="k">trait</span> <span class="nc">D</span> <span class="k">extends</span> <span class="nc">C1</span> <span class="k">with</span> <span class="nc">C2</span> <span class="p">{}</span>
<span class="k">trait</span> <span class="nc">E1</span> <span class="k">extends</span> <span class="nc">D</span> <span class="p">{}</span>
<span class="k">trait</span> <span class="nc">E2</span> <span class="k">extends</span> <span class="nc">D</span> <span class="p">{}</span>
<span class="k">trait</span> <span class="nc">E3</span> <span class="k">extends</span> <span class="nc">C2</span> <span class="p">{}</span>
<span class="k">trait</span> <span class="nc">F</span> <span class="k">extends</span> <span class="nc">E1</span> <span class="k">with</span> <span class="nc">E2</span> <span class="k">with</span> <span class="nc">E3</span> <span class="p">{}</span>
</code></pre></div>
<p>The hierarchy of F looks like:</p>
<div class="highlight"><pre><span></span><code> C1 C2
\ / \
D \
/ \ \
E1 E2 E3
\ /__/
F
</code></pre></div>
<p>which the proposed utility will display as:</p>
<div class="highlight"><pre><span></span><code>interface F
interface E1
interface D
interface C1
interface C2
interface E2
interface D
...
interface E3
interface C2
...
</code></pre></div>
<p>Note that the type data will come via Java introspection, and Java doesn't know from traits, so they'll show up here
as interfaces. The <code>...</code> stands in for the traits of traits we've already printed out and don't need to print again.</p>
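<p>To make the interface point concrete, here is a minimal sketch. The traits are the ones from the contrived example, redeclared so the snippet stands alone, and the <code>val</code> names are purely illustrative:</p>
<div class="highlight"><pre><span></span><code>trait C1
trait C2
trait D extends C1 with C2

// Java reflection reports a Scala trait as an interface.
val dIsInterface: Boolean = classOf[D].isInterface

// Direct parents, in declaration order: C1, then C2.
val dParents: List[String] = classOf[D].getInterfaces.toList.map(_.getName)

// An interface has no superclass, so getSuperclass returns null here.
val dHasNoSuperclass: Boolean = classOf[D].getSuperclass == null
</code></pre></div>
<p>The <code>null</code> superclass is why the recursive code further down has to check for it before descending.</p>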
<h2>A slightly less contrived example</h2>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="n">scala</span>.<span class="n">math</span>.<span class="n">BigInt</span>
<span class="k">class</span> <span class="n">scala</span>.<span class="n">math</span>.<span class="n">ScalaNumber</span>
<span class="k">class</span> <span class="n">java</span>.<span class="nb">lang</span>.<span class="n">Number</span>
<span class="k">class</span> <span class="n">java</span>.<span class="nb">lang</span>.<span class="n">Object</span>
<span class="n">interface</span> <span class="n">java</span>.<span class="n">io</span>.<span class="n">Serializable</span>
<span class="n">interface</span> <span class="n">scala</span>.<span class="n">math</span>.<span class="n">ScalaNumericConversions</span>
<span class="n">interface</span> <span class="n">scala</span>.<span class="n">math</span>.<span class="n">ScalaNumericAnyConversions</span>
<span class="n">interface</span> <span class="n">scala</span>.<span class="n">Serializable</span>
<span class="n">interface</span> <span class="n">java</span>.<span class="n">io</span>.<span class="n">Serializable</span>
...
</code></pre></div>
<p>contains more information than the scaladocs, in a form that I find more digestible.
In addition to the direct "children," as specified in the class's declaration</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="n">BigInt</span> <span class="k">extends</span> <span class="n">ScalaNumber</span> <span class="n">with</span> <span class="n">ScalaNumericConversions</span> <span class="n">with</span> <span class="n">Serializable</span>
</code></pre></div>
<p>we can see the linear trail of class extension all the way to <code>java.lang.Object</code>,
as well as the combining graph of traits and interfaces, showing both routes to <code>java.io.Serializable</code>.</p>
<p>What I want to talk about, though, is not the utility of the outline, but the ways one might generate it.
This is a classic depth-first search, with the minor twists that (1) we want to indent by depth and
(2), because it's a directed acyclic graph rather than
a tree, we need to watch out for nodes that show up more than once, so as not to print out the same subgraphs
over and over again.</p>
<h2>A standard recursive solution</h2>
<p>looks like this, carrying along a mutable set in
which to record the visited nodes:</p>
<div class="highlight"><pre><span></span><code><span class="k">import</span> <span class="nn">scala</span><span class="p">.</span><span class="nn">collection</span><span class="p">.</span><span class="nn">mutable</span><span class="p">.{</span><span class="nc">Set</span> <span class="o">=></span> <span class="nc">MSet</span><span class="p">}</span>
<span class="k">def</span> <span class="nf">inhGraph1</span><span class="p">(</span><span class="n">c</span> <span class="p">:</span> <span class="nc">Class</span><span class="p">[</span><span class="n">_</span><span class="p">],</span>
<span class="n">indent</span> <span class="p">:</span> <span class="nc">Int</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span>
<span class="n">seen</span> <span class="p">:</span> <span class="nc">MSet</span><span class="p">[</span><span class="nc">Class</span><span class="p">[</span><span class="n">_</span><span class="p">]]</span> <span class="o">=</span> <span class="nc">MSet</span><span class="p">[</span><span class="nc">Class</span><span class="p">[</span><span class="n">_</span><span class="p">]]())</span> <span class="p">:</span> <span class="nc">Unit</span> <span class="o">=</span> <span class="p">{</span>
<span class="n">println</span><span class="p">(</span><span class="s">" "</span><span class="o">*</span><span class="n">indent</span> <span class="o">+</span> <span class="n">c</span><span class="p">.</span><span class="n">toString</span><span class="p">)</span>
<span class="k">if</span><span class="p">(</span><span class="n">seen</span><span class="p">.</span><span class="n">contains</span><span class="p">(</span><span class="n">c</span><span class="p">))</span> <span class="p">{</span>
<span class="c1">// Do not recurse to children if we have seen this node before.</span>
<span class="n">println</span><span class="p">(</span><span class="s">" "</span><span class="o">*</span><span class="n">indent</span> <span class="o">+</span> <span class="s">"..."</span><span class="p">)</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="c1">// Record that we have seen the node.</span>
<span class="n">seen</span> <span class="o">+=</span> <span class="n">c</span>
<span class="c1">// Check out the superclass</span>
<span class="kd">val</span> <span class="n">s</span> <span class="o">=</span> <span class="n">c</span><span class="p">.</span><span class="n">getSuperclass</span>
<span class="k">if</span><span class="p">(</span> <span class="n">s</span> <span class="o">!=</span> <span class="kc">null</span><span class="p">)</span>
<span class="n">inhGraph1</span><span class="p">(</span><span class="n">s</span><span class="p">,</span><span class="n">indent</span><span class="o">+</span><span class="mi">2</span><span class="p">,</span><span class="n">seen</span><span class="p">)</span>
<span class="c1">// And the interfaces</span>
<span class="k">for</span> <span class="p">(</span><span class="n">cc</span> <span class="o"><-</span> <span class="n">c</span><span class="p">.</span><span class="n">getInterfaces</span><span class="p">)</span>
<span class="n">inhGraph1</span><span class="p">(</span><span class="n">cc</span><span class="p">,</span><span class="n">indent</span><span class="o">+</span><span class="mi">2</span><span class="p">,</span><span class="n">seen</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>That's simple enough, but this mutable set business is a bit unsightly.
Mutability may, in this case, be pragmatic and obviously harmless, but
we're civilized, and we don't do that kind of thing. Better, I think, is</p>
<h2>using lovely, immutable, persistent data structures to write code that doesn't actually work</h2>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">inhGraph2</span><span class="p">(</span><span class="n">c</span> <span class="p">:</span> <span class="nc">Class</span><span class="p">[</span><span class="n">_</span><span class="p">],</span>
<span class="n">indent</span> <span class="p">:</span> <span class="nc">Int</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span>
<span class="n">seen</span> <span class="p">:</span> <span class="nc">Set</span><span class="p">[</span><span class="nc">Class</span><span class="p">[</span><span class="n">_</span><span class="p">]]</span> <span class="o">=</span> <span class="nc">Set</span><span class="p">[</span><span class="nc">Class</span><span class="p">[</span><span class="n">_</span><span class="p">]]())</span> <span class="p">:</span> <span class="nc">Unit</span> <span class="o">=</span> <span class="p">{</span>
<span class="n">println</span><span class="p">(</span><span class="s">" "</span><span class="o">*</span><span class="n">indent</span> <span class="o">+</span> <span class="n">c</span><span class="p">.</span><span class="n">toString</span><span class="p">)</span>
<span class="k">if</span><span class="p">(</span><span class="n">seen</span><span class="p">.</span><span class="n">contains</span><span class="p">(</span><span class="n">c</span><span class="p">))</span> <span class="p">{</span>
<span class="n">println</span><span class="p">(</span><span class="s">" "</span><span class="o">*</span><span class="n">indent</span> <span class="o">+</span> <span class="s">"..."</span><span class="p">)</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="kd">val</span> <span class="n">s</span> <span class="o">=</span> <span class="n">c</span><span class="p">.</span><span class="n">getSuperclass</span>
<span class="k">if</span><span class="p">(</span> <span class="n">s</span> <span class="o">!=</span> <span class="kc">null</span><span class="p">)</span>
<span class="n">inhGraph2</span><span class="p">(</span><span class="n">s</span><span class="p">,</span><span class="n">indent</span><span class="o">+</span><span class="mi">2</span><span class="p">,</span><span class="n">seen</span><span class="o">+</span><span class="n">c</span><span class="p">)</span> <span class="c1">// <----</span>
<span class="k">for</span> <span class="p">(</span><span class="n">cc</span> <span class="o"><-</span> <span class="n">c</span><span class="p">.</span><span class="n">getInterfaces</span><span class="p">)</span>
<span class="n">inhGraph2</span><span class="p">(</span><span class="n">cc</span><span class="p">,</span><span class="n">indent</span><span class="o">+</span><span class="mi">2</span><span class="p">,</span><span class="n">seen</span><span class="o">+</span><span class="n">c</span><span class="p">)</span> <span class="c1">// <----</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>Here, we use the <code>Set.+</code> operator to produce brand new immutable sets
sporting new members. The not actually working part is that we fail to
keep track correctly of what's already been printed, so the output</p>
<div class="highlight"><pre><span></span><code><span class="n">scala</span><span class="o">></span><span class="w"> </span><span class="n">inhGraph2</span><span class="p">(</span><span class="n">classOf</span><span class="o">[</span><span class="n">F</span><span class="o">]</span><span class="p">)</span><span class="w"></span>
<span class="n">interface</span><span class="w"> </span><span class="n">F</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">E1</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">D</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">C1</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">C2</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">E2</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">D</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">C1</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">C2</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">E3</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">C2</span><span class="w"></span>
</code></pre></div>
<p>repeats the entire trait history of <code>D</code> unnecessarily. Although we duly
recorded our visit to <code>D</code>, the <code>Set</code> we recorded it in is popped off
the stack the moment the recursive call returns, and we lose it.</p>
<p>Whoops.</p>
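<p>To see the loss in isolation, here is a sketch that mirrors the shape of <code>inhGraph2</code> on a hand-written copy of the trait graph. The <code>parents</code> map and the <code>expanded</code> function are hypothetical names made up for this illustration; the function collects the nodes it expands rather than printing them:</p>
<div class="highlight"><pre><span></span><code>// Hypothetical stand-in for introspection: each node mapped to its direct parents.
val parents: Map[String, List[String]] = Map(
  "F"  -> List("E1", "E2", "E3"),
  "E1" -> List("D"),
  "E2" -> List("D"),
  "E3" -> List("C2"),
  "D"  -> List("C1", "C2"),
  "C1" -> Nil,
  "C2" -> Nil)

// Same shape as inhGraph2: each recursive call receives its own extended copy
// of the set, and whatever it learns is discarded when the call returns.
def expanded(n: String, seen: Set[String] = Set.empty): List[String] =
  if (seen.contains(n)) Nil
  else n :: parents(n).flatMap(p => expanded(p, seen + n))

// Nodes in the order they were expanded, starting from F.
val visits: List[String] = expanded("F")
</code></pre></div>
<p>In <code>visits</code>, <code>D</code> turns up twice, once under <code>E1</code> and once under <code>E2</code>, because the set that knew about the first visit never made it back to the call that handles <code>E2</code>.</p>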
<h2>Tail recursion to the rescue,</h2>
<p>albeit not for the usual reasons of performance and memory conservation.
The fundamental design issue is that we need two data structures:</p>
<ol>
<li>A <em>stack</em>, with which to track what nodes to visit next and how much to indent them.</li>
<li>A <em>set</em>, with which to track what nodes we already visited.</li>
</ol>
<p>Normally, we get the stack for free by hitching a ride on
the recursive call stack, but that convenience sometimes blinds us
to all the other stuff that's being stacked unnecessarily, or, in our case,
counter-effectively.</p>
<p>We avoid the problem if we can contort our algorithm such that nothing important
is done with <code>seen</code> after the recursive call returns.
That's another way of saying that we need our function to be
<a href="http://en.wikipedia.org/wiki/Tail_call">tail-recursive</a>.</p>
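<p>As a reminder of what that property looks like, here is a small sketch unrelated to graphs, with function names invented for the purpose. In the first version, work remains after the recursive call returns; in the second, all state is passed forward in an accumulator, much as we would like to pass <code>seen</code> forward. The <code>@tailrec</code> annotation asks the compiler to reject the function if the call is not in tail position:</p>
<div class="highlight"><pre><span></span><code>import scala.annotation.tailrec

// Not tail-recursive: the addition happens after the recursive call returns.
def sumPlain(xs: List[Int]): Int = xs match {
  case Nil    => 0
  case h :: t => h + sumPlain(t)
}

// Tail-recursive: nothing is left to do once the recursive call returns,
// because the running total travels forward in acc.
@tailrec
def sumTail(xs: List[Int], acc: Int = 0): Int = xs match {
  case Nil    => acc
  case h :: t => sumTail(t, acc + h)
}
</code></pre></div>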
<p>At the same time, we still need the stack; we just don't want
<code>seen</code> on that stack, so we</p>
<h2>maintain the stack ourselves,</h2>
<p>rather than relying on the language to do it.</p>
<p>The inner recursive function will look like</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">visit</span><span class="p">(</span><span class="n">stack</span><span class="p">:</span> <span class="nc">List</span><span class="p">[(</span><span class="nc">Class</span><span class="p">[</span><span class="n">_</span><span class="p">],</span><span class="nc">Int</span><span class="p">)],</span> <span class="n">seen</span><span class="p">:</span> <span class="nc">Set</span><span class="p">[</span><span class="nc">Class</span><span class="p">[</span><span class="n">_</span><span class="p">]])</span> <span class="p">:</span> <span class="nc">Unit</span>
</code></pre></div>
<p>where the <code>List</code> holds the nodes still to be visited, each paired with its indentation, and
the <code>Set</code> grows monotonically with every iteration. One cool thing about a stack
built from a <code>List</code> of tuples is that we can use pattern matching to examine and
destructure it, as</p>
<div class="highlight"><pre><span></span><code> <span class="n">stack</span> <span class="k">match</span> <span class="p">{</span> <span class="k">case</span> <span class="p">(</span><span class="n">c</span><span class="p">,</span><span class="n">depth</span><span class="p">)</span><span class="n">::tail</span> <span class="o">=></span> <span class="p">...</span>
</code></pre></div>
<p>While we're being clever, we'll also make the code a little more general, by separating
the traversal functionality from the specifics of introspection, parameterizing on
a type <code>T</code> rather than specifying <code>Class[_]</code> and allowing arbitrary functions
to define what to print and how to obtain children of a node. In sum:</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">dfs</span><span class="p">[</span><span class="nc">T</span><span class="p">](</span><span class="n">o</span><span class="p">:</span><span class="nc">T</span><span class="p">)(</span><span class="n">f_pre</span><span class="p">:(</span><span class="nc">T</span><span class="p">,</span><span class="nc">Int</span><span class="p">)</span><span class="o">=></span><span class="nc">Unit</span><span class="p">)</span> <span class="c1">// What to do with each node</span>
<span class="p">(</span><span class="n">f_dup</span><span class="p">:(</span><span class="nc">T</span><span class="p">,</span><span class="nc">Int</span><span class="p">)</span><span class="o">=></span><span class="nc">Unit</span><span class="p">)</span> <span class="c1">// What to do if the node is a duplicate</span>
<span class="p">(</span><span class="n">get_kids</span><span class="p">:</span><span class="nc">T</span><span class="o">=></span><span class="nc">List</span><span class="p">[</span><span class="nc">T</span><span class="p">])</span> <span class="c1">// Find children of the node</span>
<span class="p">:</span> <span class="nc">Unit</span> <span class="o">=</span> <span class="p">{</span>
<span class="k">def</span> <span class="nf">visit</span><span class="p">(</span><span class="n">stack</span><span class="p">:</span> <span class="nc">List</span><span class="p">[(</span><span class="nc">T</span><span class="p">,</span><span class="nc">Int</span><span class="p">)],</span> <span class="n">seen</span><span class="p">:</span> <span class="nc">Set</span><span class="p">[</span><span class="nc">T</span><span class="p">])</span> <span class="p">:</span> <span class="nc">Unit</span> <span class="o">=</span> <span class="p">{</span>
<span class="n">stack</span> <span class="k">match</span> <span class="p">{</span>
<span class="k">case</span> <span class="p">(</span><span class="n">o</span><span class="p">,</span><span class="n">depth</span><span class="p">)</span><span class="n">::tail</span> <span class="o">=></span> <span class="p">{</span>
<span class="n">f_pre</span><span class="p">(</span><span class="n">o</span><span class="p">,</span><span class="n">depth</span><span class="p">)</span>
<span class="k">if</span><span class="p">(</span><span class="n">seen</span><span class="p">(</span><span class="n">o</span><span class="p">))</span> <span class="p">{</span>
<span class="n">f_dup</span><span class="p">(</span><span class="n">o</span><span class="p">,</span><span class="n">depth</span><span class="p">)</span>
<span class="n">visit</span><span class="p">(</span><span class="n">tail</span><span class="p">,</span><span class="n">seen</span><span class="p">)</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="n">visit</span><span class="p">(</span> <span class="n">get_kids</span><span class="p">(</span><span class="n">o</span><span class="p">).</span><span class="n">zip</span><span class="p">(</span><span class="nc">Stream</span><span class="p">.</span><span class="n">continually</span><span class="p">(</span><span class="n">depth</span><span class="o">+</span><span class="mi">1</span><span class="p">))</span> <span class="o">:::</span> <span class="n">tail</span><span class="p">,</span> <span class="n">seen</span> <span class="o">+</span> <span class="n">o</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">case</span> <span class="n">_</span> <span class="o">=></span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">visit</span><span class="p">(</span><span class="nc">List</span><span class="p">((</span><span class="n">o</span><span class="p">,</span><span class="mi">0</span><span class="p">)),</span> <span class="nc">Set</span><span class="p">[</span><span class="nc">T</span><span class="p">]())</span>
<span class="p">}</span>
<span class="k">def</span> <span class="nf">inhGraph3</span><span class="p">(</span><span class="n">c</span> <span class="p">:</span> <span class="nc">Class</span><span class="p">[</span><span class="n">_</span><span class="p">])</span> <span class="o">=</span> <span class="n">dfs</span><span class="p">[</span><span class="nc">Class</span><span class="p">[</span><span class="n">_</span><span class="p">]](</span><span class="n">c</span><span class="p">){(</span><span class="n">c</span><span class="p">,</span><span class="n">i</span><span class="p">)</span><span class="o">=></span><span class="n">println</span><span class="p">(</span><span class="s">" "</span><span class="o">*</span><span class="n">i</span><span class="o">+</span><span class="n">c</span><span class="p">.</span><span class="n">toString</span><span class="p">)}</span>
<span class="p">{(</span><span class="n">c</span><span class="p">,</span><span class="n">i</span><span class="p">)</span><span class="o">=></span><span class="n">println</span><span class="p">(</span><span class="s">" "</span><span class="o">*</span><span class="n">i</span><span class="o">+</span><span class="s">"..."</span><span class="p">)}</span>
<span class="p">{</span><span class="n">c</span> <span class="p">:</span> <span class="nc">Class</span><span class="p">[</span><span class="n">_</span><span class="p">]</span> <span class="o">=></span> <span class="p">{</span><span class="n">c</span><span class="p">.</span><span class="n">getInterfaces</span><span class="p">.</span><span class="n">toList</span><span class="p">}</span> <span class="o">:::</span> <span class="nc">Option</span><span class="p">(</span><span class="n">c</span><span class="p">.</span><span class="n">getSuperclass</span><span class="p">).</span><span class="n">toList</span><span class="p">}</span>
</code></pre></div>
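<p>This works and, for the trait example above, yields the same outline as <code>inhGraph1</code> (for a class with a superclass, the interfaces now come before the superclass). The only</p>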
<h2>awkward bit</h2>
<p>is</p>
<div class="highlight"><pre><span></span><code> <span class="n">get_kids</span><span class="p">(</span><span class="n">o</span><span class="p">).</span><span class="n">zip</span><span class="p">(</span><span class="nc">Stream</span><span class="p">.</span><span class="n">continually</span><span class="p">(</span><span class="n">depth</span><span class="o">+</span><span class="mi">1</span><span class="p">))</span>
</code></pre></div>
<p>which converts a list of child nodes <code>o1,o2,o3</code> into a list of tuples <code>(o1,d), (o2,d), (o3,d)</code>, all having the same
second member. It feels a little too tricky and too reliant on non-fundamental parts of the API, but I'm not sure
there's a better way to do it.</p>
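<p>For what it's worth, one possibly simpler spelling is a plain <code>map</code> that builds each tuple directly. The sketch below is only an illustration: <code>ChildDepths</code> and <code>withDepth</code> are hypothetical names that don't appear in the code above, and <code>kids</code> stands in for the result of <code>get_kids(o)</code>.</p>

```scala
object ChildDepths {
  // Pair every child with the depth of the level below the current node.
  // Intended to produce the same pairs as kids.zip(Stream.continually(depth + 1)).
  def withDepth[T](kids: List[T], depth: Int): List[(T, Int)] =
    kids.map(kid => (kid, depth + 1))
}
```

<p>Whether this is actually nicer is a matter of taste; it merely avoids the infinite stream.</p>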
<p>One ratification of our functional design is that it can essentially be</p>
<h2>transcribed into clojure:</h2>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">dfs</span> <span class="p">[</span><span class="nv">o</span> <span class="nv">f-pre</span> <span class="nv">f-seen</span> <span class="nv">f-kids</span><span class="p">]</span>
<span class="p">(</span><span class="k">loop </span><span class="p">[</span> <span class="p">[[</span><span class="nv">o</span> <span class="nv">depth</span><span class="p">]</span> <span class="o">&</span> <span class="nv">stack</span><span class="p">]</span> <span class="p">(</span><span class="nb">list </span><span class="p">[</span><span class="nv">o</span> <span class="mi">0</span><span class="p">])</span>
<span class="nv">seen</span> <span class="o">#</span><span class="p">{}</span> <span class="p">]</span>
<span class="p">(</span><span class="nb">when </span><span class="nv">o</span>
<span class="p">(</span><span class="nf">f-pre</span> <span class="nv">depth</span> <span class="nv">o</span><span class="p">)</span>
<span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nf">seen</span> <span class="nv">o</span><span class="p">)</span>
<span class="p">(</span><span class="k">do </span>
<span class="p">(</span><span class="nf">f-seen</span> <span class="nv">depth</span> <span class="nv">o</span><span class="p">)</span>
<span class="p">(</span><span class="nf">recur</span> <span class="nv">stack</span> <span class="nv">seen</span><span class="p">))</span>
<span class="p">(</span><span class="nf">recur</span> <span class="p">(</span><span class="nb">concat </span><span class="p">(</span><span class="nb">map vector </span><span class="p">(</span><span class="nf">f-kids</span> <span class="nv">o</span><span class="p">)</span> <span class="p">(</span><span class="nb">repeat </span><span class="p">(</span><span class="nb">inc </span><span class="nv">depth</span><span class="p">)))</span> <span class="nv">stack</span><span class="p">)</span>
<span class="p">(</span><span class="nb">conj </span><span class="nv">seen</span> <span class="nv">o</span><span class="p">))))))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">inh-graph</span> <span class="p">[</span><span class="nv">o</span><span class="p">]</span> <span class="p">(</span><span class="nf">dfs</span> <span class="p">(</span><span class="nf">.getClass</span> <span class="nv">o</span><span class="p">)</span>
<span class="o">#</span><span class="p">(</span><span class="nb">println </span><span class="p">(</span><span class="nb">apply str </span><span class="p">(</span><span class="nb">repeat </span><span class="nv">%1</span> <span class="s">" "</span><span class="p">))</span> <span class="nv">%2</span><span class="p">)</span>
<span class="o">#</span><span class="p">(</span><span class="nb">println </span><span class="p">(</span><span class="nb">apply str </span><span class="p">(</span><span class="nb">repeat </span><span class="nv">%1</span> <span class="s">" "</span><span class="p">))</span> <span class="s">"..."</span> <span class="p">)</span>
<span class="o">#</span><span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">ifcs</span> <span class="p">(</span><span class="nb">seq </span><span class="p">(</span><span class="nf">.getInterfaces</span> <span class="nv">%</span><span class="p">))</span>
<span class="nv">sc</span> <span class="p">(</span><span class="nf">.getSuperclass</span> <span class="nv">%</span><span class="p">)]</span>
<span class="p">(</span><span class="k">if </span><span class="nv">sc</span> <span class="p">(</span><span class="nb">conj </span><span class="nv">ifcs</span> <span class="nv">sc</span><span class="p">)</span> <span class="nv">ifcs</span><span class="p">))))</span>
</code></pre></div>
<p>Neato.</p>
Game, Set, Match | 2013-11-22T10:00:00-05:00 | Peter Fraenkel | tag:blog.podsnap.com,2013-11-22:/game-set-match.html
<p>Around a year ago, there was a <a href="https://groups.google.com/forum/#!topic/scala-user/hbxt2TnRii0">lively debate</a> about the
type invariance of the immutable <code>Set</code> in Scala. Dogpile argumentation on a subject
far outside the popular interest is of course thrilling in itself, but the topic also provides a nice focal point for
exploring and clarifying some important aspects of the Scala type system.</p>
<p>We recall that Scala collections (and other higher-kinded classes) can be invariant, covariant or contravariant in their type parameters,
corresponding respectively to declarations as <code>class Whatever[A]</code>, <code>class Whatever[+A]</code> or <code>class Whatever[-A]</code>.</p>
<ul>
<li>In the case of <em>covariance</em>, a <code>Whatever[B]</code> will be a subclass of <code>Whatever[A]</code> if <code>B</code> is a subclass of <code>A</code>.</li>
<li>With <em>contravariance</em>, a <code>Whatever[B]</code> will be a <em>superclass</em> of <code>Whatever[A]</code> if <code>B</code> is a subclass of <code>A</code>.</li>
<li>With <em>invariance</em>, <code>Whatever[A]</code> and <code>Whatever[B]</code> are totally separate classes having nothing to do with each other.</li>
</ul>
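<p>To make the three cases concrete, here is a minimal sketch. The names <code>Producer</code>, <code>Consumer</code> and <code>Cell</code> are hypothetical stand-ins for <code>Whatever</code>, one per flavor of variance, and the commented-out lines are ones the compiler should reject.</p>

```scala
object VarianceSketch {
  class A
  class B extends A

  class Producer[+T](val value: T)                         // covariant
  class Consumer[-T] { def accept(t: T): String = "ok" }   // contravariant
  class Cell[T](var value: T)                              // invariant

  val b = new B

  // Covariance: a Producer[B] can stand in for a Producer[A].
  val producerOfA: Producer[A] = new Producer[B](b)

  // Contravariance: a Consumer[A] can stand in for a Consumer[B].
  val consumerOfB: Consumer[B] = new Consumer[A]

  // Invariance: neither direction compiles.
  // val cellOfA: Cell[A] = new Cell[B](b)
  // val cellOfB: Cell[B] = new Cell[A](new A)
}
```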
<p>To be even more precise about what this means,
consider the <a href="http://en.wikipedia.org/wiki/Liskov_substitution_principle">Liskov Substitution Principle</a>,
which has an intimidating ring to it, but comes down to something simple: <code>B</code> is a subclass of <code>A</code> if objects of type <code>B</code>
can be substituted everywhere an <code>A</code> is expected.
It's unlikely that you'll need to recite this to yourself when dealing with simple, unparameterized types,
but it starts to get interesting for, say, functions. The classic example is a callback, e.g.</p>
<div class="highlight"><pre><span></span><code> <span class="k">class</span> <span class="nc">Event</span> <span class="p">...</span>
<span class="k">class</span> <span class="nc">ClickEvent</span> <span class="k">extends</span> <span class="nc">Event</span> <span class="p">...</span>
<span class="k">def</span> <span class="nf">registerClickCallback</span><span class="p">(</span><span class="n">cb</span> <span class="p">:</span> <span class="nc">ClickEvent</span> <span class="o">=></span> <span class="nc">Status</span><span class="p">)</span> <span class="p">:</span> <span class="nc">Unit</span> <span class="o">=</span> <span class="p">...</span>
<span class="n">registerClickCallback</span><span class="p">((</span><span class="n">cb</span> <span class="p">:</span> <span class="nc">ClickEvent</span><span class="p">)</span> <span class="o">=></span> <span class="k">new</span> <span class="nc">Status</span><span class="p">(</span><span class="s">s"We just had a </span><span class="si">${</span><span class="n">cb</span><span class="si">}</span><span class="s">"</span><span class="p">))</span>
</code></pre></div>
<p>When somebody clicks, a <code>ClickEvent</code> object will be created and passed to the <code>cb</code>, which
will examine it and somehow convert it into a <code>Status</code> object.</p>
<p>But really, we should be allowed to register a callback that can deal with any <code>Event</code>, not just <code>ClickEvent</code>s, i.e.
this should be OK:</p>
<div class="highlight"><pre><span></span><code> <span class="n">registerClickCallback</span><span class="p">((</span><span class="n">cb</span> <span class="p">:</span> <span class="nc">Event</span><span class="p">)</span> <span class="o">=></span> <span class="k">new</span> <span class="nc">Status</span><span class="p">(</span><span class="s">s"We just had a </span><span class="si">${</span><span class="n">cb</span><span class="si">}</span><span class="s">"</span><span class="p">))</span>
</code></pre></div>
<p>Indeed it is OK, i.e.</p>
<ol>
<li>we can substitute an <code>Event => Status</code> when a <code>ClickEvent => Status</code> is expected, which means that</li>
<li><code>Event => Status</code> is a <em>subclass</em> of <code>ClickEvent => Status</code>, even though</li>
<li><code>Event</code> is a <em>superclass</em> of <code>ClickEvent</code>.</li>
<li>Thus</li>
</ol>
<h2>function types are <em>contravariant</em> in their argument type.</h2>
<p>Indeed, the Scala library defines</p>
<div class="highlight"><pre><span></span><code> <span class="k">trait</span> <span class="nc">Function1</span><span class="p">[</span><span class="o">-</span><span class="nc">T1</span><span class="p">,</span> <span class="o">+</span><span class="nc">R</span><span class="p">]</span> <span class="p">{</span>
<span class="k">def</span> <span class="nf">apply</span><span class="p">(</span><span class="n">v</span> <span class="p">:</span> <span class="nc">T1</span><span class="p">)</span> <span class="p">:</span> <span class="nc">R</span>
<span class="p">}</span>
</code></pre></div>
<p>Of course this declaration says not just that functions are contravariant in their argument type, but also that they're
<em>covariant</em> in their return type. That means we could do:</p>
<div class="highlight"><pre><span></span><code> <span class="k">class</span> <span class="nc">SimpleStatus</span> <span class="k">extends</span> <span class="nc">Status</span> <span class="p">...</span>
<span class="n">registerClickCallback</span><span class="p">((</span><span class="n">cb</span> <span class="p">:</span> <span class="nc">Event</span><span class="p">)</span> <span class="o">=></span> <span class="k">new</span> <span class="nc">SimpleStatus</span><span class="p">(</span><span class="s">s"We just had a </span><span class="si">${</span><span class="n">cb</span><span class="si">}</span><span class="s">"</span><span class="p">))</span>
</code></pre></div>
<p>which is right and just, as the callback registerer can handle <em>any</em> kind of <code>Status</code>, and we should be
able to go ahead and return a specific sort. In short, <code>Function1[Event,SimpleStatus]</code> is a subclass of
<code>Function1[ClickEvent,Status]</code>.</p>
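<p>Here is a self-contained sketch of the same callback story that should compile as written. Two things are my own assumptions rather than part of the example above: the bodies of the classes, and the fact that <code>registerClickCallback</code> invokes the callback immediately and returns its result (instead of returning <code>Unit</code>), purely so that there is something to observe.</p>

```scala
object CallbackSketch {
  class Event
  class ClickEvent extends Event
  class Status(val message: String)
  class SimpleStatus(message: String) extends Status(message)

  // Assumption: the callback is invoked right away and its result returned,
  // purely so that the effect is observable.
  def registerClickCallback(cb: ClickEvent => Status): Status = cb(new ClickEvent)

  // Accepts any Event and returns the more specific SimpleStatus.
  val general: Event => SimpleStatus =
    e => new SimpleStatus("We just had an event")

  // Contravariant argument plus covariant result: this assignment type-checks.
  val viaSubtyping: ClickEvent => Status = general

  val result: Status = registerClickCallback(general)
}
```

<p>The assignment to <code>viaSubtyping</code> is the whole point: no conversion happens, the very same function object is simply viewed at a less demanding type.</p>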
<p>It should now not be surprising that</p>
<h2>immutable List is declared covariantly</h2>
<p>as <code>List[+T]</code>. If there were a function</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">printOutEvents</span><span class="p">(</span><span class="n">es</span> <span class="p">:</span> <span class="nc">List</span><span class="p">[</span><span class="nc">Event</span><span class="p">])</span>
</code></pre></div>
<p>we should be able to substitute a <code>List[ClickEvent]</code>. And we can. Note that a <em>mutable</em> list such as <code>mutable.ListBuffer[T]</code> is declared
<em>invariantly</em>, which reflects the fact that all hell would break loose if</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">appendNewEvents</span><span class="p">(</span><span class="n">es</span> <span class="p">:</span> <span class="n">mutable</span><span class="p">.</span><span class="nc">ListBuffer</span><span class="p">[</span><span class="nc">Event</span><span class="p">])</span>
</code></pre></div>
<p>could be passed a <code>mutable.ListBuffer[ClickEvent]</code> and then shove <code>KeyboardEvent</code>s into it.</p>
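<p>To see roughly what "all hell" looks like, the sketch below cheats past the compiler with an unchecked cast. It assumes <code>scala.collection.mutable.ListBuffer</code> as the concrete mutable list, and the class bodies (including the <code>button</code> method) are made up for illustration.</p>

```scala
import scala.collection.mutable.ListBuffer
import scala.util.Try

object MutableVarianceHazard {
  class Event
  class ClickEvent extends Event { def button: String = "left" }
  class KeyboardEvent extends Event

  def appendNewEvents(es: ListBuffer[Event]): Unit = {
    es += new KeyboardEvent
    ()
  }

  val clicks = ListBuffer[ClickEvent]()

  // appendNewEvents(clicks)   // rejected by the compiler: ListBuffer is invariant

  // Force the issue with an unchecked cast, simulating a covariant mutable list.
  appendNewEvents(clicks.asInstanceOf[ListBuffer[Event]])

  // The buffer now holds a KeyboardEvent, so treating its head as a ClickEvent fails.
  val outcome: Try[String] = Try(clicks.head.button)
}
```

<p>The cast itself succeeds because of erasure; the failure only surfaces later, when the element is actually used as a <code>ClickEvent</code>. Invariance moves that failure to compile time.</p>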
<p>As usual, life in Java-land is less beautiful, but life in general is too short to talk about such things.</p>
<p>Instead, let's go back to the original question: Why are <code>Set</code>s declared invariantly as <code>Set[T]</code> when
immutable collections like <code>List</code> and <code>Vector</code> are covariant? Why can't I define</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">printOutEvents</span><span class="p">(</span><span class="n">es</span> <span class="p">:</span> <span class="nc">Set</span><span class="p">[</span><span class="nc">Event</span><span class="p">])</span> <span class="p">{</span> <span class="k">for</span> <span class="p">(</span><span class="n">e</span> <span class="o"><-</span> <span class="n">es</span><span class="p">)</span> <span class="n">println</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="p">}</span>
</code></pre></div>
<p>and pass it a <code>Set[ClickEvent]</code>?</p>
<p>To answer that question (without reading the mail thread),</p>
<h2>it will be nice to have a little tooling.</h2>
<p>Given a class, I want to be able to trace its entire inheritance, trait and interface implementation graph
all the way up to <code>Object</code>, in a nice outline form, like this:</p>
<div class="highlight"><pre><span></span><code><span class="n">scala</span><span class="o">></span><span class="w"> </span><span class="n">inhGraph</span><span class="p">(</span><span class="n">classOf</span><span class="o">[</span><span class="n">Integer</span><span class="o">]</span><span class="p">)</span><span class="w"></span>
<span class="k">class</span><span class="w"> </span><span class="n">java</span><span class="p">.</span><span class="n">lang</span><span class="p">.</span><span class="k">Integer</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">java</span><span class="p">.</span><span class="n">lang</span><span class="p">.</span><span class="n">Comparable</span><span class="w"></span>
<span class="w"> </span><span class="k">class</span><span class="w"> </span><span class="n">java</span><span class="p">.</span><span class="n">lang</span><span class="p">.</span><span class="n">Number</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">java</span><span class="p">.</span><span class="n">io</span><span class="p">.</span><span class="n">Serializable</span><span class="w"></span>
<span class="w"> </span><span class="k">class</span><span class="w"> </span><span class="n">java</span><span class="p">.</span><span class="n">lang</span><span class="p">.</span><span class="k">Object</span><span class="w"></span>
</code></pre></div>
<p>So we see not just that <code>Integer</code> is <code>Serializable</code> and <code>Comparable</code>, but that it got to the former
via <code>Number</code>.
This simple outline form can become much more complicated, not only as the hierarchy gets deeper, but as it
recombines due to diamond inheritance. In the case of recombination, we'll print out ellipses "..." the second time
we encounter a class, just to keep things tidy. </p>
<p>Circularly enough, keeping track of what we've seen will require a <code>Set</code>, and if your hair-trigger functional conscience
requires that <code>Set</code> to be immutable, there are implementation implications. Should you care, you can
<a href="http://blog.podsnap.com/functional-dfs.html">read about them</a>.</p>
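<p>If you'd rather not click through, here is a deliberately unprincipled sketch of such a tool. It is not the implementation from the linked post: it uses a mutable <code>Set</code> and plain recursion, returns the outline as a list of lines rather than printing it, and the name <code>InhGraphSketch</code> is hypothetical. The exact lines you get will depend on your Java and Scala versions.</p>

```scala
object InhGraphSketch {
  // Returns the outline as lines; a node seen before is followed by "..."
  // instead of being expanded a second time.
  def inhGraph(c: Class[_]): List[String] = {
    val seen = scala.collection.mutable.Set[Class[_]]()
    val out = scala.collection.mutable.ListBuffer[String]()

    def visit(k: Class[_], depth: Int): Unit = {
      out += ("  " * depth) + k.toString
      if (seen(k)) {
        out += ("  " * depth) + "..."
      } else {
        seen += k
        // Interfaces first, then the superclass (null for interfaces and Object).
        val parents = k.getInterfaces.toList ::: Option(k.getSuperclass).toList
        parents.foreach(p => visit(p, depth + 1))
      }
    }

    visit(c, 0)
    out.toList
  }
}
```

<p>Something like <code>InhGraphSketch.inhGraph(classOf[Integer]).foreach(println)</code> should produce an outline in the spirit of the one above.</p>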
<p>To cut to the chase,</p>
<h2>here's the scoop on Set:</h2>
<div class="highlight"><pre><span></span><code><span class="n">scala</span><span class="o">></span><span class="w"> </span><span class="n">inhGraph</span><span class="p">(</span><span class="n">classOf</span><span class="o">[</span><span class="n">Set[_</span><span class="o">]</span><span class="err">]</span><span class="p">)</span><span class="w"></span>
<span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">immutable</span><span class="p">.</span><span class="k">Set</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">immutable</span><span class="p">.</span><span class="n">Iterable</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">immutable</span><span class="p">.</span><span class="n">Traversable</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">Traversable</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">TraversableLike</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">generic</span><span class="p">.</span><span class="n">HasNewBuilder</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">generic</span><span class="p">.</span><span class="n">FilterMonadic</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">TraversableOnce</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">GenTraversableOnce</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">GenTraversableLike</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">GenTraversableOnce</span><span class="w"></span>
<span class="w"> </span><span class="p">...</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">Parallelizable</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">GenTraversable</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">GenTraversableLike</span><span class="w"></span>
<span class="w"> </span><span class="p">...</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">generic</span><span class="p">.</span><span class="n">GenericTraversableTemplate</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">generic</span><span class="p">.</span><span class="n">HasNewBuilder</span><span class="w"></span>
<span class="w"> </span><span class="p">...</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">Immutable</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">Iterable</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">Traversable</span><span class="w"></span>
<span class="w"> </span><span class="p">...</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">GenIterable</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">GenIterableLike</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">GenTraversableLike</span><span class="w"></span>
<span class="w"> </span><span class="p">...</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">GenTraversable</span><span class="w"></span>
<span class="w"> </span><span class="p">...</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">IterableLike</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="k">Equals</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">TraversableLike</span><span class="w"></span>
<span class="w"> </span><span class="p">...</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">GenIterableLike</span><span class="w"></span>
<span class="w"> </span><span class="p">...</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="k">Set</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">Iterable</span><span class="w"></span>
<span class="w"> </span><span class="p">...</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">GenSet</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">GenSetLike</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">GenIterableLike</span><span class="w"></span>
<span class="w"> </span><span class="p">...</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">Function1</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="k">Equals</span><span class="w"></span>
<span class="w"> </span><span class="p">...</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">GenIterable</span><span class="w"></span>
<span class="w"> </span><span class="p">...</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">generic</span><span class="p">.</span><span class="n">GenericSetTemplate</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">generic</span><span class="p">.</span><span class="n">GenericTraversableTemplate</span><span class="w"></span>
<span class="w"> </span><span class="p">...</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">SetLike</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">IterableLike</span><span class="w"></span>
<span class="w"> </span><span class="p">...</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">GenSetLike</span><span class="w"></span>
<span class="w"> </span><span class="p">...</span><span class="w"></span>
<span class="w"> </span><span class="n">interface</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">generic</span><span class="p">.</span><span class="n">Subtractable</span><span class="w"></span>
</code></pre></div>
<p>If you look carefully, you'll see that <code>Set</code> inherits from <code>Function1</code> across five generations.
Deleting a lot of lines, it looks like</p>
<div class="highlight"><pre><span></span><code>interface scala.collection.immutable.Set
interface scala.collection.Set
interface scala.collection.GenSet
interface scala.collection.GenSetLike
interface scala.Function1
</code></pre></div>
<p>More specifically, a <code>Set[T]</code> may be substituted anywhere a <code>Function1[T,Boolean]</code> is expected.
Since, as we know, <code>Function1</code> is contravariant in its argument type, there is no way that <code>Set</code> can be covariant. On the
other hand, it directly implements the immutable <code>Iterable</code>, which is covariant, so there's no way that <code>Set</code>
can be contravariant. The only option is invariance. To hammer in the general principle here:
<strong>a class cannot be
substituted for another if its variance is incompatible.</strong></p>
<p>Tada.</p>
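<p>A small sketch of the squeeze, using only standard collections; the value names are hypothetical, and the commented-out lines are the ones I'd expect the compiler to refuse.</p>

```scala
object SetInvariance {
  val clicks: Set[String] = Set("left", "right")

  // A Set[String] is accepted wherever a String => Boolean is expected.
  val isClick: String => Boolean = clicks

  // The covariant List widens without complaint.
  val names: List[String] = List("left", "right")
  val widenedList: List[Any] = names

  // val widenedSet: Set[Any] = clicks   // does not compile: Set is invariant

  // Declaring our own covariant set-as-function is rejected too, because
  // A would appear in the contravariant argument position of Function1:
  // trait CovSet[+A] extends (A => Boolean)
}
```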
<h2>The mystery isn't quite cleared up though.</h2>
<p>First of all, Martin Odersky's <a href="http://www.scala-lang.org/old/node/9764">explanation</a>
is slightly different - essentially blaming invariance on an implementation detail. Still, no matter what the implementation,
you still couldn't provide <code>Function1</code> and still be covariant. I hesitate to say that he's wrong, but he might be wrong.</p>
<p>Second, it isn't obvious why <code>Set</code> has to implement <code>Function1</code> at all.
Presumably, it's so we can use it as a sort of validator,
for example as an argument to <code>List.filter</code>:</p>
<div class="highlight"><pre><span></span><code> aListOfThings.filter(setOfThings)
</code></pre></div>
<p>but I can't imagine a national day of mourning if one had to write</p>
<div class="highlight"><pre><span></span><code> aListOfThings.filter(setOfThings.contains(_))
</code></pre></div>
<p>instead. There is nothing intrinsic about sets that requires them to pass as functions. At best, the feature is syntactic
sugar - and sugar of diminished utility in a language that lets you define functions as compactly as Scala does.</p>
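<p>For the skeptical, a quick sketch showing that the two spellings agree; the list and set contents are invented for the occasion.</p>

```scala
object FilterWithSet {
  val aListOfThings = List("apple", "brick", "cherry", "apple")
  val setOfThings = Set("apple", "cherry")

  // The set itself serves as the predicate...
  val viaFunction = aListOfThings.filter(setOfThings)

  // ...or we spell out the membership test explicitly.
  val viaContains = aListOfThings.filter(setOfThings.contains(_))
}
```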
<p>All that said,</p>
<h2>this should not matter to you if you lead a virtuous life,</h2>
<p>where virtue here is all about using type
signatures that enforce what you care about and not what you don't care about. I complained above that you couldn't
pass <code>Set[ClickEvent]</code> to a function <code>printOutEvents(es : Set[Event])</code>, but in truth that function has very little
reason to exist. What I should write instead is</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">printOutEvents</span><span class="p">(</span><span class="n">es</span> <span class="p">:</span> <span class="nc">TraversableOnce</span><span class="p">[</span><span class="nc">Event</span><span class="p">])</span> <span class="p">{</span> <span class="k">for</span> <span class="p">(</span><span class="n">e</span> <span class="o"><-</span> <span class="n">es</span><span class="p">)</span> <span class="n">println</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="p">}</span>
</code></pre></div>
<p><code>Set</code> inherits from <code>TraversableOnce</code></p>
<div class="highlight"><pre><span></span><code>interface scala.collection.immutable.Set
interface scala.collection.immutable.Iterable
interface scala.collection.immutable.Traversable
interface scala.collection.Traversable
interface scala.collection.TraversableLike
interface scala.collection.TraversableOnce
</code></pre></div>
<p>so we can call this new function with <code>Set[Event]</code> if we like, and <code>TraversableOnce</code> is covariant, so
we can call it with <code>Set[AmusingEvent]</code>.
We express no opinion at all on some other points, like whether we could have
looped over the contents more than once, or whether the container needs a <code>union</code> method.</p>
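<p>A sketch of the looser signature in action. One substitution to flag loudly: it uses <code>Iterable</code> rather than <code>TraversableOnce</code>, since the latter is deprecated in newer Scala versions while <code>Iterable</code> is covariant everywhere I know of. It also returns the descriptions instead of printing them, and the class bodies and the name <code>describeEvents</code> are invented.</p>

```scala
object LooseSignatures {
  class Event { override def toString: String = "Event" }
  class ClickEvent extends Event { override def toString: String = "ClickEvent" }

  // Asks only for something that can be iterated over, nothing Set-specific.
  def describeEvents(es: Iterable[Event]): List[String] = es.toList.map(_.toString)

  // A Set of the subclass is accepted, even though Set itself is invariant.
  val clicks: Set[ClickEvent] = Set(new ClickEvent)
  val fromSet: List[String] = describeEvents(clicks)

  // So is a List, or anything else iterable.
  val fromList: List[String] = describeEvents(List(new Event, new ClickEvent))
}
```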
<p>If you don't have <code>TraversableOnce</code> at the tips of your typing fingers, get thee to
<a href="http://www.scala-lang.org/api/current/#package">the API</a> and spend some time there.
The scaladoc web app is actually a bit fancier than javadocs, and you might want to read
a <a href="http://dcsobral.blogspot.com/2011/12/using-scala-api-documentation.html">good introduction</a>
to its capabilities.</p>
<h1>Making a pretentious logo using CSS</h1>
<p>Peter Fraenkel, 2013-10-23</p>
<p>My pride in not being a "front end guy" is notoriously obnoxious and
obviously compensatory. On the other hand, having crappy front ends
for my projects might help disguise deeper flaws and thus actually be
an advantage.</p>
<p>Came a time, however, when I wanted to make myself a logo. Faced
with the horrific prospect of doing actual art, I had no real choice
but to use markup. Make no
mistake, the linguae francae of our browsable universe -- html, css
and javascript -- are a collective insult to science and beauty. But
you can do cool stuff sometimes.</p>
<p>My <a href="http://acyc.lc">new company</a> is what is known as a
<a href="http://info.legalzoom.com/disregarded-entity-llc-3551.html">disregarded entity</a>.
It currently exists only so that I can feel some emotional distance from the
<a href="http://www.imdb.com/title/tt0800342/">large financial institution</a>
benefiting from my sage advice (and, as a free-and-easy consultant,
skip out periodically to attend alumni day at
<a href="http://hackerschool.com">Hacker School</a>). But even a shell company
deserves a pretentious logo that alludes to graph theory and
functional programming.</p>
<p>The idea is that I want to spell out "Acyclic," but with the
horizontal bar in the "A" replaced with a bar of a contrasting color.
This resultant "$\Lambda$" alludes stylishly to my appreciation of
functional programming, while the exiled cross bar emphasizes my
abhorrence of topological cycles. Plus, you can read it without
automatically pronouncing an "L." (<a href="http://www.kia.com/">KI$\Lambda$ Motors</a>,
I'm looking at you.)</p>
<p>To realize this aesthetic marvel, we will use the CSS <code>position</code>
property to overlay three bits of text, from front to back:</p>
<ol>
<li>A maroon colored em-dash, surrounded by a thin line of shadow so it
stands out.</li>
<li>In phosphor green, the non-word $\Lambda$cyclic.</li>
<li>The full word "Acyclic," properly spelled so that you can search for
it on the page, but cloaked in the same color as the background.</li>
</ol>
<p>Altogether, it looks like this:</p>
<div>
<style>
#title1 {
background: #151515;
font-size: 50px;
line-height: 1.5;
font-weight: bold;
font-family: Monaco, "Bitstream Vera Sans Mono", "Lucida Console", Terminal, monospace;
color: #b5e853;
text-shadow: 0 1px 1px rgba(0, 0, 0, 0.1),
0 0 5px rgba(181, 232, 83, 0.1),
0 0 10px rgba(181, 232, 83, 0.1);
letter-spacing: -1px;
-webkit-font-smoothing: antialiased;
position: absolute;
z-index: 0;
top: 0px;
left: 0px;
}
#title2 {
font-size: 50px;
background: rgba(0, 0, 0, 0); /* transparent */
line-height: 1.5;
font-weight: bold;
font-family: Monaco, "Bitstream Vera Sans Mono", "Lucida Console", Terminal, monospace;
color: #800000;
text-shadow: 0px 1px 0px #000000,
0px -1px 0px #000000;
letter-spacing: -1px;
-webkit-font-smoothing: antialiased;
z-index: 1;
position: absolute;
top: 0px;
left: -2.5px;
}
#title3 {
background: #151515;
font-size: 50px;
line-height: 1.5;
font-weight: bold;
font-family: Monaco, "Bitstream Vera Sans Mono", "Lucida Console", Terminal, monospace;
color: #151515;
letter-spacing: -1px;
position: absolute;
z-index: -1;
top: 0px;
left: 0px;
}
</style>
<div style="position:relative;height=300px;">
<div id="title1">Λcyclic LLC</div>
<div id="title2">—</div>
<div id="title3">Acyclic LLC</div>
</div>
</div>
<ul>
<li>*</li>
<li>*</li>
<li>*</li>
</ul>
<p>The html for the three layers is just:</p>
<div class="highlight"><pre><span></span><code> <span class="p"><</span><span class="nt">div</span> <span class="na">id</span><span class="o">=</span><span class="s">"title1"</span><span class="p">></span><span class="ni">&Lambda;</span>cyclic LLC<span class="p"></</span><span class="nt">div</span><span class="p">></span>
<span class="p"><</span><span class="nt">div</span> <span class="na">id</span><span class="o">=</span><span class="s">"title2"</span><span class="p">></span><span class="ni">&mdash;</span><span class="p"></</span><span class="nt">div</span><span class="p">></span>
<span class="p"><</span><span class="nt">div</span> <span class="na">id</span><span class="o">=</span><span class="s">"title3"</span><span class="p">></span>Acyclic LLC<span class="p"></</span><span class="nt">div</span><span class="p">></span>
</code></pre></div>
<p>The three <code>#title</code> ids are more or less stolen from the
<a href="https://github.com/blog/1081-instantly-beautiful-project-pages">GitHub Pages "hack" theme</a>,
with very slight tweaks. The only change for the <strong>$\Lambda$cyclic</strong>
layer is positioning dictated by the last four lines:</p>
<div class="highlight"><pre><span></span><code><span class="p">#</span><span class="nn">title1</span> <span class="p">{</span>
<span class="k">background</span><span class="p">:</span> <span class="mh">#151515</span><span class="p">;</span>
<span class="k">font-size</span><span class="p">:</span> <span class="mi">50</span><span class="kt">px</span><span class="p">;</span>
<span class="k">line-height</span><span class="p">:</span> <span class="mf">1.5</span><span class="p">;</span>
<span class="k">font-weight</span><span class="p">:</span> <span class="kc">bold</span><span class="p">;</span>
<span class="k">font-family</span><span class="p">:</span> <span class="n">Monaco</span><span class="p">,</span> <span class="s2">"Bitstream Vera Sans Mono"</span><span class="p">,</span> <span class="s2">"Lucida Console"</span><span class="p">,</span> <span class="n">Terminal</span><span class="p">,</span> <span class="kc">monospace</span><span class="p">;</span>
<span class="k">color</span><span class="p">:</span> <span class="mh">#b5e853</span><span class="p">;</span>
<span class="k">text-shadow</span><span class="p">:</span> <span class="mi">0</span> <span class="mi">1</span><span class="kt">px</span> <span class="mi">1</span><span class="kt">px</span> <span class="nb">rgba</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mf">0.1</span><span class="p">),</span>
<span class="mi">0</span> <span class="mi">0</span> <span class="mi">5</span><span class="kt">px</span> <span class="nb">rgba</span><span class="p">(</span><span class="mi">181</span><span class="p">,</span> <span class="mi">232</span><span class="p">,</span> <span class="mi">83</span><span class="p">,</span> <span class="mf">0.1</span><span class="p">),</span>
<span class="mi">0</span> <span class="mi">0</span> <span class="mi">10</span><span class="kt">px</span> <span class="nb">rgba</span><span class="p">(</span><span class="mi">181</span><span class="p">,</span> <span class="mi">232</span><span class="p">,</span> <span class="mi">83</span><span class="p">,</span> <span class="mf">0.1</span><span class="p">);</span>
<span class="k">letter-spacing</span><span class="p">:</span> <span class="mi">-1</span><span class="kt">px</span><span class="p">;</span>
<span class="kp">-webkit-</span><span class="n">font-smoothing</span><span class="p">:</span> <span class="n">antialiased</span><span class="p">;</span>
<span class="c">/* Position absolutely at the upper left */</span>
<span class="k">position</span><span class="p">:</span> <span class="kc">absolute</span><span class="p">;</span>
<span class="k">z-index</span><span class="p">:</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">top</span><span class="p">:</span> <span class="mi">0</span><span class="kt">px</span><span class="p">;</span>
<span class="k">left</span><span class="p">:</span> <span class="mi">0</span><span class="kt">px</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div>
<p>In <code>#title2</code>, for the <strong>em-dash</strong>, there are a few additional changes:</p>
<div class="highlight"><pre><span></span><code> <span class="nt">background</span><span class="o">:</span> <span class="nt">rgba</span><span class="o">(</span><span class="nt">0</span><span class="o">,</span> <span class="nt">0</span><span class="o">,</span> <span class="nt">0</span><span class="o">,</span> <span class="nt">0</span><span class="o">);</span> <span class="c">/* transparent background */</span>
<span class="nt">color</span><span class="o">:</span> <span class="p">#</span><span class="nn">800000</span><span class="o">;</span> <span class="c">/* maroon */</span>
<span class="nt">text-shadow</span><span class="o">:</span> <span class="nt">0px</span> <span class="nt">1px</span> <span class="nt">0px</span> <span class="p">#</span><span class="nn">000000</span><span class="o">,</span> <span class="c">/* black shadow above and below */</span>
<span class="nt">0px</span> <span class="nt">-1px</span> <span class="nt">0px</span> <span class="p">#</span><span class="nn">000000</span><span class="o">;</span>
<span class="nt">z-index</span><span class="o">:</span> <span class="nt">1</span><span class="o">;</span> <span class="c">/* one layer closer to the viewer */</span>
<span class="nt">left</span><span class="o">:</span> <span class="nt">-2</span><span class="p">.</span><span class="nc">5px</span><span class="o">;</span> <span class="c">/* nudge it slightly to the left */</span>
</code></pre></div>
<p>Finally, <code>#title3</code> governs the invisible but searchable and
highlightable <strong>Acyclic</strong>:</p>
<div class="highlight"><pre><span></span><code> <span class="c">/* As #title1, but: */</span>
<span class="nt">color</span><span class="o">:</span> <span class="p">#</span><span class="nn">151515</span><span class="o">;</span> <span class="c">/* same as background */</span>
<span class="nt">z-index</span><span class="o">:</span> <span class="nt">-1</span><span class="o">;</span> <span class="c">/* furthest away, so it doesn't block anything */</span>
</code></pre></div>
<p>One last trick is to extract a <code>.png</code> of the logo from a screen shot and then
refer to it in the <code><head></code> of pages you need to show up properly in previews:</p>
<div class="highlight"><pre><span></span><code> <span class="p"><</span><span class="nt">meta</span> <span class="na">property</span><span class="o">=</span><span class="s">"og:image"</span> <span class="na">content</span><span class="o">=</span><span class="s">"http://acyc.lc/images/bigA.png"</span> <span class="p">/></span>
<span class="p"><</span><span class="nt">link</span> <span class="na">rel</span><span class="o">=</span><span class="s">"image_src"</span> <span class="na">href</span><span class="o">=</span><span class="s">"http://acyc.lc/images/bigA.png"</span> <span class="p">/></span>
</code></pre></div>
<p>For extra credit, please tell me just what is the <a href="https://www.google.com/search?tbs=sbi:AMhZZiuxxuwyF-7omPfvU-JrH_1xZCvRgNH5m-m0rEW7tJ5oxuLf-Z1Tvt0zMQXUenUpNeqUoo2h8dmUbwoWkLoQovVMrw4hLutReu3avL_1WC_1mVYPy9CJXRKPuuFvRig8CY9ec06XW2shSa2n5-irkwbP5wa5VSl8qEaMf3xB750ryKyG0mK9TgJphYCbUyamtVO-MGz1jxmtcVcxnafEvwsdV2QdsdUmaTkGg1v8Fj3690wa7VSRzxpgcje7CFa0j53Jr346Wau4Z8F2xxXPLF-ADwcRinW9jZ8n6PGgK73VuJyoShXwKEUeM0iKPhm3evDNlbHIlT80ZYxKC5hbGB_18mPkKD_1xARlVlXCw_1KUD3vMezFPV6sG_1dOBKwKbqy-Gzpi_1621gmkp8pUoB15LkqLLLdqblAr2qWPYMIyNfdHVC41_16fOXEC4e3SG7Q4kmWks5lG7_10euabrNuR7vkLj1k7jNrJ1k4_1e9CyLzklipn9LGZs2eSp77cAJcUna_1IjEgbyqUF90rcHpcVaVt7rMkCvTpwL64AG40qCm4sBC4TyAsbYGNK0FGN6AGzuqEW1UoezJ7NjoVRdB_1SH1N3dgsRR7vzJPbQWAz1MFOFDKEB5Dqr4xSCJECOCvPYdE-UBYf3-nup2bpbdTqpE4sbKeWt4nzVj8O-63G8nbBEPm3mG5Wobr28dSDWAWM1aqBxESfCyDyU-W8a2M-DI-yQJbz9ivx10npB0Bl8fTFw3WWJBor6Jeu2g">religious significance</a> of a bisected A.</p>
<h1>Porting a land line to Google Voice and Obihai</h1>
<p>Peter Fraenkel, 2013-10-14</p>
<p>The assumptions here are that you have a US land line, it has a phone
number you've grown attached to, you or someone else in your household
prefers to talk on something traditionally phone-like, you want to
pay as little as possible for the privilege,
that you have a high
tolerance for - or ideally derive pleasure from - configuring gadgets,
and that pains in the ass don't count as payment for the purposes of
the fourth assumption.
At this point, we're probably down to me, as
there is low or negative correlation for simultaneous applicability of
the foregoing assumptions.</p>
<p>The internet contains instructions for overlapping pieces of the
procedure outlined here, but there are a few gotchas that I
discovered, so, if you're like me enough for all this to be relevant
but not sufficiently like me to actually enjoy the gotchas, read on.</p>
<p>The basic components of the plan are:</p>
<ul>
<li>Google Voice - for the cheap or free service,</li>
<li>An <a href="http://obihai.com">OBi100</a> SIP gateway to connect Google to your
telephone,</li>
<li>An ATT Go Phone on which to park your existing phone number for a
short period of time.</li>
</ul>
<p>The last of these is necessary, because Google Voice ports numbers
only from existing mobile accounts, not from landline carriers.</p>
<p>First, you need to purchase the physical objects. You need the OBi
device itself. The basic model, the OBi 100, as of today costs $39 from
<a href="http://www.amazon.com/OBi100-Telephone-Adapter-Service-Bridge/dp/B004LO098O">Amazon</a>.
There are more expensive models, which would allow you to connect to
many SIP providers simultaneously, gate in an actual landline, or
connect a FAX machine. Probably you don't need that.</p>
<p>You also need a
cheap prepaid phone. Currently, there is
<a href="http://www.att.com/shop/wireless/devices/alcatel/510a-prepaid.html">one for $15 post paid</a>,
but that is probably the most transient bit of information in this
post. In any case, it doesn't matter what phone you get, so you might
as well get the cheapest. It's likely that you can do this with
carriers other than ATT, but ATT demonstrably works for many people, and,
unless someone pays you to take their phone, it can at most be $15 cheaper.
When you order the phone online, you will be asked if
you want to port an existing number. The answer is yes! If you enter
no here, you'll dramatically increase the amount of time you
spend talking to ATT reps on the phone. The number will not actually
port until the new phone is activated, so you can continue to use the
old line after ordering.</p>
<p>While you wait for packages to arrive, create a <em>completely new</em> Google
account and set up Google Voice on it. You do not want to use an
existing account, or an account you will use for any other purpose, as
it will be necessary later to give Obihai your password. By the same token,
don't set up 2-factor authentication on this new account, or you won't have
a complete password to give.
While you shouldn't care at all
about the phone number they assign to you, Google will still force
you to choose a preferred exchange or zip code and then, if those
correspond to a somewhat populated city, explain that nothing is available. Save
valuable seconds by starting with something from an obscure plains
state. You might as well complete the GV setup, the most important
aspects of which are (1) selecting <em>Forward calls to Google Chat</em> and
(2) turning <em>off</em> call screening. You have to do both of those, or
nothing will work.
You could also enter an email address to which to forward voice mail.
You should not configure the account to forward voice calls to another
telephone number, as unanswered calls would go nondeterministically to
one of two rival voicemail boxes, depending on how long each took to
answer.</p>
<p>Also set up an account on <a href="http://obitalk.com">obitalk.com</a> (not obihai.com). It will be recommended that
you sign in with Google rather than create an id and password, but
don't. You'll need a real password in order to use the cool phone
apps later. When a pop up asks if you want to add an Obi device,
click no and just log off. Then wait for the UPS guy.</p>
<p>When the OBi arrives, follow the <em>exact</em> instructions on the little
card as to what to plug in and in what order. Minor deviations from the
protocol end in failure and force you to start over. When all is
connected and the blinking lights have reached equilibrium, make the
suggested test call from the attached telephone, <code>**9 222 222
222</code>. If you know too much, you'll wonder whether you need to open
up a router port; the answer is no, the device initiates
connections through the NAT.</p>
<p>Now log back in to obitalk. Click "add a device," and choose your
model. You'll then be given a more random looking <code>**</code> number to
dial from the telephone. Shortly after you do, your box will show up
on the "Dashboard." Basically, it filled in an "OBi number," a serial number and
a MAC address for you. You could now call up like-minded
electronics consumers, if you knew their OBi number. I suspect that
nobody has ever done this.</p>
<p>To make real calls, you need to connect your Google Voice account.
(Note that the OBi box works with any SIP provider, but I'm only going
to talk about Google.)
Click the gear on the row of the first service provider on the obitalk
dashboard (depending on the exact OBi model,
there may be a few), charge headlong through the modal
dialog acknowledging the unavailability of 911 service, choose Google
Voice from the list of providers and enter your Google
credentials as I said you would have to do. Back on the dashboard,
you should see the word "Connected" on the service provider row. If
you see "Backing Off," it's probably because you didn't enter the
correct credentials.</p>
<p>Now, verify that you can make and receive calls on your new, arbitrary
GV number.</p>
<p>When the Go Phone arrives, do the obvious things with the battery and
SIM card and turn it on. You might have expected it to connect to the
ATT network automatically and trigger the porting process. Or, if
you're less optimistic, but still a sucker, you follow the
instructions, which have you call
<code>611</code> from the Go Phone ("cannot connect") or call, from another phone,
a different toll free number (after much DTMF, a synthesized
<a href="http://en.wikipedia.org/wiki/General_American">Standard American</a>
voice fails to recognize your 10-digit number), or go online (more or less
the same message, but without the comforting diphthongs). What you'll have
to do instead is call the technical support line for <em>non</em>-pre-paid
phones. They will try to transfer you to the useless toll-free line,
but what you really want is technical support in the <em>porting
department</em>. It may be unpatriotic to say, but basically keep trying until the person
on the other end is obviously only pretending to be American, as he's
likely to have been educated properly.
At some point during this saga, ask for and
write down your ATT account number "for your records." It isn't
written on any document to which you have access, and you'll need it
later.</p>
<p>After much fuss, your temporary mobile phone will display a 3G
signal. Either via <code>611</code> or <a href="http://att.com/mygophone">att.com/mygophone</a>, add the
smallest possible amount of money to your account, which seems to be $10.
You only need the phone to work for a few days.
The old carrier's line will, of
course, go dead; although the number port is supposed to cancel the
departed service automatically, it might not, so call whatever well
hidden support line is supposed to handle this, prepared to turn a
deaf ear to repeated pleas for your continued custom.</p>
<p>Now that your old number lives on ATT wireless, it is straightforward
to port it to Google. For this, you'll pay $20, and you'll need the ATT account
number you wrote down two paragraphs ago. Within a few days, if all goes
well, you will have a setup that behaves exactly like your old fixed
phone, but with no ongoing charges (though you still have to pay for
international calls).
You'll have spent a total of $85, assuming you can't sell the burner
(which might actually be worth keeping as a backup for <code>911</code>
calls, carriers being required to put those through whether or not you
have an account).</p>
<p>Of course all the standard Google Voice goodies work. You can
designate former friends as spammers, set up custom greetings for
actual friends, receive SMS (though only from within the US), etc.</p>
<p>For one last cheap thrill, you can download the OBiON app for iOS or
Android. This
lets you call through your obitalk line from the smartphone over 3G or
wifi. It's probably only useful for international calls, though there
might be occasional benefit to pretending to be calling someone from home.
You need to log in with obitalk credentials, which is why you set them
up earlier instead of logging into obitalk with a Google ID. Note
that you could have used OBiON without the OBi device, but I can't
think of a good reason to do so.</p>
<hr>
<p>Well, that was fun. Having done all this, I will probably use the
fixed phone as many times each year as I did last, so maybe 5 times
altogether before I get bored and try something else.</p>
<h1>Typecasting, part 2</h1>
<p>Peter Fraenkel, 2013-10-09</p>
<p>Some time after my recent <a href="hollywood-typecasting.html">fiddles with IMDB</a>, I
read an interesting
<a href="http://honnibal.wordpress.com/2013/09/11/a-good-part-of-speechpos-tagger-in-about-200-lines-of-python/">article</a>
about using a <a href="http://en.wikipedia.org/wiki/Perceptron">perceptron</a> to classify
words as parts of speech based on features that precede them in text. It's all
done in python or some such sh*t, but whatever. Still very cool. Since I had
all of this IMDB data accumulated in Mongo, I thought I would try to play with
it, and the idea I had was to predict <a href="http://www.metacritic.com/">metacritic</a>
scores from the actors that appeared in each film. In retrospect, it's far from
clear that such a prediction can be made and especially that a perceptron is well
suited to making it. I tried a few things, and everything sucked, so the only
saving grace was that some bits were fun to implement.</p>
<p>The first task was to augment my database, which had been built mostly for
exploring linkages, with data that one might want to either predict or use for
predicting. Basically, I need to add a <code>:metacritic-score</code> field to each
record, and while I'm doing it, I should grab anything else that's easy to find
in approximately the same place. I didn't want to wait a very long time, and
imdb.com's IP throttling seems to get confused by my VPN provider, so
parallel GETs are in order, though not an unlimited number of them.
In a situation like this, I like
to set up a worker pool using <code>core.async</code>. If you have no idea what
that is, read
<a href="http://clojure.com/blog/2013/06/28/clojure-core-async-channels.html">this</a>,
<a href="http://www.infoq.com/news/2013/07/core-async">this</a> and
<a href="http://swannodette.github.io/2013/07/12/communicating-sequential-processes/">this</a>.
Then look at this:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">wc</span> <span class="p">(</span><span class="nf">a/chan</span><span class="p">)</span>, <span class="nv">n</span> <span class="mi">5</span><span class="p">]</span>
<span class="p">(</span><span class="nb">dotimes </span><span class="p">[</span><span class="nv">i</span> <span class="nv">n</span><span class="p">]</span>
<span class="p">(</span><span class="nf">a/go</span> <span class="p">(</span><span class="k">loop </span><span class="p">[]</span>
<span class="p">(</span><span class="nb">when-let </span><span class="p">[</span><span class="nv">rec</span> <span class="p">(</span><span class="nf">a/<!</span> <span class="nv">wc</span><span class="p">)]</span>
<span class="p">(</span><span class="nb">println </span><span class="s">"Doing stuff with"</span> <span class="nv">rec</span> <span class="s">"in thread"</span> <span class="nv">i</span><span class="p">)</span>
<span class="p">(</span><span class="nf">recur</span><span class="p">))</span>
<span class="p">(</span><span class="nb">println </span><span class="s">"Shutting down thread"</span> <span class="nv">i</span><span class="p">))))</span>
<span class="p">(</span><span class="nf">a/<!!</span> <span class="p">(</span><span class="nf">a/onto-chan</span> <span class="nv">wc</span> <span class="p">(</span><span class="nb">range </span><span class="mi">10</span><span class="p">))))</span>
</code></pre></div>
<p>which prints out something like:</p>
<div class="highlight"><pre><span></span><code>Doing stuff with 0 in thread Doing stuff with Doing stuff withDoing stuff with 4
Doing stuff with
4 1 3 in thread 3
Doing stuff with 2 Doing stuff with 5 in thread 4
in thread 2
in thread Doing stuff with0
6Doing stuff with in thread 3
in thread Shutting down thread 3
Doing stuff with
nil
8 9 in thread 4
1
in thread 7 0
Shutting down thread Shutting down thread in thread 0
Shutting down thread 4
1
2
Shutting down thread 2
</code></pre></div>
<p>Obviously stuff is happening in parallel. Let's walk through it.</p>
<p>First, maybe obviously, my namespace
has <code>:require</code>d <code>[clojure.core.async :as a]</code>. The basic idea is that
anything we want done will be lobbed onto the <code>wc</code> channel, where
a configurable number of workers will compete to unload it.
We spawn <code>n</code> of these workers as identical <code>go</code> forms,
which return immediately while
their bodies continue running in the background. You'll often see a
worker created like</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="n">a</span><span class="o">/</span><span class="k">go</span><span class="w"> </span><span class="p">(</span><span class="k">while</span><span class="w"> </span><span class="k">true</span><span class="w"> </span><span class="p">(</span><span class="n">println</span><span class="w"> </span><span class="p">(</span><span class="n">a</span><span class="o">/<</span><span class="err">!</span><span class="w"> </span><span class="n">wc</span><span class="p">))))</span><span class="w"></span>
</code></pre></div>
<p>but this is untidily immortal.
It's better to capture the <code>nil</code> that indicates
that the channel has been closed, so you can clean up properly:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="n">a</span><span class="o">/</span><span class="k">go</span><span class="w"> </span><span class="p">(</span><span class="n">loop</span><span class="w"> </span><span class="err">[]</span><span class="w"> </span><span class="p">(</span><span class="k">when</span><span class="o">-</span><span class="n">let</span><span class="w"> </span><span class="o">[</span><span class="n">l (a/<! wc)</span><span class="o">]</span><span class="w"> </span><span class="p">(</span><span class="n">do</span><span class="o">-</span><span class="nf">stuff</span><span class="w"> </span><span class="n">l</span><span class="p">)(</span><span class="n">recur</span><span class="p">))</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="n">clean</span><span class="o">-</span><span class="n">up</span><span class="p">)))</span><span class="w"></span>
</code></pre></div>
<p>If this is one too many pairs of parentheses, <code>core.async</code> has some
sugar for you:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="n">a</span><span class="o">/</span><span class="k">go</span><span class="o">-</span><span class="n">loop</span><span class="w"> </span><span class="err">[]</span><span class="w"> </span><span class="p">(</span><span class="k">when</span><span class="o">-</span><span class="n">let</span><span class="w"> </span><span class="o">[</span><span class="n">l (a/<! wc)</span><span class="o">]</span><span class="w"> </span><span class="p">(</span><span class="n">do</span><span class="o">-</span><span class="nf">stuff</span><span class="w"> </span><span class="n">l</span><span class="p">)(</span><span class="n">recur</span><span class="p">))</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="n">clean</span><span class="o">-</span><span class="n">up</span><span class="p">))</span><span class="w"></span>
</code></pre></div>
<p>In any case, we can now offer work by <code>>!!</code>ing it into <code>wc</code>. If
all the workers are busy, the operation will block, which is most likely
the behavior we want. Without that back-pressure, if in equilibrium we were generating work faster than
it could be consumed, <em>something nondeterministic and therefore bad</em>
would happen: either we would drop work on the
floor, or we would accumulate a boundless queue that would eventually serve us an
<code>OutOfMemoryError</code>.</p>
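<p>For anyone who thinks better in plain JVM terms, here is a rough analogue of that blocking hand-off. It is a sketch in Scala on top of <code>java.util.concurrent</code>, not <code>core.async</code> itself: a <code>SynchronousQueue</code> stands in for the unbuffered channel, <code>None</code> stands in for a closed channel, and the names <code>BlockingHandoff</code>, <code>handled</code> and <code>finished</code> are invented for the example.</p>

```scala
import java.util.concurrent.{CountDownLatch, SynchronousQueue}
import java.util.concurrent.atomic.AtomicInteger

object BlockingHandoff {
  def main(args: Array[String]): Unit = {
    val n = 5
    // Like an unbuffered channel: put() waits until some worker calls take().
    val wc = new SynchronousQueue[Option[Int]]()
    val finished = new CountDownLatch(n)
    val handled = new AtomicInteger(0)

    (0 until n).foreach { i =>
      val worker = new Thread(new Runnable {
        def run(): Unit = {
          var open = true
          while (open) {
            wc.take() match {
              case Some(_) => handled.incrementAndGet() // stand-in for real work
              case None    => open = false               // our substitute for "channel closed"
            }
          }
          finished.countDown()
        }
      })
      worker.setName("worker-" + i)
      worker.start()
    }

    (0 until 10).foreach(rec => wc.put(Some(rec))) // blocks whenever every worker is busy
    (0 until n).foreach(_ => wc.put(None))         // one shutdown signal per worker
    finished.await()
    println("handled " + handled.get + " records")
  }
}
```

<p>Because the producer can never get ahead of the workers, nothing is dropped and nothing piles up; the run ends by printing <code>handled 10 records</code>.</p>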
<p>A straightforward way to feed in a preexisting collection is</p>
<div class="highlight"><pre><span></span><code>(do (doseq [rec my-coll] (a/>!! wc rec)) (a/close! wc))
</code></pre></div>
<p>but there's a handy function that does a lot of this for us</p>
<div class="highlight"><pre><span></span><code>(a/<!! (a/onto-chan wc my-coll))
</code></pre></div>
<p>even (optionally) closing the channel for us afterwards. The only
thing to watch out for is that <code>onto-chan</code> returns immediately with a
channel that closes once every item has been copied onto <code>wc</code>.
If we <em>want</em> to block until then, we can explicitly wait on that channel with <code><!!</code>.</p>
<p>Finally, since we're using async anyway, we might as well also use it to
linearize our status messages by creating a worker for the purpose:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">wc</span> <span class="p">(</span><span class="nf">a/chan</span><span class="p">)</span>, <span class="nv">lc</span> <span class="p">(</span><span class="nf">a/chan</span><span class="p">)</span>, <span class="nv">n</span> <span class="mi">5</span><span class="p">]</span>
<span class="p">(</span><span class="nf">a/go-loop</span> <span class="p">[]</span> <span class="p">(</span><span class="nb">when-let </span><span class="p">[</span><span class="nv">l</span> <span class="p">(</span><span class="nf">a/<!</span> <span class="nv">lc</span><span class="p">)]</span> <span class="p">(</span><span class="nb">apply println </span><span class="nv">l</span><span class="p">)</span> <span class="p">(</span><span class="nf">recur</span><span class="p">)))</span>
<span class="p">(</span><span class="nb">dotimes </span><span class="p">[</span><span class="nv">i</span> <span class="nv">n</span><span class="p">]</span>
<span class="p">(</span><span class="nf">a/go-loop</span> <span class="p">[]</span>
<span class="p">(</span><span class="nb">when-let </span><span class="p">[</span><span class="nv">rec</span> <span class="p">(</span><span class="nf">a/<!</span> <span class="nv">wc</span><span class="p">)]</span>
<span class="p">(</span><span class="nf">a/>!</span> <span class="nv">lc</span> <span class="p">[</span><span class="s">"Doing stuff with"</span> <span class="nv">rec</span> <span class="s">"in thread"</span> <span class="nv">i</span><span class="p">])</span>
<span class="p">(</span><span class="nf">recur</span><span class="p">))</span>
<span class="p">(</span><span class="nf">a/>!</span> <span class="nv">lc</span> <span class="p">[</span><span class="s">"Shutting down thread"</span> <span class="nv">i</span><span class="p">])))</span>
<span class="p">(</span><span class="nf">a/<!!</span> <span class="p">(</span><span class="nf">a/onto-chan</span> <span class="nv">wc</span> <span class="p">(</span><span class="nb">range </span><span class="mi">10</span><span class="p">))))</span>
</code></pre></div>
<p>giving us the rather more organized</p>
<div class="highlight"><pre><span></span><code>Doing stuff with 2 in thread 2
Doing stuff with 1 in thread 4
Doing stuff with 0 in thread 1
Doing stuff with 3 in thread 0
Doing stuff with 5 in thread 2
Doing stuff with 4 in thread 3
Doing stuff with 6 in thread 4
Doing stuff with 7 in thread 1
Doing stuff with 8 in thread 0
Doing stuff with 9 in thread 2
Shutting down thread 3
Shutting down thread 4
Shutting down thread 1
Shutting down thread 0
Shutting down thread 2
</code></pre></div>
<p>In production, you'd likely be using a real logging framework, nearly all
of which are nicely fronted by
<a href="https://github.com/clojure/tools.logging/">clojure.tools.logging</a>, but
this just goes to illustrate that <code>core.async</code> is now so intuitive and
pleasant that it saves you time even in non-production work.</p>
<p>Now we have to add static typing to this motherf***r. As a vocal
<a href="http://www.indiegogo.com/projects/typed-clojure">booster</a>
of <a href="https://github.com/clojure/core.typed">core.typed</a>, I feel comfortable
admitting that the business is in this case a little unattractive. Much
of the magic of <code>core.async</code> is implemented in macros, and you can't add
typing annotations to macros - you'd have to annotate the underlying functions
they call, and many of those functions are, while not technically private,
considered to be internal.</p>
<p>As it happens, <code>core.typed</code> ships with an annotated shell for
<code>core.async</code>, providing alternate versions of key macros and functions.
For example, <code>(chan> Number)</code> is a channel through
which numbers pass. There's also a <code>go></code> macro, which takes no special
type
arguments and substitutes exactly for <code>go</code>. You might wonder why, if we use
it exactly the same way, do we need <code>go></code> at all, and the answer is
<a href="https://github.com/clojure/core.typed/blob/master/src/main/clojure/clojure/core/typed/async.clj#L161">ugly</a>.
It mirrors the innards of <code>go</code>, except that it uses
<code>chan></code> instead of <code>chan</code> and wraps some typologically gruesome
forms in <code>tc-ignore</code>. There's now an interdependence between
<code>core.typed</code> and <code>core.async</code>, which is acceptable only because of
defensive <code>0.x</code> versioning of both libraries and viable only because
the authors get along well.</p>
<p>This is not sustainable. Clearly, package annotations need to move out of
<code>core.typed</code> and into their own distributions which means <code>core.typed</code>
needs to be stable and widely accepted enough that it becomes <em>de rigueur</em> to
depend on it. Essentially, it needs to get to the point where it's considered
part of the language itself. With that in mind, go back and click on the
"booster" link several paragraphs above (or equivalently
<a href="http://www.indiegogo.com/projects/typed-clojure">here</a>) and support the
crowdsourcing effort to put <code>core.typed</code> on a professional footing.</p>
<p><a href="http://www.indiegogo.com/projects/typed-clojure">Clicked</a> yet?</p>
<p>Until you do, we'll have file headers like this:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">ns </span><span class="nv">whatever</span>
<span class="p">(</span><span class="ss">:require</span> <span class="nv">...</span>
<span class="p">[</span><span class="nv">clojure.core.typed</span> <span class="ss">:as</span> <span class="nv">t</span><span class="p">]</span>
<span class="p">[</span><span class="nv">clojure.core.async</span> <span class="ss">:as</span> <span class="nv">a</span><span class="p">]</span>
<span class="p">[</span><span class="nv">clojure.core.typed.async</span> <span class="ss">:as</span> <span class="nv">ta</span><span class="p">])</span>
<span class="nv">...</span><span class="p">)</span>
<span class="p">(</span><span class="nf">clojure.core.typed/typed-deps</span> <span class="nv">clojure.core.typed.async</span><span class="p">)</span>
</code></pre></div>
<p>Kvetching aside, it really isn't so hard to do things properly, even today.
The code below uses <code>monger</code> and a bunch of functions defined in my
increasingly cluttered IMDB project on <a href="https://github.com/pnf/imdb">github</a>:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">t/ann</span> <span class="o">^</span><span class="ss">:no-check</span> <span class="nv">monger.operators/$exists</span> <span class="nv">clojure.lang.Symbol</span><span class="p">)</span>
<span class="p">(</span><span class="nf">t/ann</span> <span class="o">^</span><span class="ss">:no-check</span> <span class="nv">monger.operators/$regex</span> <span class="nv">clojure.lang.Symbol</span><span class="p">)</span>
<span class="p">(</span><span class="nf">t/ann</span> <span class="nv">pillage-imdb</span> <span class="p">(</span><span class="nf">Fn</span> <span class="p">[</span><span class="nv">t/AnyInteger</span> <span class="nb">-> </span><span class="nv">nil</span><span class="p">]))</span>
<span class="p">(</span><span class="nf">t/ann</span> <span class="o">^</span><span class="ss">:no-check</span> <span class="nv">clojure.core.async/onto-chan</span>
<span class="p">(</span><span class="nf">All</span> <span class="p">[</span><span class="nv">a</span><span class="p">]</span> <span class="p">(</span><span class="nf">Fn</span> <span class="p">[(</span><span class="nf">clojure.core.typed.async/Chan</span> <span class="nv">a</span><span class="p">)</span> <span class="p">(</span><span class="nf">t/Seqable</span> <span class="nv">a</span><span class="p">)</span> <span class="nv">-></span>
<span class="p">(</span><span class="nf">clojure.core.typed.async/ReadOnlyPort</span> <span class="nv">Any</span><span class="p">)])</span> <span class="p">))</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">pillage-imdb</span> <span class="p">[</span><span class="nv">n</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">recs</span> <span class="p">(</span><span class="nf">mc/find-maps</span> <span class="s">"nodes"</span>
<span class="p">{</span><span class="ss">:metacritic-score</span> <span class="p">{</span><span class="nv">mo/$exists</span> <span class="nv">false</span><span class="p">}</span>
<span class="ss">:id</span> <span class="p">{</span><span class="nv">mo/$regex</span> <span class="s">"^\\/title"</span><span class="p">}})</span>
<span class="nv">_</span> <span class="p">(</span><span class="nb">println </span><span class="s">"Found"</span> <span class="p">(</span><span class="nb">count </span><span class="nv">recs</span><span class="p">)</span> <span class="s">"nodes to repair"</span><span class="p">)</span>
<span class="nv">c</span> <span class="p">(</span><span class="nf">ta/chan></span> <span class="nv">MongoRecord</span><span class="p">)]</span>
<span class="p">(</span><span class="nf">t/dotimes></span> <span class="p">[</span><span class="nv">i</span> <span class="nv">n</span><span class="p">]</span>
<span class="p">(</span><span class="nf">ta/go></span>
<span class="p">(</span><span class="k">loop </span><span class="p">[]</span> <span class="p">(</span><span class="nb">when-let </span><span class="p">[</span><span class="nv">rec</span> <span class="p">(</span><span class="nf">a/<!</span> <span class="nv">c</span><span class="p">)]</span>
<span class="p">(</span><span class="nb">-> </span><span class="p">(</span><span class="nb">dissoc </span><span class="nv">rec</span> <span class="ss">:updated</span><span class="p">)</span>
<span class="p">(</span><span class="nf">assure-node</span><span class="p">)</span>
<span class="p">(</span><span class="nf">add-film-details</span><span class="p">)</span>
<span class="p">(</span><span class="nf">add-metacritic-score</span><span class="p">)</span>
<span class="p">(</span><span class="nf">as-></span> <span class="nv">%</span> <span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nb">get </span><span class="nv">%</span> <span class="ss">:updated</span><span class="p">)</span>
<span class="p">(</span><span class="k">do </span><span class="p">(</span><span class="nb">println </span><span class="s">"Updating in"</span> <span class="nv">i</span> <span class="s">":"</span> <span class="nv">%</span><span class="p">)</span>
<span class="p">(</span><span class="nf">mc/update</span> <span class="s">"nodes"</span> <span class="p">{</span><span class="ss">:id</span> <span class="p">(</span><span class="ss">:id</span> <span class="nv">%</span><span class="p">)}</span>
<span class="p">(</span><span class="nb">dissoc </span><span class="nv">%</span> <span class="ss">:updated</span><span class="p">))))))</span>
<span class="p">(</span><span class="nf">recur</span><span class="p">)))</span>
<span class="p">(</span><span class="nb">println </span><span class="s">"Leaving"</span> <span class="nv">i</span><span class="p">)))</span>
<span class="p">(</span><span class="nf">a/<!!</span> <span class="p">(</span><span class="nf">a/onto-chan</span> <span class="nv">c</span> <span class="nv">recs</span><span class="p">))</span>
<span class="nv">nil</span> <span class="c1">; (because we promised to return nil in annotation)</span>
<span class="p">))</span>
</code></pre></div>
<p>You can certainly tell what this does.
Note that <code>onto-chan</code> wasn't annotated, so I had to do it myself. This
isn't a case of annotating for my special case; the polymorphic declaration is
quite general. Also, <code>core.typed</code> doesn't ship with a <code>go-loop></code> macro,
so we have to type a few extra characters.</p>
<p>The main significance of this unremarkable code is personal. Having started
writing it without typing, I ran into bugs that would normally have required a
hunting expedition and liberal <code>println</code>ing. Instead, I took a step back,
added the type annotations, and my problem showed up immediately in the output
of <code>check-ns</code>. Static typing isn't only about virtue. Sometimes it's also the
easy way out.</p>What if John Conway Wrote Esolangs?2013-09-20T00:00:00-04:002013-09-20T00:00:00-04:00Peter Fraenkeltag:blog.podsnap.com,2013-09-20:/fractran.html<p>This is about a month old.</p>
<p>Actually, scrap that. This kind of thing never gets old.</p>
<p>To wit, a presentation on implementing and "using"
<a href="http://en.wikipedia.org/wiki/FRACTRAN">Fractran</a>, via clojure of course.</p>
<p>The least silly aspects of the show are (1) that we can implement the language very
concisely and functionally, and (2) to interpret the results, there's a nifty
sieve of Eratosthenes - the <a href="http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf">real</a> one.</p>
<iframe src="http://blog.podsnap.com/extra/reveal/fractran.html" height=400 width=800></iframe>
<p>The code is on <a href="https://github.com/pnf/clojure-playground/blob/master/src/clj/playground/fractran.clj">github</a>.</p>Hollywood Typecasting - Adventures with typed clojure and IMDB2013-09-17T16:47:00-04:002013-09-17T16:47:00-04:00Peter Fraenkeltag:blog.podsnap.com,2013-09-17:/hollywood-typecasting.html<p>I am broadly sympathetic to the view that scalable systems must be built
with statically typed languages, for reasons outlined in this
<a href="http://williamedwardscoder.tumblr.com/post/54327549368/dynamic-languages-are-unmaintainable-and-unit-testing">wonderful
screed</a>,
and, until recently, that has made it difficult for me to recommend clojure for
institutional use.</p>
<p>With the introduction of <a href="https://github.com/clojure/core.typed">core.typed</a>,
that has changed. The author
<a href="https://groups.google.com/forum/#!topic/clojure-core-typed/U_aA_Ce3qWg">says</a>
that <code>core.typed</code> is now production-ready, and I agree. It's not perfect, but it
will find bugs in your code without breaking it or causing performance problems.
It's also pretty cool, and in many ways more expressive than type declarations
in "normal" statically typed languages.</p>
<p>That said, the documentation is not, uhm, exhaustive. I decided to teach myself
what I could using <a href="https://github.com/pnf/imdb-degrees">this toy project</a>,
which started life as a coding challenge for a job interview. (It didn't work
out, though probably not for reasons connected with the challenge. Thanks for
asking.) The task is to compute movie
<a href="http://en.wikipedia.org/wiki/Six_Degrees_of_Kevin_Bacon">degrees of separation</a>
using data scraped from <a href="http://imdb.com">IMDB</a>: two actors are considered
to be directly connected if they appear in the same film together; if they're
connected via a third actor with whom each of the first two appeared in a film,
then it's a 2nd degree connection, etc. (One of the reasons this is
not a particularly <em>good</em> challenge question is that IMDB limits your scraping
rate so much that it is literally impossible to complete the calculation in the
allotted time for non-trivial examples.)</p>
<p>Many people would say that clojure is not an obvious choice for this task,
since it would seem at first that
<a href="http://en.wikipedia.org/wiki/Dijkstra's_algorithm">Dijkstra</a> is a naturally
stateful algorithm, but
<a href="http://www.amazon.com/Purely-Functional-Structures-Chris-Okasaki/dp/0521663504">persistent data structures</a>
are wonderful things. We can store the graph as a nested map of links, every
"update" to which is effectively a brand new graph.</p>
<p>As you've gathered, this post is about three totally unrelated things, in
descending order of emphasis:</p>
<ul>
<li>Typed clojure</li>
<li>Graph algorithms with functional data structures</li>
<li>Scraping IMDB</li>
</ul>
<p>So hang on tight.</p>
<p>Combining the first two topics,
I now have the luxury of explaining the graph structure by showing you
type signatures:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">t/def-alias</span> <span class="nv">Node</span> <span class="s">"Node representing an actor or film"</span>
<span class="p">(</span><span class="nf">HMap</span> <span class="ss">:mandatory</span> <span class="p">{</span><span class="ss">:id</span> <span class="nv">String</span>
<span class="ss">:links</span> <span class="p">(</span><span class="nf">t/Set</span> <span class="nv">String</span><span class="p">)</span>
<span class="ss">:title</span> <span class="nv">String</span><span class="p">}</span>
<span class="ss">:optional</span> <span class="p">{</span><span class="ss">:distance</span> <span class="nv">t/AnyInteger</span>
<span class="ss">:path</span> <span class="p">(</span><span class="nf">t/Seq</span> <span class="nv">String</span><span class="p">)}</span>
<span class="ss">:complete?</span> <span class="nv">true</span><span class="p">))</span>
<span class="p">(</span><span class="nf">t/def-alias</span> <span class="nv">Graph</span> <span class="s">"Graph of tokens to nodes"</span>
<span class="p">(</span><span class="nf">t/Map</span> <span class="nv">String</span> <span class="nv">Node</span><span class="p">))</span>
</code></pre></div>
<p>This is <em>almost</em> self-explanatory. Every node has a string id, along with a title
and a set of ids of nodes to which it is linked. As the Dijkstra algorithm
proceeds, we add fields containing the updated estimated distance from the
starting node and the path by which we got to it. So far, so obvious.</p>
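<p>For the avoidance of doubt about what "adding fields" means with persistent maps, here's a minimal sketch in plain, untyped clojure. The ids and titles are invented for illustration and don't come from the real project: the "updated" graph is a new value, and the old one is left exactly as it was.</p>
<div class="highlight"><pre><span></span><code>;; A one-node graph with only the mandatory fields (made-up data).
(def g0 {"/name/actor-a" {:id    "/name/actor-a"
                          :title "Actor A"
                          :links #{"/title/film-x"}}})

;; Fill in the optional fields for the starting node; g0 is not modified.
(def g1 (update-in g0 ["/name/actor-a"]
                   assoc :distance 0 :path (list "/name/actor-a")))

(get-in g0 ["/name/actor-a" :distance]) ;=> nil
(get-in g1 ["/name/actor-a" :distance]) ;=> 0
(get-in g1 ["/name/actor-a" :path])     ;=> ("/name/actor-a")
</code></pre></div>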
<p>What's <em>not</em> immediately obvious is that this declaration contains more type
information than we're used to seeing in standard OO languages. Remember that
this isn't a structure or class - just an ordinary hash map. The field names
are being represented here as types; that is, a type can specify not just a
particular sort of entity but the values that the entity is allowed to take!
Outside of map structures, you may not find yourself taking advantage of this ability at first, but
when <code>core.typed</code> infers type information within your code, it will be as
conservative as possible, down to the value level if it has that information.
For example, if I had <code>(if (> x 0.0) (Math/log x) "Bleh")</code>, the type might
be inferred as <code>(U double "Bleh")</code>. A more general implication, about which
I'll say more a bit later, is that there are often multiple approaches to typing
within a given piece of code.</p>
<p>Note that the enforcement of this typing is strictly at our discretion. To
typecheck a namespace, run <code>(check-ns)</code>; for a more automated experience,
<code>lein typed check</code> (using a plugin you'll find in my <code>project.clj</code>);
or examine individual expressions in the REPL using <code>cf</code>.
Actual compilation can proceed irrespective of whether we did any of these
things, or whether they succeeded, and the compiled code won't differ either.</p>
<p>Let's take a look at some more type declarations in the project. One possibly
unnecessary complication I introduced was a Mongo caching layer over IMDB, so it
wouldn't be necessary to scrape the same thing more than once. There's a
<a href="http://clojuremongodb.info/">very nice Mongo interface</a> available, but it
doesn't (yet) come typed out of the box. One of the great things about
the separation of definition and type declaration is that I can provide remedial
typing of external libraries:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">t/ann</span> <span class="o">^</span><span class="ss">:no-check</span> <span class="nv">monger.core/connect!</span>
<span class="p">(</span><span class="nf">Fn</span> <span class="p">[(</span><span class="nf">HMap</span> <span class="ss">:optional</span> <span class="p">{</span><span class="ss">:port</span> <span class="nv">t/AnyInteger</span> <span class="ss">:host</span> <span class="nv">String</span><span class="p">}</span> <span class="ss">:complete?</span> <span class="nv">true</span><span class="p">)</span> <span class="nb">-> </span><span class="p">(</span><span class="nf">t/Option</span> <span class="nv">com.mongodb.MongoClient</span><span class="p">)]</span>
<span class="p">[</span><span class="nb">-> </span><span class="p">(</span><span class="nf">t/Option</span> <span class="nv">com.mongodb.MongoClient</span><span class="p">)]))</span>
<span class="p">(</span><span class="nf">t/ann</span> <span class="o">^</span><span class="ss">:no-check</span> <span class="nv">monger.core/get-db</span> <span class="p">[</span><span class="nv">String</span> <span class="nb">-> </span><span class="nv">com.mongodb.DBApiLayer</span><span class="p">])</span>
<span class="p">(</span><span class="nf">t/ann</span> <span class="o">^</span><span class="ss">:no-check</span> <span class="nv">monger.core/set-db!</span> <span class="p">[</span><span class="nv">com.mongodb.DBApiLayer</span> <span class="nb">-> </span><span class="nv">com.mongodb.gridfs.GridFS</span><span class="p">])</span>
</code></pre></div>
<p>As you can see, this adds type information to functions in another namespace.</p>
<ul>
<li>The <code>:no-check</code> metadata tells <code>core.typed</code> not to bother
checking within the definitions of these functions (which would certainly
fail).</li>
<li><code>Option</code> is shorthand for <code>(U Something nil)</code>. The monadic allusion
may be overplayed here, but explicitly noting that <code>nil</code> is a possible
outcome allows a type error if my code doesn't handle it.</li>
<li>As you can see, qualified Java class names can be types.</li>
</ul>
<p>I also needed external libraries for web queries and parsing. My
<code>httpkit</code> annotation</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">t/ann</span> <span class="o">^</span><span class="ss">:no-check</span> <span class="nv">org.httpkit.client/request</span>
<span class="p">[</span><span class="o">'</span><span class="p">{</span><span class="ss">:url</span> <span class="nv">String</span> <span class="ss">:method</span> <span class="p">(</span><span class="nf">Value</span> <span class="ss">:get</span><span class="p">)}</span>, <span class="p">[</span><span class="nv">Any</span> <span class="nb">-> </span><span class="nv">Any</span><span class="p">]</span> <span class="nv">-></span>
<span class="p">(</span><span class="nf">t/Atom1</span> <span class="o">'</span><span class="p">{</span><span class="ss">:status</span> <span class="nv">Number</span> <span class="ss">:body</span> <span class="nv">String</span><span class="p">})])</span>
</code></pre></div>
<p>does not even attempt completeness.
As long as I err on the overly restrictive side, this does nobody
any harm. Worst case, I'll realize I need to use the function more
generally and have to expand my annotation. (In truth, <code>[Any -> Any]</code> is
too broad for the callback function, so I've sort of broken my own rule.
I think it should be <code>(Value identity)</code>, but <code>core.typed</code> doesn't
permit function constants at this point.)</p>
<p>I used <code>enlive</code> to convert the raw
content stream from imdb.com into a nested map. The declaration
language is rich enough to describe this object recursively</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">t/def-alias</span> <span class="nv">Resource</span> <span class="s">"Parsed DOM resource"</span>
<span class="p">(</span><span class="nf">Rec</span> <span class="p">[</span><span class="nv">x</span><span class="p">]</span> <span class="p">(</span><span class="nf">t/Map</span> <span class="nv">Keyword</span> <span class="p">(</span><span class="nf">U</span> <span class="nv">String</span> <span class="nv">x</span> <span class="p">(</span><span class="nf">t/Seq</span> <span class="nv">x</span><span class="p">))))</span>
</code></pre></div>
<p>where <code>x</code> stands for the full declaration of <code>Resource</code>, and it's easy
enough to declare the functions I use:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">t/ann</span> <span class="o">^</span><span class="ss">:no-check</span> <span class="nv">net.cgrand.enlive-html/html-resource</span> <span class="p">[</span><span class="nv">java.io.Reader</span> <span class="nb">-> </span><span class="nv">Resource</span><span class="p">])</span>
<span class="p">(</span><span class="nf">t/ann</span> <span class="o">^</span><span class="ss">:no-check</span> <span class="nv">net.cgrand.enlive-html/select</span> <span class="p">[</span><span class="nv">Resource</span> <span class="p">(</span><span class="nf">t/Vec</span> <span class="nv">Keyword</span><span class="p">)</span> <span class="nb">-> </span><span class="p">(</span><span class="nf">t/Seq</span> <span class="nv">Resource</span><span class="p">)])</span>
</code></pre></div>
<p>The actual parsing, however, is a bit ugly.
Either we rigorously validate all
the assumptions we make about the layout of the document, or we punt on the
type checking. Extraction of links to actors or films starts a bit like this:</p>
<div class="highlight"><pre><span></span><code> <span class="p">(</span><span class="nb">-> </span><span class="nv">resource</span>
<span class="p">(</span><span class="nf">html/select</span> <span class="p">[</span><span class="ss">:div.title</span><span class="p">])</span>
<span class="p">(</span><span class="nf">as-></span> <span class="nb">doc </span><span class="p">(</span><span class="nb">map </span><span class="o">#</span><span class="p">(</span><span class="nb">-> </span><span class="nv">%</span> <span class="ss">:content</span> <span class="nb">second </span><span class="ss">:attrs</span> <span class="ss">:href</span><span class="p">)</span> <span class="nv">doc</span><span class="p">)))</span>
</code></pre></div>
<p>That is, we expect to find a list of <code>div</code>s of class <code>title</code>, the second
item in the content of each of which is supposed to be the link we're interested
in. Expressing this with higher order functions and threading is relatively
compact and readable, but the type inference turns into a complete mess.
If we truly cared about Blythe Danner's relation to Meatloaf,
it would be worth doing a host of
extra validations, including but hardly limited to type checking, but there's
no guarantee that IMDB would be using the same format by the time we were done.
(We might hope that they don't: IMDB pages are a total rat's nest. I make life
a little easier by scraping the mobile pages, but there are still nuances such
as different URLs for actors and actresses, despite no explicit data for gender.)
In any case, I wrap my link extraction routine in a big fat <code>no-check</code>.</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">t/ann</span> <span class="o">^</span><span class="ss">:no-check</span> <span class="nv">resource->links</span> <span class="p">[</span><span class="nv">String</span> <span class="nv">Resource</span> <span class="nb">-> </span><span class="p">(</span><span class="nf">t/Set</span> <span class="nv">String</span><span class="p">)])</span>
</code></pre></div>
<p>which nicely isolates the most typologically problematic portion of the code
while allowing me to make progress on the rest.</p>
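<p>To see why the threaded extraction is both compact and fragile, here's a sketch run against hand-built data. It assumes the usual <code>enlive</code> node shape of maps with <code>:tag</code>, <code>:attrs</code> and <code>:content</code>; the sample nodes and the URL are invented, not scraped from anywhere.</p>
<div class="highlight"><pre><span></span><code>;; Invented stand-in for what html/select might hand back.
(def sample-divs
  [{:tag     :div
    :attrs   {:class "title"}
    :content ["\n"
              {:tag :a :attrs {:href "/title/tt0000000/"} :content ["Some Film"]}]}])

;; The same per-node extraction as in the snippet above.
(map #(-> % :content second :attrs :href) sample-divs)
;=> ("/title/tt0000000/")

;; If the layout deviates from what we assumed, we silently get nil.
(map #(-> % :content second :attrs :href) [{:tag :div :content ["\n"]}])
;=> (nil)
</code></pre></div>
<p>That silent <code>nil</code> is precisely the sort of thing a fully validated (and fully typed) version would have to account for at every step.</p>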
<p>The final external library I'll use is <code>data.priority-map</code>. At each
iteration of the Dijkstra algorithm, we visit the node that currently has the
smallest estimated distance, and it's nice to be able to find that node in
something less than O(n). I could use <code>java.util.PriorityQueue</code>, but
<a href="https://github.com/clojure/data.priority-map">this</a> lovely package gives me a
more idiomatic clojure solution. The object created by <code>(priority-map)</code>
behaves like a map, but you can use <code>peek</code> and <code>pop</code> to retrieve and
discard the entry with the smallest value.</p>
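<p>A quick REPL-style sketch of that behavior, with arbitrary keys and priorities chosen purely for illustration:</p>
<div class="highlight"><pre><span></span><code>(require '[clojure.data.priority-map :refer [priority-map]])

;; Keys are node ids, values are estimated distances (made-up numbers).
(def q (priority-map "a" 3 "b" 1 "c" 2))

(peek q)                ;=> ["b" 1]   smallest value comes first
(peek (pop q))          ;=> ["c" 2]   pop discards the smallest entry
(peek (assoc q "a" 0))  ;=> ["a" 0]   assoc re-prioritizes an existing key
(get q "a")             ;=> 3         and it still looks things up like a map
</code></pre></div>
<p>Re-prioritizing with <code>assoc</code> is what makes this convenient for Dijkstra: lowering a node's estimated distance is just another map update.</p>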
<p>Once again, of course, the package doesn't come
pre-typed, but <code>core.typed</code> lets me correct for that:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">t/def-alias</span> <span class="nv">Queue</span> <span class="p">(</span><span class="nf">t/Map</span> <span class="nv">String</span> <span class="nv">t/AnyInteger</span><span class="p">))</span>
<span class="p">(</span><span class="nf">t/ann</span> <span class="o">^</span><span class="ss">:no-check</span> <span class="nv">clojure.data.priority-map/priority-map</span> <span class="p">[</span> <span class="nb">-> </span><span class="nv">Queue</span><span class="p">])</span>
<span class="p">(</span><span class="nf">t/ann</span> <span class="o">^</span><span class="ss">:no-check</span> <span class="nv">pmap-assoc</span> <span class="p">[</span><span class="nv">Queue</span> <span class="nv">String</span> <span class="p">(</span><span class="nf">U</span> <span class="nv">nil</span> <span class="nv">t/AnyInteger</span><span class="p">)</span> <span class="nb">-> </span><span class="nv">Queue</span><span class="p">])</span>
<span class="p">(</span><span class="k">def </span><span class="nv">pmap-assoc</span> <span class="nv">assoc</span><span class="p">)</span>
<span class="p">(</span><span class="nf">t/ann</span> <span class="o">^</span><span class="ss">:no-check</span> <span class="nv">pmap-peek</span> <span class="p">[</span><span class="nv">Queue</span> <span class="nb">-> </span><span class="p">(</span><span class="nf">Vector*</span> <span class="nv">String</span> <span class="nv">t/AnyInteger</span><span class="p">)])</span>
<span class="p">(</span><span class="k">def </span><span class="nv">pmap-peek</span> <span class="nv">peek</span><span class="p">)</span>
<span class="p">(</span><span class="nf">t/ann</span> <span class="o">^</span><span class="ss">:no-check</span> <span class="nv">pmap-pop</span> <span class="p">[</span><span class="nv">Queue</span> <span class="nb">-> </span><span class="nv">Queue</span><span class="p">])</span>
<span class="p">(</span><span class="k">def </span><span class="nv">pmap-pop</span> <span class="nv">pop</span><span class="p">)</span>
</code></pre></div>
<p>The full cache layering can be summarized with these declarations:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">t/def-alias</span> <span class="nv">Graph</span> <span class="s">"Graph of tokens to nodes"</span> <span class="p">(</span><span class="nf">t/Map</span> <span class="nv">String</span> <span class="nv">Node</span><span class="p">))</span>
<span class="p">(</span><span class="nf">t/ann</span> <span class="nv">fetch-node-from-imdb</span> <span class="p">[</span><span class="nv">String</span> <span class="nb">-> </span><span class="nv">Node</span><span class="p">])</span>
<span class="p">(</span><span class="nf">t/ann</span> <span class="nv">fetch-node-from-mongo-fallback-to-imdb</span> <span class="p">[</span><span class="nv">String</span> <span class="nb">-> </span><span class="nv">Node</span><span class="p">])</span>
<span class="p">(</span><span class="nf">t/ann</span> <span class="nv">fetch-node-from-graph-fallback-to-mongo</span> <span class="p">[</span><span class="nv">Graph</span> <span class="nv">String</span> <span class="nb">-> </span><span class="p">(</span><span class="nf">Vector*</span> <span class="nv">Graph</span> <span class="nv">Node</span><span class="p">)])</span>
</code></pre></div>
<p>And now for the core of the algorithm. Since I have complete control over my
own code, there will be none of this <code>^:no-check</code> nonsense; every single
line is type-checked. As the following function illustrates, <code>core.typed</code>
is doing a lot of work on my behalf:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">t/ann</span> <span class="nv">update-queue</span> <span class="p">[</span><span class="nv">Queue</span> <span class="nv">Graph</span> <span class="nv">String</span> <span class="p">(</span><span class="nf">t/Seqable</span> <span class="nv">String</span><span class="p">)</span> <span class="nb">-> </span><span class="nv">Queue</span><span class="p">])</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">update-queue</span> <span class="p">[</span><span class="nv">queue</span> <span class="nv">graph</span> <span class="nv">id</span> <span class="nv">links</span><span class="p">]</span>
<span class="p">(</span><span class="nb">reduce </span><span class="p">(</span><span class="nf">t/fn></span> <span class="ss">:-</span> <span class="nv">Queue</span> <span class="p">[</span><span class="nv">q</span> <span class="ss">:-</span> <span class="nv">Queue</span> <span class="nv">id</span> <span class="ss">:-</span> <span class="nv">String</span><span class="p">]</span>
<span class="p">(</span><span class="nf">pmap-assoc</span> <span class="nv">q</span> <span class="nv">id</span> <span class="p">(</span><span class="ss">:distance</span> <span class="p">(</span><span class="nb">get </span><span class="nv">graph</span> <span class="nv">id</span><span class="p">))))</span> <span class="nv">queue</span> <span class="nv">links</span><span class="p">))</span>
</code></pre></div>
<p>To verify that this code does indeed produce a <code>Queue</code>, types must be
properly inferred through four layers of function calls, one of them higher-order.
I find that impressive.</p>
<p>Had I fat-fingered the last line as</p>
<div class="highlight"><pre><span></span><code> ... (pmap-assoc id q (:distance (get graph id)))) queue links)
</code></pre></div>
<p>I would have learned that immediately</p>
<div class="highlight"><pre><span></span><code> Type Error (imdb.core:219:18) Type mismatch:
Expected: Queue
Actual: String
in: (imdb.core/pmap-assoc id412107 q (:distance (clojure.lang.RT/get graph id412107)))
Type Error (imdb.core:219:18) Type mismatch:
Expected: String
Actual: Queue
in: (imdb.core/pmap-assoc id412107 q (:distance (clojure.lang.RT/get graph id412107)))
</code></pre></div>
<p>rather than having to deduce it from some downstream catastrophe.</p>
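<p>For contrast, here is a minimal sketch of the same slip in plain, unchecked Clojure.
The name <code>update-queue-unchecked</code> and the toy arguments are hypothetical, and an
ordinary map stands in for the priority queue. The mistake stays hidden until the reducing
function actually runs:</p>
<div class="highlight"><pre><span></span><code>;; Hypothetical unchecked variant with the arguments to assoc swapped.
(defn update-queue-unchecked [queue graph links]
  (reduce (fn [q id] (assoc id q (:distance (get graph id)))) queue links))

;; With no links the reducing function never runs, so the bug is invisible.
(update-queue-unchecked {} {} [])

;; With one link, assoc is handed a String as its collection and throws.
(try
  (update-queue-unchecked {} {"x" {:distance 1}} ["x"])
  (catch ClassCastException _ :class-cast))
</code></pre></div>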
<p>The rest of the algorithm contains few surprises. As promised, it's purely
functional, with multiple iterations of the graph emerging from recursion and reduction.</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">update-distance</span>
<span class="s">"Update :distance field in graph node id to be one greater than di if appropriate</span>
<span class="s"> and return the updated graph."</span>
<span class="p">[</span><span class="nv">entry</span> <span class="nv">graph</span> <span class="nv">id</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">di</span> <span class="p">(</span><span class="nf">distance</span> <span class="nv">entry</span><span class="p">)</span>
<span class="p">[</span><span class="nv">graph</span> <span class="nv">node</span><span class="p">]</span> <span class="p">(</span><span class="nf">fetch-node-from-graph</span> <span class="nv">graph</span> <span class="nv">id</span><span class="p">)</span>
<span class="nb">path </span> <span class="p">(</span><span class="nb">or </span><span class="p">(</span><span class="ss">:path</span> <span class="nv">entry</span><span class="p">)</span> <span class="o">'</span><span class="p">())]</span>
<span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nb"><= </span><span class="p">(</span><span class="nf">distance</span> <span class="nv">node</span><span class="p">)</span> <span class="nv">di</span><span class="p">)</span> <span class="nv">graph</span>
<span class="p">(</span><span class="nb">-> </span><span class="nv">graph</span>
<span class="p">(</span><span class="nf">set-node</span> <span class="nv">id</span> <span class="ss">:distance</span> <span class="p">(</span><span class="nb">inc </span><span class="nv">di</span><span class="p">))</span>
<span class="p">(</span><span class="nf">set-node</span> <span class="nv">id</span> <span class="ss">:path</span> <span class="p">(</span><span class="nb">conj path </span><span class="nv">id</span><span class="p">))))))</span>
<span class="p">(</span><span class="nf">t/ann</span> <span class="nv">visit-node</span> <span class="p">[</span><span class="nv">Graph</span> <span class="nv">Queue</span> <span class="nv">String</span> <span class="nb">-> </span><span class="p">(</span><span class="nf">Vector*</span> <span class="nv">Graph</span> <span class="nv">Queue</span><span class="p">)])</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">visit-node</span>
<span class="s">"Visit node id and update the :distance fields of its neighbors.</span>
<span class="s"> queue is a priority queue of id=>distance, containing entries only for unvisited</span>
<span class="s"> nodes. We will use this to determine what to visit next, Dijkstra style."</span>
<span class="p">[</span><span class="nv">graph</span> <span class="nv">queue</span> <span class="nv">id</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[[</span><span class="nv">graph</span> <span class="nv">entry</span><span class="p">]</span> <span class="p">(</span><span class="nf">fetch-node-from-graph</span> <span class="nv">graph</span> <span class="nv">id</span><span class="p">)</span>
<span class="nv">links</span> <span class="p">(</span><span class="nb">filter </span><span class="o">#</span><span class="p">(</span><span class="nb">nil? </span><span class="p">(</span><span class="ss">:visited</span> <span class="p">(</span><span class="nf">graph</span> <span class="nv">%</span><span class="p">)))</span> <span class="p">(</span><span class="ss">:links</span> <span class="nv">entry</span><span class="p">))</span>
<span class="nv">graph</span> <span class="p">(</span><span class="nb">reduce </span><span class="p">(</span><span class="nb">partial </span><span class="nv">update-distance</span> <span class="nv">entry</span><span class="p">)</span> <span class="nv">graph</span> <span class="nv">links</span><span class="p">)</span>
<span class="nv">queue</span> <span class="p">(</span><span class="nf">update-queue</span> <span class="nv">queue</span> <span class="nv">graph</span> <span class="nv">id</span> <span class="nv">links</span><span class="p">)</span>
<span class="nv">graph</span> <span class="p">(</span><span class="nf">set-node</span> <span class="nv">graph</span> <span class="nv">id</span> <span class="ss">:visited</span> <span class="nv">true</span><span class="p">)]</span>
<span class="p">[</span><span class="nv">graph</span> <span class="nv">queue</span><span class="p">]))</span>
<span class="p">(</span><span class="nf">t/ann</span> <span class="nv">find-distance</span> <span class="p">[</span><span class="nv">String</span> <span class="nv">String</span> <span class="nb">-> </span><span class="p">(</span><span class="nf">t/Option</span> <span class="nv">Any</span><span class="p">)])</span>
<span class="p">(</span><span class="kd">defn </span><span class="nv">find-distance</span>
<span class="s">"E.g. (find-distance \"/name/nm0000257/\" \"/name/nm0000295/\")</span>
<span class="s"> Use Dijkstra algorithm to find shortest path from id to target."</span>
<span class="p">[</span><span class="nv">id</span> <span class="nv">target</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span> <span class="p">[</span><span class="nv">graph</span> <span class="nv">node</span><span class="p">]</span> <span class="p">(</span><span class="nf">fetch-node-from-graph</span> <span class="p">{}</span> <span class="nv">id</span><span class="p">)</span>
<span class="nv">graph</span> <span class="p">(</span><span class="nf">set-node</span> <span class="nv">graph</span> <span class="nv">id</span> <span class="ss">:distance</span> <span class="mi">0</span><span class="p">)</span>
<span class="nv">queue</span> <span class="p">(</span><span class="nf">priority-map</span><span class="p">)]</span>
<span class="p">(</span><span class="nf">t/loop></span> <span class="p">[</span><span class="nv">graph</span> <span class="ss">:-</span> <span class="nv">Graph</span> <span class="nv">graph</span>
<span class="nv">queue</span> <span class="ss">:-</span> <span class="nv">Queue</span> <span class="nv">queue</span>
<span class="nv">id</span> <span class="ss">:-</span> <span class="nv">String</span> <span class="nv">id</span><span class="p">]</span>
<span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nb">= </span><span class="nv">id</span> <span class="nv">target</span><span class="p">)</span> <span class="p">(</span><span class="nb">get </span><span class="nv">graph</span> <span class="nv">id</span><span class="p">)</span>
<span class="p">(</span><span class="k">let </span><span class="p">[[</span><span class="nv">graph</span> <span class="nv">queue</span><span class="p">]</span> <span class="p">(</span><span class="nf">visit-node</span> <span class="nv">graph</span> <span class="nv">queue</span> <span class="nv">id</span><span class="p">)</span>
<span class="nv">closest</span> <span class="p">(</span><span class="nb">first </span><span class="p">(</span><span class="nf">pmap-peek</span> <span class="nv">queue</span><span class="p">))]</span>
<span class="p">(</span><span class="k">if </span><span class="p">(</span><span class="nf">empty?</span> <span class="nv">queue</span><span class="p">)</span>
<span class="s">"Couldn't find a path!"</span>
<span class="p">(</span><span class="nf">recur</span> <span class="nv">graph</span> <span class="p">(</span><span class="nf">pmap-pop</span> <span class="nv">queue</span><span class="p">)</span> <span class="nv">closest</span><span class="p">)))))))</span>
</code></pre></div>
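<p>The shape of that loop may be easier to see with the caching and type annotations stripped
away. The following is only a sketch under stated assumptions: <code>toy-graph</code> and
<code>shortest-distance</code> are hypothetical names, the graph is a literal map rather than
something fetched from IMDB, and a FIFO queue replaces the priority map, which is enough
when every edge has weight one.</p>
<div class="highlight"><pre><span></span><code>;; Hypothetical toy graph: node id -> {:links [...]}; purely illustrative data.
(def toy-graph
  {"a" {:links ["b" "c"]}
   "b" {:links ["d"]}
   "c" {:links ["d"]}
   "d" {:links []}})

(defn shortest-distance
  "Hop count from start to target, or nil if target is unreachable.
   Assumes unit edge weights, so a FIFO queue visits nodes in distance order."
  [graph start target]
  (loop [dist  {start 0}                                  ; best known distance per node
         queue (conj clojure.lang.PersistentQueue/EMPTY start)]
    (if-let [id (peek queue)]
      (if (= id target)
        (dist id)
        (let [d     (dist id)
              fresh (remove dist (get-in graph [id :links])) ; neighbours not yet seen
              dist  (reduce #(assoc %1 %2 (inc d)) dist fresh)]
          (recur dist (into (pop queue) fresh))))
      nil)))

(shortest-distance toy-graph "a" "d") ; two hops, via "b"
</code></pre></div>
<p>As in the full version, nothing is mutated: each pass through <code>loop</code> rebinds
a new distance map and a new queue.</p>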
<p>I mentioned above that <code>core.typed</code>'s flexibility gives us some
unexpected choices. To investigate this in the REPL, we'll use a couple of
extra tools.
First, <code>(t/ann-form #{"a"} (t/Set String))</code> specifies inline that
<code>#{"a"}</code> is to be treated as a set of String, rather than the more conservatively
inferred <code>(t/Set (Value "a"))</code>. Second, <code>(t/cf someform)</code> checks a form and
displays its inferred type.</p>
<p>Here's how <code>union</code> might be annotated</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">t/ann</span> <span class="o">^</span><span class="ss">:no-check</span> <span class="nv">clojure.set/union</span> <span class="p">(</span><span class="nf">All</span> <span class="p">[</span><span class="nv">a</span><span class="p">]</span> <span class="p">(</span><span class="nf">Fn</span> <span class="p">[(</span><span class="nf">t/Set</span> <span class="nv">a</span><span class="p">)</span> <span class="nb">* -> </span><span class="p">(</span><span class="nf">t/Set</span> <span class="nv">a</span><span class="p">)])))</span>
</code></pre></div>
<p>and here's how that would be interpreted:</p>
<div class="highlight"><pre><span></span><code>imdb.core> (t/cf (clojure.set/union (t/ann-form #{1} (t/Set Integer)) (t/ann-form #{"one"} (t/Set String))))
(t/Set (U Integer String))
</code></pre></div>
<p>That certainly isn't what I expected, given apparently analogous declarations in other languages.
For example, scala's sets have this <code>union</code> method</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">union</span><span class="p">(</span><span class="n">that</span><span class="p">:</span> <span class="nc">GenSet</span><span class="p">[</span><span class="nc">A</span><span class="p">]):</span> <span class="nc">Set</span><span class="p">[</span><span class="nc">A</span><span class="p">]</span>
</code></pre></div>
<p>which of course bombs on heterogeneous sets:</p>
<div class="highlight"><pre><span></span><code><span class="n">scala</span><span class="o">></span><span class="w"> </span><span class="n">HashSet</span><span class="p">(</span><span class="mi">1</span><span class="p">).</span><span class="ow">union</span><span class="p">(</span><span class="n">HashSet</span><span class="p">(</span><span class="mi">2</span><span class="p">))</span><span class="w"></span>
<span class="nl">res0</span><span class="p">:</span><span class="w"> </span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">immutable</span><span class="p">.</span><span class="n">HashSet</span><span class="o">[</span><span class="n">Int</span><span class="o">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">Set</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span><span class="w"></span>
<span class="n">scala</span><span class="o">></span><span class="w"> </span><span class="n">HashSet</span><span class="p">(</span><span class="mi">1</span><span class="p">).</span><span class="ow">union</span><span class="p">(</span><span class="n">HashSet</span><span class="p">(</span><span class="ss">"one"</span><span class="p">))</span><span class="w"></span>
<span class="o"><</span><span class="n">console</span><span class="o">></span><span class="err">:</span><span class="mi">9</span><span class="err">:</span><span class="w"> </span><span class="nl">error</span><span class="p">:</span><span class="w"> </span><span class="n">type</span><span class="w"> </span><span class="n">mismatch</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">found</span><span class="w"> </span><span class="err">:</span><span class="w"> </span><span class="n">String</span><span class="p">(</span><span class="ss">"one"</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="nl">required</span><span class="p">:</span><span class="w"> </span><span class="nc">Int</span><span class="w"></span>
</code></pre></div>
<p>To get the familiar behavior from clojure, we'd have to specify an equality bound on the type parameter:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">t/ann</span> <span class="nv">clojure.set/union</span> <span class="p">(</span><span class="nf">All</span> <span class="p">[</span><span class="nv">x</span> <span class="p">[</span><span class="nv">x1</span> <span class="ss">:<</span> <span class="nv">x</span> <span class="ss">:></span> <span class="nv">x</span><span class="p">]]</span>
<span class="p">(</span><span class="nf">Fn</span> <span class="p">[(</span><span class="nf">t/Set</span> <span class="nv">x</span><span class="p">)</span> <span class="p">(</span><span class="nf">t/Set</span> <span class="nv">x1</span><span class="p">)</span> <span class="nb">* -> </span><span class="p">(</span><span class="nf">t/Set</span> <span class="nv">x1</span><span class="p">)])))</span>
</code></pre></div>
<p>Now:</p>
<div class="highlight"><pre><span></span><code>imdb.core> (t/cf (clojure.set/union (t/ann-form #{1} (t/Set Integer))
(t/ann-form #{"one"} (t/Set String))))
AssertionError Assert failed: 1: Inferred type String is not between bounds Integer and Integer
(and (subtype? inferred upper-bound) (subtype? lower-bound inferred)) clojure.core.typed.cs-gen/subst-gen/fn--10414 (cs_gen.clj:1333)
</code></pre></div>
<p>Neither of these alternate declarations is obviously more correct, and both will catch our coding errors, though in slightly different ways.
Suppose we had the following solecism:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="nf">t/cf</span> <span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">s</span> <span class="p">(</span><span class="nf">clojure.set/union</span> <span class="o">#</span><span class="p">{</span><span class="mi">1</span><span class="p">}</span> <span class="o">#</span><span class="p">{</span><span class="s">"a"</span><span class="p">})]</span>
<span class="p">(</span><span class="nb">map inc </span><span class="nv">s</span><span class="p">)))</span>
</code></pre></div>
<p>With the bounded annotation, the call to <code>union</code> is simply illegal:</p>
<div class="highlight"><pre><span></span><code>AssertionError Assert failed: 1: Inferred type (Value "a") is not between bounds (Value 1) and (Value 1)
(and (subtype? inferred upper-bound) (subtype? lower-bound inferred)) clojure.core.typed.cs-gen/subst-gen/fn--10414 (cs_gen.clj:1333)
</code></pre></div>
<p>With the first, more general declaration, the call to <code>union</code> is fine, but <code>inc</code> will have problems:</p>
<div class="highlight"><pre><span></span><code><span class="kr">Type</span> <span class="nf">Error</span> <span class="p">(</span><span class="n">imdb</span><span class="p">.</span><span class="n">core</span><span class="o">:</span><span class="mi">2</span><span class="o">:</span><span class="mi">6</span><span class="p">)</span> <span class="n">Polymorphic</span> <span class="kr">function</span> <span class="n">clojure</span><span class="p">.</span><span class="n">core</span><span class="o">/</span><span class="nf">map</span> <span class="n">could</span> <span class="kr">not</span> <span class="n">be</span> <span class="n">applied</span> <span class="n">to</span> <span class="n">arguments</span><span class="o">:</span>
<span class="n">Domains</span><span class="o">:</span>
<span class="p">(</span><span class="n">Fn</span> <span class="p">[</span><span class="n">a</span> <span class="n">b</span> <span class="p">...</span> <span class="n">b</span> <span class="o">-></span> <span class="n">c</span><span class="p">])</span> <span class="p">(</span><span class="n">t</span><span class="o">/</span><span class="n">NonEmptySeqable</span> <span class="n">a</span><span class="p">)</span> <span class="p">(</span><span class="n">t</span><span class="o">/</span><span class="n">NonEmptySeqable</span> <span class="n">a</span><span class="p">)</span> <span class="p">...</span> <span class="n">b</span>
<span class="p">(</span><span class="n">Fn</span> <span class="p">[</span><span class="n">a</span> <span class="n">b</span> <span class="p">...</span> <span class="n">b</span> <span class="o">-></span> <span class="n">c</span><span class="p">])</span> <span class="p">(</span><span class="n">U</span> <span class="n">nil</span> <span class="p">(</span><span class="n">clojure</span><span class="p">.</span><span class="n">lang</span><span class="p">.</span><span class="n">Seqable</span> <span class="n">a</span><span class="p">))</span> <span class="p">(</span><span class="n">U</span> <span class="n">nil</span> <span class="p">(</span><span class="n">clojure</span><span class="p">.</span><span class="n">lang</span><span class="p">.</span><span class="n">Seqable</span> <span class="n">b</span><span class="p">))</span> <span class="p">...</span> <span class="n">b</span>
<span class="n">Arguments</span><span class="o">:</span>
<span class="p">(</span><span class="n">Fn</span> <span class="p">[</span><span class="n">t</span><span class="o">/</span><span class="n">AnyInteger</span> <span class="o">-></span> <span class="n">t</span><span class="o">/</span><span class="n">AnyInteger</span><span class="p">]</span> <span class="p">[</span><span class="n">Number</span> <span class="o">-></span> <span class="n">Number</span><span class="p">])</span> <span class="p">(</span><span class="n">t</span><span class="o">/</span><span class="nf">Set</span> <span class="p">(</span><span class="n">U</span> <span class="p">(</span><span class="n">Value</span> <span class="mi">1</span><span class="p">)</span> <span class="p">(</span><span class="n">Value</span> <span class="s">"a"</span><span class="p">)))</span>
<span class="n">Ranges</span><span class="o">:</span>
<span class="p">(</span><span class="n">t</span><span class="o">/</span><span class="n">NonEmptyLazySeq</span> <span class="n">c</span><span class="p">)</span>
<span class="p">(</span><span class="n">clojure</span><span class="p">.</span><span class="n">lang</span><span class="p">.</span><span class="n">LazySeq</span> <span class="n">c</span><span class="p">)</span>
<span class="n">with</span> <span class="n">expected</span> <span class="kr">type</span><span class="o">:</span>
<span class="kr">Any</span>
<span class="kr">in</span><span class="o">:</span> <span class="p">(</span><span class="n">clojure</span><span class="p">.</span><span class="n">core</span><span class="o">/</span><span class="nf">map</span> <span class="n">clojure</span><span class="p">.</span><span class="n">core</span><span class="o">/</span><span class="n">inc</span> <span class="n">s</span><span class="p">)</span>
</code></pre></div>
<p>In both cases, our error is caught, but at different stages of analysis.</p>
<p>This flexibility is immensely powerful, and it encourages more nuanced
thinking about static types. In many situations, finding ourselves
with a heterogeneous set would be a sign that something had gone wrong, but not
always. For instance, with our more general annotation, we could do this:</p>
<div class="highlight"><pre><span></span><code>imdb.core> (t/cf (let [s (clojure.set/union #{1} #{1.0})]
(map inc s)))
(clojure.lang.LazySeq Number)
</code></pre></div>
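<p>The runtime behaviour agrees with both verdicts. This is a small sketch in plain Clojure,
with no type checker involved; the <code>set</code> alias is just a local convenience:</p>
<div class="highlight"><pre><span></span><code>(require '[clojure.set :as set])

;; Longs and doubles are both Numbers, so inc accepts every element.
(set (map inc (set/union #{1} #{1.0})))   ; a set holding 2 and 2.0

;; A String element fails only when inc finally reaches it.
;; map is lazy, so doall is needed to force the error inside the try.
(try
  (doall (map inc (set/union #{1} #{"a"})))
  (catch ClassCastException _ :class-cast))
</code></pre></div>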
<p>I think I've demonstrated that <code>core.typed</code> is cool.
I know that it has improved my own code.
I hope that it will ultimately remove one of the greatest barriers to broad adoption of clojure.</p>Deriving the Y-Combinator in Clojure2013-08-12T00:00:00-04:002013-08-12T00:00:00-04:00Peter Fraenkeltag:blog.podsnap.com,2013-08-12:/y-combinator.html<p>At some point, everyone wakes up in the middle of the night, in a cold
sweat of panic that they don't truly understand how to derive the
Y-combinator. Well maybe not everyone, but at least me. (Note that
I'm talking about the
<a href="http://en.wikipedia.org/wiki/Fixed-point_combinator#Y_combinator">higher order function</a>,
not the startup incubator.) I ended up reading through quite a few
web pages, all of which presupposed a slightly different background,
before I finally understood. This post distills my understanding, expressed in clojure, which
happens to be what I'm into now. It can now be one of the pages that someone else finds
not quite adequate for understanding this concept.</p>
<p>Having read the synopsis, we know that the point here is that higher order functions,
being functions, can have fixed points - i.e. <code>f(g)=g</code>,
and that if we were able to find that fixed point
we would be able to implement recursion in a language that doesn't have it.
But it's best to forget that for the moment and just convince yourself that the following
steps follow from each other.</p>
<p>Here is a standard definition of the factorial function:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">fact</span> <span class="p">[</span><span class="nv">n</span><span class="p">]</span> <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nf">=</span> <span class="mi">0</span> <span class="nv">n</span><span class="p">)</span> <span class="mi">1</span> <span class="p">(</span><span class="nf">*</span> <span class="nv">n</span> <span class="p">(</span><span class="nf">fact</span> <span class="p">(</span><span class="nf">-</span> <span class="nv">n</span> <span class="mi">1</span><span class="p">)))))</span>
<span class="p">(</span><span class="k">assert</span> <span class="p">(</span><span class="nf">=</span> <span class="p">(</span><span class="nf">fact</span> <span class="mi">5</span><span class="p">)</span> <span class="mi">120</span><span class="p">))</span>
</code></pre></div>
<p>Following the usual path, we now do something that seems pointless. Rather than
explicitly call the function recursively, we pass in the function to call.</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">fact2</span> <span class="p">[</span><span class="nv">fact</span> <span class="nv">n</span><span class="p">]</span> <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nf">=</span> <span class="mi">0</span> <span class="nv">n</span><span class="p">)</span> <span class="mi">1</span> <span class="p">(</span><span class="nf">*</span> <span class="nv">n</span> <span class="p">(</span><span class="nf">fact</span> <span class="nv">fact</span> <span class="p">(</span><span class="nf">-</span> <span class="nv">n</span> <span class="mi">1</span><span class="p">)))))</span>
<span class="p">(</span><span class="k">assert</span> <span class="p">(</span><span class="nf">=</span> <span class="p">(</span><span class="nf">fact2</span> <span class="nv">fact2</span> <span class="mi">5</span><span class="p">)</span> <span class="mi">120</span><span class="p">))</span>
</code></pre></div>
<p>This <code>fact2</code> thing is no longer explicitly recursive, but it is of course not particularly
useful as it presupposes its own existence. We're going to try to make it more useful.</p>
<p>First, we're going to curry it, so we only have to deal with functions of one argument.</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">fact3</span> <span class="p">[</span><span class="nv">fact</span><span class="p">]</span>
<span class="p">(</span><span class="kd">fn </span><span class="p">[</span><span class="nv">n</span><span class="p">]</span> <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nf">=</span> <span class="mi">0</span> <span class="nv">n</span><span class="p">)</span> <span class="mi">1</span> <span class="p">(</span><span class="nf">*</span> <span class="nv">n</span> <span class="p">((</span><span class="nf">fact</span> <span class="nv">fact</span><span class="p">)</span> <span class="p">(</span><span class="nf">-</span> <span class="nv">n</span> <span class="mi">1</span><span class="p">))</span> <span class="p">))))</span>
<span class="p">(</span><span class="k">assert</span> <span class="p">(</span><span class="nf">=</span> <span class="p">((</span><span class="nf">fact3</span> <span class="nv">fact3</span><span class="p">)</span> <span class="mi">5</span><span class="p">)</span> <span class="mi">120</span><span class="p">))</span>
</code></pre></div>
<p>What we're edging towards is something where the middle bit looks as much like a normal
factorial function as possible, so I'm going to pull the <code>(fact fact)</code> bit out, passing
it in as an argument to an inner function:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">fact4</span> <span class="p">[</span><span class="nv">fact</span><span class="p">]</span> <span class="p">(</span><span class="kd">fn </span><span class="p">[</span><span class="nv">n</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">f</span> <span class="p">(</span><span class="kd">fn </span><span class="p">[</span><span class="nv">g</span> <span class="nv">n</span><span class="p">]</span> <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nf">=</span> <span class="mi">0</span> <span class="nv">n</span><span class="p">)</span> <span class="mi">1</span> <span class="p">(</span><span class="nf">*</span> <span class="nv">n</span> <span class="p">(</span><span class="nf">g</span> <span class="p">(</span><span class="nf">-</span> <span class="nv">n</span> <span class="mi">1</span><span class="p">)))))</span> <span class="p">]</span>
<span class="p">(</span><span class="nf">f</span> <span class="p">(</span><span class="nf">fact</span> <span class="nv">fact</span><span class="p">)</span> <span class="nv">n</span><span class="p">)</span>
<span class="p">)))</span>
<span class="p">(</span><span class="k">assert</span> <span class="p">(</span><span class="nf">=</span> <span class="p">((</span><span class="nf">fact4</span> <span class="nv">fact4</span><span class="p">)</span> <span class="mi">5</span><span class="p">)</span> <span class="mi">120</span><span class="p">))</span>
</code></pre></div>
<p>Now <code>f</code> no longer has a <code>(fact fact)</code> in it, and we'll make it even prettier by currying
it:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">fact5</span> <span class="p">[</span><span class="nv">fact</span><span class="p">]</span> <span class="p">(</span><span class="kd">fn </span><span class="p">[</span><span class="nv">n</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">f</span> <span class="p">(</span><span class="kd">fn </span><span class="p">[</span><span class="nv">g</span><span class="p">]</span> <span class="p">(</span><span class="kd">fn </span><span class="p">[</span><span class="nv">n</span><span class="p">]</span> <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nf">=</span> <span class="mi">0</span> <span class="nv">n</span><span class="p">)</span> <span class="mi">1</span> <span class="p">(</span><span class="nf">*</span> <span class="nv">n</span> <span class="p">(</span><span class="nf">g</span> <span class="p">(</span><span class="nf">-</span> <span class="nv">n</span> <span class="mi">1</span><span class="p">))))))]</span>
<span class="p">((</span><span class="nf">f</span> <span class="p">(</span><span class="nf">fact</span> <span class="nv">fact</span><span class="p">))</span> <span class="nv">n</span><span class="p">)</span>
<span class="p">)))</span>
<span class="c1">;(assert (= ((fact5 fact5) 5) 120))</span>
</code></pre></div>
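<p>One detail worth checking at this step is that <code>(fact fact)</code> still sits inside
<code>(fn [n] ...)</code>, so it is only evaluated when a number finally arrives. The sketch below
restates the curried version under the hypothetical name <code>fact5-check</code>, and adds
a deliberately broken variant, <code>fact5-eager</code> (also hypothetical), that hoists
the self-application out of the inner function:</p>
<div class="highlight"><pre><span></span><code>;; Hypothetical restatement of the curried step, renamed to avoid clashing with fact5.
(defn fact5-check [fact]
  (fn [n]
    (let [f (fn [g] (fn [n] (if (= 0 n) 1 (* n (g (- n 1))))))]
      ((f (fact fact)) n))))

(assert (= ((fact5-check fact5-check) 5) 120))

;; Hoisting (fact fact) out of (fn [n] ...) evaluates it eagerly,
;; so self-application recurses until the stack runs out.
(defn fact5-eager [fact]
  (let [g (fact fact)]
    (fn [n] (if (= 0 n) 1 (* n (g (- n 1)))))))

(try
  ((fact5-eager fact5-eager) 5)
  (catch StackOverflowError _ :stack-overflow))
</code></pre></div>
<p>In an eagerly evaluated language like Clojure, that wrapping function is what keeps the
self-application from unrolling forever.</p>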
<p>The exciting news is that <code>(fn [g] (fn [n] (if (= 0 n) 1 (* n (g (- n 1))))))</code> in the
middle is a self-contained, normal function. It's not a closure, and it looks a lot like
the original factorial. In fact, it's almost exactly like <code>fact2</code>.
Let's pull it out and give it an evocative name</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="n">def</span><span class="w"> </span><span class="n">fact</span><span class="o">-</span><span class="n">maker</span><span class="w"> </span><span class="p">(</span><span class="n">fn</span><span class="w"> </span><span class="o">[</span><span class="n">g</span><span class="o">]</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="n">fn</span><span class="w"> </span><span class="o">[</span><span class="n">n</span><span class="o">]</span><span class="w"> </span><span class="p">(</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="p">(</span><span class="n">g</span><span class="w"> </span><span class="p">(</span><span class="o">-</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="mi">1</span><span class="p">)))))))</span><span class="w"></span>
</code></pre></div>
<p>suggesting that it might be used to <em>make</em> factorial functions, in concert with
another function to which we pass it as an argument:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">fact6</span> <span class="p">[</span><span class="nv">f</span><span class="p">]</span>
<span class="p">(</span><span class="kd">fn </span><span class="p">[</span><span class="nv">fact</span><span class="p">]</span> <span class="p">(</span><span class="kd">fn </span><span class="p">[</span><span class="nv">n</span><span class="p">]</span> <span class="p">((</span><span class="nf">f</span> <span class="p">(</span><span class="nf">fact</span> <span class="nv">fact</span><span class="p">))</span> <span class="nv">n</span><span class="p">))))</span>
<span class="p">(</span><span class="k">assert</span> <span class="p">(</span><span class="nf">=</span> <span class="p">(((</span><span class="nf">fact6</span> <span class="nv">fact-maker</span><span class="p">)</span> <span class="p">(</span><span class="nf">fact6</span> <span class="nv">fact-maker</span><span class="p">))</span> <span class="mi">5</span><span class="p">)</span> <span class="mi">120</span><span class="p">))</span>
</code></pre></div>
<p>Things have started to get cool. We've broken a complicated expression that doesn't
know how to do anything but make factorial functions into two simpler functions, of
which <code>fact-maker</code> defines the mathematics of a factorial, and <code>fact6</code> has nothing to
do with factorials and could potentially be used to make <em>anything-</em><code>maker</code> into a
recursive function.</p>
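<p>To make the "<em>anything-</em><code>maker</code>" claim concrete, here is a minimal sketch.
<code>fact6</code> is copied from above; <code>sum-maker</code> and <code>sum-to</code> are hypothetical names
invented for this illustration, and nothing about them is special.</p>
<div class="highlight"><pre><span></span><code>;; fact6 exactly as defined above.
(defn fact6 [f]
  (fn [fact] (fn [n] ((f (fact fact)) n))))

;; Hypothetical maker: sums the integers from 1 to n, recursing through g.
(def sum-maker
  (fn [g]
    (fn [n] (if (= 0 n) 0 (+ n (g (- n 1)))))))

;; As with fact-maker, fact6 still has to be applied twice at this stage.
(def sum-to ((fact6 sum-maker) (fact6 sum-maker)))

(assert (= (sum-to 4) 10))
(assert (= (sum-to 0) 0))
</code></pre></div>
<p>Nothing in <code>fact6</code> had to change; only the maker did.</p>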
<p>Now let's spruce things up a bit, so the person invoking this doesn't have to type
<code>fact6</code> twice.</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">fact7</span> <span class="p">[</span><span class="nv">f</span><span class="p">]</span>
<span class="p">(</span><span class="k">let </span><span class="p">[</span><span class="nv">g</span> <span class="p">(</span><span class="kd">fn </span><span class="p">[</span><span class="nv">fact</span><span class="p">]</span> <span class="p">(</span><span class="kd">fn </span><span class="p">[</span><span class="nv">n</span><span class="p">]</span> <span class="p">((</span><span class="nf">f</span> <span class="p">(</span><span class="nf">fact</span> <span class="nv">fact</span><span class="p">))</span> <span class="nv">n</span><span class="p">)))]</span> <span class="p">(</span><span class="nf">g</span> <span class="nv">g</span><span class="p">)))</span>
<span class="p">(</span><span class="k">assert</span> <span class="p">(</span><span class="nf">=</span> <span class="p">((</span><span class="nf">fact7</span> <span class="nv">fact-maker</span><span class="p">)</span> <span class="mi">5</span><span class="p">)</span> <span class="mi">120</span><span class="p">))</span>
</code></pre></div>
<p>And we'll make it look like more vanilla lisp, by getting rid of the <code>let</code>:</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">fact8</span> <span class="p">[</span><span class="nv">f</span><span class="p">]</span>
<span class="p">((</span><span class="kd">fn </span><span class="p">[</span><span class="nv">g</span><span class="p">]</span> <span class="p">(</span><span class="nf">g</span> <span class="nv">g</span><span class="p">))</span>
<span class="p">(</span><span class="kd">fn </span><span class="p">[</span><span class="nv">fact</span><span class="p">]</span> <span class="p">(</span><span class="kd">fn </span><span class="p">[</span><span class="nv">n</span><span class="p">]</span> <span class="p">((</span><span class="nf">f</span> <span class="p">(</span><span class="nf">fact</span> <span class="nv">fact</span><span class="p">))</span> <span class="nv">n</span><span class="p">)))))</span>
<span class="p">(</span><span class="k">assert</span> <span class="p">(</span><span class="nf">=</span> <span class="p">((</span><span class="nf">fact8</span> <span class="nv">fact-maker</span><span class="p">)</span> <span class="mi">5</span><span class="p">)</span> <span class="mi">120</span><span class="p">))</span>
</code></pre></div>
<p>Sensing victory, we now make the variable names short and pretty.</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nv">Y</span> <span class="p">[</span><span class="nv">f</span><span class="p">]</span>
<span class="p">((</span><span class="kd">fn </span><span class="p">[</span><span class="nv">g</span><span class="p">]</span> <span class="p">(</span><span class="nf">g</span> <span class="nv">g</span><span class="p">))</span>
<span class="p">(</span><span class="kd">fn </span><span class="p">[</span><span class="nv">h</span><span class="p">]</span> <span class="p">(</span><span class="kd">fn </span><span class="p">[</span><span class="nv">n</span><span class="p">]</span> <span class="p">((</span><span class="nf">f</span> <span class="p">(</span><span class="nf">h</span> <span class="nv">h</span><span class="p">))</span> <span class="nv">n</span><span class="p">)))))</span>
<span class="p">(</span><span class="k">assert</span> <span class="p">(</span><span class="nf">=</span> <span class="p">((</span><span class="nf">Y</span> <span class="nv">fact-maker</span><span class="p">)</span> <span class="mi">5</span><span class="p">)</span> <span class="mi">120</span><span class="p">))</span>
</code></pre></div>
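<p>One detail worth spelling out is the inner <code>(fn [n] ...)</code> wrapper. Clojure evaluates
arguments eagerly, so without the wrapper <code>(h h)</code> would be evaluated before <code>f</code> ever got
called, and would keep calling itself. The sketch below assumes JVM Clojure, where
runaway recursion surfaces as a <code>StackOverflowError</code>; <code>Y-eager</code> and <code>eager-result</code> are
hypothetical names used only here.</p>
<div class="highlight"><pre><span></span><code>(def fact-maker
  (fn [g] (fn [n] (if (= 0 n) 1 (* n (g (- n 1)))))))

(defn Y [f]
  ((fn [g] (g g))
   (fn [h] (fn [n] ((f (h h)) n)))))

;; Hypothetical variant with the (fn [n] ...) wrapper removed.
(defn Y-eager [f]
  ((fn [g] (g g))
   (fn [h] (f (h h)))))

;; The wrapped version works as before.
(assert (= ((Y fact-maker) 5) 120))

;; The unwrapped version recurses while building the function,
;; before the 5 is ever looked at.
(def eager-result
  (try
    ((Y-eager fact-maker) 5)
    (catch StackOverflowError _ :stack-overflow)))

(assert (= eager-result :stack-overflow))
</code></pre></div>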
<p>And we're done. Before getting philosophical, let's verify that our lovely
combinator can be used to make other recursive functions, irrespective of the
type of argument or return value. Here's an example using a function that
returns a list.</p>
<div class="highlight"><pre><span></span><code><span class="p">(</span><span class="kd">defn </span><span class="nb">range</span><span class="nv">-maker</span> <span class="p">[</span><span class="nv">f</span><span class="p">]</span> <span class="p">(</span><span class="kd">fn </span><span class="p">[</span><span class="nv">n</span><span class="p">]</span> <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nf">=</span> <span class="mi">0</span> <span class="nv">n</span><span class="p">)</span> <span class="p">()</span> <span class="p">(</span><span class="nf">conj</span> <span class="p">(</span><span class="nf">f</span> <span class="p">(</span><span class="nf">-</span> <span class="nv">n</span> <span class="mi">1</span><span class="p">))</span> <span class="nv">n</span><span class="p">))))</span>
<span class="p">(</span><span class="k">assert</span> <span class="p">(</span><span class="nf">=</span> <span class="p">((</span><span class="nf">Y</span> <span class="nb">range</span><span class="nv">-maker</span><span class="p">)</span> <span class="mi">5</span><span class="p">)</span> <span class="p">(</span><span class="nb">list</span> <span class="mi">5</span> <span class="mi">4</span> <span class="mi">3</span> <span class="mi">2</span> <span class="mi">1</span><span class="p">)))</span>
</code></pre></div>
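<p>The range example varies the return type. As a sketch of varying the argument type
instead, here is a maker that walks a sequence rather than counting down a number.
<code>length-maker</code> is a hypothetical name for this illustration; <code>Y</code> is repeated so the
snippet stands alone.</p>
<div class="highlight"><pre><span></span><code>(defn Y [f]
  ((fn [g] (g g))
   (fn [h] (fn [n] ((f (h h)) n)))))

;; Hypothetical maker: counts the elements of any seqable argument.
(defn length-maker [f]
  (fn [xs] (if (empty? xs) 0 (inc (f (rest xs))))))

(assert (= ((Y length-maker) [:a :b :c]) 3))
(assert (= ((Y length-maker) ()) 0))
(assert (= ((Y length-maker) "hello") 5))
</code></pre></div>
<p>Note that <code>Y</code> as written only produces functions of one argument, because of the
<code>(fn [n] ...)</code> in the middle.</p>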
<p>Remember that the point of <code>Y</code> is that it finds fixed points. In infix notation, it's
like we were able to vary a function <code>g</code> until</p>
<div class="highlight"><pre><span></span><code> <span class="ss">(</span><span class="nv">f</span><span class="ss">(</span><span class="nv">g</span><span class="ss">))(</span><span class="nv">x</span><span class="ss">)</span> <span class="o">=</span> <span class="nv">g</span><span class="ss">(</span><span class="nv">x</span><span class="ss">)</span> <span class="k">for</span> <span class="nv">all</span> <span class="nv">x</span>
</code></pre></div>
<p>with <code>Y(f)</code> being a shortcut to <code>g</code>. If that doesn't sound impressive enough, imagine
you had a machine that, given something you want but don't know how to make, will make
an exact copy, and then it somehow figures out how to make it without being told.</p>
<p>To verify and emphasize that what we've got here is a fixed point, we can
explicitly pass the output of the combinator back into the maker function.</p>
<div class="highlight"><pre><span></span><code>(assert (= ((fact-maker (Y fact-maker)) 5) 120))
</code></pre></div>
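<p>Functions can't be compared for equality directly, so the best we can do is check the
fixed point property one input at a time. A rough sketch, with <code>fixed-point-on?</code> as a
hypothetical helper name, and the earlier definitions repeated so it runs on its own:</p>
<div class="highlight"><pre><span></span><code>(def fact-maker
  (fn [g] (fn [n] (if (= 0 n) 1 (* n (g (- n 1)))))))

(defn range-maker [f]
  (fn [n] (if (= 0 n) () (conj (f (- n 1)) n))))

(defn Y [f]
  ((fn [g] (g g))
   (fn [h] (fn [n] ((f (h h)) n)))))

;; Hypothetical helper: true if (maker (Y maker)) and (Y maker)
;; agree on every one of the given inputs.
(defn fixed-point-on? [maker inputs]
  (let [g (Y maker)]
    (every? (fn [x] (= ((maker g) x) (g x))) inputs)))

(assert (fixed-point-on? fact-maker (range 10)))
(assert (fixed-point-on? range-maker (range 10)))
</code></pre></div>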
<p>This might be magic.</p>Computer Chess is not just a bad movie, it is a despicable movie.2013-08-03T12:00:00-04:002013-08-03T12:00:00-04:00Peter Fraenkeltag:blog.podsnap.com,2013-08-03:/computer-chess.html<p>"Remember when you told me to tell you when you were acting rudely and
insensitively?"</p>
<p>First things first. It is certainly a bad movie. That isn't in
itself so rare, of course, but it's nice to think that foul intent is
always accompanied by shoddy artistry (which requires momentarily
forgetting Leni Riefenstahl, but never mind). It would also be
comforting if hateful crap were technically incompetent as well, so
you could tell immediately and find something better to do. Unfortunately,
technical excellence is so easy and cheap to attain these days that
it's no longer a reliable metric for anything, but in this case, it
would have served you well. That sort of truthfulness in
storytelling is rare and, now that I think about it, presents an
opportunity for kudos. Well done, Computer Chess! You win the prize
for consistent awfulness.</p>
<p>Andrew Bujalski could have filmed a technically competent movie on an
early generation iPhone, but he chose instead to make it look like
amateur VHS. That is, if your amateur VHS setup supported multiple
takes at different camera angles, smooth tracking dollies and high-end
zooming glass. There might have been a smidgen of wit in presenting
material from a bygone time using the technology of that same time,
but the substandard production values have nothing really to do with
epoch. What we have here is at the level of prepending the name of
your souvenir gimcrack shop with "Ye Olde."</p>
<p>Andrew Bujalski could also have composed a plot with some discernible
arc and maybe a few reasons to care about the outcome. Countless
4th-quarter, ninth-inning, even 18th hole nail-biters have established
that it's practically impossible to disengage an audience once some
sort of gladiatorial doings are afoot. Kudos, again, to Computer
Chess for, without even the excuse of obfuscating protective sports
gear, making its characters so interchangeable that rooting for one or
another would be literally impossible.</p>
<p>Andrew Bujalski could possibly have salvaged the combination of grunge
visuals and careful avoidance of human interest by emphasizing a
documentary angle. Starting with a lovely collection of ca. 1980
chassis and phosphor CRTs, he had an edge here, but proper
follow-through would have meant dropping the bit where one of the
computers begins to emit challenging (if adolescent) philosophical
ruminations and then flash up fetal ultrasounds. We get it, Andy,
someone you know saw 2001 and told you about it. That makes you
awesome.</p>
<p>Anyway, all that just makes it a bad movie. What makes it a
despicable movie - an <em>American</em> movie in all the ways that
pseudo-mumblecore wants most not to be - is the casual application of
the most simpleminded, anti-intellectual cliches to researchers who are
supposed to hail from the greatest computer science departments in the
world, during a generation that produced some of the most important
science in history.</p>
<p>Bujalski's computer programmers are defined entirely by their absence
of social skills, without even a hint of some compensatory life of the
mind. Even Chloe and Edgar on 24 get to chatter excitedly about
"avoiding the precompiled headers" and "hyperencrypted subchannels"
(n.b. I only made up one of those), hinting at something, which if not
quite enviable prowess, at least implies that they do something all
day besides remembering not to bathe. I'm not asking for a lecture on
minimax and the history of AI, though now that I think about it, the
biggest, dumbest, most sugar-sodden audience in the world sat still
while dinosaur cloning was explained to them, so surely an arthouse
crowd can be expected to endure just a wee dram of edification.
Failing that, it could turn out that these nerds are lovable yentas
(like Albert Einstein, as one's memory of intolerable movie trailers
reminds us) or help a more sympathetic kid to pass the big test so he
can enjoy a tastefully screened coming of age.</p>
<p>Let it be quite clear that my standards are low. I will admit
characterizations based wholly on daffy stereotypes, limned clumsily
with mispronounced technobabble. Pressed to the limit, I can accept
sheer nonsense - say, that computer programmers in the 1970s were fond
of cheese and said "yowsa" a lot - but there has to be something. You
can't define an entire class of humanity by their funny outfits and
bad hair. Scrap that, obviously you can. But it makes you an awful
person, and it makes your film despicable.</p>Medical Moneyball!!2013-07-21T12:00:00-04:002013-07-21T12:00:00-04:00Peter Fraenkeltag:blog.podsnap.com,2013-07-21:/startup-of-darkness.html<p><img src="./images/ayn.jpg" width=500></p>
<p>The story below changes or omits all the proper names, because that's the decent
thing to do, but it would be even more decent not to write this at all, as it
wouldn't be that difficult to figure out who these people are, and they don't
come off well. Balanced against my need to get all this off my chest, I'm
afraid I'm going to err on the side of selfishness.</p>
<p>Background, some of it anyway. I spent the last two decades at a
<a href="http://morganstanley.com">variety</a> of <a href="http://pragmatrading.com">financial</a>
<a href="http://ubs.com">institutions</a>. There have been interesting technical
challenges and smart people to work with (especially early on), but over the
years an unfortunate pattern emerged. After a brief period of intellectual
excitement, one is drawn - ratcheted with audible viscous smacks, like
the beginnings of a death by quicksand - into managerial and procedural
purgatory. Despite the fact that the end result is stultifying boredom, the
mechanism by which this happens is not uninteresting... but it's also another
story. The summary version is that you agree incrementally to new
responsibilities and end up never getting to code or, it seems, to think.</p>
<p>Determined to end this sad pattern, I managed to engineer extrication from my
last engagement. If, hypothetically, contract law were relevant to that
extrication, then I would be contractually bound not to go into the details,
which might also be interesting, again despite the topic itself being lack
of interest. I'm under no constraints whatsoever in discussing my current
activities, and anyway it's unlikely that <a href="http://hackerschool.com">Hacker School</a>
would object to being described as unbelievably marvelous. Essentially, I get
to program and study programming all day, surrounded by people who
validate the importance of this self-indulgence. I'm happy.</p>
<p>The motto of Hacker School - <strong>never graduate</strong> - notwithstanding, most good
things come to an end, and I think this one does in around a month (I've refused
to look up the exact details). I will have to find gainful, respectable
employment.</p>
<p>With this in mind, I was not immediately dismissive when a former colleague, who
is such an unambiguously nice guy that I won't even give him an incriminating
pseudonym, and his potential business partner, whom we'll call Danny, approached
me for a CTO role (whatever that means) in their incipient venture. Danny has
been associated with a venture capital company associated with a few well known
internet start-ups. You've heard of both the company and the start-ups; you've
probably been a "customer" of at least one of the latter, and an unwitting user
of another. A shallow internet search confirms Danny's direct responsibility
for these humdingers, though remarkable textual similarities in the results do
suggest the hand of a busy PR firm. More complex searches are ambiguous. For
example, Danny's real name and the names of the well-known CEOs of these
companies never show up together in one article. Every time Danny's name is
mentioned, it's part of a friendly interview about his new VC/incubator project.</p>
<p>Danny describes his latest idea as "moneyball for medicine." This is not an
original formulation, and his use of it is inaccurate and silly enough that I
expect it to be dropped. If he doesn't have the sense to drop it, then I won't
feel bad about your ability to discover his identity through web searches. (For
the moment, the google won't help you.) The idea begins reasonably enough, with
an actual (but apparently not yet published) paper by real people at Johns
Hopkins that seems to demonstrate that, when doctors are told the cost of the
procedures they are about to order, the cost of treatment declines in a
statistically significant way, without a corresponding decline in the quality of
treatment. This is a powerful result, because it is unlikely to inspire whining
about "death panels" and because it could be implemented really cheaply. While
it's notoriously complicated to calculate the web of negotiated cash flows that
result from any medical event, it's relatively easy to estimate the economic
cost, e.g. whether or not insurance pays for the CT scan, we can still
approximate its cost by amortizing the device over its lifetime, factoring in
salaries of the technician, etc. If these actual costs are what we want to
reduce, and they're easy to calculate, then some large electronic health record
provider like Epic could just add an extra column to their existing systems and
be done with it. In other words, this result is compelling precisely because it
is not a good business model.</p>
<p>True, says Danny, but these systems were all built decades ago, and they're
absolutely loathed by the physicians who use them because of their clunky
Windows-95y interfaces. In fact, says Danny, patients "literally bleed to
death" while doctors and nurses are fumbling with this ancient software. How
awful! "Is it fair to say our first deliverable would be a more ergonomic GUI?
Because GUIs are hard, and competing on the basis of your ergonomics is an
expensive game, requiring both programmers and a deep understanding of medical
workflow, and..." Danny interrupts me with a look that says, "too many notes."
These are hospitals, he says, so the doctors will really have to use whatever
system the administrators choose. OK... so it's putting the cost column on
somebody else's GUI? No, says Danny, going on to repeat the "literally bleeding"
bit and to say the worst thing he can think of about the existing systems, which
turns out to be that "it's like they hadn't heard of AJAX. That's what it's
called, right?" I'm confused, the skype connection is not great, and I might
have misunderstood something. "So," I ask, "is it fair to say that our first
deliverable would be a more ergonomic GUI? Because GUIs are hard, and..." No,
that's not the point at all. (I learn later that this sort of warble is called
pivoting.)</p>
<p>The point, it seems, is... MONEYBALL!! At this point we're sitting at a
monumental, heavily polished dining room table in a Long Island mansionette. I
call it that instead of just mansion, because I know that there are larger
single-family residences in existence, but in truth I've only seen such places
in movies and on guided tours. This one is owned by a cardio-thoracic surgeon
who not only has a lot of money but is quite sure that he deserves it - unless
the objectivist shrine of ornamentally bound Ayn Rand volumes in the entrance
foyer is meant ironically, which seems unlikely, or at any rate would be very
lonely for it. His wife clops in and out of Carmela Soprano's kitchen on skinny
heels, to ask which machine we would most like coffee from. Cardio-thoracic
man, whom we'll call Vince, seems an odd choice if medical cost savings are the
goal, though it later occurred to me that we could make a lot of headway simply
by taking back all of his money and using it on, say, measles vaccines. Vince
is more taken with the MONEYBALL!! angle. MONEYBALL means that we'll provide
"real-time, actionable information" and "quality of care metrics" to code
patients by cost-effectiveness. Danny illustrates this by taking from his
briefcase a pair of Google Glasses, which we spend about 15 minutes trying to
get to perform.</p>
<p>Despite the distraction (the glasses are bright red), it occurs to me to ask for
an example of a "metric," because I'm still having a hard time with this. A few
hours later, we get an answer - from a different doctor actually, who was
smarter and generally more sympathetic and so, following precedent, won't get a
nickname. It turns out that negative outcomes in heart surgery tend to occur
immediately. 1% of the patients die in the hospital, while the other 99 both
survive and experience the benefits they were supposed to derive from the
procedure. This differs from other sorts of medical intervention, where various
factors make the outcome a little murkier. Dead/not-dead is about as clean as
it gets, metric-wise. Isn't, I wonder aloud, standard error going to be a
problem? Danny looks perplexed, but quickly rallies, by recapping the plot of
Moneyball (the movie). Poindexter (me) rudely interrupts with something about
$\sqrt{N}$ and the Poisson distribution. Sympathetic doctor begins to
understand. Danny rolls his eyes. Vince looks serious and fondles his phone.
After 100 surgeries, the expected mortality is one person... plus or minus one
person. You'd have to accumulate 10000 surgeries to start distinguishing
between surgeons who differed in skill by 10%, or to pass judgment on the
spending of surgeons whose patients ended up differing in cost by 10%. 10000
operations is about the number performed over the career of a cardio-thoracic
surgeon in a major hospital. So "real time" might mean telling a
surgeon how he did at his retirement party.</p>
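<p>For the record, the arithmetic, as a rough sketch that assumes deaths are independent
and roughly Poisson, with the 1% mortality quoted above (so $p = 0.01$ per operation):</p>
<p>$$\lambda = pN, \qquad \sigma = \sqrt{\lambda}, \qquad \frac{\sigma}{\lambda} = \frac{1}{\sqrt{pN}}$$</p>
<p>With $N = 100$ we get $\lambda = 1$ and $\sigma = 1$, a relative uncertainty of 100%. With
$N = 10000$ we get $\lambda = 100$ and $\sigma = 10$, a relative uncertainty of 10%, which is
only just the size of the difference we were hoping to detect.</p>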
<p>I never saw or read Moneyball, but I don't think the point was that, if one
wants hard enough to become fabulously successful by using statistics, one
inexorably will. (That was a different baseball movie, other than the part
about statistics.) I am, to be clear, pro-statistics. We will get an amazing
data set from mandatory electronic health records, and, given the vastness of
the data and the enormous coefficient by which you get to multiply even the
smallest improvements in outcome or cost-effectiveness, this is an area to which
considerable resources should be directed. That's what NIH grants are for.
It's not a startup.</p>
<p>Danny's take on the day's proceedings was an email consisting primarily of this
sentence: "I would hope that you both will be able to get the ball rolling and
have an evolving conversation as we move forward."</p>
<p>My friend reminds me that people like Danny, who can tell a good story,
fervently and in the right plutocratic echelons, can be very useful when
starting a business. That's probably true. It may even be true, along the
lines
<a href="http://thinkexist.com/quotation/the_test_of_a_first-rate_intelligence_is_the/7656.html">suggested</a>
by F. Scott Fitzgerald, that the abiding of nonsense is a form of genius. I
don't need Danny to be a fraud to reach the conclusion that I must stay far away
from him. Evasion drives me crazy. Glib misuse of science enrages me. And,
I've discovered, I get physically ill if I go the work day without coding.</p>Varieties of laziness: clojure reducers, scala views and closure functors2013-07-16T12:00:00-04:002013-07-16T12:00:00-04:00Peter Fraenkeltag:blog.podsnap.com,2013-07-16:/clojure-reducers-scala-views-and-closure-functors.html<p><a href="http://clojure.com/blog/2012/05/08/reducers-a-library-and-model-for-collection-processing.html">This</a> is very cool.</p>
<p>(Thanks to <a href="http://swannodette.github.io/">David Nolen</a>, who pointed out errors in
the original.)</p>
<p>I hadn't realized that the standard higher order sequence functions compose in a manner
that requires quite a bit of run-time machinery. For example, nested <code>map</code> calls will cause
a separate lazy sequence to be instantiated at each level of nesting, with each level
"pulling" results from the next innermost as needed. The function calls are thus temporally
interleaved, but the functions are not actually composed. We can see this happening by
adding some <code>println</code>s to some unary functions and then running a nested <code>map</code> over vector
data. Since vectors are chunked in 32-element blocks, the function calls will be
interleaved in 32-element blocks as well (or for less than 32 elements, not interleaved
at all):</p>
<div class="highlight"><pre><span></span><code><span class="nv">user></span> <span class="p">(</span><span class="kd">defn </span><span class="nv">princ</span> <span class="p">[</span><span class="nv">i</span><span class="p">]</span> <span class="p">(</span><span class="k">do </span><span class="p">(</span><span class="nb">println </span><span class="s">"incrementing "</span> <span class="nv">i</span><span class="p">)</span> <span class="p">(</span><span class="nb">inc </span><span class="nv">i</span><span class="p">)))</span>
<span class="nv">user></span> <span class="p">(</span><span class="kd">defn </span><span class="nv">prud</span> <span class="p">[</span><span class="nv">i</span><span class="p">]</span> <span class="p">(</span><span class="k">do </span><span class="p">(</span><span class="nb">println </span><span class="s">"doubling "</span> <span class="nv">i</span><span class="p">))</span> <span class="p">(</span><span class="nb">* </span><span class="mi">2</span> <span class="nv">i</span><span class="p">))</span>
<span class="nv">user></span> <span class="p">(</span><span class="k">def </span><span class="nv">foo</span> <span class="p">(</span><span class="nb">map </span><span class="nv">princ</span> <span class="p">[</span><span class="mi">1</span> <span class="mi">2</span> <span class="mi">3</span> <span class="mi">4</span><span class="p">]))</span>
<span class="o">#</span><span class="ss">'user/foo</span>
<span class="nv">user></span> <span class="p">(</span><span class="nb">take </span><span class="mi">2</span> <span class="nv">foo</span><span class="p">)</span>
<span class="p">(</span><span class="nf">incrementing</span> <span class="mi">1</span>
<span class="nv">incrementing</span> <span class="mi">2</span>
<span class="nv">incrementing</span> <span class="mi">3</span>
<span class="nv">incrementing</span> <span class="mi">4</span>
<span class="mi">2</span> <span class="mi">3</span><span class="p">)</span>
<span class="nv">user></span> <span class="p">(</span><span class="k">def </span><span class="nv">foo</span> <span class="p">(</span><span class="nb">map </span><span class="nv">princ</span> <span class="p">(</span><span class="nb">map </span><span class="nv">prud</span> <span class="p">[</span><span class="mi">1</span> <span class="mi">2</span> <span class="mi">3</span> <span class="mi">4</span><span class="p">])))</span>
<span class="o">#</span><span class="ss">'user/foo</span>
<span class="nv">user></span> <span class="p">(</span><span class="nb">take </span><span class="mi">2</span> <span class="nv">foo</span><span class="p">)</span>
<span class="p">(</span><span class="nf">doubling</span> <span class="mi">1</span>
<span class="nv">doubling</span> <span class="mi">2</span>
<span class="nv">doubling</span> <span class="mi">3</span>
<span class="nv">doubling</span> <span class="mi">4</span>
<span class="nv">incrementing</span> <span class="mi">2</span>
<span class="nv">incrementing</span> <span class="mi">4</span>
<span class="nv">incrementing</span> <span class="mi">6</span>
<span class="nv">incrementing</span> <span class="mi">8</span>
<span class="mi">3</span> <span class="mi">5</span><span class="p">)</span>
<span class="nv">user></span>
</code></pre></div>
<p>If you increase the range to more than 32, you'll see blocks of 32 interleaved (try it
yourself or trust me).</p>
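<p>If you'd rather not trust me, here is a sketch that records the calls in an atom
instead of printing them, using a 64-element vector. The versions of <code>princ</code> and <code>prud</code>
below are variants that log rather than print, and <code>log</code>, <code>data</code>, <code>result</code> and <code>runs</code> are
names made up for this illustration.</p>
<div class="highlight"><pre><span></span><code>;; Every call appends an [operation argument] pair to the log.
(def log (atom []))
(defn princ [i] (swap! log conj [:inc i]) (inc i))
(defn prud [i] (swap! log conj [:double i]) (* 2 i))

(def data (vec (range 64)))

;; Force the whole nested lazy sequence.
(reset! log [])
(def result (doall (map princ (map prud data))))

;; Collapse the log into runs of the same operation: [operation run-length].
(def runs
  (map (fn [run] [(first (first run)) (count run)])
       (partition-by first @log)))

;; Two chunks of 32, each doubled in full before being incremented in full.
(assert (= runs [[:double 32] [:inc 32] [:double 32] [:inc 32]]))
</code></pre></div>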
<p>Reducers actually do what you want:</p>
<div class="highlight"><pre><span></span><code><span class="nv">user></span> <span class="p">(</span><span class="k">def </span><span class="nv">foo</span> <span class="p">(</span><span class="nf">r/map</span> <span class="nv">princ</span> <span class="p">(</span><span class="nf">r/map</span> <span class="nv">prud</span> <span class="p">[</span><span class="mi">1</span> <span class="mi">2</span> <span class="mi">3</span> <span class="mi">4</span><span class="p">])))</span>
<span class="o">#</span><span class="ss">'user/foo</span>
<span class="nv">user></span> <span class="p">(</span><span class="nb">reduce str </span><span class="p">(</span><span class="nf">r/take</span> <span class="mi">2</span> <span class="nv">foo</span><span class="p">))</span>
<span class="nv">doubling</span> <span class="mi">1</span>
<span class="nv">incrementing</span> <span class="mi">2</span>
<span class="nv">doubling</span> <span class="mi">2</span>
<span class="nv">incrementing</span> <span class="mi">4</span>
<span class="nv">doubling</span> <span class="mi">3</span>
<span class="nv">incrementing</span> <span class="mi">6</span>
<span class="s">"35"</span>
<span class="nv">user></span>
</code></pre></div>
<p>The functions in standard scala are even worse, actually instantiating a non-lazy collection
at every step.</p>
<div class="highlight"><pre><span></span><code><span class="n">scala</span><span class="o">></span> <span class="p">(</span><span class="mi">1</span> <span class="n">to</span> <span class="mi">5</span><span class="p">).</span><span class="n">map</span><span class="p">(</span><span class="n">x</span> <span class="o">=></span> <span class="p">{</span><span class="n">println</span><span class="p">(</span><span class="s">"doubling "</span><span class="o">+</span><span class="n">x</span><span class="p">);</span> <span class="mi">2</span><span class="o">*</span><span class="n">x</span><span class="p">}).</span><span class="n">map</span><span class="p">(</span><span class="n">x</span><span class="o">=></span><span class="p">{</span><span class="n">println</span><span class="p">(</span><span class="s">"incrementing "</span><span class="o">+</span><span class="n">x</span><span class="p">);</span><span class="n">x</span><span class="o">+</span><span class="mi">1</span><span class="p">}).</span><span class="n">take</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
<span class="n">doubling</span> <span class="mi">1</span>
<span class="n">doubling</span> <span class="mi">2</span>
<span class="n">doubling</span> <span class="mi">3</span>
<span class="n">doubling</span> <span class="mi">4</span>
<span class="n">doubling</span> <span class="mi">5</span>
<span class="n">incrementing</span> <span class="mi">2</span>
<span class="n">incrementing</span> <span class="mi">4</span>
<span class="n">incrementing</span> <span class="mi">6</span>
<span class="n">incrementing</span> <span class="mi">8</span>
<span class="n">incrementing</span> <span class="mi">10</span>
<span class="n">res17</span><span class="p">:</span> <span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">immutable</span><span class="p">.</span><span class="nc">IndexedSeq</span><span class="p">[</span><span class="nc">Int</span><span class="p">]</span> <span class="o">=</span> <span class="nc">Vector</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span>
</code></pre></div>
<p>If you increase the range to something large, you'll still see doubling and incrementing
occurring in separate blocks.</p>
<p>Since 2.8, scala has provided an easy way to create lazy "views," nested mapping over which
produces the expected interleaving. Scala doesn't chunk any of its collections, so we
can't actually distinguish yet between interleaving and composition. But wait.</p>
<div class="highlight"><pre><span></span><code><span class="n">scala</span><span class="o">></span> <span class="p">(</span><span class="mi">1</span> <span class="n">to</span> <span class="mi">5</span><span class="p">).</span><span class="n">view</span><span class="p">.</span><span class="n">map</span><span class="p">(</span><span class="n">x</span> <span class="o">=></span> <span class="p">{</span><span class="n">println</span><span class="p">(</span><span class="s">"doubling "</span><span class="o">+</span><span class="n">x</span><span class="p">);</span> <span class="mi">2</span><span class="o">*</span><span class="n">x</span><span class="p">}).</span><span class="n">map</span><span class="p">(</span><span class="n">x</span><span class="o">=></span><span class="p">{</span><span class="n">println</span><span class="p">(</span><span class="s">"incrementing "</span><span class="o">+</span><span class="n">x</span><span class="p">);</span><span class="n">x</span><span class="o">+</span><span class="mi">1</span><span class="p">}).</span><span class="n">take</span><span class="p">(</span><span class="mi">2</span><span class="p">).</span><span class="n">force</span>
<span class="n">doubling</span> <span class="mi">1</span>
<span class="n">incrementing</span> <span class="mi">2</span>
<span class="n">doubling</span> <span class="mi">2</span>
<span class="n">incrementing</span> <span class="mi">4</span>
<span class="n">res18</span><span class="p">:</span> <span class="nc">Seq</span><span class="p">[</span><span class="nc">Int</span><span class="p">]</span> <span class="o">=</span> <span class="nc">Vector</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span>
</code></pre></div>
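<p>If you'd rather count than squint at println output, here is a minimal sketch of the same
comparison. The <code>ViewLaziness</code> object and its counters are my own illustrative names, and
I'm assuming Scala 2.13 or later, where <code>toVector</code> stands in for the older <code>force</code>.</p>
<div class="highlight"><pre><span></span><code>object ViewLaziness {
  // Count how many times each stage runs, instead of printing.
  var doubled = 0
  var incremented = 0

  def double(x: Int): Int = { doubled += 1; 2 * x }
  def increment(x: Int): Int = { incremented += 1; x + 1 }

  // Strict: both maps traverse all five elements before take(2) is applied.
  val strict: Vector[Int] = (1 to 5).toVector.map(double).map(increment).take(2)
  val strictCounts: (Int, Int) = (doubled, incremented)

  // Reset the counters before the second experiment.
  doubled = 0
  incremented = 0

  // View: nothing runs until toVector forces it, and only two elements are touched.
  val viaView: Vector[Int] = (1 to 5).view.map(double).map(increment).take(2).toVector
  val viewCounts: (Int, Int) = (doubled, incremented)
}
</code></pre></div>
<p>Both pipelines produce <code>Vector(3, 5)</code>, but the strict one should report five calls per stage
and the view only two.</p>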
<p>Since 2.9, scala has offered what they call parallel collections, which are created
from normal collections using the <code>par</code> method, and over which <code>map</code> et al. can operate
in parallel over multiple threads. Rich Hickey <a href="http://clojure.com/blog/2012/05/15/anatomy-of-reducer.html">intends</a> to parallelize these operations in clojure using the reducer framework
with its composed functions, but scala didn't do it this way:
<code>par</code>allelizing a collection does cause it to be processed in parallel (as you
can see from the thread names), but sequential
operations are not composed and full intermediate sequences are built:</p>
<div class="highlight"><pre><span></span><code><span class="n">scala</span><span class="o">></span> <span class="k">def</span> <span class="nf">tgn</span> <span class="o">=</span> <span class="nc">Thread</span><span class="p">.</span><span class="n">currentThread</span><span class="p">.</span><span class="n">getName</span>
<span class="n">scala</span><span class="o">></span> <span class="kd">val</span> <span class="n">foo</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span> <span class="n">to</span> <span class="mi">5</span><span class="p">).</span><span class="n">par</span><span class="p">.</span><span class="n">map</span><span class="p">(</span><span class="n">x</span> <span class="o">=></span> <span class="p">{</span><span class="n">println</span><span class="p">(</span><span class="n">tgn</span><span class="o">+</span><span class="s">" doubling "</span><span class="o">+</span><span class="n">x</span><span class="p">);</span> <span class="mi">2</span><span class="o">*</span><span class="n">x</span><span class="p">}).</span><span class="n">map</span><span class="p">(</span><span class="n">x</span><span class="o">=></span><span class="p">{</span><span class="n">println</span><span class="p">(</span><span class="n">tgn</span><span class="o">+</span><span class="s">" incrementing "</span><span class="o">+</span><span class="n">x</span><span class="p">);</span><span class="n">x</span><span class="o">+</span><span class="mi">1</span><span class="p">}).</span><span class="n">take</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span>
<span class="nc">ForkJoinPool</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="n">worker</span><span class="o">-</span><span class="mi">1</span> <span class="n">doubling</span> <span class="mi">1</span>
<span class="nc">ForkJoinPool</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="n">worker</span><span class="o">-</span><span class="mi">7</span> <span class="n">doubling</span> <span class="mi">3</span>
<span class="nc">ForkJoinPool</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="n">worker</span><span class="o">-</span><span class="mi">1</span> <span class="n">doubling</span> <span class="mi">2</span>
<span class="nc">ForkJoinPool</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="n">worker</span><span class="o">-</span><span class="mi">5</span> <span class="n">doubling</span> <span class="mi">4</span>
<span class="nc">ForkJoinPool</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="n">worker</span><span class="o">-</span><span class="mi">3</span> <span class="n">doubling</span> <span class="mi">5</span>
<span class="nc">ForkJoinPool</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="n">worker</span><span class="o">-</span><span class="mi">7</span> <span class="n">incrementing</span> <span class="mi">4</span>
<span class="nc">ForkJoinPool</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="n">worker</span><span class="o">-</span><span class="mi">5</span> <span class="n">incrementing</span> <span class="mi">8</span>
<span class="nc">ForkJoinPool</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="n">worker</span><span class="o">-</span><span class="mi">7</span> <span class="n">incrementing</span> <span class="mi">10</span>
<span class="nc">ForkJoinPool</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="n">worker</span><span class="o">-</span><span class="mi">1</span> <span class="n">incrementing</span> <span class="mi">2</span>
<span class="nc">ForkJoinPool</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="n">worker</span><span class="o">-</span><span class="mi">3</span> <span class="n">incrementing</span> <span class="mi">6</span>
<span class="n">foo</span><span class="p">:</span> <span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">parallel</span><span class="p">.</span><span class="n">immutable</span><span class="p">.</span><span class="nc">ParSeq</span><span class="p">[</span><span class="nc">Int</span><span class="p">]</span> <span class="o">=</span> <span class="nc">ParVector</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">7</span><span class="p">)</span>
</code></pre></div>
<p>What's more, <code>par</code> un<code>view</code>ifies collections:</p>
<div class="highlight"><pre><span></span><code><span class="n">scala</span><span class="o">></span> <span class="p">(</span><span class="mi">1</span> <span class="n">to</span> <span class="mi">5</span><span class="p">).</span><span class="n">view</span>
<span class="n">res15</span><span class="p">:</span> <span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="nc">SeqView</span><span class="p">[</span><span class="nc">Int</span><span class="p">,</span><span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">immutable</span><span class="p">.</span><span class="nc">IndexedSeq</span><span class="p">[</span><span class="nc">Int</span><span class="p">]]</span> <span class="o">=</span> <span class="nc">SeqView</span><span class="p">(...)</span>
<span class="n">scala</span><span class="o">></span> <span class="p">(</span><span class="mi">1</span> <span class="n">to</span> <span class="mi">5</span><span class="p">).</span><span class="n">view</span><span class="p">.</span><span class="n">par</span>
<span class="n">res16</span><span class="p">:</span> <span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">parallel</span><span class="p">.</span><span class="nc">ParSeq</span><span class="p">[</span><span class="nc">Int</span><span class="p">]</span> <span class="o">=</span> <span class="nc">ParArray</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span>
</code></pre></div>
<p>So there seems to be no way to have parallelism with any semblance of laziness, let
alone composition.</p>
<div class="highlight"><pre><span></span><code><span class="n">scala</span><span class="o">></span> <span class="kd">val</span> <span class="n">foo</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span> <span class="n">to</span> <span class="mi">5</span><span class="p">).</span><span class="n">view</span><span class="p">.</span><span class="n">par</span><span class="p">.</span><span class="n">map</span><span class="p">(</span><span class="n">x</span> <span class="o">=></span> <span class="p">{</span><span class="n">println</span><span class="p">(</span><span class="n">tgn</span><span class="o">+</span><span class="s">" doubling "</span><span class="o">+</span><span class="n">x</span><span class="p">);</span> <span class="mi">2</span><span class="o">*</span><span class="n">x</span><span class="p">}).</span><span class="n">map</span><span class="p">(</span><span class="n">x</span><span class="o">=></span><span class="p">{</span><span class="n">println</span><span class="p">(</span><span class="n">tgn</span><span class="o">+</span><span class="s">" incrementing "</span><span class="o">+</span><span class="n">x</span><span class="p">);</span><span class="n">x</span><span class="o">+</span><span class="mi">1</span><span class="p">}).</span><span class="n">take</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span>
<span class="nc">ForkJoinPool</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="n">worker</span><span class="o">-</span><span class="mi">7</span> <span class="n">doubling</span> <span class="mi">1</span>
<span class="nc">ForkJoinPool</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="n">worker</span><span class="o">-</span><span class="mi">5</span> <span class="n">doubling</span> <span class="mi">3</span>
<span class="nc">ForkJoinPool</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="n">worker</span><span class="o">-</span><span class="mi">1</span> <span class="n">doubling</span> <span class="mi">2</span>
<span class="nc">ForkJoinPool</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="n">worker</span><span class="o">-</span><span class="mi">5</span> <span class="n">doubling</span> <span class="mi">5</span>
<span class="nc">ForkJoinPool</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="n">worker</span><span class="o">-</span><span class="mi">3</span> <span class="n">doubling</span> <span class="mi">4</span>
<span class="nc">ForkJoinPool</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="n">worker</span><span class="o">-</span><span class="mi">5</span> <span class="n">incrementing</span> <span class="mi">6</span>
<span class="nc">ForkJoinPool</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="n">worker</span><span class="o">-</span><span class="mi">1</span> <span class="n">incrementing</span> <span class="mi">8</span>
<span class="nc">ForkJoinPool</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="n">worker</span><span class="o">-</span><span class="mi">3</span> <span class="n">incrementing</span> <span class="mi">4</span>
<span class="nc">ForkJoinPool</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="n">worker</span><span class="o">-</span><span class="mi">7</span> <span class="n">incrementing</span> <span class="mi">2</span>
<span class="nc">ForkJoinPool</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="n">worker</span><span class="o">-</span><span class="mi">1</span> <span class="n">incrementing</span> <span class="mi">10</span>
<span class="n">foo</span><span class="p">:</span> <span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">parallel</span><span class="p">.</span><span class="nc">ParSeq</span><span class="p">[</span><span class="nc">Int</span><span class="p">]</span> <span class="o">=</span> <span class="nc">ParArray</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">7</span><span class="p">)</span>
</code></pre></div>
<p><a href="./futures-on-functors-of-function-functors-for-fun-and-function.html">A month</a> ago,
before learning about <code>view</code>s, I tried to roll my own lazily composable
collections with something like the following:</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">SeqF</span><span class="p">[</span><span class="nc">A</span><span class="p">]</span> <span class="p">(</span><span class="n">l</span> <span class="p">:</span> <span class="nc">Seq</span><span class="p">[()</span><span class="o">=></span><span class="nc">A</span><span class="p">])</span> <span class="p">{</span>
<span class="k">def</span> <span class="nf">map</span><span class="p">[</span><span class="nc">B</span><span class="p">](</span><span class="n">f</span> <span class="p">:</span> <span class="nc">A</span> <span class="o">=></span><span class="nc">B</span><span class="p">)</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">SeqF</span><span class="p">(</span><span class="n">l</span><span class="p">.</span><span class="n">map</span><span class="p">(</span> <span class="n">x</span> <span class="o">=></span> <span class="p">()</span> <span class="o">=></span> <span class="n">f</span><span class="p">(</span><span class="n">x</span><span class="p">())))</span>
<span class="k">def</span> <span class="nf">take</span><span class="p">(</span><span class="n">n</span> <span class="p">:</span> <span class="nc">Int</span><span class="p">)</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">SeqF</span><span class="p">(</span><span class="n">l</span><span class="p">.</span><span class="n">take</span><span class="p">(</span><span class="n">n</span><span class="p">))</span>
<span class="k">def</span> <span class="nf">zipWithIndex</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">SeqF</span><span class="p">(</span> <span class="n">l</span><span class="p">.</span><span class="n">zipWithIndex</span><span class="p">.</span><span class="n">map</span><span class="p">(</span> <span class="n">x</span> <span class="o">=></span> <span class="p">()</span> <span class="o">=></span> <span class="p">(</span><span class="n">x</span><span class="p">.</span><span class="n">_1</span><span class="p">(),</span><span class="n">x</span><span class="p">.</span><span class="n">_2</span><span class="p">)</span> <span class="p">)</span> <span class="p">)</span>
<span class="k">def</span> <span class="nf">scanLeft</span><span class="p">[</span><span class="nc">B</span><span class="p">](</span><span class="n">b0</span> <span class="p">:</span> <span class="nc">B</span><span class="p">)(</span><span class="n">f</span> <span class="p">:</span> <span class="p">(</span><span class="nc">B</span><span class="p">,</span><span class="nc">A</span><span class="p">)</span><span class="o">=></span><span class="nc">B</span><span class="p">)</span> <span class="p">:</span> <span class="nc">SeqF</span><span class="p">[</span><span class="nc">B</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span>
<span class="kd">val</span> <span class="n">l2</span> <span class="o">=</span> <span class="n">l</span><span class="p">.</span><span class="n">scanLeft</span><span class="p">(</span> <span class="p">()</span><span class="o">=></span><span class="n">b0</span> <span class="p">)(</span> <span class="p">(</span><span class="n">b</span><span class="p">,</span><span class="n">a</span><span class="p">)</span> <span class="o">=></span> <span class="p">()</span> <span class="o">=></span> <span class="n">f</span><span class="p">(</span><span class="n">b</span><span class="p">(),</span><span class="n">a</span><span class="p">())</span> <span class="p">)</span>
<span class="k">new</span> <span class="nc">SeqF</span><span class="p">[</span><span class="nc">B</span><span class="p">](</span><span class="n">l2</span><span class="p">)</span>
<span class="p">}</span>
<span class="c1">// (Define the rest of the seq operations here!)</span>
<span class="k">def</span> <span class="nf">values</span> <span class="o">=</span> <span class="n">l</span><span class="p">.</span><span class="n">map</span><span class="p">(</span><span class="n">_</span><span class="p">())</span> <span class="c1">// execute the composed functions</span>
<span class="k">def</span> <span class="nf">parvalues</span> <span class="o">=</span> <span class="n">l</span><span class="p">.</span><span class="n">par</span><span class="p">.</span><span class="n">map</span><span class="p">(</span><span class="n">_</span><span class="p">())</span> <span class="c1">// do so in parallel</span>
<span class="p">}</span>
<span class="k">object</span> <span class="nc">SeqF</span> <span class="p">{</span>
<span class="k">class</span> <span class="nc">SeqFMaker</span><span class="p">[</span><span class="nc">A</span><span class="p">](</span><span class="n">l</span> <span class="p">:</span> <span class="nc">Seq</span><span class="p">[</span><span class="nc">A</span><span class="p">])</span> <span class="p">{</span>
<span class="kd">val</span> <span class="n">it</span> <span class="o">=</span> <span class="n">l</span><span class="p">.</span><span class="n">iterator</span>
<span class="k">def</span> <span class="nf">wrapf</span><span class="p">()</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">SeqF</span><span class="p">(</span><span class="n">l</span><span class="p">.</span><span class="n">map</span><span class="p">(</span> <span class="n">x</span> <span class="o">=></span> <span class="p">()</span> <span class="o">=></span> <span class="n">x</span><span class="p">)</span> <span class="p">)</span>
<span class="p">}</span>
<span class="k">implicit</span> <span class="k">def</span> <span class="nf">SeqSeqFMaker</span><span class="p">[</span><span class="nc">A</span><span class="p">](</span><span class="n">l</span> <span class="p">:</span> <span class="nc">Seq</span><span class="p">[</span><span class="nc">A</span><span class="p">])</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">SeqFMaker</span><span class="p">[</span><span class="nc">A</span><span class="p">](</span><span class="n">l</span><span class="p">)</span>
<span class="p">}</span>
<span class="k">import</span> <span class="nc">SeqF</span><span class="p">.</span><span class="n">_</span>
</code></pre></div>
<p>A <code>SeqF</code> holds a vector of functions of <code>Unit</code>, and the various higher order methods
create new <code>SeqF</code>s whose functions compose with the originals.
(The F is for functor, of which this is an example, though containing values in
a closure is conceptually more complicated than doing so with a direct reference.)
There's a <a href="http://www.artima.com/weblogs/viewpost.jsp?thread=179766">pimpy</a> companion
object that enables the <code>wrapf</code> method to create a <code>SeqF</code>, which has a <code>values</code> method
to execute all the functions and produce a normal collection. I can add a new method,
<code>parvalues</code>, which converts the vector of closures into a parallel collection before
execution. And it works!</p>
<div class="highlight"><pre><span></span><code><span class="n">scala</span><span class="o">></span> <span class="kd">val</span> <span class="n">foo</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span> <span class="n">to</span> <span class="mi">100</span><span class="p">).</span><span class="n">toVector</span><span class="p">.</span><span class="n">wrapf</span><span class="p">.</span><span class="n">map</span><span class="p">(</span><span class="n">x</span> <span class="o">=></span> <span class="p">{</span><span class="n">println</span><span class="p">(</span><span class="n">tgn</span><span class="o">+</span><span class="s">" doubling "</span><span class="o">+</span><span class="n">x</span><span class="p">);</span> <span class="mi">2</span><span class="o">*</span><span class="n">x</span><span class="p">}).</span><span class="n">map</span><span class="p">(</span><span class="n">x</span><span class="o">=></span><span class="p">{</span><span class="n">println</span><span class="p">(</span><span class="n">tgn</span><span class="o">+</span><span class="s">" incrementing "</span><span class="o">+</span><span class="n">x</span><span class="p">);</span><span class="n">x</span><span class="o">+</span><span class="mi">1</span><span class="p">}).</span><span class="n">take</span><span class="p">(</span><span class="mi">5</span><span class="p">).</span><span class="n">parvalues</span>
<span class="nc">ForkJoinPool</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="n">worker</span><span class="o">-</span><span class="mi">3</span> <span class="n">doubling</span> <span class="mi">1</span>
<span class="nc">ForkJoinPool</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="n">worker</span><span class="o">-</span><span class="mi">3</span> <span class="n">incrementing</span> <span class="mi">2</span>
<span class="nc">ForkJoinPool</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="n">worker</span><span class="o">-</span><span class="mi">7</span> <span class="n">doubling</span> <span class="mi">2</span>
<span class="nc">ForkJoinPool</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="n">worker</span><span class="o">-</span><span class="mi">7</span> <span class="n">incrementing</span> <span class="mi">4</span>
<span class="nc">ForkJoinPool</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="n">worker</span><span class="o">-</span><span class="mi">1</span> <span class="n">doubling</span> <span class="mi">3</span>
<span class="nc">ForkJoinPool</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="n">worker</span><span class="o">-</span><span class="mi">1</span> <span class="n">incrementing</span> <span class="mi">6</span>
<span class="nc">ForkJoinPool</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="n">worker</span><span class="o">-</span><span class="mi">7</span> <span class="n">doubling</span> <span class="mi">4</span>
<span class="nc">ForkJoinPool</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="n">worker</span><span class="o">-</span><span class="mi">5</span> <span class="n">doubling</span> <span class="mi">5</span>
<span class="nc">ForkJoinPool</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="n">worker</span><span class="o">-</span><span class="mi">5</span> <span class="n">incrementing</span> <span class="mi">10</span>
<span class="nc">ForkJoinPool</span><span class="o">-</span><span class="mi">1</span><span class="o">-</span><span class="n">worker</span><span class="o">-</span><span class="mi">7</span> <span class="n">incrementing</span> <span class="mi">8</span>
<span class="n">foo</span><span class="p">:</span> <span class="n">scala</span><span class="p">.</span><span class="n">collection</span><span class="p">.</span><span class="n">parallel</span><span class="p">.</span><span class="nc">ParSeq</span><span class="p">[</span><span class="nc">Int</span><span class="p">]</span> <span class="o">=</span> <span class="nc">ParVector</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">9</span><span class="p">,</span> <span class="mi">11</span><span class="p">)</span>
</code></pre></div>
<p>If the code doesn't convince you, there's additional evidence of composition from the
threads in which each calculation is taking place. Note how the first element
is processed completely in worker-3, the second and fourth in worker-7, the third in
worker-1 and the fifth in worker-5; in all cases both doubling and incrementing take
place in the same thread, which would be improbable if the functions hadn't been composed.</p>
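<p>The composition can also be checked without any threads at all. What follows is a stripped-down
sketch of the closure trick, not the <code>SeqF</code> class itself: <code>ThunkComposition</code> and its
<code>log</code> buffer are hypothetical names introduced only for illustration, and everything runs
sequentially.</p>
<div class="highlight"><pre><span></span><code>object ThunkComposition {
  // Record the order in which the stages run.
  val log = scala.collection.mutable.ArrayBuffer.empty[String]

  def double(x: Int): Int = { log += s"doubling $x"; 2 * x }
  def increment(x: Int): Int = { log += s"incrementing $x"; x + 1 }

  // Wrap each element in a zero-argument closure.
  val thunks: Vector[() => Int] = Vector(1, 2, 3).map(x => () => x)

  // Mapping over the thunks only composes functions; nothing is evaluated yet.
  val composed: Vector[() => Int] =
    thunks.map(t => () => double(t())).map(t => () => increment(t()))

  val logBeforeForcing: List[String] = log.toList

  // Forcing a thunk runs both stages back to back for that one element.
  val results: Vector[Int] = composed.map(t => t())
  val logAfterForcing: List[String] = log.toList
}
</code></pre></div>
<p>The log is empty until the thunks are forced, and afterwards it alternates doubling and
incrementing element by element, which is the same pairing the thread names show above.</p>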
<p>We can now enumerate several kinds of laziness when nesting maps:</p>
<ol>
<li>Intermediate collections are fully instantiated, and the unary functions are executed
in blocks. This occurs with standard scala collections.</li>
<li>Intermediate collections are instantiated as lazy sequences, so the
unary functions will be interleaved
(modulo any chunking optimizations in the specific collection). While technically
lazy, the computer is actually hauling quite a lot of data between the intermediate
sequences. It happens this way with scala views and standard clojure collections.</li>
<li>The unary functions are composed into one, collapsing the nested operation into
a single map, the output of which will be instantiated lazily, as needed. The clojure
reducers package works this way.</li>
<li>The unary functions are composed into one, collapsing the nested operation into
a single map, from which a single instantiated vector of closures is created. My
SeqF class operates in this fashion. This may be somewhat less lazy than 3, but
maybe more amenable to parallelization.</li>
</ol>futures on functors of function functors for fun and function2013-06-16T16:51:00-04:002013-06-16T16:51:00-04:00Peter Fraenkeltag:blog.podsnap.com,2013-06-16:/futures-on-functors-of-function-functors-for-fun-and-function.html<p>Here's a brief presentation on using closures and futures to do background processing:</p>
<iframe src="http://blog.podsnap.com/extra/reveal/func.html" height=400 width=800></iframe>
<p>And <a href="http://github.com/pnf/flip">here</a>'s the code on github.</p>Three Laws of Functor Confusion2013-06-14T16:51:00-04:002013-06-14T16:51:00-04:00Peter Fraenkeltag:blog.podsnap.com,2013-06-14:/three-laws-of-functor-confusion.html<p>There's a conspiracy to prevent you from understanding functors, and the
following are some of the tools used to keep you in the dark:</p>
<ol>
<li>In explanations about functional programming, authors blithely use <code>f</code> to mean either a function or an object for which there's a functor, e.g.</li>
</ol>
<div class="highlight"><pre><span></span><code> <span class="n">f</span> <span class="o">::</span> <span class="p">(</span><span class="n">a</span> <span class="o">-></span> <span class="n">b</span> <span class="o">-></span> <span class="n">c</span><span class="p">)</span>
<span class="n">fmap</span> <span class="o">::</span> <span class="nc">Functor</span> <span class="n">f</span> <span class="o">=></span> <span class="p">(</span><span class="n">d</span> <span class="o">-></span> <span class="n">e</span><span class="p">)</span> <span class="o">-></span> <span class="n">f</span> <span class="n">d</span> <span class="o">-></span> <span class="n">f</span> <span class="n">e</span>
<span class="n">fmap</span> <span class="n">f</span> <span class="o">::</span> <span class="nc">Functor</span> <span class="n">f</span> <span class="o">=></span> <span class="n">f</span> <span class="n">a</span> <span class="o">-></span> <span class="n">f</span> <span class="p">(</span><span class="n">b</span> <span class="o">-></span> <span class="n">c</span><span class="p">)</span> <span class="o">--</span> <span class="nc">Identify</span> <span class="n">d</span> <span class="k">with</span> <span class="n">a</span><span class="p">,</span> <span class="n">and</span> <span class="n">e</span> <span class="k">with</span> <span class="p">(</span><span class="n">b</span> <span class="o">-></span> <span class="n">c</span><span class="p">)</span>
</code></pre></div>
<ol>
<li>The word <em>functor</em> sometimes means a mapping between categories and other times means an object in a category for which such …</li></ol><p>There's a conspiracy to prevent you from understanding functors, and the
following are some of the tools used to keep you in the dark:</p>
<ol>
<li>In explanations about functional programming, authors blithely use <code>f</code> to mean either a function or an object for which there's a functor, e.g.</li>
</ol>
<div class="highlight"><pre><span></span><code> <span class="n">f</span> <span class="o">::</span> <span class="p">(</span><span class="n">a</span> <span class="o">-></span> <span class="n">b</span> <span class="o">-></span> <span class="n">c</span><span class="p">)</span>
<span class="n">fmap</span> <span class="o">::</span> <span class="nc">Functor</span> <span class="n">f</span> <span class="o">=></span> <span class="p">(</span><span class="n">d</span> <span class="o">-></span> <span class="n">e</span><span class="p">)</span> <span class="o">-></span> <span class="n">f</span> <span class="n">d</span> <span class="o">-></span> <span class="n">f</span> <span class="n">e</span>
<span class="n">fmap</span> <span class="n">f</span> <span class="o">::</span> <span class="nc">Functor</span> <span class="n">f</span> <span class="o">=></span> <span class="n">f</span> <span class="n">a</span> <span class="o">-></span> <span class="n">f</span> <span class="p">(</span><span class="n">b</span> <span class="o">-></span> <span class="n">c</span><span class="p">)</span> <span class="o">--</span> <span class="nc">Identify</span> <span class="n">d</span> <span class="k">with</span> <span class="n">a</span><span class="p">,</span> <span class="n">and</span> <span class="n">e</span> <span class="k">with</span> <span class="p">(</span><span class="n">b</span> <span class="o">-></span> <span class="n">c</span><span class="p">)</span>
</code></pre></div>
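<p>For what it's worth, the same identification can be tried out in scala, with <code>Option</code> standing in for the functor and <code>map</code> standing in for <code>fmap</code>. This is only a sketch: <code>FmapDemo</code>, <code>add</code> and the other names are made up for illustration, and the function is curried explicitly because scala functions are not curried by default.</p>

```scala
object FmapDemo {
  // A two-argument function, playing the role of f :: a -> b -> c.
  val add: (Int, Int) => Int = (a, b) => a + b

  // Curried form, Int => (Int => Int), i.e. a -> (b -> c).
  val addCurried: Int => Int => Int = add.curried

  // Mapping over Option[Int] gives Option[Int => Int], i.e. f (b -> c):
  // d is identified with Int and e with Int => Int.
  val partial: Option[Int => Int] = Option(3).map(addCurried)

  // The wrapped function can then be applied to a second argument.
  val result: Option[Int] = partial.map(g => g(4))
}
```

<p>Note that the function being mapped and the functor being mapped over are different things here, which is exactly the distinction that a single letter <code>f</code> blurs.</p>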
<ol start="2">
<li>The word <em>functor</em> sometimes means a mapping between categories and other times means an object in a category for which such a mapping exists.</li>
<li>Back in OO land, people have been using the term <em>functor</em>
<a href="https://news.ycombinator.com/item?id=2856074">since 1992</a>
to mean an object used to encapsulate a function, i.e. pretty much any
class with any method, as long as the intent is to fake a first class function. If the method
is in C++ and called <code>operator()</code> or in scala and called <code>apply()</code> then it will even look like a function.</li>
</ol>
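<p>The OO usage in the last item can be sketched in a few lines of scala. The names <code>FunctionObjectDemo</code> and <code>Adder</code> are hypothetical, chosen only to show a class with an <code>apply</code> method being called as though it were a function.</p>

```scala
object FunctionObjectDemo {
  // A hypothetical class whose only job is to wrap a function.
  class Adder(n: Int) {
    def apply(x: Int): Int = x + n
  }

  val addFive = new Adder(5)

  // Call syntax on an object is sugar for calling its apply method.
  val viaSugar: Int = addFive(10)
  val viaMethod: Int = addFive.apply(10)
}
```

<p>Nothing here is a functor in the <code>fmap</code> sense; it is just an object dressed up as a function.</p>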
<p>That bit about scala's <code>apply()</code> is a sub-annoyance. The use of the word here has nothing to
do with applicative functors or with the way <code>apply</code> is used in lisp.</p>Podsnappery1900-01-01T00:00:00-05:041900-01-01T00:00:00-05:04Charles Dickenstag:blog.podsnap.com,1900-01-01:/about-mrpodsnap.html<p>Mr Podsnap was well to do, and stood very high in Mr Podsnap's opinion.
Beginning with a good inheritance, he had married a good inheritance,
and had thriven exceedingly in the Marine Insurance way, and was
quite satisfied. He never could make out why everybody was not quite
satisfied, and he felt conscious that he set a brilliant social example
in being particularly well satisfied with most things, and, above all
other things, with himself.</p>
<p>Thus happily acquainted with his own merit and importance, Mr Podsnap
settled that whatever he put behind him he put out of existence. There
was …</p><p>Mr Podsnap was well to do, and stood very high in Mr Podsnap's opinion.
Beginning with a good inheritance, he had married a good inheritance,
and had thriven exceedingly in the Marine Insurance way, and was
quite satisfied. He never could make out why everybody was not quite
satisfied, and he felt conscious that he set a brilliant social example
in being particularly well satisfied with most things, and, above all
other things, with himself.</p>
<p>Thus happily acquainted with his own merit and importance, Mr Podsnap
settled that whatever he put behind him he put out of existence. There
was a dignified conclusiveness--not to add a grand convenience--in
this way of getting rid of disagreeables which had done much towards
establishing Mr Podsnap in his lofty place in Mr Podsnap's satisfaction.
'I don't want to know about it; I don't choose to discuss it; I don't
admit it!' Mr Podsnap had even acquired a peculiar flourish of his
right arm in often clearing the world of its most difficult problems, by
sweeping them behind him (and consequently sheer away) with those words
and a flushed face. For they affronted him.</p>
<p>Mr Podsnap's world was not a very large world, morally; no, nor even
geographically: seeing that although his business was sustained upon
commerce with other countries, he considered other countries, with that
important reservation, a mistake, and of their manners and customs would
conclusively observe, 'Not English!' when, PRESTO! with a flourish of
the arm, and a flush of the face, they were swept away. Elsewhere, the
world got up at eight, shaved close at a quarter-past, breakfasted at
nine, went to the City at ten, came home at half-past five, and dined
at seven. Mr Podsnap's notions of the Arts in their integrity might have
been stated thus. Literature; large print, respectfully descriptive of
getting up at eight, shaving close at a quarter past, breakfasting
at nine, going to the City at ten, coming home at half-past five,
and dining at seven. Painting and Sculpture; models and portraits
representing Professors of getting up at eight, shaving close at a
quarter past, breakfasting at nine, going to the City at ten, coming
home at half-past five, and dining at seven. Music; a respectable
performance (without variations) on stringed and wind instruments,
sedately expressive of getting up at eight, shaving close at a quarter
past, breakfasting at nine, going to the City at ten, coming home at
half-past five, and dining at seven. Nothing else to be permitted to
those same vagrants the Arts, on pain of excommunication. Nothing else
To Be--anywhere!</p>
<p>As a so eminently respectable man, Mr Podsnap was sensible of its being
required of him to take Providence under his protection. Consequently he
always knew exactly what Providence meant. Inferior and less respectable
men might fall short of that mark, but Mr Podsnap was always up to it.
And it was very remarkable (and must have been very comfortable) that
what Providence meant, was invariably what Mr Podsnap meant.</p>
<p>These may be said to have been the articles of a faith and school
which the present chapter takes the liberty of calling, after its
representative man, Podsnappery. They were confined within close bounds,
as Mr Podsnap's own head was confined by his shirt-collar; and they
were enunciated with a sounding pomp that smacked of the creaking of Mr
Podsnap's own boots.</p>
<p>There was a Miss Podsnap. And this young rocking-horse was being trained
in her mother's art of prancing in a stately manner without ever getting
on. But the high parental action was not yet imparted to her, and
in truth she was but an undersized damsel, with high shoulders, low
spirits, chilled elbows, and a rasped surface of nose, who seemed to
take occasional frosty peeps out of childhood into womanhood, and to
shrink back again, overcome by her mother's head-dress and her father
from head to foot--crushed by the mere dead-weight of Podsnappery.</p>
<p>A certain institution in Mr Podsnap's mind which he called 'the young
person' may be considered to have been embodied in Miss Podsnap, his
daughter. It was an inconvenient and exacting institution, as requiring
everything in the universe to be filed down and fitted to it. The
question about everything was, would it bring a blush into the cheek of
the young person? And the inconvenience of the young person was, that,
according to Mr Podsnap, she seemed always liable to burst into
blushes when there was no need at all. There appeared to be no line of
demarcation between the young person's excessive innocence, and another
person's guiltiest knowledge. Take Mr Podsnap's word for it, and the
soberest tints of drab, white, lilac, and grey, were all flaming red to
this troublesome Bull of a young person.</p>
<p>The Podsnaps lived in a shady angle adjoining Portman Square. They were
a kind of people certain to dwell in the shade, wherever they dwelt.
Miss Podsnap's life had been, from her first appearance on this planet,
altogether of a shady order; for, Mr Podsnap's young person was likely
to get little good out of association with other young persons, and had
therefore been restricted to companionship with not very congenial older
persons, and with massive furniture. Miss Podsnap's early views of life
being principally derived from the reflections of it in her father's
boots, and in the walnut and rosewood tables of the dim drawing-rooms,
and in their swarthy giants of looking-glasses, were of a sombre cast;
and it was not wonderful that now, when she was on most days solemnly
tooled through the Park by the side of her mother in a great tall
custard-coloured phaeton, she showed above the apron of that vehicle
like a dejected young person sitting up in bed to take a startled look
at things in general, and very strongly desiring to get her head under
the counterpane again.</p>
<p>Said Mr Podsnap to Mrs Podsnap, 'Georgiana is almost eighteen.'</p>
<p>Said Mrs Podsnap to Mr Podsnap, assenting, 'Almost eighteen.'</p>
<p>Said Mr Podsnap then to Mrs Podsnap, 'Really I think we should have some
people on Georgiana's birthday.'</p>
<p>Said Mrs Podsnap then to Mr Podsnap, 'Which will enable us to clear off
all those people who are due.'</p>
<p>So it came to pass that Mr and Mrs Podsnap requested the honour of the
company of seventeen friends of their souls at dinner; and that they
substituted other friends of their souls for such of the seventeen
original friends of their souls as deeply regretted that a prior
engagement prevented their having the honour of dining with Mr and Mrs
Podsnap, in pursuance of their kind invitation; and that Mrs Podsnap
said of all these inconsolable personages, as she checked them off with
a pencil in her list, 'Asked, at any rate, and got rid of;' and that
they successfully disposed of a good many friends of their souls in this
way, and felt their consciences much lightened.</p>
<p>There were still other friends of their souls who were not entitled to
be asked to dinner, but had a claim to be invited to come and take a
haunch of mutton vapour-bath at half-past nine. For the clearing off
of these worthies, Mrs Podsnap added a small and early evening to the
dinner, and looked in at the music-shop to bespeak a well-conducted
automaton to come and play quadrilles for a carpet dance.</p>
<p>Mr and Mrs Veneering, and Mr and Mrs Veneering's bran-new bride and
bridegroom, were of the dinner company; but the Podsnap establishment
had nothing else in common with the Veneerings. Mr Podsnap could
tolerate taste in a mushroom man who stood in need of that sort
of thing, but was far above it himself. Hideous solidity was the
characteristic of the Podsnap plate. Everything was made to look as
heavy as it could, and to take up as much room as possible. Everything
said boastfully, 'Here you have as much of me in my ugliness as if I
were only lead; but I am so many ounces of precious metal worth so much
an ounce;--wouldn't you like to melt me down?' A corpulent straddling
epergne, blotched all over as if it had broken out in an eruption rather
than been ornamented, delivered this address from an unsightly silver
platform in the centre of the table. Four silver wine-coolers, each
furnished with four staring heads, each head obtrusively carrying a big
silver ring in each of its ears, conveyed the sentiment up and down the
table, and handed it on to the pot-bellied silver salt-cellars. All the
big silver spoons and forks widened the mouths of the company expressly
for the purpose of thrusting the sentiment down their throats with every
morsel they ate.</p>
<p>The majority of the guests were like the plate, and included several
heavy articles weighing ever so much. But there was a foreign gentleman
among them: whom Mr Podsnap had invited after much debate with
himself--believing the whole European continent to be in mortal alliance
against the young person--and there was a droll disposition, not only on
the part of Mr Podsnap but of everybody else, to treat him as if he were
a child who was hard of hearing.</p>
<p>As a delicate concession to this unfortunately-born foreigner, Mr
Podsnap, in receiving him, had presented his wife as 'Madame Podsnap;'
also his daughter as 'Mademoiselle Podsnap,' with some inclination to
add 'ma fille,' in which bold venture, however, he checked himself. The
Veneerings being at that time the only other arrivals, he had added (in
a condescendingly explanatory manner), 'Monsieur Vey-nair-reeng,' and
had then subsided into English.</p>
<p>'How Do You Like London?' Mr Podsnap now inquired from his station of
host, as if he were administering something in the nature of a powder or
potion to the deaf child; 'London, Londres, London?'</p>
<p>The foreign gentleman admired it.</p>
<p>'You find it Very Large?' said Mr Podsnap, spaciously.</p>
<p>The foreign gentleman found it very large.</p>
<p>'And Very Rich?'</p>
<p>The foreign gentleman found it, without doubt, enormement riche.</p>
<p>'Enormously Rich, We say,' returned Mr Podsnap, in a condescending
manner. 'Our English adverbs do Not terminate in Mong, and We Pronounce
the "ch" as if there were a "t" before it. We say Ritch.'</p>
<p>'Reetch,' remarked the foreign gentleman.</p>
<p>'And Do You Find, Sir,' pursued Mr Podsnap, with dignity, 'Many
Evidences that Strike You, of our British Constitution in the Streets Of
The World's Metropolis, London, Londres, London?'</p>
<p>The foreign gentleman begged to be pardoned, but did not altogether
understand.</p>
<p>'The Constitution Britannique,' Mr Podsnap explained, as if he were
teaching in an infant school. 'We Say British, But You Say Britannique,
You Know' (forgivingly, as if that were not his fault). 'The
Constitution, Sir.'</p>
<p>The foreign gentleman said, 'Mais, yees; I know eem.'</p>
<p>A youngish sallowish gentleman in spectacles, with a lumpy forehead,
seated in a supplementary chair at a corner of the table, here caused
a profound sensation by saying, in a raised voice, 'ESKER,' and then
stopping dead.</p>
<p>'Mais oui,' said the foreign gentleman, turning towards him. 'Est-ce
que? Quoi donc?'</p>
<p>But the gentleman with the lumpy forehead having for the time delivered
himself of all that he found behind his lumps, spake for the time no
more.</p>
<p>'I Was Inquiring,' said Mr Podsnap, resuming the thread of his
discourse, 'Whether You Have Observed in our Streets as We should say,
Upon our Pavvy as You would say, any Tokens--'</p>
<p>The foreign gentleman, with patient courtesy entreated pardon; 'But what
was tokenz?'</p>
<p>'Marks,' said Mr Podsnap; 'Signs, you know, Appearances--Traces.'</p>
<p>'Ah! Of a Orse?' inquired the foreign gentleman.</p>
<p>'We call it Horse,' said Mr Podsnap, with forbearance. 'In England,
Angleterre, England, We Aspirate the "H," and We Say "Horse." Only our
Lower Classes Say "Orse!"'</p>
<p>'Pardon,' said the foreign gentleman; 'I am alwiz wrong!'</p>
<p>'Our Language,' said Mr Podsnap, with a gracious consciousness of being
always right, 'is Difficult. Ours is a Copious Language, and Trying to
Strangers. I will not Pursue my Question.'</p>
<p>But the lumpy gentleman, unwilling to give it up, again madly said,
'ESKER,' and again spake no more.</p>
<p>'It merely referred,' Mr Podsnap explained, with a sense of meritorious
proprietorship, 'to Our Constitution, Sir. We Englishmen are Very Proud
of our Constitution, Sir. It Was Bestowed Upon Us By Providence. No
Other Country is so Favoured as This Country.'</p>
<p>'And ozer countries?--' the foreign gentleman was beginning, when Mr
Podsnap put him right again.</p>
<p>'We do not say Ozer; we say Other: the letters are "T" and "H;" You say
Tay and Aish, You Know; (still with clemency). The sound is "th"--"th!"'</p>
<p>'And OTHER countries,' said the foreign gentleman. 'They do how?'</p>
<p>'They do, Sir,' returned Mr Podsnap, gravely shaking his head; 'they
do--I am sorry to be obliged to say it--AS they do.'</p>
<p>'It was a little particular of Providence,' said the foreign gentleman,
laughing; 'for the frontier is not large.'</p>
<p>'Undoubtedly,' assented Mr Podsnap; 'But So it is. It was the Charter
of the Land. This Island was Blest, Sir, to the Direct Exclusion of
such Other Countries as--as there may happen to be. And if we were all
Englishmen present, I would say,' added Mr Podsnap, looking round upon
his compatriots, and sounding solemnly with his theme, 'that there is in
the Englishman a combination of qualities, a modesty, an independence,
a responsibility, a repose, combined with an absence of everything
calculated to call a blush into the cheek of a young person, which one
would seek in vain among the Nations of the Earth.'</p>
<p>Having delivered this little summary, Mr Podsnap's face flushed, as he
thought of the remote possibility of its being at all qualified by
any prejudiced citizen of any other country; and, with his favourite
right-arm flourish, he put the rest of Europe and the whole of Asia,
Africa, and America nowhere.</p>
<p>The audience were much edified by this passage of words; and Mr Podsnap,
feeling that he was in rather remarkable force to-day, became smiling
and conversational.</p>
<p>'Has anything more been heard, Veneering,' he inquired, 'of the lucky
legatee?'</p>
<p>'Nothing more,' returned Veneering, 'than that he has come into
possession of the property. I am told people now call him The Golden
Dustman. I mentioned to you some time ago, I think, that the young lady
whose intended husband was murdered is daughter to a clerk of mine?'</p>
<p>'Yes, you told me that,' said Podsnap; 'and by-the-bye, I wish you would
tell it again here, for it's a curious coincidence--curious that the
first news of the discovery should have been brought straight to your
table (when I was there), and curious that one of your people should
have been so nearly interested in it. Just relate that, will you?'</p>
<p>Veneering was more than ready to do it, for he had prospered exceedingly
upon the Harmon Murder, and had turned the social distinction it
conferred upon him to the account of making several dozen of bran-new
bosom-friends. Indeed, such another lucky hit would almost have set him
up in that way to his satisfaction. So, addressing himself to the most
desirable of his neighbours, while Mrs Veneering secured the next most
desirable, he plunged into the case, and emerged from it twenty minutes
afterwards with a Bank Director in his arms. In the mean time, Mrs
Veneering had dived into the same waters for a wealthy Ship-Broker, and
had brought him up, safe and sound, by the hair. Then Mrs Veneering had
to relate, to a larger circle, how she had been to see the girl, and how
she was really pretty, and (considering her station) presentable.
And this she did with such a successful display of her eight aquiline
fingers and their encircling jewels, that she happily laid hold of a
drifting General Officer, his wife and daughter, and not only restored
their animation which had become suspended, but made them lively friends
within an hour.</p>
<p>Although Mr Podsnap would in a general way have highly disapproved of
Bodies in rivers as ineligible topics with reference to the cheek of the
young person, he had, as one may say, a share in this affair which made
him a part proprietor. As its returns were immediate, too, in the way
of restraining the company from speechless contemplation of the
wine-coolers, it paid, and he was satisfied.</p>
<p>And now the haunch of mutton vapour-bath having received a gamey
infusion, and a few last touches of sweets and coffee, was quite ready,
and the bathers came; but not before the discreet automaton had got
behind the bars of the piano music-desk, and there presented the
appearance of a captive languishing in a rose-wood jail. And who now
so pleasant or so well assorted as Mr and Mrs Alfred Lammle, he all
sparkle, she all gracious contentment, both at occasional intervals
exchanging looks like partners at cards who played a game against All
England.</p>
<p>There was not much youth among the bathers, but there was no youth
(the young person always excepted) in the articles of Podsnappery. Bald
bathers folded their arms and talked to Mr Podsnap on the hearthrug;
sleek-whiskered bathers, with hats in their hands, lunged at Mrs Podsnap
and retreated; prowling bathers, went about looking into ornamental
boxes and bowls as if they had suspicions of larceny on the part of the
Podsnaps, and expected to find something they had lost at the bottom;
bathers of the gentler sex sat silently comparing ivory shoulders. All
this time and always, poor little Miss Podsnap, whose tiny efforts (if
she had made any) were swallowed up in the magnificence of her mother's
rocking, kept herself as much out of sight and mind as she could,
and appeared to be counting on many dismal returns of the day. It was
somehow understood, as a secret article in the state proprieties of
Podsnappery that nothing must be said about the day. Consequently this
young damsel's nativity was hushed up and looked over, as if it were
agreed on all hands that it would have been better that she had never
been born.</p>
<p>The Lammles were so fond of the dear Veneerings that they could not for
some time detach themselves from those excellent friends; but at length,
either a very open smile on Mr Lammle's part, or a very secret elevation
of one of his gingerous eyebrows--certainly the one or the other--seemed
to say to Mrs Lammle, 'Why don't you play?' And so, looking about her,
she saw Miss Podsnap, and seeming to say responsively, 'That card?' and
to be answered, 'Yes,' went and sat beside Miss Podsnap.</p>
<p>Mrs Lammle was overjoyed to escape into a corner for a little quiet
talk.</p>
<p>It promised to be a very quiet talk, for Miss Podsnap replied in a
flutter, 'Oh! Indeed, it's very kind of you, but I am afraid I DON'T
talk.'</p>
<p>'Let us make a beginning,' said the insinuating Mrs Lammle, with her
best smile.</p>
<p>'Oh! I am afraid you'll find me very dull. But Ma talks!'</p>
<p>That was plainly to be seen, for Ma was talking then at her usual
canter, with arched head and mane, opened eyes and nostrils.</p>
<p>'Fond of reading perhaps?'</p>
<p>'Yes. At least I--don't mind that so much,' returned Miss Podsnap.</p>
<p>'M-m-m-m-music.' So insinuating was Mrs Lammle that she got half a dozen
ms into the word before she got it out.</p>
<p>'I haven't nerve to play even if I could. Ma plays.'</p>
<p>(At exactly the same canter, and with a certain flourishing appearance
of doing something, Ma did, in fact, occasionally take a rock upon the
instrument.)</p>
<p>'Of course you like dancing?'</p>
<p>'Oh no, I don't,' said Miss Podsnap.</p>
<p>'No? With your youth and attractions? Truly, my dear, you surprise me!'</p>
<p>'I can't say,' observed Miss Podsnap, after hesitating considerably, and
stealing several timid looks at Mrs Lammle's carefully arranged face,
'how I might have liked it if I had been a--you won't mention it, WILL
you?'</p>
<p>'My dear! Never!'</p>
<p>'No, I am sure you won't. I can't say then how I should have liked it,
if I had been a chimney-sweep on May-day.'</p>
<p>'Gracious!' was the exclamation which amazement elicited from Mrs
Lammle.</p>
<p>'There! I knew you'd wonder. But you won't mention it, will you?'</p>
<p>'Upon my word, my love,' said Mrs Lammle, 'you make me ten times more
desirous, now I talk to you, to know you well than I was when I sat over
yonder looking at you. How I wish we could be real friends! Try me as a
real friend. Come! Don't fancy me a frumpy old married woman, my dear;
I was married but the other day, you know; I am dressed as a bride now,
you see. About the chimney-sweeps?'</p>
<p>'Hush! Ma'll hear.'</p>
<p>'She can't hear from where she sits.'</p>
<p>'Don't you be too sure of that,' said Miss Podsnap, in a lower voice.
'Well, what I mean is, that they seem to enjoy it.'</p>
<p>'And that perhaps you would have enjoyed it, if you had been one of
them?'</p>
<p>Miss Podsnap nodded significantly.</p>
<p>'Then you don't enjoy it now?'</p>
<p>'How is it possible?' said Miss Podsnap. 'Oh it is such a dreadful
thing! If I was wicked enough--and strong enough--to kill anybody, it
should be my partner.'</p>
<p>This was such an entirely new view of the Terpsichorean art as
socially practised, that Mrs Lammle looked at her young friend in some
astonishment. Her young friend sat nervously twiddling her fingers in
a pinioned attitude, as if she were trying to hide her elbows. But this
latter Utopian object (in short sleeves) always appeared to be the great
inoffensive aim of her existence.</p>
<p>'It sounds horrid, don't it?' said Miss Podsnap, with a penitential
face.</p>
<p>Mrs Lammle, not very well knowing what to answer, resolved herself into
a look of smiling encouragement.</p>
<p>'But it is, and it always has been,' pursued Miss Podsnap, 'such a trial
to me! I so dread being awful. And it is so awful! No one knows what
I suffered at Madame Sauteuse's, where I learnt to dance and make
presentation-curtseys, and other dreadful things--or at least where they
tried to teach me. Ma can do it.'</p>
<p>'At any rate, my love,' said Mrs Lammle, soothingly, 'that's over.'</p>
<p>'Yes, it's over,' returned Miss Podsnap, 'but there's nothing gained by
that. It's worse here, than at Madame Sauteuse's. Ma was there, and Ma's
here; but Pa wasn't there, and company wasn't there, and there were not
real partners there. Oh there's Ma speaking to the man at the piano! Oh
there's Ma going up to somebody! Oh I know she's going to bring him
to me! Oh please don't, please don't, please don't! Oh keep away, keep
away, keep away!' These pious ejaculations Miss Podsnap uttered with her
eyes closed, and her head leaning back against the wall.</p>
<p>But the Ogre advanced under the pilotage of Ma, and Ma said, 'Georgiana,
Mr Grompus,' and the Ogre clutched his victim and bore her off to his
castle in the top couple. Then the discreet automaton who had surveyed
his ground, played a blossomless tuneless 'set,' and sixteen disciples
of Podsnappery went through the figures of - 1, Getting up at eight and
shaving close at a quarter past - 2, Breakfasting at nine - 3, Going to
the City at ten - 4, Coming home at half-past five - 5, Dining at seven,
and the grand chain.</p>
<p>While these solemnities were in progress, Mr Alfred Lammle (most loving
of husbands) approached the chair of Mrs Alfred Lammle (most loving of
wives), and bending over the back of it, trifled for some few seconds
with Mrs Lammle's bracelet. Slightly in contrast with this brief airy
toying, one might have noticed a certain dark attention in Mrs Lammle's
face as she said some words with her eyes on Mr Lammle's waistcoat, and
seemed in return to receive some lesson. But it was all done as a breath
passes from a mirror.</p>
<p>And now, the grand chain riveted to the last link, the discreet
automaton ceased, and the sixteen, two and two, took a walk among
the furniture. And herein the unconsciousness of the Ogre Grompus was
pleasantly conspicuous; for, that complacent monster, believing that
he was giving Miss Podsnap a treat, prolonged to the utmost stretch
of possibility a peripatetic account of an archery meeting; while his
victim, heading the procession of sixteen as it slowly circled about,
like a revolving funeral, never raised her eyes except once to steal a
glance at Mrs Lammle, expressive of intense despair.</p>
<p>At length the procession was dissolved by the violent arrival of a
nutmeg, before which the drawing-room door bounced open as if it were a
cannon-ball; and while that fragrant article, dispersed through several
glasses of coloured warm water, was going the round of society, Miss
Podsnap returned to her seat by her new friend.</p>
<p>'Oh my goodness,' said Miss Podsnap. 'THAT'S over! I hope you didn't
look at me.'</p>
<p>'My dear, why not?'</p>
<p>'Oh I know all about myself,' said Miss Podsnap.</p>
<p>'I'll tell you something I know about you, my dear,' returned Mrs Lammle
in her winning way, 'and that is, you are most unnecessarily shy.'</p>
<p>'Ma ain't,' said Miss Podsnap. '--I detest you! Go along!' This shot
was levelled under her breath at the gallant Grompus for bestowing an
insinuating smile upon her in passing.</p>
<p>'Pardon me if I scarcely see, my dear Miss Podsnap,' Mrs Lammle was
beginning when the young lady interposed.</p>
<p>'If we are going to be real friends (and I suppose we are, for you are
the only person who ever proposed it) don't let us be awful. It's awful
enough to BE Miss Podsnap, without being called so. Call me Georgiana.'</p>
<p>'Dearest Georgiana,' Mrs Lammle began again.</p>
<p>'Thank you,' said Miss Podsnap.</p>
<p>'Dearest Georgiana, pardon me if I scarcely see, my love, why your
mamma's not being shy, is a reason why you should be.'</p>
<p>'Don't you really see that?' asked Miss Podsnap, plucking at her fingers
in a troubled manner, and furtively casting her eyes now on Mrs Lammle,
now on the ground. 'Then perhaps it isn't?'</p>
<p>'My dearest Georgiana, you defer much too readily to my poor opinion.
Indeed it is not even an opinion, darling, for it is only a confession
of my dullness.'</p>
<p>'Oh YOU are not dull,' returned Miss Podsnap. 'I am dull, but you
couldn't have made me talk if you were.'</p>
<p>Some little touch of conscience answering this perception of her having
gained a purpose, called bloom enough into Mrs Lammle's face to make it
look brighter as she sat smiling her best smile on her dear Georgiana,
and shaking her head with an affectionate playfulness. Not that it meant
anything, but that Georgiana seemed to like it.</p>
<p>'What I mean is,' pursued Georgiana, 'that Ma being so endowed with
awfulness, and Pa being so endowed with awfulness, and there being
so much awfulness everywhere--I mean, at least, everywhere where I
am--perhaps it makes me who am so deficient in awfulness, and frightened
at it--I say it very badly--I don't know whether you can understand what
I mean?'</p>
<p>'Perfectly, dearest Georgiana!' Mrs Lammle was proceeding with every
reassuring wile, when the head of that young lady suddenly went back
against the wall again and her eyes closed.</p>
<p>'Oh there's Ma being awful with somebody with a glass in his eye! Oh I
know she's going to bring him here! Oh don't bring him, don't bring him!
Oh he'll be my partner with his glass in his eye! Oh what shall I do!'
This time Georgiana accompanied her ejaculations with taps of her feet
upon the floor, and was altogether in quite a desperate condition. But,
there was no escape from the majestic Mrs Podsnap's production of an
ambling stranger, with one eye screwed up into extinction and the other
framed and glazed, who, having looked down out of that organ, as if he
descried Miss Podsnap at the bottom of some perpendicular shaft, brought
her to the surface, and ambled off with her. And then the captive at the
piano played another 'set,' expressive of his mournful aspirations after
freedom, and other sixteen went through the former melancholy motions,
and the ambler took Miss Podsnap for a furniture walk, as if he had
struck out an entirely original conception.</p>
<p>In the mean time a stray personage of a meek demeanour, who had wandered
to the hearthrug and got among the heads of tribes assembled there in
conference with Mr Podsnap, eliminated Mr Podsnap's flush and
flourish by a highly unpolite remark; no less than a reference to the
circumstance that some half-dozen people had lately died in the streets,
of starvation. It was clearly ill-timed after dinner. It was not adapted
to the cheek of the young person. It was not in good taste.</p>
<p>'I don't believe it,' said Mr Podsnap, putting it behind him.</p>
<p>The meek man was afraid we must take it as proved, because there were
the Inquests and the Registrar's returns.</p>
<p>'Then it was their own fault,' said Mr Podsnap.</p>
<p>Veneering and other elders of tribes commended this way out of it. At
once a short cut and a broad road.</p>
<p>The man of meek demeanour intimated that truly it would seem from
the facts, as if starvation had been forced upon the culprits in
question--as if, in their wretched manner, they had made their weak
protests against it--as if they would have taken the liberty of staving
it off if they could--as if they would rather not have been starved upon
the whole, if perfectly agreeable to all parties.</p>
<p>'There is not,' said Mr Podsnap, flushing angrily, 'there is not a
country in the world, sir, where so noble a provision is made for the
poor as in this country.'</p>
<p>The meek man was quite willing to concede that, but perhaps it
rendered the matter even worse, as showing that there must be something
appallingly wrong somewhere.</p>
<p>'Where?' said Mr Podsnap.</p>
<p>The meek man hinted Wouldn't it be well to try, very seriously, to find
out where?</p>
<p>'Ah!' said Mr Podsnap. 'Easy to say somewhere; not so easy to say
where! But I see what you are driving at. I knew it from the first.
Centralization. No. Never with my consent. Not English.'</p>
<p>An approving murmur arose from the heads of tribes; as saying, 'There
you have him! Hold him!'</p>
<p>He was not aware (the meek man submitted of himself) that he was driving
at any ization. He had no favourite ization that he knew of. But he
certainly was more staggered by these terrible occurrences than he was
by names, of howsoever so many syllables. Might he ask, was dying of
destitution and neglect necessarily English?</p>
<p>'You know what the population of London is, I suppose,' said Mr Podsnap.</p>
<p>The meek man supposed he did, but supposed that had absolutely nothing
to do with it, if its laws were well administered.</p>
<p>'And you know; at least I hope you know;' said Mr Podsnap, with
severity, 'that Providence has declared that you shall have the poor
always with you?'</p>
<p>The meek man also hoped he knew that.</p>
<p>'I am glad to hear it,' said Mr Podsnap with a portentous air. 'I am
glad to hear it. It will render you cautious how you fly in the face of
Providence.'</p>
<p>In reference to that absurd and irreverent conventional phrase, the meek
man said, for which Mr Podsnap was not responsible, he the meek man had
no fear of doing anything so impossible; but--</p>
<p>But Mr Podsnap felt that the time had come for flushing and flourishing
this meek man down for good. So he said:</p>
<p>'I must decline to pursue this painful discussion. It is not pleasant to
my feelings; it is repugnant to my feelings. I have said that I do not
admit these things. I have also said that if they do occur (not that I
admit it), the fault lies with the sufferers themselves. It is not for
ME'--Mr Podsnap pointed 'me' forcibly, as adding by implication though
it may be all very well for YOU--'it is not for me to impugn the
workings of Providence. I know better than that, I trust, and I have
mentioned what the intentions of Providence are. Besides,' said
Mr Podsnap, flushing high up among his hair-brushes, with a strong
consciousness of personal affront, 'the subject is a very disagreeable
one. I will go so far as to say it is an odious one. It is not one to be
introduced among our wives and young persons, and I--' He finished with
that flourish of his arm which added more expressively than any words,
And I remove it from the face of the earth.</p>
<p>Simultaneously with this quenching of the meek man's ineffectual fire;
Georgiana having left the ambler up a lane of sofa, in a No Thoroughfare
of back drawing-room, to find his own way out, came back to Mrs Lammle.
And who should be with Mrs Lammle, but Mr Lammle. So fond of her!</p>
<p>'Alfred, my love, here is my friend. Georgiana, dearest girl, you must
like my husband next to me.'</p>
<p>Mr Lammle was proud to be so soon distinguished by this special
commendation to Miss Podsnap's favour. But if Mr Lammle were prone to be
jealous of his dear Sophronia's friendships, he would be jealous of her
feeling towards Miss Podsnap.</p>
<p>'Say Georgiana, darling,' interposed his wife.</p>
<p>'Towards--shall I?--Georgiana.' Mr Lammle uttered the name, with a
delicate curve of his right hand, from his lips outward. 'For never have
I known Sophronia (who is not apt to take sudden likings) so attracted
and so captivated as she is by--shall I once more?--Georgiana.'</p>
<p>The object of this homage sat uneasily enough in receipt of it, and then
said, turning to Mrs Lammle, much embarrassed:</p>
<p>'I wonder what you like me for! I am sure I can't think.'</p>
<p>'Dearest Georgiana, for yourself. For your difference from all around
you.'</p>
<p>'Well! That may be. For I think I like you for your difference from all
around me,' said Georgiana with a smile of relief.</p>
<p>'We must be going with the rest,' observed Mrs Lammle, rising with a
show of unwillingness, amidst a general dispersal. 'We are real friends,
Georgiana dear?'</p>
<p>'Real.'</p>
<p>'Good night, dear girl!'</p>
<p>She had established an attraction over the shrinking nature upon which
her smiling eyes were fixed, for Georgiana held her hand while she
answered in a secret and half-frightened tone:</p>
<p>'Don't forget me when you are gone away. And come again soon. Good
night!'</p>
<p>Charming to see Mr and Mrs Lammle taking leave so gracefully, and going
down the stairs so lovingly and sweetly. Not quite so charming to see
their smiling faces fall and brood as they dropped moodily into separate
corners of their little carriage. But to be sure that was a sight behind
the scenes, which nobody saw, and which nobody was meant to see.</p>
<p>Certain big, heavy vehicles, built on the model of the Podsnap plate,
took away the heavy articles of guests weighing ever so much; and the
less valuable articles got away after their various manners; and the
Podsnap plate was put to bed. As Mr Podsnap stood with his back to the
drawing-room fire, pulling up his shirt-collar, like a veritable cock
of the walk literally pluming himself in the midst of his possessions,
nothing would have astonished him more than an intimation that Miss
Podsnap, or any other young person properly born and bred, could not be
exactly put away like the plate, brought out like the plate, polished
like the plate, counted, weighed, and valued like the plate. That such
a young person could possibly have a morbid vacancy in the heart for
anything younger than the plate, or less monotonous than the plate;
or that such a young person's thoughts could try to scale the region
bounded on the north, south, east, and west, by the plate; was a
monstrous imagination which he would on the spot have flourished into
space. This perhaps in some sort arose from Mr Podsnap's blushing young
person being, so to speak, all cheek; whereas there is a possibility
that there may be young persons of a rather more complex organization.</p>
<p>If Mr Podsnap, pulling up his shirt-collar, could only have heard
himself called 'that fellow' in a certain short dialogue, which passed
between Mr and Mrs Lammle in their opposite corners of their little
carriage, rolling home!</p>
<p>'Sophronia, are you awake?'</p>
<p>'Am I likely to be asleep, sir?'</p>
<p>'Very likely, I should think, after that fellow's company. Attend to
what I am going to say.'</p>
<p>'I have attended to what you have already said, have I not? What else
have I been doing all to-night.'</p>
<p>'Attend, I tell you,' (in a raised voice) 'to what I am going to say.
Keep close to that idiot girl. Keep her under your thumb. You have her
fast, and you are not to let her go. Do you hear?'</p>
<p>'I hear you.'</p>
<p>'I foresee there is money to be made out of this, besides taking that
fellow down a peg. We owe each other money, you know.'</p>
<p>Mrs Lammle winced a little at the reminder, but only enough to shake her
scents and essences anew into the atmosphere of the little carriage, as
she settled herself afresh in her own dark corner.</p>
<ul>
<li>Charles Dickens, 1865 (approximated as 1900 for <code>strftime</code>'s sake)</li>
</ul>