TL;DR

It's not too long, but, to summarize the summary, if you read Rich Hickey's 2014 blog post on transducers first, his 2012 post on reducers will be easier to understand.

Brief Definitions

Herewith, all in one place, are Clojuresque definitions of:

  • reducible
  • reducing function
  • transducer
  • reducer
  • folder
  • decomplected

Longer elaborations of these definitions follow in the subsequent section.

reducing function

Anything that can be used as the first argument of the reduce function, e.g. + or conj. Generally, it's a binary function that returns something of the type of its first argument, which is supposed to be a kind of accumulation.

reducible

Anything that can be used as the second argument of the reduce function. For the purposes of the next few column inches, it's a collection - a vector, list, etc.

         (reduce reducing-function reducible)

transducer

A function that turns one reducing function into another, by modifying the second argument or invoking the input function a number of times other than one.

reducer

I lied a little when I defined reducible; actually, it's anything that implements the CollReduce protocol, whose coll-reduce method is invoked by reduce. Thus, the exact behavior of reduce is a property of the specific thing being reduced. A reducer is a just a special kind of reducible that has a transducer embedded in it, ready to be invoked whenever reduce is finally called.

folder

A reducible that also implements CollFold, enabling you to to use fold in lieu of reduce; the two functions yield the same answer, but fold can be parallelized.

decomplected

Basically, a synonym for "separated," as in "separation of concerns."

Elaborations

Transducer

First off, don't be misled by the real meaning, which isn't even a good metaphor for the way we're using it here. For my sins, I know a little about transducers.

Here's a transducer that doubles all incoming elements before passing them to the input reducing function:

(defn double1 [reduction-function]
  (fn [result input]
    (reduction-function result (* 2 input))))

So that (reduce (double1 +) 0 c) would produce $2\sum_i c_i$ while (reduce (double1 *) 1 c) would return $\prod^n_i 2 c_i = 2^n \prod^n_i c_i$. And here's one that doubles incoming elements in a different way, by repeating them twice:

(defn double2 [reduction-function]
  (fn [result input]
    (reduction-function (reduction-function result input) input)))

So that (reduce (double2 +) 0 coll) will return $\sum_i (c_i + c_i) = 2 \sum_i c_i$, the same as with double1, but (reduce (double2 *) 1 coll) would return the $\prod_i c_i c_i = (\prod c_i)^2$.

Three cool things to note:

  1. The things can be composed, e.g. (reduce ((comp double1 double2) +) 0 c) gives you $\sum_i 2(c_i +c_i) = 4\sum_i c_i$.
  2. The needs of reduce are a superset of those of map. You could create a kind of map that takes reducing functions instead of unary functions, and, unlike reduce, it can be lazy:

    defn process [xfn c]
        (lazy-seq (when-let [s (seq c)]
                      (concat ((xfn #(concat %1 (list %2))) '() (first s))
                              (process xfn (rest s))))))
    ;; (process double1 [1 2 3]) ; yields (2 4 6)
    ;; (process double2 [1 2 3]) ; yields (1 1 2 2 3 3)
    ;; (take 10 (process double2 (range))) ;; yields (0 0 1 1 2 2 3 3 4 4)
    

    In fact, the sequence function in Clojure 1.7 will have a two-argument form

    ([xform coll]
       (clojure.lang.LazyTransformer/create xform coll))
    

    that does the same thing as my process but much more efficiently.

  3. As long as the transducer doesn't diddle with result, it will be totally oblivious to how it is called, from what kind of a collection, or even if it's a traditional collection at all. The earliest compelling examples of this generality are core.async/channels, which, in their latest incarnation can take transducer arguments to process data that flows through them.

reducer

For example

(def fatrange (reducer (range 10) double1))

is a reducible with a multiplication hiding out, waiting to be invoked when you (reduce + fatrange) Since you can compose any number of transducers, and they'll all get invoked at the time of reduction, it's a lot more efficient than the conventional

(reduce + (map f1 (map f2 (map f3 ...))))

which creates a vast apparatus of lazy sequences feeding each other. Moreover, of course, the transducers can do more than just modify individual elements, as illustrated above with double2.

The core.reducers namespace ships with a bevvy of functions like r/map, r/filter - all convenience functions creating reducers with specific transducers already attached.

folder

The differences between the reduce and fold are that the latter:

  1. assumes that its reducing function is associative, so that it can
  2. attack the problem in parallel by repeatedly dividing the collection in two, following the approach famously outlined by Guy Steele and
  3. using the magic of Java's fork/join framework to perform this work efficiently with a finite number of threads.

If you call reduce on a folder, or fold on a reducer, the reduction proceeds normally. Only if you use fold, and the collection implements CollFold does the parallelization occur.

Some of the r/whatever methods actually create folders, allowing them to be used with either fold or reduce.

It's interesting (to me) to notice that, while the redicible is the 2nd argument to reduce (as is the convention in functional languages), it has to be the first argument in the internal implementations, so that protocol delegation can occur properly. Under some circumstances, reduction is handled in Java, via the .reduce method of the IReduce interface. This illustrates the "best of both worlds" approach that Clojure takes to residence on the JVM. We're straight functional lisp when we want to be, and we're OO when we need to be.

decomplected

Search engines chiefly come up with blog posts about transducers, plus a few curricula vitae. It's definitely not the opposite of complected, so it definitely has nothing do with this. One is given to believe that this repurposing in nomenclature is less confusing than fancy math words like "monoid," by I'm not so sure.



Comments

comments powered by Disqus