TL;DR
It's not too long, but, to summarize the summary, if you read Rich Hickey's 2014 blog post on transducers first, his 2012 post on reducers will be easier to understand.
Brief Definitions
Herewith, all in one place, are Clojuresque definitions of:
- reducible
- reducing function
- transducer
- reducer
- folder
- decomplected
Longer elaborations of these definitions follow in the subsequent section.
reducing function
Anything that can be used as the first argument of the reduce
function, e.g.
+
or conj
. Generally, it's a binary function that returns something of
the type of its first argument, which is supposed to be a kind of accumulation.
reducible
Anything that can be used as the second argument of the reduce
function.
For the purposes of the next few column inches, it's a collection - a vector, list, etc.
(reduce reducing-function reducible)
transducer
A function that turns one reducing function into another, by modifying the second argument or invoking the input function a number of times other than one.
reducer
I lied a little when I defined reducible; actually, it's anything that implements
the CollReduce
protocol, whose coll-reduce
method is invoked by reduce
.
Thus, the exact behavior of reduce
is a property of the specific thing being reduced.
A reducer is a just a special kind of reducible that has a transducer
embedded in it, ready to be invoked whenever reduce
is finally called.
folder
A reducible that also implements CollFold
, enabling you to to use
fold
in lieu of reduce
; the two functions yield
the same answer, but fold
can be parallelized.
decomplected
Basically, a synonym for "separated," as in "separation of concerns."
Elaborations
Transducer
First off, don't be misled by the real meaning, which isn't even a good metaphor for the way we're using it here. For my sins, I know a little about transducers.
Here's a transducer that doubles all incoming elements before passing them to the input reducing function:
(defn double1 [reduction-function]
(fn [result input]
(reduction-function result (* 2 input))))
So that (reduce (double1 +) 0 c)
would produce $2\sum_i c_i$
while (reduce (double1 *) 1 c)
would return
$\prod^n_i 2 c_i = 2^n \prod^n_i c_i$.
And here's one that doubles incoming elements in a different way, by repeating them twice:
(defn double2 [reduction-function]
(fn [result input]
(reduction-function (reduction-function result input) input)))
So that (reduce (double2 +) 0 coll)
will return
$\sum_i (c_i + c_i) = 2 \sum_i c_i$, the same as with double1
,
but (reduce (double2 *) 1 coll)
would return the
$\prod_i c_i c_i = (\prod c_i)^2$.
Three cool things to note:
- The things can be composed, e.g.
(reduce ((comp double1 double2) +) 0 c)
gives you $\sum_i 2(c_i +c_i) = 4\sum_i c_i$. -
The needs of
reduce
are a superset of those ofmap
. You could create a kind ofmap
that takes reducing functions instead of unary functions, and, unlikereduce
, it can be lazy:defn process [xfn c] (lazy-seq (when-let [s (seq c)] (concat ((xfn #(concat %1 (list %2))) '() (first s)) (process xfn (rest s)))))) ;; (process double1 [1 2 3]) ; yields (2 4 6) ;; (process double2 [1 2 3]) ; yields (1 1 2 2 3 3) ;; (take 10 (process double2 (range))) ;; yields (0 0 1 1 2 2 3 3 4 4)
In fact, the
sequence
function in Clojure 1.7 will have a two-argument form([xform coll] (clojure.lang.LazyTransformer/create xform coll))
that does the same thing as my
process
but much more efficiently. -
As long as the transducer doesn't diddle with
result
, it will be totally oblivious to how it is called, from what kind of a collection, or even if it's a traditional collection at all. The earliest compelling examples of this generality arecore.async/chan
nels, which, in their latest incarnation can take transducer arguments to process data that flows through them.
reducer
For example
(def fatrange (reducer (range 10) double1))
is a reducible with a multiplication hiding out, waiting to be invoked
when you (reduce + fatrange)
Since you can compose any number of
transducers, and they'll all get invoked at the time of reduction,
it's a lot more efficient than the conventional
(reduce + (map f1 (map f2 (map f3 ...))))
which creates a vast apparatus of lazy
sequences feeding each other. Moreover, of course, the transducers
can do more than just modify individual elements, as illustrated above
with double2
.
The core.reducers
namespace ships with a bevvy of functions like r/map
,
r/filter
- all convenience functions creating reducers with specific
transducers already attached.
folder
The differences between the reduce
and fold
are that the latter:
- assumes that its reducing function is associative, so that it can
- attack the problem in parallel by repeatedly dividing the collection in two, following the approach famously outlined by Guy Steele and
- using the magic of Java's fork/join framework to perform this work efficiently with a finite number of threads.
If you call reduce
on a folder
, or fold
on a reducer
, the reduction
proceeds normally. Only if you use fold
, and the collection implements CollFold
does
the parallelization occur.
Some of the r/whatever
methods actually create folders, allowing
them to be used with either fold
or reduce
.
It's interesting (to me) to notice that, while the redicible is the 2nd argument to
reduce
(as is the convention in functional languages), it has to be the first
argument in the internal implementations, so that protocol delegation can occur properly.
Under some circumstances, reduction is handled in Java, via the .reduce
method
of the IReduce
interface. This illustrates the "best of both worlds" approach that
Clojure takes to residence on the JVM. We're straight functional lisp when we want to be,
and we're OO when we need to be.
decomplected
Search engines chiefly come up with blog posts about transducers, plus a few curricula vitae. It's definitely not the opposite of complected, so it definitely has nothing do with this. One is given to believe that this repurposing in nomenclature is less confusing than fancy math words like "monoid," by I'm not so sure.
Comments
comments powered by Disqus