After my recent attempt to provide type annotations for transducers, several people pointed out that I wasn't accounting for state. The signature of a pure function transformation, whether in Clojure
(t/defalias Transducer (t/TFn [[a :variance :covariant]
[b :variance :contravariant]]
(t/All [r] [[r a -> r] -> [r b ->r]])))
or Haskell
type Transducer a b = forall r . (r -> a -> r) -> (r -> b -> r)
nowhere acknowledges that the transducer might need to maintain, for example, the previous value in the series, in order to remove duplicates.
The failure is most obvious to hardcore Haskell programmers, who, as a rule, would rather eat broken glass than modify a hidden state variable in place. State is kept explicitly, though one might use monad sugar to avoid having to look at it all the time.
Hardness of core is a relative thing, and Clojure programmers profess more than a little reverence for referential transparency, so it was a bit surprising to see this:
(defn t-dedup []
(fn [xf]
(let [prev (volatile! ::none)]
(fn
([result input]
(let [prior @prev]
(vreset! prev input) ;; <= gack!
(if (= prior input)
result
(xf result input))))))))
In case my ostensibly clarifying excisions make it not obvious, this de-duplicator
is the suggested exemplar of a stateful transducer, taken straight from
http://clojure.org/transducers (not including the comment).1 When applied to a reducing function, it will
produce a new function, closing over a volatile prev
variable. Not only is the
resulting function impure, but it is cryptically so. Having no way to detect whether
the "stateful function" has been contaminated by prior use, one must take special care to insure
that it is not used in more than one reduction, to say nothing of more than one thread.
Some of my best friends are Python programmers, but there have to be limits.
The answer is right in front of us
Reduction may already be the world's best example of state done right! Novelty
accrues in the reduced quantity, successive versions of which are passed as the left
argument to the reduction function. The quantity is never modified; rather, as usual in functional
programming, new versions are created. If that reduction function is +
, state is accumulated
in the form of a sum; that the sum also happens to be the value we're interested in at
the end of the reduction is a useful coincidence.
In fact, Clojure already uses the reduction value to hold state other than result, via the
reduced
function, used to indicate that reduction may terminate early (as in
take
). What reduced
does is wrap the value up in a container class
(defn reduced [x] (clojure.lang.Reduced. x))
Subsequently, we find out whether to bail by calling reduced?
, which
ultimately does nothing more than check what class we have:
static public boolean isReduced(Object r){ return r instanceof Reduced; }
As Reduced
implements IDeref
, the wrapped value can ultimately be extracted
with @
, which just calls the class's deref
method.
This happens to be the one and only way in which state is currently embedded in the reduction, but if I were king, things would be different.
L'etat, c'est moi
In Clojure, the obvious way to attach state to some arbitrary value to the accumulated
reduction is with the inbuilt metadata
facility. The object returned by (with-meta obj {:boffo "foo"}
is identical to obj
from all perspectives other than that of
silly individuals who put (:boffo (meta obj))
in their code,
so we could have have our state, without the reduce
consumer
having to eat it. The only problem is that you can't attach metadata
to scalar primitives (like numbers), which tend to figure prominently in numerical
reductions. This, presumably, is why reduced
didn't take that route.
Let's define a type that holds state for the deduplicator. It should hold the previous value, and of course we still have to keep the accumulated reduction value:
(deftype DedupStateWrapper [acc prev])
In the transducer, we'll check to see whether the result is already wrapped
(defn t-dedup1 [rf]
(fn [result input]
(if (instance? DedupStateWrapper result)
...
and, if so, check whether the new input
matches the previous:
(if (= input (.prev result))
...
If it's a duplicate, the result
shouldn't change; otherwise, call the
original reduction function and wrap its return value, setting the state
to new input
:
result
(DedupStateWrapper. (rf (.acc result) input) input))
...
Finally, if the result
hasn't been wrapped yet, input
can't possibly be duplicate, so invoke the reduction function
and wrap it up:
(DedupStateWrapper. (rf result input) input))))
Let's try it out:
playground.transducers> (reduce (t-dedup1 +) 0 [1 1 2 2])
#<DedupStateWrapper playground.transducers.DedupStateWrapper@1bf0ed5c>
Oh right.
playground.transducers> (.acc (reduce (t-dedup1 +) 0 [1 1 2 2]))
3
I'd hoped to avoid the manual unwrapping - before realizing that metadata wouldn't work - but we can argue that it's a good thing, forcing some acknowledgment of recent statefulness.
In any
case, without modifying reduce
to do our unwrapping (as it already does for
Reduced
), this is something we just have to live with.
So let's clean up. First, we might as well make a more general-sounding StateWrapper
,
since all transducers basically want the same thing. We'll make it print all pretty
like,
(deftype StateWrapper [value state]
Object
(toString [this] (str "(" (.value this) "," (.state this) ")")))
and write some convenience functions to extract its innards:2
(defn get-state [r]
(when (instance? StateWrapper r)
[(.value r) (.state r)]))
(defn unwrap [r]
(if (instance? StateWrapper r) (unwrap (.value r)) r))
The deduplicator is now:
(defn t-dedup [rf]
(fn [result input]
(debug result input)
(if-let [[result-val input-prev] (get-state result)]
(if (= input input-prev)
result
(StateWrapper. (rf result-val input) input))
(StateWrapper. (rf result input) input))))
and we'll also write an uncontroversially stateless transducer that truncates numbers to integers, which will be useful for testing.
(defn t-trunc [reduction-function]
(fn [result input]
(debug result input)
(reduction-function result (int input))))
Composition
We'll now take a series of floating point numbers [1.0 2.0 2.5 2.5 3.0]
and run them through a stack
of composed transducers to deduplicate, truncate and deduplicate again:
playground.transducers> (unwrap (reduce ((comp t-dedup t-trunc t-dedup) conj) [] [1.0 2.0 2.5 2.5 3.0]))
[1 2 3]
Had we applied corresponding transformations to the entire series, rather than as composed transducers, we would see the following steps:
[1.0 2.0 2.5 2.5 3.0]
[1.0 2.0 2.5 3.0] ; duplicates removed
[1 2 2 3] ; truncated, revealing more duplication
[1 2 3] ; duplicates resulting from truncation removed
In transducer land, we're going to get to this result by a different route, having composed the transformation to run on each input iteratively. With so much wrapping and unwrapping going on, it's helpful to turn on debugging. Note that we can tell whether we're in the first or second deduplication by whether the input is, respectively, a double or an integer.
On the first round,
t-dedup [] 1.0 ;; first t-dedup receives empty vector and 1.0; passes down to t-trunc
t-trunc [] 1.0 ;; t-trunc gets empty vector and 1.0; truncates and passes to t-dedup
t-dedup [] 1 ;; second t-dedup receives empty vector and truncated value; passes to conj
test-conj [] 1 ;; performs conj, returns [1]
On the way back up the composed function stack, the second t-dedup
wraps the conj'd result and
the current input as ([1],1)
and passes it up to t-trunc
, which passes it up, unmodified,
to the first t-dedup
, which wraps it again, together with its first input. The current result,
which will be passed to the next iteration is thus (([1],1),1.0)
:
t-dedup (([1],1),1.0) 2.0 ;; First t-dedup compares 1.0 and 2.0;
unwraps and invokes t-trunc
t-trunc ([1],1) 2.0 ;; t-trunc converts to int; invokes 2nd t-dedup
t-dedup ([1],1) 2 ;; 2nd t-dedup compares 1 and 2; unwraps and invokes conj
test-conj [1] 2
t-dedup (([1 2],2),2.0) 2.5 ;; First t-dedup compares 2.0, 2.5;
unwraps and invokes t-trunc
t-trunc ([1 2],2) 2.5 ;; t-trunc converts to int; invokes 2nd t-dedup
t-dedup ([1 2],2) 2 ;; 2nd t-dedup finds 2==2; DOESN'T invoke conj
Skipping over conj
, we move forward:
t-dedup (([1 2],2),2.5) 2.5 ;; First t-dedup finds 2.5==2.5; DOESN'T invoke t-trunc
t-dedup (([1 2],2),2.5) 3.0 ;; First t-dedup compares 2.5, 3.0; invokes t-trunc
t-trunc ([1 2],2) 3.0 ;; t-trunc converts to int; invokes 2nd t-dedup
t-dedup ([1 2],2) 3 ;; 2nd dedup compares 2,3; invokes conj
test-conj [1 2] 3
The final [1 2 3]
passes up through two layers of wrapping, so the value returned by
reduce
is (([1 2 3] 3), 3.0)
, which unwrap
peels apart recursively.
Unsolved problems
-
All this wrapping and unwrapping doesn't come for free, computationally speaking. It seems possible that the JIT will optimize some of it away, but I haven't checked yet.
-
Having abolished crypto-state, we're a little closer to solving the type problem, but not quite there yet.
To be continued...
Comments
comments powered by Disqus