In another post, I prattled on at some length about the scala Set
class. To understand its nuances, it was helpful to print out a graph of class and trait inheritance.
Here's a contrived example that's simpler than Set
:
trait C1 {}
trait C2 {}
trait D extends C1 with C2 {}
trait E1 extends D {}
trait E2 extends D {}
trait E3 extends C2 {}
trait F extends E1 with E2 with E3 {}
The hierarchy of F looks like:
C1 C2
\ / \
D \
/ \ \
E1 E2 E3
\ /__/
F
which the proposed utility will display as:
interface F
interface E1
interface D
interface C1
interface C2
interface E2
interface D
...
interface E3
interface C2
...
Note that the type data will come via Java introspection, and Java doesn't know from traits, so they'll show up here
as interfaces. The ...
stands in for the traits of traits we've already printed out and don't need to print again.
A slightly less contrived example
class scala.math.BigInt
class scala.math.ScalaNumber
class java.lang.Number
class java.lang.Object
interface java.io.Serializable
interface scala.math.ScalaNumericConversions
interface scala.math.ScalaNumericAnyConversions
interface scala.Serializable
interface java.io.Serializable
...
contains more information than scaldocs, in a form that I find more digestible than scaladocs. In addition to the direct "children," as specified in the class's declaration
class BigInt extends ScalaNumber with ScalaNumericConversions with Serializable
we can see the the linear trail of class extension all the way to java.lang.Object
,
as well as the combining graph of traits and interfaces, showing both routes to java.io.Serializable
.
What I want to talk about, though, is not the utility of the outline, but the ways one might generate it. This is a classic depth-first-search, with the minor twists that (1) we want to indent by depth and (2), because it's a directed acyclic graph rather than a tree, we need to watch out for nodes that show up more than once, so as not to print out the same subgraphs over and over again.
A standard recursive solution
looks like this, using recursion and carrying along a mutable set in which to record the visited nodes:
import scala.collection.mutable.{Set => MSet}
def inhGraph1(c : Class[_],
indent : Int = 0,
seen : MSet[Class[_]] = MSet[Class[_]]()) : Unit = {
println(" "*indent + c.toString)
if(seen.contains(c)) {
// Do not recurse to children if we have seen this node before.
println(" "*indent + "...")
} else {
// Record that we have seen the node.
seen += c
// Check out the superclass
val s = c.getSuperclass
if( s != null)
inhGraph1(s,indent+2,seen)
// And the interfaces
for (cc <- c.getInterfaces)
inhGraph1(cc,indent+2,seen)
}
}
That's simple enough, but this mutable set business is a bit unsightly. Mutability may, in this case, be pragmatic and obviously harmless, but we're civilized, and we don't do that kind of thing. Better, I think is
using lovely, immutable, persistent data structures to write code that doesn't actually work
def inhGraph2(c : Class[_],
indent : Int = 0,
seen : Set[Class[_]] = Set[Class[_]]()) : Unit = {
println(" "*indent + c.toString)
if(seen.contains(c)) {
println(" "*indent + "...")
} else {
val s = c.getSuperclass
if( s != null)
inhGraph2(s,indent+2,seen+c) // <----
for (cc <- c.getInterfaces)
inhGraph2(cc,indent+2,seen+c) // <----
}
}
Here, we use the Set.+
operator to produce brand new immutable sets
sporting new members. The not actually working part is that we fail to
keep track correctly of what's already been printed so the output
scala> inhGraph2(classOf[F])
interface F
interface E1
interface D
interface C1
interface C2
interface E2
interface D
interface C1
interface C2
interface E3
interface C2
repeats the entire trait history of D
unnecessarily. Although we duly
recorded our visit to D
, the Set
we recorded it in is popped off
the stack the moment the recursive call returns, and we lose it.
Whoops.
Tail recursion to the rescue,
albeit not for the usual reasons of performance and memory conservation. The fundamental design issue is that we need two data structures:
- A stack, with which to track what nodes to visit next and how much to indent them.
- A set, with which to track what nodes we already visited.
Normally, we get the stack for free by hitching a ride on the recursive call stack, but that convenience sometimes blinds us to all the other stuff that's being stacked unnecessarily, or, in our case, counter-effectively.
We avoid the problem if we can contort our algorithm such that nothing important
is done with seen
after the recursive call returns.
That's another way of saying that we need our function to be
tail-recursive.
At the same time we still the stack; we just don't want
seen
on that stack, so we
maintain the stack ourselves,
rather than language to do it.
The inner recursive function will look like
def visit(stack: List[(Class[_],Int)], seen: Set[Class[_]]) : Unit
where the List
maintains the current path of nodes and indentation, and
the Set
grows monotonically with every iteration. One cool thing about a stack
built from a List
of tuples, is that we can use pattern matching to examine and
destructure it, as
stack match { case (c,depth)::tail => ...
While we're being clever, we'll also make the code a little more general, by separating
the traversal functionality from the specifics of introspection, paramterizing on
a type T
rather than specifying Class[_]
and allowing arbitrary functions
to define what to print and how to obtain children of a node. In sum:
def dfs[T](o:T)(f_pre:(T,Int)=>Unit) // What to do with each node
(f_dup:(T,Int)=>Unit) // What to do if the node is a duplicate
(get_kids:T=>List[T]) // Find children of the node
: Unit = {
def visit(stack: List[(T,Int)], seen: Set[T]) : Unit = {
stack match {
case (o,depth)::tail => {
f_pre(o,depth)
if(seen(o)) {
f_dup(o,depth)
visit(tail,seen)
} else {
visit( get_kids(o).zip(Stream.continually(depth+1)) ::: tail, seen + o)
}
}
case _ =>
}
}
visit(List((o,0)), Set[T]())
}
def inhGraph3(c : Class[_]) = dfs[Class[_]](c){(c,i)=>println(" "*i+c.toString)}
{(c,i)=>println(" "*i+"...")}
{c : Class[_] => {c.getInterfaces.toList} ::: Option(c.getSuperclass).toList}
This works and yields the same output as inhGraph1
. The only
awkward bit
is
get_kids(o).zip(Stream.continually(depth+1))
which converts a list of child nodes o1,o2,o3
into a list of tuples (o1,d), (o2,d), (o3,d)
, all having the same
second member. It feels a little too tricky and too reliant on non-fundamental parts of the API, but I'm not sure
there's a better way to do it.
One ratification of our functional design is that can essentially be
transcribed into clojure:
(defn dfs [o f-pre f-seen f-kids]
(loop [ [[o depth] & stack] (list [o 0])
seen #{} ]
(when o
(f-pre depth o)
(if (seen o)
(do
(f-seen depth o)
(recur stack seen))
(recur (concat (map vector (f-kids o) (repeat (inc depth))))
(conj seen o))))))
(defn inh-graph [o] (dfs (.getClass o)
#(println (apply str (repeat %1 " ")) %2)
#(println (apply str (repeat %1 " ")) "..." )
#(let [ifcs (seq (.getInterfaces %))
sc (.getSuperclass %)]
(if sc (conj ifcs sc) ifcs))))
Neato.
Comments
comments powered by Disqus