Headfirst Search

In another post, I prattled on at some length about the scala Set class. To understand its nuances, it was helpful to print out a graph of class and trait inheritance. Here's a contrived example that's simpler than Set:

trait C1 {}
trait C2 {}
trait D extends C1 with C2 {}
trait E1 extends D {}
trait E2 extends D {}
trait E3 extends C2 {}
trait F extends E1 with E2 with E3 {}

The hierarchy of F looks like:

which the proposed utility will display as:

interface F
  interface E1
    interface D
      interface C1
      interface C2
  interface E2
    interface D
    ...
  interface E3
    interface C2
    ...

Note that the type data will come via Java introspection, and Java doesn't know from traits, so they'll show up here as interfaces. The ... stands in for the traits of traits we've already printed out and don't need to print again.

A slightly less contrived example

class scala.math.BigInt
 class scala.math.ScalaNumber
  class java.lang.Number
   class java.lang.Object
   interface java.io.Serializable
 interface scala.math.ScalaNumericConversions
  interface scala.math.ScalaNumericAnyConversions
 interface scala.Serializable
  interface java.io.Serializable
  ...

contains more information than scaldocs, in a form that I find more digestible than scaladocs. In addition to the direct "children," as specified in the class's declaration

class BigInt extends ScalaNumber with ScalaNumericConversions with Serializable

we can see the the linear trail of class extension all the way to java.lang.Object, as well as the combining graph of traits and interfaces, showing both routes to java.io.Serializable.

What I want to talk about, though, is not the utility of the outline, but the ways one might generate it. This is a classic depth-first-search, with the minor twists that (1) we want to indent by depth and (2), because it's a directed acyclic graph rather than a tree, we need to watch out for nodes that show up more than once, so as not to print out the same subgraphs over and over again.

A standard recursive solution

looks like this, using recursion and carrying along a mutable set in which to record the visited nodes:

import scala.collection.mutable.{Set => MSet}
def inhGraph1(c : Class[_],
             indent : Int = 0,
             seen : MSet[Class[_]] = MSet[Class[_]]()) : Unit = {
  println(" "*indent + c.toString)
  if(seen.contains(c)) {
    // Do not recurse to children if we have seen this node before.
    println(" "*indent + "...")
  } else {
    // Record that we have seen the node.
    seen += c
    // Check out the superclass
    val s = c.getSuperclass
    if( s != null)
       inhGraph1(s,indent+2,seen)
    // And the interfaces
    for (cc <- c.getInterfaces)
        inhGraph1(cc,indent+2,seen)
  }
}

That's simple enough, but this mutable set business is a bit unsightly. Mutability may, in this case, be pragmatic and obviously harmless, but we're civilized, and we don't do that kind of thing. Better, I think is

using lovely, immutable, persistent data structures to write code that doesn't actually work

def inhGraph2(c : Class[_],
             indent : Int = 0,
             seen : Set[Class[_]] = Set[Class[_]]()) : Unit = {
  println(" "*indent + c.toString)
  if(seen.contains(c)) {
    println(" "*indent + "...")
  } else {
    val s = c.getSuperclass
    if( s != null)
       inhGraph2(s,indent+2,seen+c)      // <----
    for (cc <- c.getInterfaces)
        inhGraph2(cc,indent+2,seen+c)   // <----
  }
}

Here, we use the Set.+ operator to produce brand new immutable sets sporting new members. The not actually working part is that we fail to keep track correctly of what's already been printed so the output

scala> inhGraph2(classOf[F])
interface F
  interface E1
    interface D
      interface C1
      interface C2
  interface E2
    interface D
      interface C1
      interface C2
  interface E3
    interface C2

repeats the entire trait history of D unnecessarily. Although we duly recorded our visit to D, the Set we recorded it in is popped off the stack the moment the recursive call returns, and we lose it.

Whoops.

Tail recursion to the rescue,

albeit not for the usual reasons of performance and memory conservation. The fundamental design issue is that we need two data structures:

A stack, with which to track what nodes to visit next and how much to indent them.
A set, with which to track what nodes we already visited.

Normally, we get the stack for free by hitching a ride on the recursive call stack, but that convenience sometimes blinds us to all the other stuff that's being stacked unnecessarily, or, in our case, counter-effectively.

We avoid the problem if we can contort our algorithm such that nothing important is done with seen after the recursive call returns. That's another way of saying that we need our function to be tail-recursive.

At the same time we still the stack; we just don't want seen on that stack, so we

maintain the stack ourselves,

rather than language to do it.

The inner recursive function will look like

    def visit(stack: List[(Class[_],Int)], seen: Set[Class[_]]) : Unit

where the List maintains the current path of nodes and indentation, and the Set grows monotonically with every iteration. One cool thing about a stack built from a List of tuples, is that we can use pattern matching to examine and destructure it, as

   stack match { case (c,depth)::tail => ...

While we're being clever, we'll also make the code a little more general, by separating the traversal functionality from the specifics of introspection, paramterizing on a type T rather than specifying Class[_] and allowing arbitrary functions to define what to print and how to obtain children of a node. In sum:

def dfs[T](o:T)(f_pre:(T,Int)=>Unit)  // What to do with each node
               (f_dup:(T,Int)=>Unit)  // What to do if the node is a duplicate
               (get_kids:T=>List[T])  // Find children of the node
               : Unit = {
    def visit(stack: List[(T,Int)], seen: Set[T]) : Unit = {
       stack match {
          case (o,depth)::tail => {
             f_pre(o,depth)
             if(seen(o)) {
               f_dup(o,depth)
               visit(tail,seen)
              } else {
               visit( get_kids(o).zip(Stream.continually(depth+1)) ::: tail, seen + o)
              }
           }
           case _ =>
       }
    }
    visit(List((o,0)), Set[T]())
 }

def inhGraph3(c : Class[_]) = dfs[Class[_]](c){(c,i)=>println(" "*i+c.toString)}
                                    {(c,i)=>println(" "*i+"...")}
                                    {c : Class[_] => {c.getInterfaces.toList} ::: Option(c.getSuperclass).toList}

This works and yields the same output as inhGraph1. The only

awkward bit

   get_kids(o).zip(Stream.continually(depth+1))

which converts a list of child nodes o1,o2,o3 into a list of tuples (o1,d), (o2,d), (o3,d), all having the same second member. It feels a little too tricky and too reliant on non-fundamental parts of the API, but I'm not sure there's a better way to do it.

One ratification of our functional design is that can essentially be

transcribed into clojure:

(defn dfs [o f-pre f-seen f-kids]
  (loop [ [[o depth] & stack]  (list [o 0])
         seen                #{}         ]
    (when o
      (f-pre depth o)
      (if (seen o)
        (do 
          (f-seen depth o)
          (recur stack seen))
        (recur (concat (map vector (f-kids o) (repeat (inc depth))))
               (conj seen o))))))

(defn inh-graph [o]  (dfs (.getClass o)
                          #(println (apply str (repeat %1 " ")) %2)
                          #(println (apply str (repeat %1 " ")) "..." )
                          #(let [ifcs (seq (.getInterfaces %))
                                 sc   (.getSuperclass %)]
                             (if sc (conj ifcs sc) ifcs))))

Neato.