In honor of !!con, I was trying to think of programming-relevant uses of double exclamation marks. Other than denoting the end of a particularly funny programming joke (for example, a mangled homage to a famous paper), it seemed the best place to look might be in the world of double factorials. As it turns out, that wasn't as fruitful as I'd hoped, but I learned something interesting on the way to finding that out, and here it is.

# TL;DR

You can learn interesting things about the order of complexity of an algorithm without knowing anything about how it works.

# Informational entropy

In statistical mechanics, the entropy \(S\) of a physical system that can exist in any of multiple states \(i\), each with probability \(p_i\), is

\[ S = -k_B \sum_i p_i \log p_i, \]
where \(k_B\) is the Boltzmann constant. If you're not a physics person, don't worry about \(k_B\), or about why it turns out that this is the same entropy that relates temperature \(T\) to work \(Q\):

\[ \delta Q = T\, \delta S. \]
This thermodynamic relationship tells us that the amount of work you have to do to move a smidgen of entropy \(\delta S\) around is proportional to \(\delta S\).

Claude Shannon defined informational entropy for a variable with multiple discrete values \(i\) that it can assume with probability \(p_i\) in almost the same way,

\[ S = -\sum_i p_i \log p_i, \]
the difference being a trivial multiplicative constant, since we're not specific about the base of the \(\log\), and we've dropped the Boltzmann constant (which is an artifact of the units we use to measure energy).

Note that if you have a system with exactly one state, its entropy is zero, since the probability of that state is \(p=1\) and \(\log 1 = 0\). Also, if you have a system with \(N\) equally probable states, its entropy is just \(\log N\):

\[ S = -\sum_{i=1}^{N} \frac{1}{N} \log \frac{1}{N} = \log N. \]
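If you want to poke at this yourself, here's a quick sketch in Python (the `entropy` helper is my own, and I'm using natural logs throughout):

```python
import math

def entropy(probs):
    """Shannon entropy, -sum p_i log p_i, in natural-log units (nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# A system with exactly one state has zero entropy.
assert entropy([1.0]) == 0.0

# N equally probable states give S = log N.
N = 8
assert abs(entropy([1 / N] * N) - math.log(N)) < 1e-12
```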
# Sorting Algorithms

Suppose my variable is a list of \(n\) comparable values, all different, in random order. The number of possible random arrangements is \(N_n = n!\), and therefore the entropy is

\[ S = \log N_n = \log n!. \]
This looks like it will come in close to \(n \log n\). More exactly and formally, that result is known as Stirling's approximation, which is not horribly difficult to derive, but we'll just take it as given that

\[ \log n! \approx n \log n - n. \]
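You can watch the approximation close in numerically; `math.lgamma(n + 1)` computes \(\log n!\) without overflowing (a quick sketch, natural logs):

```python
import math

# log(n!) versus Stirling's approximation n log n - n.
for n in [10, 100, 1000, 10_000]:
    exact = math.lgamma(n + 1)          # log(n!) via the log-gamma function
    stirling = n * math.log(n) - n
    print(n, exact, stirling)
```

The absolute gap grows slowly (like \(\tfrac{1}{2}\log 2\pi n\)), but the relative error shrinks toward zero.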
If we were to sort this list, we would reduce its entropy to \(\log 1 = 0\), so we conclude that the amount of work that must be done to sort a random list is proportional to \(n \log n - n\), which for large \(n\) is really \(n \log n\). Of course, you could expend more work sorting the list if you don't know what you're doing.

Knowing this, it is not surprising that the non-awful sorting algorithms are all \(\mathcal{O}(n \log n)\), but I think it
*is* surprising that we could come to this conclusion without even looking at the algorithms themselves.

# The mystery of heapsort

You know about heapsort. Once our list of numbers is in a binary heap, removing the minimum entry is an \(\mathcal{O}(\log n)\) operation, and doing that \(n\) times takes \(\mathcal{O}(n \log n)\), which makes it a contender.
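For concreteness, here's the extraction phase sketched with Python's `heapq` module (`heap_sort` is my name for it, not a standard function):

```python
import heapq
import random

def heap_sort(items):
    """Sort by heapifying once, then popping the minimum n times.
    Each pop is O(log n), so the extraction phase totals O(n log n)."""
    heap = list(items)
    heapq.heapify(heap)   # the heapify step in question
    return [heapq.heappop(heap) for _ in range(len(heap))]

data = random.sample(range(1000), 100)
assert heap_sort(data) == sorted(data)
```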

Something I've often wondered about is the step where we heapify the random list in the first place. We start out with something that will take \(\mathcal{O}(n \log n)\) to sort by any number of methods, then do some non-trivial work, and end up with a heap, from which it will still take \(\mathcal{O}(n \log n)\) to obtain the sorted list. What's the point?

It's interesting to think about how the act of heapifying affects the entropy of the system, i.e., how the entropy of a heap compares to that of a totally shuffled list. I am indebted to David Blackston, who suggested an interesting approach to calculating this entropy. Unfortunately, he's a luddite, so I can't link to him. Here is my somewhat physicsy (i.e., despicably over-simplified) variation on his calculation:

First, we convince ourselves that if we can write \(n=2k+1\), the total number of possible heaps \(H_n\) for a randomized list satisfies the recurrence relation (Dave likes recurrence relations):

\[ H_{2k+1} = \binom{2k}{k} H_k^2, \qquad H_1 = 1. \]
The way to think about this is

- We know the smallest number is at the top of the heap, so that's fixed.
- The remaining \(2k\) numbers can be divided into two equal halves, one per subtree, in \(\binom{2k}{k}\) different ways.
- Each of those halves can independently be arranged into \(H_k\) possible heaps, giving the factor \(H_k^2\).

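We can sanity-check the recurrence against brute force for small heaps (a sketch in Python; `heap_count` and `brute_count` are just names I made up):

```python
import math
from itertools import permutations

def heap_count(n):
    """Number of binary min-heaps of n = 2k+1 distinct keys,
    via H_{2k+1} = C(2k, k) * H_k^2 with H_1 = 1."""
    if n == 1:
        return 1
    k = (n - 1) // 2
    return math.comb(2 * k, k) * heap_count(k) ** 2

def brute_count(n):
    """Count permutations of 0..n-1 satisfying the array heap property."""
    def is_heap(a):
        return all(a[(i - 1) // 2] <= a[i] for i in range(1, len(a)))
    return sum(is_heap(p) for p in permutations(range(n)))

assert heap_count(3) == brute_count(3) == 2
assert heap_count(7) == brute_count(7) == 80
```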
Now let's add some apparently unnecessary complications to what we already know about the number of possible random lists of length \(n=2k+1\), multiplying and dividing by the same thing:

\[ N_{2k+1} = (2k+1)! = (2k+1)\,(2k)!\,\frac{k!\,k!}{k!\,k!} = (2k+1)\binom{2k}{k} N_k^2, \]
where in the last step we use \(\binom{n}{m} = \frac{n!}{(n-m)!\,m!}\) and \(N_k = k!\). Define

\[ \alpha_n = \frac{N_n}{H_n}, \]

so

\[ \alpha_{2k+1} = \frac{N_{2k+1}}{H_{2k+1}} = \frac{(2k+1)\binom{2k}{k}N_k^2}{\binom{2k}{k}H_k^2} = (2k+1)\,\alpha_k^2, \quad\text{i.e.}\quad \log\alpha_{2k+1} = \log(2k+1) + 2\log\alpha_k. \]
Since \(k\) is likely to be large, \(2k+1\) is close enough to \(2k\) for government work, and we'll also drop the \(\log(2k+1)\) term, since it looks like \(2\log\alpha_k\) will be much larger, leaving \(\log\alpha_{2k} \approx 2\log\alpha_k\).

So it must be, therefore, that \(\log\alpha\) is linear in its subscript, for some constant \(a\):

\[ \log\alpha_n \approx a\, n. \]
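If you don't trust the hand-waving, here's a numerical check (a sketch; `log_alpha` is my name, natural logs). It computes \(\log(N_n/H_n)\) directly from the recurrences for \(N\) and \(H\), and the ratio to \(n\) does indeed level off:

```python
import math

def log_alpha(n):
    """log(N_n / H_n) for a perfect heap of n = 2k+1 nodes.  Dividing the
    recurrences for N and H gives log a_{2k+1} = log(2k+1) + 2 log a_k."""
    if n == 1:
        return 0.0
    k = (n - 1) // 2
    return math.log(n) + 2 * log_alpha(k)

# If log(alpha_n) ~ a*n, this ratio should settle toward a constant.
for n in [3, 7, 15, 31, 63, 127, 255, 511, 1023]:
    print(n, log_alpha(n) / n)
```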
That is, heapify has removed entropy proportional to \(n\) from the fully randomized list. So now we know what the point was: it got us at least partway down from the starting entropy. Furthermore, we also know that heapifying really oughtn't cost more than \(\mathcal{O}(n)\).

That last sentence might seem counter-intuitive, since adding an element to a heap takes \(\mathcal{O}(\log n)\), so it would seem that building up a heap of \(n\) elements should take \(\mathcal{O}(n \log n)\). Actually, it doesn't. You can analyze heapify carefully, but the gist is that, since the vast majority of elements in a heap end up living in the bottom few rows, you rarely climb the full \(\log n\) levels when inserting.
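Here's a counting sketch of bottom-up (Floyd) heapify (names are mine, not from any library). Each node can only sift down as far as the height of its own subtree, and most subtrees are tiny, so the total number of swaps stays below \(n\) even on a nasty input:

```python
def heapify_steps(a):
    """Bottom-up min-heapify in place; returns the number of sift-down swaps."""
    n, swaps = len(a), 0
    for i in range(n // 2 - 1, -1, -1):   # last internal node up to the root
        j = i
        while True:
            small, l, r = j, 2 * j + 1, 2 * j + 2
            if l < n and a[l] < a[small]:
                small = l
            if r < n and a[r] < a[small]:
                small = r
            if small == j:
                break
            a[j], a[small] = a[small], a[j]  # sift down one level
            j = small
            swaps += 1
    return swaps

a = list(range(2**14, 0, -1))   # reverse-sorted: a bad case
swaps = heapify_steps(a)
assert swaps < len(a)           # O(n) total work, not O(n log n)
```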

Once again, without knowing anything about the algorithm, we've been able to pronounce authoritatively on its order of complexity. I think this is cool.

# !!

The double factorial is defined for odd numbers \(n\) as

\[ n!! = n\,(n-2)\,(n-4)\cdots 3 \cdot 1. \]
As it happens, the number of possible *heap ordered trees* with \(k+1\) nodes is \((2k+1)!!\). When I first learned this, I didn't immediately realize that such trees are not just binary heaps: any node can have any number of children. So, while good times were had calculating the entropy of an object known only to be a heap ordered tree (aka "increasing ordered tree"), the experience didn't prove helpful in understanding heapsort.
