Wednesday, July 15, 2020

Using math as an interviewer.

One of my go to interview questions is

"Generate a random binary tree".

We are only interested in structure of the tree and number of nodes will be < 1000.

I say that the ideal case is all trees are generated with the same probability (uniformly random), but explain that any algorithm where the probability of every tree is non-zero will do. Two trees are distinct if they are structurally different.

[Generating a random binary tree where each tree is equally like is not an easy problem (definitely too hard for an interview, though I have had a couple of candidate who did this!) and if you want to try it here is an old post about it: https://ruffnsluff.blogspot.com/2015/03/generate-random-binary-tree.html]


I make sure to tell them I don't need uniform (all equal probability) because it is a hard problem.

 Inspite of the explicit mention that it is a hard problem and not what I am looking for, candidates come up with all sorts of algorithms and claim it is uniformly random.

As an interviewer you need to be able to quickly tell if the algorithm is truly generating uniformly random or not.

I use a "trick" which has worked with almost every candidates' solution so far.

All I need to do is look at the random number generator calls their algorithm is making.

Typically the probabilities they generate are of the form $\frac{1}{n}$ or $\frac{1}{2}$ or $\frac{1}{3}$ or some such limited set like that.

The number of trees on $n$ nodes is the Catalan number: $ \frac{\binom{2n}{n}}{n+1}$ which tends to have many weird prime number factors (see, Bertrand's postulate). For eg, for $n=3$ we expect the probability of each tree to be $\frac{1}{5}$. For $n=4$ the probability of each tree needs to be $\frac{1}{14}$ (7 is the weird prime here).

No matter how the algorithm multiplies and adds these probabilities from the limited set together, they will fail to get the factor of the weird prime in the the denominator and thus can never generate the probability we require! I don't have to walk through their complicated algorithm and compute a tree etc.

[Caveat: Here we are assuming that the algorithm always terminates in a finite number of steps, no matter how the random numbers are generated. Section about this added at the end.]

For eg, consider this algorithm to generate a random tree.

1. Add root to tree.
2. Pick an existing node in the tree randomly that has child space
  2.1 Randomly add a child node to it.
3. Continue till tree has n nodes.


Does this generate the trees uniformly?

The probabilities involved are $\frac{1}{k}$ where $k \le n$. No matter how you combine these you will not get the the weird prime factor (the one between $n$ and $2n$) in the denominator. So this cannot possibly result in a uniform distribution.

 This looking at the denominators of the random number calls has been pretty useful when trying to see if some algorithm/process generates the probabilities we need.


As another example, here is a typical flawed algorithm to shuffle an array randomly, so that each permutation is equally likely.


for (int i = 0; i < array.length; i++) {
  int r = random(array.length); // random position where ith card goes.
  swap(array, i, r); // Swap a[i] and a[r]
}

Why is this not uniform? The random number generator only generates probability $\frac{1}{n}$. The number of permutations is $n!$ and we can never achieve $\frac{1}{n!}$ no matter how we combine those $\frac{1}{n}$s (for $n \gt 2$).

Why do we require the algorithm always terminate?


Consider the following

do {
  int bit1 = random_bit();
  int bit2 = random_bit();
  int result = bit1 + 2*bit2;
  if (result < 3) return result; //0, 1 or 2.
} while (true);

This generates a number in $\{0, 1, 2\}$ with probability $\frac{1}{3}$, even though the random number calls only generates probabilities of $\frac{1}{2}$ (the random bit calls).

Why does the denominator thing not work? Because we now have an infinite series for the probability calculation and that could give any result (even irrational numbers! See newer blog post.).

The probability of getting a $0$ is

$$\frac{1}{4} + \frac{1}{4^2} + \frac{1}{4^3} + \dots  = \frac{1}{3}$$

This corresponds to getting a 0, or a 4 (so while loop continues) and then a 0 or a 4,4, 0 and so on.

Note that the algorithm not halting with a probability zero is different from the algorithm always halting!

No comments:

Post a Comment