Monday, December 1, 2014

Solution to find the three missing numbers

The puzzle (posted earlier):

You have an array of $2n+3$ positive integers, such that each element appears exactly twice in the array, except for three of them (say $a,b,c$) which appear exactly once.

Can you give an $O(n)$ time and $O(1)$ space (assuming RAM model) algorithm to find out what the three unique integers $a,b,c$ are?

Solution

If we had just one number missing, this would be the classic puzzle whose solution is to XOR (bitwise exclusive or, denoted $\oplus$) all the numbers. The result will be the unique number.

A similar solution works in the two unique case (missing $a,b$), where you XOR all the number together, and use a set bit position in the result (which is $a \oplus b$) to differentiate between the two unique numbers by making another pass and bucketing based on that bit. [See Collins' comments on the question post for more details.]

When you try to apply that to the three unique case, we compute $S = a \oplus b \oplus c$, but this might well be $0$. Even if some bit of $S$ was set to $1$, the corresponding bits of $a,b,c$ could all be $1$ and we cannot be sure of using that position to distinguish one of them.

But, consider this: If $S$ differed from $a$ at some bit position, then that bit position could be used as a distinguishing bit for $a,b,c$. Because if $a,b,c$ were all the same at that position, then $S$ would be too, and cannot differ from $a$. $S$ will always differ from some bit of $a$ as otherwise we will $S = a$ and thus $b = c$.

But, we don't really know what $a,b,c$ are, to find out a differing bit.

That does not matter though, if we consider the smallest position where they might differ.

If $f(x,y)$ is the smallest bit position (rightmost bit position being $0$) where $x$ and $y$ differ, then we cannot have that $f(S,a) = f(S,b) = f(S,c)$, as otherwise $a,b,c$ would agree at that position and thus would not differ from $S$ at that position. [Note that $f(x,y)$ can be computed in $O(1)$ time and space by using the classic bithack: $x \& (x-1)$.]

Thus the algorithm becomes:

Compute $S = a \oplus b \oplus c$ by doing an XOR of all $a_i$.

Now make another pass through the array and compute the XOR $T$ of all $\displaystyle 2^{f(S, a_i)}$, where $a_i$ are the elements of the array.

Note that $\displaystyle 2^{f(S,a_i)}$ can be computed in $O(1)$ time, and would fit in the underlying register (so $O(1)$ space). We also ignore $a_i$ for which $S = a_i$.

Now, $T$ cannot be $0$ as it is the bitwise XOR of three powers of $2$. Thus, a set bit of $T$ can now be used to distinguish one of $a,b,c$, and we can then fall back to the two unique case to find the other two.

No comments:

Post a Comment