Friday, October 17, 2014

Solution to Construct the BST algo puzzle.

[This is a solution to the construct the binary search tree algorithm puzzle posted earlier]

A brief description of the puzzle:

If you insert $a_1$, then $a_2$, then $a_3$ and so on till $a_n$ into a binary search tree, with no rebalancing etc, you get a unique tree with $a_1$ as the root, $a_2$ as a child of the root etc.

A naive algorithm to construct the unique tree is $\Omega(n^2)$ in the worst-case.
(e.g: $a_1 \lt a_2 \lt \dots \lt a_n$).

Give an $O(n\log n)$ time algorithm to construct the tree.

We will present three solutions.

Solution 1)

This solution is based on Shyam's idea.

Initially we start with $(-\infty, \infty)$.

When we insert $a_1$ this interval splits into two: $(-\infty, a_1), (a_1, \infty)$. The left interval corresponds to the left sub-tree of $a_1$ and the right interval corresponds to the right sub-tree of $a_1$.

Basically, the binary search tree can be looked at as a collection of disjoint intervals, whose union is $(-\infty, \infty)$ (ignoring the split points).

Each time you insert a new value into a tree, some interval is split into two, around that value.

Now we can maintain the intervals as a red-black tree whose keys are the left end points, and the values are the right endpoints, plus a pointer to a node in the target tree (target tree is the tree which we want to construct). This node in the target tree holds the value which resulted in the creation of that interval.

Given a new value which we need to insert into the target tree, say $a_j$, we look-up the predecessor of $a_j$ in the tree of intervals ($O(\log n)$ time), and can now add the new node at the appropriate place in the target tree, in $O(1)$ time (and update the book-keeping).

Solution 2)

This solution is based on Arvind's idea, and is perhaps essentially the same as Solution 1). It is the simplest, though.

The key observation used in this solution is the following:

When you insert $x$ into a binary search tree, it will either be the right child of it's predecessor among the elements already present in the tree, or will be the left child of it's successor among the elements already present in the tree.

So you maintain a red-black tree along with the target tree. The nodes in the red-black tree maintain pointers to the corresponding node in the target tree, and also the when it was inserted (i.e. for $a_j$ you store $j$).

Now when you are inserting $a_j$, lookup both the successor and predecessor of $a_j$ in the red-black tree.

Say we get $a_p$ and $a_s$ with $a_p \lt a_j \lt a_s$.

Now if $p \lt s$ (predecessor was inserted before), then $a_j$ will be the left left child of $a_s$ in the target tree. If $p \gt s$, then $a_j$ will be the right child of $a_p$ in the target tree.

Solution 3)

This uses a lesser known data-structure called Cartesian Trees.

Given points $$(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$$ with $x_1 \lt x_2 \lt \dots \lt x_n$, a cartesian tree of these points is basically a treap (a tree + heap), with the $x_i$ forming the binary search tree values, and $y_i$ from the heap values: i.e. each node of the tree with contain an $(x_i, y_i)$ pair, and when you just look at the $x_i$, the tree will be a binary search tree, and when you just look at the $y_i$, the tree will be a heap.

This tree can be constructed in $O(n)$ time (see the wiki page for details).

Suppose the sorted list of $a_1, \dots, a_n$ is $b_1, \dots, b_n$, with $b_1 \lt b_2 \lt \dots \lt b_n$. Also, suppose $b_k = a_{f(k)}$.

Then we are looking for the Cartesian Tree corresponding to the points $$(b_1, f(1)), (b_2, f(2)), \dots, (b_n, f(n))$$ with the $f(i)$ forming a min-heap.

We can sort $a_i$ in $O(n \log n)$ time, and create the cartesian tree in $O(n)$ time.

No comments:

Post a Comment