Monday, September 22, 2014

Iterators part three: A General Binary Tree Iterator

[This is part three in a series about iterators]

Introduction

If you have heard of binary trees, you probably know what in-order, pre-order and post-order traversals are. These traversals are usually defined in a recursive fashion.

For instance, in-order is defined: traverse left sub-tree, visit node, traverse right sub-tree.

These kind of definitions lead to easy recursive implementations.

A common interview question is to convert these recursive method to iterative methods, which can be solved in a general fashion by doing what the compilers do: simulate the recursion using a call stack.

A possible followup to that is to add the ability to do the traversal in a lazy fashion: provide a next interface, which will return the visited nodes one by one (with next being reasonably efficient). [One could use yield, if the language provides it, but we will assume that it is unavailable.]

If you search the web, you will probably see different iteration solutions, custom made for in-order, pre-order etc.

A General Lazy Iterator for Binary Trees

In this article we will discuss a general lazy iterator for binary trees, for traversals defined recursively. This includes not just in-order etc, but also crazy traversals like traverse left node, traverse right node, visit node, visit node, traverse left node.

These general traversals can be represented as a string consisting of the letters "L", "R" and "V".

For instance, in-order traversal is "LVR", L for traverse left sub-tree, V for visit node and R for traverse right sub-tree. Post-order is "LRV" and the crazy traversal defined above is "LRVVL".

So, given a traversal method in the form a traversal string (like "LVR" and which contains at least one V), and the tree, we would like to create a lazy iterator, which will allow us to visit the nodes of the given tree in the given traversal order.

If you were to write a recursive method for the traversal, the code might look something like this (python):

# traversal_method is a string like "LVLR".
def traverse(tree, traversal_method):
    if tree == None:
        return
    for method in traversal_method:
        if method == "L":
            traverse(tree.Left, traversal_method)
        if method == "R":
            traverse(tree.Right, traversal_method)
        if method == "V":
            Visit(tree.Value) 

This recursive method can be converted to an iterative method, by maintaining a stack of (tree,  traversals_left) pairs.

For example, if the traversal required is "LVR", we first push the (root, "LVR") pair. Once the left sub-tree is traversed because of the "L" (started by pushing the left child), we update that to (root, "VR"), we then visit the node, update that node to (root, "R"), then traverse the right sub-tree.

Once a node's traversals_left becomes empty, we pop it off the stack, pick up what is on the top, examine what needs to be done and continue.

This is what the iterative version might look like (python):


# traversal_method is a string like "LVLR".
def traverse(tree, traversal_method):
    if tree == None:
        return
    stack = [(tree, traversal_method)]
    while len(stack) > 0:
        node, traversal_left = stack[-1]
        del stack[-1] # pop
        if node == None:
            continue
        # in python L[-1] is the last element of the list.
        # and L[1:] is the list without the first element.
        if len(traversal_left[1:]) != 0
            # push.
            stack.append((node, traversal_left[1:]))
        if traversal_left[0] == "L":
            stack.append(node.Left, traversal_method)
        if traversal_left[0] == "R":
            stack.append(node.Right, traversal_method)
        if traversal_left[0] == "V":
            visit(node.Value)        

In languages that support yield, an easy way to make this lazy would be to replace visit(node.Value) above with yield node.Value.

Assuming we don't have such a language feature, we want to implement a next interface, which can give us the nodes one at a time, and in a reasonably efficient manner (both worst case and amortized per call).

[We could probably blindly simulate what the C# compiler/python interpreter does behind the scenes for implementing yield, but the result might turn out to be an unreadable mess and we will probably have to spend some effort making it readable, anyway.]

We implement next as two steps.

First step is move, which will do the iteration of pushing/popping nodes into/off the stack, till we get to a node which needs to be visited, resulting in the appropriate node being at the top of the stack.

The second step of next would be to update the top of the stack indicating that a visit was just performed (and returning the tree node to the caller).

We also maintain the invariant that only the move step will change the size of the stack.

A C++ implementation might look like: [Click here to expand/collapse]

And here is a sample usage of this iterator
// Determine if two nodes of a binary search tree
// sum to target.
bool has_sum(Tree *tree, int target) {
  if (tree == NULL) return false;

  Traverser ascending(tree, "LVR");
  Traverser descending(tree, "RVL");

  Tree *left = ascending.next();
  Tree *right = descending.next();

  while (left->Value <= right->Value;) {
    int sum = left->Value + right->Value;
    if (sum == target) {
      return true;
    }
    if (sum > target) {
      right = descending.next();
    } else {
      left = ascending.next();
    }
  }
  return false;
} 

Analysis of next

If the size of the traversal description string is T [O(1) for most useful cases], and the maximum depth of tree is H, then next would run in O(TH) time, and would use O(TH) space.

Even though each call to next is O(TH), it is O(T) amortized per node visited (averaging over the whole traversal), as each node and each edge of the tree is touched only O(T) times.

Conclusion

Even though any recursive implementation which returns a list of elements, can theoretically be modified into a lazy iterative version, explicitly doing it for a binary tree was an interesting exercise.

This approach can also be generalized to general trees, where instead of "L" and "R", we can use child numbers etc.

No comments:

Post a Comment