In this next sequence of videos we're going to study a slightly more intricate application of the dynamic programming paradigm, namely to the problem of computing optimal search trees: search trees that minimize the average search time with respect to a given set of probabilities over the keys. I'm going to assume in these videos that you remember the basics of the search tree data structure, so if you need to jog your memory, you might want to review the relevant video from part one.

A search tree contains objects, and each object has a key drawn from some totally ordered set. The search tree property states that at each node of a search tree, say one containing an object with key value x, it better be the case that everything in the left subtree of that node has keys less than x, and everything in the right subtree of that node has keys bigger than x. That has to hold simultaneously at every single node of the search tree. The motivation for the search tree property is that searching in a search tree just involves following your nose, much like binary search in a sorted array. If you're looking for, say, an object with key seventeen, you know whether to go left or right from the root just based on the root's key value. If the root has key value twelve, you know that seventeen, if it exists, has to be in the right subtree, so you recursively search the right subtree. If the root has key value 23, you know that seventeen has to be in the left subtree, so that's where you recursively search.

Something we originally discussed in the context of balanced binary search trees, like red-black trees, and that I'm going to reiterate now, is that for a given set of keys there are many, many valid search trees containing those keys. Just to remind you how this works, suppose there were only three keys in the world: x, y, and z, with x less than y less than z. One obvious search tree would be the balanced one, which puts the middle element y at the root, with left child x and right child z. But there are also the two chain search trees containing these three keys, one with the smallest element x at the root, the other with the largest element z at the root.

Given this multiplicity of solutions, all of the different search trees one could use to store objects with a given set of keys, an obvious question is: which one is the best? What's the best search tree to use out of all of the possibilities? I don't blame you if you've got a sense of deja vu; we already asked and answered this question in one sense when we discussed red-black trees. There we argued that the best thing to have is a balanced binary search tree, which keeps the height, and therefore the worst-case search time, which is proportional to the height, as small as possible, namely logarithmic in the number of objects in the tree. But now let's make the same kind of assumption that we made when we discussed Huffman codes: let's assume that we actually have accurate statistics about how frequently each item in the tree is going to be searched for. Maybe we know that item x is going to be searched for 80% of the time, whereas y and z will only be searched for 10% of the time each. Could we then improve upon the perfectly balanced search tree solution? Let me make this question more concrete by asking you to compare two candidate solutions.
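To make the "follow your nose" search concrete, here is a minimal sketch in Python; it's my own illustration rather than anything from the lecture, with a bare-bones node type and the recursive lookup that the search tree property enables.

```python
class Node:
    """A binary search tree node: every key in the left subtree is
    smaller than self.key, every key in the right subtree is larger."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def search(node, key):
    """Return the node containing `key`, or None if `key` is absent."""
    if node is None:
        return None          # fell off the tree: the key isn't present
    if key == node.key:
        return node          # found it
    elif key < node.key:
        # All keys bigger than node.key live in the right subtree,
        # so a smaller key can only be on the left.
        return search(node.left, key)
    else:
        return search(node.right, key)


# Example: the balanced tree on keys 1 < 2 < 3 discussed above.
root = Node(2, left=Node(1), right=Node(3))
assert search(root, 3) is root.right
assert search(root, 17) is None
```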
On the one hand, the balanced tree, which has y at the root and x and z as its children. On the other hand, the chain, which has x as the root, y as its right child, and z as the right child of y. So what is the average search time in each of these two search trees, with respect to the search frequencies I told you: 80% for x, 10% for y, and 10% for z? When I say the search time for a node, I just mean how many nodes you look at en route to discovering what you're looking for, including that last node itself. So something that's at the root, for example, has a search time of just one, because you only look at the root to find it.

All right, so the correct answer to the quiz is the fourth option, 1.9 and 1.3. To see why, let's just compute the average search time in each of the two proposed search trees. In the first one, with y at the root, 80% of the time we suffer a search time of two: whenever we look for x, we have to look at the root y and then at x. So we pay two 80% of the time, and that contributes 1.6. 10% of the time we get lucky, searching for y right at the root, which contributes 0.1. The remaining 10% of the time we search for z with a search time of two, contributing another 0.2, for a total of 1.9. By contrast, think about the chain that has x at the root. Here 80% of the time we get lucky and pay only one for a search for x, so that contributes only 0.8 to the total. It is true that our worst-case search time has actually gone up: when we search for z we suffer a search time of three, which never happened in the balanced case. But we pay that three only 10% of the time, contributing 0.3. The remaining 10% of the time we suffer a search time of two to look for y, contributing 0.2. That gives us a total of 1.3. (A short code check of this arithmetic appears below.)

And the moral of the story, of course, is that this example exposes an interesting algorithmic opportunity. The obvious, quote unquote, solution of a perfectly balanced search tree need not be the best search tree when the frequency of access is non-uniform. You might want an unbalanced tree, like this chain, if it puts the extremely frequently searched items closer to the root, where their search times are smaller. So the obvious question is then: given a bunch of items and known frequencies of access, what is the optimal search tree, the one that minimizes the average search time?

That brings us to the formal problem definition. We're given n objects to store in a search tree, and we're told the frequency of access of each. To keep things simple and the notation straightforward, let's say the items are named 1, 2, all the way up to n, and that this is also the order of their respective keys; so p_1 is the frequency of searching for the item with the smallest key, and so on. You might wonder where these frequencies come from: how would you know exactly how frequently every possible key will be searched for? It's going to depend on the application. There will certainly be applications where you don't have these kinds of statistics, and that's where you'd probably want to turn to a general-purpose balanced binary search tree, something like a red-black tree, which guarantees that every search is reasonably fast. But it's not hard to think of applications where you can learn pretty accurate statistics about how frequently different things are searched for.
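As promised, here's a quick sanity check of the quiz arithmetic, again my own illustration rather than anything from the lecture, representing trees as nested (key, left, right) tuples:

```python
def weighted_search_time(tree, freq, depth=1):
    """Sum of freq[key] * (nodes visited to find key) over all keys,
    where a key at the root has search time 1."""
    if tree is None:
        return 0.0
    key, left, right = tree
    return (freq[key] * depth
            + weighted_search_time(left, freq, depth + 1)
            + weighted_search_time(right, freq, depth + 1))


freq = {'x': 0.8, 'y': 0.1, 'z': 0.1}
balanced = ('y', ('x', None, None), ('z', None, None))  # y at the root
chain = ('x', None, ('y', None, ('z', None, None)))     # x -> y -> z

print(round(weighted_search_time(balanced, freq), 2))   # 1.9
print(round(weighted_search_time(chain, freq), 2))      # 1.3
```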
One example might be something like a spell checker. You could implement one by storing all of the legally spelled words in a search tree; then, as you scan a document, every time you hit a word you look it up in the search tree to see whether it's spelled correctly or incorrectly. You can imagine that after scanning a number of documents you would have pretty good estimates of how frequently different words get looked up, and you could then use those estimates to build a highly optimized binary search tree for all future documents. If you're in some other kind of application where you're concerned about these frequencies changing over time, for example because they're subject to trends in the industry, you could imagine rebuilding the search tree every day or every week or whatever, based on the latest statistics you've got.

In any case, if you're lucky enough to have such statistics, what you want to do is build a search tree which, on the one hand, is valid, meaning it satisfies the search tree property, and which, on the other hand, makes the average search time as small as possible. So let me write down a formula for the average search time; it's the one you would expect. Let me also introduce some notation: capital C of T will denote the average search time of a proposed search tree T. For these lectures we're going to focus on the case where all searches are successful; the only things that ever get searched for are things actually in the tree. But everything we'll talk about in these lectures, including the algorithm, is easily extended to accommodate unsuccessful searches as well, given statistics about how frequent the various unsuccessful searches are. If there are only successful searches, then we average only over the n items stored in the tree: we sum over the items i, weight each by the probability, or frequency, of its access, p_i, and multiply by the search time required in the tree T to find item i. As we discussed in the quiz, the search time for a given key i in a given tree T is just the number of nodes you have to visit until you get to the one containing i; if you think about it, that's exactly the depth of i's node in the tree, plus one. For example, if you're lucky enough that the key is at the root, the depth of the root is zero, and we count that as a search time of one. So it's depth plus one. (The objective is written out in symbols below.)

One minor point: it's going to be convenient for me not to insist that the p_i's sum to one. Of course, if the p_i's were probabilities, they would sum to one, but I'm going to allow them to be arbitrary positive numbers. For the same reason, I'll sometimes call capital C of T the weighted search time rather than the average search time, because I won't necessarily be assuming that the p_i's sum to one. That said, do keep the case of genuine probabilities in mind as the canonical special case as we go through these lectures. In that case, where the p_i's sum to one, we could always use a red-black tree as a reference-point solution. But, as we've seen, when the p_i's are not uniform, you can generally do better, and that's the point of this computational problem: exploit the non-uniformities in the given frequencies to come up with the best possible, perhaps quite unbalanced, search tree.
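For reference, here is the objective function written out in symbols; this is just a transcription of the verbal definition above, where depth_T(i) denotes the depth of item i's node in T, with the root at depth zero:

$$
C(T) \;=\; \sum_{i=1}^{n} p_i \cdot \bigl(\operatorname{depth}_T(i) + 1\bigr)
$$

The problem, then, is to find a tree T that satisfies the search tree property with respect to the key order 1 < 2 < \cdots < n and minimizes C(T).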
I'm sure many of you will have noticed some similarities between this computational problem of optimal binary search trees and one that we already solved back in the greedy algorithms section, namely Huffman codes, which, amongst all prefix-free binary codes, minimize the average encoding length. So let's be precise about the similarities and differences between the two problems, and in particular about why we can't just reuse the algorithm we already saw, off the shelf, to solve the optimal BST problem.

What is, of course, super similar in the two cases is the format of the output. In both problems, the responsibility of the algorithm is to output a binary tree, and the goal is to minimize the average depth, more or less, where the average is with respect to provided frequencies over a bunch of objects that we care about: characters from an alphabet in the case of Huffman codes, and objects with keys from some totally ordered set in the binary search tree case. It is true that in the optimal BST case we're not really averaging depths, we're averaging depths plus one, but if you think about it, that's effectively the same thing: adding one to every depth just adds the constant sum of the p_i's to the objective, which doesn't change which tree is optimal.

More important is to understand the differences between the problem solved by Huffman codes and the computational problem we have to solve here. In the Huffman code case, we had to output a binary code, and the key constraint was that it be prefix-free. In the language of trees, what that meant is that the symbols we're encoding had to correspond to the leaves of the tree; symbols could not correspond to internal nodes of the tree that we output. In the optimal binary search tree problem, we do not have this prefix-free constraint, so every single node of our tree, whether it's a leaf or not, will be labeled with an object. But we have a different, seemingly quite a bit harder, constraint to deal with, namely the search tree property. Remember, back in the Huffman code case we didn't even have an ordering on the symbols of the alphabet; there wasn't a sense in which one of them was less than another, and it wouldn't even have made sense to talk about the search tree property in that context. Here, by contrast, we're given these keys, there's a total ordering on them, and we'd better satisfy the search tree property in the tree that we output. That is, at every single node of the tree that we output, it better be the case that all keys in the left subtree are less than the key at that node, and all keys in the right subtree are bigger than the key at that node. That's a constraint we have no choice but to satisfy. And this constraint is harder in the sense that no greedy algorithm, Huffman's algorithm or otherwise, solves the optimal binary search tree problem. Rather, we're going to have to turn to the more sophisticated tool of dynamic programming to design an efficient algorithm for computing optimal binary search trees. That's the solution we'll start developing in the next video.
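As a preview of where this is headed, and with the caveat that this is the standard textbook recurrence rather than the lecture's own derivation, which arrives in the next video: write C_{i,j} for the weighted search time of an optimal search tree on items i through j. An optimal tree has some root r, and the search tree property then forces items i through r-1 into its left subtree and items r+1 through j into its right subtree. Hanging a subtree below the root increases the depth of each of its items by one, which adds each of their frequencies to the objective once more, so

$$
C_{i,j} \;=\; \min_{r=i}^{j} \left( \sum_{k=i}^{j} p_k \;+\; C_{i,r-1} \;+\; C_{r+1,j} \right),
$$

with the convention that C_{i,j} = 0 whenever i > j.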