So now that we've motivated and formally defined the optimal binary search tree problem, let's think about how to solve it. After settling on dynamic programming as the paradigm we're going to try to use, we'll proceed in the usual way, turning to the optimal solution for clues and asking in what way it is composed of optimal solutions to smaller subproblems.

So let me just remind you of the formal problem statement. There are n objects we've got to store in a search tree, and let's just name them in the order of their keys, one, two, three, all the way up to n, for simplicity. We're also given frequencies, or weights, reflecting how often the different objects are searched for: p1 up to pn, positive numbers. Canonically we think of these as summing to one, being probabilities, but we actually won't use that fact, so in general they're just arbitrary positive numbers. The goal is to output a search tree. It should satisfy the search tree property, it should contain all of these objects one through n, and among all such search trees it should minimize the weighted search time: the sum, over all items i, of the probability of i times the search time for i, namely its depth in the tree plus one.
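Written out symbolically (the notation here is mine, just restating the verbal definition above), the quantity we're minimizing over valid search trees T is:

```latex
% Weighted search time of a search tree T (notation mine, matching the
% verbal definition above): item i costs p_i times (its depth in T, plus one).
C(T) \;=\; \sum_{i=1}^{n} p_i \cdot \bigl(\mathrm{depth}_T(i) + 1\bigr)
```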
Now, in case you're feeling cocky about the fact that a greedy algorithm works to solve the seemingly similar optimal prefix-free binary code problem, in the form of Huffman's algorithm, I want to spend a little time pointing out that greedy algorithms are not sufficient, are not correct, to solve the optimal search tree problem. So if we were to design a greedy algorithm, what kind of intuition would motivate a particular greedy rule? Well, staring at the objective function, it's very clear that we want objects with a high frequency of access to be at or near the root, and we want items with a low frequency of access to be in the bottom levels of the tree, like the leaves. So what are some ways we could compile this intuition into a greedy algorithm? Well, one, perhaps motivated by the success of Huffman's algorithm, is that we could think about a bottom-up approach. Now, I'm not going to define precisely what I mean here, but informally, we want to start with the bottommost levels, with the leaves, and the nodes we want to put there are the objects that are accessed the least frequently. Any reasonable way of implementing this kind of bottom-up greedy rule is not going to work.

Let me show a simple counterexample. So, let's just assume we have four objects: one, two, three, four. What I'm showing here on the right, in pink, are two possible search trees valid for those four keys. And let's assume that the frequencies are as follows. Object one is searched for 2% of the time, object two 23% of the time, object three the bulk of the time, 73%, and object four only 1% of the time. Any greedy algorithm that insists on putting the lowest-frequency objects at the very bottommost level of the tree is not going to produce the tree on the right, which has the 2% object at a lower level than the 1% object. Instead, such a greedy algorithm would presumably output a search tree like the one on the left, which has object two as the root and then object four at the lowest level, at depth two. But you should be able to easily convince yourself that, for these probabilities, it's the tree on the right which is the one you want, that's optimal, because object three is the one that's searched for the bulk of the time, and that's the one you want at the root, as opposed to object two. So I realize I'm being a little informal here, but I hope you get the idea that a naive bottom-up implementation of a greedy algorithm, which, if you think about it, is really what we did in Huffman's algorithm, is not going to work here.
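If you'd like to check that "convince yourself" step explicitly, here is a minimal sketch in Python. The depth tables are my reconstruction of the two pink trees from the verbal description above, not something shown in code in the lecture:

```python
# Minimal sketch: compare the weighted search times of the two pictured trees
# under the frequencies from the first counterexample. The depth tables are my
# reconstruction of the trees from the verbal description in the lecture.

def weighted_search_time(freq, depth):
    """Sum over all items of frequency * (depth + 1)."""
    return sum(freq[i] * (depth[i] + 1) for i in freq)

freq = {1: 0.02, 2: 0.23, 3: 0.73, 4: 0.01}

# Left tree: root 2, children 1 and 3, with object 4 hanging below 3.
left_depths = {2: 0, 1: 1, 3: 1, 4: 2}
# Right tree: root 3, right child 4, left child 2, with object 1 below 2.
right_depths = {3: 0, 2: 1, 4: 1, 1: 2}

print(weighted_search_time(freq, left_depths))   # ~1.76
print(weighted_search_time(freq, right_depths))  # ~1.27 -- the tree on the right is better
```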
The same can be said about a top-down approach. Perhaps the simplest top-down approach would be just to take the most frequently searched-for object, put it at the root, and then recursively develop appropriate left and right subtrees under that most frequently accessed element. So let me show you, again informally, just the same kind of counterexample. We're going to use the exact same four objects and the exact same two trees; I will, however, change the numbers. Now, let's imagine that object one is searched for almost never, just 1% of the time, and each of the other three objects is searched for roughly a third of the time. But let me break ties so that object number two is the most frequently searched-for one, at 34%. In that case, the greedy algorithm will put the 34% node up at the root, when really what should happen is that you want a perfectly balanced subtree for the objects two, three, and four, because each accounts for roughly a third of the searches. So let's give object three 33% of the searches and object four 32% of the searches.

And again, I'll leave it for you to convince yourself that this is indeed a counterexample: for the tree spit out by the greedy algorithm, on the left, we have an average search time of roughly two, whereas for the search tree on the right, we have an average search time of roughly five-thirds. So we'd like to produce the tree on the right, but the greedy algorithm proposed here will spit out the tree on the left.
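Here is the same quick check with the new frequencies, again using my reconstruction of the two tree shapes; this is just the arithmetic behind the "roughly two" and "roughly five-thirds" figures:

```python
# Same comparison as before, now with the second set of frequencies.
freq = {1: 0.01, 2: 0.34, 3: 0.33, 4: 0.32}
left_depths = {2: 0, 1: 1, 3: 1, 4: 2}   # top-down greedy: the 34% object at the root
right_depths = {3: 0, 2: 1, 4: 1, 1: 2}  # balanced over objects 2, 3, and 4

for depth in (left_depths, right_depths):
    print(sum(freq[i] * (depth[i] + 1) for i in freq))
# Prints roughly 1.98 (about two) and 1.68 (about five-thirds).
```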
This, of course, doesn't exhaust the list of potential greedy algorithms. You could try others, but it's not going to work: there's no known greedy algorithm that successfully solves the optimal binary search tree problem. So, in particular, if we focus on the top-down approach, the choice of the root, the choice of what to do at the uppermost level, has very hard-to-predict repercussions for what the two different subproblems look like. This is what stymies not only the top-down greedy approach but also a naive divide-and-conquer approach. For example, suppose we just wanted to split the keys into the first half and the second half, recursively compute an optimal BST on each of those two halves, and then put them back together. The search tree property would say that we have to unite those two sub-solutions under a root which is the median, in between the two subproblems. And who's to say that the median is a good choice for the root? Again, because of the ramifications further down the tree, maybe that's a stupid root.

But, boy, is it tempting to try to solve this problem recursively, right? We're trying to output this binary tree, and it has recursive structure. If only we knew which root we should pick, we would love to recurse twice: once to construct an optimal left subtree, and once to construct an optimal right subtree. Okay, so if only we knew the right root. This is starting to sound familiar, actually. How did it work in all our dynamic programming solutions? We always said: oh, if only a little birdie told us this one little piece of the solution, then we could, sort of, look up or recursively compute the rest of the solution and easily extend it back to one for the original problem. So maybe the same thing's true here. Maybe, if only a little birdie told us what the root was, then we could look up or recursively compute optimal solutions to smaller subproblems, paste everything back together, and be done. That would be great.

So, as usual, we want to make this precise with an optimal substructure lemma. We want to understand the way in which an optimal solution to the optimal BST problem must necessarily be constructed from optimal solutions to smaller subproblems. So in the following quiz, I'm going to ask you to guess what the appropriate optimal substructure lemma is, and then, after that quiz, once we've identified the right statement, I will show you the proof.

Okay, so the answer I'm looking for is the fourth one, D, which is the strongest statement of all. The first point is that each of the trees T1 and T2, as a subtree of a binary search tree, is of course itself a binary search tree, valid for the keys it contains. And not only can they be viewed as search trees on the keys they contain, but the claim, which we'll prove on the next couple of slides, is that they are indeed optimal: they minimize the weighted search time over all possible search trees for the objects contained in those two trees. So that rules out A, and it rules out B. But we can say something even stronger than C, which says only that each of the two trees is optimal for the items it contains: we actually know exactly which items are in T1 and which items are in T2. And this is by the search tree property. The search tree property says that at every node, and in particular here at the root, everything to the left is less than it and everything to the right is bigger than it. So, the root being r by assumption, we know the objects one through r minus one have got to be somewhere, and the only place they can be is the left subtree, T1. So that's exactly the contents of T1. Similarly, the contents of T2 are precisely the objects r+1 through n. So the two subtrees are optimal, and we know exactly which keys they contain: everything less than r on the left and everything bigger than r on the right.
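Stated a bit more formally (this is my phrasing of option D, not a verbatim quote from the slides):

```latex
% Optimal substructure lemma, my phrasing of answer D from the quiz.
\textbf{Lemma.} Let $T$ be an optimal binary search tree for the keys
$\{1,\dots,n\}$ with frequencies $p_1,\dots,p_n$, and suppose $T$ has root $r$,
left subtree $T_1$, and right subtree $T_2$. Then $T_1$ is an optimal binary
search tree for the keys $\{1,\dots,r-1\}$, and $T_2$ is an optimal binary
search tree for the keys $\{r+1,\dots,n\}$.
```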
Okay, so here's where things stand at the end of this quiz. We've identified a statement that we're really hoping is true. We're really hoping that an optimal BST, binary search tree, must necessarily be composed in this way of optimal binary search trees for the key sets to the left of the root and to the right of the root. If that's true, then hopefully, with the experience we now have, we can sort of envision what a dynamic programming algorithm might look like; I'll fill in the details in the next video. If this weren't true, if we didn't have this optimal substructure, honestly, I'd have no idea how we'd get started; it's really not clear what an algorithm would look like. So in the next couple of slides, I'm going to prove this to you. The format will not be radically different from the proofs we've already seen, and I don't think there will be any big surprises. But it's so important, it really is the whole reason why the algorithm is going to work, that I'm still going to give you a full proof of this optimal substructure lemma.
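To preview why the lemma is so useful, here is a deliberately naive recursive sketch in Python. It is not the dynamic programming algorithm from the next video; it simply stands in for the little birdie by trying every candidate root and leaning on the lemma to recurse on the keys to the root's left and right:

```python
# Naive (exponential-time) recursion built directly on the optimal substructure
# lemma: since no little birdie tells us the root, try every candidate root r
# and recurse on the keys to its left and right. This only illustrates the
# lemma; the efficient dynamic programming version comes in the next video.

def opt_bst_cost(p, i, j):
    """Minimum weighted search time over all search trees on keys i..j,
    where p maps each key to its frequency."""
    if i > j:
        return 0.0  # empty range of keys: empty tree, zero cost
    # Every key in i..j contributes its frequency once beyond the subtree costs:
    # the root pays p[r] for depth 0 (+1), and every other key pays p[k] for
    # sitting one level deeper here than it does inside its own subtree.
    total = sum(p[k] for k in range(i, j + 1))
    return total + min(
        opt_bst_cost(p, i, r - 1) + opt_bst_cost(p, r + 1, j)
        for r in range(i, j + 1)  # try every possible root r
    )

# Example: the second counterexample's frequencies. The minimum comes out to
# roughly 1.68, i.e. about five-thirds, matching the tree on the right.
p = {1: 0.01, 2: 0.34, 3: 0.33, 4: 0.32}
print(opt_bst_cost(p, 1, 4))
```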