So let's take any old optimal binary search tree for keys one through n with frequencies P1 through Pn. The thing we're trying to prove asserts that the left subtree T1 should be optimal for its keys, one through r minus one, and the right subtree T2 should be optimal for its keys, r plus one through n. So we're going to proceed by contradiction, we're going to assume that's not true. What would that mean? That means for one of these two subproblems, either one through R - 1 or R + 1 through n, there's an binary search tree with even smaller weighted search cost, search time, then T1 or T2 respectively. The two cases are totally the same whether. In our contradiction, we assume T1 is not optimal or the T2 is not optimal. I'm just going to prove it in the case the T1 is not optimal. So if T1 is not optimal, there's gotta be a search tree on its keys, one through r - 1 which is better. Call that purportedly better solution T star one. Now as usual, the thing which we're going to try to contradict to get the final proof is we're going to exhibit a search tree on all of the keys, one through n which is even better than T. The T was supposed to be optimal, so that would be a contradiction. So how do we get our superior search tree for all of the keys? Well, we're just going to take T and we're going to do cut and paste. We're going to do surgery on the tree T, ripping out it's left subtree T1 and pasting in this subtree T star one. Call the resulting tree T star. So, to complete the contradiction and therefore the proof of the optimal substructure lemma, all we have to show is that the weighted search cost of T star is strictly less than that of T, that would contradict the purported optimality of T. So that's precisely what I'll show you on this next slide and it's going to be evident if we do a suitable calculation. Here it is. So let's begin just by expanding the weighted search time of our original tree, capital T, the purportedly optimal one. Let's expand its definition. So you have one sum n for each of the items i and the weight we give to a given item is just its frequency p sub i. And we multiply that by the search time of for i n T So let me now pause to tell you the point of this calculation we're about to do. So we start with the weighted search time in T, and that's, of course expressed in terms of search times in this tree T. What I want to show next is that can equally be, be well expressed in terms of search times in the subtrees T1 and T2. That is, there's a simple formula to compute the search time in T if I told you the search times in T1 and T2. That's what's going to allow us to easily analyze the ramifications of the cut and paste and to notice that in cutting and pasting in a better tree for the subproblem, we actually get a better tree for the original problem. So the right way to search time in T in terms of the weighted search time in T1 and T2, it's going to be convenient to bucket the items into three categories. Those that are in the left subtree T1, i.e. one through r minus one, those in the right subtree T2, r plus one through n, and then of course left over is the root r itself. So let's just break down the sum into its three constituent parts. So the sum and corresponding to the root r contributes the frequency of r times the search time for r, namely one because it's the root. And then we have our sum, over just the items up to r - 1 of their frequencies times their search times and similarly those r plus one up to n, their frequencies times their search times nT. So for the next step, let's observe it's very easy to connect search times in T with search times in the subtrees T1 and T2. Suppose you were, you, example, you were searching for some key in T that was in T1. So some key that was less than r. What happens when you search for this item in the tree T? Well, first you visit r, it's not r, it's less than r, so you go left, and then you just search as appropriately through the search tree T1. That is, the search time in the big tree T is simply the search time in the subtree T1 + 1, 1 because you had to visit the root r before you made it in to the subtree T1. So that is, for any object i other than the root, we can write its search time in the big tree T as simply one plus the search time in the appropriate subtree. So now, let me expand in groups and terms just to clean up this mess. So now that the dust has settled, let's inspect each of these three sums. We actually have a quite simple understanding of each of them. So, the first sum is just the sum of the Pi. So for example, in the conical case where we're dealing with actual probabilities, this is just one. The point is, whatever the Pis are, this first sum is just a constant, meaning it's independent of the tree T, with which we started. What's the second sum? So it's the sum of the objects from one to r minus one, the frequency of ithe times search4I time for i in the searchtree T1. Well, we have a much better, shorter nickname for that sum. It's the weighted search time of the search tree T1, for the objects that contains one through r minus one. Similarly this last sum, sum from i over r plus one to n of the frequency of i times the search time in T2, that's just the weighted search time in the search tree T2. So, we did this computation with the purportedly optimal search tree capital T in mind. But if you think about it, you look at this algebra. This, it applies to any search tree you like. For any search tree, the weighted search cost can be expressed as the sum of the Pis plus the weighted search cost of the left subtree of the root plus the weighted search cost of the right subtree of the root. In particular, that's true for this tree T star we got by cutting and pasting the better left subtree T star one and for T1. So applying this reasoning to T star, we can right its weighted search cost as the sum from micro one to n of all the Pis plus the weighted search cost of its left subtree of its root, which remember by construction is T star one, and then it has the same right sub tree as the tree T does, namely T2. And the point of all this algebra is that we now see in a very simple way what are the ramifications of cutting and pasting in this better left subtree, we get a better tree for the original key set contradicting the purported optimality of the tree T that we started with. That completes the proof of the optimal substructure lemma for optimal binary search trees.