So now that we have our magic formula, our recurrence, all that's left to do is to systematically solve the subproblems. As usual, it is crucial that we solve the subproblems in the right order, from smallest to largest. How should we measure the size of a subproblem in the optimal binary search tree problem? The natural way to do it is the number of items in the subproblem. So if you're starting at i and going until j, the number of items in that subproblem is j - i + 1, and that's going to be our measure of subproblem size.

So let's bust out our array A. The number of dimensions of this array is going to be two, and that's because we have two different degrees of freedom for indexing subproblems: one for the start of the contiguous interval, one for the end.

The outer for loop is going to control the subproblem size. It's going to ensure that we solve all smaller subproblems before proceeding to larger subproblems. Specifically, we'll be using an index s, and in each iteration of this outer for loop, whatever the current value of s is, we're only going to consider subproblems of size s + 1. So you should think of s as representing the difference between the larger index j and the earlier index i. The inner for loop controls the first item in the contiguous interval that we're looking at, so that's just i.

And now, all we have to do is rewrite the recurrence in terms of the array entries, with this change of variable where s corresponds to j - i. That is, for a given subproblem starting with the item i and ending with the item i + s, we just pick the best root by brute force; the root r here is going to be somewhere between i and i + s. Regardless of the choice of the root, we pick up the constant, the sum of the p_k's, where k ranges from the first item i to the last item i + s. And then we also look at the previously computed optimal solution values for the two relevant subproblems: one starting at i and ending at r - 1, the other starting at r + 1 and ending at i + s.

A couple of quick comments about the two array lookups on the right-hand side of this formula. First of all, if you choose the root to be the first item i, then the first lookup doesn't make sense; if you choose it to be the last item, the second array lookup doesn't make sense. In those cases, it's understood that we're just going to interpret these lookups as zero. Of course, in an actual implementation, you'd have to include that code, but I'll let you take care of that on your own.
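To make the loops concrete, here is a minimal Python sketch of this double for loop, assuming 0-based indexing and that p[i] is the search probability of item i (the function name is mine, not from the lecture):

```python
def optimal_bst_cost(p):
    """Value of an optimal BST for items 0..n-1 with probabilities p."""
    n = len(p)
    if n == 0:
        return 0.0
    # A[i][j] = optimal average search cost for items i..j; j < i stays 0.
    A = [[0.0] * n for _ in range(n)]
    for s in range(n):                 # s = j - i, so subproblem size is s + 1
        for i in range(n - s):         # i = first item of the interval
            j = i + s                  # j = last item of the interval
            total = sum(p[i:j + 1])    # the constant: sum of p_k for k = i..j
            best = float('inf')
            for r in range(i, j + 1):  # brute-force search over candidate roots
                # lookups that "don't make sense" are interpreted as zero
                left = A[i][r - 1] if r > i else 0.0
                right = A[r + 1][j] if r < j else 0.0
                best = min(best, total + left + right)
            A[i][j] = best
    return A[0][n - 1]                 # the lecture's A[1, n]
```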
The second comment is just our usual sanity check, and again, you should always do this when you write out a dynamic programming algorithm. When you write down your formula to populate the array entries, make sure that on the right-hand side, whenever you do an array lookup, that entry has indeed already been computed and is available for constant-time lookup. In this case, whatever our choice of the root is, the two relevant subproblems are going to involve strictly fewer items than what we started with. And therefore, the two subproblem lookups on the right-hand side will indeed have been computed in some previous iteration of the outer for loop. Remember, the outer for loop ensures we solve subproblems from the smallest number of items up to the largest number of items.

And of course, after the two for loops complete, what we really care about is the answer in A[1, n], that is, the optimal binary search tree value for all of the items. That's the eventual output.

Some students like to think about these double for loops pictorially, so let's imagine A, the 2-D array, laid out as a grid. Imagine the x-axis corresponding to the index i, that is, the first item in the set of items we're looking at, and the y-axis corresponding to j, the last item in the current set. And let me single out the diagonal of this grid: these are the subproblems corresponding to i = j, that is, subproblems with a single element. Now, we only ever solve subproblems where j is at least as large as i, so that means we're really only filling in the upper-left, or northwestern, part of this table. We never bother to fill in the southeastern, bottom-right part of this table; we just think of it all as zero.

Now, in the first outer iteration, when s = 0, our dynamic programming algorithm solves, in turn, each of the n single-item subproblems. So in the first iteration of this double for loop, it's going to solve the subproblem A[1, 1]. In the next iteration of the inner for loop, it's going to proceed to A[2, 2], then A[3, 3], and so on. In each of those, both of the array lookups are just going to be zero, and we fill in this diagonal with the base cases, where A[i, i] is just the probability of item i.
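As a quick sanity check of the sketch above, with made-up probabilities: take three items where the first is searched for 80% of the time. The skewed tree with item 1 at the root has average search cost 0.8·1 + 0.1·2 + 0.1·3 = 1.3, which beats the balanced alternative (cost 0.1·1 + 0.8·2 + 0.1·2 = 1.9), and the table agrees:

```python
print(optimal_bst_cost([0.8, 0.1, 0.1]))  # ~1.3, up to floating-point rounding
```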
Then, as the dynamic programming algorithm proceeds, we're going to be filling in the upper-left portion of this table diagonal by diagonal. Each time we increment s, the index in the outer for loop, we march up to the next diagonal to the northwest, and then, as we step through the possible values of i, we fill in that diagonal one entry at a time, moving from southwest to northeast. When we're filling in the value of a subproblem on one of these diagonals, all we need to do is look up the values of two subproblems on lower diagonals, and lower diagonals correspond to subproblems with strictly fewer items.

So that's it. That's the dynamic programming algorithm that computes the value of an optimal binary search tree, given a set of items with probabilities. I'm not going to say anything about correctness; it's the same story as we've seen in the past. All the heavy lifting is in proving the optimal substructure lemma, which gave us the correctness of our recurrence. Given that our magic formula is correct, and that we're just applying it systematically, correctness of the dynamic programming algorithm follows in a straightforward way, just by induction.

Let me, however, make some comments about the running time. Let's just follow the usual procedure: look at how many subproblems have to get solved, and then how much work has to get done to solve each of those subproblems. As far as the number of subproblems, it's all possible choices of i and j where i is at most j, or in other words, essentially half of that n-by-n grid. So this is roughly n²/2; let's just call it Θ(n²), a quadratic number of subproblems.

Now, for each of the subproblems, we have to evaluate this recurrence, we have to evaluate the formula, which conceptually is a brute-force search through the candidates that we've identified. And a distinction between this dynamic programming algorithm and all of the other ones we've seen recently (sequence alignment, knapsack, computing independent sets in path graphs) is that there are actually kind of a lot of options for what the optimal solution can be. That is, our brute-force search, for the first time, is not over a merely constant number of possibilities. We have to try every possible root: each of the items in our given subproblem is a candidate root, and we try them all. So given a start item of i and an end item of j, there are j - i + 1 total items, and we have to do constant work for each one of those choices. There will be some subproblems that we can evaluate quickly, in only, say, constant time, if i and j are very close to each other; but for a constant fraction of the subproblems, this is going to be linear time, Θ(n) time. So overall, that gives us a cubic running time, Θ(n³).
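In symbols, the tally from the last two paragraphs is just

$$ \underbrace{\Theta(n^2)}_{\text{subproblems}} \times \underbrace{\Theta(n)}_{\text{roots tried, for a constant fraction of subproblems}} = \Theta(n^3). $$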
All right, so I would say this running time is sort of okay, not great. It is polynomial time; that's good. That's certainly way, way, way faster than enumerating all of the exponentially many possible binary search trees, so it blows away brute-force search. But it's not something I would call blazingly fast, or a for-free primitive, or anything like that. So you're going to be able to solve problem sizes with n in the hundreds, but probably not n in the thousands. That will cover some applications where you'd want to use this optimal binary search tree algorithm, but not all of them. It's good for some things, but it's not a universal solution.

On the other hand, here's a fun fact. And the fun fact is that you can actually speed up this dynamic programming algorithm significantly. You can keep the exact same 2-D array with the exact same semantics; again, each entry corresponds to the subproblem of computing the optimal binary search tree for the items between i and j, inclusive. But you can actually fill in this entire table, all n² entries, using only a total of O(n²) time. That is, on average, constant work per subproblem.

This fun fact is very clever, and it's really more intricate than what we've discussed in this video, but it's not impossible to read. So if you're interested, I encourage you to go back to the original papers, or search the web for some other resources, on this optimized, sped-up version of this dynamic programming algorithm. At a very high level, sort of from 30,000 feet, the goal is to avoid doing this brute-force search over all possible roots in every single subproblem.
And it turns out there's structure, nice structure, in this optimal binary search tree problem that allows you to piggyback on the work done in smaller subproblems. In smaller subproblems, you already searched over a bunch of candidate roots, and it turns out that, using the results of those previous brute-force searches, you can make inferences about which subset of the current candidate roots might conceivably be the ones that achieve the minimum in the recurrence. And so that lets you avoid searching over all of the possible candidates for the root, instead focusing on just a very small set; in fact, on average over all of the subproblems, a constant number of possible roots. And needless to say, this speedup of the running time from cubic to quadratic really significantly increases the problem sizes that you can apply this algorithm to. So now, instead of being stuck in the hundreds, you'd certainly be able to solve problem sizes in the thousands, possibly even in the tens of thousands, using this quadratic-time algorithm. Very cool.
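The lecture defers to the original papers, but for the curious: the classic way to realize this speedup is Knuth's monotonicity observation, that an optimal root for items i..j lies between an optimal root for i..j-1 and an optimal root for i+1..j. Here is a hedged sketch, under that assumption, of how the inner loop of the earlier code changes (the root table, prefix sums, and function name are my own additions for illustration):

```python
def optimal_bst_cost_fast(p):
    """O(n^2) variant of optimal_bst_cost, using Knuth's root restriction."""
    n = len(p)
    if n == 0:
        return 0.0
    A = [[0.0] * n for _ in range(n)]
    root = [[0] * n for _ in range(n)]  # root[i][j] = an optimal root for i..j
    prefix = [0.0]                      # prefix sums make interval sums O(1)
    for q in p:
        prefix.append(prefix[-1] + q)
    for i in range(n):                  # base cases: the single-item diagonal
        A[i][i] = p[i]
        root[i][i] = i
    for s in range(1, n):               # subproblem size s + 1, as before
        for i in range(n - s):
            j = i + s
            total = prefix[j + 1] - prefix[i]
            best, best_r = float('inf'), i
            # Knuth's restriction: only try roots between the optimal roots
            # of the two inner subproblems. These ranges telescope across a
            # diagonal, giving O(n) work per diagonal and O(n^2) overall.
            for r in range(root[i][j - 1], root[i + 1][j] + 1):
                left = A[i][r - 1] if r > i else 0.0
                right = A[r + 1][j] if r < j else 0.0
                cost = total + left + right
                if cost < best:
                    best, best_r = cost, r
            A[i][j] = best
            root[i][j] = best_r
    return A[0][n - 1]
```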