So now that we have our magic formula, our recurrence, all that's left to do is to systematically solve the subproblems. As usual, it is crucial that we solve the subproblems in the right order, from smallest to largest. How should we measure the size of a subproblem in the optimal binary search tree problem? The natural way to do it is the number of items in the subproblem. So if you're starting at i and going until j, the number of items in that subproblem is j - i + 1, and that's going to be our measure of subproblem size.

So let's bust out our array A. The number of dimensions of this array is going to be two, and that's because we have two different degrees of freedom for indexing subproblems: one for the start of the contiguous interval, one for the end.

The outer for loop is going to control the subproblem size. It's going to ensure that we solve all smaller subproblems before proceeding to larger subproblems. Specifically, we'll be using an index s, and in each iteration of this outer for loop, whatever the current value of s is, we're only going to consider subproblems of size s + 1. So you should think of s as representing the difference between the larger index j and the earlier index i. The inner for loop controls the first item in the contiguous interval that we're looking at, so that's just i.

And now, all we have to do is rewrite the recurrence in terms of the array entries, with this change of variable where s corresponds to j - i. That is, for a given subproblem starting with the item i and ending with the item i + s, we just pick the best root by brute force; the root r here is going to be somewhere between i and i + s. Regardless of the choice of the root, we pick up the constant, the sum of the p_k's, where k ranges from the first item i to the last item i + s. And then we also look at the previously computed optimal solution values for the two relevant subproblems: one starting at i and ending at r - 1, the other starting at r + 1 and ending at i + s.

A couple of quick comments about the two array lookups on the right-hand side of this formula. First of all, if you choose the root to be the first item i, then the first lookup doesn't make sense; if you choose it to be the last item, the second array lookup doesn't make sense. In those cases, it's understood that we're just going to interpret these lookups as zero. Of course, in an actual implementation, you'd have to include that code, but I'll let you take care of that on your own.
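To make the loops concrete, here is a minimal Python sketch of this double for loop, assuming 0-based indexing and that p[i] is the search probability of item i (the function name is mine, not from the lecture):

```python
def optimal_bst_cost(p):
    """Value of an optimal BST for items 0..n-1 with probabilities p."""
    n = len(p)
    if n == 0:
        return 0.0
    # A[i][j] = optimal average search cost for items i..j; j < i stays 0.
    A = [[0.0] * n for _ in range(n)]
    for s in range(n):                 # s = j - i, so subproblem size is s + 1
        for i in range(n - s):         # i = first item of the interval
            j = i + s                  # j = last item of the interval
            total = sum(p[i:j + 1])    # the constant: sum of p_k for k = i..j
            best = float('inf')
            for r in range(i, j + 1):  # brute-force search over candidate roots
                # lookups that "don't make sense" are interpreted as zero
                left = A[i][r - 1] if r > i else 0.0
                right = A[r + 1][j] if r < j else 0.0
                best = min(best, total + left + right)
            A[i][j] = best
    return A[0][n - 1]                 # the lecture's A[1, n]
```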
The second comment is just our usual sanity check, and again, you should always do this when you write out a dynamic programming algorithm. When you write down your formula to populate the array entries, make sure that on the right-hand side, whenever you do an array lookup, that entry has indeed already been computed and is available for constant-time lookup. In this case, whatever our choice of the root is, the two relevant subproblems are going to involve strictly fewer items than what we started with. And therefore, the two subproblem lookups on the right-hand side will indeed have been computed in some previous iteration of the outer for loop. Remember, the outer for loop ensures we solve subproblems from the smallest number of items up to the largest number of items.

And of course, after the two for loops complete, what we really care about is the answer in A[1, n], that is, the optimal binary search tree value for all of the items. That's the eventual output.

Some students like to think about these double for loops pictorially, so let's imagine A, the 2-D array, laid out as a grid. Imagine the x-axis corresponding to the index i, that is, the first item in the set of items we're looking at, and the y-axis corresponding to j, the last item in the current set. And let me single out the diagonal of this grid: these are the subproblems corresponding to i = j, that is, subproblems with a single element. Now, we only ever solve subproblems where j is at least as large as i, so that means we're really only filling in the upper-left, or northwestern, part of this table. We never bother to fill in the southeastern, bottom-right part of this table; we just think of it all as zero.

Now, in the first outer iteration, when s = 0, our dynamic programming algorithm solves, in turn, each of the n single-item subproblems. So in the first iteration of this double for loop, it's going to solve the subproblem A[1, 1]. In the next iteration of the inner for loop, it's going to proceed to A[2, 2], then A[3, 3], and so on. In each of those, both of the array lookups are just going to be zero, and we fill in this diagonal with the base cases, where A[i, i] is just the probability of item i.
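As a quick sanity check of the sketch above, with made-up probabilities: take three items where the first is searched for 80% of the time. The skewed tree with item 1 at the root has average search cost 0.8·1 + 0.1·2 + 0.1·3 = 1.3, which beats the balanced alternative (cost 0.1·1 + 0.8·2 + 0.1·2 = 1.9), and the table agrees:

```python
print(optimal_bst_cost([0.8, 0.1, 0.1]))  # ~1.3, up to floating-point rounding
```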
Then, as the dynamic programming algorithm proceeds, we're going to be filling in the upper-left portion of this table diagonal by diagonal. Each time we increment s, the index in the outer for loop, we march up to the next diagonal to the northwest, and then, as we step through the possible values of i, we fill in that diagonal one entry at a time, moving from southwest to northeast. When we're filling in the value of a subproblem on one of these diagonals, all we need to do is look up the values of two subproblems on lower diagonals, and lower diagonals correspond to subproblems with strictly fewer items.

So that's it. That's the dynamic programming algorithm that computes the value of an optimal binary search tree, given a set of items with probabilities. I'm not going to say anything about correctness; it's the same story as we've seen in the past. All the heavy lifting is in proving the optimal substructure lemma, which gave us the correctness of our recurrence. Given that our magic formula is correct, and that we're just applying it systematically, correctness of the dynamic programming algorithm follows in a straightforward way, just by induction.

Let me, however, make some comments about the running time. Let's just follow the usual procedure: look at how many subproblems have to get solved, and then how much work has to get done to solve each of those subproblems. As far as the number of subproblems, it's all possible choices of i and j where i is at most j, or in other words, essentially half of that n-by-n grid. So this is roughly n²/2; let's just call it Θ(n²), a quadratic number of subproblems.

Now, for each of the subproblems, we have to evaluate this recurrence, we have to evaluate the formula, which conceptually is a brute-force search through the candidates that we've identified. And a distinction between this dynamic programming algorithm and all of the other ones we've seen recently (sequence alignment, knapsack, computing independent sets in path graphs) is that there are actually kind of a lot of options for what the optimal solution can be. That is, our brute-force search, for the first time, is not over a merely constant number of possibilities. We have to try every possible root: each of the items in our given subproblem is a candidate root, and we try them all. So given a start item of i and an end item of j, there are j - i + 1 total items, and we have to do constant work for each one of those choices. There will be some subproblems that we can evaluate quickly, in only, say, constant time, if i and j are very close to each other; but for a constant fraction of the subproblems, this is going to be linear time, Θ(n) time. So overall, that gives us a cubic running time, Θ(n³).
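In symbols, the tally from the last two paragraphs is just

$$ \underbrace{\Theta(n^2)}_{\text{subproblems}} \times \underbrace{\Theta(n)}_{\text{roots tried, for a constant fraction of subproblems}} = \Theta(n^3). $$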
All right, so I would say this running time is sort of okay, not great. It is polynomial time; that's good. That's certainly way, way, way faster than enumerating all of the exponentially many possible binary search trees, so it blows away brute-force search. But it's not something I would call blazingly fast, or a for-free primitive, or anything like that. So you're going to be able to solve problem sizes with n in the hundreds, but probably not n in the thousands. That will cover some applications where you'd want to use this optimal binary search tree algorithm, but not all of them. It's good for some things, but it's not a universal solution.

On the other hand, here's a fun fact. And the fun fact is that you can actually speed up this dynamic programming algorithm significantly. You can keep the exact same 2-D array with the exact same semantics; again, each entry corresponds to the subproblem of computing the optimal binary search tree for the items between i and j, inclusive. But you can actually fill in this entire table, all n² entries, using only a total of O(n²) time. That is, on average, constant work per subproblem.

This fun fact is very clever, and it's really more intricate than what we've discussed in this video, but it's not impossible to read. So if you're interested, I encourage you to go back to the original papers, or search the web for some other resources, on this optimized, sped-up version of this dynamic programming algorithm. At a very high level, sort of from 30,000 feet, the goal is to avoid doing this brute-force search over all possible roots in every single subproblem.
And it turns out there's structure, nice structure, in this optimal binary search tree problem that allows you to piggyback on the work done in smaller subproblems. In smaller subproblems, you already searched over a bunch of candidate roots, and it turns out that, using the results of those previous brute-force searches, you can make inferences about which subset of the current candidate roots might conceivably be the ones that achieve the minimum in the recurrence. And so that lets you avoid searching over all of the possible candidates for the root, instead focusing on just a very small set; in fact, on average over all of the subproblems, a constant number of possible roots. And needless to say, this speedup of the running time from cubic to quadratic really significantly increases the problem sizes that you can apply this algorithm to. So now, instead of being stuck in the hundreds, you'd certainly be able to solve problem sizes in the thousands, possibly even in the tens of thousands, using this quadratic-time algorithm. Very cool.
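The lecture defers to the original papers, but for the curious: the classic way to realize this speedup is Knuth's monotonicity observation, that an optimal root for items i..j lies between an optimal root for i..j-1 and an optimal root for i+1..j. Here is a hedged sketch, under that assumption, of how the inner loop of the earlier code changes (the root table, prefix sums, and function name are my own additions for illustration):

```python
def optimal_bst_cost_fast(p):
    """O(n^2) variant of optimal_bst_cost, using Knuth's root restriction."""
    n = len(p)
    if n == 0:
        return 0.0
    A = [[0.0] * n for _ in range(n)]
    root = [[0] * n for _ in range(n)]  # root[i][j] = an optimal root for i..j
    prefix = [0.0]                      # prefix sums make interval sums O(1)
    for q in p:
        prefix.append(prefix[-1] + q)
    for i in range(n):                  # base cases: the single-item diagonal
        A[i][i] = p[i]
        root[i][i] = i
    for s in range(1, n):               # subproblem size s + 1, as before
        for i in range(n - s):
            j = i + s
            total = prefix[j + 1] - prefix[i]
            best, best_r = float('inf'), i
            # Knuth's restriction: only try roots between the optimal roots
            # of the two inner subproblems. These ranges telescope across a
            # diagonal, giving O(n) work per diagonal and O(n^2) overall.
            for r in range(root[i][j - 1], root[i + 1][j] + 1):
                left = A[i][r - 1] if r > i else 0.0
                right = A[r + 1][j] if r < j else 0.0
                cost = total + left + right
                if cost < best:
                    best, best_r = cost, r
            A[i][j] = best
            root[i][j] = best_r
    return A[0][n - 1]
```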