So now that we've motivated and formally defined the optimal binary search tree problem, let's think about how to solve it. After settling on dynamic programming as the paradigm we're going to try to use, we'll proceed in the usual way, turning to the optimal solution for clues and asking in what way it is composed of optimal solutions to smaller subproblems.

So let me just remind you of the formal problem statement. There are n objects we've got to store in a search tree, and let's just name them in the order of their keys, one, two, three, all the way up to n, for simplicity. We're also given frequencies, or weights, reflecting how often the different objects are searched for: p1 up to pn, positive numbers. Canonically we think of these as summing to one, being probabilities, but we actually won't use that fact, so in general they're just arbitrary positive numbers. The goal is to output a search tree. It should satisfy the search tree property, it should contain all of these objects one through n, and among all such search trees it should minimize the weighted search time: the sum, over all items i, of the probability of i times the search time for i, namely its depth in the tree plus one.
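Written out symbolically (the notation here is mine, just restating the verbal definition above), the quantity we're minimizing over valid search trees T is:

```latex
% Weighted search time of a search tree T (notation mine, matching the
% verbal definition above): item i costs p_i times (its depth in T, plus one).
C(T) \;=\; \sum_{i=1}^{n} p_i \cdot \bigl(\mathrm{depth}_T(i) + 1\bigr)
```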
Now, in case you're feeling cocky about the fact that a greedy algorithm works to solve the seemingly similar optimal prefix-free binary code problem, in the form of Huffman's algorithm, I want to spend a little time pointing out that greedy algorithms are not sufficient, are not correct, to solve the optimal search tree problem. So if we were to design a greedy algorithm, what kind of intuition would motivate a particular greedy rule? Well, staring at the objective function, it's very clear that we want objects with a high frequency of access to be at or near the root, and we want items with a low frequency of access to be in the bottom levels of the tree, like the leaves. So what are some ways we could compile this intuition into a greedy algorithm? Well, one, perhaps motivated by the success of Huffman's algorithm, is that we could think about a bottom-up approach. Now, I'm not going to define precisely what I mean here, but informally, we want to start with the bottommost levels, with the leaves, and the nodes we want to put there are the objects that are accessed the least frequently. Any reasonable way of implementing this kind of bottom-up greedy rule is not going to work.

Let me show a simple counterexample. So, let's just assume we have four objects: one, two, three, four. What I'm showing here on the right, in pink, are two possible search trees valid for those four keys. And let's assume that the frequencies are as follows. Object one is searched for 2% of the time, object two 23% of the time, object three the bulk of the time, 73%, and object four only 1% of the time. Any greedy algorithm that insists on putting the lowest-frequency objects at the very bottommost level of the tree is not going to produce the tree on the right, which has the 2% object at a lower level than the 1% object. Instead, such a greedy algorithm would presumably output a search tree like the one on the left, which has object two as the root and then object four at the lowest level, at depth two. But you should be able to easily convince yourself that, for these probabilities, it's the tree on the right which is the one you want, that's optimal, because object three is the one that's searched for the bulk of the time, and that's the one you want at the root, as opposed to object two. So I realize I'm being a little informal here, but I hope you get the idea that a naive bottom-up implementation of a greedy algorithm, which, if you think about it, is really what we did in Huffman's algorithm, is not going to work here.
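If you'd like to check that "convince yourself" step explicitly, here is a minimal sketch in Python. The depth tables are my reconstruction of the two pink trees from the verbal description above, not something shown in code in the lecture:

```python
# Minimal sketch: compare the weighted search times of the two pictured trees
# under the frequencies from the first counterexample. The depth tables are my
# reconstruction of the trees from the verbal description in the lecture.

def weighted_search_time(freq, depth):
    """Sum over all items of frequency * (depth + 1)."""
    return sum(freq[i] * (depth[i] + 1) for i in freq)

freq = {1: 0.02, 2: 0.23, 3: 0.73, 4: 0.01}

# Left tree: root 2, children 1 and 3, with object 4 hanging below 3.
left_depths = {2: 0, 1: 1, 3: 1, 4: 2}
# Right tree: root 3, right child 4, left child 2, with object 1 below 2.
right_depths = {3: 0, 2: 1, 4: 1, 1: 2}

print(weighted_search_time(freq, left_depths))   # ~1.76
print(weighted_search_time(freq, right_depths))  # ~1.27 -- the tree on the right is better
```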
The same can be said about a top-down approach. Perhaps the simplest top-down approach would be just to take the most frequently searched-for object, put it at the root, and then recursively develop appropriate left and right subtrees under that most frequently accessed element. So let me show you, again informally, just the same kind of counterexample. We're going to use the exact same four objects and the exact same two trees; I will, however, change the numbers. Now, let's imagine that object one is searched for almost never, just 1% of the time, and each of the other three objects is searched for roughly a third of the time. But let me break ties so that object number two is the most frequently searched-for one, at 34%. In that case, the greedy algorithm will put the 34% node up at the root, when really what should happen is that you want a perfectly balanced subtree for the objects two, three, and four, because each accounts for roughly a third of the searches. So let's give object three 33% of the searches and object four 32% of the searches.

And again, I'll leave it for you to convince yourself that this is indeed a counterexample: for the tree spit out by the greedy algorithm, on the left, we have an average search time of roughly two, whereas for the search tree on the right, we have an average search time of roughly five-thirds. So we'd like to produce the tree on the right, but the greedy algorithm proposed here will spit out the tree on the left.
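Here is the same quick check with the new frequencies, again using my reconstruction of the two tree shapes; this is just the arithmetic behind the "roughly two" and "roughly five-thirds" figures:

```python
# Same comparison as before, now with the second set of frequencies.
freq = {1: 0.01, 2: 0.34, 3: 0.33, 4: 0.32}
left_depths = {2: 0, 1: 1, 3: 1, 4: 2}   # top-down greedy: the 34% object at the root
right_depths = {3: 0, 2: 1, 4: 1, 1: 2}  # balanced over objects 2, 3, and 4

for depth in (left_depths, right_depths):
    print(sum(freq[i] * (depth[i] + 1) for i in freq))
# Prints roughly 1.98 (about two) and 1.68 (about five-thirds).
```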
This, of course, doesn't exhaust the list of potential greedy algorithms. You could try others, but it's not going to work: there's no known greedy algorithm that successfully solves the optimal binary search tree problem. So, in particular, if we focus on the top-down approach, the choice of the root, the choice of what to do at the uppermost level, has very hard-to-predict repercussions for what the two different subproblems look like. This is what stymies not only the top-down greedy approach but also a naive divide-and-conquer approach. For example, suppose we just wanted to split the keys into the first half and the second half, recursively compute an optimal BST on each of those two halves, and then put them back together. The search tree property would say that we have to unite those two sub-solutions under a root which is the median, in between the two subproblems. And who's to say that the median is a good choice for the root? Again, because of the ramifications further down the tree, maybe that's a stupid root.

But, boy, is it tempting to try to solve this problem recursively, right? We're trying to output this binary tree, and it has recursive structure. If only we knew which root we should pick, we would love to recurse twice: once to construct an optimal left subtree, and once to construct an optimal right subtree. Okay, so if only we knew the right root. This is starting to sound familiar, actually. How did it work in all our dynamic programming solutions? We always said: oh, if only a little birdie told us this one little piece of the solution, then we could, sort of, look up or recursively compute the rest of the solution and easily extend it back to one for the original problem. So maybe the same thing's true here. Maybe, if only a little birdie told us what the root was, then we could look up or recursively compute optimal solutions to smaller subproblems, paste everything back together, and be done. That would be great.

So, as usual, we want to make this precise with an optimal substructure lemma. We want to understand the way in which an optimal solution to the optimal BST problem must necessarily be constructed from optimal solutions to smaller subproblems. So in the following quiz, I'm going to ask you to guess what the appropriate optimal substructure lemma is, and then, after that quiz, once we've identified the right statement, I will show you the proof.

Okay, so the answer I'm looking for is the fourth one, D, which is the strongest statement of all. The first point is that each of the trees T1 and T2, as a subtree of a binary search tree, is of course itself a binary search tree, valid for the keys it contains. And not only can they be viewed as search trees on the keys they contain, but the claim, which we'll prove on the next couple of slides, is that they are indeed optimal: they minimize the weighted search time over all possible search trees for the objects contained in those two trees. So that rules out A, and it rules out B. But we can say something even stronger than C, which says only that each of the two trees is optimal for the items it contains: we actually know exactly which items are in T1 and which items are in T2. And this is by the search tree property. The search tree property says that at every node, and in particular here at the root, everything to the left is less than it and everything to the right is bigger than it. So, the root being r by assumption, we know the objects one through r minus one have got to be somewhere, and the only place they can be is the left subtree, T1. So that's exactly the contents of T1. Similarly, the contents of T2 are precisely the objects r+1 through n. So the two subtrees are optimal, and we know exactly which keys they contain: everything less than r on the left and everything bigger than r on the right.
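Stated a bit more formally (this is my phrasing of option D, not a verbatim quote from the slides):

```latex
% Optimal substructure lemma, my phrasing of answer D from the quiz.
\textbf{Lemma.} Let $T$ be an optimal binary search tree for the keys
$\{1,\dots,n\}$ with frequencies $p_1,\dots,p_n$, and suppose $T$ has root $r$,
left subtree $T_1$, and right subtree $T_2$. Then $T_1$ is an optimal binary
search tree for the keys $\{1,\dots,r-1\}$, and $T_2$ is an optimal binary
search tree for the keys $\{r+1,\dots,n\}$.
```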
Okay, so here's where things stand at the end of this quiz. We've identified a statement that we're really hoping is true. We're really hoping that an optimal BST, binary search tree, must necessarily be composed in this way of optimal binary search trees for the key sets to the left of the root and to the right of the root. If that's true, then hopefully, with the experience we now have, we can sort of envision what a dynamic programming algorithm might look like; I'll fill in the details in the next video. If this weren't true, if we didn't have this optimal substructure, honestly, I'd have no idea how we'd get started; it's really not clear what an algorithm would look like. So in the next couple of slides, I'm going to prove this to you. The format will not be radically different from the proofs we've already seen, and I don't think there will be any big surprises. But it's so important, it really is the whole reason why the algorithm is going to work, that I'm still going to give you a full proof of this optimal substructure lemma.
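To preview why the lemma is so useful, here is a deliberately naive recursive sketch in Python. It is not the dynamic programming algorithm from the next video; it simply stands in for the little birdie by trying every candidate root and leaning on the lemma to recurse on the keys to the root's left and right:

```python
# Naive (exponential-time) recursion built directly on the optimal substructure
# lemma: since no little birdie tells us the root, try every candidate root r
# and recurse on the keys to its left and right. This only illustrates the
# lemma; the efficient dynamic programming version comes in the next video.

def opt_bst_cost(p, i, j):
    """Minimum weighted search time over all search trees on keys i..j,
    where p maps each key to its frequency."""
    if i > j:
        return 0.0  # empty range of keys: empty tree, zero cost
    # Every key in i..j contributes its frequency once beyond the subtree costs:
    # the root pays p[r] for depth 0 (+1), and every other key pays p[k] for
    # sitting one level deeper here than it does inside its own subtree.
    total = sum(p[k] for k in range(i, j + 1))
    return total + min(
        opt_bst_cost(p, i, r - 1) + opt_bst_cost(p, r + 1, j)
        for r in range(i, j + 1)  # try every possible root r
    )

# Example: the second counterexample's frequencies. The minimum comes out to
# roughly 1.68, i.e. about five-thirds, matching the tree on the right.
p = {1: 0.01, 2: 0.34, 3: 0.33, 4: 0.32}
print(opt_bst_cost(p, 1, 4))
```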