In this next sequence of videos we're going to study a slightly more intricate application of the dynamic programming paradigm, namely to the problem of computing optimal search trees: search trees that minimize the average search time with respect to a given set of probabilities over the keys.

I'm going to assume in these videos that you remember the basics of the search tree data structure, so if you need to jog your memory, you might want to review the relevant video from part one. A search tree contains objects, and each object has a key drawn from some totally ordered set. The search tree property states that at each node of a search tree, say one containing an object with key value x, it had better be the case that everything in the left subtree of that node has keys less than x, and everything in the right subtree of that node has keys bigger than x. That has to be true simultaneously at every single node of the search tree.

The motivation for the search tree property is that searching in a search tree just involves following your nose, just like binary search in a sorted array. So if you're looking for, say, an object with key 17, you know whether to go left or right from the root just based on the root's key value.
If the root has key value 12, you know that 17, if it exists, has to be in the right subtree, so you recursively search the right subtree. If the root has key value 23, you know that 17 has to be in the left subtree, so that's where you recursively search.

Something we originally discussed in the context of balanced binary search trees, like red-black trees, and that I'm going to reiterate now, is that for a given set of keys there are many, many valid search trees containing those keys. Just to remind you how this works, suppose there were only three keys in the world: x, y, and z, with x less than y less than z. One obvious search tree would be the balanced one. That would put the middle element y at the root, with left child x and right child z. But there are also the two chain search trees containing these three keys, one with the smallest element x at the root, the other with the largest element z at the root.

So, given the multiplicity of solutions, all of the different search trees one could use to store objects with a bunch of keys, an obvious question is: which one is the best? What's the best search tree to use out of all of the possibilities? I don't blame you if you've got a sense of deja vu.
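As a quick illustration of the follow-your-nose search just described, here is a minimal sketch in Python; the `Node` class and its field names are my own assumptions, not part of the lecture:

```python
class Node:
    """A binary search tree node (hypothetical minimal representation)."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def search(node, key):
    """Follow the search tree property: go left for smaller keys,
    right for larger keys, just like binary search in a sorted array."""
    if node is None:
        return None            # key is not in the tree
    if key == node.key:
        return node
    elif key < node.key:
        return search(node.left, key)
    else:
        return search(node.right, key)

# The balanced tree on three keys x < y < z, instantiated here as 10 < 20 < 30:
root = Node(20, left=Node(10), right=Node(30))
```

Searching for 30 descends right from the root exactly as the lecture describes; searching for a key not in the tree, like 17, runs off the bottom and returns `None`.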
We did already ask and answer this question in one sense when we discussed red-black trees. There we argued that the best thing to have is a balanced binary search tree, which keeps the height as small as possible, and therefore the worst-case search time, which is proportional to the height, as small as possible, namely logarithmic in the number of objects in the search tree.

But now let's make the same kind of assumption that we made when we discussed Huffman codes. That is, let's assume that we actually have accurate statistics about how frequently each item in the tree is going to be searched for. Maybe we know that item x is going to be searched for 80% of the time, whereas y and z will only be searched for 10% each. Could we then improve upon the perfectly balanced search tree solution?

Let me make this question more concrete by asking you to compare two candidate solutions. On the one hand, the balanced tree, which has y at the root and x and z as children. On the other hand, the chain, which has x as the root, y as its right child, and then z as the right child of y. So what is the average search time in each of these two search trees, with respect to the search frequencies I told you: 80% for x, 10% for y, and 10% for z?
When I say the search time for a node, I just mean how many nodes you look at en route to discovering what you're looking for, including that last node itself. So the search time for something at the root, for example, would be a search time of just one, because you only look at the root to find it.

All right, so the correct answer to the quiz is the fourth option: 1.9 and 1.3. To see why, let's just compute the average search time in each of the two proposed search trees. In the first one, with y at the root, 80% of the time we suffer a search time of two: whenever we look for x, we have to look at the root y and then at x. So we pay two 80% of the time, and that contributes 1.6. 10% of the time we get lucky and find y at the root, which contributes 0.1. 10% of the time we search for z, which contributes another 0.2, for a total of 1.9.

By contrast, think about the chain that has x at the root. Here 80% of the time we get lucky and only have to pay one for every search for x, so that contributes only 0.8 to the total. It is true that our worst-case search time has actually gone up: when we search for z, we suffer a search time of three, which never happened in the balanced case. But we pay that three only 10% of the time, so it contributes 0.3.
The remaining 10% of the time we suffer a search time of two to look for y. That gives us a total of 1.3.

And the moral of the story, of course, is that this example exposes an interesting algorithmic opportunity. The obvious, quote-unquote, solution of a perfectly balanced search tree need not be the best search tree when the frequency of access is non-uniform. You might want to have unbalanced trees, like this chain, that keep extremely frequently searched items closer to the root, to give them smaller search times. So the obvious question is then: given a bunch of items and known frequencies of access, what is the optimal search tree? Which search tree minimizes the average search time?

That brings us to the formal problem definition. We're given n objects that we've got to store in a search tree, and we're told the frequency of access of each. To keep things simple and the notation straightforward, let's just say the items are named 1, 2, all the way up to n, and that this is also the order of their respective keys. So p1 is the frequency of searching for the item with the smallest key, and so on.

You might wonder where these frequencies come from. How would you know exactly how frequently every possible key will be searched for? It's going to depend on the application.
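To double-check the quiz arithmetic above, here is a small sketch of my own (not from the lecture) that computes the average search time from each item's frequency and its search time in the two candidate trees:

```python
def average_search_time(freqs, search_times):
    """Weighted average: sum over items of frequency times search time."""
    return sum(freqs[i] * search_times[i] for i in freqs)

freqs = {'x': 0.8, 'y': 0.1, 'z': 0.1}

# Balanced tree: y at the root (search time 1), x and z as children (2 each).
balanced = {'x': 2, 'y': 1, 'z': 2}
# Chain: x at the root (1), y its right child (2), z below y (3).
chain = {'x': 1, 'y': 2, 'z': 3}

print(round(average_search_time(freqs, balanced), 6))  # -> 1.9
print(round(average_search_time(freqs, chain), 6))     # -> 1.3
```

The rounding is only to suppress floating-point noise; the two totals match the quiz answer of 1.9 for the balanced tree and 1.3 for the chain.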
There will certainly be applications where you're not going to have these kinds of statistics, and that's where you'd probably want to turn to a general-purpose balanced binary search tree solution, something like a red-black tree, which guarantees that every search is going to be reasonably fast.

But it's not hard to think of applications where you are going to be able to learn pretty accurate statistics about how frequently different things are searched for. One example might be something like a spell checker. You would implement that by storing all of the legally spelled words in a search tree, and as you scan a document, every time you hit a word, you look it up in the search tree to see whether it's correctly or incorrectly spelled. You can imagine that after scanning through a number of documents, you would have pretty good estimates of how frequently things get searched for, and then you could use those estimates to build a highly optimized binary search tree for all future documents. If you're in some other kind of application where you're concerned about these frequencies changing over time, for example if they're subject to trends in the industry, you could imagine rebuilding the search tree every day or every week or whatever, based on the latest statistics that you've got.
In any case, if you're lucky enough to have such statistics, what you're going to want to do is build a search tree which, on the one hand, is valid, meaning it satisfies the search tree property, and on the other hand, makes the average search time as small as possible.

So let me go ahead and write down a formula for the average search time. It's the one you would expect it to be. I'll also introduce some notation, namely capital C of T, which will denote the average search time of a proposed search tree T. For these lectures, we're going to focus on the case where all searches are successful: the only things that ever get searched for are things actually in the tree. But everything we'll talk about in these lectures, including the algorithm, is easily extended to accommodate the case where you also have unsuccessful searches, along with statistics about how frequent the various unsuccessful searches are. If there are only successful searches, then we average only over the n items stored in the tree. So we sum over each of the items i, weight it by the probability or frequency of its access, p_i, and multiply by the search time required in the tree T to find the item i: C(T) = sum over items i of p_i times (search time for i in T).
And as we discussed in the quiz, the search time for a given key i in a given tree T is just the number of nodes you have to visit until you get to the one containing i. If you think about it, that's exactly the depth of i's node in the tree plus one. For example, if you're lucky enough that the key is at the root, the depth of the root is zero, and we're counting that as a search time of one. So it's depth plus one.

One minor point: it's going to be convenient for me not to insist that the p_i's sum to one. Of course, if the p_i's were probabilities, they would sum to one, but I'm going to go ahead and allow them to be arbitrary positive numbers. For the same reason, I'm going to sometimes call capital C of T the weighted search time rather than the average search time, because I won't necessarily be assuming that the p_i's sum to one. With that said, go ahead and think of probabilities as the canonical special case in your mind as we go through these lectures.

For example, in the case where these are probabilities, where the p_i's sum to one, we could always, as a reference point, use a red-black tree as our search tree. But, as we've seen, when these p_i's are not uniform, you can generally do better. And that's the point of this computational problem.
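The definition of C(T), frequency times depth plus one summed over all nodes, can be computed in a single recursive pass over a tree. Here is a sketch of my own (not from the lecture); the tuple-based tree encoding is a hypothetical choice just for illustration:

```python
# A tree is represented as (key, left_subtree, right_subtree),
# with None standing for an empty subtree.

def weighted_search_time(tree, p, depth=0):
    """C(T) = sum over items i of p[i] * (depth of i in T + 1).
    p maps keys to frequencies; they need not sum to one."""
    if tree is None:
        return 0.0
    key, left, right = tree
    return (p[key] * (depth + 1)                      # this node's contribution
            + weighted_search_time(left, p, depth + 1)
            + weighted_search_time(right, p, depth + 1))

p = {'x': 0.8, 'y': 0.1, 'z': 0.1}
balanced = ('y', ('x', None, None), ('z', None, None))
chain = ('x', None, ('y', None, ('z', None, None)))

print(round(weighted_search_time(balanced, p), 6))  # -> 1.9
print(round(weighted_search_time(chain, p), 6))     # -> 1.3
```

Run on the two trees from the quiz, this recovers 1.9 and 1.3 directly from the tree structure, since each node's search time is exactly its depth plus one.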
Exploit the non-uniformities in the given probabilities to come up with the best, possibly unbalanced, search tree.

I'm sure many of you will have noticed some of the similarities between this computational problem of optimal binary search trees and one that we already solved back in the greedy algorithms section, namely Huffman codes, which, amongst all prefix-free binary codes, minimize the average encoding length. So let's be precise about the similarities and differences between the two problems, and in particular why we can't just reuse the algorithm we already saw, off the shelf, to solve the optimal BST problem.

What is, of course, super similar in the two cases is the format of the output. In both problems, the responsibility of the algorithm is to output a binary tree, and the goal is to minimize the average depth, more or less, where the average is with respect to provided frequencies over a bunch of objects that we care about: characters from an alphabet in the case of Huffman codes, and a bunch of objects with keys from some totally ordered set in the binary search tree case. It is true that in the optimal BST case we're not really averaging depths, we're averaging depths plus one, but if you think about it, that's exactly the same thing.
More important is to understand the differences between the problem solved by Huffman codes and the computational problem that we have to solve here. In the Huffman code case, we had to output a binary code, and the key constraint was that it be prefix-free. In the language of trees, what that meant is that the symbols we're encoding had to correspond to the leaves of the tree; symbols could not correspond to internal nodes of the tree that we output. Now, in the optimal binary search tree problem, we do not have this prefix-free constraint, so we're going to have a label, that is, an object, at every single node of our tree, whether it's a leaf or not.

But we have a different, seemingly quite a bit harder, constraint to deal with, namely the search tree property. Remember, back in the Huffman code case, we didn't even have an ordering on the symbols of the alphabet. There wasn't a sense in which one of them was less than another; it wouldn't even have made sense to talk about the search tree property in that context. Here, by contrast, we're given these keys and there's this total ordering on them.
And we'd better satisfy the search tree property in the tree that we output. That is, at every single node of the tree we output, it had better be the case that all keys in the left subtree are less than the key at that node, and all keys in the right subtree are bigger than the key at that node. That's a constraint that we have no choice but to satisfy.

This constraint is harder in the sense that no greedy algorithm, Huffman's algorithm or otherwise, solves the optimal binary search tree problem. Rather, we're going to have to turn to the more sophisticated tool of dynamic programming to design an efficient algorithm for computing optimal binary search trees. That's the solution we'll start developing in the next video.