So now that we have motivated and 
formally defined the optimal binary 
search tree problem, lets think about how 
to solve it. 
After settling on dynamic programming as 
the paradigm we are going to try to use 
we're going to proceed in the usual way, 
turning to the optimal solution for 
clues, asking in what way is it composed 
of optimal solutions to smaller 
sub-problems. 
So let me just remind you of the formal 
problem statement. 
There's n objects we got to store in a 
search tree, and let's just name them in 
the order of their keys, and let's name 
them one, two, three, all the way up to 
n, for simplicity, and we're also given. 
frequencies or weights reflecting how 
often the different objects are searched 
for. 
So that's p1 up to pn positive numbers. 
Canonically we think of these summing to 
one, being probabilities, but actually 
won't use that fact so in general they're 
just arbitrary positive numbers. 
The goal is to output a search tree. 
It should satisfy the search tree 
property, it should contain all of these 
objects one through n, and amongst all 
such search trees it should minimize the 
weighted search time. 
So the sum over all of the items I of the 
probability of I times the search time 
for I, namely its depth in the tree plus 
one. 
Now in case you're feeling cocky about 
the fact that the greedy algorithm works 
to solve the seemingly similar optimal 
prefix-free binary code problem in the 
form of Huffman's algorithm, I want to 
spend a little time pointing out that 
greedy algorithms are not sufficient, are 
not correct, to solve the optimal search 
tree problem. 
So if we were to design a greedy 
algorithm what kind of intuition would 
motivate a particular greedy rule. 
Well, staying at the objective function 
it's very clear that we want the objects 
that high frequency of access to be at or 
near the root and we want items of low 
frequency access to be in the bottom 
levels of the tree, like the leaves. 
So what are some ways we can compile this 
intuition into a Greedy algorithm? 
Well one, perhaps motivated by the 
success of Huffman's algorithm, is we 
could think about a bottom up approach. 
Now I'm not going to define what I mean 
here precisely, but informally we want to 
start with the bottom most levels, with 
the leaves and the nodes you want to put 
there are the objects that are accessed 
the least frequently. 
Any reasonable way of implementing this 
kind of body of greedy rule is not going 
to work. 
Let me show a simple counter example. 
So, let's just assume we have four 
objects, one, two, three, four. 
What I'm showing here on the right in 
pink is two possible search trees valid 
for those four keys. 
And let's assume that, the frequencies 
are as followed. 
Object one is searched for two% of the 
time. 
Object two for 23% of the time. 
Object three, the bulk of the time 73%. 
An object for only one% of the time. 
Any greedy algorithm which insists on 
putting the lowest frequency objects in 
the very bottommost level of the tree is 
not going to produce this tree on the 
right, which has the two% object below, 
at a lower level than the !% object. 
Instead, such a greedy algorithm would 
presumably output a searchtree like the 
one on the left, which has the two as the 
root, and then the object four, at the 
lowest level, at depth two. 
But you should be able to easily convince 
yourself that, for these probabilities, 
it's the tree on the right which is the 
one you want, that's optimal, because the 
object three is the one that's searched 
for the bulk of the time, that's the one 
you want at the root as opposed to the 
object two. 
So I realize I'm being a little informal 
here but I hope you get the idea that a 
naive bottom-up implementation of a 
greedy algorithm, which if you think 
about it is really what we did in 
Huffman's algorithm, is not going to work 
here. 
The same can be said about a top-down 
approach. 
Perhaps the simplest top-down approach 
would be just to take the most frequently 
searched for object and put that at the 
root and then recursively develop an 
appropriate left and right sub-trees 
under that most frequently accessed 
element. 
So let me show you again and, formally, 
just the same kind of counter-example. 
We're going to use the exact same four 
objects, the exact same two trees. 
I, I will however, change the numbers. 
Now, let's imagine that object one is 
searched for almost never, just one 
percent of the time and each of the other 
three objects are searched for roughly a 
third of the time each. 
But, let me sort of break ties, so that 
the object number two is the most 
frequently one. 
Searched for 34%. 
So that, in that case the Greedy 
algorithm will put the 34% node up in the 
roots when really what should happen is 
you want a perfectly balanced sub-tree 
for the objects two, three and four 
because each accounts for roughly a third 
of the searches. 
So let's give object three 33% of the 
searches and object four 32% of the 
searches. 
And again I'll leave it for you to 
convince yourself that this is indeed a 
counter example the tree's spit out by 
the greedy algorithm on the left, we have 
an average search time of roughly two, 
where as the search tree on the right we 
have an average search time of roughly 
five thirds. 
So we'd like to produce the tree on the 
right but the greedy algorithm proposed 
here will spit out the tree on the left. 
This of course doesn't exhaust the list 
of potential greedy algorithms. 
You could try others but it's not going 
to work. 
There's no known greedy algorithm that 
successfully solves the optimal binary 
search tree problem. 
So in particular if we focus on the top 
down approach. 
The choice of the route. 
The choice of what to do at the uppermost 
level. 
Has very hard to predict repercussions, 
for what the two different sub-problems 
look like. 
So this is what stymie is, not only the 
top down gritty approach, but also a 
naive divide and conquer approach. 
So for example if we just wanted to split 
the keys into the first half, and the 
second half, recursively compute and 
optimal B is I need on each of those two 
half's, and then put them back together. 
The search tree property would say that 
we have to, unite those two sub-solutions 
under a root which is the median, in 
between the two sub-problems. 
And who is to say that the median is a 
good choice for the root. 
Again, because the ramifications further 
down the tree, maybe that's a stupid 
root. 
But, boy, is it tempting to try to solve 
this problem recursively, 
right? 
We're trying to output this binary tree. 
It has recursive structure. 
If only we knew which root we should 
pick. 
We would love to recurse twice. 
Once to construct an optimal left 
subtree. 
Once to construct an optimal right 
subtree. 
Okay, 
so if only we knew the right route. 
this is starting to sound familiar, 
actually. 
How did it work in all our dynamic 
programming solutions? 
We always said, oh. 
If only a little birdie told us this one 
little piece of the solution. 
Well, then, you know? 
Then we could, sort of, look up or 
recursively compute the rest of the 
solution. 
And extend it back to one for the 
original problem, easily. 
So maybe the same thing's true here. 
Maybe, if only a little birdie told us 
what the root was. 
Then we could look up or recursively 
compute optimal solutions to smaller 
subproblems. 
And paste everything back together, and 
be done. 
That would be great. 
So as usual we want to make this precise 
with an optimal substructure lemma. 
We want to understand the way in which an 
optimal solution to an optimal BST 
problem must necessarily be constructed 
from optimal solutions to smaller 
sub-problems. 
So in the following quiz I'm going to ask 
you to guess what the appropriate optimal 
substructure lemma is and then after that 
quiz once we've identified the right 
statement at that point I will show you 
the proof. 
Okay, so the answer I'm looking for is 
the fourth one, is D. 
Which is the strongest statement of all. 
So the first point is that each of the 
trees T1 and T2, now as subtrees of a 
binary search tree these of course are 
themselves binary search trees, valid for 
the trees that they contain. 
And not only can they be viewed as search 
trees on the keys they contain, but the 
claim which we'll prove on the next 
couple slides is that they are indeed 
optimal. 
They minimize the weighted search time 
over all possible search trees for the 
objects contained in those two trees. 
So that gets us down, it rules out A, it 
rules out B. 
We can say something stronger than that, 
but we can even say something stronger 
than C, that each of the two trees is 
optimal for the items that they contain. 
We actually know exactly which items are 
in T1 and which items are in T2. 
And this is by the search tree property. 
Search tree property said that every node 
and in particular here at the roots 
everything to the left of the root is 
less than that of everything to the right 
of the node is bigger than it. 
So the root being R by assumption we know 
the objects one through R minus one, but 
they got to be somewhere. 
The only way they can be is in the left 
sub-tree, t1. 
So that's exactly the contents of T1. 
Similarly, the contents of t2 are 
precisely the objects r+1 through n. 
So the two sub-trees are optimal. 
And we know exactly which keys they are, 
it's everything less than r on the left 
and everything bigger than r on the 
right. 
Okay, 
so here's where things stand at the end 
of this quiz. 
We've identified a statement that we're 
really hoping is true. 
We're really hoping that an optimal BST, 
binary search tree, must necessarily be 
composed in this way of optimal binary 
search trees for the key sets to the left 
of the root and the right of the root. 
If that's true, hopefully with the 
experience we now have, we can sort of 
envision what a dynamic programming 
algorithm might look like. 
I'll just fill in the details in the next 
video. 
If this weren't true, if we didn't have 
this optimal substructure, honestly, I 
have no idea how we'd get started. 
It's really not clear what an algorithm 
would look like if this weren't true. 
So the next couple slides, I'm going to 
prove this to you. 
The format, you know, will not be 
radically different than the ones we've 
already seen. 
I don't think there'll be any big 
surprises, but it's so important, this 
really is the whole reason why the 
algorithm is going to work, I'm still 
going to give you a fool proof of this 
optimal substructure lemma.