In this next sequence of videos we're going to study a slightly more intricate application of the dynamic programming paradigm, namely to the problem of computing optimal search trees: search trees that minimize the average search time with respect to a given set of probabilities over the keys. I'm going to assume in these videos that you remember the basics of the search tree data structure, so if you need to jog your memory, you might want to review the relevant video from part one.

A search tree contains objects, and each object has a key drawn from some totally ordered set. The search tree property states that at each node of a search tree, say one containing an object with key value x, it better be the case that everything in the left subtree of that node has keys less than x, and everything in the right subtree of that node has keys bigger than x. That has to hold simultaneously at every single node of the search tree. The motivation for the search tree property is that searching in a search tree just involves following your nose, much like binary search in a sorted array. If you're looking for, say, an object with key seventeen, you know whether to go left or right from the root just based on the root's key value. If the root has key value twelve, you know that seventeen, if it exists, has to be in the right subtree, so you recursively search the right subtree. If the root has key value 23, you know that seventeen has to be in the left subtree, so that's where you recursively search.

Something we originally discussed in the context of balanced binary search trees, like red-black trees, and that I'm going to reiterate now, is that for a given set of keys there are many, many valid search trees containing those keys. Just to remind you how this works, suppose there were only three keys in the world: x, y, and z, with x less than y less than z. One obvious search tree would be the balanced one, which puts the middle element y at the root, with left child x and right child z. But there are also the two chain search trees containing these three keys, one with the smallest element x at the root, the other with the largest element z at the root.

Given this multiplicity of solutions, all of the different search trees one could use to store objects with a given set of keys, an obvious question is: which one is the best? What's the best search tree to use out of all of the possibilities? I don't blame you if you've got a sense of deja vu; we already asked and answered this question in one sense when we discussed red-black trees. There we argued that the best thing to have is a balanced binary search tree, which keeps the height, and therefore the worst-case search time, which is proportional to the height, as small as possible, namely logarithmic in the number of objects in the tree. But now let's make the same kind of assumption that we made when we discussed Huffman codes: let's assume that we actually have accurate statistics about how frequently each item in the tree is going to be searched for. Maybe we know that item x is going to be searched for 80% of the time, whereas y and z will only be searched for 10% of the time each. Could we then improve upon the perfectly balanced search tree solution? Let me make this question more concrete by asking you to compare two candidate solutions.
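To make the "follow your nose" search concrete, here is a minimal sketch in Python; it's my own illustration rather than anything from the lecture, with a bare-bones node type and the recursive lookup that the search tree property enables.

```python
class Node:
    """A binary search tree node: every key in the left subtree is
    smaller than self.key, every key in the right subtree is larger."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def search(node, key):
    """Return the node containing `key`, or None if `key` is absent."""
    if node is None:
        return None          # fell off the tree: the key isn't present
    if key == node.key:
        return node          # found it
    elif key < node.key:
        # All keys bigger than node.key live in the right subtree,
        # so a smaller key can only be on the left.
        return search(node.left, key)
    else:
        return search(node.right, key)


# Example: the balanced tree on keys 1 < 2 < 3 discussed above.
root = Node(2, left=Node(1), right=Node(3))
assert search(root, 3) is root.right
assert search(root, 17) is None
```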
On the one hand, the balanced tree, which has y at the root and x and z as its children. On the other hand, the chain, which has x as the root, y as its right child, and z as the right child of y. So what is the average search time in each of these two search trees, with respect to the search frequencies I told you: 80% for x, 10% for y, and 10% for z? When I say the search time for a node, I just mean how many nodes you look at en route to discovering what you're looking for, including that last node itself. So something that's at the root, for example, has a search time of just one, because you only look at the root to find it.

All right, so the correct answer to the quiz is the fourth option, 1.9 and 1.3. To see why, let's just compute the average search time in each of the two proposed search trees. In the first one, with y at the root, 80% of the time we suffer a search time of two: whenever we look for x, we have to look at the root y and then at x. So we pay two 80% of the time, and that contributes 1.6. 10% of the time we get lucky, searching for y right at the root, which contributes 0.1. The remaining 10% of the time we search for z with a search time of two, contributing another 0.2, for a total of 1.9. By contrast, think about the chain that has x at the root. Here 80% of the time we get lucky and pay only one for a search for x, so that contributes only 0.8 to the total. It is true that our worst-case search time has actually gone up: when we search for z we suffer a search time of three, which never happened in the balanced case. But we pay that three only 10% of the time, contributing 0.3. The remaining 10% of the time we suffer a search time of two to look for y, contributing 0.2. That gives us a total of 1.3. (A short code check of this arithmetic appears below.)

And the moral of the story, of course, is that this example exposes an interesting algorithmic opportunity. The obvious, quote unquote, solution of a perfectly balanced search tree need not be the best search tree when the frequency of access is non-uniform. You might want an unbalanced tree, like this chain, if it puts the extremely frequently searched items closer to the root, where their search times are smaller. So the obvious question is then: given a bunch of items and known frequencies of access, what is the optimal search tree, the one that minimizes the average search time?

That brings us to the formal problem definition. We're given n objects to store in a search tree, and we're told the frequency of access of each. To keep things simple and the notation straightforward, let's say the items are named 1, 2, all the way up to n, and that this is also the order of their respective keys; so p_1 is the frequency of searching for the item with the smallest key, and so on. You might wonder where these frequencies come from: how would you know exactly how frequently every possible key will be searched for? It's going to depend on the application. There will certainly be applications where you don't have these kinds of statistics, and that's where you'd probably want to turn to a general-purpose balanced binary search tree, something like a red-black tree, which guarantees that every search is reasonably fast. But it's not hard to think of applications where you can learn pretty accurate statistics about how frequently different things are searched for.
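As promised, here's a quick sanity check of the quiz arithmetic, again my own illustration rather than anything from the lecture, representing trees as nested (key, left, right) tuples:

```python
def weighted_search_time(tree, freq, depth=1):
    """Sum of freq[key] * (nodes visited to find key) over all keys,
    where a key at the root has search time 1."""
    if tree is None:
        return 0.0
    key, left, right = tree
    return (freq[key] * depth
            + weighted_search_time(left, freq, depth + 1)
            + weighted_search_time(right, freq, depth + 1))


freq = {'x': 0.8, 'y': 0.1, 'z': 0.1}
balanced = ('y', ('x', None, None), ('z', None, None))  # y at the root
chain = ('x', None, ('y', None, ('z', None, None)))     # x -> y -> z

print(round(weighted_search_time(balanced, freq), 2))   # 1.9
print(round(weighted_search_time(chain, freq), 2))      # 1.3
```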
One example might be something like a spell checker. You could implement one by storing all of the legally spelled words in a search tree; then, as you scan a document, every time you hit a word you look it up in the search tree to see whether it's spelled correctly or incorrectly. You can imagine that after scanning a number of documents you would have pretty good estimates of how frequently different words get looked up, and you could then use those estimates to build a highly optimized binary search tree for all future documents. If you're in some other kind of application where you're concerned about these frequencies changing over time, for example because they're subject to trends in the industry, you could imagine rebuilding the search tree every day or every week or whatever, based on the latest statistics you've got.

In any case, if you're lucky enough to have such statistics, what you want to do is build a search tree which, on the one hand, is valid, meaning it satisfies the search tree property, and which, on the other hand, makes the average search time as small as possible. So let me write down a formula for the average search time; it's the one you would expect. Let me also introduce some notation: capital C of T will denote the average search time of a proposed search tree T. For these lectures we're going to focus on the case where all searches are successful; the only things that ever get searched for are things actually in the tree. But everything we'll talk about in these lectures, including the algorithm, is easily extended to accommodate unsuccessful searches as well, given statistics about how frequent the various unsuccessful searches are. If there are only successful searches, then we average only over the n items stored in the tree: we sum over the items i, weight each by the probability, or frequency, of its access, p_i, and multiply by the search time required in the tree T to find item i. As we discussed in the quiz, the search time for a given key i in a given tree T is just the number of nodes you have to visit until you get to the one containing i; if you think about it, that's exactly the depth of i's node in the tree, plus one. For example, if you're lucky enough that the key is at the root, the depth of the root is zero, and we count that as a search time of one. So it's depth plus one. (The objective is written out in symbols below.)

One minor point: it's going to be convenient for me not to insist that the p_i's sum to one. Of course, if the p_i's were probabilities, they would sum to one, but I'm going to allow them to be arbitrary positive numbers. For the same reason, I'll sometimes call capital C of T the weighted search time rather than the average search time, because I won't necessarily be assuming that the p_i's sum to one. That said, do keep the case of genuine probabilities in mind as the canonical special case as we go through these lectures. In that case, where the p_i's sum to one, we could always use a red-black tree as a reference-point solution. But, as we've seen, when the p_i's are not uniform, you can generally do better, and that's the point of this computational problem: exploit the non-uniformities in the given frequencies to come up with the best possible, perhaps quite unbalanced, search tree.
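For reference, here is the objective function written out in symbols; this is just a transcription of the verbal definition above, where depth_T(i) denotes the depth of item i's node in T, with the root at depth zero:

$$
C(T) \;=\; \sum_{i=1}^{n} p_i \cdot \bigl(\operatorname{depth}_T(i) + 1\bigr)
$$

The problem, then, is to find a tree T that satisfies the search tree property with respect to the key order 1 < 2 < \cdots < n and minimizes C(T).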
I'm sure many of you will have noticed some similarities between this computational problem of optimal binary search trees and one that we already solved back in the greedy algorithms section, namely Huffman codes, which, amongst all prefix-free binary codes, minimize the average encoding length. So let's be precise about the similarities and differences between the two problems, and in particular about why we can't just reuse the algorithm we already saw, off the shelf, to solve the optimal BST problem.

What is, of course, super similar in the two cases is the format of the output. In both problems, the responsibility of the algorithm is to output a binary tree, and the goal is to minimize the average depth, more or less, where the average is with respect to provided frequencies over a bunch of objects that we care about: characters from an alphabet in the case of Huffman codes, and objects with keys from some totally ordered set in the binary search tree case. It is true that in the optimal BST case we're not really averaging depths, we're averaging depths plus one, but if you think about it, that's effectively the same thing: adding one to every depth just adds the constant sum of the p_i's to the objective, which doesn't change which tree is optimal.

More important is to understand the differences between the problem solved by Huffman codes and the computational problem we have to solve here. In the Huffman code case, we had to output a binary code, and the key constraint was that it be prefix-free. In the language of trees, what that meant is that the symbols we're encoding had to correspond to the leaves of the tree; symbols could not correspond to internal nodes of the tree that we output. In the optimal binary search tree problem, we do not have this prefix-free constraint, so every single node of our tree, whether it's a leaf or not, will be labeled with an object. But we have a different, seemingly quite a bit harder, constraint to deal with, namely the search tree property. Remember, back in the Huffman code case we didn't even have an ordering on the symbols of the alphabet; there wasn't a sense in which one of them was less than another, and it wouldn't even have made sense to talk about the search tree property in that context. Here, by contrast, we're given these keys, there's a total ordering on them, and we'd better satisfy the search tree property in the tree that we output. That is, at every single node of the tree that we output, it better be the case that all keys in the left subtree are less than the key at that node, and all keys in the right subtree are bigger than the key at that node. That's a constraint we have no choice but to satisfy. And this constraint is harder in the sense that no greedy algorithm, Huffman's algorithm or otherwise, solves the optimal binary search tree problem. Rather, we're going to have to turn to the more sophisticated tool of dynamic programming to design an efficient algorithm for computing optimal binary search trees. That's the solution we'll start developing in the next video.
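As a preview of where this is headed, and with the caveat that this is the standard textbook recurrence rather than the lecture's own derivation, which arrives in the next video: write C_{i,j} for the weighted search time of an optimal search tree on items i through j. An optimal tree has some root r, and the search tree property then forces items i through r-1 into its left subtree and items r+1 through j into its right subtree. Hanging a subtree below the root increases the depth of each of its items by one, which adds each of their frequencies to the objective once more, so

$$
C_{i,j} \;=\; \min_{r=i}^{j} \left( \sum_{k=i}^{j} p_k \;+\; C_{i,r-1} \;+\; C_{r+1,j} \right),
$$

with the convention that C_{i,j} = 0 whenever i > j.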