So now that we have our magic formula, our recurrence, all that's left to do is to systematically solve the subproblems. As usual, it is crucial that we solve the subproblems in the right order, from smallest to largest. How should we measure the size of a subproblem in the optimal binary search tree problem? The natural way to do it is by the number of items in the subproblem. So if you're starting at item i and going through item j, the number of items in that subproblem is j - i + 1, and that's going to be our measure of subproblem size.

So let's bust out our 2-D array A. The number of dimensions of this array is two because we have two different degrees of freedom for indexing subproblems: one for the start of the contiguous interval and one for the end. The outer for loop is going to control the subproblem size; it ensures that we solve all smaller subproblems before proceeding to larger ones. Specifically, we'll use an index s, and in the iteration of the outer for loop where the current value is s, we only consider subproblems with s + 1 items. So you should think of s as representing the difference between the larger index j and the smaller index i. The inner for loop controls the first item i of the contiguous interval we're looking at.

Now all we have to do is rewrite the recurrence in terms of the array entries, with this change of variable where s corresponds to j - i. That is, for a given subproblem starting with item i and ending with item i + s, we just pick the best root by brute force, where the candidate roots r range over i, i + 1, ..., i + s. Regardless of the choice of root, we pick up the constant sum of the p_k's, where k ranges from the first item i to the last item i + s. And then we also look up the previously computed optimal solution values for the two relevant subproblems: one starting at i and ending at r - 1, the other starting at r + 1 and ending at i + s. In symbols:

A[i, i + s] = (sum of p_k for k = i, ..., i + s) + min over r in {i, ..., i + s} of ( A[i, r - 1] + A[r + 1, i + s] ).

A couple of quick comments about the two array lookups on the right-hand side of this formula. First, if you choose the root to be the first item i, the first lookup doesn't make sense, and if you choose it to be the last item i + s, the second lookup doesn't make sense. In those cases, it's understood that we just interpret the lookup as zero. Of course, in an actual implementation you'd have to include code handling that, but I'll let you take care of that on your own.

The second comment is just our usual sanity check, and you should always do this when you write out a dynamic programming algorithm: when you write down the formula that populates the array entries, make sure that every array lookup on the right-hand side is indeed already computed and available for constant-time lookup. In this case, whatever our choice of root is, the two relevant subproblems involve strictly fewer items than what we started with, and therefore the two subproblem lookups on the right-hand side will indeed have been computed in some previous iteration of the outer for loop. Remember, the outer for loop ensures we solve subproblems from the smallest number of items up to the largest. And of course, after the two for loops complete, what we really care about is the answer in A[1, n], that is, the value of the optimal binary search tree for all of the items; that's the eventual output.
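Here's a minimal sketch of this double for loop in Python. The function name, the 0-indexed input list p (so item i of the lecture is p[i - 1]), and the zero-padding convention are my own choices, not from the lecture:

```python
def opt_bst_value(p):
    """p[i-1] is the search probability of item i; returns the optimal BST cost."""
    n = len(p)
    # A[i][j] = optimal cost for items i..j (1-indexed). The padding rows and
    # columns stay 0, implementing the "interpret as zero" convention for the
    # out-of-range lookups A[i][i-1] and A[j+1][j].
    A = [[0.0] * (n + 2) for _ in range(n + 2)]
    for s in range(n):                  # s = j - i, so the subproblem has s + 1 items
        for i in range(1, n - s + 1):   # i = first item of the interval
            j = i + s
            weight = sum(p[k - 1] for k in range(i, j + 1))  # sum of p_k, k = i..j
            # Brute-force search over every candidate root r in {i, ..., j}.
            A[i][j] = weight + min(A[i][r - 1] + A[r + 1][j]
                                   for r in range(i, j + 1))
    return A[1][n]
```

For instance, on p = [0.8, 0.1, 0.1] this returns 1.3, the cost of the tree with item 1 at the root (0.8*1 + 0.1*2 + 0.1*3).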
Some students like to think about these double for loops pictorially. So let's imagine the 2-D array A laid out as a grid, with the x-axis corresponding to the index i, the first item in the set of items we're looking at, and the y-axis corresponding to j, the last item in the current set. And let me single out the diagonal of this grid: these are the subproblems with i = j, that is, subproblems with a single item. Now, we only ever solve subproblems where j is at least as large as i, so that means we're really only filling in the upper-left, or northwestern, part of this table. We never bother to fill in the southeastern, bottom-right part of the table; we just think of it all as zero.

Now, in the first outer iteration, when s = 0, our dynamic programming algorithm solves, in turn, each of the n single-item subproblems. So in the first iteration of this double for loop, it solves the subproblem A[1, 1]; in the next iteration of the inner for loop, it proceeds to A[2, 2], then A[3, 3], and so on. In each of these, both of the array lookups just correspond to zero, and we fill in this diagonal with the base cases, where A[i, i] is just p_i, the probability of item i. Then, as the dynamic programming algorithm proceeds, we fill in the upper-left portion of the table diagonal by diagonal. Each time we increment s, the index of the outer for loop, we march up to the next diagonal to the northwest, and as we step through the possible values of i, we fill in that diagonal one entry at a time, moving from southwest to northeast. When we fill in the value of a subproblem on one of these diagonals, all we need to do is look up the values of two subproblems on lower diagonals, and lower diagonals correspond to subproblems with strictly fewer items.

So that's it. That's the dynamic programming algorithm that computes the value of an optimal binary search tree, given a set of items with probabilities. I'm not going to say anything about correctness; it's the same story as we've seen in the past. All the heavy lifting was in proving the optimal substructure lemma, which gave us the correctness of our recurrence. Given that our magic formula is correct and we're just applying it systematically, the correctness of the dynamic programming algorithm follows in a straightforward way, just by induction.

Let me, however, make some comments about the running time. Let's just follow the usual procedure: look at how many subproblems have to get solved, and then how much work has to get done to solve each of those subproblems. As far as the number of subproblems, it's all possible choices of i and j where i is at most j, or in other words, essentially half of that n-by-n grid. So this is roughly n^2 / 2; let's just call it Theta(n^2), a quadratic number of subproblems. Now, for each of the subproblems, we have to evaluate this recurrence, we have to evaluate the formula, which conceptually is a brute-force search through the candidates that we've identified. And a distinction between this dynamic programming algorithm and all of the other ones we've seen recently, sequence alignment, knapsack, computing maximum-weight independent sets in path graphs, is that there are actually quite a lot of options for what the optimal solution can be. That is, our brute-force search, for the first time, is not over a merely constant number of possibilities.
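To spell out the arithmetic behind the cubic bound that the next paragraph arrives at: diagonal s contains n - s subproblems, and each one tries s + 1 candidate roots at constant work apiece, so the total work over both loops is

$$\sum_{s=0}^{n-1} (n - s)\,\Theta(s + 1) \;=\; \Theta\!\left(\frac{n(n+1)(n+2)}{6}\right) \;=\; \Theta(n^3).$$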
We have to try every possible root: each of the items in the given subproblem is a candidate root, and we try them all. So given a start item i and an end item j, there are j - i + 1 items in total, and we have to do constant work for each of those choices. There will be some subproblems that we can evaluate quickly, in only constant time, when i and j are very close to each other; but for a constant fraction of the subproblems, this is going to be linear time, Theta(n). So overall, that gives us a cubic running time, Theta(n^3).

Alright, I would say this running time is sort of okay, not great. It is polynomial time, and that's good. It's certainly way, way, way faster than enumerating all of the exponentially many possible binary search trees, so it blows away brute-force search. But it's not something I would call blazingly fast, or a for-free primitive, or anything like that. You're going to be able to solve problem sizes with n in the hundreds, but probably not n in the thousands. So that will cover some applications where you'd want to use this optimal binary search tree algorithm, but not all of them. It's good for some things, but it's not a universal solution.

On the other hand, here's a fun fact: you can actually speed up this dynamic programming algorithm significantly. You keep the exact same 2-D array with the exact same semantics; again, each entry corresponds to the value of the optimal binary search tree for the items between i and j, inclusive. But you can actually fill in this entire table, all Theta(n^2) entries, using only O(n^2) time in total, that is, on average, constant work per subproblem. This fun fact is very clever; it's more intricate than what we've discussed in this video, but it's not impossible to read. So if you're interested, I encourage you to go back to the original papers, or search the web for other resources on this optimized, sped-up version of the dynamic programming algorithm.

At a very high level, from 30,000 feet, the goal is to avoid doing this brute-force search over all possible roots in every single subproblem. And it turns out there's structure, nice structure, in the optimal binary search tree problem that allows you to piggyback on the work done in smaller subproblems. In the smaller subproblems, you've already searched over a bunch of candidate roots, and it turns out that, using the results of those previous brute-force searches, you can make inferences about which subset of the current candidate roots could conceivably be the one that determines the recurrence. And so that lets you avoid searching over all of the possible candidates for the root, focusing instead on a very small set, in fact, on average, a constant number of candidate roots per subproblem. And needless to say, this speedup of the running time from cubic to quadratic really significantly increases the problem sizes that you can apply this algorithm to. So now, instead of being stuck in the hundreds, you'd certainly be able to solve problem sizes in the thousands, possibly even the tens of thousands, using this quadratic-time algorithm. Very cool.
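For the curious, here's a sketch of that quadratic-time speedup, usually credited to Knuth's 1971 paper on optimum binary search trees. It relies on a monotonicity fact that I won't prove here: an optimal root for items i through j always lies between an optimal root for items i through j - 1 and an optimal root for items i + 1 through j. Storing the chosen roots and restricting each search to that window makes the work on each diagonal telescope to O(n), hence O(n^2) overall. The function and array names below are my own:

```python
def opt_bst_value_fast(p):
    """Quadratic-time variant using the monotonicity of optimal roots."""
    n = len(p)
    A = [[0.0] * (n + 2) for _ in range(n + 2)]
    root = [[0] * (n + 2) for _ in range(n + 2)]   # root[i][j] = chosen root for items i..j
    prefix = [0.0] * (n + 1)                       # prefix[j] = p_1 + ... + p_j
    for j in range(1, n + 1):
        prefix[j] = prefix[j - 1] + p[j - 1]
    for i in range(1, n + 1):                      # base cases: single-item subproblems
        A[i][i] = p[i - 1]
        root[i][i] = i
    for s in range(1, n):                          # larger subproblems, smallest first
        for i in range(1, n - s + 1):
            j = i + s
            weight = prefix[j] - prefix[i - 1]
            best_cost, best_root = float("inf"), i
            # Only roots in [root[i][j-1], root[i+1][j]] can be optimal.
            for r in range(root[i][j - 1], root[i + 1][j] + 1):
                cost = A[i][r - 1] + A[r + 1][j]
                if cost < best_cost:
                    best_cost, best_root = cost, r
            A[i][j] = weight + best_cost
            root[i][j] = best_root
    return A[1][n]
```

On small inputs it's easy to spot-check this against the cubic version above; for example, both return 1.3 on p = [0.8, 0.1, 0.1].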