So, now that we understand that the 
optimal solution to the sequence element 
problem has to be only one of three 
candidates, we're going to be easily able 
to formulate a recurrence, identify the 
relevant sub-problems and derive an 
efficient dynamic programming algorithm 
for the problem. 
So, here is the culmination of our work. 
On the previous video, we thought about 
an optimal alignment of some pair of 
strings X and Y, and we notice that there 
are three cases for the contents of the 
final position. 
Either there's no gaps or there's a gap 
on top or there's a gap on the bottom. 
in case one, where there's no gaps, XM 
and YM get matched. 
And we proved that the induced alignment 
which is of the smaller strings X prime 
and Y prime has to be optimal in its own 
right. 
In the second case where the character 
little X sub M gets matched with a gap 
induced alignment this time of X prime 
and Y. 
Has to be optimal in its own right, and 
the third case where little y sub n gets 
matched with the gap, the induced 
alignment now of x and y prime. 
Must be optimal. 
So one way to think about this kind of 
assertion is it says that the optimal 
solution to a problem, to a sequence of a 
lot of problem depends only on the 
solutions to three different smaller 
sub-problems, one involving x, x prime 
and y prime with characters peeled off of 
both of the strings. 
One involving x prime and y and one 
involving x and y prime. 
But in all of the cases, all that we care 
about are sub-problems in which a single 
character was peeled off from the right 
from one or both of the strings that we 
started with. 
The situation is very similar to in our 
previous two examples. 
We have independent sets on line graphs 
and the nap sack problem and the 
independent set problem whenever we, we 
only cared about sub problems obtained by 
plucking off either one or two vertices 
from the given line graph. 
So all we ever cared about were prefixes 
of the original line graph. 
In the nap sack problem we got sub 
problems by plucking off the last item 
and perhaps also reducing the nap sack 
capacity by some interval amount. 
So there were two dimensions in the nap 
sack problem for which sub problems could 
decrease in size, then number of items in 
the residual nap sack capacity. 
So we use two parameters to keep track of 
the sub problems. 
And what we cared about were all possible 
prefixes of the items and all possible 
residual integral capacities, at most the 
original knapsack capacity. 
So what's up in the sequence alignment 
problem? 
Well here, sub problems get smaller by 
plucking a character off of the first 
string and or the second string. 
So again there are two ways in which the 
sub problem can get smaller, either the 
first string or the second string. 
So we'll again use two different 
parameters, one to figure out how much 
we've plucked off of the first string, 
the second one to figure out how much 
we've plucked off of the second string. 
Right. 
But all we care about. 
The only relevant sub problems involved. 
Prefixes of the two original input 
strings X and Y. 
That is, the only sub problems that we 
care about have the form x I y j, where x 
I denotes the first I letters of capital 
x, some prefix of x, and y j denotes some 
prefix of y, the first j letters of y. 
So lets now move from the sub-problems 
we're going to use in our dynamic 
programming algorithm to the recurrence 
that we're going to use. 
And the recurrence really all it does is 
compile our understanding of the optimal 
solution and how it depends on the 
solution of the smaller sub-problems into 
an easy to use mathematical formula. 
So I'll use the notation P sub i j for 
the value of the optimal solution to the 
corresponding sub problem, the one 
involving the prefix X i and the prefix Y 
j So for a given set of positive values 
for i and j, what is Pij? 
Well, there are three possibilities. 
Case one is where the final position of 
the optimal alignment doesn't have any 
gaps, so it matches the final character 
of X sub i, that is little x sub i and 
the final character of the prefix capital 
Y sub j, that is the character little y 
sub j. 
It matches them together and just reuses 
an optimal alignment for the smaller 
strings, Xi-1 and Yi-1 Case two is where 
the last letter of the first string, that 
is little x of i gets matched with a gap. 
And that case the penalty of the 
corresponding alignment is the penalty of 
a gap plus whatever the optimal alignment 
is of the first i minus one letter of 
capital X plus the first J letter of 
Capital Y. 
The symmetrically case three we pay for a 
gap and then we pay whatever the optimal 
alignment is of all of the first I 
letters of capital X with the first j 
menace one letters of Y. 
This is the case where the last letter of 
the second string gets matched with the 
gap in the final position of the optimal 
alignment. 
So we know that the optimal solution has 
to look like one of these three things, 
we don't know which, so in the recurrence 
we'll just in effect do brute force 
search among the three outcomes. 
We just remember, we just choose the 
minimum of the three possibilities. 
So that's the formal recurrence. 
It's correctness really just follows 
immediately from the work we already did, 
understand what the optimal solution has 
to look like. 
So, before we state the algorithm where 
we systematically solve all of the sub 
problems using this magical formula. 
Let's just make sure we get the edge 
cases, the base cases where i or j equals 
zero correctly sorted out. 
So specifically what is the value of p 
I,0 and also it turns out p of zero, i 
where i here is just some non-negative 
integer. 
Alright. 
So the answer to this question is the 
second one, is B. 
And I hope if you could keep the notation 
straight then the answer was fairly 
clear. 
So let's remember what, what does PIJ 
mean? 
That's the total penalty of an optimal 
alignment between the first i letters of 
X, and the first j letters of Y. 
So consider Pi zero. 
So now we're asking about aligning the 
first zero letters of X with the first 
zero letters of Y. 
That is with the empty string. 
Well the optimal way to match any string 
to the empty string is you're just going 
to insert gaps into the empty string to 
equalize their lengths. 
And so if your string has length i, you 
need to insert i gaps. 
What's the? 
Penalty of that alignment is just i times 
the penalty for a single gap, and that's 
the answer here in B. 
So we're ready now to give the algorithm, 
and as with all these dynamic programming 
algorithms once you know the sub-problems 
and once you know the recurrence that 
relates their solutions there's really 
nothing to do. 
All you do is systematically answer solve 
all of the sub-problems moving from 
smallest to largest. 
So we're going to use an array A to keep 
track of the solutions of all of these 
sub-problems. 
A is going to have two dimensions. 
The reason for two dimensions is we have 
two independent parameters which are 
keeping track of the sub-problem size. 
One for how many letters of X we're 
dealing with, and the second dimension 
for how many letters of Y that we're 
dealing with. 
That's analogous to the knapsack problem, 
where we also had two dimensions to keep 
track of. 
The number of items in play, and the 
residual knapsack capacity. 
We just figured out what the base case 
is, so we just solved those in a 
pre-processing step. 
So if one of the two indices is zero, 
then the optimal solution value is just 
the gap penalty times the non-zero index. 
And, now we just go to our double four 
loops. 
It's double four loops because we have 
two indices into out array. 
And whenever we get into a sub problem, 
we just evaluate the recurrence invoking 
of these solution to the already computed 
smaller sub problems. 
So one sanity check you should always 
apply when you're writing out the code 
for a dynamic programming algorithm: when 
you look at the right hand side of your 
recurrence, when you look at these 
purportedly already solved subproblems 
whose solutions you're using to solve the 
current subproblem, make sure you have 
actually already solved those 
subproblems. 
So in this case we're good to go because 
the indices of the subproblems are only 
less than the entry that we're filling in 
right now. 
So indeed all three of the relevant 
subproblems, A-I - 1 j - 1 A-I - 1 j, and 
A-I j - 1 they've already been computed 
in earlier iterations of our double four 
loop. 
So they're just hanging out, waiting to 
get looked up in constant time. 
And as usual once you've actually figured 
out the key ingredients for the dynamic 
programming solution, namely the 
sub-problems and the recurrence, it's 
pretty much self evident why the things 
going to work and it's also self evident 
exactly what its running time is going to 
be. 
So why is the algorithm correct? 
That is, why does it terminate with every 
entry Aij equal to the true optimal 
penalty Pij of the corresponding 
sub-problem. 
Well, this just follows because our 
recurrent is correct, that's where all 
the hard work was, and then we're just 
systematically solving all of the 
sub-problems. 
So, formally, if you like, it would be 
proof by induction. 
So, the running time is completely 
trivial to evaluate. 
In each iteration of this double four 
loop, we do a constant amount of work. 
We just need to look up three things in 
constant time and make a couple of 
comparisons. 
How many four loops are there? 
Well, M iterations of the outer four 
loop, N iterations of the inner four 
loop. 
So we suffer the product, M times N. 
That is, the running time is proportional 
to the product of the lengths of the two 
strings. 
So depending on the application, you may 
be content to just have an algorithm 
compute for you the nw score, the total 
penalty of an optimal alignment or 
perhaps you're actually interested in the 
alignment itself. 
And just as we discussed with independent 
sets of line graphs, by tracing back 
through the filled in table, you can 
indeed reconstruct an optimal solution. 
So let me just give you the high level 
idea of how it works. 
It's going to to follow the same template 
and all you think through the details of 
why this really works in the privacy of 
your own home. 
So assume that you've already run the 
algorithm on the previous slide. 
That you've filled in all the entries of 
this two d array capital A. 
Now we want to trace back. 
So where are we going to start tracing 
back this filled in table? 
Well, we are going to start with a 
problem that we actually care about, 
namely the largest problem A of M comma 
N, that's what we want the alignment for. 
Now we know the softball alignment looks, 
has one of three candidates, we know 
there's three possible situations for the 
contents of the final position of that 
alignment. 
More over, when we filled in this entry 
of the table, we explicitly compared the 
three possibilities to figure out which 
one was the best. 
So you know, perhaps on the forward pass 
we actually cached the result of the 
comparison, or in the worst case we can 
just go back and re-compute, and figure 
out which of those three. 
Pieces was used to fill in this entry, 
and depending on which one of the three 
candidates won, that tells us, what 
should be, the contents of the final 
position of the optimal alignment. 
If case one was used to fill in this 
entry, we should match, little x sub n 
and little y sub n. 
If case two was used to fill in this 
entry, we should match little x sub n 
with the gap. 
If case three was used, to fill in this 
entry, we should match little y sub n 
with the gap. 
If there was a tie, we get to pick any of 
them. 
Arbitrarily, all of them will lead to 
optimal alignments. 
Then of course, after figuring out what 
to do in this final position. 
We have an induced sub problem involving 
x prime and or y prime. 
That tells us a, a previous entry of the 
table to go to. 
And we just repeat the process. 
We, again, figure out which of the three 
cases was used to fill in this entry. 
That tells us how to fill in the next 
right most position of the alignment. 
And we just keep going until we fall off 
the table. 
So what do you do when you fall off the 
table? 
Well, once one of the indices I or J gets 
all the way down to zero, now you have no 
choice. 
So now one of the strings is empty and 
the other one has some number of symbol. 
So you should just insert the appropriate 
number of gaps to equalize the lengths. 
One thing that's pretty neat is that this 
trace back procedure is efficient. 
In fact, it's way more efficient, in 
general, than the forward pass. 
For the forward pass, you have to fill in 
every single one of the m * n entries. 
But in this trace back procedure, each 
time you track back. 
One of the two indices, at least, will 
get decremented. 
So that says, you're going to complete 
this trace back in O of m + n time with 
an optimal alignment of the original two 
strings.