In this video we'll establish the correctness of Huffman's algorithm, meaning that the greedy algorithm always computes the prefix-free binary code that minimizes the average encoding length. Let me also remind you of our expression for the average encoding length in terms of a tree capital T; we'll be working with this expression quite a bit in this proof. We average over the symbols, so we sum over the symbols i in the alphabet, capital Sigma. We weight the various terms by the frequencies, which were given as part of the input, and remember that the encoding length we're producing for a given symbol i is exactly the same as the depth of the corresponding leaf in the tree, capital T. I quite like the proof of Huffman's theorem. It's a cool proof, and it will give us an opportunity to revisit the themes that we've been studying in proving the correctness of various greedy algorithms. At a high level, we're going to proceed by induction, induction on the size n of the alphabet Sigma. So there's a vague parallel you could draw with how we proved, say, Dijkstra's algorithm correct back in Part One, where by induction we showed that each successive iteration of the algorithm is correct assuming that the previous ones were. But in the inductive step we're going to use an exchange argument. So to justify each of our mergers, we're going to argue that any optimal solution can be massaged into one that looks more like ours without making it any worse, and that's how we argue that each of our individual mergers is in fact a reasonable idea, a good idea. So, more precisely, we're going to go by induction on the size n of the alphabet. I'm going to assume, to make the problem non-trivial, that the alphabet size is at least two. As in any proof by induction, we begin with a base case and then we do the inductive step, wherein we get to assume the inductive hypothesis. The base case is when we just have a two-letter alphabet. If you go back to Huffman's algorithm, you'll see that it does the obvious thing in the base case: it just outputs a tree that encodes one of the symbols with just the bit zero and the other symbol with just the bit one. And that's the best you can do. You need a bit to encode every symbol, and that tree uses exactly one bit for each symbol, so Huffman's algorithm is optimal in that trivial special case. For the inductive step, we focus on an arbitrary problem instance where the alphabet size is at least three. Now of course, whenever you're doing a proof by induction, the thing you've really got going for you is the inductive hypothesis. You assume that the assertion you're trying to prove, in this case the correctness of Huffman's algorithm, is true for all smaller values of n. That is, we're going to assume that if we invoke the algorithm on any smaller input, as of course we do in this recursive algorithm, the algorithm returns the correct solution to that smaller subproblem. To understand how we're going to pass from the inductive hypothesis, the assumption that we're correct on all smaller inputs, to the inductive step, the assertion that we're correct on the current input, we need to take a closer look at how the original input with its alphabet Sigma relates to the smaller subproblem with the smaller alphabet Sigma prime, with the two letters fused into one, which is the one we solve correctly by assumption recursively. So, recall our notation from the pseudocode for Huffman's algorithm.
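Just to keep that expression in front of us, here it is written out, using p_i for the frequency of symbol i and depth_T(i) for the depth of i's leaf, notation consistent with the discussion above:

\[
L(T) \;=\; \sum_{i \in \Sigma} p_i \cdot \mathrm{depth}_T(i)
\]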
What the algorithm does is take the two symbols with the smallest frequencies, let's call those two symbols A and B, and it replaces those two symbols with a single symbol AB, a sort of meta-symbol representing the presence of either A or B. We also discussed in the quiz how the sensible frequency for this meta-symbol AB is the sum of the frequencies of A and B. In the last video, when we were developing our intuition for what a greedy algorithm doing successive bottom-up mergers might look like, we noticed that when you merge two symbols A and B, what you are really doing is committing to outputting a tree at the end of the day, at the end of your algorithm, in which the symbols A and B appear as siblings, in which they have exactly the same parent. Therefore, there is an exact one-to-one correspondence between, on the one hand, trees whose leaves are labeled with the symbols in Sigma prime, so there is no leaf labeled A, there is no leaf labeled B, there is instead one labeled AB, and, on the other hand, trees for the original alphabet Sigma in which the symbols A and B are siblings, in which they have exactly the same parent. Given a tree as shown in the left picture, that is, given a tree T prime with the leaves labeled according to Sigma prime, you can, and this is in fact exactly what we do in Huffman's algorithm, split the leaf with the meta-symbol AB, make it an internal node, and give it two leaves with labels A and B. That produces a tree of the form on the right. Conversely, given a tree of the form on the right, that is, its leaves are labeled according to Sigma and it just so happens that A and B show up as siblings in that tree, you can produce a tree for Sigma prime just by contracting A and B together, so sucking A and B up into their parent and labeling the parent AB. So you can go back and forth between these two types of trees: one, arbitrary trees for Sigma prime, and two, trees of a special kind for Sigma, trees in which A and B happen to be siblings. This subset of trees for Sigma is important enough to be given its own notation. So let me denote by capital X sub AB the trees with leaves labeled according to Sigma in which, it just so happens, A and B appear as siblings. That'll be some trees for Sigma, but not all of them, only the ones in which A and B are siblings. So this correspondence between solutions to the smaller subproblem and solutions of a particular form for the original problem has an important property, namely, it preserves the objective function value, it preserves the average encoding length. Okay, that's not quite true, but it's almost true, and it's good enough for our purposes: it preserves the average encoding length up to a fixed constant. So let me show you the computation which demonstrates that point. Consider any pair of matched trees, T prime and T. By matched I just mean T prime is any old tree whose leaves are labeled according to Sigma prime, and T is the tree you would obtain if you split the leaf with meta-label AB in the usual way: you replace it with an internal node and give it children with labels A and B. That's going to be the corresponding tree T. So take any such matched pair and let's look at the difference between their average encoding lengths. Now remember, the average encoding length of a tree is just a sum over the symbols of the relevant alphabet.
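Just to make the back-and-forth concrete, here is a minimal sketch in Python of the two directions of this correspondence. This is not part of the lecture's pseudocode; the Node class and the names split_leaf and contract_siblings are purely illustrative.

```python
class Node:
    """A node of a binary code tree; leaves carry a symbol label."""
    def __init__(self, label=None, left=None, right=None):
        self.label = label        # e.g. 'A', 'B', or the meta-symbol 'AB'
        self.left = left
        self.right = right

    def is_leaf(self):
        return self.left is None and self.right is None


def split_leaf(tree, meta, a, b):
    """Turn the leaf labeled `meta` (e.g. 'AB') into an internal node with
    two children labeled `a` and `b` (tree for Sigma' -> tree for Sigma)."""
    if tree.is_leaf():
        if tree.label == meta:
            tree.label = None
            tree.left, tree.right = Node(a), Node(b)
        return
    split_leaf(tree.left, meta, a, b)
    split_leaf(tree.right, meta, a, b)


def contract_siblings(tree, a, b, meta):
    """Merge sibling leaves labeled `a` and `b` into a single leaf labeled
    `meta` (tree for Sigma with A, B as siblings -> tree for Sigma')."""
    if tree is None or tree.is_leaf():
        return
    if (tree.left.is_leaf() and tree.right.is_leaf()
            and {tree.left.label, tree.right.label} == {a, b}):
        tree.label, tree.left, tree.right = meta, None, None
        return
    contract_siblings(tree.left, a, b, meta)
    contract_siblings(tree.right, a, b, meta)
```

Splitting is the operation Huffman's algorithm performs when it builds a tree for Sigma from the recursively computed tree for Sigma prime; contracting goes the other way.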
And what we've got going for us here is that Sigma and Sigma prime are almost exactly the same, right? The only difference is that Sigma prime has the meta-symbol AB, whereas Sigma has the individual symbols A and B. Furthermore, the two trees T and T prime are almost exactly the same, the only difference being that T prime has this leaf with the meta-label AB, whereas T has two corresponding nodes one level down with the labels A and B. So when we take the difference of these two sums, everything cancels out, except that the tree T contributes two summands, one for the leaf with label A and one for the leaf with label B, and the tree T prime contributes, with a minus sign, one summand, the one corresponding to the leaf with label AB. So when the dust clears and we cancel out all the terms, what we're left with is the term for the leaf A, which is the frequency of A times the depth of the leaf A in the tree T, a similar term for the leaf with label B in the tree T, and then, with a minus sign, the frequency of the meta-symbol AB times its depth in the tree T prime. But we are certainly not done with our simplifications. There is a strong relationship between the frequencies that we're staring at, and a strong relationship between these various depths. Let's begin with the frequencies. How did we define the frequency of the meta-symbol AB? Well, remember our quiz: it made sense to define it as the sum of the frequencies of A and B. What about the depths? Well, you know, this symbol AB is at whatever depth it is in the tree T prime, say it's at depth ten, something like that. Remember, T is obtained from T prime simply by splitting this leaf AB and giving it two new children with symbols A and B. So if the meta-symbol AB was at depth ten in T prime, then the leaves A and B are going to be at depth eleven in T. The depth is simply one more than whatever it was previously in the tree T prime. So these relationships are going to result in a second wave of cancellations. Just to make that crystal clear, let's call the depth of AB in T prime d; so d is what I was calling ten earlier. Then A and B are both at depth d plus one in the tree T. So the first term becomes the frequency of A times the depth d plus one, the second term becomes the frequency of B times the depth d plus one, and the third term becomes the sum of the frequencies of A and B times the depth d. And when the dust settles, yet again, we're left with a constant: p sub A plus p sub B, the sum of the two frequencies. One thing I want you to really understand is that the difference between these two average encoding lengths is just some constant; it does not depend on which trees we started with. So if we choose a perfectly balanced tree and we take this difference, we get some constant, like point one. If we choose some totally different pair of trees that are really lopsided and we take this difference, we still get point one, exactly the same thing. So this fulfills the promise I gave you earlier: not only do we have this natural correspondence between, on the one hand, trees labeled according to Sigma prime and, on the other hand, trees of a certain type labeled according to Sigma, namely those in which A and B appear as siblings, but the correspondence also preserves the average encoding length. Okay, it doesn't quite preserve the average encoding length, but it preserves it up to a universal constant, and that's going to be good enough for our purposes, as we'll see in a sec.
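Written out, the computation just described is the following, where d denotes the depth of the AB leaf in T prime and the notation matches the expression for L(T) above:

\[
\begin{aligned}
L(T) - L(T')
&= p_A \cdot \mathrm{depth}_T(A) + p_B \cdot \mathrm{depth}_T(B) - p_{AB} \cdot \mathrm{depth}_{T'}(AB) \\
&= p_A (d+1) + p_B (d+1) - (p_A + p_B)\, d \\
&= p_A + p_B .
\end{aligned}
\]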