In this video we'll establish the correctness of Huffman's algorithm, meaning that the greedy algorithm always computes the prefix-free binary code that minimizes the average encoding length. Let me also remind you of our expression for the average encoding length in terms of a tree capital T; we'll be working with this expression quite a bit in this proof. We average over the symbols, so we sum over the symbols i in the alphabet, capital Sigma. We weight the various terms by the frequencies, which were given as part of the input, and remember that the encoding length we're producing for a given symbol i is exactly the same as the depth of the corresponding leaf in the tree, capital T. I quite like the proof of Huffman's theorem. It's a cool proof, and it will give us an opportunity to revisit the themes that we've been studying in proving the correctness of various greedy algorithms. At a high level, we're going to proceed by induction, induction on the size n of the alphabet Sigma. So there's a vague parallel you could draw with how we proved, say, Dijkstra's algorithm correct back in Part One, where by induction we showed that each successive iteration of the algorithm is correct assuming that the previous ones were. But in the inductive step we're going to use an exchange argument. So to justify each of our mergers, we're going to argue that any optimal solution can be massaged into one that looks more like ours without making it any worse, and that's how we argue that each of our individual mergers is in fact a reasonable idea, a good idea. So, more precisely, we're going to go by induction on the size n of the alphabet. I'm going to assume, to make the problem non-trivial, that the alphabet size is at least two. As in any proof by induction, we begin with a base case and then we do the inductive step, wherein we get to assume the inductive hypothesis. The base case is when we just have a two-letter alphabet. If you go back to Huffman's algorithm, you'll see that it does the obvious thing in the base case: it just outputs a tree that encodes one of the symbols with just the bit zero and the other symbol with just the bit one. And that's the best you can do. You need a bit to encode every symbol, and that tree uses exactly one bit for each symbol, so Huffman's algorithm is optimal in that trivial special case. For the inductive step, we focus on an arbitrary problem instance where the alphabet size is at least three. Now of course, whenever you're doing a proof by induction, the thing you've really got going for you is the inductive hypothesis. You assume that the assertion you're trying to prove, in this case the correctness of Huffman's algorithm, is true for all smaller values of n. That is, we're going to assume that if we invoke the algorithm on any smaller input, as of course we do in this recursive algorithm, the algorithm returns the correct solution to that smaller subproblem. To understand how we're going to pass from the inductive hypothesis, the assumption that we're correct on all smaller inputs, to the inductive step, the assertion that we're correct on the current input, we need to take a closer look at how the original input with its alphabet Sigma relates to the smaller subproblem with the smaller alphabet Sigma prime, with the two letters fused into one, which is the one we solve correctly by assumption recursively. So, recall our notation from the pseudocode for Huffman's algorithm.
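Just to keep that expression in front of us, here it is written out, using p_i for the frequency of symbol i and depth_T(i) for the depth of i's leaf, notation consistent with the discussion above:

\[
L(T) \;=\; \sum_{i \in \Sigma} p_i \cdot \mathrm{depth}_T(i)
\]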
What the algorithm does is take the two symbols with the smallest frequencies, let's call those two symbols A and B, and it replaces those two symbols with a single symbol AB, a sort of meta-symbol representing the presence of either A or B. We also discussed in the quiz how the sensible frequency for this meta-symbol AB is the sum of the frequencies of A and B. In the last video, when we were developing our intuition for what a greedy algorithm doing successive bottom-up mergers might look like, we noticed that when you merge two symbols A and B, what you are really doing is committing to outputting a tree at the end of the day, at the end of your algorithm, in which the symbols A and B appear as siblings, in which they have exactly the same parent. Therefore, there is an exact one-to-one correspondence between, on the one hand, trees whose leaves are labeled with the symbols in Sigma prime, so there is no leaf labeled A, there is no leaf labeled B, there is instead one labeled AB, and, on the other hand, trees for the original alphabet Sigma in which the symbols A and B are siblings, in which they have exactly the same parent. Given a tree as shown in the left picture, that is, given a tree T prime with the leaves labeled according to Sigma prime, you can, and this is in fact exactly what we do in Huffman's algorithm, split the leaf with the meta-symbol AB, make it an internal node, and give it two leaves with labels A and B. That produces a tree of the form on the right. Conversely, given a tree of the form on the right, that is, its leaves are labeled according to Sigma and it just so happens that A and B show up as siblings in that tree, you can produce a tree for Sigma prime just by contracting A and B together, so sucking A and B up into their parent and labeling the parent AB. So you can go back and forth between these two types of trees: one, arbitrary trees for Sigma prime, and two, trees of a special kind for Sigma, trees in which A and B happen to be siblings. This subset of trees for Sigma is important enough to be given its own notation. So let me denote by capital X sub AB the trees with leaves labeled according to Sigma in which, it just so happens, A and B appear as siblings. That'll be some trees for Sigma, but not all of them, only the ones in which A and B are siblings. So this correspondence between solutions to the smaller subproblem and solutions of a particular form for the original problem has an important property, namely, it preserves the objective function value, it preserves the average encoding length. Okay, that's not quite true, but it's almost true, and it's good enough for our purposes: it preserves the average encoding length up to a fixed constant. So let me show you the computation which demonstrates that point. Consider any pair of matched trees, T prime and T. By matched I just mean T prime is any old tree whose leaves are labeled according to Sigma prime, and T is the tree you would obtain if you split the leaf with meta-label AB in the usual way: you replace it with an internal node and give it children with labels A and B. That's going to be the corresponding tree T. So take any such matched pair and let's look at the difference between their average encoding lengths. Now remember, the average encoding length of a tree is just a sum over the symbols of the relevant alphabet.
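Just to make the back-and-forth concrete, here is a minimal sketch in Python of the two directions of this correspondence. This is not part of the lecture's pseudocode; the Node class and the names split_leaf and contract_siblings are purely illustrative.

```python
class Node:
    """A node of a binary code tree; leaves carry a symbol label."""
    def __init__(self, label=None, left=None, right=None):
        self.label = label        # e.g. 'A', 'B', or the meta-symbol 'AB'
        self.left = left
        self.right = right

    def is_leaf(self):
        return self.left is None and self.right is None


def split_leaf(tree, meta, a, b):
    """Turn the leaf labeled `meta` (e.g. 'AB') into an internal node with
    two children labeled `a` and `b` (tree for Sigma' -> tree for Sigma)."""
    if tree.is_leaf():
        if tree.label == meta:
            tree.label = None
            tree.left, tree.right = Node(a), Node(b)
        return
    split_leaf(tree.left, meta, a, b)
    split_leaf(tree.right, meta, a, b)


def contract_siblings(tree, a, b, meta):
    """Merge sibling leaves labeled `a` and `b` into a single leaf labeled
    `meta` (tree for Sigma with A, B as siblings -> tree for Sigma')."""
    if tree is None or tree.is_leaf():
        return
    if (tree.left.is_leaf() and tree.right.is_leaf()
            and {tree.left.label, tree.right.label} == {a, b}):
        tree.label, tree.left, tree.right = meta, None, None
        return
    contract_siblings(tree.left, a, b, meta)
    contract_siblings(tree.right, a, b, meta)
```

Splitting is the operation Huffman's algorithm performs when it builds a tree for Sigma from the recursively computed tree for Sigma prime; contracting goes the other way.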
And what we've got going for us here is that Sigma and Sigma prime are almost exactly the same, right? The only difference is that Sigma prime has the meta-symbol AB, whereas Sigma has the individual symbols A and B. Furthermore, the two trees T and T prime are almost exactly the same, the only difference being that T prime has this leaf with the meta-label AB, whereas T has two corresponding nodes one level down with the labels A and B. So when we take the difference of these two sums, everything cancels out, except that the tree T contributes two summands, one for the leaf with label A and one for the leaf with label B, and the tree T prime contributes, with a minus sign, one summand, the one corresponding to the leaf with label AB. So when the dust clears and we cancel out all the terms, what we're left with is the term for the leaf A, which is the frequency of A times the depth of the leaf A in the tree T, a similar term for the leaf with label B in the tree T, and then, with a minus sign, the frequency of the meta-symbol AB times its depth in the tree T prime. But we are certainly not done with our simplifications. There is a strong relationship between the frequencies that we're staring at, and a strong relationship between these various depths. Let's begin with the frequencies. How did we define the frequency of the meta-symbol AB? Well, remember our quiz: it made sense to define it as the sum of the frequencies of A and B. What about the depths? Well, you know, this symbol AB is at whatever depth it is in the tree T prime, say it's at depth ten, something like that. Remember, T is obtained from T prime simply by splitting this leaf AB and giving it two new children with symbols A and B. So if the meta-symbol AB was at depth ten in T prime, then the leaves A and B are going to be at depth eleven in T. The depth is simply one more than whatever it was previously in the tree T prime. So these relationships are going to result in a second wave of cancellations. Just to make that crystal clear, let's call the depth of AB in T prime d; so d is what I was calling ten earlier. Then A and B are both at depth d plus one in the tree T. So the first term becomes the frequency of A times the depth d plus one, the second term becomes the frequency of B times the depth d plus one, and the third term becomes the sum of the frequencies of A and B times the depth d. And when the dust settles, yet again, we're left with a constant: p sub A plus p sub B, the sum of the two frequencies. One thing I want you to really understand is that the difference between these two average encoding lengths is just some constant; it does not depend on which trees we started with. So if we choose a perfectly balanced tree and we take this difference, we get some constant, like point one. If we choose some totally different pair of trees that are really lopsided and we take this difference, we still get point one, exactly the same thing. So this fulfills the promise I gave you earlier: not only do we have this natural correspondence between, on the one hand, trees labeled according to Sigma prime and, on the other hand, trees of a certain type labeled according to Sigma, namely those in which A and B appear as siblings, but the correspondence also preserves the average encoding length. Okay, it doesn't quite preserve the average encoding length, but it preserves it up to a universal constant, and that's going to be good enough for our purposes, as we'll see in a sec.
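Written out, the computation just described is the following, where d denotes the depth of the AB leaf in T prime and the notation matches the expression for L(T) above:

\[
\begin{aligned}
L(T) - L(T')
&= p_A \cdot \mathrm{depth}_T(A) + p_B \cdot \mathrm{depth}_T(B) - p_{AB} \cdot \mathrm{depth}_{T'}(AB) \\
&= p_A (d+1) + p_B (d+1) - (p_A + p_B)\, d \\
&= p_A + p_B .
\end{aligned}
\]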