1
00:00:03,940 --> 00:00:07,906
In this video we'll establish the 
correctness of Huffman's algorithm. 

2
00:00:07,906 --> 00:00:12,160
Meaning that, that greedy algorithm 
always computes the prefix-free binary 

3
00:00:12,160 --> 00:00:14,920
code that minimizes the average encoding 
length. 

4
00:00:16,600 --> 00:00:21,930
Let me also remind you of our expression 
for the average encoding length in terms 

5
00:00:21,930 --> 00:00:25,420
of a tree capital T. 
We'll be working with this expression 

6
00:00:25,420 --> 00:00:29,281
quite a bit in this proof. 
So we average over the symbols so we sum 

7
00:00:29,281 --> 00:00:32,221
over the symbols I and the alpha bit, 
capital sigma. 

8
00:00:32,221 --> 00:00:36,544
We weight the various terms by the 
frequencies which were given as part of 

9
00:00:36,544 --> 00:00:41,212
the input and remember that the encoding 
length that we're producing for a given 

10
00:00:41,212 --> 00:00:45,823
symbol I is exactly the same as the depth 
of the corresponding leaf in the tree, 

11
00:00:45,823 --> 00:00:48,324
capital T. 
So I quite like the proof of Huffman's 

12
00:00:48,324 --> 00:00:50,594
Theorem. 
It's a cool proof and we will, it will 

13
00:00:50,594 --> 00:00:54,196
give us an opportunity to revisit the 
themes that we've been studying in 

14
00:00:54,196 --> 00:00:56,812
proving the correctness of various greedy 
algorithms. 

15
00:00:56,812 --> 00:01:00,562
At a high level, we're going to proceed 
by induction, induction on the size N of 

16
00:01:00,562 --> 00:01:03,424
the alpha bit sigma. 
So there's a vague parallel you could 

17
00:01:03,424 --> 00:01:07,372
draw with how we proved, say Dijkstra's 
Algorithm correct back in part one, where 

18
00:01:07,372 --> 00:01:11,320
by induction we show that each successive 
iteration of the algorithm is correct 

19
00:01:11,320 --> 00:01:15,021
assuming that the previous ones were. 
But in the inductive step we're going to 

20
00:01:15,021 --> 00:01:18,228
use an exchange argument. 
So to justify each of our mergers we're 

21
00:01:18,228 --> 00:01:20,400
going to argue that any optimal solution 
could. 

22
00:01:20,400 --> 00:01:24,727
You massage into one that looks more like 
ours without making it any worse, and 

23
00:01:24,727 --> 00:01:28,506
that's how we argue that each of our 
individual mergers is in fact a 

24
00:01:28,506 --> 00:01:32,134
reasonable idea, a good idea. 
So more precisely we're going to go by 

25
00:01:32,134 --> 00:01:34,390
induction on the size-end of the 
alphabet. 

26
00:01:34,390 --> 00:01:38,902
I'm going to assume to make the problem 
non-trivial that the alphabet size is at 

27
00:01:38,902 --> 00:01:41,819
least two. 
So as in any proof by induction we begin 

28
00:01:41,819 --> 00:01:45,471
with a base case and then we do the 
inductive step where in we can assume the 

29
00:01:45,471 --> 00:01:48,280
inducted hypothesis. 
So the base case is when we just have a 

30
00:01:48,280 --> 00:01:50,948
two letter alphabet. 
If you go back to Huffmans algorithm 

31
00:01:50,948 --> 00:01:54,787
you'll see that it does the obvious thing 
in the base case or just outputs a tree 

32
00:01:54,787 --> 00:01:58,580
that encodes one of the symbols with the 
letter zero, and one of the symbols with 

33
00:01:58,580 --> 00:02:00,827
just the bit one. 
And that's the best you can do. 

34
00:02:00,827 --> 00:02:04,619
You need a bit to encode every symbol in 
that tree uses exactly one bit for each 

35
00:02:04,619 --> 00:02:07,850
symbol, so Huffman's algorithm is optimal 
in that trivial special case. 

36
00:02:07,850 --> 00:02:11,289
So for the inductive set, 
we focus on an arbitrary problem instance 

37
00:02:11,289 --> 00:02:13,446
where the alphabet size is at least, 
three. 

38
00:02:13,446 --> 00:02:17,245
Now of course, whenever you're doing a 
proof by induction the thing you've 

39
00:02:17,245 --> 00:02:19,966
really got going for you is the inductive 
hypothesis. 

40
00:02:19,966 --> 00:02:22,943
You assume, that the assertion that 
you're trying to prove. 

41
00:02:22,943 --> 00:02:25,408
In this case, correctness of Huffman's 
algorithm. 

42
00:02:25,408 --> 00:02:29,207
Is true for all smaller values of N. 
That is, we're going to assume that if we 

43
00:02:29,207 --> 00:02:31,363
invoke the algorithm on any smaller 
input. 

44
00:02:31,363 --> 00:02:33,776
As of course we do, in this recursive 
algorithm. 

45
00:02:33,776 --> 00:02:37,524
We get to assume that the algorithm 
returns the correct solution to that 

46
00:02:37,524 --> 00:02:41,820
smaller subproblem. 
So to understand how we're going to pass 

47
00:02:41,820 --> 00:02:45,885
from the inductive hypothesis. 
So this assumption we're correct on all 

48
00:02:45,885 --> 00:02:50,060
smaller inputs to the inductive step. 
The assertion that we're correct on the 

49
00:02:50,060 --> 00:02:53,041
current input, 
we need to take a closer look at how the 

50
00:02:53,041 --> 00:02:57,378
original input with it's alphabet sigma 
relates to the smallest of problem with 

51
00:02:57,378 --> 00:03:01,118
the smaller alphabet sigma prime with the 
two letters fused into one. 

52
00:03:01,118 --> 00:03:04,480
Which is the one we solve correctly by 
assumption recursively. 

53
00:03:07,520 --> 00:03:12,100
So, recall our notation from the pseudo 
code for Huffman's algarithm. 

54
00:03:12,100 --> 00:03:16,393
What the algorithm does is take the two 
symbols with the smallest frequencies, 

55
00:03:16,393 --> 00:03:20,906
and let's call those two symbols A and B, 
and it replaces those two symbols with a 

56
00:03:20,906 --> 00:03:25,309
single symbol AB, a sort of meta-symbol 
representing the presence of either A or 

57
00:03:25,309 --> 00:03:27,632
B. 
We also in the quiz discussed how the 

58
00:03:27,632 --> 00:03:32,348
sensible frequency for this medi-symbol 
AB is the sum of the frequencies A and B. 

59
00:03:32,348 --> 00:03:36,947
Last video when we're developing our 
intuition for what a greedy algorithm for 

60
00:03:36,947 --> 00:03:41,546
successive bottom of mergers might look 
like, we notice that when you merge two 

61
00:03:41,546 --> 00:03:46,262
symbols A and B which are really doing is 
committing to outputting a tree at the 

62
00:03:46,262 --> 00:03:50,512
end of the day, at the end of your 
algorithm in which the symbols A and B 

63
00:03:50,512 --> 00:03:54,122
appear as siblings in which they have 
exactly the same parent. 

64
00:03:54,122 --> 00:03:58,197
Therefore, there is an exact 
correspondence 101 correspondence between 

65
00:03:58,197 --> 00:04:01,645
on the one hand true. 
Trees whose leaves are labeled with the 

66
00:04:01,645 --> 00:04:05,985
symbols in sigma prime, so that is there 
is no leaf labeled A there is no leaf 

67
00:04:05,985 --> 00:04:08,322
labeled B there is instead one labeled 
AB. 

68
00:04:08,322 --> 00:04:12,773
So there's a correspondence between those 
kinds of trees, labels with the leaves 

69
00:04:12,773 --> 00:04:17,935
corresponding to sigma prime and. 
Trees for the original alphabet sigma in 

70
00:04:17,935 --> 00:04:23,811
which the symbols A and B are siblings, 
in which they have exactly the same 

71
00:04:23,811 --> 00:04:27,618
parent. 
Given a tree as shown in the left 

72
00:04:27,618 --> 00:04:31,996
picture, that is given a tree for T 
prime, the leaves labeled according to 

73
00:04:31,996 --> 00:04:36,313
sigma prime, you can, and this is in fact 
exactly what we do in Huffman's 

74
00:04:36,313 --> 00:04:40,990
Algorithm, split the leaf with the meta 
symbol AB, make it an internal node, and 

75
00:04:40,990 --> 00:04:45,908
give it two leaves with labels A and B. 
That produces a tree of the form on the 

76
00:04:45,908 --> 00:04:48,606
right. 
Conversely, given a tree of the form on 

77
00:04:48,606 --> 00:04:53,103
the right, that is its leaves are labeled 
according to sigma, and it just so 

78
00:04:53,103 --> 00:04:57,841
happens that A and B show up as leaves of 
that tree, you can produce a tree for 

79
00:04:57,841 --> 00:05:00,535
sigma prime just by contract. 
A and B together. 

80
00:05:00,535 --> 00:05:04,472
So, sucking up A and B into their parent, 
and labeling the parent, AB. 

81
00:05:04,472 --> 00:05:07,958
So you can go back and forth between 
these two types of trees. 

82
00:05:07,958 --> 00:05:12,456
One, arbitrary trees for sigma prime, and 
trees of a special kind for sigma, trees 

83
00:05:12,456 --> 00:05:16,903
in which A and B happen to be siblings. 
This subset of trees for sigma are 

84
00:05:16,903 --> 00:05:19,864
important enough to be given its own 
notation. 

85
00:05:19,864 --> 00:05:24,885
So, let me denotes by capital X, sub AB. 
The trees with leaves labeled according 

86
00:05:24,885 --> 00:05:29,327
to sigma in which, it just so happens, 
that A and B appear as siblings. 

87
00:05:29,327 --> 00:05:32,867
So, that'll be some trees for sigma, but 
not all of them. 

88
00:05:32,867 --> 00:05:35,700
Only the ones in which A and B are 
siblings. 

89
00:05:35,700 --> 00:05:40,377
So this correspondence between solutions 
to the smallest sub-problem and solutions 

90
00:05:40,377 --> 00:05:44,772
of a particular form for the original 
problem has a important property, namely 

91
00:05:44,772 --> 00:05:47,139
it preserves the objective function 
value. 

92
00:05:47,139 --> 00:05:51,478
It preserves the average encoding length. 
Okay, that's not quite true but it's 

93
00:05:51,478 --> 00:05:54,070
almost true. 
It's good enough for our purposes. 

94
00:05:54,070 --> 00:05:57,677
It preserves the average encoding length 
up to a fixed constant. 

95
00:05:57,677 --> 00:06:01,340
So let me show you the computation which 
demonstrates that point. 

96
00:06:01,340 --> 00:06:04,192
So consider any pair of matched trees, t 
prime and t. 

97
00:06:04,192 --> 00:06:08,525
And by matched I just mean t prime is any 
old tree who's leaves are, leaves are 

98
00:06:08,525 --> 00:06:12,694
labelled according to sigma prime. 
And t is the tree you would obtain if you 

99
00:06:12,694 --> 00:06:15,491
split the leaf with meta label ab in the 
usual way. 

100
00:06:15,491 --> 00:06:19,331
You replace it with an internal node and 
children with labels a and b. 

101
00:06:19,331 --> 00:06:21,745
So that's going to be the corresponding 
tree T. 

102
00:06:21,745 --> 00:06:25,968
So take any such matched pair and let's 
look at the difference between their 

103
00:06:25,968 --> 00:06:30,128
average encoding lengths. 
Now remember the average encoding length 

104
00:06:30,128 --> 00:06:33,779
of a tree is just a sum over the symbols 
of the relevant alphabet. 

105
00:06:33,779 --> 00:06:38,370
And what we got going for us here is that 
sigma and sigma prime are almost exactly 

106
00:06:38,370 --> 00:06:39,200
the same. 
Right? 

107
00:06:39,200 --> 00:06:43,791
So the only difference is sigma prime has 
the meta symbol AB, whereas sigma has the 

108
00:06:43,791 --> 00:06:47,497
individual symbols A and B. 
Furthermore the two trees T and T prime 

109
00:06:47,497 --> 00:06:51,702
are almost exactly the same, the only 
difference being that T prime has this 

110
00:06:51,702 --> 00:06:56,072
leaf with a meta label AB, whereas T has 
two corresponding nodes one level down 

111
00:06:56,072 --> 00:06:59,612
with the labels A and B. 
So when we take the difference of these 

112
00:06:59,612 --> 00:07:04,071
two sums, everything cancels out except. 
The, the tree T contributes two sum ans, 

113
00:07:04,071 --> 00:07:08,496
one for the leaf with label A, one for 
the leaf with label B, and the tree T 

114
00:07:08,496 --> 00:07:13,456
prime contributes with a minus sign. 
One sum and that corresponding to the 

115
00:07:13,456 --> 00:07:18,083
leaf width label AB. 
So when the dust clears and we cancel out 

116
00:07:18,083 --> 00:07:24,422
all the terms, what we're left with is. 
The term for the leaf A, and with the 

117
00:07:24,422 --> 00:07:29,114
frequency of A times the depth of the 
leaf A in the tree T. 

118
00:07:29,114 --> 00:07:33,567
A similar term for the leaf would label B 
in the tree T. 

119
00:07:33,567 --> 00:07:40,088
And then with a minus sign the frequency 
of the meda label AB times it's depth in 

120
00:07:40,088 --> 00:07:43,929
the tree T prime. 
But we are certainly not done with our 

121
00:07:43,929 --> 00:07:47,016
simplifications. 
There is a strong relationship between 

122
00:07:47,016 --> 00:07:51,395
the frequencies that we're staring at, 
and a strong relationship between these 

123
00:07:51,395 --> 00:07:54,090
various depths. 
Let's begin with the frequencies. 

124
00:07:54,090 --> 00:07:57,121
How did we define the frequency of the 
meta symbol AB? 

125
00:07:57,121 --> 00:08:00,714
Well remember our quiz. 
It made sense to divide it as the sum of 

126
00:08:00,714 --> 00:08:03,522
the frequencies of A and B. 
What about the depths? 

127
00:08:03,522 --> 00:08:07,987
Well you know, this symbol AB is at 
whatever depth it is in the tree T prime, 

128
00:08:07,987 --> 00:08:10,455
say it's at depth ten, something like 
that. 

129
00:08:10,455 --> 00:08:14,743
Remember T is obtained from T prime 
simply by splitting this leaf AB and 

130
00:08:14,743 --> 00:08:17,563
giving it two new children with symbols A 
and B. 

131
00:08:17,563 --> 00:08:22,322
So if the meta symbol AB was at depth ten 
in T prime, then the leaves A and B are 

132
00:08:22,322 --> 00:08:26,258
going to be at depth eleven in T. 
So the depth is simply one more than 

133
00:08:26,258 --> 00:08:29,020
whatever it was previously in the tree T 
prime. 

134
00:08:29,020 --> 00:08:34,276
So, these relationships are going to 
result in a second wave of cancellations. 

135
00:08:34,276 --> 00:08:39,532
So, just to make that crystal clear, 
let's calls the depth of AB in T prime, 

136
00:08:39,532 --> 00:08:42,444
D. 
So, D is what I was calling ten earlier. 

137
00:08:42,444 --> 00:08:46,280
So, A and B are both at depth D plus one 
in the tree T. 

138
00:08:46,280 --> 00:08:52,007
So the first term becomes the frequency 
of A times the depth D plus one. 

139
00:08:52,007 --> 00:08:57,496
The second term becomes the frequency of 
B plus the depth D plus one. 

140
00:08:57,496 --> 00:09:03,860
And then the third term becomes the sum 
of the frequencies of A and B times the 

141
00:09:03,860 --> 00:09:07,360
depth D. 
And, when the dust settles, yet again, 

142
00:09:07,360 --> 00:09:12,850
we're left with a constant PA plus PB. 
So, the sums of two frequencies. 

143
00:09:12,850 --> 00:09:17,273
And one thing I want you to really 
understand is that the difference between 

144
00:09:17,273 --> 00:09:20,432
these two average encoding links is just 
some constant. 

145
00:09:20,432 --> 00:09:23,305
It does not depend on which trees we 
started with. 

146
00:09:23,305 --> 00:09:27,900
So if we choose a perfectly balanced tree 
and we do this difference, we get some 

147
00:09:27,900 --> 00:09:31,175
constant like.1. 
If we choose some totally different pair 

148
00:09:31,175 --> 00:09:35,598
of trees that are really lopsided and we 
take this difference, we still get 1.. 

149
00:09:35,598 --> 00:09:39,932
Exactly the same thing. 
So this fulfills the promise I gave you 

150
00:09:39,932 --> 00:09:42,316
earlier. 
That not only do we have this natural 

151
00:09:42,316 --> 00:09:46,462
correspondence between on the one hand 
trees labeled with sigma prime and trees 

152
00:09:46,462 --> 00:09:48,846
of a certain type labelled according to 
sigma. 

153
00:09:48,846 --> 00:09:51,385
Mainly those in which A and B appear as 
siblings. 

154
00:09:51,385 --> 00:09:54,443
But the correspondence preserves 
averaging encoding length. 

155
00:09:54,443 --> 00:09:56,775
Okay? 
It doesn't quite preserve the averaging 

156
00:09:56,775 --> 00:10:00,040
encoding length but it preserves it up to 
a universal constant. 

157
00:10:00,040 --> 00:10:03,720
And that's going to be good enough for 
our purposes as we'll see in a sec.