So in this video we'll finally discuss Huffman's algorithm, which is a greedy algorithm that constructs the prefix-free binary code minimizing the average encoding length. Let me just briefly remind you of the formal statement of the computational problem that I left you with last time. The input is just a frequency for each symbol i coming from some alphabet sigma. The responsibility of the algorithm is to get optimal compression, that is, to compute an optimal code. The code has to be binary: we can use only zeros and ones. It has to be prefix-free, meaning that for the encodings of any two characters, neither one can be a prefix of the other; that's to facilitate unambiguous decoding. And finally, the average number of bits needed to encode a character, where the average is with respect to the input frequencies, should be as small as possible.

Now remember, these kinds of codes correspond to binary trees. The prefix-free condition just says that the symbols of sigma should be in one-to-one correspondence with the leaves of the tree. And finally, remember that the encoding lengths of the various symbols just correspond to the depths of the corresponding leaves. So we can formally express the average encoding length, which, given a legal tree capital T, I denote using the notation capital L of T. So what do we do?
We just take the average over the symbols of the alphabet, weighted by the provided frequencies, of the number of bits used to encode each symbol, equivalently, the depth of the corresponding leaf in the tree T. So we want the tree that makes this number as small as possible.

This task is a little bit different than any we've seen so far in this course, right? All we're given as input is an array of numbers, and yet we have to produce an actual full-blown tree. So how can we take a bunch of numbers and produce from them, in a sensible, principled way, some kind of tree whose leaves correspond to symbols of the alphabet? Let's spend a slide just thinking, at a high level, about where this tree would come from, how you would build it up given this unstructured input. There's certainly no unique answer to this question, and one idea, which is very natural but turns out to be suboptimal, is to build a tree using a top-down approach, which you can also think of as an instantiation of the divide-and-conquer algorithm design paradigm. The divide-and-conquer paradigm, you'll recall, involves breaking the given problem into, usually, multiple smaller subproblems. These are solved recursively, and the solutions are then combined into one for the original problem.
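To pin down the objective, here is a minimal Python sketch of the average encoding length: the frequency-weighted average of leaf depths. The function name is mine, not from the lecture, and I'm plugging in the lopsided four-symbol tree and the frequencies used as the running example.

```python
# A minimal sketch of the objective function L(T): the average,
# weighted by symbol frequencies, of the leaf depths in the tree.
# The function name is illustrative, not from the lecture.

def average_encoding_length(freqs, depths):
    """L(T) = sum over symbols i of p_i * depth(i), normalized."""
    total = sum(freqs.values())
    return sum(freqs[s] * depths[s] for s in freqs) / total

# The lopsided running-example tree: A at depth 1, B at depth 2,
# C and D at depth 3, with frequencies 60, 25, 10, 5 (percent).
freqs = {"A": 60, "B": 25, "C": 10, "D": 5}
depths = {"A": 1, "B": 2, "C": 3, "D": 3}
print(average_encoding_length(freqs, depths))  # 1.55
```

So for this tree and these frequencies, the average encoding length works out to 1.55 bits per symbol.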
Because trees, the desired output here, have a recursive substructure, it's natural to think about applying this paradigm to this problem. Specifically, you'd love to just lean on one recursive call to construct for you the left subtree, and another recursive call to construct the right subtree, and then you can fuse the results together under a common root vertex. Now, it's not clear how to do this partitioning of the symbols into two groups, but one idea, to get the most bang for your buck, the most information out of the first bit of your encoding, is to split the symbols into two groups each having as close as possible to 50% of the overall frequency. This top-down approach is sometimes called Shannon-Fano coding. Fano was Huffman's graduate adviser; Shannon is the Claude Shannon who invented information theory. But Huffman, in a term paper believe it or not, realized that top-down is not the way to go. The way to go is to build the tree from the bottom up. Not only are we going to get optimal codes, but we're going to get a blazingly fast greedy algorithm that constructs them.

So what do I mean by bottom up? I mean we're going to start with just a bunch of nodes, each one labeled with one symbol of the alphabet. So, in effect, we're starting with the leaves of our tree.
And then we're going to do successive mergers. At each step, we're going to take two of the subtrees produced so far and link them together as subtrees under a common internal node. I think you'll see what I mean in a picture. Imagine we want to build a tree whose leaves are supposed to be just A, B, C, and D. One merger would be: well, let's just take the leaves C and D and link them as siblings under a common ancestor. Now, in the second step, let's merge the leaf B with the subtree we got from the previous merge, the subtree comprising the nodes C, D, and their common ancestor. Now, of course, we have no choice but to merge the only two subtrees we have left, and that gives us a single tree, which is in fact exactly the same lopsided tree we were using in the previous video as a running example.

So let me explain what I hope is clear and what is maybe unclear at this juncture. I hope what's intuitively clear is that the bottom-up approach is a systematic way to build trees that have a prescribed set of leaves. So what do we want? We want trees whose leaves are labeled with the symbols of the alphabet sigma. So if we have an alphabet with n symbols, we're going to start with just the n leaves. What does a merger do? Well, there's two things.
First of all, it introduces a new internal node, an unlabeled node. And second, it takes two of our old subtrees and fuses them into one: we take two subtrees, make one the left child of this new internal node and the other the right child. So that drops the number of subtrees we're working with by one. So if we start with the n leaves and we do n minus one successive mergers, what happens? Well, on the one hand, we introduce n minus one new unlabeled internal nodes, and on the other hand, we construct a single tree, and the leaves of this tree are in one-to-one correspondence with the alphabet letters, as desired.

Now, what I don't expect you to have an intuition for is what should we be merging with what, and why. Forget about how we'd get an optimal tree at the end of the day. Even if we just wanted to design a greedy algorithm, if we just wanted to make a myopic decision that looks good right now, how would we even do that? What's the greedy criterion that's going to guide us to merge a particular pair of trees together? So we can reframe this quandary as the same kind of question we asked for minimum-cost spanning trees, and really, more generally, for greedy algorithms.
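As a sketch of this bookkeeping, here is a tiny Python mock-up; the Node class and helper names are my own, illustrative choices, not from the lecture. Starting from n = 4 leaves, n - 1 = 3 mergers leave a single tree, and with the merge order from the example the leaf depths match the lopsided running-example tree.

```python
# A sketch of the bottom-up merging mechanics; the Node class and
# helper names are illustrative, not from the lecture.

class Node:
    def __init__(self, symbol=None, left=None, right=None):
        self.symbol = symbol  # set only on leaves
        self.left = left
        self.right = right

def merge(t1, t2):
    """Fuse two subtrees under a fresh unlabeled internal node."""
    return Node(left=t1, right=t2)

def leaf_depths(node, d=0):
    """Map each leaf's symbol to its depth in the tree."""
    if node.symbol is not None:
        return {node.symbol: d}
    out = leaf_depths(node.left, d + 1)
    out.update(leaf_depths(node.right, d + 1))
    return out

# Start with n = 4 leaves; each merger drops the subtree count by
# one, so n - 1 = 3 mergers leave a single tree.
forest = [Node("A"), Node("B"), Node("C"), Node("D")]
while len(forest) > 1:
    b = forest.pop()            # merge the two rightmost subtrees,
    a = forest.pop()            # mimicking the C-D, then B-CD, then
    forest.append(merge(a, b))  # A-BCD order from the example
print(leaf_depths(forest[0]))  # {'A': 1, 'B': 2, 'C': 3, 'D': 3}
```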
When you're making irrevocable decisions, what strikes fear in your heart is that a decision will come back to haunt you later, that you'll only realize at the end of the algorithm that you made some horrible mistake early on. So just as for MSTs, where we asked when we can be sure that irrevocably including an edge in the tree we're building is safe and not a mistake, here we want to ask: we're going to do successive mergers, so how do we know that a merger is safe, that it doesn't prevent us from eventually computing an optimal solution? Well, here's a way to look at things that will at least give us an intuitive conjecture for this question; we'll save the proof for the next video.

So what are the ramifications when we merge two subtrees, each containing a collection of symbols? Well, when we merge two subtrees, we introduce a new internal node which unites these two subtrees under it, and if you think about it, in the final tree this is yet another internal node on the root-to-leaf path for all of the leaves in these two subtrees. That is, if you're a symbol and you're watching your subtree get merged with somebody else's, you're bummed out. You're like, man, that's another bit in my encoding.
That's yet one more node I have to pass through to get back to the root. I think this will become even clearer if we look at an example. So naturally, we'll use our four-symbol alphabet A, B, C, D. Initially, before we've merged anything, each of these is just its own leaf: A, B, C, D. There are no internal nodes above them; in this sense, everybody's encoding length at the beginning is zero bits. Now, imagine we've merged C and D. We introduce a new internal node, which is the common ancestor of these two leaves. And as a result, C and D are bummed out. They say, well, that's one bit we're going to have to incur in our encoding length; there's one new internal node we're always going to have to pass through en route back to the root of the eventual tree. Now suppose next we merge B with the subtree containing both C and D. Well, everybody but A is bummed out about this merger, because we introduce another internal node, and it contributes one bit to the encoding of each of B, C, and D. It contributes an extra one to the encodings of C and D, and it contributes a zero to the encoding of B. So all of their encodings, in some sense, get incremented, go up by one, compared to how things were before. Now, in the final iteration, we have to merge everything together.
We have no choice; there are only two subtrees left. So here, everybody's encoding length gets bumped up by one. A finally picks up a bit of zero to encode it, and B, C, and D each pick up an additional bit of one, which is prepended to their encodings thus far. In general, what you'll notice is that the final encoding length of a symbol is precisely the number of mergers that its subtree has to endure. Every time your subtree gets merged with somebody else's, you pick up another bit in your encoding, and that's because there's one more internal node that you're going to have to traverse en route to the root of the final tree. In this example, the symbols C and D got merged with somebody else in each of the three iterations, so they're the two symbols that wind up with an encoding length of three. The symbol B didn't get merged with anybody in the first iteration, only in the second two; that's why it has an encoding length of two.

So this is really helpful. This lets us relate the operations that our algorithm actually does, namely mergers, back to the objective function that we care about, namely the average encoding length: mergers increase the encoding lengths of the participating symbols by one.
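This observation is easy to replay in a few lines of Python (the variable names are mine): charge one bit to every symbol whose subtree participates in each of the example's three mergers, and the final tallies are exactly the encoding lengths.

```python
# Replaying the example's mergers: each entry lists the two groups
# of symbols whose subtrees get fused in that step.
mergers = [
    ({"C"}, {"D"}),
    ({"B"}, {"C", "D"}),
    ({"A"}, {"B", "C", "D"}),
]

# A symbol's final encoding length = number of mergers it endures.
bits = {s: 0 for s in "ABCD"}
for group1, group2 in mergers:
    for s in group1 | group2:
        bits[s] += 1  # one more internal node on s's root-to-leaf path

print(bits)  # {'A': 1, 'B': 2, 'C': 3, 'D': 3}
```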
So this allows us to segue into the design of a greedy heuristic for how to do mergers. Let's just think about the very first iteration. We have our n original symbols, and we have to pick two to merge. And remember, the consequence of a merge is going to be an increase of one bit in the encoding length of whichever two symbols we pick. Now, what we want to do is minimize the average encoding length with respect to the frequencies we were given. So which pair of symbols are we the least unhappy to see suffer an increment to their encoding length? It's going to be the two symbols that are the least frequent; that's going to increase the average encoding length by the least. So that's the greedy merging heuristic: something's got to increase, so pick the ones that are the least frequent to be the ones that get incremented. That seems like a really good idea of how to proceed in the first iteration. The next question is, how are we going to recurse? I'll let you think about that in the following quiz.

So let's agree that the first iteration of our greedy heuristic is going to merge together the two symbols that possess the lowest frequencies. Let's call those two symbols little A and little B. The question is then how do we make further progress? What do we do next?
Well, one thing that would be really nice is if we could somehow recurse on a smaller subproblem. Which smaller subproblem? Well, what does it mean that we've merged together the symbols A and B? If you think about it, in the tree that we finally construct, by virtue of merging A and B we're forcing the algorithm to output a tree in which A and B are siblings, in which they have exactly the same parent. So what does it mean for the encoding that we compute that A and B are going to be siblings with the same parent? It means that their encodings are going to be identical, save for the lowest-order bits. So A will get encoded with a bunch of bits followed by a zero, and B will be encoded with exactly the same prefix of bits followed by a one. Since they're going to have almost the same encoding, for our recursion let's just treat them as the same symbol. So let's introduce a new meta-symbol, call it AB, which represents the conjunction of A and B; it's meant to represent all of the occurrences of either one of them.

But remember, the input to the computational problem that we're studying is not just an alphabet, but also frequencies for each of the symbols of that alphabet. So here's my question for you about introducing this new meta-symbol AB.
What should be the frequency that we define for this meta-symbol? All right, so hopefully your intuition suggested answer C: for this recursion to make sense, for it to conform to our semantics of what this merging does, we should define the frequency of this new meta-symbol to be the sum of the frequencies of the two symbols that it's replacing. That's because, remember, this meta-symbol AB is meant to represent all occurrences of both A and B, so it should be the sum of their frequencies.

So I'm now ready to formally describe Huffman's greedy algorithm. Let me first describe it on an example, and then I think the general code will be self-evident. Let's just use our usual example: we're going to have letters A, B, C, D, with frequencies 60, 25, 10, and 5. We're going to use the bottom-up approach, so we begin with each symbol just as its own node: A, B, C, D. I'm annotating in red the frequencies 60, 25, 10, 5. The greedy heuristic says that initially we should merge together the two nodes that have the smallest frequencies. That would be C and D, with their frequencies of 10 and 5. Now, based on the idea of the last slide, when we merge C and D, we replace them with a meta-symbol CD whose frequency is the sum of the frequencies of C and D, namely 15.
Now we just run another iteration of the greedy algorithm, meaning we merge together the two nodes, the two symbols, that have the smallest frequencies. That is now B, with frequency 25, and CD, with frequency 15. So now we're down to just two symbols: the original symbol A, which still has frequency 60, and the, in some sense, meta-meta-symbol BCD, whose cumulative frequency is now 40. When we're down to two nodes, that's the point in Huffman's algorithm where we hit the base case and the recursion begins to unwind. If you just have two symbols to encode, there's pretty much only one sensible way to do it: one gets a zero, and the other gets a one. So at this point, the final recursive call just returns a tree with two leaves, corresponding to the symbols A and BCD.

And now, as the recursion unwinds, we in effect undo the mergers. For each merge, we do a corresponding split of the meta-node: we replace it with an internal node and two children corresponding to the symbols that were merged to make that meta-node. So, for example, when we want to undo the merge of B and CD, we take the node labeled BCD and we split it into two, the left child being B, the right child being CD.
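The iterative version of this process is naturally expressed with a binary heap keyed on frequency. Here's a sketch in Python; the function name is mine, and since ties and left/right orientation are arbitrary, only the codeword lengths, not the exact bit patterns, are determined.

```python
import heapq
import itertools

# A sketch of the greedy mergers with a min-heap keyed on frequency.
# The function name is illustrative; tie-breaking is arbitrary, so
# only the codeword lengths are meaningful.

def huffman_code_lengths(freqs):
    counter = itertools.count()  # tie-breaker so heap tuples compare
    heap = [(f, next(counter), {s: 0}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)  # two lowest-frequency subtrees
        f2, _, d2 = heapq.heappop(heap)
        # every symbol in either subtree picks up one more bit
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, next(counter), merged))
    return heap[0][2]

# The example: C and D merge first (10 + 5 = 15), then B and CD
# (25 + 15 = 40), and finally A and BCD.
print(huffman_code_lengths({"A": 60, "B": 25, "C": 10, "D": 5}))
```

On the example this yields length 1 for A, length 2 for B, and length 3 for each of C and D, matching the lopsided tree.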
So the original, outermost call, what it gets back from its recursive call is this tree with three nodes, and it has to undo its own merger: it merged C and D. So it takes the leaf labeled with the symbol CD and splits it into two; it replaces it with a new unlabeled internal node, with left child C and right child D.

So how does the algorithm work in general? Well, just as you'd expect given the discussion of this concrete example. I'm going to take as the base case when the alphabet has just two symbols; in that case, the only thing to do is encode one with a zero and the other with a one. Otherwise, we take the two symbols of the alphabet that have the smallest frequencies. You can break ties arbitrarily; it's not going to matter, it's going to be optimal either way. We then replace these two low-frequency symbols A and B with a meta-symbol AB, intended to represent both of them in some sense. As we just discussed, with those semantics we should define the frequency of the meta-symbol AB as the sum of the frequencies of the symbols that it comprises. We now have a well-defined, smaller subproblem; it has one fewer symbol than the one we were given, so we can recursively compute a solution for it. What the recursive call returns is a tree whose leaves are in one-to-one correspondence with sigma prime.
That is, it does not have a leaf labeled A, and it does not have a leaf labeled B; it does have a leaf labeled AB. So we want to extend this tree T prime to one whose leaves correspond to all of sigma, and the obvious way to do that is to split the leaf labeled AB, replacing it with a new unlabeled internal node with two children, which are labeled A and B. The resulting tree capital T, with leaves in correspondence with the original alphabet sigma, is then the final output of Huffman's algorithm. As always with a greedy algorithm, we may have intuition, and it may seem like a good idea, but we can't be sure without a rigorous argument. That is the subject of the next video.
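The recursive description above can be sketched as follows in Python. This is a hedged sketch under my own naming: ties are broken arbitrarily, so the exact bit patterns may differ from the lecture's figures, but the codeword lengths come out optimal.

```python
# A sketch of the recursive formulation: merge the two rarest
# symbols into a meta-symbol, recurse on the smaller alphabet, then
# split the meta-symbol's leaf back into two children on the way
# back up. Names are illustrative; meta-symbols are modeled as
# tuples of the two symbols they replace.

def huffman(freqs):
    """Return a dict mapping each symbol to its codeword (a bit string)."""
    if len(freqs) == 2:
        a, b = freqs  # base case: one symbol gets "0", the other "1"
        return {a: "0", b: "1"}
    # the two lowest-frequency symbols; ties broken arbitrarily
    a, b = sorted(freqs, key=freqs.get)[:2]
    rest = {s: f for s, f in freqs.items() if s not in (a, b)}
    rest[(a, b)] = freqs[a] + freqs[b]  # meta-symbol with summed frequency
    code = huffman(rest)                # recurse on the smaller alphabet
    prefix = code.pop((a, b))           # unwind: split the meta-symbol's leaf
    code[a] = prefix + "0"
    code[b] = prefix + "1"
    return code

codes = huffman({"A": 60, "B": 25, "C": 10, "D": 5})
# A ends up with a 1-bit codeword, B with 2 bits, C and D with 3 each,
# and the resulting code is prefix-free.
```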