1
00:00:03,360 --> 00:00:09,259
Next, we'll look at Huffman encoding. this
is a classic algorithm that is still

2
00:00:09,465 --> 00:00:15,583
widely-used and effectively used today.
it's definitely on the list of algorithms

3
00:00:15,583 --> 00:00:21,805
that everyone should know cuz it's quite
ingenious. so we start out with the idea

4
00:00:21,805 --> 00:00:27,763
of variable length codes. so far, we've
talked about codes where every character

5
00:00:27,763 --> 00:00:33,010
is represented with the same number of
bits. But there's no reason to do that.

6
00:00:33,010 --> 00:00:39,279
For example, look at Morse code. the idea
in a way that we assign dots and dashes to

7
00:00:39,279 --> 00:00:44,798
letters is that frequently, you'll use
letters should have smaller number of

8
00:00:44,798 --> 00:00:51,134
characters. so for example, an E is only
one dot and so forth. And letters that are

9
00:00:51,134 --> 00:00:58,142
used less frequently, like Q are going to
be longer. More dashes and more

10
00:00:58,142 --> 00:01:06,242
characters. So with this we can, it's an
idea, a compression idea, of let's use

11
00:01:06,242 --> 00:01:12,993
fewer characters for frequent fewer bits
or fewer code characters for frequently

12
00:01:12,993 --> 00:01:19,789
used characters. now, there's an issue
with this and that is that it can be

13
00:01:19,789 --> 00:01:26,860
ambiguous. So, this message everybody
knows, looks like SOS, three dots followed

14
00:01:26,860 --> 00:01:34,147
by three dashes, followed by three dots.
but actually it could be it's any one of

15
00:01:34,147 --> 00:01:41,184
these others. Like V is dot, dot, dot,
dash. Thats good. and then, seven is dash,

16
00:01:41,184 --> 00:01:47,195
dash, dot, dot, dot, so it could be V7. Or
it could be either of these two. There's

17
00:01:47,195 --> 00:01:53,084
lots of different things that this thing
could be. So, Morse code is actually, it's

18
00:01:53,084 --> 00:01:59,324
not just dots and dashes it actually has
they have a little space between the code

19
00:01:59,324 --> 00:02:07,037
words to avoid this problem that there's
an ambiguity caused by the fact that one

20
00:02:07,037 --> 00:02:12,366
code word can be prefix of another and
there's no way to tell that it's not

21
00:02:12,366 --> 00:02:19,002
without separating the code words. so now
that's the way it's solved in Morse code

22
00:02:19,226 --> 00:02:25,046
and that's not necessarily the most
efficient way to solve it, and that's what

23
00:02:25,046 --> 00:02:30,791
Huffman codes addresses. Also, a problem
with Morse code is don't be trying to

24
00:02:30,791 --> 00:02:36,835
transmit a lot of numbers with Morse code,
because the codes for those are pretty

25
00:02:36,835 --> 00:02:42,580
long and, and you'll have much longer
representations for codes that involve a

26
00:02:42,580 --> 00:02:47,619
lot of messages and involve a lot of
numbers. But there's a really elegant way

27
00:02:47,619 --> 00:02:52,909
to avoid ambiguity. And that is to just
adopt a rule that no code word is going to

28
00:02:52,909 --> 00:02:57,877
be a prefix of another one. so fixed
length codes do that. If every code is the

29
00:02:57,877 --> 00:03:03,167
same number of bits and every character
encode different bits, then no code word's

30
00:03:03,167 --> 00:03:08,443
a prefix of another. They're just all
different. another thing you can do is

31
00:03:08,443 --> 00:03:14,161
have a, a special stop character like,
like the space in Morse Code to end to

32
00:03:14,161 --> 00:03:19,735
each code word. That's really what it's
doing. so the code words are all

33
00:03:19,735 --> 00:03:25,235
different. And they end with a character
that's special. And so, none can be a

34
00:03:25,235 --> 00:03:31,605
prefix of another one. but there's also
just an easy way to just make sure that

35
00:03:31,605 --> 00:03:37,830
you use a code that has this prefix free
property. so, for example, this code here

36
00:03:38,100 --> 00:03:45,411
no code word is a prefix of another. And
it means when you have bits there's a

37
00:03:45,411 --> 00:03:50,928
unique way to decode. So here, the
compressed bitstream starts with zero, the

38
00:03:50,928 --> 00:03:55,606
only way that can happen is if the first
letter is A. Then, when you're done with

39
00:03:55,606 --> 00:04:02,021
the A you've got four ones and that means
it's got to be a B. and then, you get rid

40
00:04:02,021 --> 00:04:07,427
of the B, and so forth. And it's three
ones and a zero, it's got to be an R, and

41
00:04:07,427 --> 00:04:13,914
so forth. Since no code word is a prefix
of another you can just read off the code

42
00:04:13,914 --> 00:04:20,422
from the bit string without having any
special stop character or any efficiency,

43
00:04:20,422 --> 00:04:26,519
inefficiency caused by that. all we have
is the bits and we are able to uniquely

44
00:04:26,519 --> 00:04:31,710
decode it. And there is numerous
prefix-free codes. For example here's

45
00:04:31,710 --> 00:04:38,735
another prefix-free code that for the same
five characters and it actually uses fewer

46
00:04:38,735 --> 00:04:45,570
bits. So, the idea of Huffman encoding
within these rules where we must have a

47
00:04:45,570 --> 00:04:52,189
prefix-free, free code. And we've got a
particular message. what's amazing about

48
00:04:52,189 --> 00:04:59,166
Huffman encoding is, that it finds the
code that uses the fewest bits for that

49
00:04:59,166 --> 00:05:05,492
message. and that's why, you know, it's so
effective. So the interesting thing one

50
00:05:05,492 --> 00:05:11,494
interesting thing about prefix-free code
is that they're really easy to represent

51
00:05:11,494 --> 00:05:17,691
with trie. and, and the idea is very
simple. We put characters in the leaves

52
00:05:17,691 --> 00:05:23,808
and we imagine that every, this is binary
trie, so we imagine that there's only two

53
00:05:23,808 --> 00:05:30,487
links per node. And we imagine that every
left link is labeled zero and every right

54
00:05:30,487 --> 00:05:36,523
link is labeled one. and then, just
following the path from the root to a

55
00:05:36,523 --> 00:05:42,307
leaf, so say we go right, so it's one,
zero, zero, that's D. Those bits are not

56
00:05:42,307 --> 00:05:48,095
stored in the trie, we just keep, we just
implicitly keep track of them. if we go

57
00:05:48,095 --> 00:05:54,169
left we keep zero, if we go right we keep
one. And so a try for prefix-free code

58
00:05:54,383 --> 00:06:01,993
uniquely determines the code. and that
means that just working with the trie we

59
00:06:01,993 --> 00:06:07,041
can decompress a bit string just by
starting at the root, reading the bits

60
00:06:07,238 --> 00:06:12,548
take them where they take us, and when we
get to a leaf, we've got a character. so

61
00:06:12,548 --> 00:06:17,400
that's the trie corresponding to that
code, that's the trie corresponding to

62
00:06:17,400 --> 00:06:22,645
that code. this one starts out with one,
one, so starting at the root, we go one,

63
00:06:22,645 --> 00:06:28,283
one, and the first letter is A. And then,
the next letters are zero, zero, and so we

64
00:06:28,283 --> 00:06:33,200
go zero, zero, and we come to B, and so
forth. a simple representation of a

65
00:06:33,200 --> 00:06:38,681
prefix-free code that is that makes for
easy decoding, given the bit, bit string

66
00:06:38,681 --> 00:06:47,030
in the trie, it's pretty easy to decode.
you could also keep a symbol table of

67
00:06:46,305 --> 00:06:53,192
pairs but sorry, for compression, how do
we do compression that was in expansion?

68
00:06:53,192 --> 00:07:00,260
so, for compression, we can just keep the
table and use it. you could also use the

69
00:07:00,260 --> 00:07:07,262
trie to figure out the code by following
the path h, up to the root. and for

70
00:07:07,262 --> 00:07:14,047
expansion, as I just said, you start out
at the root I go left at zero, right of

71
00:07:14,047 --> 00:07:22,014
one and it's very easy. so for among comp,
expansion, so we will look at the code for

72
00:07:22,169 --> 00:07:26,455
both of these. The question is how to
construct, the first question is how to

73
00:07:26,455 --> 00:07:33,770
construct the trie. so let's start with
the data type. so this is similar to other

74
00:07:33,770 --> 00:07:41,069
tree structures that we've built before
where we have a character that we don't

75
00:07:41,069 --> 00:07:48,452
use for internal nodes, just for leaves.
we've got the frequency that we use while

76
00:07:48,452 --> 00:07:55,581
we're building the tribe, not when we're
expanding and then every node has a left

77
00:07:55,581 --> 00:08:00,843
and a right. and th is, is the
constructor, you just fill in all the

78
00:08:00,843 --> 00:08:06,609
fields. we need a method to tells us if
the current node is a leaf. And that's

79
00:08:06,609 --> 00:08:13,175
when so that's if both its links are null.
and we need to compare it to, to compare

80
00:08:13,175 --> 00:08:19,762
nodes by frequency. And we'll see why
that's needed later on. so that's the data

81
00:08:19,762 --> 00:08:26,582
type for the nodes that we're going to use
to build the tries. and so now, let's look

82
00:08:26,582 --> 00:08:33,077
at expansion. So, this is in code what I
just talked about. so, we're going to have

83
00:08:33,077 --> 00:08:39,020
a method that reads in some of the bit
string, and converts it to a trie. Now,

84
00:08:39,020 --> 00:08:46,398
that's one clever part of the algorithm.
and then the first thing is the number of

85
00:08:46,627 --> 00:08:53,016
characters in the code in the message, you
know, we also get that. So, we get the

86
00:08:53,016 --> 00:08:59,481
trie, and then, we get the number of
characters. and then we simply for do a

87
00:08:59,481 --> 00:09:06,610
for loop for the number of characters and
start at the root. if we're not at a leaf

88
00:09:06,870 --> 00:09:13,303
we read a bit. And if it's zero, we go
left, and if it's one, we go right. and

89
00:09:13,303 --> 00:09:20,608
that's it. if as soon as we get to a leaf,
we just write out the character that we

90
00:09:20,608 --> 00:09:25,563
get and then, go back to the root and keep
going. We're not in a leaf if we read a

91
00:09:25,563 --> 00:09:30,856
bit, we have zero, go to the left, one, go
to the right. And when we get to a leaf,

92
00:09:30,856 --> 00:09:36,761
we write out the character extremely
compact code for expanding. given a bit

93
00:09:36,761 --> 00:09:41,987
string, the first thing is to trie. We
have to talk about how to get the trie in

94
00:09:41,987 --> 00:09:47,730
from a bit string number of characters.
and then, the simple loop that expands

95
00:09:47,923 --> 00:09:53,660
according to the trie representation. so
that's networks for any prefix-free code,

96
00:09:53,660 --> 00:09:59,635
not just the Cuffman, Huffman code. in the
running your time, it's obviously linear

97
00:09:59,635 --> 00:10:05,726
in the number of bits cuz for it's just a
loop that chews up a bit every time in the

98
00:10:05,726 --> 00:10:12,989
loop. Okay. So, so again, for any
prefix-free, free code, how are we

99
00:10:12,989 --> 00:10:21,639
actually going to write the trie? well
it's a little, little clever but by this

100
00:10:21,639 --> 00:10:30,691
time, you should be ready for this kind of
algorithm. what you can do is traverse the

101
00:10:30,691 --> 00:10:39,598
trie in pre-order and when you and then
come to an internal node you write a zero.

102
00:10:39,798 --> 00:10:45,615
when you come to a leaf, you write a one f
ollowed by the 8-bit character at that

103
00:10:45,615 --> 00:10:52,379
leaf. so it's a simple pre-order
traversal. if this is a recursive program

104
00:10:52,379 --> 00:10:59,863
to write out in trie if you're at a leaf,
you write a one followed by the 8-bit

105
00:11:00,204 --> 00:11:08,062
characters at the leaf and you're done.
and if you're not at the leaf, you simply

106
00:11:08,062 --> 00:11:15,706
write a zero. and then recursively, do the
two sub-tries. and that gives a unique

107
00:11:15,706 --> 00:11:23,180
representation of the trie that can be
used to construct the trie from that bit

108
00:11:23,180 --> 00:11:31,450
string. so and the idea is that typically,
we're talking about a, a very long message

109
00:11:31,677 --> 00:11:37,421
the trie itself is just basically
encodings of all the possible characters

110
00:11:37,421 --> 00:11:43,695
in a message. and that's going to tend to
be small compared to the length of the

111
00:11:43,695 --> 00:11:49,703
message. so for say, a genomic sequence
there would be only four characters and

112
00:11:49,703 --> 00:11:54,909
the ones that used most frequently,
hopefully would be encoded with not that

113
00:11:54,909 --> 00:12:00,180
many bits. of course, but anyways, the
size of the trie itself is going to be

114
00:12:00,180 --> 00:12:05,475
much smaller than the length of the
message. so reading in the trie and this

115
00:12:05,475 --> 00:12:11,271
requires a little bit of thought, but and
we see the code, the code is extremely

116
00:12:11,271 --> 00:12:17,299
simple. We justreconstructed from the
pre-order traversal to trie. So, this is

117
00:12:17,299 --> 00:12:23,714
the readTrie method that we needed, that
reads the bits from binary standard in and

118
00:12:23,714 --> 00:12:30,440
produces the trie, and it's also
recursive. and if the bit that you read is

119
00:12:30,738 --> 00:12:40,382
zero that sorry, if the bit that you read
is one that means you have to create a

120
00:12:40,382 --> 00:12:49,729
leaf otherwise, you recursively read the
left and read the right. and make a new

121
00:12:49,729 --> 00:12:57,121
node that has a subtree that, that denotes
to the left and the right. And it doesn't

122
00:12:57,338 --> 00:13:04,010
it doesn't use the character in the second
one, the frequency we initialize to zero.

123
00:13:04,305 --> 00:13:11,886
so, in internal nodes, we don't use the
character. So if you look at what would

124
00:13:12,182 --> 00:13:19,861
this code do for this bit string so the
first bit is zero. so it's going to

125
00:13:20,452 --> 00:13:28,526
recursively call itself to read the trie
on the left. and so, the next bit is a

126
00:13:28,526 --> 00:13:33,984
one. so, it's going to read the next eight
bits to get the A. And it's going to

127
00:13:33,984 --> 00:13:40,158
return a new node with that character in
it that's a leaf. so, that's going to be

128
00:13:40,158 --> 00:13:45,986
the left subtree of the initial trie, the
first recursive call. and then, it'll read

129
00:13:45,986 --> 00:13:51,120
the right subtree, which is an internal
node, another internal node. It takes a

130
00:13:51,120 --> 00:13:57,225
little thinking to convince yourself this,
this works. but it does. And it's very

131
00:13:57,225 --> 00:14:03,608
compact and easy code. And again, works
for an prefix code. you encode the trie

132
00:14:03,608 --> 00:14:09,200
with the prefix traversal. And with a very
small amount of code, you can reconstruct

133
00:14:09,200 --> 00:14:14,860
the trie and transmit a bitstream. And so,
this is a compression of a trie structure.

134
00:14:14,860 --> 00:14:19,826
so now, the question is, how do we,
there's a lot of prefix pre-codes, how we

135
00:14:19,826 --> 00:14:26,321
will find the best one? well, and there's
an idea called the Shannon-Fano Algorithm

136
00:14:26,321 --> 00:14:32,841
which says the, the, the key idea is that
you've got characters that appear with

137
00:14:32,841 --> 00:14:39,684
different frequency. And you want to get
the ones that appear very frequent to use

138
00:14:39,684 --> 00:14:45,634
fewer bits. and so, it's of interest to
find the ones that are very frequent, and

139
00:14:45,634 --> 00:14:51,384
so the Shannon-Fano algorithm says, just
divide all the symbols in, into two

140
00:14:51,384 --> 00:14:57,681
roughly equal frequency and start the ones
in the first one with zero and the ones in

141
00:14:57,681 --> 00:15:03,293
the second one with one, it's kind of a
balanced code. and then, and then kind of

142
00:15:03,293 --> 00:15:12,048
re, recur. so if you have A appearing five
times, and you C appearing one time at six

143
00:15:12,048 --> 00:15:19,660
for those and then you get six for all of
those. And then you try to encode those

144
00:15:19,660 --> 00:15:25,233
with and try to use zeroes for roughly
half the character. And one for roughly

145
00:15:25,233 --> 00:15:31,007
half the character. so in order to deal
with that, you have to figure out how to

146
00:15:31,007 --> 00:15:36,846
divide up the symbols. And you can imagine
an algorithm for doing that. but the real

147
00:15:36,846 --> 00:15:42,424
problem with this method is that it's not
optimal. and so that's the kind of

148
00:15:42,424 --> 00:15:48,132
situation that Huffman was faced with way
back when coming up with the algorithm.

149
00:15:48,329 --> 00:15:53,644
and the best way to explain the Huffman
algorithm is through a demo. And so,

150
00:15:53,644 --> 00:15:59,220
that's what we'll do next. So, here's a, a
demo of the Huffman algorithm. So, the

151
00:15:59,220 --> 00:16:05,302
first thing that we do is count the
frequency of each character in the input.

152
00:16:05,302 --> 00:16:11,384
and so that's what the algorithm is based
on. So, in this case, there's five As, two

153
00:16:11,384 --> 00:16:18,434
Bs, o ne C, one D, and so forth. So our
goal is to use fewer bits to encode A and

154
00:16:18,434 --> 00:16:23,843
more bits to encode C, D, and exclamation
point ,!,, for example and we want to

155
00:16:23,843 --> 00:16:32,763
prefix-free code. so it's kind of a bottom
up method. So what we do is we build one

156
00:16:32,763 --> 00:16:40,829
node for each character and we keep in the
node corresponding to each character, a

157
00:16:40,829 --> 00:16:47,645
weight that's equal to the frequency. and
then we imagine keeping them in a sorted

158
00:16:47,645 --> 00:16:53,835
order. so that's how the method starts
out. Now remember, our node representation

159
00:16:53,835 --> 00:17:00,026
had an instance variable that allowed for
storing these frequencies. And then the

160
00:17:00,026 --> 00:17:06,480
idea is to given these are sub-tries of
size one, the idea is to combine two of

161
00:17:06,480 --> 00:17:12,325
them and merge them into a single tribe
with a cumulative weight. And so we're

162
00:17:12,325 --> 00:17:18,832
going to find the ones that are, the two
that have the smallest weight and create a

163
00:17:18,832 --> 00:17:26,509
new internal node for them. And the weight
of that node is going to be the sum of the

164
00:17:26,753 --> 00:17:33,237
two weights. So, that's the frequency of
all the code characters below it. and

165
00:17:33,604 --> 00:17:42,758
then, just put that into the mix. so now
we have five sub-tries to deal with and

166
00:17:42,758 --> 00:17:49,336
that's the algorithm. Select the two tries
with minimum weight. in this case, it's,

167
00:17:49,336 --> 00:17:54,503
since we're keeping them sorted so it's
going to be the ones on the left. combine

168
00:17:55,070 --> 00:18:00,874
them and add their two weights. So, make a
parent node. It doesn't matter which goes

169
00:18:00,874 --> 00:18:06,324
in left which goes right, really. and
then, add the weights to the frequency of

170
00:18:06,324 --> 00:18:14,292
the parent node and then put it into the
list. Now, if you look up at the table at

171
00:18:14,292 --> 00:18:20,809
the right what we're doing when we're
doing this combination is building up the

172
00:18:20,809 --> 00:18:27,251
code corresponding to the characters
moving from left to right. So, so far, we

173
00:18:27,251 --> 00:18:33,616
know that the code for D is going to end
in zero and the code for C is going to end

174
00:18:33,616 --> 00:18:39,679
in one, one. and so for exclamation is
going to end in one zero. so that now

175
00:18:39,679 --> 00:18:48,107
continue the rule. Now the B and R are get
combined. And again we're figuring out how

176
00:18:48,107 --> 00:18:54,154
those code words end. and so now, we have
just three to choose from. Now, we're

177
00:18:54,154 --> 00:19:01,120
going to combine the two big ones and now,
we have new prefix bits for all of them.

178
00:19:01,962 --> 00:19:08,392
and notice that our letter that appear
with highest frequency gets added in late,

179
00:19:08,392 --> 00:19:14,756
which means, it's going to be nearer the
root. so now, we have just the two and we

180
00:19:14,756 --> 00:19:21,445
finish the algorithm. there's only two
left now, so we combine them. I merge into

181
00:19:21,445 --> 00:19:27,628
a single try. And that corresponds to
prepending a one to the codes for all the

182
00:19:27,628 --> 00:19:33,446
other letters and just having a one bit
code for A. And that now winds up with a

183
00:19:33,446 --> 00:19:39,298
single try. that's a demo of Huffman's
algorithm in operation. So, the Huffman

184
00:19:39,298 --> 00:19:45,105
Code is an answer to the question of how
to find the best prefix code that's really

185
00:19:45,105 --> 00:19:50,651
very simple to describe. Count a frequency
for each character in the input. And you

186
00:19:50,651 --> 00:19:55,870
start with one node corresponding to each
character with weight given by its

187
00:19:55,870 --> 00:20:00,763
frequency. And just, until you have a
single trie, you select the two with the

188
00:20:00,763 --> 00:20:05,853
minimum weight and merge them into a
single trie by creating a new node with a

189
00:20:05,853 --> 00:20:11,864
weight sequel or the sum of the
frequencies. this algorithm is used in

190
00:20:11,864 --> 00:20:19,780
many different and familiar compression
applications from PDFs to GZIP to MP3s to

191
00:20:19,780 --> 00:20:26,780
JPEG. and it's pretty simple to code up in
Java given the tools that algorithmic

192
00:20:27,104 --> 00:20:33,090
tools that we've built up. so what we're
going to do is this is the F of the

193
00:20:33,090 --> 00:20:38,767
frequencies have been computed. And we're
going to build a try given the

194
00:20:38,767 --> 00:20:44,230
frequencies. and we're going to use a
priority Q in our generic priority Q, we

195
00:20:44,230 --> 00:20:49,586
might as well put nodes on it. we
implemented a compare-to method. all we

196
00:20:49,586 --> 00:20:55,972
need is the compare-to to be able to build
a priority Q from data of that type. So,

197
00:20:55,972 --> 00:21:01,603
we have a priority Q of nodes and for all
the possible character values, if they,

198
00:21:01,603 --> 00:21:07,165
and if they appear in the message, they
have a nonzero frequency we'll just put in

199
00:21:07,165 --> 00:21:13,608
a new node with the given frequency that's
a leaf onto the priority Q. And then the

200
00:21:13,608 --> 00:21:19,873
Huffman algorithm is just as long as
there's more than one node in the priority

201
00:21:19,873 --> 00:21:26,575
Q. we pull off the smallest two nodes, we
create a new node, that's apparent. it's

202
00:21:26,575 --> 00:21:31,795
character value is not used. we compute
the total frequency, the sum of the two

203
00:21:31,795 --> 00:21:37,338
children and we make those children of
that nod e, and then we put that back onto

204
00:21:37,338 --> 00:21:42,622
the priority Q. And that's it. Take two
off, put one on. it reduces in size by

205
00:21:42,622 --> 00:21:47,263
one. So, this while loop is eventually
going to terminate. when it's done,

206
00:21:47,263 --> 00:21:53,333
there's only one node on the priority Q
and that's the root of the trie. extremely

207
00:21:53,574 --> 00:21:59,915
simple implementation of Huffman's
algorithm using our generic priority Q.

208
00:22:00,156 --> 00:22:07,126
you can also implement it by presorting
the of the nodes as well. but this is a

209
00:22:07,126 --> 00:22:14,480
particularly nice way to do it. now you do
have to prove that it produces an optimal

210
00:22:14,480 --> 00:22:21,515
code. That's in the sense that no code
produces, produce, uses fewer bits. and

211
00:22:21,515 --> 00:22:26,866
that proof is in the book. it's, it's not
terribly difficult but we're not going to

212
00:22:26,866 --> 00:22:31,670
do it in the lecture. the, and that's,
Huffman proved that it in the 1950s. So,

213
00:22:31,670 --> 00:22:36,808
there's no point looking for a better
prefix-free code. and prefix-free codes

214
00:22:36,808 --> 00:22:43,930
are so convenient cuz you don't need to
worry about the ambiguity it's a fine

215
00:22:43,930 --> 00:22:50,774
method that's pretty easy to do. So now
what we have to do, one disadvantage of

216
00:22:50,774 --> 00:22:56,639
Huffman and some other methods don't have
is, we have to have two passes through the

217
00:22:56,639 --> 00:23:02,714
data. we have to go through and read all
the character frequencies and then build

218
00:23:02,714 --> 00:23:08,091
the trie and then we have to go back
through the data and encode the file,

219
00:23:08,091 --> 00:23:16,499
either traversing the trie from bottom up
or using just a symbol table for to look

220
00:23:16,499 --> 00:23:24,214
up the code for each symbol in the
message. and the running time is pretty

221
00:23:24,214 --> 00:23:30,577
clear from the use of the trie, the trie,
the number of characters in the number of

222
00:23:30,577 --> 00:23:36,654
nodes in the trie is just the alphabet
size. and so it's going to be alphabet

223
00:23:36,654 --> 00:23:43,303
size log R because there's trie operation
that might involve depth of log for using

224
00:23:43,303 --> 00:23:51,431
heap implementation. and then you have to
be the for each character, you have to do

225
00:23:51,431 --> 00:23:57,485
some kind of lookup to encode. and
actually a question is, can we do better

226
00:23:57,712 --> 00:24:04,296
in terms of running time? and well, stay
tuned, we'll look at a different method.

227
00:24:04,296 --> 00:24:06,340
That's Huffman compression.