So we clearly have something to be 
excited about. 
There's clearly this opportunity to 
design binary prefix-free codes which 
improve over the obvious fixed-length 
solution. 
So, we'd like to have, in some sense, an 
optimal algorithm for this problem and 
for that, we of course need a crisp 
problem definition. 
So, to do that it turns out to be useful 
to think of codes as binary trees. 
So, this video will develop that 
connection concluding with the final 
formal problem statement. 
So, the last video introduced us to this 
very interesting computational problem, 
namely given characters from an alphabet 
with frequencies, find the best binary 
prefix-free encoding, 
find the code which minimizes the average 
number of bits needed to encode a 
character. 
Crucial to reasoning about this problem 
is thinking of binary codes as binary 
trees. So to give you an idea about this 
correspondence, let's just revisit three 
of the binary codes we saw in the 
previous video and see what kind of trees 
they correspond to. 
So let's just continue with the four 
symbol alphabet A B C D. 
The obvious fixed-length code, where we 
encode A, B, C, D as 00, 01, 10, and 11, 
is just going to correspond to 
the complete binary tree with four 
leaves. 
So let me label this complete binary tree 
as follows. I'm going to label the leaves 
A through D from left to right, and I'm 
going to label each edge of the tree with 
a 0 if it corresponds to a left-child 
relationship and with a 1 if it 
corresponds to a right-child 
relationship. 
And now what you see is there's a 
correspondence between the bits along 
root-to-leaf paths and the fixed-length 
encoding. 
So for example for the symbol C, if we 
follow the path from the root to the leaf 
labeled C, we first encounter a 1 because 
it's a right-child, then we encounter a 0 
because it's a left-child. 
That gives us a 1 and a 0, 
that's the same as our encoding of the 
symbol C in this fixed length encoding 
and the same of course is true for the 
other three leaves. 
Next, when we first started playing 
around with variable-length encodings to 
motivate the prefix-free property, we 
studied a code where we replaced the 
double 0 for an A with a single 0 and the 
double 1 for a D with a single 1. 
Now this code was not prefix-free, 
but we can still represent it as a binary 
tree. 
It's just not going to be a complete one. 
So I'm going to label the edges of this 
tree the same way as before. 
Left-child edges will be given a label of 
0, 
right-child edges will be given a label 
of 1. 
I'm going to label the left and right 
children of the root A and D respectively 
and the two leaves will be given the 
labels B and C. 
The reason I labeled the nodes in this 
way is that then we have the same 
kind of correspondence between the 
encodings that we proposed for the 
various symbols and the bits that you see 
along a path from the root to nodes with 
those symbols. 
So for example, if you look at the node 
labeled D, the path from the root only 
has a single bit 1 and that coincides 
with the proposed encoding of the symbol 
D. 
Now, remember, this code is not 
prefix-free and so therefore, as we saw, 
it had ambiguity. 
So if you're wondering what some encoded 
message means and you see a 0, you're not 
sure whether that 0 represents the 
symbol A or, alternatively, is the 
first half of an encoding of the symbol 
B. 
So, this ambiguity is also noticeable in 
the tree. 
And the property in the tree that tips 
you off to the ambiguity is that there 
are symbols at internal nodes of the 
tree. 
The symbols are not merely at the leaves 
as they were with the first tree with the 
fixed-length encoding, 
but there are also two internal nodes 
that have symbols. 
So let's draw the tree for our final 
example, which was the variable-length 
but prefix-free code that we looked at in 
the previous video. 
So this is going to correspond to a tree 
which is not perfectly balanced, but it 
will have labels only at the leaves of 
the tree. 
So, if you label the edges of this tree 
the way we've been doing, all the 
left-child edges get a 0, all the 
right-child edges get a 1, 
and we label the leaves of the tree from 
A to D going from left to right. 
You'll see we have the same 
correspondence as in the previous two 
trees: 
the sequence of bits from the root to a 
leaf coincides with the proposed encoding 
for it. 
So, for example, if you look at the leaf 
labeled C, because you traverse a 
right-child, another right-child and a 
left-child to get to it, that's the 
sequence 1, 1, 0 and that is precisely 
the proposed encoding for the symbol C. 
So in general, any binary code can be 
expressed as a tree in this way, with the 
left-child pointers being labeled with 
0's, the right-child pointers being 
labeled with 1's, 
the various nodes being labeled with the 
symbols of the given alphabet, and the 
bits from the root down to the node 
labeled with a given symbol 
corresponding to the proposed encoding 
for that symbol. 
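
As a rough sketch of this correspondence 
(in Python, which the lecture itself does 
not use; the names Node, build_tree, and 
path_to are made up for illustration), 
one way to turn a code into a tree and 
read an encoding back off a path might 
look like this: 

```python
# Illustrative sketch only: the class and function names are
# hypothetical, not something defined in the lecture.

class Node:
    def __init__(self):
        self.left = None    # child reached by reading a 0
        self.right = None   # child reached by reading a 1
        self.symbol = None  # label, or None if the node is unlabeled


def build_tree(code):
    """Build a binary tree from a dict mapping symbol -> bit string."""
    root = Node()
    for symbol, bits in code.items():
        node = root
        for bit in bits:
            if bit == "0":
                if node.left is None:
                    node.left = Node()
                node = node.left
            else:
                if node.right is None:
                    node.right = Node()
                node = node.right
        node.symbol = symbol  # label the node at the end of the path
    return root


def path_to(node, symbol, prefix=""):
    """Return the bits on the path from the root to the node labeled symbol."""
    if node is None:
        return None
    if node.symbol == symbol:
        return prefix
    found = path_to(node.left, symbol, prefix + "0")
    if found is None:
        found = path_to(node.right, symbol, prefix + "1")
    return found


# The variable-length prefix-free code from the running example.
code = {"A": "0", "B": "10", "C": "110", "D": "111"}
root = build_tree(code)
print(path_to(root, "C"))  # 110
```

The same build_tree call also works for 
the code that is not prefix-free; in that 
case some labeled nodes simply end up 
with children. 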
And what's cool about thinking about 
codes as trees is that the really 
important prefix-free condition, which 
seems like a nuisance to check in the 
abstract, shows up in a really clean way 
in these trees, 
namely the prefix-free condition is the 
same as leaves being the only nodes that 
have labels. 
No internal nodes are allowed to have a 
label in a prefix-free code. 
The reason for this is that we've set it 
up so that the encodings correspond to 
the bits along paths from the root to the 
labeled node. 
So one encoding being a prefix of another 
corresponds to one node being an ancestor 
of the other, and so, if all the labels 
are at the leaves, then of course no 
labeled node is an ancestor of another 
and we have no prefixes. 
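
To make the condition concrete, here is a 
minimal sketch of a prefix-free check, 
written in Python as an assumption of 
this write-up (is_prefix_free is a 
hypothetical helper). It works directly 
on the bit strings; in tree terms, each 
violation it finds is a labeled node 
sitting above another labeled node. 

```python
from itertools import permutations


def is_prefix_free(code):
    """True if no codeword is a prefix of a different symbol's codeword.

    `code` maps each symbol to its bit string. This is a simple
    quadratic check meant for illustration, not an optimized one.
    """
    return not any(
        longer.startswith(shorter)
        for shorter, longer in permutations(code.values(), 2)
    )


fixed_length = {"A": "00", "B": "01", "C": "10", "D": "11"}
ambiguous = {"A": "0", "B": "01", "C": "10", "D": "1"}
variable_length = {"A": "0", "B": "10", "C": "110", "D": "111"}

print(is_prefix_free(fixed_length))     # True
print(is_prefix_free(ambiguous))        # False: "0" is a prefix of "01"
print(is_prefix_free(variable_length))  # True
```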
The other thing that's really cool about 
this tree representation of codes is that 
it becomes pictorially obvious how you do 
the decoding, given a sequence of 0's and 
1's from a prefix-free binary code. 
Namely, you start at the beginning of the 
sequence and at the root of your tree, 
whenever you see a 0 you go left, 
whenever you see a 1 you go right. 
At some point you'll hit a leaf, that 
leaf will have some label and that's the 
symbol that's being encoded, and after 
you hit a leaf, you just start all over 
again back to the root. 
So for example, if you were using our 
variable-length prefix-free code for the 
four letter alphabet, as in our running 
example, 
and you were given the sequence of 0s and 
1s, 0, 1, 1, 0, 1, 1, 1. What would you 
do? 
Well, you'd start at the root and you see 
a 0, so you follow the left-child 
pointer, and you immediately get to a 
leaf. 
It's labeled A, so you're going to output 
an A as the first symbol. 
Now you start all over. You return to the 
root, now what do you see? 
You see a 1, so you go right, you see 
another 1, so you go right, and now you 
see a 0, so you go left and that gets you 
to the leaf labeled C. 
Now you start all over again. 
You see a 1, you go right, you see a 1, 
you go right again, and then, finally you 
see yet one more 1 and you wind up at the 
leaf labeled D. 
So by repeated traversals through the 
tree, you decode the sequence of 0s and 
1s as A, C, D. 
There's never any ambiguity, because when 
you hit a label, you know you're at a 
leaf, there's nowhere else to go. 
And at every internal node, which is 
unlabeled, you know to expect another bit, 
another traversal further down in the tree. 
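
The decoding procedure just described can 
be sketched in a few lines of Python. 
This is only an illustration: the 
nested-dictionary tree and the names 
build_tree and decode are assumptions of 
the sketch, and it assumes the code is 
prefix-free and the input is a valid 
encoding. 

```python
def build_tree(code):
    """Nested-dict tree: keys "0" and "1" are child edges,
    and the key "symbol" marks a labeled leaf."""
    root = {}
    for symbol, bits in code.items():
        node = root
        for bit in bits:
            node = node.setdefault(bit, {})
        node["symbol"] = symbol
    return root


def decode(bits, root):
    """Walk the tree bit by bit, restarting at the root after each leaf."""
    symbols = []
    node = root
    for bit in bits:
        node = node[bit]         # 0 means go left, 1 means go right
        if "symbol" in node:     # hit a leaf: output its label
            symbols.append(node["symbol"])
            node = root          # start all over again at the root
    return "".join(symbols)


tree = build_tree({"A": "0", "B": "10", "C": "110", "D": "111"})
print(decode("0110111", tree))  # ACD
```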
So a final important point about this 
correspondence is that the encoding 
lengths of the symbols, the number of 
bits needed to encode the various 
symbols, are just the depths of the 
corresponding leaves in the tree that 
corresponds to the code. 
So for example, in our running example 
the symbol A is the only one that needs 
only one bit to encode and it's also the 
only leaf in level one of the tree. 
And similarly B needs two bits and shows 
up in the next level, 
and C and D, which need three bits, 
show up in the third level. 
So this correspondence is really just by 
construction. 
How do you encode a given symbol? 
Well, it's just the bits on the path from 
the root, and the number of such 
bits is just the number of pointer 
traversals you need to get from the root 
down to that leaf, and that's just the 
depth of that leaf in the tree. 
So we're now in a great position to 
really have a crisp definition of the 
problem. 
The input of course is just the 
frequencies for a bunch of different symbols 
i from some alphabet capital sigma. 
I'm going to use P sub i as notation for 
the frequency of symbol i. 
So we know what it is we want to 
optimize. 
We want to minimize the expected number 
of bits needed to encode a symbol, where 
the average, or the expectation, is taken 
over the provided frequencies of the 
various symbols. 
Well, let's express this objective 
function using our newfound 
correspondence with binary trees. 
In particular, the fact that we can think 
about encoding lengths as depths of 
leaves in trees. 
So, take a tree, T, which corresponds to 
a prefix-free binary code. 
That is, it should be a binary tree and 
the leaves of this tree should be in 
one-to-one correspondence with the 
symbols of sigma. 
We're going to define capital L(T) as the 
average encoding length. 
It's an average in the sense that we sum 
over all of the symbols of the alphabet, 
we weight each symbol by its frequency, 
and remember, this is part of the input, 
so we weight by the provided frequency P 
sub i of that symbol i. 
And then how many bits do we need to 
encode that symbol i? 
Well, it's just the depth of the leaf 
which is labelled i in the given tree, 
capital T. 
So this is what we want to make as small 
as possible. 
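
Written out as a formula, with lowercase 
p_i standing for the frequency the 
lecture calls P sub i, the definition 
just described reads: 

```latex
L(T) = \sum_{i \in \Sigma} p_i \cdot \bigl[\text{depth of leaf } i \text{ in } T\bigr]
```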
So, for instance, take the data from the 
previous video: the letters A, B, C, D 
with frequencies 60, 25, 10, and 5%. 
If we use the complete binary tree, 
that is, the fixed-length encoding, we 
just get two bits per character. 
While if we use the lopsided tree, 
optimized so that each A only takes one 
bit while suffering three bits for C and 
D, then the average encoding length drops 
to 1.55, as we saw in the last video. 
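
Those two numbers can be checked with a 
short Python sketch. The helper names 
(build_tree, leaf_depths, 
average_length) and the nested-dictionary 
tree are assumptions made for this 
illustration; the frequencies and the two 
codes are the ones from the running 
example. 

```python
def build_tree(code):
    """Nested-dict tree: keys "0" and "1" are child edges,
    and the key "symbol" marks a labeled leaf."""
    root = {}
    for symbol, bits in code.items():
        node = root
        for bit in bits:
            node = node.setdefault(bit, {})
        node["symbol"] = symbol
    return root


def leaf_depths(node, depth=0):
    """Map each leaf label to its depth (edges from the root)."""
    if "symbol" in node:
        return {node["symbol"]: depth}
    depths = {}
    for bit in ("0", "1"):
        if bit in node:
            depths.update(leaf_depths(node[bit], depth + 1))
    return depths


def average_length(freq, code):
    """Sum over symbols of frequency times depth of that symbol's leaf."""
    depths = leaf_depths(build_tree(code))
    return sum(freq[s] * depths[s] for s in freq)


freq = {"A": 0.60, "B": 0.25, "C": 0.10, "D": 0.05}
fixed_length = {"A": "00", "B": "01", "C": "10", "D": "11"}
variable_length = {"A": "0", "B": "10", "C": "110", "D": "111"}

print(round(average_length(freq, fixed_length), 2))     # 2.0
print(round(average_length(freq, variable_length), 2))  # 1.55
```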
So what then is the goal, 
what's the responsibility of our 
algorithm? 
Well, amongst all binary trees whose 
leaves are in correspondence with the 
symbols of sigma, we want to compute the 
one which makes this average encoding 
length as small as possible, the one which 
minimizes our objective function capital 
L. 
Turns out Huffman's greedy algorithm does 
it. 
More details to come. 