Summary: The Huffman source coding algorithm is provably maximally efficient.
Shannon's Source Coding Theorem has additional applications in data compression. Here, we have a symbolic-valued signal source, like a computer file or an image, that we want to represent with as few bits as possible. Compression schemes that assign symbols to bit sequences are known as lossless if they obey the Source Coding Theorem; they are lossy if they use fewer bits than the alphabet's entropy. Using a lossy compression scheme means that you cannot recover a symbolic-valued signal from its compressed version without incurring some error. You might be wondering why anyone would want to intentionally create errors, but lossy compression schemes are frequently used where the efficiency gained in representing the signal outweighs the significance of the errors.
Shannon's Source Coding Theorem states that symbolic-valued signals require on the average at least $H(A)$ bits per symbol to represent each of their values, where $H(A)$ is the entropy of the source alphabet $A$. The Huffman source coding algorithm is a systematic procedure for constructing a lossless code that attains the smallest possible average number of bits/symbol.
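Here $H(A)$ is the entropy of the alphabet $A = \{a_0, \dots, a_{K-1}\}$, computed from the symbol probabilities $\Pr[a_k]$ as in the Entropy module:

$$H(A) = -\sum_{k=0}^{K-1} \Pr[a_k]\,\log_2 \Pr[a_k] \quad \text{bits/symbol}$$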
The simple four-symbol alphabet used in the Entropy and Source Coding modules has the symbol probabilities $\Pr[a_0]=\frac{1}{2}$, $\Pr[a_1]=\frac{1}{4}$, $\Pr[a_2]=\frac{1}{8}$, and $\Pr[a_3]=\frac{1}{8}$, and thus an entropy of 1.75 bits/symbol. Its Huffman coding tree is shown in Figure 1; the tree assigns codewords of lengths 1, 2, 3, and 3 bits to $a_0$ through $a_3$, for example $a_0 \mapsto 0$, $a_1 \mapsto 10$, $a_2 \mapsto 110$, $a_3 \mapsto 111$.
Figure 1: Huffman Coding Tree
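To make the construction behind Figure 1 concrete, the following Python sketch (not part of the original module; the function name huffman_code and the symbol labels are illustrative) repeatedly merges the two lowest-probability nodes and reads the codewords back off the resulting tree:

```python
import heapq
import itertools

def huffman_code(probabilities):
    """Build a Huffman code for a {symbol: probability} mapping.

    At each step the two lowest-probability nodes are merged; the merge
    prepends a '0' to every codeword in one subtree and a '1' to every
    codeword in the other (which 0/1 labels go where is arbitrary).
    """
    counter = itertools.count()  # tie-breaker so the heap never compares dicts
    # Each heap entry: (probability, tie_breaker, {symbol: partial codeword})
    heap = [(p, next(counter), {sym: ""}) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, codes0 = heapq.heappop(heap)  # lowest-probability node
        p1, _, codes1 = heapq.heappop(heap)  # second-lowest node
        merged = {sym: "0" + code for sym, code in codes0.items()}
        merged.update({sym: "1" + code for sym, code in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(counter), merged))
    return heap[0][2]

probs = {"a0": 1/2, "a1": 1/4, "a2": 1/8, "a3": 1/8}
code = huffman_code(probs)
avg_bits = sum(probs[sym] * len(code[sym]) for sym in probs)
print(code)       # e.g. {'a0': '0', 'a1': '10', 'a2': '110', 'a3': '111'}
print(avg_bits)   # 1.75 bits/symbol on average
```

Using a heap makes the "pick the two lowest-probability nodes" step explicit; the 0/1 branch labels are assigned arbitrarily at each merge, which is exactly why the resulting code is not unique, as noted next.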
The code thus obtained is not unique, as we could have labeled the branches coming out of each node differently. The average number of bits required to represent this alphabet equals $\frac{1}{2}\cdot 1 + \frac{1}{4}\cdot 2 + \frac{1}{8}\cdot 3 + \frac{1}{8}\cdot 3 = 1.75$ bits, exactly the alphabet's entropy: for these probabilities the Huffman code attains the Source Coding Theorem's lower bound.
If the alphabet probabilities were different, a different tree, and therefore a different code, could well result. Furthermore, we may not be able to achieve the entropy limit. If our symbols had the probabilities $\Pr[a_0]=\frac{1}{2}$, $\Pr[a_1]=\frac{1}{4}$, $\Pr[a_2]=\frac{1}{5}$, and $\Pr[a_3]=\frac{1}{20}$, the average code length would be 1.75 bits, while the entropy is only 1.68 bits.
Exercise: Derive the Huffman code for this second set of probabilities, and verify the claimed average code length and alphabet entropy.
Solution: The Huffman coding tree for the second set of probabilities is identical to that for the first (Figure 1). The average code length is $\frac{1}{2}\cdot 1 + \frac{1}{4}\cdot 2 + \frac{1}{5}\cdot 3 + \frac{1}{20}\cdot 3 = 1.75$ bits. The entropy is $-\bigl(\frac{1}{2}\log_2\frac{1}{2} + \frac{1}{4}\log_2\frac{1}{4} + \frac{1}{5}\log_2\frac{1}{5} + \frac{1}{20}\log_2\frac{1}{20}\bigr) \approx 1.68$ bits, so the Huffman code uses about 0.07 bits/symbol more than the entropy limit.
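For a quick numerical check of this solution, here is a minimal Python sketch; it assumes the codeword lengths 1, 2, 3, 3 read off the Figure 1 tree shape, and the variable names are illustrative rather than taken from the original module:

```python
from math import log2

# Second set of probabilities; codeword lengths 1, 2, 3, 3 follow from the
# same tree shape as Figure 1 (merge 1/20 with 1/5, then with 1/4, then 1/2).
probs   = [1/2, 1/4, 1/5, 1/20]
lengths = [1, 2, 3, 3]

avg_length = sum(p * l for p, l in zip(probs, lengths))
entropy    = -sum(p * log2(p) for p in probs)

print(f"average code length = {avg_length:.2f} bits")  # 1.75
print(f"entropy             = {entropy:.2f} bits")     # 1.68
```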
"Electrical Engineering Digital Processing Systems in Braille."