Summary: The Huffman source coding algorithm is provably maximally efficient.
Shannon's Source Coding Theorem has additional applications in data compression. Here, we have a symbolic-valued signal source, like a computer file or an image, that we want to represent with as few bits as possible. Compression schemes that assign symbols to bit sequences are known as lossless if they obey the Source Coding Theorem; they are lossy if they use fewer bits than the alphabet's entropy. Using a lossy compression scheme means that you cannot recover a symbolic-valued signal from its compressed version without incurring some error. You might be wondering why anyone would want to intentionally create errors, but lossy compression schemes are frequently used where the efficiency gained in representing the signal outweighs the significance of the errors.
Shannon's Source Coding Theorem states that symbolic-valued signals require on the average at least $H(A)$ bits to represent each of their values, where
$$H(A) = -\sum_{k} \Pr[a_k]\log_2 \Pr[a_k]$$
is the entropy of the alphabet $A$ from which the symbols are drawn. The Huffman source coding algorithm described here is a systematic procedure for constructing a lossless code whose average length comes as close to this limit as any symbol-by-symbol code can.
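As a concrete illustration of this bound, here is a minimal Python sketch (the function name `entropy_bits` and the script itself are our own illustration, not part of the original module) that computes the entropy of a discrete alphabet from its symbol probabilities.

```python
from math import log2

def entropy_bits(probs):
    """Entropy H(A) = -sum(p * log2(p)) of a discrete alphabet, in bits per symbol."""
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
    return -sum(p * log2(p) for p in probs if p > 0)

# The four-symbol alphabet used in the example that follows.
print(entropy_bits([1/2, 1/4, 1/8, 1/8]))  # 1.75 bits/symbol: the fewest bits any lossless coder can average
```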
The simple four-symbol alphabet used in the Entropy and Source Coding modules has the following probabilities:
$$\Pr[a_1] = \frac{1}{2},\quad \Pr[a_2] = \frac{1}{4},\quad \Pr[a_3] = \frac{1}{8},\quad \Pr[a_4] = \frac{1}{8}$$
Its entropy is 1.75 bits. Huffman's algorithm builds the coding tree by repeatedly merging the two least probable symbols (or groups of symbols) into a single node whose probability is the sum of the two; labeling the two branches leaving each node with a 0 and a 1 and reading the labels from the root down to each leaf yields the codeword for each symbol, as shown in Figure 1.
| Figure 1: Huffman Coding Tree |
|---|
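For readers who want to reproduce the tree and codewords programmatically, the following sketch (our own illustration, not part of the original module; the symbol names `a1` through `a4` and the helper `huffman_code` are ours) repeatedly merges the two least probable entries, exactly as the tree construction above describes.

```python
import heapq
from itertools import count

def huffman_code(probabilities):
    """Huffman coding: repeatedly merge the two least probable nodes into one."""
    tiebreak = count()  # keeps equal-probability comparisons away from the dict payloads
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, codes0 = heapq.heappop(heap)  # least probable subtree: its branch gets label "0"
        p1, _, codes1 = heapq.heappop(heap)  # next least probable:    its branch gets label "1"
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

probs = {"a1": 1/2, "a2": 1/4, "a3": 1/8, "a4": 1/8}
code = huffman_code(probs)
print(code)  # e.g. {'a1': '0', 'a2': '10', 'a3': '110', 'a4': '111'} (branch labels may differ)
print(sum(probs[s] * len(code[s]) for s in probs))  # average length: 1.75 bits/symbol
```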
The code thus obtained is not unique, as we could have labeled the branches coming out of each node differently. The average number of bits required to represent this alphabet equals
$$\frac{1}{2}\cdot 1 + \frac{1}{4}\cdot 2 + \frac{1}{8}\cdot 3 + \frac{1}{8}\cdot 3 = 1.75\ \text{bits},$$
which is the Shannon entropy limit for this source alphabet: the Huffman code here is as efficient as any lossless code can be.
If the alphabet probabilities were different, clearly a different tree, and therefore a different code, could well result. Furthermore, we may not be able to achieve the entropy limit. If our symbols had the probabilities
$$\Pr[a_1] = \frac{1}{2},\quad \Pr[a_2] = \frac{1}{4},\quad \Pr[a_3] = \frac{1}{5},\quad \Pr[a_4] = \frac{1}{20},$$
the average code length would be 1.75 bits, while the alphabet's entropy is only about 1.68 bits.
Derive the Huffman code for this second set of probabilities, and verify the claimed average code length and alphabet entropy.
The Huffman coding tree for the second set of probabilities is identical to that for the first (Figure 1). The average code length is
$$\frac{1}{2}\cdot 1 + \frac{1}{4}\cdot 2 + \frac{1}{5}\cdot 3 + \frac{1}{20}\cdot 3 = 1.75\ \text{bits}.$$
The entropy equals
$$-\left(\frac{1}{2}\log_2\frac{1}{2} + \frac{1}{4}\log_2\frac{1}{4} + \frac{1}{5}\log_2\frac{1}{5} + \frac{1}{20}\log_2\frac{1}{20}\right) \approx 1.68\ \text{bits},$$
so for these probabilities the Huffman code cannot reach the Source Coding Theorem's lower bound.
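As a numerical cross-check of this solution, one can reuse the hypothetical `entropy_bits` and `huffman_code` helpers sketched earlier:

```python
# Second set of probabilities; reuses the entropy_bits and huffman_code sketches above.
probs2 = {"a1": 1/2, "a2": 1/4, "a3": 1/5, "a4": 1/20}
code2 = huffman_code(probs2)                        # same tree shape: codeword lengths 1, 2, 3, 3
avg2 = sum(p * len(code2[s]) for s, p in probs2.items())
print(avg2)                                         # 1.75 bits/symbol
print(entropy_bits(list(probs2.values())))          # ~1.68 bits/symbol: the code exceeds the entropy
```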
"Electrical Engineering Digital Processing Systems in Braille."