Communication theory has been formulated best for symbolic-valued signals. Claude Shannon published The Mathematical Theory of Communication in 1948, which became the cornerstone of digital communication. He showed the power of probabilistic models for symbolic-valued signals, which allowed him to quantify the information present in a signal. In the simplest signal model, each symbol can occur at index $n$ with a probability $\Pr[a_k]$, $k = \{1, \dots, K\}$. What this model says is that for each signal value a $K$-sided coin is flipped (note that the coin need not be fair). For this model to make sense, the probabilities must be numbers between zero and one and must sum
to one.

$$\sum_{k=1}^{K} \Pr[a_k] = 1 \qquad (2)$$
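As an illustration of this coin-flipping model, the following Python sketch draws an i.i.d. symbolic-valued signal from a hypothetical four-symbol alphabet; the alphabet and its (unfair) probabilities are assumed for illustration only.

```python
import random

# Hypothetical alphabet and coin-flip probabilities; any K nonnegative
# numbers summing to one will do.
alphabet = ["a", "b", "c", "d"]
probs = [0.5, 0.25, 0.125, 0.125]

# For the model to make sense, the probabilities must sum to one.
assert abs(sum(probs) - 1.0) < 1e-12

# Each signal value is an independent flip of a K-sided (unfair) coin.
signal = random.choices(alphabet, weights=probs, k=20)
print(signal)
```

Successive symbols are drawn without regard to their neighbors, which is exactly the independence assumption the model makes.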
      This coin-flipping model assumes that symbols occur without
      regard to what preceding or succeeding symbols were, a false
      assumption for typed text. Despite this probabilistic
      model's over-simplicity, the ideas we develop here also
      work when more accurate, but still probabilistic, models are
used. The key quantity that characterizes a symbolic-valued signal is the entropy of its alphabet:

$$H(A) = -\sum_{k} \Pr[a_k] \log_2 \Pr[a_k] \qquad (3)$$
Because we use the base-2 logarithm, entropy has units of bits. For this definition to make sense, we must take special note of symbols having probability zero of occurring. A zero-probability symbol never occurs; thus, we define $0\log_2 0 = 0$ so that such symbols do not affect the entropy. The maximum value attainable by an alphabet's entropy occurs when the symbols are equally likely ($\Pr[a_k] = \Pr[a_l]$ for all $k$ and $l$). In this case, the entropy equals $\log_2 K$. The minimum value occurs when only one symbol occurs; it has probability one of occurring and the rest have probability zero.
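The definition of entropy, including the $0\log_2 0 = 0$ convention and the equally-likely maximum, is easy to check numerically. A minimal Python sketch (the helper name `entropy` is ours):

```python
import math

def entropy(probs):
    """Entropy in bits: H(A) = -sum_k Pr[a_k] * log2(Pr[a_k]),
    using the convention 0 * log2(0) = 0 (zero-probability symbols
    are simply skipped)."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

K = 8
uniform = [1 / K] * K
print(entropy(uniform))          # equally likely: log2(8) = 3.0 bits
print(entropy([1.0, 0.0, 0.0]))  # one certain symbol: 0.0 bits
```

Skipping the zero-probability terms in the sum implements the $0\log_2 0 = 0$ convention without ever evaluating $\log_2 0$.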
Exercise: Derive the maximum-entropy results, both the numeric aspect (entropy equals $\log_2 K$) and the theoretical one (equally likely symbols maximize entropy). Derive the value of the minimum-entropy alphabet.
       
Solution: Equally likely symbols each have a probability of $\frac{1}{K}$. Thus, $H(A) = -\sum_k \frac{1}{K}\log_2\frac{1}{K} = \log_2 K$. To prove that this is the maximum-entropy probability assignment, we must explicitly take into account that probabilities sum to one. Focus on a particular symbol, say the first. $\Pr[a_0]$ appears twice in the entropy formula: in the terms $\Pr[a_0]\log_2\Pr[a_0]$ and $\bigl(1 - (\Pr[a_0] + \dots + \Pr[a_{K-2}])\bigr)\log_2\bigl(1 - (\Pr[a_0] + \dots + \Pr[a_{K-2}])\bigr)$, the latter because the last symbol's probability equals one minus the sum of all the others. The derivative with respect to this probability (and all the others) must be zero. The derivative equals $\log_2\Pr[a_0] - \log_2\bigl(1 - (\Pr[a_0] + \dots + \Pr[a_{K-2}])\bigr)$, and all other derivatives have the same form (just substitute your letter's index). Thus, each probability must equal the others, and we are done. For the minimum-entropy answer, one term is $1\log_2 1 = 0$, and the others are $0\log_2 0$, which we define to be zero also. The minimum value of entropy is zero.
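The derivative argument above can be spot-checked numerically: shifting probability mass away from the equally-likely assignment, while keeping the sum at one, can only decrease the entropy. A sketch with an assumed $K = 4$:

```python
import math

def entropy(probs):
    # H in bits, with the 0 * log2(0) = 0 convention
    return sum(-p * math.log2(p) for p in probs if p > 0)

K = 4
uniform = [1 / K] * K

# Move probability eps from the first symbol to the last; the
# probabilities still sum to one, yet the entropy falls below log2(K).
for eps in (0.01, 0.05, 0.1):
    perturbed = [1/K - eps] + [1/K] * (K - 2) + [1/K + eps]
    assert entropy(perturbed) < entropy(uniform) == math.log2(K)
print("equally likely symbols maximize entropy (numerically)")
```

This does not replace the derivative proof, but it illustrates why the constraint that probabilities sum to one matters: every perturbation must take mass from one symbol and give it to another.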
	
 
    
     
Example: A four-symbol alphabet has the following probabilities: $\Pr[a_0] = \frac{1}{2}$, $\Pr[a_1] = \frac{1}{4}$, $\Pr[a_2] = \frac{1}{8}$, $\Pr[a_3] = \frac{1}{8}$.
Note that these probabilities sum to one, as they should. Since $\frac{1}{2} = 2^{-1}$, $\log_2\frac{1}{2} = -1$. The entropy of this alphabet equals

$$H(A) = -\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{4}\log_2\tfrac{1}{4} + \tfrac{1}{8}\log_2\tfrac{1}{8} + \tfrac{1}{8}\log_2\tfrac{1}{8}\right) = -\left(\tfrac{1}{2}(-1) + \tfrac{1}{4}(-2) + \tfrac{1}{8}(-3) + \tfrac{1}{8}(-3)\right) = 1.75\ \text{bits} \qquad (4)$$
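The arithmetic in (4) can be verified directly with a minimal Python check:

```python
import math

# Pr[a_0] .. Pr[a_3] from the four-symbol example
probs = [1/2, 1/4, 1/8, 1/8]
H = -sum(p * math.log2(p) for p in probs)
print(H)  # 1.75 bits
```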
       
   
        
[Figure: "Electrical Engineering Digital Processing Systems in Braille."]