1 00:00:00,290 --> 00:00:02,840 To make sure that Huffman's greedy algorithm is clear, 2 00:00:02,840 --> 00:00:06,350 let's go through a slightly larger, more complicated example. 3 00:00:08,930 --> 00:00:11,140 So let's work with a six character alphabet. 4 00:00:11,140 --> 00:00:14,501 Let's call the letters A, B, C, D, E, F, and let's assume 5 00:00:14,501 --> 00:00:19,655 that we're given the weights 3, 2, 6 and 8, 2, 6 for these six characters. 6 00:00:19,655 --> 00:00:24,145 Remember, this problem is well-defined even if the weights don't add up to 1. 7 00:00:24,145 --> 00:00:26,565 If you prefer working with actual probabilities, 8 00:00:26,565 --> 00:00:29,929 feel free to divide these six numbers by 27. 9 00:00:31,990 --> 00:00:34,180 In the first step of Huffman's greedy algorithm, 10 00:00:34,180 --> 00:00:38,520 we find the letters that have the smallest weights, the smallest frequencies. 11 00:00:38,520 --> 00:00:41,691 So in this example, that would be the letters B and E, 12 00:00:41,691 --> 00:00:43,390 both of them have weight 2. 13 00:00:43,390 --> 00:00:48,386 Now what we do is we merge these two letters into a single meta letter, 14 00:00:48,386 --> 00:00:54,000 in effect committing right now to having B and E be siblings in the final tree. 15 00:00:56,750 --> 00:01:01,238 After this merger, our alphabet is down to five symbols, the symbols B and 16 00:01:01,238 --> 00:01:03,744 E being replaced by the merged symbol BE. 17 00:01:03,744 --> 00:01:09,890 And the weight of BE is the sum of the weights of B and E, namely, 4. 18 00:01:09,890 --> 00:01:13,600 We can imagine our tree slowly taking shape through these iterations. 19 00:01:13,600 --> 00:01:17,575 So after step 1, we know that B and E are going to be siblings and we know that just 20 00:01:17,575 --> 00:01:21,013 A, C, D, and F are going to be leaves, that's all we know thus far. 21 00:01:23,004 --> 00:01:24,780 In the next iteration, we again look for 22 00:01:24,780 --> 00:01:27,350 the two symbols that have the smallest weight. 23 00:01:27,350 --> 00:01:29,440 And here the smallest weight symbol is A. 24 00:01:29,440 --> 00:01:30,768 It has weight 3. 25 00:01:30,768 --> 00:01:33,405 And the runner-up is the merged symbol B sub E. 26 00:01:33,405 --> 00:01:37,960 Its combined weight is 4, and that's second overall for these five symbols. 27 00:01:37,960 --> 00:01:40,600 So in this step, we merge A with B E. 28 00:01:43,590 --> 00:01:48,020 Now our alphabet is down to four symbols, the merged symbol, A, B, E, 29 00:01:48,020 --> 00:01:52,888 which has cumulative weight 7, and then the original symbols, C, D, and F, 30 00:01:52,888 --> 00:01:56,262 which have their original weights, 6, 8, and 6. 31 00:01:56,262 --> 00:02:00,761 As far as our tree, we've now committed to the symbol A appearing as an uncle 32 00:02:00,761 --> 00:02:02,480 of the siblings B and E. 33 00:02:02,480 --> 00:02:03,200 And again, C, D, 34 00:02:03,200 --> 00:02:05,729 and F, we just know they're leaves somewhere in the final tree. 35 00:02:08,198 --> 00:02:08,740 In step three, 36 00:02:08,740 --> 00:02:12,290 we're going to again pick the two symbols that have the smallest weights. 37 00:02:12,290 --> 00:02:17,394 In this case, the two symbols with the smallest weight are C and 38 00:02:17,394 --> 00:02:19,235 F, each of weight 6. 39 00:02:19,235 --> 00:02:24,420 In our new alphabet, we still have our symbol ABE, it still has weight 7. 40 00:02:24,420 --> 00:02:27,190 We still have the symbol D, it still has weight 8. 41 00:02:27,190 --> 00:02:32,920 But now we have a new merged symbol CF, and its new weight is 12. 42 00:02:32,920 --> 00:02:37,990 As far as our tree, in addition to the information we already knew, 43 00:02:37,990 --> 00:02:41,640 we're now committing to having C and F be siblings in the final tree. 44 00:02:44,390 --> 00:02:47,100 In step four, we merge the two symbols with the smallest weight. 45 00:02:47,100 --> 00:02:51,140 So that would be ABE with its weight of seven and D with its weight of eight. 46 00:02:53,370 --> 00:02:57,756 So this leaves us with only two symbols, ABDE and CF. 47 00:02:57,756 --> 00:03:01,436 And now we know what both of these subtrees of the root of the final tree 48 00:03:01,436 --> 00:03:02,710 are going to look like. 49 00:03:05,770 --> 00:03:07,230 Now that we're down to two symbols, 50 00:03:07,230 --> 00:03:10,980 the only thing we can do is fuse these two symbols into one, 51 00:03:10,980 --> 00:03:15,390 fuse these two subtrees into a single one by uniting them under a common root. 52 00:03:15,390 --> 00:03:19,770 That gives us the following final output of Huffman's algorithm. 53 00:03:22,080 --> 00:03:24,770 What prefix-free code does this tree correspond to? 54 00:03:24,770 --> 00:03:28,690 Well, as usual, let's label all of the left branches with zero and 55 00:03:28,690 --> 00:03:30,280 all of the right branches with ones. 56 00:03:33,090 --> 00:03:37,270 And now, as usual, the encoding of a character is just the symbols of zeros and 57 00:03:37,270 --> 00:03:41,860 ones that you see when you traverse a path from the root to that leaf. 58 00:03:41,860 --> 00:03:47,260 So for example, A will be encoded with 000, 59 00:03:47,260 --> 00:03:52,255 B with 0010, C with 10, D with 01, 60 00:03:52,255 --> 00:03:56,177 E with 0011, and F with 11.