1 00:00:03,820 --> 00:00:09,017 As a warm up we're going to look at run length encoding, which is actually an 2 00:00:09,017 --> 00:00:14,024 effective method in many applications. Simple method that's very effective. It's 3 00:00:14,024 --> 00:00:19,285 based on the idea that it's very often the case in a bit string that you have long 4 00:00:19,285 --> 00:00:24,627 runs of repeated bits. So in this case, there's a long run of zeros followed by a 5 00:00:24,627 --> 00:00:30,073 medium length run of ones, another medium length run of zeros, and then more ones. 6 00:00:30,073 --> 00:00:35,654 So there's 40 bits but only switches from zero to one in three places. And so what 7 00:00:35,654 --> 00:00:41,598 you can do is rather than write out all the bits. We can write counts to represent 8 00:00:41,598 --> 00:00:47,367 alternating runs of zeroes and ones. A very simple method. So in this case, 9 00:00:47,367 --> 00:00:53,689 there's fifteen zeroes, seven ones, seven zeroes and eleven ones. And, with four 10 00:00:53,689 --> 00:00:59,671 bits to represent each one of these counts, we can just write fifteen, seven, 11 00:00:59,671 --> 00:01:05,639 seven, and eleven to get, instead of 40 bits, get down to sixteen bits. That's 12 00:01:05,639 --> 00:01:12,467 effective, whether there's long runs of zeroes and ones, in a bitstream. now, you 13 00:01:12,467 --> 00:01:19,132 have to decide how many bits to use to store the counts. it's not necessarily a 14 00:01:19,132 --> 00:01:25,452 good idea to use, say, 32 bits. and maybe four is too small. So, in our code, we use 15 00:01:25,452 --> 00:01:30,696 eight bits. So that'll handle runs up to 256 bits. we used four in the example 16 00:01:30,696 --> 00:01:36,202 above, but in a realistic thing, it's fine to use eight. and then if we have longer 17 00:01:36,202 --> 00:01:41,840 runs, then we have to figure out what to do. if the run length is bigger than the 18 00:01:41,840 --> 00:01:47,084 mass count, well, we can just intersperse runs of length zero. And there's one way 19 00:01:47,084 --> 00:01:52,460 to handle it, there's other ways to handle it too. But this is a very simple scheme 20 00:01:52,656 --> 00:01:58,904 that covers all the bases and can be a very effective. and . For example, 21 00:01:58,904 --> 00:02:06,091 consider a bit map representation of this slide. There's huge long runs, saved with 22 00:02:06,469 --> 00:02:14,129 black and white. There's huge long runs of white that, depending on the resolution, 23 00:02:14,129 --> 00:02:22,055 might be hundreds or 1000 bits, but could be represented with just a few counts. and 24 00:02:22,055 --> 00:02:27,259 so in many applications of bit maps of texts and other things like that, this is 25 00:02:27,455 --> 00:02:32,854 very effective and it's used in all kinds of technologies like Jpeg and fax and 26 00:02:32,854 --> 00:02:38,107 others. and it's very simple t o implement. So this is our warmup data 27 00:02:38,107 --> 00:02:44,478 compression algorithm that implements run-length encoding. Actually we left the 28 00:02:44,478 --> 00:02:50,700 compress for the book. this is just the expand. so I'm given a bunch of counts. 29 00:02:51,050 --> 00:03:01,569 how do I reproduce the original uncompressed text string into . It's as 30 00:03:01,569 --> 00:03:10,455 simple as that. So log R is the number of bits per count. and so basically what we 31 00:03:10,455 --> 00:03:18,775 do is read log R bits at a time into an end, whatever the value is, so we put that 32 00:03:18,775 --> 00:03:24,406 into the end run. So, that's a number between zero and 256, which is the maximum 33 00:03:24,406 --> 00:03:30,326 you can get in eight bits. and then, starting with zero the first count. That's 34 00:03:30,326 --> 00:03:36,245 the number of zeroes we need to write, so we just write them one, one bit at a time. 35 00:03:36,245 --> 00:03:42,467 Zero at one bit at a time, and then we flip the bit to make it one. and read the 36 00:03:42,467 --> 00:03:48,947 next count, and now we, we write out that many 1s and so forth. So this is a ten 37 00:03:48,947 --> 00:03:55,912 line program that does, expansion for, run length and coding. and you can, think 38 00:03:55,912 --> 00:04:03,218 about it or look at the book, for, how to do, compression. it's just as simple. so 39 00:04:03,468 --> 00:04:09,788 this is just the, an example of the effectiveness of, learning some coding 40 00:04:09,788 --> 00:04:16,690 for, one letter, the letter Q, in a, in a typical black and white scheme doing. Even 41 00:04:16,690 --> 00:04:24,660 for a single letter, there's, lots of, redundancy, lots of runs of 0s. so with a 42 00:04:24,660 --> 00:04:32,274 relatively small, number of counts. we can, represent a, a bitmap. and this is 43 00:04:32,274 --> 00:04:39,887 the hard case. actually, most of a printed page is all this blank space, as I said. 44 00:04:40,153 --> 00:04:48,466 so, Typically. If you just don't compress at all and you have an eight and a half by 45 00:04:48,466 --> 00:04:54,484 eleven piece of paper with three hundred pixels per inch, that'd be eight million 46 00:04:54,484 --> 00:05:00,208 bits. But most of those are white, and typically with run length and coding, you 47 00:05:00,208 --> 00:05:05,785 can get substantial savings simply by basically counting the white bits. And 48 00:05:05,785 --> 00:05:11,362 even when there's letters involved, you can get, say maybe there's only three 49 00:05:11,362 --> 00:05:17,411 thousand characters. That's another example. If it's all text then maybe the 50 00:05:17,411 --> 00:05:25,325 text itself is a great way to compress it. or the program or the document processor 51 00:05:25,325 --> 00:05:31,405 that produced the text, but now we're starting to get into undecidabi lity 52 00:05:31,406 --> 00:05:38,210 issues. So lets think more in terms of a, a page that has a, an image, a drawing, 53 00:05:38,210 --> 00:05:44,247 and some text and so forth. so it makes sense to start with a bitmap and then use 54 00:05:44,247 --> 00:05:48,809 as few bits a possible. And then run length encoding's going to be very 55 00:05:48,809 --> 00:05:52,700 effective. that's our warm up case for data compression.