As a warm up we're going to look at run
length encoding, which is actually an
effective method in many applications.
Simple method that's very effective. It's
based on the idea that it's very often the
case in a bit string that you have long
runs of repeated bits. So in this case,
there's a long run of zeros followed by a
medium length run of ones, another medium
length run of zeros, and then more ones.
So there's 40 bits but only switches from
zero to one in three places. And so what
you can do is rather than write out all
the bits. We can write counts to represent
alternating runs of zeroes and ones. A
very simple method. So in this case,
there's fifteen zeroes, seven ones, seven
zeroes and eleven ones. And, with four
bits to represent each one of these
counts, we can just write fifteen, seven,
seven, and eleven to get, instead of 40
bits, get down to sixteen bits. That's
effective, whether there's long runs of
zeroes and ones, in a bitstream. now, you
have to decide how many bits to use to
store the counts. it's not necessarily a
good idea to use, say, 32 bits. and maybe
four is too small. So, in our code, we use
eight bits. So that'll handle runs up to
256 bits. we used four in the example
above, but in a realistic thing, it's fine
to use eight. and then if we have longer
runs, then we have to figure out what to
do. if the run length is bigger than the
mass count, well, we can just intersperse
runs of length zero. And there's one way
to handle it, there's other ways to handle
it too. But this is a very simple scheme
that covers all the bases and can be a
very effective. and . For example,
consider a bit map representation of this
slide. There's huge long runs, saved with
black and white. There's huge long runs of
white that, depending on the resolution,
might be hundreds or 1000 bits, but could
be represented with just a few counts. and
so in many applications of bit maps of
texts and other things like that, this is
very effective and it's used in all kinds
of technologies like Jpeg and fax and
others. and it's very simple t o
implement. So this is our warmup data
compression algorithm that implements
run-length encoding. Actually we left the
compress for the book. this is just the
expand. so I'm given a bunch of counts.
how do I reproduce the original
uncompressed text string into . It's as
simple as that. So log R is the number of
bits per count. and so basically what we
do is read log R bits at a time into an
end, whatever the value is, so we put that
into the end run. So, that's a number
between zero and 256, which is the maximum
you can get in eight bits. and then,
starting with zero the first count. That's
the number of zeroes we need to write, so
we just write them one, one bit at a time.
Zero at one bit at a time, and then we
flip the bit to make it one. and read the
next count, and now we, we write out that
many 1s and so forth. So this is a ten
line program that does, expansion for, run
length and coding. and you can, think
about it or look at the book, for, how to
do, compression. it's just as simple. so
this is just the, an example of the
effectiveness of, learning some coding
for, one letter, the letter Q, in a, in a
typical black and white scheme doing. Even
for a single letter, there's, lots of,
redundancy, lots of runs of 0s. so with a
relatively small, number of counts. we
can, represent a, a bitmap. and this is
the hard case. actually, most of a printed
page is all this blank space, as I said.
so, Typically. If you just don't compress
at all and you have an eight and a half by
eleven piece of paper with three hundred
pixels per inch, that'd be eight million
bits. But most of those are white, and
typically with run length and coding, you
can get substantial savings simply by
basically counting the white bits. And
even when there's letters involved, you
can get, say maybe there's only three
thousand characters. That's another
example. If it's all text then maybe the
text itself is a great way to compress it.
or the program or the document processor
that produced the text, but now we're
starting to get into undecidabi lity
issues. So lets think more in terms of a,
a page that has a, an image, a drawing,
and some text and so forth. so it makes
sense to start with a bitmap and then use
as few bits a possible. And then run
length encoding's going to be very
effective. that's our warm up case for
data compression.