In this video we're going to talk about error-correcting codes: how we can get around some of the errors that channels may introduce. We'll start with the very simplest one, called a repetition code, and then go on to a much more interesting and much more useful family of classic codes called Hamming codes. Let's go through and see what I mean by error correction. Here again is the digital communication model we've had before, but a new piece has been stuck in. We've already talked about source coding: how to do compression and represent the source letters efficiently with bits. Now a channel coder is placed here, and it converts what we're going to call data bits into more bits. Those bits have a very special structure, so that by the time they come out of the channel, where errors can be introduced, the channel decoder can repair those errors and, hopefully, produce something that's the same as the data bits produced by the source coder. This may seem like magic: how can we compensate for errors that the channel introduces after they've occurred? It's actually pretty straightforward to see how you can do that, though designing efficient error-correcting codes is somewhat tricky. So, can we mitigate channel-induced errors? Absolutely, and this leads to error-correcting codes, commonly known as ECC.

The simplest error-correcting code is the repetition code, and it's pretty easy: for each bit produced by the source coder, you repeat it, in this case 3 times. A 0 data bit becomes 000 in what we're going to call the codeword, and a 1 data bit becomes 111. We just repeat the bit; that's why it's called a repetition code. To see in more detail what's going on, look at this little drawing. We have a sequence of data bits, here a 1, a 0 and a 1. Sent through our digital transmitter with a BPSK baseband signal set, that might produce the waveform shown. Now take the same sequence through the channel coder, which repeats the bits, so we have 111, 000, 111 (one of the ones got chopped off in the drawing). When that goes through the transmitter, we now have individual bits, and the correlation receiver knows they are individual bits and tries to figure out what each one is. So there is a probability of error pe for every one of those new bits.

This creates what is called an (N, 1) code: we're repeating each source bit N times, so for every data bit that comes in, we produce N bits. That's what the (N, 1) means: one bit in, capital N bits out in the code. The other thing that will be important in applications is what's known as the code rate, which is a re-expression of this: for every bit in, I have to send capital N bits, so our repetition code has a rate of 1/3, which we're going to see is a little on the low side.
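To make the encoder concrete, here's a minimal Python sketch. The function name and the list-of-bits representation are my own choices for illustration, not anything defined in the lecture.

```python
def repetition_encode(data_bits, n=3):
    """Repeat each data bit n times: an (n, 1) repetition code."""
    codeword_bits = []
    for bit in data_bits:
        codeword_bits.extend([bit] * n)
    return codeword_bits

print(repetition_encode([1, 0, 1]))  # [1, 1, 1, 0, 0, 0, 1, 1, 1]
```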
But this code can correct all single-bit errors, and let me show you how that works. We're going to decode by majority vote; this is why the number of bits N in a repetition code is odd, because we're going to vote. So let's say the data bit was a zero. We're sending 3 bits, so there are 8 possible received codewords that could come out of the channel. All the bits could come out correctly, with no changes, and we say it's a 0, because we're going to raise our hands and vote: all the 0s vote, all the 1s vote, and we pick the one with the most votes. Here's a case where one bit got flipped: the majority is still 0, so the vote corrects the error. That is how you can correct errors after they have occurred, and it works because you repeated the same thing several times. If you go down this list, you see there are cases where 2 bit errors occurred, and then the error-correcting code is not going to be able to correct them. That is why this is a single-bit error-correcting code: it can correct 1 bit in error, but not 2 or 3. That still turns out to be quite an accomplishment. So of the 8 possible received codewords, we can correct four of them, concluding that a 0 was actually sent, and we get it wrong for the other four. The same thing happens when a 1 is sent: we decode correctly for 4 of the codewords and are wrong for the other 4.

So what's the probability now that the data bit is decoded wrong? It's the sum of the probabilities of the cases in which an error still occurs, the cases with 2 or 3 bits flipped, which gives 3 pe^2 (1 - pe) + pe^3. The question is, do we win? Is this probability, the probability of what's called a decoding error (we don't get the original data bit back), bigger or smaller than pe, which is what we'd face if we sent only one bit, with no error correction? It had better be smaller, and it turns out it is. You can show that if pe is less than a half, this quantity is always less than pe, and sometimes much, much less. If pe is a thousandth, 10^-3, notice the square in the expression: that's going to be the dominant term, and at a thousandth the decoding-error probability is roughly 3 × 10^-6. So we really can reduce the probability of error by adding error-correcting bits.
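Here's one way to sketch the majority-vote decoder and that decoding-error probability in Python. The function names are mine; the probability function just sums the binomial terms for a majority of the n bits being flipped.

```python
from math import comb

def repetition_decode(received_bits, n=3):
    """Decode each block of n received bits by majority vote (n odd)."""
    return [1 if sum(received_bits[i:i + n]) > n // 2 else 0
            for i in range(0, len(received_bits), n)]

def decoding_error_probability(pe, n=3):
    """Probability that a majority of the n bits are flipped."""
    return sum(comb(n, k) * pe**k * (1 - pe)**(n - k)
               for k in range(n // 2 + 1, n + 1))

print(repetition_decode([0, 1, 0]))      # [0]: the one flipped bit is corrected
print(decoding_error_probability(1e-3))  # about 3e-06, versus pe = 1e-3 uncoded
```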
So let's review where the repetition code stands. It's a very simple code, just repeat the bits, very easy to do, but it has some downsides. It has a very low code rate, 1/N, and we're going to discover in just a second an error-correcting code that has a much higher rate and still corrects all single-bit errors. Note also that as N increases, the probability of getting just 1 bit in error shrinks relative to the probability of getting 2 or 3 bits in error: keep flipping coins and eventually you'll wind up with more than just 1 bit in error, and correcting 1 bit in error is all the repetition code can do. So we definitely need more powerful codes.

Well, let's talk about one. First we need to know about what are called block codes, which are a generalization of the repetition code; the repetition code is a special case of a block code. Here, we take in a block of K source bits at once, and what we put through the channel are N bits derived from those K. In the case of the repetition code we took in 1 bit and put out N; more generally we take in more than 1 bit at a time and put N out. That's an (N, K) code, and that's why I wrote before that the repetition code was an (N, 1) code: a special case.

And here's an example of a (7, 4) code which, as you might expect, has some pretty good properties; we're going to see that as we go through. Here's what it does. It puts out a codeword of 7 bits: here's the list of the bits, and here's a mathematical description of how they are related to the data bits. The first 4 bits simply repeat the data bits, so they're exactly the same. Bits 5, 6, and 7 are given by sums of particular data bits. That plus with a circle around it means exclusive-or, which is addition in binary arithmetic with no carries: 0 exclusive-or 0 is 0, 1 exclusive-or 1 is 0; only when the 2 bits disagree do you get a 1. So you may have 3 bits here that get "added together", but the result is only 1 bit. No matter what the data bits are, you just apply these formulas, and that derives the 3 bits you might call extra: we have 4 data bits coming through as they are, and we derive three more from them. These turn out to be good bits to derive.

Now we've got to figure out how we can prove that the code I just wrote, that (7, 4) code, actually can correct errors. We need to understand a little more about error-correcting codes and how to think about them, and the key is something called the Hamming distance. Richard Hamming was a Bell Labs engineer who developed a whole sequence of things, the Hamming distance among them, and it turns out that (7, 4) code is a special case of the Hamming code family; he worked on this a lot. We define the distance between two codewords as the sum, and I really mean sum here, of c1 exclusive-or c2. You get N bits out of that exclusive-or, and the sum tells you how many 1s there are in the result. You exclusive-or the two codewords together, then count the 1s; sum means treat them as actual 1s and 0s, not as a binary number.

Let's take an example. Suppose I have two codewords, which I just pulled out of the air, and note that they differ in those 2 bits. So I claim these two codewords are a distance of 2 apart: they differ in 2 bits. Mathematically, if I add the two together (you can check this for yourself), the result is this, and remember the exclusive-or gives a 1 only where the bits disagree; that's the key. The number of 1s in the result is 2, so the distance between these codewords is 2. I also want to point out a relationship that's going to be useful later on: if I take this difference sequence and add it to c1, I get c2, and likewise c1 equals c2 plus this binary sequence.
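As a sketch of the encoding formulas and the distance computation just defined, assuming the three parity sums take data bits {1, 2, 3}, {2, 3, 4}, and {1, 2, 4}, matching the generator matrix described below:

```python
def hamming74_encode(b):
    """Encode 4 data bits into 7: the first four repeat the data,
    the last three are exclusive-or sums of particular data bits."""
    b1, b2, b3, b4 = b
    return [b1, b2, b3, b4,
            b1 ^ b2 ^ b3,   # bit 5
            b2 ^ b3 ^ b4,   # bit 6
            b1 ^ b2 ^ b4]   # bit 7

def hamming_distance(c1, c2):
    """Exclusive-or the codewords, then count the 1s in the result."""
    return sum(x ^ y for x, y in zip(c1, c2))

print(hamming74_encode([1, 0, 1, 1]))                # [1, 0, 1, 1, 0, 0, 0]
print(hamming_distance([0, 0, 1, 1], [0, 1, 1, 0]))  # 2: they differ in 2 bits
```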
Okay, let's see why this is important. The Hamming distance says how far apart 2 codewords are: how many bits have to be changed for one to equal the other. So, how far apart do two codewords have to be so that, in spite of a single-bit error, I can actually figure out what the original codeword was? Another way of asking that is: when can a single-bit error in one codeword not be confused with a single-bit error in another codeword? Confusing the two would be a disaster: if a single-bit error occurred, we wouldn't know what the original codeword was, which means we wouldn't know the original data sequence. So codewords have to be far enough apart that an error in one codeword is not confused with another codeword.

I have a little geometric picture to show you so we can figure this out. Abstractly, we have two codewords, c1 and c2, and of course they're different. Suppose I have a single-bit error in c1. I'm using e1 to represent some particular single-bit error, meaning the e sequence is a bunch of 0s with a 1 in one position; that's what an error sequence is. Adding c1 and e1 together gives me, let's say, that point. Now suppose it were possible for c2, with another single-bit error sequence e2 added to it, to land in the same place. The number of 1s in e1 and in e2 is one each. How far apart are c1 and c2? Well, I can get to c2 by taking c1, adding e1, and then adding e2. Because those two error sequences each have one bit on, when I exclusive-or them together I get 2 bits, so the distance between c1 and c2 is 2. This would be a disaster for error correction: that intermediate point is not, in general, a codeword, and it's 1 bit away from c1 and 1 bit away from c2. I can't tell whether c1 or c2 was the actual codeword; I'm in big trouble. So in a code whose codewords can be a Hamming distance of 2 or less apart, I cannot correct single-bit errors.

So what distance is good enough? I think it's pretty clear that we need the minimum distance between every possible pair of codewords to be at least 3. Greater than 3 is fine, but 3 is the threshold. The idea is that if we have one codeword here and another codeword there, an error may take this one up here and another error may take that one up there, and since the corrupted words are still at least a distance 1 apart, they won't be confused, and I can get back to the original codeword and correct that single-bit error. This is the key criterion for having a single-bit error-correcting code. And guess what: the repetition code satisfies it. There are only 2 codewords, because it only takes in one bit at a time. Add those codewords together and we get 111, whose sum is three 1s. Sure enough, the repetition code satisfies our criterion for a single-bit error-correcting code.

As it stands, though, I would have to find the distance between all possible pairs of codewords in an error-correcting code and see what the minimum is. It turns out that, because of the properties of block codes, there's a very efficient way of doing this, which also provides some insight into what's going on. But to do so, I have to re-express these codes in matrix notation, so we're going to take a little notational diversion, if you will. We're going to say that the generation of a codeword from a block of data bits is given by the matrix equation c = Gb, where G stands for generator; this is called the generator matrix. In the case of the repetition code, only 1 bit is taken in at a time, because it's an (N, 1) code, and 3 bits come out; the G that looks like that simply means repeat. Think about what it does to a data block consisting of a single 1: it just gives it back, three times.
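A tiny sketch of c = Gb for the repetition code, with the modulo-2 arithmetic written out; using numpy here is my choice, not the lecture's:

```python
import numpy as np

# Generator matrix for the (3, 1) repetition code: 1 bit in, 3 bits out.
G_rep = np.array([[1],
                  [1],
                  [1]])

def encode(G, b):
    """Codeword c = G b, with the arithmetic done modulo 2."""
    return (G @ np.asarray(b)) % 2

print(encode(G_rep, [1]))  # [1 1 1]: the single data bit, given back 3 times
print(encode(G_rep, [0]))  # [0 0 0]
```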
Well, let's look at our (7, 4) code, which is a bit more complicated. Here's the code I wrote before, the same thing, now taking in a block of 4 data bits and putting out a block of 7 coded bits. What G do I need to express this? I claim it's pretty easy to find. Because this code just repeats the original data bits, the upper part of the matrix is the identity matrix, capital I, which means just repeat the bits. The harder part, if you will, is down here, but it's pretty easy too: you simply put a 1 in the generator matrix in the positions of the data bits you want to select in each sum. So this row has three 1s and then a 0, meaning data bits 1, 2 and 3. The next row says take data bits 2, 3 and 4, and sure enough, that's what it is. Take data bits 1, 2 and 4, and that's what the last row is. This generator matrix is a very concise way of writing down what the code is and how it operates on the data bits. It's not clear you'd actually implement it this way, but it's a nice mathematical description, as you'll see in a second.

The reason this is important is that the code has a really cool property: it's linear. Really important. Because encoding is just a matrix multiplication, the code is linear. Suppose I add together two data bit sequences; that's got to be some other data bit sequence. Who knows what it is, but it's something. Exclusive-or still obeys the distributive law, so G(b1 ⊕ b2) equals Gb1 ⊕ Gb2, which by definition is c1 ⊕ c2. So the code obeys the superposition principle, and that has all kinds of interesting consequences. Since b1 ⊕ b2 is some other data bit sequence (in our (7, 4) code these are 4-bit blocks, and exclusive-oring two 4-bit blocks gives some other 4-bit block), the result c1 ⊕ c2 has to be the codeword for that sequence, call it c3. So adding together two codewords results in another codeword. That means the distance between any two codewords equals the number of 1s in some other codeword. So we only need to examine individual codewords to determine the code's error-correcting capability: count how many 1s there are in every codeword, and that tells us the Hamming distances between all possible pairs. That's why I wanted to go into matrix notation, to prove this linearity property. It's really important.

So there's my generator matrix for the (7, 4) code, and I want to point out something. Suppose I have a data block with a single 1 in the first position. What's the resulting codeword? You should find that it's the first column of G. A data block with the 1 in the second position corresponds to the second column, and so on. So staring me in the face, in the columns of G, are 4 of the codewords. All I have to do is count the number of 1s in each column, and sure enough, I'm a happy guy: each column has at least three 1s. Well, that's four of the codewords. Next I need to add together pairs of columns to create more codewords. Because of the construction of this code, the upper (identity) part of any pairwise sum has exactly 2 ones. And if you look at the way the bottom part was constructed, every pair of columns differs in the bottom part in at least one place, so each pairwise sum picks up at least one more 1 down there: two up top, at least one below, so at least three in all. Finally, if you add three columns together, you can see from the top part alone that you get three 1s in all cases, so those sums have at least three 1s no matter what happens down below (and the sum of all four columns already has four 1s on top).
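Putting the pieces together, here's a sketch that builds the generator matrix above and brute-forces the linearity argument: enumerate the codewords for all nonzero data blocks and confirm every one has at least three 1s. The variable names are mine.

```python
import numpy as np
from itertools import product

# Identity on top (repeat the data bits); the three parity rows below
# select data bits {1, 2, 3}, {2, 3, 4}, and {1, 2, 4}.
G = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [1, 1, 0, 1]])

# By linearity, the minimum distance between any two codewords equals
# the minimum number of 1s over all nonzero codewords.
weights = [int(((G @ np.array(b)) % 2).sum())
           for b in product([0, 1], repeat=4) if any(b)]
print(min(weights))  # 3: the code can correct all single-bit errors
```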
So this is a single-bit error-correcting code, because every codeword has at least three 1s. The fact that one of them has four 1s really doesn't help; it's the minimum over all these distances that matters for being a single-bit error-correcting code. And I want to point out that, as opposed to the repetition code, its efficiency is much higher. The repetition code we talked about had a code rate of 1/3, which is low: 1 bit in, 3 bits out. Here it's 4 bits in, 7 bits out, a rate of 4/7, much higher efficiency, and it corrects single-bit errors just as well as the repetition code does.

So, error-correcting codes: even though the channel introduces errors, the error-correcting code can repair some of them. If more than one bit occurs in error, our single-bit error-correcting codes can't fix that, but at least all single-bit errors can be corrected, and I showed you with the repetition code that this lowers the probability of the received bits being in error. This is very different from analog communication. There, once the signal comes out of the channel, there's nothing you can do other than remove the out-of-band noise by linear filtering; the in-band noise you are stuck with. You can't do anything about it. Here, we're fixing errors and lowering the probability of error; we're essentially making the channel less noisy by adding error-correcting bits. Now, we must send bits at a higher rate than the source requires, but efficient codes, i.e., ones having a rate close to one, are fairly easily designed, and we'll see that in a succeeding video. However, I do want to point out that error-correcting codes have limited capabilities. We can't fix double-bit errors; in fact, for the repetition code and the Hamming code that I've shown you, if a double-bit error occurs, we're going to think a single-bit error occurred, and we're going to "correct" it, badly. We're going to make a mistake. And with longer codewords, double-bit errors occur more frequently, which can be a problem. So in the next video I'm going to talk about how you actually do the error correction for a Hamming code.
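As a closing footnote on that double-bit limitation, here's a two-line sketch of the repetition code's majority vote being fooled:

```python
received = [0, 1, 1]                 # 0 was sent as 000; the channel flipped two bits
decoded = 1 if sum(received) > 1 else 0
print(decoded)                       # 1: the vote "corrects" toward the wrong bit
```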