Okay, so today we're going to start our third installment of ELE 475, Computer Architecture. This is going to be more review, and we're going to finish up talking about hazards. Today we're going to be talking about control hazards, and then a little bit later we're going to start talking about caches and why we have caches. So let's start off by looking at control hazards. Just to recap, there are three different types of hazards we've talked about in this class so far: structural hazards, data hazards, and now we're going to talk about control hazards. Okay, so what information do we need to calculate the next program counter? Hm, well, is it the same thing for every instruction? When we go to execute an arithmetic-logic instruction, an add instruction, do we need the same information to calculate the next program counter? You might say, well, isn't there some piece of magical hardware which just calculates the next program counter? Well, yes, but we need to talk about what that magical piece of hardware is. As you might have guessed, it's actually different for branches and jumps than it is for more traditional instructions, arithmetic-logical instructions and everything else. So let's start off by looking at jumps. If we look at a jump, you need to look at the op code to make sure that it's actually a jump. You also need to look at the offset within the instruction, and you need to look at the current program counter. You take that all together, and the decode pipe stage of your processor says okay, it's a jump, and then you can take the program counter, add it to the offset, and you probably need to do that either in the ALU, or you need a special adder to do it. In the pipelines we've drawn so far, our 5-stage pipe, we have a special adder just for that.
And you do that offset calculation, and then you want to redirect your machine so the next instruction you go to execute is at the target of the jump. Now, this gets a little more complicated when you start looking at jump register. With jump register, you don't know where you're going until you decode the instruction and fetch the value from the register file. You don't need to do any conditional calculation yet, but you will for conditional branches below. We just have to look at the op code to know that it's a jump register. We don't need to look at any offset, because we are jumping directly to a register value in something like MIPS. In other instruction sets, you might need to look at other sorts of information. You might have, for instance, a register-indirect jump-register type of instruction, or even a memory-indirect jump sort of instruction. Conditional branches, now things start getting a little more complicated. We need to look at the op code, we need to look at the current program counter, and we need to go look at the register which is going to give us the condition. So we're branching based on whether some value is, let's say, greater than or less than zero. Hm, okay. We need to go look at that, but we don't know it until quite a bit farther down the pipe, and we also need to take the offset and add it to the program counter when we do a PC-relative conditional branch. That's how it's defined in MIPS. In other instruction sets you can have different types of conditional branches, either an absolute addressing scheme or something register-indirect, or some other thing like that. But for MIPS, we're just going to take the program counter, add it to our offset, and branch there if the condition on the register is what we were looking for.
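To summarize what each instruction class needs, here is a minimal sketch of that next-PC selection logic in Python. The opcode names and the PC-plus-offset jump target follow the lecture's simplified description and are illustrative assumptions, not exact MIPS encodings:

```python
# Sketch of next-PC selection for a MIPS-like ISA, following the lecture's
# simplified model. Opcode names ("J", "JR", "BRANCH") and the PC-plus-offset
# jump target are assumptions for illustration, not real MIPS encodings.

def next_pc(pc, opcode, offset=0, reg_value=0, condition_met=False):
    """Return the address of the next instruction to fetch."""
    if opcode == "J":                # jump: program counter plus offset
        return pc + offset
    if opcode == "JR":               # jump register: target is a register value
        return reg_value
    if opcode == "BRANCH":           # conditional branch: PC-relative if taken
        return pc + offset if condition_met else pc + 4
    return pc + 4                    # everything else just falls through
```

For example, a jump at PC 100 with offset 204 redirects fetch to 304, while an add at 100 simply falls through to 104. Note how each case needs information from a different pipeline stage: the op code from decode, a register value from register fetch, and the condition from the ALU.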
If not, you want to just fall through and go to PC+4, the next instruction, if your instructions are four bytes long. Hm. Okay, everything else. Believe it or not, we do need to actually think about this case. It's not some magical piece of hardware; we're going to discuss this magical piece of hardware today. You need to take the op code and the PC, and you need to add some constant to the PC to compute the fall-through to the next instruction. So while we're looking at this, we might have to look at the program counter, but we also have to look at information which comes at different stages in the pipeline. The op code doesn't get decoded until something like the decode stage. Registers don't get fetched, let's say, until the instruction register fetch, or decode, stage. And for the condition, you may even need to do some comparison against zero or against another register, so you need to do some math, or run it through your ALU in your execute stage. Something like jump register is similar there: you're not going to know the destination until maybe the execute stage, or possibly way at the end of the decode stage. So let's take a look at a basic control hazard, and the basic control hazard is: we want to execute instructions and we want to fall through to the next instruction. That sounds pretty basic, and you would say, why is there any control hazard there? We're not changing the control flow. So let's draw the pipeline diagram, assuming that we have no branch delay slots in our architecture. We'll talk more about branch delay slots in a second. Let's draw a pipeline diagram here. We're going to plot time, and then we're going to step through a basic instruction sequence here, and this basic instruction sequence is actually going to start here. We're going to have instruction one and instruction two.
Instruction one is taking some register and adding it to something else. We talked about there being data dependencies or data hazards; in this case, there is no data dependence and no data hazard here. You'll see that this is writing register r1 and this is reading from register r2. So there are no data hazards here; we just want to look at the control hazard. The first instruction just goes down the pipe: fetch, decode, execute, memory, write-back. Our five-stage MIPS pipe. The second instruction starts going down the pipe, and it goes into the fetch stage. But the problem here is we actually need to stall the fetch stage, because we don't know that the second instruction is the second instruction yet. We don't know, for instance, that this first instruction is not a branch or a jump, so we don't know the address of the next instruction. That's kind of odd. Now why do we not know this? Going back to this example here, one thing that's common through all these different cases is they all need to decode the op code. Well, where do we do the decoding of the op code? We don't do that until the decode stage of the pipe. So we don't do that until here, and we're not able to use that information until the end of the cycle, which would be sort of here. And we would need that information to determine what's going on here. So if you had a branch, for instance, here, the decode information is not able to get around and change the program counter, and change what is being indexed into the instruction memory, on this cycle. So what we're going to have to do is insert a decode bubble here for this control hazard. Now, if you play this forward for more instructions, what you're going to realize is this is not very efficient.
Every instruction that goes down the pipe is going to hit a control hazard, every instruction is going to hit this decode bubble, and every instruction now takes two cycles. So your clocks per instruction (CPI) for this is not going to be very good. Let's analyze that now. We can draw this on the other pipeline diagram axis and see what's happening here. Let's take the execute stage: we're executing instruction I1, then a no-op, instruction I2, a no-op, I3, a no-op. If you compute this all out, you end up with a CPI of two. So your machine is running at strictly half the performance you want it to run at. Well, that's not very good. So let's start to talk about some techniques to mitigate the effect of control hazards. We're actually going to have a whole lecture later in the course about branch prediction, which is one of the main techniques to mitigate control hazards. But let's move forward here and take a look at one of these techniques, and this technique is speculation. So what's the solution to this? The most basic solution is we speculate that the current instruction is not going to be a branch, so the next address is going to be the PC plus four. What does this look like in a pipe? Well, there's this nice adder here. We're going to take the PC, and if nothing else is happening later on in the pipe, we're just going to be selecting PC+4 on this control path here. So we're just going to be walking down here, executing 96, 100, 104, and we're not actually going to even look at the instructions until, let's say, something more interesting happens. So we can just speculate that the next address is PC plus four. That's great, but that adds some wrinkles. What happens when we have, like, a jump here?
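The CPI arithmetic above can be checked with a quick back-of-the-envelope calculation. This sketch is a simplification that ignores pipeline fill cycles; it assumes one bubble per instruction without speculation, and, anticipating the fix discussed next, a one-cycle kill penalty per taken jump once we speculate PC+4:

```python
# Back-of-the-envelope CPI, per the lecture's argument. Simplified model:
# pipeline fill/drain cycles are ignored, and each hazard costs one cycle.

def cpi_without_speculation(n_instructions):
    # every instruction waits one extra cycle for its op code to decode
    cycles = 2 * n_instructions
    return cycles / n_instructions

def cpi_with_pc_plus_4(n_instructions, n_taken_jumps):
    # with PC+4 speculation, only taken jumps/branches pay a kill penalty
    cycles = n_instructions + n_taken_jumps
    return cycles / n_instructions
```

So without speculation every program runs at CPI 2, while with PC+4 speculation a program with no taken jumps gets back to CPI 1.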
Hm. So with this jump, if we speculated PC plus four, we went and fetched instruction three here, which is at address 104, but the jump says we're supposed to go to 304, so this instruction is not even supposed to execute. We need some mechanism to kill live instructions in the pipe. So how do we go about doing this? Let's look at a brief example here. We need some way to kill an instruction, and what we're going to do is add a multiplexer here, which will multiplex in a no-op. If we have a jump that gets to the decode stage of the pipe, we're going to wire back in and say: that instruction we just fetched, this one here, is not actually supposed to go down the pipe. We should kill it. So we're going to swing this mux, and right at the end of the cycle we're going to say, no, that's actually a no-op we're inserting into the pipe, and we're going to redirect this multiplexer here to the actual jump location. This is what I was talking about before with the extra adder here. Here's our extra adder, which is computing our destination. Sometimes people try to put these two things together, but we're going to take part of the instruction, take the current PC, add them, and that's going to compute our new destination for the jump. Yeah. Sorry, so here's the control on this mux: we just have to look to see if it's a jump, or jump-and-link, and then we insert a no-op. Otherwise, we actually take the thing coming out of instruction memory. So let's look at this as things flowing down the pipe. We have instruction one, the add, at the beginning of the execute stage, instruction two, the jump, now in the decode stage, and we've just fetched the instruction at 104 out of the PC.
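The mux control just described can be sketched as a small function. In real hardware this is combinational logic, not software, and the opcode names here are assumptions; the jump target is assumed to come precomputed from the extra adder:

```python
# Sketch of the decode-stage kill/redirect mux from the lecture. In hardware
# this is combinational logic; opcode names "J"/"JAL" are illustrative.

NOP = "nop"

def decode_stage_redirect(decode_opcode, fetched_instr, fetch_pc, jump_target):
    """Return (instruction allowed into decode next cycle, next fetch PC).

    jump_target is assumed to be precomputed by the dedicated adder
    (part of the instruction plus the jump's PC).
    """
    if decode_opcode in ("J", "JAL"):        # jump or jump-and-link in decode
        # swing the mux: kill the speculatively fetched instruction and
        # redirect the front of the pipe to the jump destination
        return NOP, jump_target
    # otherwise take the instruction coming out of instruction memory
    # and keep speculating PC+4
    return fetched_instr, fetch_pc + 4
```

With a jump in decode and the instruction at 104 just fetched, the fetched instruction becomes a no-op and fetch redirects to 304; with an ordinary add in decode, the fetched instruction proceeds and fetch continues to 108.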
As we go forward one cycle, we're going to take what we fetched out of the instruction memory at 104, kill it, and put a no-op in its place. The jump is now entering the execute stage, the add is entering the memory stage, and we've redirected the front of the pipe, so we're now fetching the destination of the jump, the instruction at 304. So an important question pops up here on the screen: what happens if we have a stall and a jump in the decode stage at the same time? Are there interactions here that we should be worrying about? Hm, that's a tricky one. Well, the first question is: what are reasons that a jump would actually stall in the decode stage? There are not a whole lot. [laugh] In a basic pipe, a jump would probably not stall in the decode stage. In more complex pipes, there are sometimes stall signals that say there's some big structural conflict later in the pipe, so stall the whole rest of the pipe. So it is possible for things to stall. One important thing is that in a very simple pipe like this, if there actually is some reason this jump is stalling, and you have a jump in that stage, what happens? Do we kill the instruction, do we let it go forward? Both are actually possible to do. More complex pipes might even think about allowing the jump to happen, and squishing out any no-ops that get inserted later on in the pipe. Let's do that from a pipeline diagram perspective, because that might shed a little light on this. Instead of drawing this instruction as continuing down the pipe, we're just going to put no-ops here and dashes there. So the first instruction goes down the pipe. The jump goes down the pipe; it doesn't stall on anything, because we have the PC plus four speculation.
There's no stall here; this add gets fetched, but it never makes it into the next stage of the pipe because it gets killed. Then we have the next add, the target of the jump, showing up, and we go and execute that. And if we look at the resource utilization, plotting it the other direction, you'll see the no-op moving forward in pipe stages over time.
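To tie this together, here is a tiny pipeline-diagram simulator, assuming the PC+4 speculation and the decode-stage kill just described. It records which instruction occupies each of the five stages every cycle, with '-' for a bubble; the instruction names are made up for the example:

```python
# Tiny five-stage pipeline-diagram simulator for the lecture's example.
# Assumes: PC+4 speculation, jumps detected in decode, and the instruction
# fetched behind a jump killed at the end of that cycle.

def pipeline_diagram(program, n_cycles):
    """program: list of (name, jump_target_index or None).

    Returns one row per cycle, listing the instruction name in each of
    the stages fetch, decode, execute, memory, write-back.
    """
    pipe = [None] * 5                   # instruction index per stage F,D,X,M,W
    pc, rows = 0, []
    for _ in range(n_cycles):
        # fetch the next instruction and shift the pipe one stage
        pipe = [pc if pc < len(program) else None] + pipe[:-1]
        rows.append(["-" if i is None else program[i][0] for i in pipe])
        decoding = pipe[1]
        if decoding is not None and program[decoding][1] is not None:
            pipe[0] = None              # end of cycle: kill the fetched slot
            pc = program[decoding][1]   # redirect fetch to the jump target
        else:
            pc += 1                     # keep speculating PC+4

    return rows

# The lecture's sequence: an add, a jump to 304, the doomed add at 104,
# and the jump target at 304 (index 3 here).
prog = [("add1", None), ("j304", 3), ("add104", None), ("add304", None)]
```

Running five cycles of `prog` shows exactly the diagram from the lecture: `add104` gets fetched behind the jump, turns into a bubble in decode the next cycle, and that bubble then marches down the pipe behind the jump while `add304` enters fetch.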