1 00:00:03,085 --> 00:00:09,005 Okay, so today, we're gonna start our third installment of. 2 00:00:09,025 --> 00:00:14,030 Ele 475 computer architecture. And this is going to be more review and 3 00:00:14,030 --> 00:00:17,055 we're going to finish up talking about hazards. 4 00:00:17,055 --> 00:00:21,056 And today we're going to be talking about control hazards. 5 00:00:21,056 --> 00:00:27,016 And then a little bit later, we're going to start talking about caches and why we 6 00:00:27,016 --> 00:00:30,082 have caches. So let's start off by looking at control 7 00:00:30,082 --> 00:00:33,093 hazards. And just to recap, the four different 8 00:00:33,093 --> 00:00:38,027 types of or it's going to be the three different types of hazards we've talked 9 00:00:38,027 --> 00:00:41,049 about in this class so far. We're, we're talking about structural 10 00:00:41,049 --> 00:00:44,089 hazards, data hazards, and now we're going to talk about control hazards. 11 00:00:45,067 --> 00:00:53,059 Okay, so why, or excuse me, what do exec information do we need to calculate the 12 00:00:53,059 --> 00:01:00,036 next program counter? Hm, well, is it the same thing for every 13 00:01:00,036 --> 00:01:04,030 instruction? When we go to execute an arithmetic logic 14 00:01:04,030 --> 00:01:07,796 instruction, an add instruction, do we need the same information to calculate the 15 00:01:07,796 --> 00:01:11,017 next program counter? You might say well isn't there some piece 16 00:01:11,017 --> 00:01:14,045 of magical hardware which just calculates the next program counter? 17 00:01:14,045 --> 00:01:18,009 Well, yes but we need to talk about what that magical piece of hardware is. 18 00:01:18,009 --> 00:01:23,290 So, as you might have guessed, it's actually different for branches and jumps 19 00:01:23,290 --> 00:01:28,752 than it is for sort of more traditional instructions or, or arithmetic, logical 20 00:01:28,752 --> 00:01:33,540 instructions or everything else. So let's start off by looking at jumps. 21 00:01:33,540 --> 00:01:38,697 So, if we look at a jump, a jump, you need to look at the op code to make sure that 22 00:01:38,697 --> 00:01:42,248 it's actually a jump. You also need to look at the offset within 23 00:01:42,248 --> 00:01:46,075 the instruction, and you need to look at the current program counter. 24 00:01:46,075 --> 00:01:50,006 And you take that all together, and like oh, it's a jump. 25 00:01:50,006 --> 00:01:54,064 The decoder says it's a jump, the decode pipe stage of your processor says okay 26 00:01:54,064 --> 00:01:59,011 it's a jump, and then you can take the permanent counter, add it to the offset, 27 00:01:59,011 --> 00:02:03,099 and you probably need to do that either in ALU, or if you need a special adder to do 28 00:02:03,099 --> 00:02:06,048 it. And the pipelines we've drawn so far on 29 00:02:06,048 --> 00:02:09,062 our 5-stage pipe, we get a special adder just for that. 30 00:02:10,010 --> 00:02:14,067 And you do that offsite calculation, and then you want to go and vector your 31 00:02:14,067 --> 00:02:19,043 machine so the next instruction you go to execute is at the target of the jump. 32 00:02:19,043 --> 00:02:24,046 Now, this, gets a little more complicated when you start looking at jump, register. 33 00:02:24,046 --> 00:02:29,425 So jump register, you don't know where you're going until you go to code the 34 00:02:29,425 --> 00:02:35,134 instruction, fetch the information from the register file, to me at least you 35 00:02:35,134 --> 00:02:39,865 don't need to, do some conditional calculation, but you will down here. 36 00:02:39,865 --> 00:02:44,402 But we're just gonna have to look at the op code to know that it's a, a jump 37 00:02:44,402 --> 00:02:46,992 register. We don't need to look at any offset, 38 00:02:46,992 --> 00:02:51,423 because we are jumping directly to a register value in something like MIPS. 39 00:02:51,423 --> 00:02:55,511 In other instruction sets, you might need to look at other, other sorts of, 40 00:02:55,511 --> 00:02:58,309 information. So you might have, for instance, a 41 00:02:58,309 --> 00:03:01,967 register indirect, jump register type of instruction. 42 00:03:01,967 --> 00:03:05,786 Or even a memory indirect jump sort of instruction. 43 00:03:05,786 --> 00:03:10,033 Conditional branches, now things start getting a little more complicated. 44 00:03:10,033 --> 00:03:14,067 We need to look at the op code, we need to look at the current program counter, we 45 00:03:14,067 --> 00:03:18,042 need to go look at that register, which is gonna give us the condition. 46 00:03:18,042 --> 00:03:22,092 So we're branching based on whether some value is, let's say greater than or less 47 00:03:22,092 --> 00:03:23,088 than zero. Hm, okay. 48 00:03:23,088 --> 00:03:27,453 So we need to go look at that, but we don't know that until quite a bit down, 49 00:03:27,453 --> 00:03:33,507 farther down the pipe, and we also need to take the offset and add it to the program 50 00:03:33,507 --> 00:03:37,011 counter when we do a PC relative conditional branch. 51 00:03:37,030 --> 00:03:41,072 That's how it's defined in MIPS. In other instruction set you can have 52 00:03:41,072 --> 00:03:46,337 different types of conditional branches, either a absolute addressing scheme or 53 00:03:46,337 --> 00:03:52,449 something where it might be register, register indirect, or, or some, some other 54 00:03:52,449 --> 00:03:56,380 thing like that. But, for MIPS, we're just going to take 55 00:03:56,380 --> 00:04:01,790 program counter, add it to our offset. And branch that if the register is, that 56 00:04:01,790 --> 00:04:06,219 we, the register or the condition is what we were looking for. 57 00:04:06,219 --> 00:04:11,784 If not, you wanna just follow through, and go to PC+4 or the next instruction, if 58 00:04:11,784 --> 00:04:14,696 you, if your instruction is four bytes long. 59 00:04:14,696 --> 00:04:16,420 Hm. Okay, everything else. 60 00:04:16,420 --> 00:04:20,452 Believe it or not, we do need to actually think about this case. 61 00:04:20,452 --> 00:04:25,249 It's not some magical piece of hardware, we're gonna discuss this magical piece of 62 00:04:25,249 --> 00:04:28,599 hardware today. You need to take the op code and you need 63 00:04:28,599 --> 00:04:32,770 to take the PC, and you need to add some constant to it to compute that. 64 00:04:32,770 --> 00:04:36,480 You know, you want to fall through to the next, next instruction. 65 00:04:36,480 --> 00:04:41,171 So, you know, while we're looking at this we, we might have to look at the program 66 00:04:41,171 --> 00:04:45,525 counter, but we also have to look at information which comes at different 67 00:04:45,525 --> 00:04:51,092 stages in the pipeline. The op code doesn't get decoded probably 68 00:04:51,092 --> 00:04:56,465 until something like the decode stage. Registers don't get fetched, let's say, 69 00:04:56,465 --> 00:04:59,816 until the instruction, register fetch or decode stage. 70 00:04:59,816 --> 00:05:04,787 And, for condition, you may even need to do some comparison of zero or comparison 71 00:05:04,787 --> 00:05:09,068 of another register. So you need to do some math or, or run it 72 00:05:09,068 --> 00:05:14,696 through your ALU in your execution stage. Something like jump register is similar 73 00:05:14,696 --> 00:05:18,046 there. You're not gonna know the destination 74 00:05:18,046 --> 00:05:23,325 until maybe, once you've gone into the either execute stage or possibly way, way 75 00:05:23,325 --> 00:05:30,434 at the end of the decode stage. So let's take a look at a basic control 76 00:05:30,434 --> 00:05:37,383 hazard and the basic control hazard is we wanna execute instructions and we wanna 77 00:05:37,383 --> 00:05:41,588 fall through to the next instruction which sounds. 78 00:05:41,588 --> 00:05:44,969 Pretty basic, but. You would say, "Why, why is there any 79 00:05:44,969 --> 00:05:49,088 structural hazard there?We're not changing the control flow." So let's draw the 80 00:05:49,088 --> 00:05:52,799 pipeline diagram. Assuming that we've no branch delay slots 81 00:05:52,799 --> 00:05:56,671 in our architecture. And we'll talk, more about branch delay 82 00:05:56,671 --> 00:06:00,632 slots in a second. Let's draw a pipeline diagram here. 83 00:06:00,632 --> 00:06:05,600 So we're gonna plot time, and then we're going to step through a basic instruction 84 00:06:05,600 --> 00:06:10,056 sequence here, and this basic instruction sequence is actually going to start here. 85 00:06:10,056 --> 00:06:13,057 We're going to have instruction one and instruction two. 86 00:06:14,027 --> 00:06:22,039 Instruction one is taking some register and adding it to something else. 87 00:06:22,039 --> 00:06:26,070 We talked about there being data dependencies or data hazards. 88 00:06:26,070 --> 00:06:30,082 In this case, there is no data dependence and no data hazard here. 89 00:06:30,082 --> 00:06:35,082 You'll see that this is range register r1 and this is reading from register r2. 90 00:06:35,082 --> 00:06:38,073 So there's no, there's no data hazards here. 91 00:06:38,073 --> 00:06:41,070 We just want to look at the, the control hazard. 92 00:06:41,070 --> 00:06:44,061 The first instruction just goes down the pipe. 93 00:06:44,061 --> 00:06:48,060 Such decode, execute, memory, write back. Our five stage MIPS pipe. 94 00:06:48,060 --> 00:06:51,095 The second instruction it starts going down the pipe. 95 00:06:52,074 --> 00:06:55,046 And the first goes into the, the fetch stage. 96 00:06:55,046 --> 00:06:59,873 But the problem here is we actually need to stall the fetch stage, or we need to, 97 00:06:59,873 --> 00:07:05,051 be, because we don't know that the second instruction is the second instruction yet. 98 00:07:05,051 --> 00:07:10,065 We don't, for instance, we don't know that this first instruction is not a branch or 99 00:07:10,065 --> 00:07:13,063 a jump. So we don't know the address of the next 100 00:07:13,063 --> 00:07:17,010 instruction. That's kind of odd. 101 00:07:17,010 --> 00:07:24,005 Now why do we not know this? So, going back to, to this example here. 102 00:07:24,005 --> 00:07:31,017 One thing that's common through all these different cases, is they all need to 103 00:07:31,017 --> 00:07:36,026 decode the op code. Well, where do we do the decoding of the 104 00:07:36,026 --> 00:07:40,020 op code? We don't do that until the decode stage of 105 00:07:40,020 --> 00:07:43,024 the pipe. So we don't do that until here. 106 00:07:43,069 --> 00:07:51,371 And we're not able to use that information until the end of the cycle, which would be 107 00:07:51,371 --> 00:07:54,811 sort of here. And we would need that information to 108 00:07:54,811 --> 00:07:59,625 determine what's going on here. So if you had a branch, for instance here 109 00:07:59,625 --> 00:08:04,702 where it's not able to get around and change the program counter and change what 110 00:08:04,702 --> 00:08:08,867 is being indexed into the instruction memory on this cycle. 111 00:08:08,867 --> 00:08:14,543 So, what we're going to have to do, is we're going to have to insert a decode 112 00:08:14,543 --> 00:08:19,387 bubble here for this structural hazard. Now, if you sort of play this forward for 113 00:08:19,387 --> 00:08:24,009 more instructions, what you're going realize is this is not very efficient. 114 00:08:24,009 --> 00:08:29,054 Every instruction that goes down the pipe is going to hit a control hazard, and 115 00:08:29,054 --> 00:08:33,945 every instruction that goes down the pipe, you're going to basically going to hit 116 00:08:33,945 --> 00:08:40,229 this decode, decoding hazard, and every instruction now takes two cycles. 117 00:08:40,229 --> 00:08:45,360 So your clock per instruction for this is not gonna be very good. 118 00:08:45,360 --> 00:08:51,424 Let's, let's analyze that now so we can, we can draw this in the other pipeline 119 00:08:51,424 --> 00:08:58,514 diagram axis and see that what's happening here is we're, let's take the execute 120 00:08:58,514 --> 00:09:02,354 stage. We're executing instruction I1, there were 121 00:09:02,354 --> 00:09:04,477 no oping. Instruction I2, no op. 122 00:09:04,477 --> 00:09:08,373 I3 no op. And, and you compute this all out, you end 123 00:09:08,373 --> 00:09:12,986 up with a CPI of two. So your machine is running at strictly 124 00:09:12,986 --> 00:09:16,592 half the performance that you want it to run at. 125 00:09:16,592 --> 00:09:21,471 Well that's, that's not very good. So let's start to talk about some 126 00:09:21,471 --> 00:09:25,260 techniques to, mitigate the effect of control hazards. 127 00:09:25,260 --> 00:09:31,229 And, we're gonna actually have a whole lecture later in the course about branch 128 00:09:31,229 --> 00:09:36,562 prediction which is one of the main techniques, in order to, mitigate control 129 00:09:36,562 --> 00:09:39,779 hazards. But let's, let's move forward here and, 130 00:09:39,779 --> 00:09:44,271 and take a look at one of these techniques, and this technique is 131 00:09:44,271 --> 00:09:47,202 speculation. So what's the solution to this? 132 00:09:47,202 --> 00:09:52,804 So the most basic solution is we actually speculate that the next address is going 133 00:09:52,804 --> 00:09:57,606 to not be a branch, or the current instruction is not going to be a branch, 134 00:09:57,606 --> 00:10:00,550 the next address is going to be the PC plus four. 135 00:10:00,553 --> 00:10:05,292 So what does this look like in a pipe? Well, there's this nice adder here. 136 00:10:05,292 --> 00:10:10,551 We're going to take the PC, and if nothing else is happening sort of later on in the 137 00:10:10,551 --> 00:10:14,346 pipe we're just going to be selecting PC+4 on this control path here. 138 00:10:14,346 --> 00:10:21,215 So we're just going to be sort of walking down here, we are gonna be, doing, 139 00:10:21,215 --> 00:10:26,781 executing 96, 100, 104 and we are not actually gonna even look at the 140 00:10:26,781 --> 00:10:31,973 instructions, until, lets say, something more interesting happens. 141 00:10:31,973 --> 00:10:36,893 So, we can just speculate, that the next, next address is PC + four. 142 00:10:36,896 --> 00:10:41,058 So, that's, that's great, but that adds some wriggles. 143 00:10:41,058 --> 00:10:44,526 What happens when we have, like a, a jump here? 144 00:10:44,526 --> 00:10:51,030 Hm, so this jump, if we speculated PC plus four, we went and fetched instruction 145 00:10:51,030 --> 00:10:56,612 three here which is at address 104, but the jumps says we're supposed to go to 146 00:10:56,612 --> 00:11:00,154 304, so this instruction is not even supposed to execute. 147 00:11:00,154 --> 00:11:05,107 So we need some mechanism to kill instructions in the pipeline, kill live 148 00:11:05,107 --> 00:11:11,583 instructions in the pipe. So how do we, how do we go about doing 149 00:11:11,583 --> 00:11:17,353 this? So let's, let's look at a brief example 150 00:11:17,353 --> 00:11:21,643 here. So we need some way to kill an 151 00:11:21,643 --> 00:11:29,049 instruction, and what we're going to do is we're going to add a multiplexer here, 152 00:11:29,049 --> 00:11:35,079 which will, multiplex in a no-op. And if we have a jump that gets to the 153 00:11:35,079 --> 00:11:41,077 decode stage of the pipe, we're going to wire back in and say, oh, that instruction 154 00:11:41,077 --> 00:11:46,659 that we just executed, or the instruction we just fetched, this one here, it, it's 155 00:11:46,659 --> 00:11:50,084 not actually supposed to go down the pipe. We should, we should kill it. 156 00:11:50,084 --> 00:11:54,073 So we're going to swing this mux, and right at the end of the cycle, we're gonna 157 00:11:54,073 --> 00:11:58,007 say, no, that's actually a no op. We're gonna insert into the pipe, and 158 00:11:58,007 --> 00:12:01,062 we're gonna redirect this multiplexer here to the actual jump location. 159 00:12:01,081 --> 00:12:06,052 And this is what I was talking about before about the extra adder here. 160 00:12:06,052 --> 00:12:10,021 Here's our extra adder which is computing our destination. 161 00:12:10,040 --> 00:12:15,081 And sometimes people try to sort of put these two things together but we're gonna 162 00:12:15,081 --> 00:12:20,097 take part of the instruction and we're gonna take current PC and add to that and 163 00:12:20,097 --> 00:12:24,053 that's gonna compute our new destination of the jump. 164 00:12:26,063 --> 00:12:29,039 Yeah. Sorry, so here's the control on this muck, 165 00:12:29,039 --> 00:12:33,017 so we just have to look to see if it's a jump, or jump and link. 166 00:12:33,017 --> 00:12:36,072 And we inserted no op. Otherwise, we actually take the thing 167 00:12:36,072 --> 00:12:44,066 coming out of construction memory. So let's look at this sort of as a things 168 00:12:44,066 --> 00:12:49,058 flowing down the pipe. If we have instruction one, the add at the 169 00:12:49,058 --> 00:12:54,035 beginning of the execute stage, instruction two here now in the decode 170 00:12:54,035 --> 00:12:57,036 stage and we just fetched 104 out of the PC. 171 00:12:57,095 --> 00:13:03,046 As we go forward one cycle, we're gonna actually take what we, took out of the 172 00:13:03,046 --> 00:13:09,050 instruction memory that was 104, and we're gonna, kill it, and put a new op in it's 173 00:13:09,050 --> 00:13:12,541 place. The jump is now entering the execute 174 00:13:12,541 --> 00:13:16,113 stage. The add is entering the memory stage, and 175 00:13:16,113 --> 00:13:21,915 we've redirected the front of the pipe here, and we're actually fetching now, the 176 00:13:21,915 --> 00:13:25,853 destination of the jump, or the instruction at 304. 177 00:13:25,853 --> 00:13:30,742 So an important question pops up here on the screen. 178 00:13:30,742 --> 00:13:38,052 What happens if we have a stall and a jump in the decode stage at the same time. 179 00:13:38,052 --> 00:13:43,039 Are there interactions here, that we should be worrying about? 180 00:13:43,039 --> 00:13:48,002 Hm, that's a tricky one. Well, the first question is, what is a 181 00:13:48,002 --> 00:13:51,011 jump? What are reasons that a jump would 182 00:13:51,011 --> 00:14:01,016 actually stall in the decode stage? It's not a whole lot. 183 00:14:01,016 --> 00:14:05,046 [laugh] In a basic pipe, probably a jump would not actually stall, in the, in the 184 00:14:05,046 --> 00:14:08,047 decode stage. The more complex pipes, you know, there 185 00:14:08,047 --> 00:14:13,021 are sometimes just stall signals that say, there's some big structural conflict later 186 00:14:13,021 --> 00:14:16,000 in the pipe, just stall the whole rest of the pipe. 187 00:14:16,000 --> 00:14:20,031 So it is possible for things to stall. One important thing is that in a, in a 188 00:14:20,031 --> 00:14:25,017 very simple pipe like this, if you have a [inaudible], let's say there actually is 189 00:14:25,017 --> 00:14:30,014 some reason this jump is stalling, and you have a, a jump in that stage, what happens 190 00:14:30,014 --> 00:14:33,055 sort of do we kill the instruction, do we let it go forward? 191 00:14:33,055 --> 00:14:40,054 Both of them are actually possible to do. More complex pipes might even think about 192 00:14:40,054 --> 00:14:46,065 actually allowing the jump to happen, and sort of squishing out any no ops that get 193 00:14:46,065 --> 00:14:50,079 inserted later on in the pipe. And let's, let's do that from a pipeline 194 00:14:50,079 --> 00:14:54,057 diagram perspective, cuz that might, shed a little light on this. 195 00:14:54,057 --> 00:14:59,042 And, instead of drawing this instruction as continuing down the pipe, we're just 196 00:14:59,042 --> 00:15:04,038 gonna put no ops here and dashes there. So the first instruction goes on the pipe. 197 00:15:04,038 --> 00:15:09,023 The jump goes on the pipe, doesn't install on anything, because we have the PC plus 198 00:15:09,023 --> 00:15:15,063 four speculation. There's no stall here, this add gets 199 00:15:15,063 --> 00:15:21,696 fetched, but doesn't ever make it into the next stage of the pipe because it gets 200 00:15:21,696 --> 00:15:26,075 killed. Then we have, the next add, the, the 201 00:15:26,075 --> 00:15:31,084 target of the jump, showing up, and we go and execute that. 202 00:15:35,050 --> 00:15:40,033 And if we look for the resource utilization, we can plot it the other 203 00:15:40,033 --> 00:15:45,032 direction, you'll see the no op moving forward in pipe stages over time.