1 00:00:03,037 --> 00:00:07,051 Let's continue on and move on to a different item here. 2 00:00:07,051 --> 00:00:13,036 We've talked about structural hazards, now we're gonna move on and talk about data 3 00:00:13,036 --> 00:00:17,089 hazards. Okay, so what it a, what is a data hazard? 4 00:00:17,089 --> 00:00:24,040 So a data hazard occurs when you have one instruction that depends on the data of a 5 00:00:24,040 --> 00:00:27,097 previous instruction, or a previous data value. 6 00:00:27,097 --> 00:00:34,382 Sorry, previous data value depending on previous instruction is not precise 7 00:00:34,382 --> 00:00:37,729 enough. We want to depend on a previous data value 8 00:00:37,729 --> 00:00:44,206 or data value that is generated by a previous instruction that is still in the 9 00:00:44,206 --> 00:00:49,963 pipeline. And like stall like, structural hazards, 10 00:00:49,963 --> 00:00:54,378 data hazards also have a couple different approaches which we will not talk about 11 00:00:54,378 --> 00:00:58,034 all of them today. But let's, let's start talking let's 12 00:00:58,034 --> 00:01:02,020 introduce them at least. First thing is you can schedule around it. 13 00:01:02,084 --> 00:01:07,049 Okay, so what does this mean. Let's say we have a processor pipeline, 14 00:01:07,049 --> 00:01:13,019 and the processor pipeline is generating values and we have one instruction which 15 00:01:13,019 --> 00:01:18,637 dependent on another instruction but the first instruction let's say takes a couple 16 00:01:18,637 --> 00:01:23,069 cycles to generate that value. The value won't be ready. 17 00:01:23,069 --> 00:01:26,034 So, can't go and issue subsequent instruction. 18 00:01:26,034 --> 00:01:30,069 So, we have data hazard here, data dependency hazard, while you can schedule 19 00:01:30,069 --> 00:01:33,057 around it. So, you could, for instance introduce, no 20 00:01:33,057 --> 00:01:37,064 operation instruction into your instruction sequence, and have the 21 00:01:37,064 --> 00:01:42,023 programmer avoid the hazard, if the programmer knows the micro architecture of 22 00:01:42,023 --> 00:01:45,059 the machine. And this is actually shown up in some 23 00:01:45,059 --> 00:01:49,067 earlier processors. Another famous example of this is the 24 00:01:49,067 --> 00:01:54,066 floating point unit for the Intel I860U, which is a old, sort of, early risk 25 00:01:54,066 --> 00:01:59,020 architecture made by Intel. In the I860 the floating point unit was 26 00:01:59,020 --> 00:02:04,013 not interlocked so if you execute a floating point instruction and you have 27 00:02:04,013 --> 00:02:09,031 another instruction, which is coming down the pipe which uses the result of that, 28 00:02:09,031 --> 00:02:13,043 you might get the wrong value. So the program was the program's 29 00:02:13,043 --> 00:02:15,833 responsibility to make sure that didn't occur. 30 00:02:15,833 --> 00:02:22,231 And you actually, put in no ops there. Next approach, which we'll be talking more 31 00:02:22,231 --> 00:02:27,590 about in today's lecture, is to stall. So if you have a data dependency, you can 32 00:02:27,590 --> 00:02:33,032 actually stall earlier, excuse me, stall later instructions dependent on earlier 33 00:02:33,032 --> 00:02:36,635 instructions. And some of the, the important thing here 34 00:02:36,635 --> 00:02:43,565 to note, is you're going to freeze the pipeline until the, preceding instruction 35 00:02:43,565 --> 00:02:47,029 has, generated the value, but hardware does this freezing. 36 00:02:47,029 --> 00:02:51,068 The hardware, does the freezing, and, we're gonna, develop this more today, but, 37 00:02:51,068 --> 00:02:55,038 you actually have to freeze, everything before that instruction. 38 00:02:55,038 --> 00:02:58,019 You just can't, freeze the dependent instruction. 39 00:02:58,019 --> 00:03:02,053 And, that's because the, the sort of, traffic behind it will catch up on the 40 00:03:02,053 --> 00:03:05,011 earlier traffic and sort of pile up into it. 41 00:03:05,011 --> 00:03:09,056 So, if you want to, make a pipeline, which works out like this, you're going to 42 00:03:09,056 --> 00:03:12,020 actually want to, stall everything earlier. 43 00:03:12,020 --> 00:03:15,071 And we'll look at the wiring that you need to do to do that. 44 00:03:16,034 --> 00:03:21,008 Another solution is you can bypass, so what this, an example of this is, you can 45 00:03:21,008 --> 00:03:26,000 add extra hardware to your data path, and the extra hardware is going to send the 46 00:03:26,000 --> 00:03:30,098 value as soon as it gets created, so you may not have to wait for it to get to the 47 00:03:30,098 --> 00:03:34,069 end of the pipeline. So if the data value gets made early, you 48 00:03:34,069 --> 00:03:39,025 can just forward that to an instruction which needs it, but that adds extra 49 00:03:39,025 --> 00:03:45,003 hardware and complexity to your design. And finally we're gonna talk about not in 50 00:03:45,003 --> 00:03:49,083 this lecture, this is the one thing, the solution we'll talk about later is, you 51 00:03:49,083 --> 00:03:53,009 can speculate. So, if you have a data hazard, you could 52 00:03:53,009 --> 00:03:57,096 assume it's not a problem or you could assume that, you know, everything's gonna 53 00:03:57,096 --> 00:04:00,079 be okay. I'll just use the encrypt value for a 54 00:04:00,079 --> 00:04:05,096 little bit of time and we'll assume that the value, the old value is equal to the 55 00:04:05,096 --> 00:04:08,941 new value or you do data speculations, other ways to do this. 56 00:04:08,941 --> 00:04:13,045 And if you make a mistake you catch it by the time you get to the end. 57 00:04:13,045 --> 00:04:17,013 And you basically have to re-execute the instruction with the correct value then. 58 00:04:17,013 --> 00:04:20,013 So you can do speculation. And this is kind of like a big guessing 59 00:04:20,013 --> 00:04:22,009 game here. But this is used in out of order 60 00:04:22,009 --> 00:04:25,078 processors in multiple different ways. And we'll be talking about that in a 61 00:04:25,078 --> 00:04:30,070 couple lectures from now. Okay so let's look at the an example data 62 00:04:30,070 --> 00:04:34,010 hazard executing on our processing pipeline. 63 00:04:34,010 --> 00:04:39,020 So we have two instructions here. We have an add with an immediate. 64 00:04:39,020 --> 00:04:42,095 Of register zero plus ten in your register one. 65 00:04:42,095 --> 00:04:49,071 So we're gonna be using in this class the notation where the left most operand or 66 00:04:49,071 --> 00:04:56,032 the left most register is the destination. And the right operands are the input and 67 00:04:56,032 --> 00:05:01,064 the source operands. So we're gonna add ten plus zero R0 in, in 68 00:05:01,064 --> 00:05:05,099 MIPS R0 is hardwired to zero, and we're gonna add that into R1. 69 00:05:05,099 --> 00:05:11,061 And then this next instruction here is going to exhibit a read after write data 70 00:05:11,061 --> 00:05:15,033 dependence. And this read after write data dependence, 71 00:05:15,033 --> 00:05:21,002 we have an addi, and it's just going to use exactly this value which gets created, 72 00:05:21,002 --> 00:05:26,001 the instruction right before. So this is going to take R1, add seventeen 73 00:05:26,001 --> 00:05:29,094 to it, and put it in, deposit it into R4 or register four. 74 00:05:29,094 --> 00:05:34,013 But we have a little bit of a challenge here because it uses the result of the 75 00:05:34,013 --> 00:05:36,063 instruction right before it. Hm. 76 00:05:36,063 --> 00:05:40,037 Okay. So, what, what happens in this design? 77 00:05:40,037 --> 00:05:44,037 Let's say our instructions are merging down. 78 00:05:44,072 --> 00:05:51,006 And, we have the first add here and the second adds back here. 79 00:05:51,006 --> 00:05:55,609 Okay, nothing bad happened so far. Now the first add goes here and the second 80 00:05:55,609 --> 00:05:58,530 add is here, okay nothing bad has happened so far. 81 00:05:58,530 --> 00:06:05,123 But the question is what do we fetch out of the register file for the second add. 82 00:06:05,123 --> 00:06:12,122 Because the result of running a one is available here but it hasn't made it back 83 00:06:12,122 --> 00:06:17,480 into the register file yet. So it's actually gonna fetch the old 84 00:06:17,480 --> 00:06:18,300 value. Hm. 85 00:06:18,300 --> 00:06:24,077 That's, that's not good. So if you were just to play this without 86 00:06:24,077 --> 00:06:29,090 any stalling or interlocking, you'll actually have this instruction read the 87 00:06:29,090 --> 00:06:34,083 old value of R, or excuse me this instruction, the second instruction read 88 00:06:34,083 --> 00:06:39,028 the old value of R1, and not the new value that we want it to read. 89 00:06:39,028 --> 00:06:42,025 So, we need to think about this a lot harder. 90 00:06:43,093 --> 00:06:47,080 Yeah, R1 is stale. Oops, we made a mistake. 91 00:06:48,043 --> 00:06:52,028 So how do we go about resolving these hazards. 92 00:06:52,028 --> 00:06:59,064 Well we want to somehow detect them, these data hazards and then we want to feed this 93 00:06:59,064 --> 00:07:06,237 information back and, later stages provide dependence information to earlier stages. 94 00:07:06,237 --> 00:07:09,573 So this is a later stage. This is an earlier stage. 95 00:07:09,573 --> 00:07:12,502 And we're going to feed information back here. 96 00:07:12,502 --> 00:07:18,773 And dependent on that information that is fed back, we're either going to stall or 97 00:07:18,773 --> 00:07:22,613 kill instructions. So the most basic example here is we're 98 00:07:22,613 --> 00:07:27,415 gonna have stage four influence stage three, and can say stage three can make 99 00:07:27,415 --> 00:07:31,653 some decisions based on that and can maybe stall or kill instructions. 100 00:07:31,653 --> 00:07:36,433 And likewise stage three will influence stage two and stage two will influence 101 00:07:36,433 --> 00:07:39,867 stage one. But this is not really good enough. 102 00:07:39,867 --> 00:07:43,124 Because let's say stage four tells stage three to do something. 103 00:07:43,124 --> 00:07:47,558 If stage three doesn't tell the earlier stages, we are gonna pile up, like cars 104 00:07:47,558 --> 00:07:50,399 are gonna pile up here all into stage three. 105 00:07:50,399 --> 00:07:55,148 So this typically means that you need some higher level, sort of, feedback here, 106 00:07:55,148 --> 00:07:59,485 where you have Stage four giving information to all the previous stages, 107 00:07:59,485 --> 00:08:04,133 Stage three giving information to all the previous stages, and vice versa. 108 00:08:04,133 --> 00:08:09,053 So that you'll actually be able to provide information to all the previous stages, 109 00:08:09,053 --> 00:08:12,544 and then everyone can make a decision based on this. 110 00:08:12,544 --> 00:08:19,190 And it's really important here is, control your pipeline like this. 111 00:08:19,190 --> 00:08:26,180 Basically requires that a pipeline this is really important because if you don't and 112 00:08:26,180 --> 00:08:30,303 if you have sort of feedback going the other direction. 113 00:08:30,303 --> 00:08:36,058 You might end up with some sort of deadlock in your processor because if you 114 00:08:36,058 --> 00:08:42,114 have a early instruction which dependent on a later instruction then you might have 115 00:08:42,114 --> 00:08:45,338 that resource might never get free cause the. 116 00:08:45,338 --> 00:08:50,418 Later instruction might be dependent on the earlier instruction, vice versa. 117 00:08:50,418 --> 00:08:55,644 And, all of a sudden, you have a sort of big cycle, and everyone is dependent on 118 00:08:55,644 --> 00:09:00,692 everyone, and the machine just stops. So it's really important that when you're 119 00:09:00,692 --> 00:09:06,585 doing this that stage I plus one only feeds strictly information back to stages 120 00:09:06,585 --> 00:09:12,017 I to, or one to I. Okay, so let's resolve some data hazards 121 00:09:12,017 --> 00:09:17,099 and look at how we'd actually do this on our simplified pipeline here. 122 00:09:17,099 --> 00:09:22,345 We're going to use the same example case here, we have two adds and we have a 123 00:09:22,345 --> 00:09:29,602 dependence between R through register one. So, the first thing to realize that we're 124 00:09:29,602 --> 00:09:35,626 going to have to do is we're gonna have, where, where do we need to stall the pipe 125 00:09:35,626 --> 00:09:39,053 line? Or where do we need to stop the pipe line? 126 00:09:39,053 --> 00:09:43,512 Or interlock the pipe line? And we're gonna look at these two adds, it 127 00:09:43,512 --> 00:09:48,953 wasn't a problem until the second add went to go and read the register file here. 128 00:09:48,953 --> 00:09:54,725 If the first if we, if we didn't read the register file until later, this would not 129 00:09:54,725 --> 00:09:58,676 have been a problem. Because the value might have been up to 130 00:09:58,676 --> 00:10:01,476 date. But because we read it so early here, we 131 00:10:01,476 --> 00:10:06,463 pipeline sort of when data gets computed and when it gets written back. 132 00:10:06,463 --> 00:10:11,259 And read into different stages. We have this, this, this challenge. 133 00:10:11,259 --> 00:10:16,440 So, what we're going to do is we're going to stall here. 134 00:10:16,440 --> 00:10:22,130 And reread the register file over and over again until the first value. 135 00:10:22,130 --> 00:10:25,427 Of our one gets written to the register file. 136 00:10:25,427 --> 00:10:28,680 Unfortunately, we're gonna be waiting a while here. 137 00:10:28,680 --> 00:10:31,285 Because we don't write the register file here. 138 00:10:31,285 --> 00:10:36,285 We don't write the register file here. We write the register file here. 139 00:10:36,285 --> 00:10:41,899 Through that wire. So we like to we're gonna stall multiple 140 00:10:41,899 --> 00:10:46,007 cycles here. So when we're doing the stalling in the 141 00:10:46,007 --> 00:10:51,808 stage we're gonna wanna think really hard on what should be going down the pipe 142 00:10:51,808 --> 00:10:55,342 during that time. Well, we've stalled instruction, the 143 00:10:55,342 --> 00:11:01,032 second ad here, instruction two here, here and obstruction one keeps flowing down the 144 00:11:01,032 --> 00:11:04,463 pipe so we don't want to stall these later stages. 145 00:11:04,463 --> 00:11:09,622 We only want to stall this stage and we need to stall the previous stages so, 146 00:11:09,622 --> 00:11:16,808 instruction don't pile up into us. Now how do we go about, from a wiring 147 00:11:16,808 --> 00:11:21,541 perspective. Executing instructions, and sort of having 148 00:11:21,541 --> 00:11:25,095 the first set of instructions clear out of the pipeline. 149 00:11:25,095 --> 00:11:30,412 Well, we're actually going to insert a multiplexer here on the instruction 150 00:11:30,412 --> 00:11:34,346 register side of things. And this multiplexer is going to insert no 151 00:11:34,346 --> 00:11:37,107 op instructions, or no operation instructions. 152 00:11:37,107 --> 00:11:41,849 We're actually going to be inserting bubbles into the pipe right here. 153 00:11:41,849 --> 00:11:46,320 So that the first instruction, the first add, can clear out of the pipe. 154 00:11:46,320 --> 00:11:49,331 And we know what's, what's, executing here. 155 00:11:49,331 --> 00:11:54,503 So, we don't want to accidentally have the second instruction start executing, even 156 00:11:54,503 --> 00:11:58,995 though it's stalled here. Cuz then it's gonna possible change state 157 00:11:58,995 --> 00:12:03,799 as it goes down the pipeline. And it'll also affect sort of dependency 158 00:12:03,799 --> 00:12:08,819 calculations here. So we wanna, we wanna insert no ops. 159 00:12:08,819 --> 00:12:16,271 And the stall condition, as I said, goes to the program counter, the instruction 160 00:12:16,271 --> 00:12:20,065 register. So these flip flops before the stall point 161 00:12:20,065 --> 00:12:25,911 and it's also going to go into the select line on this multiplexer to choose to 162 00:12:25,911 --> 00:12:30,026 insert no loss. Okay, so that's the beginning of this and 163 00:12:30,026 --> 00:12:36,008 I want to just say that this is sometimes called interlocking or interlocks. 164 00:12:36,008 --> 00:12:41,007 Its an important nomenclature here. You're interlocking the execution of the 165 00:12:41,007 --> 00:12:44,936 instruction here and depending on the other ones then you're actually checking 166 00:12:44,936 --> 00:12:47,752 that sometimes people call stalling it or locking. 167 00:12:47,752 --> 00:12:50,327 Okay. So let's draw a pipeline diagram of what's 168 00:12:50,327 --> 00:12:53,041 going on here. So we're going to plot time, on the X 169 00:12:53,041 --> 00:12:55,390 axis, and instructions on the vertical axis. 170 00:12:55,390 --> 00:12:59,138 And we'll look at the other the resource graph also. 171 00:12:59,138 --> 00:13:04,615 Okay, so we have our first instruction going down the pipe here. 172 00:13:04,615 --> 00:13:09,814 It takes register zero, adds ten to it, and it's gonna go fetch, decode, execute, 173 00:13:09,814 --> 00:13:14,445 memory, write back. Second instruction starts going down the 174 00:13:14,445 --> 00:13:19,524 pipe here. Third, fourth, fifth well you'll note 175 00:13:19,524 --> 00:13:24,425 something here we've stalled and the pipeline. 176 00:13:24,425 --> 00:13:31,611 Because we need to wait. For the first instruction to write back to 177 00:13:31,611 --> 00:13:36,244 the register file, before we can go read the result, or read that value from the 178 00:13:36,244 --> 00:13:38,779 register file. So we strictly have to have this 179 00:13:38,779 --> 00:13:43,417 instruction here, the second instruction, the dependent instruction, in the decode 180 00:13:43,417 --> 00:13:48,991 stage, and read the register file here on the cycle after the write-back occurs. 181 00:13:48,991 --> 00:13:53,441 And we detect this stall condition this whole time. 182 00:13:53,441 --> 00:13:59,510 As, as denoted here with the, the nice purple box. 183 00:13:59,510 --> 00:14:02,253 And. As I said, we need to stall earlier 184 00:14:02,253 --> 00:14:05,563 instructions. So this doesn't stall not only instruction 185 00:14:05,563 --> 00:14:10,412 I2 but installs instruction I3 cuz it's in a earlier stage in the pipe and it needs 186 00:14:10,412 --> 00:14:13,430 to be, it needs to be stalled. Okay. 187 00:14:13,430 --> 00:14:18,064 So we could also graph this the other direction and, and the reason it is useful 188 00:14:18,064 --> 00:14:22,312 to graph the other direction is you can where no operations get inserted or no ops 189 00:14:22,312 --> 00:14:27,944 get inserted. So here we, we've plotted the other 190 00:14:27,944 --> 00:14:32,758 direction of time versus stage of the pipeline or resource. 191 00:14:32,758 --> 00:14:39,072 And you can see the we can see what's in the different stages, in the different 192 00:14:39,072 --> 00:14:43,956 stages, and at some point here, there's nothing in these stages. 193 00:14:43,956 --> 00:14:50,201 Instead we've actually inserted no ops, and it's, comes from the fact that we 194 00:14:50,201 --> 00:14:55,638 basically have I3 sitting in the instruction fetch for three cycles and i2 195 00:14:55,638 --> 00:15:00,838 sitting in the decode for three cycles, and the later stages of the pipe get no 196 00:15:00,838 --> 00:15:04,649 ops inserted and that is what that multiplexor is doing. 197 00:15:04,649 --> 00:15:09,494 So now that we've talked about how stalling happens in a pipeline diagram, 198 00:15:09,494 --> 00:15:13,677 lets move on and look at, what the logic looks like inside of this. 199 00:15:13,677 --> 00:15:19,557 So here we have our data path for our stall our data path for our five stage 200 00:15:19,557 --> 00:15:22,648 pipe. And in stalling, what we are really trying 201 00:15:22,648 --> 00:15:28,783 to do here is, detects the case when a earlier instruction writes are register 202 00:15:28,783 --> 00:15:36,042 which a later instruction is going to use. So in this case we are going to detect the 203 00:15:36,042 --> 00:15:42,075 FA instruction in the decode stage is reading a value which instruction in the 204 00:15:42,075 --> 00:15:45,077 execute. Memory stage or write back stage writes. 205 00:15:45,077 --> 00:15:48,083 So an uncommitted instruction writes to a register. 206 00:15:48,083 --> 00:15:53,021 And a previous instruction here. Or, excuse me, a later instruction goes 207 00:15:53,021 --> 00:15:56,009 and reads that, and we stall at the decode stage. 208 00:15:56,009 --> 00:16:00,017 When we stall, we're gonna actually wanna stall everything behind it. 209 00:16:00,017 --> 00:16:03,074 As we'd sort of already talked about here. Okay. 210 00:16:03,074 --> 00:16:06,039 So let's start calculating the signal here. 211 00:16:06,039 --> 00:16:09,077 We'll call it. It's the control signal, so we'll call it 212 00:16:09,077 --> 00:16:11,025 c. We'll call it c stall. 213 00:16:11,025 --> 00:16:15,031 So we'll draw a little, blob up here. We'll call this the, the stall 214 00:16:15,031 --> 00:16:18,027 calculation. And what goes into this calculation? 215 00:16:18,027 --> 00:16:23,025 Well, it's sort of a complex calculation. But the first thing we're gonna wanna to 216 00:16:23,025 --> 00:16:27,181 check is, is we're gonna want to check, the, destination for the operand. 217 00:16:27,181 --> 00:16:33,114 So we're gonna check the destination for in some instruction that was an earlier 218 00:16:33,114 --> 00:16:36,564 instruction. And this is the register identifier, not 219 00:16:36,564 --> 00:16:39,988 the data value, but instead the register identifier. 220 00:16:39,988 --> 00:16:44,001 So this would be RD in a typical MIPS instruction. 221 00:16:44,001 --> 00:16:48,842 And we're gonna compare that, and we'll call it WS in this calculation. 222 00:16:48,842 --> 00:16:53,302 And we're gonna compare it to the two source inputs here or the, the source 223 00:16:53,302 --> 00:16:57,634 operand register identifiers. And these are both because of 32 registers 224 00:16:57,634 --> 00:17:02,606 in MIPS, these are both five, or these are all five bit values, all three of these 225 00:17:02,606 --> 00:17:05,520 values. And we're gonna wire that all into our 226 00:17:05,520 --> 00:17:08,249 stall control unit here. Okay. 227 00:17:08,249 --> 00:17:13,039 And if we get a match, the most basic thing we're gonna wanna do is we're gonna 228 00:17:13,039 --> 00:17:18,309 say stall everything earlier. And, insert no op instructions later down 229 00:17:18,309 --> 00:17:21,606 the pipe. So the stall's going to control a lot of 230 00:17:21,606 --> 00:17:24,943 things. It's basically going to control the front 231 00:17:24,943 --> 00:17:28,804 end of the pipe. And it's gonna disallow in the instruction 232 00:17:28,804 --> 00:17:34,712 here for moving far forward in the pipe, if anything in these later stages has a 233 00:17:34,712 --> 00:17:40,426 same-destination operand, or in this case so far, we're comparing against just this 234 00:17:40,426 --> 00:17:43,779 location. So, we're comparing against if the write 235 00:17:43,779 --> 00:17:48,923 back stage has the same register identifier destination as either of the 236 00:17:48,923 --> 00:17:54,303 two source operands for the instruction at the, at the decode stage. 237 00:17:54,303 --> 00:18:01,561 Okay, so, what should we do if, should we just stall always if the RS fields, one of 238 00:18:01,561 --> 00:18:05,541 these source fields, here or here, matches RD here? 239 00:18:05,541 --> 00:18:11,030 Some RD. Should we just always stall in that case? 240 00:18:11,030 --> 00:18:16,148 Well, hm. Not every instruction is going to write a 241 00:18:16,148 --> 00:18:19,781 register. So, what if we have an instruction for 242 00:18:19,781 --> 00:18:24,439 instance something like a store instruction which does not write a 243 00:18:24,439 --> 00:18:27,774 register. So if this, we have a storage instruction 244 00:18:27,774 --> 00:18:32,133 which is not ready to register, we probably shouldn't be doing this compare 245 00:18:32,133 --> 00:18:36,624 operation because we can have better performance if we don't stall under those 246 00:18:36,624 --> 00:18:41,400 conditions, and we're gonna introduce a signal here called write enabler, WE. 247 00:18:41,400 --> 00:18:45,027 And WE's gonna get wired into our stall calculation. 248 00:18:45,027 --> 00:18:49,312 Likewise, not every instruction reads, both input operands. 249 00:18:49,312 --> 00:18:52,936 And a good example of this is an immediate instruction. 250 00:18:52,936 --> 00:18:57,519 An immediate instruction only reads one of the source, operands. 251 00:18:57,519 --> 00:19:00,636 Or the source register identified operands. 252 00:19:00,636 --> 00:19:04,941 And the other value comes from the immediate bits, which are in the 253 00:19:04,941 --> 00:19:09,008 instruction coding. So this is going to do, introduce a read 254 00:19:09,008 --> 00:19:14,821 enable calculation here, because not every instruction reads, reads a register. 255 00:19:14,821 --> 00:19:20,256 Okay, so lets calculate this a little bit more here, so we are going to have 256 00:19:20,256 --> 00:19:25,878 something which calculates the destination and I will talk about what this blob is 257 00:19:25,878 --> 00:19:30,454 here in a second. But then we need to add effectively, we 258 00:19:30,454 --> 00:19:35,778 write enable bits or we write the enable for this location in the pipe. 259 00:19:35,778 --> 00:19:38,274 And need to these read enable signals here. 260 00:19:38,274 --> 00:19:42,600 And these get calculated from the instruction registers in the respective 261 00:19:42,600 --> 00:19:46,059 locations in the pipe. So, some decode bits there happening, or 262 00:19:46,059 --> 00:19:50,708 maybe it all gets decoded here in the decode stage, and then we just pipeline 263 00:19:50,708 --> 00:19:54,960 those bits forward. And we do this calculation here, and now 264 00:19:54,960 --> 00:20:01,134 we can say well if the instruction in the decode stage matches what is being written 265 00:20:01,134 --> 00:20:05,438 back, the write back value, then we stall. Okay. 266 00:20:05,438 --> 00:20:12,041 That's, that's close to our full solution. Let's talk about this circle here. 267 00:20:12,041 --> 00:20:17,947 What's going on in this circle? Well, what you might notice is, something 268 00:20:17,947 --> 00:20:24,257 like a jump and link or a jump and link register have an implicit destination 269 00:20:24,257 --> 00:20:28,094 target. The destination is not actually encoded in 270 00:20:28,094 --> 00:20:35,005 RD field of the instruction. So, instead we need to add a multiplexor 271 00:20:35,005 --> 00:20:41,083 here, which multiplexes from the bits out the instruction register and just a hard 272 00:20:41,083 --> 00:20:45,087 coded value of 31, which is gonna denote register 31. 273 00:20:45,087 --> 00:20:52,000 And by doing that, we can handle jump in links and jump in link registers. 274 00:20:52,000 --> 00:20:58,154 Okay, so to, to finish this out, we want to compare not against just in the right 275 00:20:58,154 --> 00:21:03,782 back stage of the pipe. But we wanna compare in all three of these 276 00:21:03,782 --> 00:21:10,302 subsequent stages here, here, and here. So we add extra calculation logic here, 277 00:21:10,302 --> 00:21:16,051 which computes the, right enable. And, the register identifier, based on the 278 00:21:16,051 --> 00:21:19,703 instruction register. And, this might just be, generate early in 279 00:21:19,703 --> 00:21:22,488 the pipe4 depending on your pipeline design. 280 00:21:22,488 --> 00:21:25,595 That's sort of more traditional pipe line control. 281 00:21:25,595 --> 00:21:29,068 In, all this right now, is ignoring a jumps and branches. 282 00:21:29,068 --> 00:21:33,663 If you introduce jumps and branches, things get, a little bit more complicated. 283 00:21:33,663 --> 00:21:38,541 We'll talk about that in the control section, control hazard section of today's 284 00:21:38,541 --> 00:21:43,894 lecture. So now lets go on and talk about different 285 00:21:43,894 --> 00:21:49,286 instructions. And where they have, where they, yet 286 00:21:49,286 --> 00:21:53,020 there's source operands. And what is their destination operens? 287 00:21:53,020 --> 00:21:57,075 And if every instruction in something like Mips, has all sources and all deaths. 288 00:21:57,093 --> 00:22:02,040 So this is just summing up what I said before, is that not every instruction 289 00:22:02,040 --> 00:22:06,099 reads, and not every instruction writes. So, as we can see here, ALU instructions. 290 00:22:06,099 --> 00:22:12,069 Read two operands and running operands, but to store reads two operands, writes no 291 00:22:12,069 --> 00:22:16,042 operand. Jumps and links only write and don't read. 292 00:22:16,042 --> 00:22:20,008 So, it's a mix and match, even in something like MIPS. 293 00:22:20,008 --> 00:22:25,093 And one other thing I wanted to point out is, where this is actually encoded in the 294 00:22:25,093 --> 00:22:31,006 instruction moves around a little bit. Mips tried to make this relatively 295 00:22:31,006 --> 00:22:36,028 uniform, but there's some examples here where you see the destination changes a 296 00:22:36,028 --> 00:22:41,021 little bit between immediates versus non-immediates, and this is just cuz they 297 00:22:41,021 --> 00:22:45,059 didn't the encoding space there to leave everything in a fixed location. 298 00:22:45,077 --> 00:22:49,049 Something writes a destination or not. And, we have two things. 299 00:22:49,049 --> 00:22:52,083 We have a source, or the source operator and identifier. 300 00:22:52,083 --> 00:22:58,023 And then we have, the right enable, whether it is, being ridden or not. 301 00:22:58,023 --> 00:23:03,053 And, as you can see, there's a little, like, case statement here dependent on the 302 00:23:03,053 --> 00:23:07,040 instruction type. And this is, the instruction which is in 303 00:23:07,040 --> 00:23:11,008 the, later stages of the pipeline that is executing. 304 00:23:11,008 --> 00:23:16,051 So we have if the ALU instruction comes out of our D, if it's a ALU, immediate 305 00:23:16,051 --> 00:23:22,009 instruction or a load, it comes out of RT, if it's a jump, jump link, it's R31, and 306 00:23:22,009 --> 00:23:25,001 this says what you need to compare against. 307 00:23:25,031 --> 00:23:32,062 And then, whether you need to write or not is a little bit complex here if you have a 308 00:23:32,062 --> 00:23:37,056 LU, an LUI, or a write. It writes it, except the case if the right 309 00:23:37,056 --> 00:23:42,056 source, or the right, right numbers or, the right register identifier is zero. 310 00:23:42,056 --> 00:23:47,743 Cuz in MIPS, the zero register is a throw away register, you don't need to interlock 311 00:23:47,743 --> 00:23:51,043 against it. You wouldn't be incorrect if you did 312 00:23:51,043 --> 00:23:55,043 interlock against it, you would just have slower performance. 313 00:23:55,043 --> 00:24:00,070 And then jump and link, jump and link register, always write and then everything 314 00:24:00,070 --> 00:24:05,083 else doesn't write the register file. So, we've, this is a, so the first part of 315 00:24:05,083 --> 00:24:09,048 our calculation. And now we need to, do a calculation, of 316 00:24:09,048 --> 00:24:13,361 whether we actually read the value. And there's gonna be two of these, one for 317 00:24:13,361 --> 00:24:16,764 the first operand, one for the second operand. 318 00:24:16,764 --> 00:24:21,322 Okay let's build up for the different instructions here. 319 00:24:21,322 --> 00:24:27,286 So what's basically transforming this table into some logic equations that we're 320 00:24:27,286 --> 00:24:33,962 gonna use. So, ALU, ALUI, loads, stores, branches, 321 00:24:33,962 --> 00:24:39,800 they all read. Jump it register jump and link, register 322 00:24:39,800 --> 00:24:43,903 all read, at least the first source operand. 323 00:24:43,903 --> 00:24:50,481 So, RE1 gets set to true or one or on if it's any of these op codes. 324 00:24:50,481 --> 00:24:57,287 But for jump and jump link, which don't read a first operand it doesn't, that 325 00:24:57,287 --> 00:25:04,536 comparison has to not compare against this value otherwise you'd be stalling too 326 00:25:04,536 --> 00:25:09,377 often. For, the second operand, only, true ALU 327 00:25:09,377 --> 00:25:16,243 instructions, so not immediate instructions, and stores read that second 328 00:25:16,243 --> 00:25:20,450 operand. And everything else does it. 329 00:25:20,450 --> 00:25:26,528 Okay, so now let's put together the actual stall signal, and this is the stall signal 330 00:25:26,528 --> 00:25:31,111 for sort of the decode stage and the fetch stage of the pipe. 331 00:25:31,111 --> 00:25:41,052 And we're gonna end up with stall equaling a comparison between the source register 332 00:25:41,052 --> 00:25:47,032 identifier in the decode stage compared with the right. 333 00:25:47,032 --> 00:25:52,686 Register identifier in the execute stage And. 334 00:25:52,686 --> 00:26:08,038 That it's actually writing or we need to check with the memory stage. 335 00:26:08,038 --> 00:26:11,077 So the same calculation with the memory stage. 336 00:26:12,079 --> 00:26:15,082 Same calculation here for the write back stage. 337 00:26:15,082 --> 00:26:21,009 And then we also want to make sure, so we take this whole expression, and we end it 338 00:26:21,009 --> 00:26:25,079 with whether we actually have a read enable for the first source operand. 339 00:26:25,079 --> 00:26:30,074 Cuz if we don't read the first source operand, there's no reason to stall for 340 00:26:30,074 --> 00:26:34,066 it. And we do a similar sort of thing here, 341 00:26:34,066 --> 00:26:39,061 for the second source operand. We use, RE2 that we derived here, and we 342 00:26:39,061 --> 00:26:42,085 ended together with an expression which says. 343 00:26:42,085 --> 00:26:48,095 Does the RT, the registry identifier RT in the instruction, is it the same thing as, 344 00:26:48,095 --> 00:26:53,084 the different destination, register identifiers in the subsequent 345 00:26:53,084 --> 00:26:57,079 instructions? In the different stages of the pipe, later 346 00:26:57,079 --> 00:27:03,048 stages of the pipe. Okay, so is this everything? 347 00:27:04,048 --> 00:27:12,556 Hm, it looks pretty complicated. And this is well, this isn't so bad so 348 00:27:12,556 --> 00:27:17,034 far. If we, if we make the pipe longer, we're 349 00:27:17,034 --> 00:27:22,639 gonna end up with more terms inside of these two equations. 350 00:27:22,639 --> 00:27:31,023 Well, no, that's not quite the full story. So what are, what are we missing here? 351 00:27:31,023 --> 00:27:34,080 Why else would we have to stall a pipeline? 352 00:27:34,080 --> 00:27:41,070 Well, unfortunately, this only takes into account instructions where the destination 353 00:27:41,070 --> 00:27:45,094 is available right at the end of the execute stage. 354 00:27:45,094 --> 00:27:48,085 Here and here. This encapsulates it. 355 00:27:48,085 --> 00:27:59,531 These two comparisons encapsulate that. Well, something like a load doesn't 356 00:27:59,531 --> 00:28:05,482 necessarily encapsulate that because the load value is not ready until all the way 357 00:28:05,482 --> 00:28:11,046 down here. So we might need to insert some extra 358 00:28:11,078 --> 00:28:16,084 stalls for that. Also, loads and stores are more 359 00:28:16,084 --> 00:28:25,086 complicated because you might have a data dependence through the data memory itself. 360 00:28:25,086 --> 00:28:29,092 This example here. We have a little snippet of code here. 361 00:28:29,092 --> 00:28:35,086 We have a load which is going to take or excuse me a store here which is going to 362 00:28:36,008 --> 00:28:40,006 take register two and write it to some place in memory. 363 00:28:40,006 --> 00:28:46,000 And then we have a load here that we're going to read out of some place in memory 364 00:28:46,000 --> 00:28:52,085 and put into register four. Okay so the question comes up. 365 00:28:52,085 --> 00:29:05,110 Is there any possible data hazard here? Yes, cuz what if R1 plus seven is equal to 366 00:29:05,110 --> 00:29:10,224 R3 + five. So we're going to be having a case where 367 00:29:10,224 --> 00:29:19,978 you actually have two different values here where one is, needs to pick up the 368 00:29:19,978 --> 00:29:26,083 data value of the previous store. So the load needs to pick up the data 369 00:29:26,083 --> 00:29:32,020 value of the previous store if, and only if, this is equal to this. 370 00:29:32,070 --> 00:29:33,028 Hm. Okay. 371 00:29:33,028 --> 00:29:38,056 So that's, that's not so bad. So let's look at these, data hazards a 372 00:29:38,056 --> 00:29:44,044 little bit more and figure out how we can derive the equation to check for them. 373 00:29:45,079 --> 00:29:54,034 So, just a recap here, our example is we have registers R2 are storing it into a 374 00:29:54,034 --> 00:30:00,093 location here, and we're reading from possibly the same location. 375 00:30:00,093 --> 00:30:06,938 We don't know. So what if our, our one plus seven is 376 00:30:06,938 --> 00:30:13,073 equal to R3 plus five? Well, first of all we're writing and 377 00:30:13,073 --> 00:30:18,945 reading to the same address in time. Right next to each other. 378 00:30:18,945 --> 00:30:26,550 Well, our hazard is actually avoided because our memory system is so fast. 379 00:30:26,550 --> 00:30:31,987 Because everything goes down the, the pipeline, an order will actually go to, 380 00:30:31,987 --> 00:30:35,096 right to the memory. And the next cycle we'll be able to read 381 00:30:35,096 --> 00:30:37,623 out of that memory. And we'll pick up the new value. 382 00:30:37,623 --> 00:30:43,686 Pick up the, the changed value. But I want to introduce this because in 383 00:30:43,686 --> 00:30:47,867 more realistic memory systems, this requires much more careful handling, 384 00:30:47,867 --> 00:30:52,015 because if you have a memory system which takes multiple cycles for the store to 385 00:30:52,015 --> 00:30:56,500 happen, or the store happens let's say, at the end of the pipe into the, the memory 386 00:30:56,500 --> 00:31:01,593 then you're not gonna necessarily get that value, and you might need to bypass that 387 00:31:01,593 --> 00:31:05,477 or you might need to do something more intelligent. 388 00:31:05,477 --> 00:31:12,042 Okay, so let's, we talked about stalling the pipeline. 389 00:31:15,050 --> 00:31:19,049 Now let's look at, if we want to improve the performance some more. 390 00:31:19,049 --> 00:31:24,038 So one of the things you may not have noticed in that stall, but did happen, is 391 00:31:24,038 --> 00:31:28,079 that if there were any instructions in the later portion of the pipeline. 392 00:31:28,079 --> 00:31:33,025 Which in earlier, its instruction decode stage [inaudible], it stops. 393 00:31:33,025 --> 00:31:36,098 So, no place do we actually afford the data values early. 394 00:31:36,098 --> 00:31:42,031 And now we want to talk about forwarding and bypassing, of how to add extra data 395 00:31:42,031 --> 00:31:47,063 paths, to allow a value to be sent from a later stage to an earlier stage, faster 396 00:31:47,063 --> 00:31:51,090 than having to wait for the right back of the pipeline to occur. 397 00:31:56,046 --> 00:32:05,022 So here we have the, bypassing, or here we have our data path that we had from 398 00:32:05,022 --> 00:32:09,048 before. And, what I'm trying to get at is, you 399 00:32:09,048 --> 00:32:14,279 have the problem that you have a value here or you have, you have an instruction 400 00:32:14,279 --> 00:32:17,033 here. And if there was any instruction which 401 00:32:17,033 --> 00:32:22,045 write the source, writes through a operand register or identifier that this 402 00:32:22,045 --> 00:32:26,006 instruction is gonna wanna read, it's going to stall. 403 00:32:26,006 --> 00:32:31,006 But, as you might notice, a little bit of insight here, is, if you have an add, you 404 00:32:31,006 --> 00:32:36,045 can actually try to read this value early. But our data path is not good enough to do 405 00:32:36,045 --> 00:32:42,003 that right now. Okay, so let's add in a bypass here. 406 00:32:42,068 --> 00:32:48,031 So we're gonna add in this bypass which takes the result of the ALU, turns it 407 00:32:48,031 --> 00:32:54,067 around, and puts a multiplexer here, and we can now detect whether using sort of a 408 00:32:54,067 --> 00:33:00,037 similar sort of signal as our stall signal, we can detect whether two operands 409 00:33:00,037 --> 00:33:06,001 match, and if they do, we get the result value out of the ALU early, and run it 410 00:33:06,001 --> 00:33:12,495 through this multiplexer. Okay, so an important question is does 411 00:33:12,495 --> 00:33:18,046 this help our example we had before from performance perspective. 412 00:33:18,046 --> 00:33:25,000 So our stalling logic that we put in was good enough to make sure there wasn't 413 00:33:25,000 --> 00:33:28,068 error. But its not good enough to actually have 414 00:33:28,068 --> 00:33:33,056 good performance because you have to wait for the value to get to the register file 415 00:33:33,056 --> 00:33:37,028 before you go ahead. So, here we have the same example that we 416 00:33:37,028 --> 00:33:41,069 had earlier in class where you have, something which writes register one and 417 00:33:41,069 --> 00:33:46,040 something that reads register one, and to ALU add instruction, in this instruction 418 00:33:46,040 --> 00:33:50,434 back to back. So does, does this get help? 419 00:33:50,434 --> 00:33:54,362 Well, yes. So this you can see, clearly see that this 420 00:33:54,362 --> 00:34:00,896 instruction is, the, the result here is gonna come back around and we only, we, we 421 00:34:00,896 --> 00:34:05,345 effectively don't have to stall the second instruction. 422 00:34:05,345 --> 00:34:09,486 Because it can pick up that data value right then and there. 423 00:34:09,486 --> 00:34:14,441 The data value gets calculated in this stage, it can sort of loop around real 424 00:34:14,441 --> 00:34:17,976 fast here. And we don't have to stall at all. 425 00:34:17,976 --> 00:34:24,532 In this case. Okay so, quick quiz question here. 426 00:34:24,532 --> 00:34:32,391 Two other cases. We have a memory operation, we just erased 427 00:34:32,391 --> 00:34:37,988 memory one, and then we have a jump in link, and then an add. 428 00:34:37,988 --> 00:34:42,618 Does this bypass, right here, help, with these two cases? 429 00:34:42,618 --> 00:34:47,681 Well. We said it helped here. 430 00:34:47,681 --> 00:34:53,008 This case here. Well, when is the memory, load? 431 00:34:53,008 --> 00:34:57,095 This is a load. When does the load result get calculated? 432 00:34:57,095 --> 00:35:02,309 The load result doesn't get calculated until the output here of the data memory 433 00:35:02,309 --> 00:35:05,924 or right here. So all the son. 434 00:35:05,924 --> 00:35:10,544 That's after this bypass. So it's too late. 435 00:35:10,544 --> 00:35:16,882 So we still need to stall the pipeline here for load with a dependent 436 00:35:16,882 --> 00:35:23,458 instruction, dependent on the load. So we're gonna stall there. 437 00:35:23,458 --> 00:35:26,109 Okay. Now, a trickier one. 438 00:35:26,109 --> 00:35:31,299 Jump and link 500. And then something which reads R31. 439 00:35:31,299 --> 00:35:37,290 So, little bit of, background here on the MIPS instruction set. 440 00:35:37,290 --> 00:35:41,960 Jump and link implicitly writes to register 31. 441 00:35:41,960 --> 00:35:45,458 It's the, the link register. Okay. 442 00:35:45,458 --> 00:35:48,916 So, so that means we have a data dependence. 443 00:35:48,916 --> 00:35:53,468 Where, where does the jump in link, what, what gets put into R31? 444 00:35:53,468 --> 00:35:58,372 So it's the program counter or the program counter plus four is what it is 445 00:35:58,372 --> 00:36:02,084 architected as in MIPS. You could probably build it either way 446 00:36:02,084 --> 00:36:06,778 depending on how you do jump register. So on first, first look this looks like it 447 00:36:06,778 --> 00:36:12,046 should actually like solve a lot of problems we should be able to by pass our 448 00:36:12,046 --> 00:36:15,886 result of our jump in link to right where it needs to go. 449 00:36:15,886 --> 00:36:18,843 Mm. It's a little unsatisfying though, 450 00:36:18,843 --> 00:36:24,438 because, if you look at the rest of the pipe here, if you have a jump. 451 00:36:24,438 --> 00:36:30,585 Let's say in the execute stage. How, how are, you know, is, is the 452 00:36:30,585 --> 00:36:34,978 consumer of that instruction gonna be here or not? 453 00:36:34,978 --> 00:36:40,891 So this one's kind of a trick question. So, does it help? 454 00:36:40,891 --> 00:36:45,657 Well, you can bypass out and around, but the thing after it in the pipe is probably 455 00:36:45,657 --> 00:36:50,691 not going to be appropriate instruction. So, if I were to answer this question, 456 00:36:50,691 --> 00:36:53,021 does it help? I'd probably say no. 457 00:36:53,021 --> 00:36:58,019 Because at least in the pipe drawing here, you're not gonna be executing the 458 00:36:58,019 --> 00:37:03,055 subsequent instruction, or executing the, even if this is, the instruction at 500, 459 00:37:03,055 --> 00:37:06,032 you're not going be actually executing that. 460 00:37:06,032 --> 00:37:11,036 You'll probably have to, sort of, wait for that, jump to resolve somewhere further 461 00:37:11,036 --> 00:37:16,046 down the pipe, and, then go pick it up. So the bypass, bypasses don't always help. 462 00:37:16,046 --> 00:37:20,093 And, especially in something where it's not a, a fully bypassed pipeline. 463 00:37:20,093 --> 00:37:24,011 Okay. Oh, before I move off this slide, I wanted 464 00:37:24,011 --> 00:37:29,057 to, say that this is called bypassing. Sometimes this is also called, forwarding 465 00:37:29,057 --> 00:37:32,044 values. And we're gonna be using those terms 466 00:37:32,044 --> 00:37:38,010 interchangeably in this class. Okay so now, now we get more, more details 467 00:37:38,010 --> 00:37:40,060 here. We start to look at how to derive the 468 00:37:40,060 --> 00:37:43,085 bypass signal. We're gonna do, we're gonna build this the 469 00:37:43,085 --> 00:37:48,056 same way we derived the stall signal and we're gonna take terms out of the stall 470 00:37:48,056 --> 00:37:53,028 calculation we had before. So, if you will recall, we have the 471 00:37:53,028 --> 00:37:56,010 pipeline diagram here with the stall signal. 472 00:37:56,010 --> 00:38:01,023 We have stalled sa, stages, and we ended up having to stall in this case where we 473 00:38:01,023 --> 00:38:04,063 have a ALU op followed by an ALU op that's dependent. 474 00:38:05,091 --> 00:38:11,066 So each, each stall or kill introduces a bubble into the pipeline and this is gonna 475 00:38:11,066 --> 00:38:18,038 give us a clocks per instruction over one. With that new data path, which bypasses 476 00:38:18,038 --> 00:38:24,019 out of the ALU into input operand A, we can see that we actually can remove all 477 00:38:24,019 --> 00:38:30,015 these stalls and just do the bypassing. So, it actually shrinks the time taking to 478 00:38:30,015 --> 00:38:34,071 execute this code. And this new data path is really a great 479 00:38:34,071 --> 00:38:38,083 thing here. This bypass has taken us from greater than 480 00:38:38,083 --> 00:38:42,088 one clock per instruction to one clock per instruction. 481 00:38:44,051 --> 00:38:54,003 And we're actually forwarding out of the execution unit for one into the decode 482 00:38:54,003 --> 00:39:00,018 unit here and gets consumed here. And the execution unit in time three for 483 00:39:00,018 --> 00:39:08,005 the, the instruction two. Okay so let's drive the bypass signal. 484 00:39:09,021 --> 00:39:12,056 And we'll start off by looking at our original stall signal. 485 00:39:12,056 --> 00:39:15,017 So this is just the stall signal we had before. 486 00:39:15,063 --> 00:39:22,036 And, first thing we're gonna do. Is if you look at this, this case right 487 00:39:22,036 --> 00:39:34,029 here where we compared, the execute right destination to the, decode first source 488 00:39:34,029 --> 00:39:37,822 operand. We don't need that anymore. 489 00:39:37,822 --> 00:39:41,464 We added a bypass. And we added a forwarding signal to handle 490 00:39:41,464 --> 00:39:45,545 that case. So just sort of put a line through that. 491 00:39:45,545 --> 00:39:52,097 Okay, the next question that comes up is we had in this diagram here, we added this 492 00:39:52,097 --> 00:39:57,061 multiplexer here, to choose between reading from the register file and reading 493 00:39:57,061 --> 00:40:00,076 the data which came out of the arithmetic logic unit. 494 00:40:02,094 --> 00:40:07,027 And we need to ask, what is the control on that multiplexer? 495 00:40:07,027 --> 00:40:11,031 Well, it's the exact same case that we just crossed out. 496 00:40:11,031 --> 00:40:14,069 When that case is true, we wanna do the bypass. 497 00:40:14,069 --> 00:40:18,030 So we just take that, those terms, and put it here. 498 00:40:18,030 --> 00:40:23,030 And that's actually the control in the multiplexer, a source here. 499 00:40:25,076 --> 00:40:28,042 Is this correct? Hm. 500 00:40:28,042 --> 00:40:38,046 Is this the full story? Well, unfortunately no. 501 00:40:39,050 --> 00:40:44,086 It's close, really close. It looks like it should work, but 502 00:40:44,086 --> 00:40:53,047 unfortunately. Only ALU and arithmetic logic immediate 503 00:40:53,047 --> 00:41:01,073 instructions can benefit from this. If you have something like a load, you 504 00:41:01,073 --> 00:41:05,071 need to wait for the data value to show up. 505 00:41:06,003 --> 00:41:11,075 So, this, a source here needs to have some component saying, make sure it's a load, 506 00:41:11,075 --> 00:41:17,082 or make sure it's not a load, if you will. And up here, we actually reintroduce that 507 00:41:17,082 --> 00:41:21,082 term, checking to see whether it's something with a load. 508 00:41:22,086 --> 00:41:28,028 So what we're gonna do is we're gonna split the write-enable into two 509 00:41:28,028 --> 00:41:31,083 components. The write enable that you're bypassing and 510 00:41:31,083 --> 00:41:36,039 the write enable that you're stalling. And we're gonna re-introduce these sort of 511 00:41:36,039 --> 00:41:40,083 two components here with two slightly different write enables, dependent on the 512 00:41:40,083 --> 00:41:47,344 decode of the instructions in the execute stage of the pipeline, Okay so let's do 513 00:41:47,344 --> 00:41:52,027 that. And we replaced, we still have this term 514 00:41:52,027 --> 00:41:56,055 back in the stall signal. We still have this term in the, the 515 00:41:56,055 --> 00:42:00,091 bypassing signal, but we now have two different write enables. 516 00:42:00,091 --> 00:42:05,013 One for bypass calculation and one for a stall calculation. 517 00:42:05,013 --> 00:42:11,019 And the bypass calculation these two different signals are, calculated based on 518 00:42:11,019 --> 00:42:14,084 the decode of the instruction in the execute stage. 519 00:42:14,084 --> 00:42:20,755 So we bypass only when it's a, a, arithmetic logic unit, or an immediate 520 00:42:20,755 --> 00:42:27,935 arithmetic instruction and let's say the destination is not zero. 521 00:42:27,935 --> 00:42:38,033 And we stall if it's a load or a jump in link or jump in link register also falls 522 00:42:38,033 --> 00:42:43,061 into that case. And that's when we have to do the stall. 523 00:42:43,061 --> 00:42:52,043 Because the ju, jump in link, and jump in link register And at least. 524 00:42:52,043 --> 00:42:58,007 This data path we only have this multiplexer here, register 31 at the end. 525 00:42:58,007 --> 00:43:02,074 You can build data paths which have different multiplexer's for that and you 526 00:43:02,074 --> 00:43:05,053 might be able to remove that clause from this. 527 00:43:07,061 --> 00:43:16,082 Okay, so what I notice here is our loads, and jumping link registers, and jumping 528 00:43:16,082 --> 00:43:21,133 links are gonna stall when we have a match on the registers. 529 00:43:21,133 --> 00:43:26,536 And something like ALU instructions generally, write and able to bypass is 530 00:43:26,536 --> 00:43:31,030 going to not stall and use the bypass of the following logic. 531 00:43:31,030 --> 00:43:36,524 Okay, so let's take a look at what this looks like for a fully by-pass data path. 532 00:43:36,524 --> 00:43:41,802 So our fully by-pass data path we're going to add all the destinations locations out 533 00:43:41,802 --> 00:43:45,614 of here, out of here. We're going to run that back and we're 534 00:43:45,614 --> 00:43:50,329 going to add to big multiplexors here. Because in our first case we only 535 00:43:50,329 --> 00:43:54,488 multiplexed for the first source operand, or the A source operand. 536 00:43:54,488 --> 00:43:59,419 But we want actually want to multiplex the inputs for a and b the two source 537 00:43:59,419 --> 00:44:03,515 operands. And we're also gonna add this PC here for 538 00:44:03,515 --> 00:44:10,230 the jump and link that handles some of the more complex pieces here because we, 539 00:44:10,230 --> 00:44:17,263 otherwise we have to put multiplexers here for sort of R31 of multiplexing in the PC 540 00:44:17,263 --> 00:44:22,197 or something like that. So we've effectively be able to bypass 541 00:44:22,197 --> 00:44:26,137 everything here. So the question here is, is there still a 542 00:44:26,137 --> 00:44:31,279 need for the stall signal. So this is more than what we had before. 543 00:44:31,279 --> 00:44:36,096 This is more than just a source. We now can bypass out of not only here to 544 00:44:36,096 --> 00:44:40,229 there but we can bypass out of after the memory operations. 545 00:44:40,229 --> 00:44:45,487 So maybe this changes our stall signal so that we don't need to stall on loads 546 00:44:45,487 --> 00:44:47,044 anymore. That'd be great. 547 00:44:47,044 --> 00:44:51,727 They'd have better performance. Well, unfortunately, no. 548 00:44:51,727 --> 00:44:56,294 We still need this. You still need to check. 549 00:44:56,294 --> 00:45:03,365 If the opcode is, is a, is a load, in this stage of the pipe, even with a fully 550 00:45:03,365 --> 00:45:10,091 bypassed data path. So we've, we've resolved a bunch of the 551 00:45:10,091 --> 00:45:16,198 data hazards but the loads, still need to, wait, or the instructions dependent on 552 00:45:16,198 --> 00:45:21,879 loads still need to wait, because you don't know, the results of the value, 553 00:45:21,879 --> 00:45:26,337 until, you come out of here. So you can't issue a subsequent 554 00:45:26,337 --> 00:45:30,349 instruction, into the ALU stage early. You need to stall. 555 00:45:30,349 --> 00:45:34,643 But, this is basically our full stall calculation at this point. 556 00:45:34,643 --> 00:45:39,910 Because we add all those bypasses, we've removed a lot of the other complexity from 557 00:45:39,910 --> 00:45:44,897 our stall signal. And in this case you'll see that loads 558 00:45:44,897 --> 00:45:50,249 have a latency of two cycles. Okay. 559 00:45:50,249 --> 00:45:55,047 So, as I said, the last technique you can look at is speculation, where you try to 560 00:45:55,047 --> 00:45:56,940 guess things. Guess data values. 561 00:45:56,940 --> 00:46:00,275 Guess things like that, or try to execute code out of order. 562 00:46:00,275 --> 00:46:03,371 We're gonna talk about that later in the course. 563 00:46:03,371 --> 00:46:05,761 That's, that's not really in today's lecture. 564 00:46:05,761 --> 00:46:11,551 It's not really review material, but we will, we will discuss that to some, some 565 00:46:11,551 --> 00:46:14,504 extent. So now we're gonna move on to talking 566 00:46:14,504 --> 00:46:19,535 about control hazards, and because we're running a little low on time we'll look at 567 00:46:19,535 --> 00:46:22,014 that a little bit more in the next lecture.