1 00:00:03,062 --> 00:00:09,658 So in this processor we're going to have basically three pipelines here: a 2 00:00:09,658 --> 00:00:17,054 long multiply pipeline, a memory pipe that, say, takes two cycles, and then a short ALU 3 00:00:17,054 --> 00:00:24,657 pipe here on the top. We have a scoreboard like we had before. 4 00:00:24,657 --> 00:00:32,056 And this is going to track where data is available in the pipe. 5 00:00:38,088 --> 00:00:45,048 Where data is available in the pipe, and the architectural register file is sitting at 6 00:00:45,048 --> 00:00:48,078 the end. So this is roughly similar to the in 7 00:00:48,078 --> 00:00:54,024 order issue, in order write back, in order commit processor, but there are some 8 00:00:54,024 --> 00:00:58,063 interesting things here. If you compare this picture here to this 9 00:00:58,063 --> 00:01:02,034 picture here, we just dropped all these pipeline stages. 10 00:01:02,085 --> 00:01:06,009 So that's pretty cool. We don't have to bypass out of there 11 00:01:06,009 --> 00:01:08,084 anymore. We can just sort of shove the data into the 12 00:01:08,084 --> 00:01:12,068 architectural register file. And if we preserve read after write, write 13 00:01:12,068 --> 00:01:16,068 after write, and write after read dependencies, things should be 14 00:01:16,068 --> 00:01:20,065 okay. Let's see if we can actually do that. 15 00:01:20,065 --> 00:01:26,038 So let's first take a look at the scoreboard for this 16 00:01:26,038 --> 00:01:32,098 in order fetch, in order issue, out of order execute and write back, and out 17 00:01:32,098 --> 00:01:37,775 of order commit processor. So the scoreboard looks very similar 18 00:01:38,054 --> 00:01:42,060 to the in order, in order, in order, in order machine. 19 00:01:43,198 --> 00:01:47,087 And we can use it to track structural hazards on the write back port. 20 00:01:47,087 --> 00:01:52,047 And this is really important. 
If you try to have, let's say, a multiply, 21 00:01:52,047 --> 00:01:56,668 and then an add after it, it's possible that the multiply and the add, under 22 00:01:56,668 --> 00:02:01,081 pipelining, might have to go use the write back port at the same time. 23 00:02:01,081 --> 00:02:06,061 So, you've got a structural hazard. And we're going to show a pipeline diagram 24 00:02:06,061 --> 00:02:09,068 of that happening in the next slide. 25 00:02:12,009 --> 00:02:18,011 We still don't actually need to have a more complex scoreboard. 26 00:02:18,011 --> 00:02:24,096 So briefly, someone asked in last lecture, in the scoreboard, do we need to track 27 00:02:24,096 --> 00:02:29,063 which functional unit each value is going down in the pipe? 28 00:02:29,063 --> 00:02:33,053 We still don't need to do that for this relatively simple pipe, because for a 29 00:02:33,053 --> 00:02:37,016 write after write dependence, we are basically just going to stall. 30 00:02:37,016 --> 00:02:41,044 If you want to break that requirement, then you need to start tracking more 31 00:02:41,044 --> 00:02:45,084 complex things in the scoreboard, and you may even want to have something like 32 00:02:45,084 --> 00:02:48,086 register renaming, which we'll talk about in the next class. 33 00:02:48,086 --> 00:02:54,122 That's going to allow you to basically break a write after write dependence 34 00:02:54,122 --> 00:03:00,085 dynamically in the processor. So an important point here, because these 35 00:03:00,085 --> 00:03:08,325 pipes are different lengths now: in our scoreboard, we had different places 36 00:03:08,325 --> 00:03:12,221 where the entries in the scoreboard could be. 
37 00:03:12,221 --> 00:03:16,406 But now if we go to execute an instruction which is long, we are going 38 00:03:16,406 --> 00:03:21,263 to put it in the first entry in the scoreboard, and it is going to march 39 00:03:21,263 --> 00:03:24,900 down every cycle, tracking the information that goes down the pipe. 40 00:03:24,900 --> 00:03:29,464 But if we are trying to, let's say, execute an add instruction, it doesn't actually 41 00:03:29,464 --> 00:03:33,467 have to wait four cycles to get into the architectural register file, so we can 42 00:03:33,467 --> 00:03:38,542 actually insert a one here, and then it just marches down the balance of the pipe 43 00:03:38,542 --> 00:03:45,021 for a particular location. And this is what I was saying: because 44 00:03:45,021 --> 00:03:50,093 we are not going to allow write after write hazards on a particular register, 45 00:03:52,018 --> 00:03:56,055 you're never going to have a case where there are multiple inversely 46 00:03:56,055 --> 00:04:00,063 ordered bits in this table. If you had that, you would actually need a 47 00:04:00,063 --> 00:04:05,054 more advanced scoreboard, and I'll show a picture of that a little later in today's 48 00:04:05,054 --> 00:04:10,046 lecture. Okay, so let's go through an 49 00:04:10,046 --> 00:04:16,062 example of how to use the scoreboard and how to walk through an 50 00:04:16,062 --> 00:04:22,041 in order fetch, in order issue, out of order execute and write back, and out of order commit 51 00:04:22,041 --> 00:04:27,856 processor. So, here is the same code sequence we had 52 00:04:27,856 --> 00:04:36,019 in the previous example case. So we're going to be using this throughout 53 00:04:36,019 --> 00:04:40,062 all of class. Let's take a look at this and sort 54 00:04:40,062 --> 00:04:45,012 of see how this pipeline diagram happens and notice a few things.
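[Editor's illustration] The marching-bit scoreboard described above can be sketched roughly as follows. This is a minimal sketch, not the lecture's design: the class name Scoreboard, the four-deep table, and the latencies (four cycles for a multiply, two for an add) are all assumptions chosen for illustration. In this model, slot k of a register's row means "a write to this register lands in k + 1 cycles".

```python
class Scoreboard:
    """Hypothetical model: one row of pending-write bits per register.

    bits[r][k] == 1 means a write to register r reaches the architectural
    register file in k + 1 cycles. Depth and latencies are assumptions.
    """

    def __init__(self, num_regs=32, depth=4):
        self.bits = [[0] * depth for _ in range(num_regs)]

    def pending(self, reg):
        # True while some in-flight instruction will still write `reg`.
        return any(self.bits[reg])

    def issue(self, dest, latency):
        # A write-after-write on the same register simply stalls, so a row
        # never holds more than one bit.
        if self.pending(dest):
            return False  # caller stalls this cycle
        self.bits[dest][latency - 1] = 1  # short pipes enter part-way down
        return True

    def tick(self):
        # Every cycle each bit marches one slot closer to write back.
        for row in self.bits:
            row.pop(0)
            row.append(0)


sb = Scoreboard()
sb.issue(dest=1, latency=4)   # long multiply writing R1
sb.tick()
sb.issue(dest=11, latency=2)  # short add writing R11
```

Because a second writer to the same register is never allowed in, each row holds at most one bit, which is why the simple table is enough here.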
55 00:04:45,012 --> 00:04:51,287 First let's take a look at some read after write hazards, and what the pipeline has 56 00:04:51,287 --> 00:04:57,021 to do. MUL R5, R1, R4. 57 00:04:57,021 --> 00:05:04,082 This reads R1. R1 is created by this multiply in 58 00:05:04,082 --> 00:05:14,920 instruction zero. So it actually has to wait to get the 59 00:05:14,920 --> 00:05:19,776 bypass here. Instruction two is going to have to wait 60 00:05:19,776 --> 00:05:24,008 to get this value. So you can see here it's basically just 61 00:05:24,008 --> 00:05:26,094 stalled. And what's happening here is, you're in order 62 00:05:26,094 --> 00:05:30,499 issue, so you can't go try to issue subsequent instructions under that. 63 00:05:30,499 --> 00:05:34,995 Later in today's class we're going to be talking about pipelines which actually 64 00:05:34,995 --> 00:05:39,348 have out of order issue, such that while this is waiting around you can think about 65 00:05:39,348 --> 00:05:43,674 trying to go and issue the next instruction that's not dependent on that 66 00:05:43,674 --> 00:05:47,137 previous instruction. And by doing that we get more performance, 67 00:05:47,137 --> 00:05:50,937 because we can basically be reordering instructions and trying to use our 68 00:05:50,937 --> 00:05:53,732 functional units, our ALUs, as much as possible. 69 00:05:53,732 --> 00:06:02,444 But for right now, we have a bypass coming out of Y3 here, out of this multiply, down 70 00:06:02,444 --> 00:06:07,769 into the register file access stage of that next multiply. 71 00:06:07,769 --> 00:06:11,348 So that's sort of one thing that's going on. 72 00:06:11,608 --> 00:06:17,127 Let's take a look at another read after write dependence here. 73 00:06:17,127 --> 00:06:24,498 So, another read after write dependence is actually register eleven: it gets written 74 00:06:24,498 --> 00:06:30,198 here and read down there. Well, what are we doing special for this?
75 00:06:30,198 --> 00:06:33,259 So, this marches down the pipe, and the write happens here. 76 00:06:33,259 --> 00:06:37,488 Let's see where the read for this instruction, instruction four, tries to 77 00:06:37,488 --> 00:06:39,929 happen. Well, the read tries to happen there. 78 00:06:39,929 --> 00:06:43,922 Well, at that point, the data's actually in the architectural register file. 79 00:06:43,922 --> 00:06:48,321 We don't have to worry about any funny bypassing or anything like that. 80 00:06:48,501 --> 00:06:52,795 And you can sort of work through the rest of the things going on 81 00:06:52,795 --> 00:06:55,539 here. But there are a few other things I wanted to 82 00:06:55,539 --> 00:06:59,063 point out in this picture. First, because you have different pipe 83 00:06:59,063 --> 00:07:02,967 lengths, you can actually see that, let's say, this add here is writing the 84 00:07:02,967 --> 00:07:08,314 architectural register file before a previous instruction writes the register 85 00:07:08,314 --> 00:07:13,461 file in program order. And this has some 86 00:07:13,461 --> 00:07:19,811 large consequences when you're starting to think about if, let's say, this 87 00:07:19,811 --> 00:07:26,556 instruction here, the multiply that preceded the add, took some sort of fault 88 00:07:26,556 --> 00:07:31,351 or took some sort of exception. Because now, you've basically changed the 89 00:07:31,351 --> 00:07:37,266 architectural register file before that other instruction has finished, 90 00:07:37,266 --> 00:07:40,595 and that other instruction didn't actually finish. 91 00:07:40,595 --> 00:07:43,481 So what does that mean? Whoa. 92 00:07:43,724 --> 00:07:49,549 One other thing I wanted to point out in this picture, which is a really 93 00:07:49,549 --> 00:07:57,476 interesting case, is right here. So this add instruction here is dependent 94 00:07:57,476 --> 00:08:00,805 on R12. R12 gets created here.
95 00:08:00,805 --> 00:08:05,246 And basically at the end of this stage it is ready to bypass. 96 00:08:05,246 --> 00:08:11,136 So if we look down we'll say, well, this instruction here, we don't even try to read 97 00:08:11,136 --> 00:08:14,734 from the bypass until here, so that value is ready. 98 00:08:14,734 --> 00:08:18,480 But for some weird reason this instruction stalls. 99 00:08:18,480 --> 00:08:24,331 Can anyone see what's going on with that instruction? 100 00:08:24,331 --> 00:08:33,317 All of its inputs are ready. It's ready to go. 101 00:08:33,317 --> 00:08:36,575 It's all set to go, but there's an issue. 102 00:08:36,575 --> 00:08:39,820 Okay. So what is happening here is 103 00:08:39,820 --> 00:08:48,088 that if you were to remove this stall in the I stage here, this would get pulled forward, 104 00:08:48,088 --> 00:08:53,024 and now you'd have this writing and that writing at the same time. 105 00:08:53,024 --> 00:08:58,047 So they would have a structural hazard on the write port of the register file there. 106 00:08:58,047 --> 00:09:02,071 So you have to stall. Okay, so let's take a look at how 107 00:09:02,071 --> 00:09:08,006 that shows up in the reorder, excuse me, how that shows up in the 108 00:09:08,006 --> 00:09:13,037 scoreboard. So when we go to look at this, 109 00:09:13,037 --> 00:09:15,084 what we have here is we have cycles. 110 00:09:15,084 --> 00:09:19,096 There's eighteen cycles across the top. Or actually, there's nineteen cycles 111 00:09:19,096 --> 00:09:23,003 across the top. The first one's zero, but I don't draw 112 00:09:23,003 --> 00:09:29,045 that. If we look at, let's say, this cycle here, 113 00:09:29,045 --> 00:09:35,049 instruction one, which is this add, is in the I stage. 114 00:09:35,049 --> 00:09:44,011 So it's in the issue stage of the pipe. So it's looking for its operands, 115 00:09:44,011 --> 00:09:52,221 basically. 
And what's going to happen is this add needs 116 00:09:52,221 --> 00:10:00,733 to check to make sure that it's not going to conflict on the write port with the 117 00:10:00,733 --> 00:10:04,666 previous MUL. In this case it doesn't actually happen. 118 00:10:04,666 --> 00:10:10,223 But when this instruction moves here, so this is what I was saying, it 119 00:10:10,223 --> 00:10:15,380 doesn't actually put 1s in the four locations; instead it marches down. 120 00:10:15,380 --> 00:10:21,068 Instead it's going to start here, with two cycles to go before it writes 121 00:10:21,068 --> 00:10:24,764 to the register file, because the pipe is shorter. 122 00:10:24,764 --> 00:10:31,123 So it has to check this location, and says, well, for my register that I'm 123 00:10:31,123 --> 00:10:37,804 trying to write, is anything else currently scheduled to write that location 124 00:10:37,804 --> 00:10:42,071 in two cycles? And our scoreboard can answer that question. 125 00:10:42,071 --> 00:10:48,434 If there was a one in this box, yes. We would know that there was a MUL or some 126 00:10:48,434 --> 00:10:53,505 other instruction that had long latency that you would conflict with. 127 00:10:53,505 --> 00:11:00,282 So if this instruction were to move here, this one, which also clocks every cycle 128 00:11:00,282 --> 00:11:04,406 and moves down our scoreboard, is going to conflict at that point, and 129 00:11:04,406 --> 00:11:09,429 we're going to know we're going to have a write hazard on the register file. 130 00:11:09,429 --> 00:11:14,240 So we can sort of see those things happening in our scoreboard. 131 00:11:14,240 --> 00:11:18,355 And then we could also use our scoreboard to actually detect that real 132 00:11:18,355 --> 00:11:22,467 case, this case here, this last instruction, this last add; we're going to 133 00:11:22,467 --> 00:11:29,513 see that show up over here. So let's try and find that.
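[Editor's illustration] The write port check described above ("is anything else scheduled to write in two cycles?") can be sketched as a small in-order issue model. This is a sketch under stated assumptions, not the lecture's hardware: the function name issue_cycles, the single write port, and the latencies (4 for a MUL, 2 for an add) are hypothetical, and data hazards are deliberately ignored so that only the structural hazard shows up.

```python
def issue_cycles(program, depth=4):
    """Return (name, issue_cycle) pairs for an in-order issue stream.

    program: list of (name, latency), where latency is the assumed number of
    cycles from issue to register-file write. Only the single write port is
    modelled; data hazards are ignored in this sketch.
    """
    port = [0] * depth  # port[k] == 1: write port is reserved k + 1 cycles from now
    cycle = 0
    issued = []
    for name, latency in program:
        # Stall in the issue stage while the slot we would land in is taken.
        while port[latency - 1]:
            port = port[1:] + [0]
            cycle += 1
        port[latency - 1] = 1
        issued.append((name, cycle))
        port = port[1:] + [0]  # advance one cycle before the next issue
        cycle += 1
    return issued


schedule = issue_cycles([("mul", 4), ("add", 2), ("add", 2)])
```

In this example the second add could otherwise issue in cycle 2 and would write in cycle 4, the same cycle as the MUL issued in cycle 0, so the model holds it in the issue stage for one extra cycle.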
134 00:11:29,513 --> 00:11:39,066 Okay, so, we have instruction six. It wants to basically move forward in the 135 00:11:39,066 --> 00:11:42,089 pipe, but it checks this location here and says, okay. 136 00:11:42,089 --> 00:11:47,066 Or, in this cycle here, instruction six basically is in the issue stage, 137 00:11:47,066 --> 00:11:52,019 and doesn't move out of the issue stage. It's sitting in the issue stage. 138 00:11:52,019 --> 00:11:56,091 It looks in this location, which is basically two cycles till the end of 139 00:11:56,091 --> 00:11:59,058 the pipe, and sees that there's a one there. 140 00:12:00,032 --> 00:12:05,447 The box is trying to indicate that. So it looks there, and says, oh, there's a 141 00:12:05,447 --> 00:12:08,503 one there. That means I can't issue this cycle and I 142 00:12:08,503 --> 00:12:11,303 need to stall, and we get the stall showing up. 143 00:12:11,303 --> 00:12:17,373 These other boxes are here to represent the other adds that are 144 00:12:17,373 --> 00:12:22,304 happening, and the other MULs and things, so we actually check these other 145 00:12:22,304 --> 00:12:27,496 locations. Actually, these are just the other adds: we are going to check 146 00:12:27,496 --> 00:12:33,205 for this add here, this add here, and this add checks here and sees a conflict and has to 147 00:12:33,205 --> 00:12:39,452 check again the next cycle. That's why there are four little boxes vertically on 148 00:12:39,452 --> 00:12:46,091 that chart. Another thing: this is a different 149 00:12:46,091 --> 00:12:50,297 representation here. You can see R1 is being written and has a 150 00:12:50,297 --> 00:12:55,106 long lifetime in the pipe. This other register has a shorter 151 00:12:55,106 --> 00:13:00,497 life in our scoreboard, because 152 00:13:00,497 --> 00:13:02,395 it's an add instruction. Okay.
153 00:13:02,395 --> 00:13:08,320 So, do we have any questions about that before we move on to a more complex pipe? 154 00:13:08,320 --> 00:13:13,652 So this is assuming a fixed latency for every instruction, that is correct. 155 00:13:13,652 --> 00:13:17,559 Or at least per pipe or functional unit in the pipe. 156 00:13:17,559 --> 00:13:22,091 You can definitely have functional units which have variable latency. 157 00:13:22,091 --> 00:13:27,010 So an example of that is, well, there's sort of two good examples of that. 158 00:13:27,010 --> 00:13:31,585 One is something like a divider unit. Sometimes people build divider units so 159 00:13:31,585 --> 00:13:36,705 that you keep dividing until you're done. And it's sort of a way to shorten the 160 00:13:36,705 --> 00:13:40,559 length of a divide. So that sometimes has a variable length. 161 00:13:40,559 --> 00:13:45,394 Another good example of this is something like a load that misses in your cache 162 00:13:45,394 --> 00:13:49,718 in an out of order processor, and you have to wait for the load to come back. 163 00:13:49,718 --> 00:13:52,678 There are good ways to handle that, actually, in a scoreboard. 164 00:13:52,678 --> 00:13:57,490 Sometimes scoreboards will just have an extra sort of special bit on the side for 165 00:13:57,490 --> 00:14:02,129 each destination register which says, this register's just out to lunch, you know, 166 00:14:02,129 --> 00:14:06,074 it's in some long variable length pipeline, I don't know what's happening on 167 00:14:06,074 --> 00:14:09,015 it. Don't try to bypass it, don't try to do 168 00:14:09,015 --> 00:14:13,005 anything special with it. And just wait for it to come back and that 169 00:14:13,005 --> 00:14:16,232 bit clears. So processors I've built for these variable 170 00:14:16,232 --> 00:14:20,005 length instructions will typically just have an extra bit.
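[Editor's illustration] The "out to lunch" bit for variable-latency results can be sketched as a per-register flag kept beside the fixed-latency scoreboard. This is a minimal sketch under assumptions: the class name BusyBits and its method names are hypothetical and do not come from the lecture.

```python
class BusyBits:
    """Hypothetical per-register 'not ready' bits for variable-latency results."""

    def __init__(self, num_regs=32):
        self.busy = [False] * num_regs

    def start(self, dest):
        # A divide was issued or a load missed: completion time is unknown.
        self.busy[dest] = True

    def complete(self, dest):
        # The result has finally been written back; readers may proceed.
        self.busy[dest] = False

    def must_stall(self, *sources):
        # A reader stalls if any of its source registers is still outstanding.
        return any(self.busy[s] for s in sources)


busy = BusyBits()
busy.start(5)                      # e.g. a divide writing R5
stalled = busy.must_stall(5, 1)    # a consumer of R5 has to wait
```

No bypassing is attempted for a register marked this way; the consumer simply waits until the bit clears.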
171 00:14:20,005 --> 00:14:26,031 Maybe one for each of the different functional units, maybe one for the divider and one for the load 172 00:14:26,031 --> 00:14:30,073 miss case, or something like that, such that, you know, if that exceptional case 173 00:14:30,073 --> 00:14:35,037 happens, or if the load misses, or if you go ahead and take a divide, which has 174 00:14:35,037 --> 00:14:39,063 a variable length, because a divide can take anywhere from, like, two cycles up to, 175 00:14:39,063 --> 00:14:44,004 like, twenty cycles in some pipelines, and everything in the middle, 176 00:14:44,004 --> 00:14:47,065 you'll just mark a bit saying, this register's not ready, in the scoreboard. 177 00:14:47,065 --> 00:14:51,025 And then, if someone tries to go read that register, it just knows to stall. 178 00:14:51,051 --> 00:14:55,068 So it's a lower performance sort of way to deal with that, but that's a 179 00:14:55,068 --> 00:14:59,655 tough case to handle. A scoreboard can help there, and it's 180 00:14:59,655 --> 00:15:02,034 sort of extra information in the scoreboard. 181 00:15:03,046 --> 00:15:08,053 Okay, so like I said, we have this out of order commit processor. 182 00:15:08,053 --> 00:15:14,042 It's doing out of order write back and it's doing out of order commit. 183 00:15:14,042 --> 00:15:17,526 Well, out of order write back may be okay: 184 00:15:17,769 --> 00:15:23,218 we maintain our write after write dependencies, so we're not actually going to 185 00:15:23,218 --> 00:15:29,251 end up with incorrect state in the architectural register file because of that. 186 00:15:29,251 --> 00:15:35,232 But something bad can happen: what happens if we go and try to take an 187 00:15:35,232 --> 00:15:39,639 exception? So let's say we have our same instruction 188 00:15:39,639 --> 00:15:43,456 sequence that we've been looking at up until this point. 189 00:15:43,456 --> 00:15:48,023 And here, we're wandering around. We're going down the pipe.
190 00:15:48,023 --> 00:15:53,786 And this instruction here takes some sort of fault, and it's figured out at the end 191 00:15:53,786 --> 00:15:57,874 of the pipe, at our commit stage, so the multiply goes all the way down to the 192 00:15:57,874 --> 00:16:00,551 end. We end up with, I don't know, multiplies 193 00:16:00,551 --> 00:16:05,785 don't take a whole lot of great faults, but let's say it takes some sort of exception. 194 00:16:05,785 --> 00:16:10,642 What is going to happen? Well, that instruction is dead, all the 195 00:16:10,642 --> 00:16:14,064 other instructions are dead, because it took a fault. 196 00:16:14,064 --> 00:16:18,022 Unfortunately, we already wrote the register file. 197 00:16:18,022 --> 00:16:20,004 Done, done, done, done, done. Yeah. 198 00:16:20,004 --> 00:16:25,087 Okay, so now we end up in our trap handler or our exception handler or interrupt 199 00:16:25,087 --> 00:16:32,315 handler, and all of a sudden register eleven is just wrong; it has the wrong 200 00:16:32,315 --> 00:16:37,030 architectural value. So this is one of the reasons why people 201 00:16:37,030 --> 00:16:40,075 should try not to build out of order commit. 202 00:16:40,075 --> 00:16:46,912 It gets tricky to have out of order commit with precise exceptions. 203 00:16:46,912 --> 00:16:50,259 Now there are some ways to do it. 204 00:16:50,489 --> 00:16:53,786 So one way is to limit the types of instructions. 205 00:16:53,786 --> 00:16:58,981 So if you have an in order issue, out of order commit, out of order write back machine, 206 00:16:58,981 --> 00:17:05,957 what you could think about doing is, you know that this doesn't write until this 207 00:17:05,957 --> 00:17:09,779 point here. So what if we resolve all of our previous, 208 00:17:09,779 --> 00:17:14,051 let's say, all of our previous exceptions?
209 00:17:14,051 --> 00:17:21,677 If we move our commit point earlier in the processor, we can actually make this work 210 00:17:21,677 --> 00:17:26,771 and have precise exceptions. So if our commit point, let's say, is in 211 00:17:26,771 --> 00:17:32,372 either the memory one stage, or the first stage of the multiplier or something like 212 00:17:32,372 --> 00:17:38,102 that, at that point, you know, we haven't written any other state here that's wrong. 213 00:17:38,102 --> 00:17:43,068 That write back hasn't happened, so we can still kill everything down. 214 00:17:43,068 --> 00:17:49,046 Unfortunately, that means that you can't have an exception happening here, here, or 215 00:17:49,046 --> 00:17:52,326 anywhere else, sort of, later in your pipe. 216 00:17:52,326 --> 00:17:56,336 So that's a problem. So you can limit the types of 217 00:17:56,336 --> 00:18:01,658 exceptions, push your commit point early, and still have an out of order commit 218 00:18:01,658 --> 00:18:05,835 processor with precise exceptions. But even that is tricky. 219 00:18:05,835 --> 00:18:10,521 So, this is a great question. So why can we not have two commit points? 220 00:18:10,521 --> 00:18:15,355 Some processors do have two commit points. And some processors will have what's 221 00:18:15,355 --> 00:18:19,894 called a sliding commit point, so that you try to commit things, sort of, 222 00:18:19,894 --> 00:18:25,361 early, and then, for certain types of instructions, you can 223 00:18:25,361 --> 00:18:29,054 move the commit point later. But typically, you want to have a big 224 00:18:29,054 --> 00:18:34,076 point in one place where you say, after this point in the pipe, all of the state 225 00:18:34,076 --> 00:18:38,688 that has passed this has been committed, and those instructions cannot be rolled back and 226 00:18:38,688 --> 00:18:42,094 those instructions cannot be undone. 
But there are examples of things where 227 00:18:42,094 --> 00:18:47,067 people will have a sliding commit point. I've actually built a processor which has 228 00:18:47,084 --> 00:18:51,024 a moving commit point. But it gets tricky, because what it 229 00:18:51,024 --> 00:18:55,059 basically means is certain types of instructions cannot execute after certain 230 00:18:55,059 --> 00:18:59,026 other types of instructions. Because if they do, these will violate 231 00:18:59,026 --> 00:19:02,027 that sliding commit point. Like this example here. 232 00:19:02,072 --> 00:19:06,029 If the fault can be taken here, there's no way to solve this problem. 233 00:19:06,029 --> 00:19:10,022 But you could have something with a sliding commit point, where, if you have 234 00:19:10,022 --> 00:19:14,005 a MUL followed by, let's say, an add, you can actually sort of slide the 235 00:19:14,005 --> 00:19:17,002 commit point out. So there are processor ideas where you can 236 00:19:17,002 --> 00:19:20,503 try to have a sliding commit point. But otherwise you have to 237 00:19:20,503 --> 00:19:25,678 check, and that gets quite a bit complicated. But I don't want to really get into that 238 00:19:25,678 --> 00:19:28,305 today. Let's leave that for sort of an 239 00:19:28,305 --> 00:19:32,124 advanced topics discussion. But in what we're going to talk about in 240 00:19:32,124 --> 00:19:35,451 this class, we're going to say we want to have one commit point. 241 00:19:35,451 --> 00:19:40,101 We want it to stay in one place in the pipe. It is the canonical location where, past that 242 00:19:40,101 --> 00:19:45,565 point, everything that is in flight has executed and is committed, and we need 243 00:19:45,565 --> 00:19:48,040 to know sort of one location for that.
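[Editor's illustration] The earlier point about out of order write back and precise exceptions can be checked with a small timing model. This is a simplified sketch, not the lecture's pipeline: the function name early_writers and its parameters are hypothetical, instruction i is assumed to issue in cycle i with no stalls, and a write that lands in the same cycle an older instruction's exceptions are resolved is assumed to be squashable.

```python
def early_writers(latencies, resolve_stage):
    """List (older, younger) pairs that would break precise exceptions.

    latencies[i]: assumed cycles from issue to register-file write for
    instruction i, which is assumed to issue in cycle i (no stalls).
    resolve_stage: cycles after issue by which all of an instruction's
    exceptions are known, i.e. where the commit point sits.
    """
    pairs = []
    for older in range(len(latencies)):
        resolved_at = older + resolve_stage
        for younger in range(older + 1, len(latencies)):
            written_at = younger + latencies[younger]
            # The younger instruction changed architectural state before
            # the older one was known to be fault-free.
            if written_at < resolved_at:
                pairs.append((older, younger))
    return pairs


# A 4-cycle MUL followed by a 2-cycle add.
late_commit = early_writers([4, 2], resolve_stage=4)   # fault found at end of MUL pipe
early_commit = early_writers([4, 2], resolve_stage=1)  # commit point moved early
```

With the fault detected at the end of the multiply pipe, the add has already written its register (the register eleven problem above); with the commit point in the first stage, no younger write gets ahead of an unresolved older instruction, at the cost of not allowing exceptions later in the pipe.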