Okay, let's get started. Today we're going to continue our venture into out-of-order processors and superscalars. We're going to start talking about how to fix write-after-write dependencies and write-after-read dependencies in a processor pipeline. We're also going to talk a little bit about one of the questions we had last time: on a branch, what do you do to the reorder buffer, and how do you clean up the state in the reorder buffer? So, a little bit about what we're doing today. We're going to talk about speculation and branches; as I said, we're going to review that, and then answer the question we had last time about what happens to the reorder buffer when you have a speculative branch, and what good strategies are there. Then we're going to talk about register renaming and how to break write-after-write and write-after-read dependencies. And then we're going to talk about memory disambiguation, if we have time. Memory disambiguation is basically figuring out how to let loads and stores execute out of order relative to each other, and how to get the right data for a particular load that comes after a store.
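As a small illustration of the three kinds of register dependence mentioned above, here is a hypothetical helper (the names `Instr` and `dependences` are invented for this sketch) that classifies how a younger instruction depends on an older one, given each instruction's destination and source registers. Only RAW is a true data dependence; WAR and WAW arise from reusing register names, which is what register renaming later removes.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple

@dataclass(frozen=True)
class Instr:
    dest: Optional[str]    # register written (None for, e.g., a branch or a store)
    srcs: Tuple[str, ...]  # registers read

def dependences(older: Instr, younger: Instr) -> Set[str]:
    """Register dependences of `younger` on `older` (older comes first in program order)."""
    found: Set[str] = set()
    if older.dest is not None and older.dest in younger.srcs:
        found.add("RAW")  # true dependence: younger reads the value older produces
    if younger.dest is not None and younger.dest in older.srcs:
        found.add("WAR")  # anti-dependence: younger must not overwrite before older reads
    if older.dest is not None and older.dest == younger.dest:
        found.add("WAW")  # output dependence: the final value must be younger's
    return found
```

For `mul r1, r2, r3` followed by `add r2, r1, r4`, `dependences(Instr("r1", ("r2", "r3")), Instr("r2", ("r1", "r4")))` reports both RAW (through `r1`) and WAR (through `r2`).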
Okay, so let's start off by looking once again at our in-order fetch, in-order issue, in-order writeback, in-order commit pipeline, or I4 for short, and let's look at what happens on a branch. Here we have some code. Instruction three is a branch; it branches to target T, and execution continues along. Because everything is in order, we don't really have to worry about any form of bad control hazard happening here: we can basically reach behind us and kill all of the instructions behind the branch. None of them will have committed any state at that point. So this is pretty nice. We did have speculative instructions here, these three adds, and they started doing work, but they don't get a chance to reach the writeback stage of the pipe, so they never touch the physical register file. So we don't have to clean up the physical register file, or any other state here. We do need to reset the scoreboard when we take a branch mispredict, because its speculative state is wrong. But otherwise, everything is okay. Okay, so life gets a little more complicated when we start to look at in-order fetch, in-order issue, out-of-order execute and writeback, and out-of-order commit.
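Before moving on to that pipeline, here is a minimal sketch of the I4 recovery just described. It assumes a simple model (the `InFlight` records and set-based scoreboard are hypothetical) in which issued instructions are tracked until writeback. On a mispredict, everything younger than the branch is dropped and the scoreboard is rebuilt from the survivors; the register file is never touched, because nothing younger than the branch has written it.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

@dataclass
class InFlight:
    seq: int              # program-order sequence number
    dest: Optional[str]   # register this instruction will write, if any

class I4Pipe:
    def __init__(self) -> None:
        self.in_flight: List[InFlight] = []  # issued but not yet written back
        self.scoreboard: Set[str] = set()    # registers with a pending write

    def issue(self, ins: InFlight) -> None:
        self.in_flight.append(ins)
        if ins.dest is not None:
            self.scoreboard.add(ins.dest)

    def mispredict(self, branch_seq: int) -> None:
        # Kill everything younger than the branch. In I4 none of it has reached
        # writeback, so the register file needs no repair; only the scoreboard is
        # rebuilt from the surviving pending writes. (Fetch redirection is not modeled.)
        self.in_flight = [i for i in self.in_flight if i.seq <= branch_seq]
        self.scoreboard = {i.dest for i in self.in_flight if i.dest is not None}
```

Rebuilding the scoreboard from the survivors, rather than clearing individual bits, keeps a bit set if an older surviving instruction still has a write pending to the same register.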
So here is our pipeline diagram. What's interesting to see here is that, around the branch, the stages the instructions occupy shift around a little bit. Besides that, not much else really changes. We're still able to catch our speculative instructions before they write back. What's key here is that we're doing in-order issue. Because we do in-order issue, the subsequent speculative instructions, while they are speculative, are not going to run ahead and write back early, if you will. What that basically means is that, for this add instruction, there's no W sitting before the point where the branch hits the branch-resolution stage in the pipe. Let's say we resolve our branches in X0. Then this branch can just redirect the front of the pipe, squash all the subsequent instructions, and reset the scoreboard. So life is relatively easy. In-order fetch, in-order issue, out-of-order writeback, in-order commit gets a little more complicated. What's nice here is that we can prevent instructions from writing the physical register file for the same reason as in the other pipe: we did in-order issue.
The subsequent speculative instructions cannot go execute any earlier. And we know that, pretty quickly after we issue the branch instruction, we can resolve the target. So there's maybe a little bit of a shadow there, but if it's one cycle, nothing is going to be able to reach the writeback stage of the pipe. Now, we do need to clean up the reorder buffer, because this pipe starts to have a reorder buffer. So it's not just a scoreboard that we need to clean up; we actually need to clean up the reorder buffer here. And we have an option: we can either remove the entries from the reorder buffer immediately, or we can wait until these other instructions get to the commit stage and remove them from the reorder buffer then. This is the question we had last time, and I'll address it in a little more detail in two more slides. Okay, so now we start to get to out-of-order-issue processors. Here we have an in-order fetch, out-of-order issue, out-of-order writeback, out-of-order commit processor. If you recall, this is the processor we looked at last time that we said could not have precise interrupts, because things can basically write early.
For that same reason, you can have instructions that write the register file early in a pipeline like this. Actually, in this pipe the architectural register file and the physical register file are one and the same. Take a look at this add. This add writes the register file before the branch, which depends on the multiply, has been resolved. Uh-oh. We just wrote the architectural register file. We wrote state that cannot be rolled back, if you will. We actually committed the wrong state: this speculative instruction was not supposed to execute in the correct program order. So this is the same problem we saw with imprecise exceptions showing up again here. One thing you could do, if you have a pipeline like this, is try to fix it by not having any form of control speculation. You could basically stall all the subsequent instructions here, these three adds, and potentially all the rest of these question-mark instructions, until the branch has been resolved. But that's going to limit your performance. So the problem with this form of pipeline, with out-of-order commit, is that you have no way to roll back any state.
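To make the problem concrete, here is a toy trace for a pipeline with a single register file and out-of-order commit. The register values and cycle numbers are hypothetical: a 4-cycle `mul` feeds a mispredicted branch that resolves in cycle 5, and a wrong-path `add` finishes in cycle 2. The sketch compares letting the `add` complete against the "no control speculation" fix of stalling everything after the branch.

```python
from typing import Dict, NamedTuple

class Op(NamedTuple):
    text: str
    dest: str
    value: int
    done_cycle: int   # cycle in which the op writes back, which here is also commit
    wrong_path: bool  # True if the op follows the mispredicted branch

# Hypothetical numbers: the branch depends on the mul, which completes in cycle 4,
# so the branch resolves in cycle 5. The branch itself writes no register.
BRANCH_RESOLVES = 5
TRACE = [
    Op("mul r1, r2, r3", "r1", 35, done_cycle=4, wrong_path=False),
    Op("add r4, r2, r3", "r4", 12, done_cycle=2, wrong_path=True),
]

def run(stall_after_branch: bool) -> Dict[str, int]:
    regs = {"r1": 0, "r2": 5, "r3": 7, "r4": 1}  # the only register file in this pipe
    for op in sorted(TRACE, key=lambda o: o.done_cycle):
        if op.wrong_path:
            if stall_after_branch:
                continue  # held until the branch resolved, then squashed: never writes
            if op.done_cycle >= BRANCH_RESOLVES:
                continue  # still in flight at resolution, so it can be squashed
        regs[op.dest] = op.value  # out-of-order commit: the write is final at once
    return regs
```

With speculation, `run(False)` leaves the wrong-path value 12 in `r4` and nothing records the old value 1; `run(True)` keeps `r4` correct, at the cost of idling everything behind the branch.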
Okay, so this takes us to the pinnacle pipeline that we had last time: in-order fetch, out-of-order issue, out-of-order execute and writeback, and in-order commit. Let's take a look at this. There are sort of competing questions here that we have to think about. The first thing we see is that this instruction actually does a write right here, before the branch is known. But conveniently, it's writing a different data structure: it's not writing our architectural state, it's writing our physical register file. And just like on an interrupt of some form, where we can roll back by copying the architectural register file into the physical register file, we can do that for a branch here too. So one of the interesting questions that comes up is: where do we resolve the branch, and when do we try to kill subsequent instructions? Do we try to do that right when the branch gets resolved, or do we wait until the branch commits? Hmm. This actually goes back to the question we had last time of how easy it is to clean up the reorder buffer, and how easy it is to clean up the physical register file. So let's take a look at this example here. Now, having said that, this is all doable.
People who build pipelines actually do go clean up all of these in-flight instructions. Well, let's look at the complexity of that. Here, right when we know the branch is resolved, we kill all of these instructions, and we redirect the front of the pipe to go fetch our true target. Let's look at what's happening in the physical register file in this case. This mul has already been written into the physical register file, and that's a good value; we want to keep that mul. This add has also been written into the physical register file, and we don't want to keep that. Ugh. Life starts to get a lot more complicated here. In a pipeline like this, we're going to have to clean up speculative state in the physical register file, and we're going to have to do selective rollback. Instead of just taking the entire architectural register file and overwriting the physical register file on rollback, we're going to have to figure out which of these values were speculative and which were not. That's doable, but you probably need some structures to do it, over and above what we've already talked about in class.
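One possible form of those extra structures, sketched under simplifying assumptions: each uncommitted ROB entry records its destination and, once written back, its result (a hypothetical `RobEntry` layout, not necessarily the one this course uses). Recovery at resolution time then visits only the registers that squashed instructions actually wrote, restoring each from the youngest surviving writer that has written back, or from the ARF if there is none. The sketch ignores the WAW problem of an older instruction writing back after a younger one, which register renaming addresses later.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class RobEntry:
    dest: Optional[str]          # architectural destination; None for a branch
    value: Optional[int] = None  # result once written back, else None

def selective_rollback(prf: Dict[str, int], arf: Dict[str, int],
                       rob: List[RobEntry], branch_idx: int) -> List[RobEntry]:
    """Squash ROB entries younger than rob[branch_idx] and repair the PRF in place.

    `rob` holds only uncommitted instructions, oldest first; `arf` holds committed
    values. Returns the surviving entries.
    """
    survivors, squashed = rob[:branch_idx + 1], rob[branch_idx + 1:]
    dirtied = {e.dest for e in squashed if e.dest is not None and e.value is not None}
    for reg in dirtied:
        restored = arf[reg]          # default: the last committed value
        for e in survivors:          # oldest to youngest, so the youngest match wins
            if e.dest == reg and e.value is not None:
                restored = e.value
        prf[reg] = restored
    return survivors
```

The bookkeeping is per register and per entry, which is exactly the extra complexity compared with copying the whole ARF.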
In other words, we would have to track which physical registers need to be rolled back on a speculation mispredict. Something a little bit easier is just to wait until the commit stage. If we wait until the commit stage, we can see here that, as the branch commits, all of the instructions before this branch have now committed to the architectural register file. So we know the architectural register file is up to date relative to the branch, while these other instructions may or may not be in the physical register file; the physical register file may be completely out of date at this point. So what's nice here is that we can copy the entire architectural register file to the physical register file, and effectively roll back everything. Okay. So this brings us to the question we had during last class about the reorder buffer, and what you do with the reorder buffer pointers in this branch-misspeculation case. The question really is: do we have to wait for these speculative instructions to get all the way to the end of the pipe to clean up the reorder buffer? Or can we just adjust the pointer to the next instruction in the reorder buffer?
And the answer, having spent a bunch of time thinking about this, is that we should just be able to adjust the pointer in the reorder buffer to say where the next location is, and fill that in with the target instruction; that effectively cleans out all of this state. That works great in this case. But as I said, if you look at the other case, where you're trying to preemptively kill things, this is not going to work. What's really going to happen is that we have to perform a selective rollback, and just changing the pointer in the reorder buffer is not enough to do that; we're going to have to individually clean up entries, and that gets a lot more complicated. There's one other thing we haven't talked about yet, and it's one motivation for why you may want to wait until the speculative instructions reach the commit stage of the pipe before cleaning up the reorder buffer: other structures, such as a register renamer, which you'll want to clean up in that same manner.
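A minimal sketch of the commit-time strategy from the last two paragraphs, assuming a circular ROB with head and tail indices and separate ARF/PRF dictionaries (all names here are hypothetical). When a mispredicted branch reaches the head and commits, every remaining entry is younger than it, so squashing them is just moving the tail back to the head, and the PRF is repaired by copying the ARF wholesale.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Entry:
    dest: Optional[str]                # architectural destination; None for a branch
    value: Optional[int] = None        # result once written back
    mispredicted_branch: bool = False  # set when the branch resolves the wrong way

class CommitTimeRob:
    """Circular ROB plus ARF/PRF; branch mispredicts are handled when the branch commits."""

    def __init__(self, size: int, arf: Dict[str, int]):
        self.slots: List[Optional[Entry]] = [None] * size
        self.head = 0          # oldest uncommitted entry
        self.tail = 0          # next slot to allocate
        self.count = 0
        self.arf = dict(arf)   # committed (architectural) values
        self.prf = dict(arf)   # values written at writeback, possibly speculative

    def allocate(self, e: Entry) -> None:
        assert self.count < len(self.slots), "ROB full: stall dispatch"
        self.slots[self.tail] = e
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1

    def writeback(self, e: Entry, value: int) -> None:
        e.value = value
        if e.dest is not None:
            self.prf[e.dest] = value  # visible to consumers, not yet architectural

    def commit_one(self) -> bool:
        """Retire the head entry; return True if it was a mispredicted branch."""
        assert self.count > 0, "ROB empty"
        e = self.slots[self.head]
        # Branches are assumed to be resolved by the time they reach the head.
        assert e is not None and (e.dest is None or e.value is not None), "head not done"
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        if e.dest is not None:
            self.arf[e.dest] = e.value
        if e.mispredicted_branch:
            # Every remaining entry is younger than the branch: squash them with a
            # pointer move; stale slots are simply overwritten by later allocations.
            self.tail = self.head
            self.count = 0
            self.prf = dict(self.arf)  # the ARF is exactly the state after the branch
            return True
        return False
```

In a usage run, the `mul` commits normally, the mispredicted branch commits next, and the ROB ends up empty with the wrong-path `add` result gone from the PRF.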
So we'll talk about register renaming in this lecture. If you think about these in-flight speculative instructions, and you have more physical registers than architectural registers, then if you have to abort these speculative instructions, you have other structures, like a free list of physical registers, which you have to deallocate somehow. And if you're going to do a bulk deallocation, a convenient place to deallocate is when the instructions try to commit. We'll look at that in a minute, but that's what I want to get across: you can just adjust the pointer for the simple case. Trying to do something more aggressive gets quite a bit harder, because you have to selectively roll back the physical register file in addition to the reorder buffer. For the reorder buffer it's just a pointer, but you can't do that with the physical register file. And if you wait until the end of the pipe, you can deallocate physical registers a little more easily.
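Here is a sketch of bulk deallocation at commit. It assumes a common renaming scheme that the lecture has not described yet, so treat its details as an assumption: a map table from architectural to physical registers, a FIFO free list, and a per-instruction record of the newly allocated physical register and the mapping it replaced. Committing an instruction frees the replaced register; aborting everything younger than a mispredicted branch that has reached commit walks the records youngest-first, restoring the map table and returning each new register to the free list.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict

@dataclass
class RenameRecord:
    arch_dest: str
    new_preg: int   # physical register allocated to this instruction at rename
    old_preg: int   # mapping it replaced; dead once this instruction commits

class Renamer:
    def __init__(self, num_arch: int, num_phys: int):
        # Start with r0..r{num_arch-1} mapped to p0..p{num_arch-1}; the rest are free.
        self.map: Dict[str, int] = {f"r{i}": i for i in range(num_arch)}
        self.free: Deque[int] = deque(range(num_arch, num_phys))
        self.in_flight: Deque[RenameRecord] = deque()  # program order, oldest on the left

    def rename(self, arch_dest: str) -> int:
        new = self.free.popleft()  # IndexError here would mean "no free register: stall"
        self.in_flight.append(RenameRecord(arch_dest, new, self.map[arch_dest]))
        self.map[arch_dest] = new
        return new

    def commit_oldest(self) -> None:
        rec = self.in_flight.popleft()
        self.free.append(rec.old_preg)  # all older readers are done; younger use new_preg

    def squash_all(self) -> None:
        """Abort every in-flight instruction, e.g. when a mispredicted branch commits."""
        while self.in_flight:
            rec = self.in_flight.pop()  # youngest first, so the oldest old_preg wins
            self.map[rec.arch_dest] = rec.old_preg
            self.free.append(rec.new_preg)
```

Because squashing happens only once the branch is at the head, every record still in flight is known to be wrong-path, so the cleanup needs no per-instruction speculation tracking.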