1 00:00:03,046 --> 00:00:08,332 Okay, so let's take a look at what we have to add to our pipelines. 2 00:00:08,332 --> 00:00:13,775 So we have our in-order fetch, out-of-order issue, out-of-order write back and 3 00:00:13,775 --> 00:00:18,096 in-order commit pipe that we had before. Note, it had variable length pipes. 4 00:00:18,096 --> 00:00:22,037 It had a reorder buffer. It had a finished store buffer. 5 00:00:22,037 --> 00:00:25,022 It had a scoreboard, it had an instruction queue. 6 00:00:25,022 --> 00:00:28,087 So it had all the structures we talked about last time. 7 00:00:29,046 --> 00:00:32,032 And now we're going to add two more structures to it. 8 00:00:32,092 --> 00:00:35,099 And we're gonna modify the structures that are there slightly. 9 00:00:35,099 --> 00:00:38,045 Now let's talk about what this is gonna do. 10 00:00:38,045 --> 00:00:43,029 So the first structure we're gonna add is a free list. 11 00:00:44,000 --> 00:00:50,065 And the free list is gonna keep track of physical registers that we could go use. 12 00:00:52,060 --> 00:00:57,015 So we will probably have more physical registers than we have 13 00:00:57,015 --> 00:01:00,075 architectural registers. But you need to keep track of which ones 14 00:01:00,075 --> 00:01:05,036 are free to be used because we are gonna basically be allocating and deallocating from 15 00:01:05,036 --> 00:01:08,092 the pool of physical registers quickly as we execute. 16 00:01:09,087 --> 00:01:13,068 The other structure here, we call the rename table. 17 00:01:13,091 --> 00:01:19,023 Sometimes this is called the RAT, which is sort of the Intel 18 00:01:19,023 --> 00:01:25,085 nomenclature for this; or actually the RAT is either this table or the table we were 19 00:01:25,085 --> 00:01:31,086 discussing in the Tomasulo algorithm variants of this, but they're very similar. 
20 00:01:32,009 --> 00:01:38,363 And what this table does is it's going to map from architectural register to the 21 00:01:38,363 --> 00:01:41,071 most up-to-date version in our physical register file. 22 00:01:41,071 --> 00:01:46,019 So, it's gonna say, for an instruction that's sitting here at our decode stage, 23 00:01:46,019 --> 00:01:49,049 where do we go find the value, cuz this gets complicated. 24 00:01:49,049 --> 00:01:51,620 We just renamed everything. 25 00:01:51,620 --> 00:01:55,776 We have different names for everything. It's in some physical register. 26 00:01:55,776 --> 00:01:58,189 We need to go figure out where the value is. 27 00:01:58,189 --> 00:02:02,636 And that's what this table does. We're also going to add two fields to the reorder 28 00:02:02,636 --> 00:02:04,945 buffer. I'll talk about that in a second, and 29 00:02:04,945 --> 00:02:08,929 we're going to want to increase the size of the physical register file, so that we 30 00:02:08,929 --> 00:02:12,172 can get more performance. If we have the same number of physical 31 00:02:12,172 --> 00:02:16,255 registers as we have architectural registers, and we need to have at least 32 00:02:16,255 --> 00:02:20,109 one physical register for each architectural register, we're not going to 33 00:02:20,109 --> 00:02:24,794 get any more performance from adding a register renaming step to our pipe. 34 00:02:24,794 --> 00:02:30,606 Okay, so this is kind of for completeness: where everything gets written in the pipe 35 00:02:30,606 --> 00:02:34,604 in time. Two things I wanted to point out here are 36 00:02:34,604 --> 00:02:40,026 that the free list gets updated at the front and also gets updated here at the end, and 37 00:02:40,026 --> 00:02:45,379 the condition that you need to de-allocate a physical 38 00:02:45,379 --> 00:02:50,209 register gets a little complicated, and we'll talk about that. 
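To make the two new structures concrete, here is a minimal Python sketch of a rename table and a free list working together at decode. The sizes (`NUM_ARCH`, `NUM_PHYS`), the identity starting map, and the `rename` helper are illustrative assumptions, not something taken from the lecture slides.

```python
from collections import deque

NUM_ARCH = 8    # architectural registers (assumed size)
NUM_PHYS = 12   # physical registers (assumed size, at least NUM_ARCH)

# Assumed starting point: architectural register i lives in physical register i.
rename_table = {a: a for a in range(NUM_ARCH)}
free_list = deque(range(NUM_ARCH, NUM_PHYS))


def rename(dest, srcs):
    """Rename one instruction at decode; returns (phys_dest, phys_srcs)."""
    # Read the source mappings before the destination mapping is replaced.
    phys_srcs = [rename_table[s] for s in srcs]
    if not free_list:
        raise RuntimeError("no free physical register: decode must stall")
    phys_dest = free_list.popleft()
    rename_table[dest] = phys_dest
    return phys_dest, phys_srcs


print(rename(dest=4, srcs=[1, 5]))  # (8, [1, 5])
print(rename(dest=4, srcs=[4]))     # (9, [8])
```

The second call shows the point of the table: a later reader of architectural register 4 is sent to the most recent physical copy, not to the original one.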
39 00:02:50,209 --> 00:02:55,178 And the rename table gets read up here because that tells you where 40 00:02:55,178 --> 00:02:58,834 you get the value. It also gets updated here when we actually 41 00:02:58,834 --> 00:03:04,062 emit an instruction down the pipe. And we also want to update some pending 42 00:03:04,062 --> 00:03:07,958 bits when we get to the end of the pipe, so that it knows whether to go 43 00:03:07,958 --> 00:03:11,587 pick up from the physical register file or the architectural register file for 44 00:03:11,587 --> 00:03:15,183 rollback issues. Okay, so let's jump into these data 45 00:03:15,183 --> 00:03:24,914 structures and see what we add. Okay, as I said, we're gonna add 46 00:03:24,914 --> 00:03:29,866 stuff to the reorder buffer here. Now our previous reorder buffer 47 00:03:29,866 --> 00:03:34,410 looked very similar to this. We had some state where things were 48 00:03:34,410 --> 00:03:39,470 pending, free, or finished, where dash dash represents free and F means 49 00:03:39,470 --> 00:03:42,862 finished. It means the instruction got to the end of 50 00:03:42,862 --> 00:03:49,492 the pipe and is waiting to commit. We've got a bit that says whether it was after 51 00:03:49,492 --> 00:03:52,581 a branch. You might have multiple of these 52 00:03:52,581 --> 00:03:56,819 if you allow multiple branches in flight. A bit that says whether it's a store or not. 53 00:03:56,819 --> 00:03:59,766 A bit that says whether it writes a register. 54 00:03:59,766 --> 00:04:05,111 So this says the destination is valid, and that's important for us to know because when we 55 00:04:05,111 --> 00:04:09,209 get to the end of the pipe we need to know whether to actually commit some state 56 00:04:09,209 --> 00:04:14,959 into the architectural register file. And we have a field here, which we had 57 00:04:14,959 --> 00:04:18,949 before, which is the physical register file specifier. 
58 00:04:18,949 --> 00:04:25,433 So this tells us where to go read from. That's all good. 59 00:04:25,433 --> 00:04:28,422 But now we add some extra bits here. 60 00:04:28,422 --> 00:04:33,358 And the first one is an architectural register file specifier. 61 00:04:33,358 --> 00:04:39,030 Okay, so this gets a little complicated. What are we thinking about here? 62 00:04:39,030 --> 00:04:42,668 Why do we need this? When we get to the end of the pipe and we 63 00:04:42,668 --> 00:04:48,051 are going to do commit, if we go back and look at this picture here, in the commit 64 00:04:48,051 --> 00:04:53,169 stage, we take something from the physical register file, and the reorder 65 00:04:53,169 --> 00:04:57,594 buffer drives this and says, okay, copy that into the architectural register file 66 00:04:57,594 --> 00:05:02,019 when the commit occurs. Well, now we've renamed everything. 67 00:05:02,019 --> 00:05:07,026 So it's not an identity map from physical register number to architectural register 68 00:05:07,026 --> 00:05:09,092 number. So we need to know where you should 69 00:05:09,092 --> 00:05:12,093 actually write in the architectural register file. 70 00:05:14,037 --> 00:05:17,092 And that's what this does. It just tells us where to go write. 71 00:05:17,092 --> 00:05:21,018 So, this is where we read from, this is where we write to. 72 00:05:21,018 --> 00:05:26,007 When this instruction, let's say it's the most recent instruction here that's turned 73 00:05:26,007 --> 00:05:28,005 to finished, is going to go commit, 74 00:05:28,005 --> 00:05:32,006 we read the value from here and write it into the location pointed to by 75 00:05:32,006 --> 00:05:36,049 here. And then we have one other field here, 76 00:05:36,049 --> 00:05:40,060 which is a little bit odd. We have the previous physical register. 77 00:05:41,012 --> 00:05:44,003 Why would we need that? That doesn't make any sense. 
78 00:05:44,003 --> 00:05:48,028 What is this doing? So this is something we actually read out 79 00:05:48,028 --> 00:05:50,066 of the rename table at the front of the pipe. 80 00:05:50,066 --> 00:05:53,063 And here is what it's going to tell us. 81 00:05:55,039 --> 00:06:00,002 Let's say this was register four. This is where the in-flight physical 82 00:06:00,002 --> 00:06:05,004 register is, and this is the previous physical register that held the value 83 00:06:05,004 --> 00:06:11,028 of register four before we did the update. And the reason we need to know this is that 84 00:06:11,028 --> 00:06:16,290 when we hit the end of the pipe, we need some way to de-allocate physical 85 00:06:16,290 --> 00:06:21,635 registers, and we're going to use this to track that, and we'll give an example in a 86 00:06:21,635 --> 00:06:25,007 minute. But what this is really going to do is 87 00:06:25,007 --> 00:06:31,028 say: oh, we wrote a new value of register four, which means that, if the previous 88 00:06:31,028 --> 00:06:35,784 value of register four was in, let's say, physical register 27 89 00:06:35,784 --> 00:06:41,016 and the new one is in physical register 30 or something like that, we need to deallocate 90 00:06:41,016 --> 00:06:45,013 physical register 27, and we can do that when we reach the end of the pipe by 91 00:06:45,013 --> 00:06:49,626 committing this instruction out of the reorder buffer and cleaning up all the 92 00:06:49,626 --> 00:06:52,267 state. Okay. 93 00:06:52,267 --> 00:06:56,731 A quick picture here of the rename table. 94 00:06:56,731 --> 00:07:03,473 This is indexed by architectural register. P tells us whether we have a write in 95 00:07:03,473 --> 00:07:08,620 flight, so it knows that the value is not in the 96 00:07:08,620 --> 00:07:14,032 architectural register file. And PReg tells us where in the 97 00:07:14,032 --> 00:07:16,341 physical register file to go find the value. 
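The three register fields in the reorder buffer can be sketched as a small record plus a commit step. This is only an illustration under assumed names (`RobEntry`, `commit`, and the field names are hypothetical); it follows the design described so far, where the architectural register file is still a separate structure and commit copies a value into it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    state: str                 # "--" free, "P" pending, "F" finished
    dest_valid: bool           # does this instruction write a register?
    preg: Optional[int]        # physical register to read the result from
    areg: Optional[int]        # architectural register to write at commit
    prev_preg: Optional[int]   # older mapping of areg, freed at commit


def commit(entry, prf, arf, free_list):
    """Commit a finished entry: copy the value and free the old mapping."""
    assert entry.state == "F", "only finished instructions may commit"
    if entry.dest_valid:
        arf[entry.areg] = prf[entry.preg]        # read from preg, write to areg
        if entry.prev_preg is not None:
            free_list.append(entry.prev_preg)    # old copy is dead now
    entry.state = "--"


# The lecture's example: register four moves from physical 27 to physical 30.
prf = [0] * 32
arf = [0] * 8
free_list = [28, 29]
prf[30] = 99
e = RobEntry("F", True, preg=30, areg=4, prev_preg=27)
commit(e, prf, arf, free_list)
print(arf[4], free_list)   # 99 [28, 29, 27]
```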
98 00:07:16,341 --> 00:07:20,666 And this is really important when a subsequent instruction that is looking for that 99 00:07:20,666 --> 00:07:25,254 value shows up, and it wants to get that value before it hits the architectural 100 00:07:25,254 --> 00:07:28,484 register file. It looks here, and this tells us, oh, 101 00:07:28,484 --> 00:07:31,209 it's pending, it's gonna be here in a little bit. 102 00:07:31,209 --> 00:07:35,617 Together with this and the scoreboard, you might even be able to bypass it early, 103 00:07:35,617 --> 00:07:39,119 on a good day. On a bad day we have to wait for it to get 104 00:07:39,119 --> 00:07:43,193 to the physical register file, but it's a lot better than having to go pick it out 105 00:07:43,193 --> 00:07:48,888 of the architectural register file. And finally we have a free list. 106 00:07:48,888 --> 00:07:53,700 And this is literally just a bit per physical register, which is very different 107 00:07:53,700 --> 00:08:01,844 than a bit per architectural register. And this is going to have, let's say we 108 00:08:01,844 --> 00:08:07,141 have big N physical registers, or let's say we have 256 physical registers, 109 00:08:07,141 --> 00:08:13,150 and we have a bit saying whether that register has been deallocated and is ready 110 00:08:13,150 --> 00:08:19,208 to be used for future register renaming, or whether 111 00:08:19,208 --> 00:08:24,323 there's an instruction using that physical register, or it's waiting to commit to the 112 00:08:24,323 --> 00:08:29,024 architectural register file. This table will tell us that information 113 00:08:29,024 --> 00:08:33,039 here: just a bit that says whether it's free or not. Pretty simple. 114 00:08:34,060 --> 00:08:40,024 Actually, before I go on here, I wanted to make an interesting observation. 115 00:08:40,024 --> 00:08:44,046 Where does this register renaming become really important? 
116 00:08:44,046 --> 00:08:49,077 Well, if we go look at something like the original Intel architecture, they had 117 00:08:49,077 --> 00:08:53,080 eight registers. If you want to run high performance code, 118 00:08:53,080 --> 00:08:56,034 you have to re-use those registers pretty quickly. 119 00:08:57,094 --> 00:09:02,070 So they got register limited very quickly when they tried to build faster and faster 120 00:09:02,070 --> 00:09:05,062 processors. So they had to introduce register renaming 121 00:09:05,062 --> 00:09:09,099 quite early in the Intel micro-architecture implementations. 122 00:09:09,099 --> 00:09:14,025 And they had many, many more physical registers than architectural registers 123 00:09:14,025 --> 00:09:17,017 very quickly, cuz eight isn't gonna get you very far. 124 00:09:17,017 --> 00:09:20,037 They can have about 100 in-flight instructions. 125 00:09:20,037 --> 00:09:24,030 And, by definition, you can't have that many in-flight instructions if you 126 00:09:24,030 --> 00:09:28,057 maintain write-after-write stalling, cuz you're going to 127 00:09:28,057 --> 00:09:32,008 have to rewrite some register. It's kind of like a pigeonhole 128 00:09:32,008 --> 00:09:34,076 problem. If you have more than eight instructions that each write a register, 129 00:09:34,076 --> 00:09:38,075 at least one of those instructions is gonna cause a write-after-write 130 00:09:38,075 --> 00:09:41,010 dependency and you're gonna stall the pipe. 131 00:09:41,010 --> 00:09:45,043 So they're not going to have more than, say, eight in-flight instructions 132 00:09:45,043 --> 00:09:48,027 if they do not do register renaming. 133 00:09:48,027 --> 00:09:51,077 So they did register renaming pretty early in their pipelines. 134 00:09:53,008 --> 00:09:56,046 Okay, so this gets us to the eye chart here. 
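The pigeonhole argument above can be checked with a few lines of Python. This is a sketch with a hypothetical helper, `first_waw`, and it assumes every instruction in the window writes one register.

```python
def first_waw(dests, num_arch_regs):
    """Index of the first instruction whose destination repeats an earlier
    destination in the window, or None if there is no such instruction."""
    seen = set()
    for i, d in enumerate(dests):
        assert 0 <= d < num_arch_regs
        if d in seen:
            return i
        seen.add(d)
    return None


# Best case for eight registers: destinations are spread as evenly as possible.
dests = [i % 8 for i in range(100)]
print(first_waw(dests, 8))  # 8
```

Index 8 is the ninth instruction, so even with ideal register assignment a window of more than eight register-writing instructions contains a write-after-write pair.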
135 00:09:57,012 --> 00:10:02,640 Let's walk through a basic case of what's in all these different tables as we 136 00:10:02,640 --> 00:10:10,076 execute our basic, simple code here. On the top we have the four instructions, 137 00:10:10,076 --> 00:10:14,005 two muls and some adds. This was our original test case. 138 00:10:14,005 --> 00:10:18,060 Note there are all these dependencies through register four we need to worry 139 00:10:18,060 --> 00:10:21,017 about. There are both 140 00:10:21,035 --> 00:10:24,046 write-after-read and write-after-write dependencies. 141 00:10:24,075 --> 00:10:33,026 We're gonna execute it quickly here. As you can see, this add fires 142 00:10:33,026 --> 00:10:39,044 early, or issues early, the final add. And this is really driven by the 143 00:10:39,044 --> 00:10:43,048 register renaming. So let's take a look at what 144 00:10:43,048 --> 00:10:46,044 happens here. We'll try to interpret this. 145 00:10:46,044 --> 00:10:50,063 Here we have cycles. Across the top we have the 146 00:10:50,063 --> 00:10:53,080 stages. We show what's in the decode, issue, 147 00:10:53,080 --> 00:10:59,087 write back and commit stages of the pipe. We leave out the execute stages cuz it's 148 00:10:59,087 --> 00:11:04,077 too much to draw here and it's drawn at the top in a different form. 149 00:11:05,015 --> 00:11:08,021 Let's first look at the rename table. So. 150 00:11:08,021 --> 00:11:13,027 We're at the rename table. Or, we're actually gonna say we only have, 151 00:11:13,049 --> 00:11:19,008 for clarity here, let's say we only have seven architectural registers. 152 00:11:19,067 --> 00:11:25,040 But we're going to have, let's say, eleven physical registers. 153 00:11:25,040 --> 00:11:29,085 So we're gonna have more physical registers than architectural registers in this 154 00:11:29,085 --> 00:11:33,070 example. 
We start off and we say, okay, well, 155 00:11:33,070 --> 00:11:36,086 register one. If you want to go find architectural 156 00:11:36,086 --> 00:11:40,008 register one, the value's in physical register zero. 157 00:11:40,034 --> 00:11:44,008 And we just come up with some allocation. 158 00:11:44,008 --> 00:11:46,025 And the circles mean that it's not pending. 159 00:11:46,025 --> 00:11:49,053 It's not in flight in the pipe. That's just sort of the base case. 160 00:11:49,053 --> 00:11:51,076 The pipe has been relaxed. 161 00:11:51,076 --> 00:11:55,055 Everything is allocated, and we just drew a basic allocation here at the 162 00:11:55,055 --> 00:12:00,050 beginning. Now as we go to execute, some interesting 163 00:12:00,050 --> 00:12:05,059 stuff starts to happen. The first thing that is going to happen 164 00:12:05,059 --> 00:12:11,915 is, we are actually going to issue this instruction here, which writes to register 165 00:12:11,915 --> 00:12:16,043 one. We need to rename this at this point. 166 00:12:16,043 --> 00:12:20,052 Register one will have to be renamed as something else. 167 00:12:20,052 --> 00:12:24,717 So in this table here, if we look, we rename register one to 168 00:12:24,717 --> 00:12:30,273 physical register seven. Okay, that sounds good. 169 00:12:30,273 --> 00:12:34,503 What happens next? Well, we next sit here, and we try to 170 00:12:34,503 --> 00:12:39,911 execute this instruction here. It says mul, and it goes to try to read register 171 00:12:39,911 --> 00:12:44,029 one. When it goes to read register one, though, 172 00:12:44,029 --> 00:12:49,752 we can go look at the rename table and say, oh, well, that's actually in flight, 173 00:12:49,752 --> 00:12:54,925 and it's in physical register seven. 
So if we go look over here, we can draw 174 00:12:54,925 --> 00:13:00,022 this and say, oh, that value is actually in physical register seven, and it's 175 00:13:00,022 --> 00:13:05,748 currently not ready, maybe. But P4, the other input, register five, 176 00:13:05,748 --> 00:13:09,641 okay, yeah, register five got renamed to P4, is ready. 177 00:13:09,641 --> 00:13:16,569 So it's ready to go. Okay, one of the other interesting 178 00:13:16,569 --> 00:13:20,822 things that happens here is we can see that as we go to allocate this, we have to 179 00:13:20,822 --> 00:13:24,270 remove it from the free list. So this list here is the list of all the 180 00:13:24,270 --> 00:13:27,571 free registers. We start off with four free registers and 181 00:13:27,571 --> 00:13:30,320 we narrow it down as we start to do writes. 182 00:13:30,320 --> 00:13:33,593 At some point we run out. So an important note about 183 00:13:33,593 --> 00:13:38,260 this is that when we run out of physical registers we're going to have to stall the 184 00:13:38,260 --> 00:13:40,393 pipe, because we can't do any more renaming. 185 00:13:40,393 --> 00:13:42,494 We can't issue more instructions at that point. 186 00:13:42,494 --> 00:13:47,159 So that's really important to realize: when you build 187 00:13:47,159 --> 00:13:51,465 your machines you have to have enough physical registers that you don't run out 188 00:13:51,465 --> 00:13:54,550 very often. Now, it's possible that you could still 189 00:13:54,550 --> 00:13:57,018 run out. So let's say you have hundreds of in-flight 190 00:13:57,018 --> 00:14:00,467 instructions, and you only have, let's say, 64 physical 191 00:14:00,467 --> 00:14:03,104 registers. You might still run out, but the 192 00:14:03,104 --> 00:14:06,687 probability of that happening might be relatively low. 
193 00:14:06,687 --> 00:14:10,933 And you sort of bake into this your utilization and your CPI. 194 00:14:10,933 --> 00:14:13,786 Your CPI may not be less than one, or may not be low. 195 00:14:13,786 --> 00:14:17,100 So, given the probability of that actually happening, 196 00:14:17,100 --> 00:14:21,672 you may not worry about it too much. Another cute little story here is there have 197 00:14:21,672 --> 00:14:26,255 actually been some interesting bugs in processors around the free list. 198 00:14:26,255 --> 00:14:31,334 So there were some Alpha processors that actually leaked free list entries in their 199 00:14:31,334 --> 00:14:35,058 register file. So what happened was, if you ran a certain 200 00:14:35,058 --> 00:14:39,514 piece of code for a long enough period of time, all of a sudden this processor just 201 00:14:39,514 --> 00:14:43,719 ground to a halt cause it was not able to allocate more physical registers and it 202 00:14:43,719 --> 00:14:46,124 ran out. It ended up with fewer physical registers than 203 00:14:46,124 --> 00:14:48,810 architectural registers, and the machine just stopped. 204 00:14:48,810 --> 00:14:53,560 And this was a sort of well-known bug in some of the early Alphas. I think this 205 00:14:53,560 --> 00:14:58,327 was actually in the first out-of-order, I want to think where this was, I think 206 00:14:58,327 --> 00:15:03,733 the 21264 had this problem. They fixed it. 207 00:15:03,733 --> 00:15:05,351 And they pulled those chips off the shelf. 208 00:15:05,351 --> 00:15:08,616 And, you know, that's a really bad thing to have happen in your 209 00:15:08,616 --> 00:15:10,037 processor. How embarrassing. 210 00:15:10,037 --> 00:15:14,177 But as I said, if you run out, you're really not going to be able to issue more. 211 00:15:14,177 --> 00:15:16,347 But in this case we made sure we had enough. 212 00:15:16,347 --> 00:15:18,793 So we're not actually going to see any stalls. 
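The free list described earlier as one bit per physical register, together with the stall on exhaustion, might look roughly like this. The class name `FreeList`, its methods, and the convention of returning `None` to signal a stall are assumptions made for the sketch; the sizes match the eleven-register example.

```python
class FreeList:
    """One bit per physical register: True means the register is free."""

    def __init__(self, num_phys, initially_mapped):
        # Assume the first `initially_mapped` registers hold architectural state.
        self.free = [i >= initially_mapped for i in range(num_phys)]

    def allocate(self):
        """Return a free physical register, or None if rename must stall."""
        for preg, is_free in enumerate(self.free):
            if is_free:
                self.free[preg] = False
                return preg
        return None

    def release(self, preg):
        """Called at commit for the previous mapping of the destination."""
        self.free[preg] = True


fl = FreeList(num_phys=11, initially_mapped=7)
got = [fl.allocate() for _ in range(5)]
print(got)            # [7, 8, 9, 10, None]
fl.release(0)         # a commit hands back an old mapping
print(fl.allocate())  # 0
```

A leak of the kind mentioned in the story would correspond to a commit path that forgets to call `release`, so the number of `True` bits only ever goes down.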
213 00:15:18,793 --> 00:15:23,890 And let's look at how things get on the free list here, 214 00:15:23,890 --> 00:15:33,481 cause that's a little bit interesting. In our reorder buffer, I said we had extra 215 00:15:33,481 --> 00:15:38,077 fields. If you recall, we had the previous 216 00:15:38,077 --> 00:15:43,367 physical register that this was allocated into. 217 00:15:43,367 --> 00:15:49,858 So, if we go look at this instruction, which is the first instruction, when we go to 218 00:15:49,858 --> 00:16:00,307 execute that mul, R1 was in P0. So when that instruction commits, we 219 00:16:00,307 --> 00:16:08,505 actually put P0 onto the free list. And we're going to look at a case in a 220 00:16:08,505 --> 00:16:12,085 second showing why you can't do it earlier, because it seems like you should be able 221 00:16:12,085 --> 00:16:15,032 to de-allocate physical registers earlier. 222 00:16:15,032 --> 00:16:17,094 You know, no one's probably gonna be reading that value. 223 00:16:17,094 --> 00:16:20,022 Why can't you just get rid of it early? 224 00:16:20,022 --> 00:16:24,008 But we'll look in a second at a test case where that's a problem. 225 00:16:24,008 --> 00:16:29,047 Let's see, any other fun insights 226 00:16:29,047 --> 00:16:36,049 here? That's about it, what I wanted to 227 00:16:36,049 --> 00:16:41,023 get across from this diagram. As the code continues on we end up 228 00:16:41,023 --> 00:16:44,014 with more and more free physical registers. 229 00:16:44,014 --> 00:16:48,650 One thing I did wanna do is just walk through this, to understand this a 230 00:16:48,650 --> 00:16:52,448 little bit. Let's say we have this instruction here, 231 00:16:52,448 --> 00:16:57,030 which is, one, two, three, four, our last instruction that we execute. 232 00:16:57,030 --> 00:17:01,098 Let's go see what it's doing here. 
So it writes architectural register four, so 233 00:17:01,098 --> 00:17:06,066 let's just store that in the reorder buffer, cuz otherwise we don't know where to go 234 00:17:06,066 --> 00:17:12,099 write. We had allocated P10 to that, and we did 235 00:17:12,099 --> 00:17:17,001 that right here, when we actually issued it. 236 00:17:18,098 --> 00:17:25,060 So we pulled it off the free list. And the previous thing that it wrote was 237 00:17:25,060 --> 00:17:31,068 P8, so when that ultimately commits, P8 is gonna end up back in our free list. 238 00:17:32,090 --> 00:17:39,027 So that's a nice little thing. The circles just show when the values are 239 00:17:39,027 --> 00:17:44,031 no longer pending, so they are actually not in the pipes anymore. 240 00:17:45,046 --> 00:17:53,456 And you can see that continuing here. This instruction here, which is the second 241 00:17:53,456 --> 00:17:57,093 multiply, when it commits, is going to free up P3. 242 00:17:57,093 --> 00:18:02,015 So P3 ends up on the list. P5, P5 ends up on the list. 243 00:18:02,015 --> 00:18:09,019 P8, P8 ends up on the list. And then when we see true read after 244 00:18:09,019 --> 00:18:15,871 writes, for example right here, we need to make sure to pick up the correct value. 245 00:18:15,871 --> 00:18:19,396 We do that by looking up in the rename table. 246 00:18:19,396 --> 00:18:23,850 So let's go find that in this chart here. So instruction two, 247 00:18:23,850 --> 00:18:28,760 let's see what it's doing here. So, that's gonna be right here. 248 00:18:28,760 --> 00:18:32,603 It's waiting on P8 to become ready in order to issue. 249 00:18:32,603 --> 00:18:36,902 So it's sitting in the instruction queue, and this is gonna stall. 250 00:18:36,902 --> 00:18:41,724 It's gonna stall all the way out to here, or stall to right there. 251 00:18:41,724 --> 00:18:44,727 And that's when it comes out of the instruction queue. 252 00:18:44,727 --> 00:18:47,561 Okay. 
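The walkthrough above can be replayed in a few lines of Python. The instruction operands are an assumption reconstructed from the spoken description (the slide itself is not in the transcript): two muls and two adds, with R1 through R7 starting in P0 through P6 and P7 through P10 free. Only the renaming and the in-order commit are modeled, not the timing.

```python
from collections import deque

# Assumed instruction sequence as (destination, sources).
program = [
    (1, [2, 3]),   # mul R1 <- R2, R3
    (4, [1, 5]),   # mul R4 <- R1, R5
    (6, [4]),      # add R6 <- R4
    (4, [7]),      # add R4 <- R7
]

rename_table = {r: r - 1 for r in range(1, 8)}   # R1..R7 -> P0..P6
free_list = deque([7, 8, 9, 10])
rob = []                                         # (areg, preg, prev_preg)

for dest, srcs in program:
    phys_srcs = [rename_table[s] for s in srcs]  # look up sources first
    prev = rename_table[dest]                    # remember the old mapping
    new = free_list.popleft()                    # allocate a new one
    rename_table[dest] = new
    rob.append((dest, new, prev))
    reads = ", ".join(f"P{p}" for p in phys_srcs)
    print(f"R{dest}: P{prev} -> P{new}, reads {reads}")

# In-order commit: each instruction frees the previous mapping of its dest.
for areg, preg, prev in rob:
    free_list.append(prev)

print(list(free_list))  # [0, 3, 5, 8]
```

Under these assumptions the freed registers come back in the order P0, P3, P5, P8, and the second mul reads P7 and P4, which matches what is read off the chart in the lecture.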
So let's look at freeing up physical 253 00:18:47,561 --> 00:18:52,094 registers, and what is a good policy for freeing up physical registers. 254 00:18:53,035 --> 00:18:57,091 So we're gonna have a different piece of code here we're going to look at. 255 00:18:57,091 --> 00:19:01,074 It's gonna be just a bunch of adds. 256 00:19:02,016 --> 00:19:07,059 This code has some read after write dependencies in it, 257 00:19:08,072 --> 00:19:18,366 namely R1 there. And let's say we try to go execute this. 258 00:19:18,366 --> 00:19:23,697 Well, here's some execution order 259 00:19:23,697 --> 00:19:32,363 we're gonna look at. Oh sorry, that one there, I meant to point that out for the read 260 00:19:32,363 --> 00:19:36,095 after write. There's a write after read dependency here. 261 00:19:36,095 --> 00:19:42,077 So let's look at some execution order and see what happens. Let's say we allocate 262 00:19:44,068 --> 00:19:52,464 physical register zero at the beginning somewhere for register one. 263 00:19:52,464 --> 00:19:58,726 And then when we do the commit, we free it up in our free list. 264 00:19:58,726 --> 00:20:04,866 Well, lo and behold, another instruction, later in time, comes along here and allocates 265 00:20:04,866 --> 00:20:09,531 physical register zero. And it goes and writes to it. 266 00:20:09,531 --> 00:20:16,966 And we free it up there. This instruction here, for which we had 267 00:20:16,966 --> 00:20:22,672 earlier renamed R1, goes to try to read this value. It goes to 268 00:20:22,672 --> 00:20:26,344 do the read and it looks in physical register zero. 269 00:20:26,344 --> 00:20:30,594 And it gets the wrong value. Ooh. 270 00:20:30,594 --> 00:20:38,095 Yeah, we don't want that. So what's a good policy here? 
271 00:20:38,095 --> 00:20:44,463 Let's say instead we don't free up a physical register until someone else goes 272 00:20:44,463 --> 00:20:49,901 to overwrite that register, that is, until a subsequent instruction goes to 273 00:20:49,901 --> 00:20:55,020 write the architectural register it maps to. Because until then, that physical 274 00:20:55,020 --> 00:20:59,483 register is in use, or could be in use, by other readers of that value. 275 00:20:59,483 --> 00:21:05,669 So if we look at this case here, let's say we write physical register zero. 276 00:21:05,669 --> 00:21:09,647 And then we allocate a different physical register. 277 00:21:09,647 --> 00:21:13,209 Right? We allocate physical register two for this 278 00:21:13,209 --> 00:21:15,437 write here, register eight. 279 00:21:15,437 --> 00:21:23,071 And then we de-allocate physical register zero when we go to overwrite register one. 280 00:21:23,071 --> 00:21:30,103 So by doing that, we know when this R1 gets written that no one else 281 00:21:30,103 --> 00:21:36,244 that is after this instruction in program order can possibly use that physical register, 282 00:21:36,244 --> 00:21:40,461 because we overwrote it, so the value is no longer visible. 283 00:21:40,461 --> 00:21:45,474 So that's pretty nice. That's a very good 284 00:21:45,474 --> 00:21:50,387 heuristic, or a very good way to get this correct: you just keep the 285 00:21:50,387 --> 00:21:55,387 physical register live until you rewrite the 286 00:21:55,387 --> 00:21:59,179 architectural register that the physical register maps to. 287 00:21:59,179 --> 00:22:03,894 And at that point you can remove it from the set of allocated physical 288 00:22:03,894 --> 00:22:08,726 registers and put it on the free list. If you do it early, with the out-of-order 289 00:22:08,726 --> 00:22:12,129 execution pipeline, bad things can happen. 290 00:22:12,129 --> 00:22:18,515 You can go read the wrong values. 
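The hazard and the fix can be compared side by side in a toy model. The scenario below is a simplified stand-in for the lecture's example, with made-up values (10 and 77) and a hypothetical function name; it only models which physical register a delayed reader ends up looking at.

```python
def late_reader_sees(free_at_own_commit):
    """Return the value a delayed reader of R1 observes.

    free_at_own_commit=True models the unsafe policy: a physical register
    goes back on the free list as soon as the instruction that wrote it
    commits. False models the safe policy, where it stays allocated until
    a later writer of the same architectural register commits.
    """
    prf = {}                  # physical register file: preg -> value
    free_list = [0, 1, 2]
    table = {}                # rename table: architectural name -> preg

    # I0: R1 <- 10. Allocates P0 and executes.
    p_r1 = free_list.pop(0)
    table["R1"] = p_r1
    prf[p_r1] = 10

    # I1: reads R1. It is renamed now (source = P0) but waits in the queue.
    i1_src = table["R1"]

    # I0 commits.
    if free_at_own_commit:
        free_list.insert(0, p_r1)   # unsafe: P0 looks free again

    # I2: R8 <- 77. Takes the next free physical register and writes it.
    p_r8 = free_list.pop(0)
    table["R8"] = p_r8
    prf[p_r8] = 77

    # I1 finally issues and reads its renamed source register.
    return prf[i1_src]


print(late_reader_sees(True))   # 77
print(late_reader_sees(False))  # 10
```

With the unsafe policy the reader picks up 77, the unrelated value written for R8; with the safe policy P0 is still holding 10 when the reader gets to it.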
Okay, so this brings us to a couple 291 00:22:18,515 --> 00:22:24,506 optimizations on register renaming. The biggest one here is you can try to 292 00:22:24,506 --> 00:22:30,029 combine the architectural register file and the physical register file to save 293 00:22:30,029 --> 00:22:36,072 space, and the insight here is, if you combine these two things, you can 294 00:22:36,072 --> 00:22:42,108 store the architectural register value and the physical register value in the same 295 00:22:42,108 --> 00:22:47,522 physical storage location if that physical register's no longer 296 00:22:47,522 --> 00:22:51,185 pending. So if there's nothing in flight to it and 297 00:22:51,185 --> 00:22:56,737 you don't have to roll back, or if you're just going to roll back to the same value 298 00:22:56,737 --> 00:22:59,495 anyway, why keep extra space for this? 299 00:22:59,495 --> 00:23:06,006 One change you need to make here is, since you're going to remove the architectural 300 00:23:06,006 --> 00:23:10,600 register file, you still basically need to know, 301 00:23:10,600 --> 00:23:15,393 when you go to do a rollback of some speculative state, say you take an interrupt or 302 00:23:15,393 --> 00:23:20,253 you take a branch mispredict, where to roll back from, 303 00:23:20,253 --> 00:23:25,448 and we're gonna do that by, let's say, having a second renaming table here, which 304 00:23:25,448 --> 00:23:29,060 allows us to keep track of just the architectural state. 305 00:23:29,060 --> 00:23:33,634 So we have a speculative renaming table, and then we have an architectural renaming 306 00:23:33,634 --> 00:23:36,059 table at the end of the pipe. It just has pointers in it, instead of 307 00:23:36,059 --> 00:23:41,994 actual values. 
And what's also nice here is, instead of 308 00:23:41,994 --> 00:23:45,977 copying values, we don't actually have to move something out of the physical 309 00:23:45,977 --> 00:23:48,790 register file into the architectural register file. 310 00:23:48,790 --> 00:23:51,877 Instead we just have to update a pointer in a table now. 311 00:23:51,877 --> 00:23:56,164 And avoiding the copy potentially also makes rollback easier, cause we have to 312 00:23:56,164 --> 00:24:01,113 update pointers now instead of actually copying an entire register file, which can 313 00:24:01,113 --> 00:24:04,624 take a while or require lots of ports or something else. 314 00:24:04,624 --> 00:24:08,745 So you can have a little table there to do this remapping for you. 315 00:24:08,745 --> 00:24:13,975 And as I say, you can typically get away with less space for the same 316 00:24:13,975 --> 00:24:18,686 performance than if you were to have two separate structures. 317 00:24:18,686 --> 00:24:25,655 One downside is you might need to have more, depending on how you implement 318 00:24:25,655 --> 00:24:29,626 this. Your architectural register file and 319 00:24:29,626 --> 00:24:32,368 your physical register file are now together. 320 00:24:32,368 --> 00:24:35,954 It may be bigger. So your register file access might be 321 00:24:35,954 --> 00:24:39,184 a little slower. Something like that could be a 322 00:24:39,184 --> 00:24:42,092 downside, versus having it in two separate partitioned structures.
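The unified register file with two rename tables can be sketched as follows. The class `UnifiedRename` and its method names are hypothetical, and the rollback shown is the simplest possible one (discard everything speculative); a real design would track squashed registers through the reorder buffer.

```python
class UnifiedRename:
    """Speculative and architectural rename tables over one register file."""

    def __init__(self, num_arch, num_phys):
        self.spec = list(range(num_arch))   # speculative map, read at decode
        self.arch = list(range(num_arch))   # committed map, used for rollback
        self.free = list(range(num_arch, num_phys))

    def rename_dest(self, areg):
        """Allocate a new physical register for a write to areg."""
        preg = self.free.pop(0)
        prev = self.spec[areg]
        self.spec[areg] = preg
        return preg, prev

    def commit(self, areg, preg, prev):
        # No value is copied: only the committed pointer moves.
        self.arch[areg] = preg
        self.free.append(prev)

    def rollback(self, squashed_pregs):
        # Drop all speculative mappings and recycle their registers.
        self.spec = list(self.arch)
        self.free.extend(squashed_pregs)


r = UnifiedRename(num_arch=4, num_phys=6)
a = r.rename_dest(2)        # (4, 2): first write to architectural register 2
b = r.rename_dest(2)        # (5, 4): second, still speculative, write
r.commit(2, *a)             # first write commits, physical register 2 is freed
r.rollback([b[0]])          # second write is squashed
print(r.spec, r.free)       # [0, 1, 4, 3] [2, 5]
```

Commit and rollback here touch only small tables of pointers, which is the saving the lecture describes compared with copying values between two register files.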