1 00:00:03,083 --> 00:00:06,094 Okay. So that brings up the, the question of 2 00:00:06,094 --> 00:00:10,011 register renaming. What is register renaming? 3 00:00:10,033 --> 00:00:17,360 Hopefully some of you skimmed the Thomas UIlo algorithm paper that I assigned, cuz 4 00:00:17,360 --> 00:00:22,014 we're going to be discussing that and the motivation for that work. 5 00:00:23,010 --> 00:00:26,021 Okay. So, what is learning our performance in 6 00:00:26,021 --> 00:00:31,050 these pipelines, these out of order pipelines that we've discussed so far? 7 00:00:31,071 --> 00:00:33,024 Okay. A couple things. 8 00:00:34,047 --> 00:00:37,062 Write after write and write after read dependencies. 9 00:00:37,062 --> 00:00:42,036 Let's, let's talk about these. In write after write dependence, you're 10 00:00:42,036 --> 00:00:46,041 going to write to one register, then you're going to write to the register 11 00:00:46,041 --> 00:00:52,030 again, and in our pipelines we've talked about so far, you're basically gonna stall 12 00:00:52,030 --> 00:00:55,645 the pipe while you're waiting for the first right to commit. 13 00:00:55,645 --> 00:01:00,710 Because we're not able to handle multiple rights in, in the pipeline at the same 14 00:01:00,710 --> 00:01:05,226 time. But these are not fundamental 15 00:01:05,226 --> 00:01:09,190 dependencies. So computer architects put on our thinking 16 00:01:09,190 --> 00:01:12,428 cap and we came up with ways to break these dependencies. 17 00:01:12,428 --> 00:01:16,749 A read after write, that, that also is true for a write after a read. 18 00:01:16,749 --> 00:01:21,466 So, if you do write to let's says register four after a read from register four, 19 00:01:21,466 --> 00:01:24,316 well, there should be nothing wrong with that. 20 00:01:24,316 --> 00:01:28,056 But, if you try to execute the instructions out of order, then you need 21 00:01:28,056 --> 00:01:31,575 to think about that. A read after write is a true dependence, 22 00:01:31,575 --> 00:01:35,797 because you actually need the data, you need the value to go execute the 23 00:01:35,797 --> 00:01:39,609 subsequent instruction. And we're going to call write after read, 24 00:01:39,609 --> 00:01:45,067 write after write and write after read dependencies name dependencies, and we're 25 00:01:45,067 --> 00:01:50,601 going to call a read after write a true dependence that we can't break. 26 00:01:50,601 --> 00:01:52,993 Okay. So let's, let's look at a some example 27 00:01:52,993 --> 00:01:56,825 code here, and see what can go wrong if you just ignore all the name dependencies. 28 00:01:56,825 --> 00:02:00,682 Like I said, they're not true dependencies, so maybe we just don't need 29 00:02:00,682 --> 00:02:03,872 them. Okay, so we have, a different code 30 00:02:03,872 --> 00:02:09,122 sequence here, we have a mul, mul, then two add immediates. 31 00:02:09,122 --> 00:02:15,046 Let's, let's identify some important things in here. 32 00:02:15,046 --> 00:02:22,408 First let's identify the true read after write dependencies. 33 00:02:22,408 --> 00:02:26,004 So I, I put some circles and some arrows here. 34 00:02:26,004 --> 00:02:31,002 So let's take a look here. This writes register one in the mul. 35 00:02:31,002 --> 00:02:34,074 The second mall reads the results of that. Okay? 36 00:02:35,028 --> 00:02:39,094 Let's see here, this add reads register four in the previous instruction and 37 00:02:39,094 --> 00:02:42,094 writes register four, so that's a true dependence. 38 00:02:42,094 --> 00:02:47,004 We can't break those. We make talk at the end of class at the 39 00:02:47,004 --> 00:02:51,095 end of the term, maybe about some ways to break those, but they get pretty, pretty 40 00:02:51,095 --> 00:02:56,064 crazy. So, let's, let's look at the, write after 41 00:02:56,064 --> 00:03:00,033 write dependence. So the first write after write dependence 42 00:03:00,033 --> 00:03:03,033 is here. We're writing register four, and then we 43 00:03:03,033 --> 00:03:09,449 write register four again, but an out-of-order processor, if we try to break 44 00:03:09,449 --> 00:03:14,906 all these dependencies, we can see that we're actually writing to register four 45 00:03:14,906 --> 00:03:18,012 here, and then we write register four here. 46 00:03:18,012 --> 00:03:21,088 And that's, like, out of order. Whoa, what just happened here? 47 00:03:21,088 --> 00:03:24,060 Well, we said we just broke the dependency. 48 00:03:24,060 --> 00:03:27,097 We're not gonna stall the front of the pipe on this. 49 00:03:27,097 --> 00:03:32,090 So if you to execute this on one of our out of order issue pipes that we've, 50 00:03:33,010 --> 00:03:36,066 looked at so far. Let's say this is executing on the in 51 00:03:36,066 --> 00:03:41,557 order fetch, out of order issue, out of order execute, and, right back in out, in 52 00:03:41,557 --> 00:03:46,279 order commit pipe. We can see that we will write to register 53 00:03:46,279 --> 00:03:50,092 four in time before we wrote to register four here. 54 00:03:50,092 --> 00:03:53,078 Now that's going to cause some major problems. 55 00:03:53,078 --> 00:03:58,072 We just wrote the wrong value. Oops. 56 00:03:58,072 --> 00:04:03,030 And the other one here is a write after read dependence. 57 00:04:03,030 --> 00:04:09,996 So here we have a read of register four. And here we have a write of register four. 58 00:04:10,000 --> 00:04:15,412 And because this add got pulled so early in the, in the execution order, we 59 00:04:15,412 --> 00:04:21,069 actually wrote before this instruction had a chance to read the value. 60 00:04:21,069 --> 00:04:28,160 And the reason this instruction got delayed was cuz it was also dependent on a 61 00:04:28,160 --> 00:04:32,081 true dependency here. But it's dependent on two things. 62 00:04:32,081 --> 00:04:39,005 But all of a sudden we wrote register four with the value from this add, and then we 63 00:04:39,005 --> 00:04:42,039 went and read it, and we read the wrong value. 64 00:04:42,099 --> 00:04:48,079 So, we can't just go change and break right after write dependencies and right 65 00:04:48,079 --> 00:04:54,088 after read dependencies very easily. We need to think a little bit harder about 66 00:04:54,088 --> 00:04:57,093 this. One, one last interesting thing that 67 00:04:57,093 --> 00:05:00,098 happens here. This is, this is kind of fun. 68 00:05:01,066 --> 00:05:09,081 We do commit in order in this pipe. But look what happens to register four. 69 00:05:09,083 --> 00:05:15,025 We wrote register four, here, then we write register four again. 70 00:05:16,006 --> 00:05:21,014 And then we commit from physical register four to architectural register four. 71 00:05:21,014 --> 00:05:26,016 So we just committed the wrong state, also, to the architectural register file. 72 00:05:26,016 --> 00:05:29,048 So we're having lots of, lots of problems with here. 73 00:05:29,048 --> 00:05:35,060 It's not just, basic things. So what's the solution to this? 74 00:05:36,030 --> 00:05:41,089 Well solution, as we can start thinking about how to add more registers, so at the 75 00:05:41,089 --> 00:05:47,062 top here is the same example I had in the previous slide so nothing, nothing new 76 00:05:47,062 --> 00:05:53,008 there but I wanted to compare this to, let's say our conservatively stalling 77 00:05:53,008 --> 00:05:56,012 pipeline from performance perspective first. 78 00:05:56,012 --> 00:06:01,480 So here we have our in order, our in order fetch out of order issue out of order 79 00:06:01,480 --> 00:06:07,011 write back and in order commit pipe. This is our most advanced one from last 80 00:06:07,011 --> 00:06:12,056 time. But, it conservatively stalls on write 81 00:06:12,056 --> 00:06:18,529 after write and write after read dependencies, and that's drawn here of 82 00:06:18,529 --> 00:06:23,010 these arrows. So we can't even issue this instruction 83 00:06:23,010 --> 00:06:27,051 until we know that, let's say these two instructions here, which have something to 84 00:06:27,051 --> 00:06:30,089 do with register four, commit. And then we can go to issue this. 85 00:06:30,089 --> 00:06:33,029 Now this might be a little bit conservative. 86 00:06:33,029 --> 00:06:36,002 This might be even a little bit over conservative. 87 00:06:36,002 --> 00:06:40,060 It might be possible to pull this back one or two cycles, maybe to, sort of to the 88 00:06:40,060 --> 00:06:43,033 point where this instructions does the write back. 89 00:06:43,033 --> 00:06:46,779 But one of the challenges there is, you don't necessarily, you can't necessarily 90 00:06:46,779 --> 00:06:50,787 track that very easily inside of your reorder buffer, unless you have something 91 00:06:50,787 --> 00:06:56,041 there that scans for this exact case. It's not going to save you that much 92 00:06:56,041 --> 00:07:00,068 performance either. What, what I'm trying to get across here 93 00:07:00,068 --> 00:07:06,074 though is that the performance of this instruction sequence is actually worse. 94 00:07:06,074 --> 00:07:12,021 It takes longer than our incorrect, but we'll call ideal case here on top. 95 00:07:12,021 --> 00:07:17,738 So let's, let's do one little change to the instruction sequence, highlighted in 96 00:07:17,738 --> 00:07:22,067 red here, and see what happens to our execution. 97 00:07:22,067 --> 00:07:27,073 So we took this ad which wrote to register for and changed it. 98 00:07:27,073 --> 00:07:33,060 We added another register usage here and now we write to register eight. 99 00:07:34,063 --> 00:07:39,017 And lo and behold, it breaks all the write after write dependencies, and it breaks 100 00:07:39,017 --> 00:07:43,021 all the write after read dependencies. And all of a sudden, we get to, our 101 00:07:43,021 --> 00:07:47,007 idealized performance, this, exact, this is the exact same case as that. 102 00:07:47,050 --> 00:07:51,031 But we require another register. Hm. 103 00:07:51,031 --> 00:07:53,081 Well, please add infinite numbers of registers. 104 00:07:53,081 --> 00:07:57,087 So, what is, what's, what's the con of adding infinite numbers of registers? 105 00:07:57,087 --> 00:08:01,038 Anyone have any ideas? So, if we're trying to use more registers 106 00:08:01,038 --> 00:08:04,011 here we might use up all of our architecture registers. 107 00:08:04,011 --> 00:08:07,083 Can we just add more architecture registers to our instruction set? 108 00:08:10,030 --> 00:08:14,029 Yeah so, so it takes up space. So if we look at our registers, we could 109 00:08:14,029 --> 00:08:18,337 have a larger name space for our registers, but if we have 32 registers, 110 00:08:18,337 --> 00:08:21,083 that takes five bits. If we have 128 registers, that's gonna 111 00:08:21,083 --> 00:08:25,054 take, you know, seven bits. And if we have infinite, that will take 112 00:08:25,054 --> 00:08:29,060 infinite number of bits. But what we're gonna talk about in today's 113 00:08:29,060 --> 00:08:34,052 lecture is how to do this in hardware such that you have some more registers in your 114 00:08:34,052 --> 00:08:38,099 physical register space, but not more registers in your architecture register 115 00:08:38,099 --> 00:08:41,682 space. And this is, I should point out this is 116 00:08:41,682 --> 00:08:45,095 not only a register problem. This could also happen with memory. 117 00:08:45,095 --> 00:08:50,795 If you name your memory inappropriately if you have a very small amounts of memory 118 00:08:50,795 --> 00:08:57,011 and you try to reuse it very aggressively, you can get name, naming problems in that 119 00:08:57,011 --> 00:09:00,005 also. But for today we're going to mostly be 120 00:09:00,005 --> 00:09:04,049 focusing on registering. And so I'll define registry name as, well 121 00:09:04,049 --> 00:09:09,041 you're gonna change the naming of the registers in hardware to eliminate these 122 00:09:09,041 --> 00:09:14,001 write after write name dependencies and the write after read dependencies. 123 00:09:15,060 --> 00:09:24,045 Okay, so we're gonna be talking about two major schemes, and there, they are mirrors 124 00:09:24,045 --> 00:09:30,094 of each other and have slightly different, hardware requirements. 125 00:09:31,053 --> 00:09:35,033 What their lot, mostly when you think of them, they're just sort of logically duals 126 00:09:35,033 --> 00:09:38,058 of each other and there's different ways of thinking of the same problem. 127 00:09:39,008 --> 00:09:45,091 So the first scheme we're gonna be looking at is we are going to add pointers at, in 128 00:09:45,091 --> 00:09:53,052 our instruction queue and reorder buffer. To allow us to have different register 129 00:09:53,052 --> 00:09:58,072 names in them and not actually just have architectural register names in those data 130 00:09:58,072 --> 00:10:01,099 structures. The other option, which is, if you, if 131 00:10:01,099 --> 00:10:07,002 anyone actually read the Thomas Ullo algorithm paper, is to actually store the 132 00:10:07,002 --> 00:10:10,043 actual value, the data value, in these data structures. 133 00:10:10,043 --> 00:10:13,072 In the reorder buffer and in the instruction queue. 134 00:10:14,086 --> 00:10:17,859 And, and they look, they look very similar, if you sort of think about it, 135 00:10:17,859 --> 00:10:20,544 and they're gonna have the same performance. 136 00:10:20,544 --> 00:10:24,884 I'll, I'll, I'll give you the sort of end of the novel at the beginning here. 137 00:10:24,884 --> 00:10:28,983 They're gonna have the same performance, they're, they're, they're doing the same, 138 00:10:28,983 --> 00:10:33,265 doing the same thing but they're, they're slightly different in the mechanics 139 00:10:33,265 --> 00:10:37,530 perspective. And, and we are gonna start off by looking 140 00:10:37,530 --> 00:10:42,505 at this first one here, we are gonna have pointers in the instruction queue in the 141 00:10:42,505 --> 00:10:46,978 reorder buffer, Mainly, because we already have pointers in our design, that we 142 00:10:46,978 --> 00:10:50,992 looked at the last time, this in order issue, out of order, or similarly, in 143 00:10:50,992 --> 00:10:56,002 order fetch, out of order issue, out of order executing right back and in order 144 00:10:56,002 --> 00:10:57,020 commit design.