1 00:00:03,072 --> 00:00:08,033 Okay, so now we get to move on to even more complicated processors. 2 00:00:08,093 --> 00:00:12,023 In order issue. Or excuse me, in order front end. 3 00:00:12,023 --> 00:00:14,097 In order issue. Out of order write back. 4 00:00:14,097 --> 00:00:19,622 And in order commit. Okay, so this is going to solve some problems 5 00:00:19,622 --> 00:00:23,825 that we have. The biggest problem it's going to solve 6 00:00:23,825 --> 00:00:28,034 is the problem of precise exceptions. 7 00:00:28,034 --> 00:00:33,400 We can now take exceptions all the way at the end, because we're 8 00:00:33,400 --> 00:00:41,057 committing data in order. So let's take a look. There should 9 00:00:41,057 --> 00:00:47,563 probably be a line there; scribble that on your own drawings. 10 00:00:47,563 --> 00:00:53,366 Okay, so let's take a look at some other structures we've added to 11 00:00:53,366 --> 00:00:58,212 this diagram to make life a little bit more interesting. 12 00:00:58,212 --> 00:01:01,348 Okay, the front end looks pretty much the same. 13 00:01:01,348 --> 00:01:05,692 We split the load and the store apart into two separate pipes here. 14 00:01:05,692 --> 00:01:10,596 A load pipe and a store pipe. And the store pipe is shorter because it 15 00:01:10,596 --> 00:01:13,204 just has to basically do the store, we'll say. 16 00:01:13,204 --> 00:01:16,275 Maybe it's two stages, it doesn't matter that much. 17 00:01:16,275 --> 00:01:19,475 It's not that material in this drawing. 18 00:01:19,646 --> 00:01:24,877 But something interesting to look at here is we added a bunch of extra boxes over 19 00:01:24,877 --> 00:01:27,841 here on the right side of this foil. 20 00:01:27,841 --> 00:01:30,456 So let's define these things. 
21 00:01:30,456 --> 00:01:34,639 So we had our architectural register file, which is the committed state of the 22 00:01:34,639 --> 00:01:39,745 processor. And we added a second register file, 23 00:01:39,745 --> 00:01:44,815 typically called a physical register file, or PRF. 24 00:01:44,815 --> 00:01:51,653 Sometimes people call this a future file, and you'll see that in the literature; 25 00:01:51,653 --> 00:01:56,520 there are some papers published about future files. The reason it's called a 26 00:01:56,520 --> 00:02:00,591 future file is that it's basically executing speculatively in the future. 27 00:02:00,591 --> 00:02:03,917 The values in here have not been committed to the processor. 28 00:02:03,917 --> 00:02:08,720 They can be thrown out if you take an exception, if a branch happens, for a 29 00:02:08,720 --> 00:02:13,241 variety of reasons. These are speculative, you're not 30 00:02:13,241 --> 00:02:18,943 guaranteed to actually keep those. The architectural register file, though, is 31 00:02:18,943 --> 00:02:22,278 committed state. Okay, we added two other 32 00:02:22,278 --> 00:02:27,454 structures here, something we call the ROB, or Re-Order Buffer. 33 00:02:27,454 --> 00:02:36,288 And we added a finished store buffer. So let's talk about the reorder 34 00:02:36,288 --> 00:02:42,007 buffer first. So in this pipeline, we actually want 35 00:02:42,007 --> 00:02:48,522 instructions to basically execute and write the physical register file out of 36 00:02:48,522 --> 00:02:51,589 order. This is an out of order processor. 37 00:02:51,589 --> 00:02:56,213 We'd like that to happen. We're basically making the execution and 38 00:02:56,213 --> 00:03:01,846 the write back out of order. But we want the commit to be in order. 39 00:03:01,846 --> 00:03:07,393 So we need some structure that is going to guarantee that the write back, the write 40 00:03:07,393 --> 00:03:10,481 to the architectural register file, happens in order. 
41 00:03:10,481 --> 00:03:16,246 And that's what the ROB is going to do. So it's going to keep 42 00:03:16,246 --> 00:03:22,783 completed instructions. And those can come in out of order and 43 00:03:22,783 --> 00:03:30,768 are going to leave in order. So, things come into this out of order, 44 00:03:30,768 --> 00:03:35,891 and they go out of it in order. And this is a reordering structure. 45 00:03:35,891 --> 00:03:42,438 It's typically a table that is sort of written, well, we'll talk about that in a 46 00:03:42,438 --> 00:03:47,864 second, it's written in different places in the pipe for a couple of 47 00:03:47,864 --> 00:03:53,937 different reasons, but you typically want to keep track of the instructions in 48 00:03:53,937 --> 00:03:57,489 order somehow. And then when you go to pull out of the 49 00:03:57,489 --> 00:04:00,670 reorder buffer, you want to pull in order out of it. 50 00:04:00,670 --> 00:04:07,234 But the writes and the tracking of the information that happens to it can be out 51 00:04:07,234 --> 00:04:13,049 of order. And the other thing here is this finished 52 00:04:13,049 --> 00:04:16,379 store buffer. The reason we have this finished store 53 00:04:16,379 --> 00:04:21,574 buffer is if we have a store operation, we don't want to have to have the commit 54 00:04:21,574 --> 00:04:26,182 point, like, here, so early in the pipe. Because once you store into main memory, 55 00:04:26,182 --> 00:04:30,361 it's really hard to go get that back, possibly even impossible, probably is 56 00:04:30,361 --> 00:04:33,117 impossible. If you had 57 00:04:33,117 --> 00:04:37,229 the old value and you overwrite it with the new value, the old value's 58 00:04:37,229 --> 00:04:40,465 forever gone in your main memory. You can't get it back. 
59 00:04:40,465 --> 00:04:45,139 So, the solution to that is, instead of doing the store here, you have the store happen 60 00:04:45,139 --> 00:04:48,535 later in the pipe. And you sort of remember what you're 61 00:04:48,535 --> 00:04:52,956 supposed to do, the address and the data for the store that's supposed to be happening. 62 00:04:52,956 --> 00:04:58,590 And, for anybody who cares, that store has happened once it hits 63 00:04:58,590 --> 00:05:03,887 this finished store buffer. So you probably need to have your loads 64 00:05:03,887 --> 00:05:08,927 check that finished store buffer with higher priority than your cache. 65 00:05:08,927 --> 00:05:13,328 Because there could be a store living in that location. 66 00:05:13,598 --> 00:05:18,096 Okay, so those are the structures here. 67 00:05:18,096 --> 00:05:21,386 Let's talk about where things get read and written. 68 00:05:21,386 --> 00:05:27,128 This one is really interesting: our architectural register file isn't read 69 00:05:27,128 --> 00:05:31,437 anywhere. What's up with that? 70 00:05:31,437 --> 00:05:35,405 Well, we're going to use the physical register file for all the intermediate 71 00:05:35,405 --> 00:05:38,907 values in our pipeline, and the architectural register file is only there 72 00:05:38,907 --> 00:05:41,758 if we take some sort of, let's say, branch or interrupt. 73 00:05:41,758 --> 00:05:46,084 That's the only time we actually need to go take this information, and we're probably 74 00:05:46,084 --> 00:05:50,510 going to dump it into the physical register file, or dump it into the future file, when 75 00:05:50,510 --> 00:05:53,888 an interrupt happens or when a branch mispredict happens. 76 00:05:53,888 --> 00:05:59,081 But otherwise, it doesn't have to be read. Scoreboard is the same as usual. 
77 00:05:59,081 --> 00:06:06,794 Read and written in your register fetch stage, written at the write back stage, and 78 00:06:06,794 --> 00:06:11,949 that's no longer tracking architectural register file registers. 79 00:06:11,949 --> 00:06:16,624 It's now tracking physical register file registers. 80 00:06:16,624 --> 00:06:23,290 Re-Order Buffer. This one, well, there's a whole bunch of different 81 00:06:23,290 --> 00:06:27,811 places it gets read and written. Primarily, what's going to happen is when 82 00:06:27,811 --> 00:06:32,593 the instruction is issued, so it goes from the decode stage to the issue stage, 83 00:06:32,593 --> 00:06:38,011 that's going to allocate an entry in the 84 00:06:38,011 --> 00:06:41,077 Re-Order Buffer. And then at the end of the pipe, once the 85 00:06:41,077 --> 00:06:46,090 value completes, we have to change some state information in the Re-Order Buffer, 86 00:06:46,090 --> 00:06:51,072 saying, oh, the output register for a particular instruction is now ready. 87 00:06:51,072 --> 00:06:56,073 And then, once we actually go to do the commit, we have to basically clean that 88 00:06:56,073 --> 00:07:04,071 instruction out of the reorder buffer. The finished store buffer is written just 89 00:07:04,071 --> 00:07:10,093 sort of at the end of the pipe here, and cleaned when the store is actually posted to 90 00:07:10,093 --> 00:07:14,053 memory. It's a little hard to draw this, 91 00:07:14,053 --> 00:07:18,065 but that information comes somehow from the memory system: if you have 92 00:07:18,065 --> 00:07:23,055 a load that reads from that, it will probably read it in either L0 or L1, in a 93 00:07:23,055 --> 00:07:27,048 sort of bypassing mode, if you will. It'll go check that structure. 94 00:07:27,048 --> 00:07:33,014 We'll talk more about that next class. Okay, so here is sort of a basic reorder 95 00:07:33,014 --> 00:07:36,026 buffer. 
If you go looking in some books, they have a 96 00:07:36,026 --> 00:07:41,067 lot more data stored in the reorder buffer, because this is kind of the minimal reorder buffer you 97 00:07:41,067 --> 00:07:50,003 need for an out-of-order pipe. And this reorder buffer is used to keep 98 00:07:50,003 --> 00:07:56,188 track of in-order committing of instructions, but things will be put into it out of 99 00:07:56,188 --> 00:07:59,060 order. So let's first talk about 100 00:07:59,060 --> 00:08:02,061 the information here. We keep track of state. 101 00:08:02,061 --> 00:08:07,020 So what do we mean by state? This is the state of an instruction. 102 00:08:07,020 --> 00:08:12,054 So each one of these entries here is a different in-flight instruction in the 103 00:08:12,054 --> 00:08:16,703 pipeline. And we're actually going to store in order 104 00:08:16,703 --> 00:08:22,395 into the reorder buffer, and we're going to keep it sort of as a queue. 105 00:08:22,395 --> 00:08:28,910 So in this picture here, the state is, we'll say, - means free, and P means pending, 106 00:08:28,910 --> 00:08:35,045 and F means finished. I probably should not have chosen two F words there. 107 00:08:35,045 --> 00:08:40,596 That's a little confusing. But the newest instruction, if we have 108 00:08:40,596 --> 00:08:46,207 a new instruction, it's going to end up here in this entry. 109 00:08:46,207 --> 00:08:53,240 And when an instruction commits, or retires, it's going to remove this entry, 110 00:08:53,240 --> 00:08:58,504 the bottom entry. So we basically have a sort of circular 111 00:08:58,504 --> 00:09:05,216 buffer running around, with a head and a tail pointer sort of chasing each other 112 00:09:05,216 --> 00:09:08,376 in this data structure. So, tail, head. 113 00:09:08,679 --> 00:09:15,825 What's interesting about this, and why this is cool: let's take a look 114 00:09:15,825 --> 00:09:21,361 at this instruction right here. 
This instruction has an F, which means 115 00:09:21,361 --> 00:09:25,806 it's finished. It's not pending in the pipe. 116 00:09:25,806 --> 00:09:32,586 It's hit the reorder buffer, the data is stored in the physical register file, 117 00:09:32,586 --> 00:09:38,722 but the instructions that are older than it, these two instructions, are still 118 00:09:38,722 --> 00:09:42,358 pending in the pipe. Let's say these two are multiplies and 119 00:09:42,358 --> 00:09:46,012 this is an add. So this add is basically already done. 120 00:09:46,012 --> 00:09:50,603 These two instructions, which are long-latency instructions, are still 121 00:09:50,603 --> 00:09:54,543 pending in the pipe. In this cycle, we cannot commit anything. 122 00:09:54,543 --> 00:10:01,044 So we only commit instructions when the oldest instruction becomes finished. 123 00:10:01,087 --> 00:10:07,049 And that's when we can commit and remove something from the reorder buffer. 124 00:10:07,049 --> 00:10:11,008 Some other things we need to keep track of here. 125 00:10:11,030 --> 00:10:16,035 We have a bit here, S, for speculative. So what this means is, if you have 126 00:10:16,035 --> 00:10:21,094 something like a branch, you mark instructions that are newer than 127 00:10:21,094 --> 00:10:26,008 the branch with a speculative bit. 128 00:10:26,008 --> 00:10:30,066 So what this is saying is, if that branch mispredicts, it just gives you a 129 00:10:30,066 --> 00:10:35,056 convenient place to go find all the instructions dependent on it to go flush 130 00:10:35,056 --> 00:10:38,057 and kill. So if you have, let's say, 131 00:10:38,057 --> 00:10:43,155 one branch allowed in the pipe at a time, and the branch mispredicts, what you 132 00:10:43,155 --> 00:10:47,380 can do is basically look for all the entries in here that have ones, and just 133 00:10:47,380 --> 00:10:50,438 invalidate them ad hoc, and just flush the entire pipe. 
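The commit rule and the speculative flush described above can be sketched as a small Python model. This is a minimal sketch, not the lecture's hardware: the class and method names (`ReorderBuffer`, `allocate`, `finish`, `commit`, `flush_speculative`) are made up for illustration, and it assumes at most one branch in flight, so the speculative entries are always the youngest ones at the tail.

```python
from dataclasses import dataclass

FREE, PENDING, FINISHED = "-", "P", "F"

@dataclass
class Entry:
    state: str = FREE
    speculative: bool = False   # the S bit
    name: str = ""              # illustrative label, not a hardware field

class ReorderBuffer:
    def __init__(self, size: int):
        self.entries = [Entry() for _ in range(size)]
        self.head = 0    # oldest in-flight instruction
        self.tail = 0    # next free slot
        self.count = 0

    def allocate(self, name: str, speculative: bool = False) -> int:
        """Called in program order when an instruction issues."""
        if self.count == len(self.entries):
            raise RuntimeError("ROB full: issue must stall")
        idx = self.tail
        self.entries[idx] = Entry(PENDING, speculative, name)
        self.tail = (self.tail + 1) % len(self.entries)
        self.count += 1
        return idx

    def finish(self, idx: int) -> None:
        """Called at write back, possibly out of order."""
        self.entries[idx].state = FINISHED

    def commit(self):
        """Retire the oldest instruction only if it is finished."""
        if self.count == 0 or self.entries[self.head].state != FINISHED:
            return None
        name = self.entries[self.head].name
        self.entries[self.head] = Entry()
        self.head = (self.head + 1) % len(self.entries)
        self.count -= 1
        return name

    def flush_speculative(self) -> None:
        """On a mispredict, drop the speculative entries at the tail."""
        while self.count > 0:
            last = (self.tail - 1) % len(self.entries)
            if not self.entries[last].speculative:
                break
            self.entries[last] = Entry()
            self.tail = last
            self.count -= 1

rob = ReorderBuffer(4)
mul1 = rob.allocate("mul1")
mul2 = rob.allocate("mul2")
add = rob.allocate("add")
rob.finish(add)              # the add finishes early
blocked = rob.commit()       # None: the oldest multiply is still pending
```

Here `blocked` is `None` because the head entry is still pending, which is the situation in the slide: a finished add waiting behind two multiplies.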
134 00:10:50,438 --> 00:10:55,089 You don't have to worry about there being some value you need to keep. 135 00:10:55,089 --> 00:10:59,562 So it's just a convenient way to figure out which instructions are speculative, 136 00:10:59,562 --> 00:11:02,721 and, if the branch mispredicted, what you have to kill. 137 00:11:02,721 --> 00:11:06,353 Stores. We'll be talking about this a few 138 00:11:06,353 --> 00:11:09,431 slides later. But the store bit, what we're really going to 139 00:11:09,431 --> 00:11:13,799 do is this is going to say: if this instruction is a store, it knows 140 00:11:13,799 --> 00:11:19,026 that we need to do something else with it. We need to do something with the finished 141 00:11:19,026 --> 00:11:24,063 store buffer when it gets to the end of the pipe; it's sort of a meeting place to put 142 00:11:24,063 --> 00:11:27,045 it. And here is the actual 143 00:11:27,045 --> 00:11:31,519 business part of the reorder buffer. V, which says that the instruction 144 00:11:31,519 --> 00:11:37,200 actually writes a register, and then finally, once the instruction goes to the 145 00:11:37,200 --> 00:11:43,709 end of the pipe, it is going to fill in a location in here which is the physical 146 00:11:43,709 --> 00:11:48,096 register file entry that is the destination of that value. 147 00:11:48,096 --> 00:11:54,847 So this basically allows the pipeline to know where to go find the actual value. 148 00:11:54,847 --> 00:11:57,588 We don't actually store the actual values in here. 149 00:11:57,588 --> 00:12:02,652 We just store a pointer into the physical register file, because it's fewer bits. 150 00:12:02,652 --> 00:12:07,361 And this can tell us, let's say, when this 151 00:12:07,361 --> 00:12:12,462 instruction here, which is already finished, is ready to go retire, or it's ready to go 152 00:12:12,637 --> 00:12:17,048 commit: go look in physical register file number seven, or something like that. 
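As a minimal sketch of how that pointer gets used at commit, here is a Python model of one reorder buffer entry. The field names are made up for illustration, and `arch_dest` (which architectural register the value belongs to) is an assumption: it is not among the fields listed for this minimal reorder buffer, and in a simple future-file design it may simply equal the physical register number.

```python
from dataclasses import dataclass

@dataclass
class RobEntry:
    finished: bool     # state is F
    is_store: bool     # the store bit
    writes_reg: bool   # the V bit
    preg: int          # pointer into the physical register file
    arch_dest: int     # ASSUMPTION: architectural destination, hypothetical field

def commit(entry: RobEntry, prf: list, arf: list) -> bool:
    """Copy the value the entry points at into committed state.

    Returns False, leaving the ARF untouched, if the entry is not finished.
    """
    if not entry.finished:
        return False
    if entry.writes_reg:
        # The ROB holds only a pointer; the value itself lives in the PRF.
        arf[entry.arch_dest] = prf[entry.preg]
    return True

prf = [0] * 16
arf = [0] * 32
prf[7] = 42                     # speculative result written at write back
entry = RobEntry(finished=True, is_store=False, writes_reg=True,
                 preg=7, arch_dest=12)
done = commit(entry, prf, arf)  # arf[12] now holds the value from prf[7]
```

Storing a small index instead of a full data word per entry is the "fewer bits" point: the entry width scales with the log of the PRF size rather than with the register width.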
153 00:12:17,048 --> 00:12:20,037 And it goes and pulls that value out from there. 154 00:12:22,089 --> 00:12:31,003 So a good discussion of this is in the Shen and Lipasti book, which is sort of 155 00:12:31,003 --> 00:12:34,561 supplementary for this class. Okay. 156 00:12:34,561 --> 00:12:40,413 So, does anyone actually have questions first, before we 157 00:12:40,413 --> 00:12:44,920 move on? Because the reorder buffer is a key data structure here, and it's a 158 00:12:44,920 --> 00:12:48,403 complicated one. Okay, great. 159 00:12:48,403 --> 00:12:53,545 The next structure we added was the finished store buffer. 160 00:12:53,545 --> 00:12:58,055 And this could actually be multiple entries, but for this pipe let's say 161 00:12:58,055 --> 00:13:01,465 there's only one. So we are only allowed to have one store 162 00:13:01,465 --> 00:13:05,024 pending in this pipeline, because it makes life a little bit easier. 163 00:13:05,024 --> 00:13:09,796 The things you need to have 164 00:13:09,796 --> 00:13:15,011 here are both the address and the data, and whether it's valid. Probably the opcode; the opcode will tell 165 00:13:15,011 --> 00:13:19,035 you if it's store byte, store word, sort of data width types of things. 166 00:13:19,035 --> 00:13:25,495 And that's most of what I wanted to say here. This is what I was 167 00:13:25,495 --> 00:13:32,471 saying before: if you allow multiple loads and stores in the pipe at the same time, 168 00:13:32,471 --> 00:13:39,045 you're going to have to bypass from the finished store buffer to the loads. 169 00:13:39,045 --> 00:13:42,043 And possibly to stores, if they have to be write combined. 170 00:13:42,043 --> 00:13:46,060 So, if you store to different parts of a word, you may have to bypass that, 171 00:13:46,060 --> 00:13:50,050 depending on how the pipe works. 
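A minimal sketch of that bypass, assuming the single-entry buffer described above and word-sized accesses only. The names are hypothetical, the cache is modeled as a plain dictionary, and sub-word merging (the write-combining case) is deliberately left out.

```python
from dataclasses import dataclass

@dataclass
class FinishedStoreBuffer:
    valid: bool = False
    addr: int = 0
    data: int = 0
    op: str = "sw"   # opcode, which encodes the data width

def load_word(addr: int, fsb: FinishedStoreBuffer, cache: dict) -> int:
    """Check the finished store buffer with higher priority than the cache."""
    if fsb.valid and fsb.op == "sw" and fsb.addr == addr:
        return fsb.data       # bypass: the store has not reached the cache yet
    return cache[addr]        # otherwise the cache holds the current value

cache = {0x100: 11}
fsb = FinishedStoreBuffer(valid=True, addr=0x100, data=99, op="sw")
bypassed = load_word(0x100, fsb, cache)   # sees the pending store's data
```

With more than one entry, the lookup would have to pick the youngest matching store, which is part of why the single-entry version makes life easier.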
Or, you can assume that there is only one 172 00:13:50,050 --> 00:13:54,077 memory instruction valid in the pipe at a time, you'll have one of these entries, 173 00:13:54,077 --> 00:13:57,022 and no loads can happen while a store happens. 174 00:13:57,022 --> 00:14:01,012 That's not very good performance. People probably would not have actually 175 00:14:01,012 --> 00:14:03,095 built that, but that's something to think about. 176 00:14:03,095 --> 00:14:06,003 Okay. So, now we get some more pipeline 177 00:14:06,003 --> 00:14:09,599 diagrams. And we're going to see how this is 178 00:14:09,599 --> 00:14:13,046 different, and what happens in the reorder buffer. 179 00:14:14,304 --> 00:14:20,013 First thing I wanted to say here is this little r that you see show up in 180 00:14:20,013 --> 00:14:24,049 these diagrams. That means that we've written the reorder 181 00:14:24,049 --> 00:14:29,031 buffer, but we're not ready to commit. So from here to there, 182 00:14:29,031 --> 00:14:34,044 this add has written the reorder buffer, and we're waiting for it to commit at the end 183 00:14:34,044 --> 00:14:37,063 of the pipe. But we can only commit in order, so you 184 00:14:37,063 --> 00:14:40,075 can sort of see these Cs are all lined up in time. 185 00:14:40,075 --> 00:14:46,020 So we're only able to commit from left to right, and we can't reorder those Cs 186 00:14:46,020 --> 00:14:53,009 relative to one another. Let's see, what did I want to say here. 187 00:14:53,009 --> 00:14:59,410 That was the main thing. The dependency is the same, it's the same code we've looked 188 00:14:59,410 --> 00:15:04,737 at before. That's what I wanted to say. 189 00:15:04,737 --> 00:15:08,044 Which one is that? Yes, it's this one. 190 00:15:09,079 --> 00:15:15,052 Okay, so here we have this add, which writes register twelve. 191 00:15:15,055 --> 00:15:22,096 Right there. 
This add goes and reads register twelve, so 192 00:15:22,096 --> 00:15:29,461 we have a read after write happening. What's interesting here is this read 193 00:15:29,461 --> 00:15:34,091 after write that's happening. The write happens there. 194 00:15:34,091 --> 00:15:41,014 The read happens, let's say, here. That data's not in a bypass anywhere, it's 195 00:15:41,014 --> 00:15:44,039 not in the forwarding logic of the processor. 196 00:15:45,060 --> 00:15:52,022 That value is actually in the physical register file. 197 00:15:52,022 --> 00:15:56,064 So this is kind of showing an example here that data, when you're doing the bypass, 198 00:15:56,064 --> 00:16:01,022 can come from bypass network locations, or it can come from the physical register file, 199 00:16:01,192 --> 00:16:04,031 and those are sort of the two places it can come from. 200 00:16:04,031 --> 00:16:06,951 Everything else in here, 201 00:16:06,951 --> 00:16:11,363 surprisingly, is basically coming from bypass, except for that one location. 202 00:16:11,363 --> 00:16:15,974 So bypasses end up being really important. But you can have data coming from the 203 00:16:15,974 --> 00:16:23,519 physical register file. So could the C be here, could this C move 204 00:16:23,519 --> 00:16:25,669 over one? So, these commit in order. 205 00:16:25,669 --> 00:16:30,666 And we can only commit one thing at a time in this basic 206 00:16:30,666 --> 00:16:33,171 pipe. In more complex pipes, we're going to allow 207 00:16:33,171 --> 00:16:37,033 multiple commits at the same time. When we start to mix superscalar with 208 00:16:37,033 --> 00:16:41,322 out of order, at the end of today's talk, we're going to be able to think about 209 00:16:41,322 --> 00:16:44,248 trying to commit multiple things at the same time. 210 00:16:44,248 --> 00:16:47,909 But we can't really do it out of order. 
So this has to be monotonically going that 211 00:16:47,909 --> 00:16:50,974 way. Brief example here. 212 00:16:50,974 --> 00:16:56,189 This is kind of fun. This is trying to show different entries 213 00:16:56,189 --> 00:17:00,706 in the reorder buffer and when those things get allocated. 214 00:17:00,706 --> 00:17:07,158 And largely what's going to happen is, for a destination, so let's say instruction 215 00:17:07,158 --> 00:17:11,505 zero here allocates the reorder buffer and R1 becomes active. 216 00:17:11,505 --> 00:17:16,672 And it's a long-latency multiply. The circles 217 00:17:16,672 --> 00:17:19,764 here mean that the instruction is finished. 218 00:17:19,764 --> 00:17:24,157 It's gone to the end of the pipe, and it's ready to go. 219 00:17:24,157 --> 00:17:30,470 You could have other things, like this is an add that writes register eleven. 220 00:17:30,473 --> 00:17:34,988 It allocates, it finishes early, but it doesn't commit till late. 221 00:17:34,988 --> 00:17:40,532 So it has to stay in the reorder buffer. So it takes up space in the reorder 222 00:17:40,532 --> 00:17:43,221 buffer. And you can sort of see other examples: 223 00:17:43,221 --> 00:17:46,126 these adds here finish relatively quickly. 224 00:17:46,126 --> 00:17:49,211 But they have to wait to commit in order. 225 00:17:49,211 --> 00:17:54,227 And they're basically dependent on this instruction here committing before they 226 00:17:54,227 --> 00:17:58,588 can go commit. So it's a nice little structure that can 227 00:17:58,588 --> 00:18:07,894 track all those things. Okay, let's look at commit points and what happens if 228 00:18:07,894 --> 00:18:15,043 exceptions occur. We are going to have the same example 229 00:18:15,043 --> 00:18:23,151 we had before. The mul here is going along and it writes 230 00:18:23,151 --> 00:18:29,041 back to the physical register file. 
Now, you'll say, whoa, it wrote the 231 00:18:29,041 --> 00:18:32,023 register file. How can it take an exception at this 232 00:18:32,023 --> 00:18:34,077 point? If it was going to take an exception, it wasn't 233 00:18:34,077 --> 00:18:38,048 supposed to write the register file. But we have two register files. 234 00:18:38,048 --> 00:18:41,908 So, it writes the speculative state register file, or the future file, or the 235 00:18:42,079 --> 00:18:46,000 physical register file. And this slash here means we don't 236 00:18:46,000 --> 00:18:49,081 actually commit that instruction. So, commit doesn't happen. 237 00:18:51,018 --> 00:18:56,031 Now we get to go look at other in-flight instructions to see what's 238 00:18:56,031 --> 00:19:00,078 going on here. Can these other in-flight instructions 239 00:19:00,078 --> 00:19:05,072 potentially write information out of order relative to the current commit point? 240 00:19:06,070 --> 00:19:12,477 Well, here's this add that before, in the previous example, wrote to the register 241 00:19:12,477 --> 00:19:17,285 file, and now it writes to the physical register file, but does not write the 242 00:19:17,285 --> 00:19:21,515 architectural register file. Instead, it enters the reorder buffer 243 00:19:21,515 --> 00:19:26,402 here, denoted by the little r, and just sits there until it actually gets the 244 00:19:26,402 --> 00:19:31,638 chance to commit in order. But it doesn't get a chance to commit, 245 00:19:31,638 --> 00:19:34,749 because a previous instruction kills it. 246 00:19:34,749 --> 00:19:39,579 It kills it because it takes an exception and kills everything. 247 00:19:39,579 --> 00:19:44,510 And then you can go and start some new instruction here. 248 00:19:44,510 --> 00:19:51,421 Let's say that is the exception handler. And you fetch that, you know, out here. 
249 00:19:51,678 --> 00:19:59,094 One interesting thing about this example that I want to say is, sort of in 250 00:19:59,094 --> 00:20:05,012 this transition, lots of state has to change in the machine. 251 00:20:05,012 --> 00:20:10,487 You've taken an exception, the architectural register file is correct, 252 00:20:10,487 --> 00:20:16,081 and the physical register file potentially has many 253 00:20:16,081 --> 00:20:20,778 incorrect values in it. So, on this transition, what's really 254 00:20:20,778 --> 00:20:24,973 going to happen is you're going to copy all of the state of the architectural 255 00:20:24,973 --> 00:20:29,119 register file, all registers, over on top of the physical register file. 256 00:20:29,119 --> 00:20:33,827 So you basically roll back all of your speculative state in the machine in one fell 257 00:20:33,827 --> 00:20:37,623 swoop. Obviously that can maybe be a little 258 00:20:37,623 --> 00:20:39,954 expensive. But you don't take exceptions that often. 259 00:20:39,954 --> 00:20:42,321 You do take branches relatively often. 260 00:20:42,321 --> 00:20:45,533 We'll talk about that in a second. But that's logically 261 00:20:45,533 --> 00:20:48,156 what's happening. Sometimes people will actually co-mingle 262 00:20:48,156 --> 00:20:51,175 the architectural register file and the physical register file. 263 00:20:51,175 --> 00:20:54,440 And they just sort of keep pointers to different pieces of information. 264 00:20:54,440 --> 00:20:58,404 So you don't actually have to roll back information, you just change 265 00:20:58,404 --> 00:21:00,981 the pointers. 
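The logical rollback can be sketched in a few lines of Python. This is a minimal sketch under the two-separate-register-files model, with both files modeled as equal-length lists indexed by register number; the function name `rollback` is made up for illustration.

```python
def rollback(arf: list, prf: list) -> None:
    """Overwrite all speculative state with committed state, in place."""
    assert len(arf) == len(prf)
    prf[:] = arf   # copy every register; the ARF itself is unchanged

arf = [0, 10, 20, 30]     # committed state
prf = list(arf)           # the future file starts out in sync
prf[2] = 99               # speculative write back from a younger instruction
prf[3] = 77               # another speculative write back
rollback(arf, prf)        # an older instruction took an exception
```

Copying every register is the expensive part mentioned above; the pointer-based designs avoid the copy by changing which entries count as committed.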
But for right now, let's model it as two 266 00:21:00,981 --> 00:21:06,029 completely separate register files, where you copy all the state from the architectural 267 00:21:06,029 --> 00:21:12,113 register file to the physical register file on some form of roll back, on an 268 00:21:12,113 --> 00:21:15,451 exception or a branch. Branches. 269 00:21:15,451 --> 00:21:18,784 So, how do we make the branch latency better? 270 00:21:18,784 --> 00:21:22,200 What do we do on a branch, first of all? 271 00:21:22,200 --> 00:21:24,490 So, sort of ignore these bottom examples here. 272 00:21:24,490 --> 00:21:30,436 This is a different code sequence than the one we have looked at; it's not the multiply, add, 273 00:21:30,436 --> 00:21:33,725 multiply, add code sequence. Instead, this is a branch. 274 00:21:33,725 --> 00:21:37,677 So, we have a branch. The branch commits. 275 00:21:37,677 --> 00:21:42,835 We know the branch is good. But these instructions here are the fall 276 00:21:42,835 --> 00:21:46,384 through case for the branch. This instruction here is the target for 277 00:21:46,384 --> 00:21:48,442 the branch. So, we need to squash all these 278 00:21:48,442 --> 00:21:52,054 instructions in the reorder buffer. Conveniently, we have a bit in the reorder 279 00:21:52,054 --> 00:21:56,085 buffer that marks all the things that were dependent on the branch. If the branch is 280 00:21:56,085 --> 00:21:59,935 misspeculated, just remove them from the reorder buffer and basically 281 00:21:59,935 --> 00:22:04,068 throw those entries out of the reorder buffer, 282 00:22:04,068 --> 00:22:09,698 invalidate them in the reorder buffer. What gets a little interesting here is 283 00:22:09,698 --> 00:22:16,311 when do we start to execute the target? 
Well, let's say we compute the branch 284 00:22:16,311 --> 00:22:23,607 information here in the execute stage, and we can sort of redirect the fetch stage. 285 00:22:23,607 --> 00:22:27,548 That's okay, but the squash is a little bit odd. 286 00:22:27,548 --> 00:22:32,484 Because what this really says, from a pipeline perspective, is that you have to 287 00:22:32,484 --> 00:22:36,407 invalidate multiple entries in the reorder buffer in one cycle. 288 00:22:36,407 --> 00:22:41,157 And this, to some extent, is a structural hazard on the reorder buffer. 289 00:22:41,157 --> 00:22:46,450 You might need, you know, many, many ports into that reorder 290 00:22:46,450 --> 00:22:51,976 buffer, or you need to at least keep the valid bits in some other extremely highly 291 00:22:51,976 --> 00:22:57,904 ported structure. You could think about doing something even 292 00:22:57,904 --> 00:23:01,553 more interesting, where you kill instructions early. 293 00:23:01,553 --> 00:23:08,469 So the difference between this picture and this picture is, once we compute and figure 294 00:23:08,469 --> 00:23:13,478 out that the branch is taken, we just instantaneously squash all these 295 00:23:13,478 --> 00:23:16,521 instructions, and we change the reorder buffer. 296 00:23:16,521 --> 00:23:21,616 We write to the reorder buffer, killing all the speculative instructions. 297 00:23:21,616 --> 00:23:25,701 Now, if you note, this doesn't actually help performance in this case. 298 00:23:25,701 --> 00:23:30,485 Places where this can help performance is if you have an out of order processor 299 00:23:30,677 --> 00:23:35,432 that's a superscalar processor. You could try to put 300 00:23:35,432 --> 00:23:40,550 other instructions in these locations in the pipe, or try to restart earlier, or have 301 00:23:40,550 --> 00:23:45,210 other things go on in the pipe, and you're just using fewer resources in the pipe. 
302 00:23:45,210 --> 00:23:49,478 So this is going to be the highest performance case, and this is sort of going 303 00:23:49,478 --> 00:23:53,446 to be medium performance. For low performance, there's a way that 304 00:23:53,446 --> 00:23:57,533 you don't actually have to add extra ports to your reorder buffer. 305 00:23:57,533 --> 00:24:02,406 And the way you can do that is you let the in-flight instructions that are dead 306 00:24:02,406 --> 00:24:08,098 continue going down the pipe until they get to the commit stage, and only then do you 307 00:24:08,098 --> 00:24:13,533 clean them out of the pipe. And you clean out the reorder buffer. 308 00:24:13,533 --> 00:24:18,384 So you are sort of waiting for these speculative instructions to reach the commit 309 00:24:18,384 --> 00:24:22,136 stage, and you squash them there. In this example the performance of all 310 00:24:22,136 --> 00:24:26,244 three of these is the same, but I will say, this is going to be the lowest 311 00:24:26,244 --> 00:24:31,702 performance if you have a more complicated code sequence, because you are basically using 312 00:24:31,702 --> 00:24:36,304 up a lot of pipeline resources: you're using entries in the reorder buffer, 313 00:24:36,304 --> 00:24:41,775 you're using locations in the pipes that you could try to reuse for something else. 314 00:24:41,775 --> 00:24:45,966 Okay. So as we said, we sort of have these three 315 00:24:45,966 --> 00:24:55,493 different cases, in increasing complexity but you get some performance. 316 00:24:55,493 --> 00:25:03,323 I'm sorry, in decreasing complexity but increasing performance. 317 00:25:03,323 --> 00:25:07,018 So, I think there's one thing that definitely comes up. 318 00:25:07,018 --> 00:25:12,051 And this is probably going to make this multi-ported issue come up: if you have 319 00:25:12,051 --> 00:25:15,046 multiple branches in the pipe at the same time. 
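The low-performance option, squashing dead instructions only when they reach commit, can be sketched as follows. This is a minimal sketch, not the lecture's design: it assumes a single branch in flight, and it uses one hypothetical `squashing` flag in the commit stage so that no extra write ports into the reorder buffer are needed. All names are made up for illustration.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Entry:
    name: str
    finished: bool
    speculative: bool = False    # the S bit, set at allocation
    mispredicted: bool = False   # only meaningful on a branch

class CommitStage:
    def __init__(self):
        self.squashing = False   # ASSUMPTION: a single flag, one branch in flight
        self.committed = []

    def step(self, rob: deque) -> None:
        """One commit-stage cycle: look only at the oldest entry."""
        if not rob or not rob[0].finished:
            return                        # the oldest instruction is still pending
        entry = rob.popleft()
        if self.squashing and entry.speculative:
            return                        # dead instruction: free the slot, nothing commits
        # A mispredicted branch turns squashing on; the first non-speculative
        # instruction after it (the branch target) turns it off again.
        self.squashing = entry.mispredicted
        self.committed.append(entry.name)

rob = deque([
    Entry("beq", finished=True, mispredicted=True),
    Entry("add", finished=True, speculative=True),   # fall-through path
    Entry("sub", finished=True, speculative=True),   # fall-through path
    Entry("target", finished=True),                  # fetched after the redirect
])
stage = CommitStage()
for _ in range(4):
    stage.step(rob)
```

The dead `add` and `sub` each occupy a reorder buffer entry and a commit slot until they drain, which is the resource cost that makes this the lowest-performance choice on more complicated code.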
320 00:25:16,007 --> 00:25:20,079 Then, the simple case of just moving the top pointer's not really going to work 321 00:25:20,079 --> 00:25:25,075 because you might mispredict one of the branches but not the other branch, and that's 322 00:25:25,075 --> 00:25:31,050 going to mess you up a little bit. Okay. 323 00:25:31,050 --> 00:25:45,011 So, let's keep moving on here: avoiding stalls due to store misses. 324 00:25:45,066 --> 00:25:53,504 Okay, so you've got a store in the pipe. It takes a cache miss, and now it's 325 00:25:53,504 --> 00:25:56,808 clogging up the commit point of the processor. 326 00:25:56,808 --> 00:26:02,864 Because, depending on how you want to look at this, maybe you don't want to commit 327 00:26:02,864 --> 00:26:08,439 until that store has actually reached main memory, because that's where you're going to 328 00:26:08,439 --> 00:26:13,024 call commit for that store. So you can actually pull it out of the 329 00:26:13,251 --> 00:26:18,086 finished store buffer, because it's able to actually commit, and you 330 00:26:18,086 --> 00:26:22,348 try to pull it out of the finished store buffer and write it to main memory. 331 00:26:22,348 --> 00:26:27,316 Or you write it to your cache, and it misses your cache and takes a couple of 332 00:26:27,316 --> 00:26:29,981 extra cycles. So we'll see something like this: here's a store 333 00:26:29,981 --> 00:26:34,467 word, and let's say it takes a few extra cycles here, three extra cycles stalling 334 00:26:34,467 --> 00:26:37,880 to actually go and write the level two cache, we'll say. 335 00:26:37,880 --> 00:26:42,052 Or pull in the data from the level two cache into the L1 cache and merge there. 336 00:26:42,052 --> 00:26:47,094 So there's a way to solve that. And what's bad about this is that, 337 00:26:47,094 --> 00:26:52,095 because we're doing in-order commit, it pushes out the rest of these instructions 338 00:26:52,095 --> 00:26:55,029 later. And that's kind of bad.
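The effect of a store miss on in-order commit can be sketched with a few lines of Python. This is a toy model under stated assumptions, not the pipeline diagram from the lecture: the `commit_cycles` function is hypothetical, it assumes one commit per cycle, and the cycle numbers are made up. Only the three extra stall cycles come from the example above.

```python
def commit_cycles(ready, extra):
    """Return the cycle in which each instruction finishes committing.

    ready[i]: first cycle instruction i could commit on its own.
    extra[i]: extra cycles instruction i holds the commit point
              (for example a store that misses in the cache).
    Commit is in order, one instruction per cycle (assumption).
    """
    cycles = []
    next_free = 0
    for r, e in zip(ready, extra):
        start = max(r, next_free)  # wait for older instructions to commit
        done = start + e           # a miss keeps the commit point busy
        cycles.append(done)
        next_free = done + 1
    return cycles


ready = [5, 6, 7, 8]                            # store first, then three others
no_miss = commit_cycles(ready, [0, 0, 0, 0])
with_miss = commit_cycles(ready, [3, 0, 0, 0])  # store stalls three cycles
```

Comparing `no_miss` with `with_miss` shows every instruction behind the store committing three cycles later, even though those instructions did nothing wrong themselves.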
339 00:26:55,029 --> 00:27:00,738 So, what you can think about doing is adding an extra stage in the pipe and just 340 00:27:00,738 --> 00:27:06,942 allowing the store to miss, and basically moving past the commit stage and saying this store 341 00:27:06,942 --> 00:27:11,017 has committed. You, sort of, mark it down and say, well, 342 00:27:11,017 --> 00:27:15,005 it's committed, I don't have to worry about this anymore. 343 00:27:15,005 --> 00:27:20,053 And you basically can decouple the end of the pipe here, so the store actually 344 00:27:20,053 --> 00:27:25,066 happening to memory is put off until later. And all you do is, you just have commit in 345 00:27:25,066 --> 00:27:30,059 order. You can pull back these things earlier. 346 00:27:31,423 --> 00:27:40,271 This looks like a typo; this should probably be back one. And then you 347 00:27:40,271 --> 00:27:45,331 can commit in order, and have that store sort of still outstanding out to main 348 00:27:45,331 --> 00:27:48,071 memory. One important thing you need to do here, 349 00:27:48,071 --> 00:27:52,681 as I've said before: if you let another load into the pipe, or a store 350 00:27:52,681 --> 00:27:58,254 into the pipe, you're going to have to bypass out of this data structure, and 351 00:27:58,254 --> 00:28:03,240 that data structure now, back to the load stage of the pipe or the store stage of 352 00:28:03,240 --> 00:28:06,059 the pipe. And that adds extra wires into your 353 00:28:06,059 --> 00:28:12,068 processor. But we've basically decoupled store commit 354 00:28:12,068 --> 00:28:18,091 from the write to memory: it's technically committed once it gets past this point. 355 00:28:18,091 --> 00:28:24,663 But it's not in main memory. But to everyone else and to 356 00:28:24,663 --> 00:28:27,661 the processor it looks like it's been committed. 357 00:28:27,661 --> 00:28:33,038 Because if you try to go read the value, it looks like it's committed.
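The decoupling described above can be sketched in Python as a small buffer of committed stores that have not yet reached memory, with loads bypassing out of it. This is a minimal sketch under assumptions, not the actual hardware structure: the class name `CommittedStoreBuffer`, its methods, and the address `0x100` are hypothetical, and a dictionary stands in for the cache and main memory.

```python
class CommittedStoreBuffer:
    """Holds stores that are past the commit point but not yet in memory."""

    def __init__(self):
        self.pending = []  # (address, value) pairs, oldest first
        self.memory = {}   # stand-in for cache / main memory (assumption)

    def commit_store(self, address, value):
        # The store counts as committed here; memory is not touched yet.
        self.pending.append((address, value))

    def drain_one(self):
        # Later, the slow write (for example after a miss) completes.
        if self.pending:
            address, value = self.pending.pop(0)
            self.memory[address] = value

    def load(self, address):
        # Bypass: the youngest pending store to this address wins.
        for pending_address, value in reversed(self.pending):
            if pending_address == address:
                return value
        return self.memory.get(address, 0)


buf = CommittedStoreBuffer()
buf.commit_store(0x100, 7)
loaded_before_drain = buf.load(0x100)  # served from the buffer
memory_before_drain = dict(buf.memory)  # memory has not seen the store
buf.drain_one()
loaded_after_drain = buf.load(0x100)   # now served from memory
```

The load returns the stored value both before and after the drain, which is the sense in which the store looks committed to the rest of the processor while the write to memory is still outstanding. The search in `load` is the software analogue of the extra bypass wires mentioned above.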