Okay, now we get to move on to the meat of today: we're going to talk about non-blocking caches, also known as out-of-order memory systems, also known as lockup-free caches. I think the first paper published on this called them lockup-free caches; most people call them non-blocking caches today. And if you think about it from a memory perspective, it's an out-of-order memory system.

What does a non-blocking cache allow you to do? It enables subsequent memory operations from the main processor pipeline to keep occurring even when an earlier instruction in your sequence has taken a miss. In all the pipelines we've looked at up to this point, even our out-of-order pipelines, we basically said that on a cache miss you just stop the pipe, because we couldn't deal with memory coming back out of order, even in our out-of-order processor pipelines; we didn't have enough bits to track all that. Now we're going to talk about structures that let us track out-of-order memory.

There are two major things this allows you to do: it allows you to have a cache hit under a miss, and it allows you to have a miss under a miss. What do I mean by miss under miss? You do an access, say a load, to some address, and it takes a cache miss. You keep executing your program, you do another load, and that one takes a cache miss too, and you are able to process both of them if you have a non-blocking cache, because it allows a miss under a miss.

One of the big points I want to get across today is that this is not only for out-of-order processors. We've talked a lot about out-of-order processors, but believe it or not, you can hook a non-blocking cache up to an in-order processor, or even an in-order VLIW processor. How do you go about doing that? One major way is that when you take the cache miss, you mark the register as not being there, so that when you go to actually read the data, you block. We'll show an example of that a few slides from now. But really what I'm trying to get across is that you can have in-order processors with out-of-order memory systems, and you can have out-of-order processors without out-of-order memory systems. Both are possible.
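To make miss under miss concrete before going on, here is a tiny, hypothetical C fragment (the array names and sizes are made up for illustration). The two loads are independent, so a miss-under-miss cache can have both fill requests outstanding at once and the core only stalls when the values are actually used.

```c
/* Hypothetical illustration of miss under miss: the two loads are
 * independent, so a non-blocking cache can send the fill request for
 * b[j] before the fill for a[i] has returned. */
#include <stdio.h>

#define N (1 << 20)

static int a[N];
static int b[N];

int sum_two(int i, int j)
{
    int x = a[i];   /* load 1: may miss; request goes out to memory     */
    int y = b[j];   /* load 2: independent, may also miss; with miss
                     * under miss a second request can be outstanding   */
    return x + y;   /* first real use of the data: this is the stall point */
}

int main(void)
{
    printf("%d\n", sum_two(3, 70000));
    return 0;
}
```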
Okay. So, a couple of big challenges here.

First, if you have multiple out-of-order misses, the memory system is going to return data out of order, and you have to deal with that somehow. The requests might end up in different memory banks, and the data can come back in a different order than you sent the requests out in. That gets hard to deal with: you send out cache misses for x, y, and z, and they come back in z, y, x order, or something like that, and you need to make sure you're delivering the right data to the right location in the cache, and the right data to the right instruction. So we're going to need some big associative table to figure this out.

The second major challenge is that lots of times you're going to have a later load or store to an address on the same cache line as a miss that's still outstanding. What do I mean by that? Well, it's pretty common that you'll be doing loads sequentially through memory, and if the first load on a cache line misses, the second load is also going to miss, but they're on the same cache line. You don't want to send two of those out to the memory system at the same time, because you might confuse the memory system. Worse, say you have a load to one address and a store to another address, but they're both on the same cache line. The load might go out to memory, then you do the store and the store goes out to memory and updates main memory somehow, but you bring in the original load data, and the two sort of pass each other in transit: the load data passes the store data, and all of a sudden you have the wrong data in your cache. It never got updated, and the reason that happens is the store didn't have anywhere to merge into; it couldn't actually deposit its data into the cache, for instance.

Okay, so how do we go about handling this? Before we do that, let's look at a timeline. Time goes from left to right on this graph, and at the top we have a blocking cache.
With a blocking cache, you're happily running the CPU, you do a load, we'll say, or a store, it doesn't matter in these systems, and you take a cache miss. In the most basic blocking cache you wait for the cache line to get filled in, then you return the data, and then you keep running the CPU; there's no overlap happening here at all. But we want to go faster.

If we have a non-blocking cache, we can take a hit under a miss. What that means is, we're running along, we take a cache miss, and the request goes out to main memory, but we don't stop the CPU from executing. Let's say it's a load to register five. Well, as long as no one looks at register five, does the processor care that the load to register five took a cache miss? Probably not. Now, there might be some complexities if that load to register five takes a trap or an interrupt; then it might care, but let's say it doesn't. One of the reasons this is usually safe to do is that you've already done all the memory checks and you're effectively past the commit point, because you're pretty late in the processor pipe by the time you know you have a cache miss. So it's not too harmful. Then you keep executing, you access the cache again with a different load, you get a hit, that's great, you just keep executing. It's only when you go to use the missing data that you stall. Effectively, we've overlapped computation with our miss penalty, and that's pretty nice.

Something else we can do is miss under miss. With miss under miss, we're executing along, we take a cache miss, and we send a request out to the main memory system to go get the data, but we don't stop executing. We keep executing, we take another miss, and we send that one out to the main memory system as well. At some point maybe we actually go look at the data, or maybe we don't look at it until much later, and the CPU just keeps executing the whole time. It overlaps multiple memory accesses with computation, so this can be really powerful. And you can do this, as I said, with an in-order processor.
One thing I do want to say is that usually you have a limited number of outstanding memory accesses, some small integer, maybe four or eight. There are diminishing returns as you add more and more, so usually this isn't thousands of outstanding memory accesses.

So let's look at the structure here. There are a couple of different names for this thing, depending on which school of computer architecture design you come from. If you're from the Alpha, or Digital Equipment Corporation, design philosophy, you're going to call this thing a miss address file. If you come from the Intel school of things, you're going to call it a miss status handling register. The miss status handling register actually predates Intel; it goes back to, I believe, Control Data Corporation. There's a paper from Control Data on miss status handling registers from back around the late '60s or early '70s.

Let's look inside this thing and understand what's going on. You have some small number of miss status handling registers, probably in a register file. Each one has a valid bit, and it has a block address. Now, this block address is not the address of the load or the store; it's the address of the cache line that the load or store falls in. When we use this structure, we're going to check subsequent memory accesses, or subsequent misses, against previous ones that are still going on, and we may have multiple of those in flight. We don't actually need the address of the load or the store; we need the address of the entire line, because we need to check against the entire line. That's what's in play. We also have a bit that says whether the miss has been issued or not. Why do we have that? Just because you took a miss doesn't mean the request is actually out in the memory system, or even on its way to the memory system yet. So you fill the entry in, and it sits there until you have some time to go talk to main memory, and hopefully that happens quickly. In some architectures you may not even need this bit; you might just stall until the request actually goes out to main memory. But once it's issued, you set that bit. What this allows you to do is have multiple misses which are not yet issued.
So if you have a miss under a miss under a miss, and it all happens really quickly, you can fill up this table quickly without having to wait for the requests to stream out to main memory yet. That's half of it.

Then we have a bunch of load/store entries. These load/store entries also have a valid bit, and they have a pointer back to one of the miss status handling register entries: a number that says which entry in the miss status handling register file this belongs to. These entries are for the individual loads or stores that are occurring. What this allows to happen is that if you have a load miss to address zero and a load miss to, I don't know, address ten, and they're both on the same cache line, you can fill in this table with two entries and they'll both point back to the same miss status handling register location; they'll just have different offsets. And in the destination field you fill in which register on the processor each one is destined for. So we're going to use these tables such that when a load comes back, or excuse me, when a cache line comes back from main memory, we can check these two tables and do two things: one, fill the line into the cache somewhere, and two, return the data to the correct destinations, finding which piece of data each one needs from its offset within the cache line. That's a mouthful to say; this is a somewhat complicated data structure to work out.
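As a rough illustration of the two tables just described, here is a minimal sketch in C. The field names, widths, and table sizes are assumptions for illustration, not any particular machine's layout; in particular, using the destination field as a store-buffer index for stores is one possible choice, discussed below.

```c
/* Sketch of a miss status handling register (MSHR) file plus the
 * load/store entry table. Sizes and field names are illustrative. */
#include <stdbool.h>
#include <stdint.h>

#define NUM_MSHR      8    /* outstanding cache-line misses            */
#define NUM_LDST_ENT 16    /* individual loads/stores waiting on them  */

/* One entry per outstanding cache-line miss. */
typedef struct {
    bool     valid;        /* entry in use                             */
    bool     issued;       /* request actually sent to main memory yet */
    uint64_t block_addr;   /* address of the whole cache line, not of
                            * the individual load or store             */
} mshr_t;

/* One entry per individual load or store waiting on a miss. */
typedef enum { OP_LOAD, OP_STORE } op_t;
typedef enum { SZ_BYTE, SZ_HALF, SZ_WORD } acc_size_t;

typedef struct {
    bool       valid;      /* entry in use                              */
    uint8_t    mshr_idx;   /* which MSHR entry (which line) it waits on */
    uint8_t    offset;     /* byte offset of the access within the line */
    op_t       op;         /* load or store                             */
    acc_size_t size;       /* byte / half-word / word                   */
    uint8_t    dest;       /* loads: (physical) destination register;
                            * stores: index into a store-data buffer    */
} ldst_entry_t;

/* The two tables themselves. */
static mshr_t       mshr_file[NUM_MSHR];
static ldst_entry_t ldst_table[NUM_LDST_ENT];
```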
Now I want to point out where the associative matches versus the indexed matches happen in these tables. One thing you might notice is that we have to check every subsequent miss against this block address: we take the higher-order bits of the address and check them against it, and if they match we don't add a new miss status handling register entry; instead we just merge the new miss into the currently pending one. We do still have to add a new load/store entry for it, which points back over here. Then, when a memory transaction comes back from main memory, we look in this table and say, here's the one I had issued, and I need to clear it out of the table. Given the index of the entry I'm clearing, I associatively check that index against all of the load/store entries and wake up the ones that match. So if the returning line belongs to entry number one, and there are multiple load/store entries pointing at entry number one, I need to return data to the registers for all of them. And when I return data to a register, I have to mark that register as being available again. The type field here just says whether it's a word, half-word, or byte, and whether it's a load versus a store.

One other thing I wanted to say is about the destination field. For loads it's going to be a register identifier. Now, this might be a physical register identifier rather than an architectural register identifier if you have a register renamer; if you have an out-of-order processor this can get more complicated, but it's typically a physical register identifier, not an architectural one. For stores, you also effectively need to track something in this table, because you need to merge the store with the background data. So a store miss also adds an entry here; it sends a request to main memory to go get the background data, and when that comes back, it gets deposited into our cache. For a store, the destination needs to point at some other buffer; this is very similar to the store buffer we had before, and you can have an array of those, for instance. It tells you which store to go play against the cache, so you can merge the returned data with the store data that we've buffered locally.

So this is kind of fun. We can have memory coming back out of order, and we can have memory being issued out of order. Lots of fun toys here. And one of the fun things we can do, because of the associative check, is make sure that subsequent requests to a line that's already in flight simply merge into the pending miss, so we don't generate more memory traffic, and we don't cause strange situations where responses are coming back while new requests for the same line are going out and data gets lost in flight.
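Continuing the hypothetical sketch above (same made-up types and table sizes), the miss path might look roughly like this: associatively compare the line address of the new miss against the valid MSHR entries, merge into a pending entry if one matches, otherwise allocate a new MSHR entry, and in either case allocate a load/store entry; if either table is full, stall.

```c
/* Miss path, continuing the sketch above. Returns false to mean
 * "stall the processor until an entry frees up". */
#define LINE_BYTES 64

static bool handle_miss(uint64_t addr, op_t op, acc_size_t size, uint8_t dest)
{
    uint64_t line = addr / LINE_BYTES;           /* line (block) address */
    uint8_t  off  = (uint8_t)(addr % LINE_BYTES);
    int e = -1, m = -1;

    /* We need a free load/store entry no matter what. */
    for (int i = 0; i < NUM_LDST_ENT && e < 0; i++)
        if (!ldst_table[i].valid) e = i;
    if (e < 0) return false;                     /* stall: entries full  */

    /* Associative check against every pending line miss. */
    for (int i = 0; i < NUM_MSHR; i++)
        if (mshr_file[i].valid && mshr_file[i].block_addr == line) m = i;

    if (m < 0) {                                 /* no match: new MSHR   */
        for (int i = 0; i < NUM_MSHR && m < 0; i++)
            if (!mshr_file[i].valid) m = i;
        if (m < 0) return false;                 /* stall: MSHRs full    */
        mshr_file[m] = (mshr_t){ .valid = true, .issued = false,
                                 .block_addr = line };
    }

    /* Point the new load/store entry at the (possibly shared) MSHR. */
    ldst_table[e] = (ldst_entry_t){ true, (uint8_t)m, off, op, size, dest };
    return true;
}
```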
I should point out that this is only one implementation. Lots of times, what people do is put an actual tag field in the miss status handling register, because main memory won't necessarily carry the entire address along with the response it sends back; instead it just carries a tag. So instead of checking against the block address, you might check against a smaller tag. That's one way people make this a little easier. Another thing most people do is make the tag, if you have a small number of these entries, simply the index of the entry in this table, so you don't even need an associative check on the response. That's an optimization; you still need the associative check when future loads and stores that miss come to check this table.

I think I've walked through most of this already. On a cache miss, you check the table for a matching address. If one is found, you allocate a new load/store entry that points to the existing miss status handling register entry. If not, you have to allocate both a load/store entry and a miss status handling register entry. One thing I did want to point out is that if you run out of miss status handling registers or load/store entries, that's not the end of the world; you can just stop the processor. Say you have eight of them and you've used all eight, so you have eight outstanding memory transactions: you can just stall for a while. At some point one of those main memory accesses is going to come back.

When data returns, you need to find the loads and stores that are waiting for it. Going back to what Berchun said, it's very possible that the load or store that was waiting on it has actually disappeared in that time period, at least for a load, because a write-after-write to the destination register might have occurred. That's okay; you still want to fill the line into your cache at that point. And of course you can have multiple loads and stores waiting. When the cache line is completely returned and you've finished checking it against all the load/store entries, you deallocate both the load/store entries and the miss status handling register entry.
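And the return path, again continuing the same hypothetical sketch: when a line comes back, find the MSHR entry it belongs to, merge any buffered store data, write the line into the cache, then match that MSHR's index against every load/store entry, delivering data to each waiting load's destination register, and finally deallocate everything. The cache_fill_line, writeback_register, and merge_store_data helpers are stand-ins for whatever the rest of the machine provides.

```c
/* Return path, continuing the sketch above: the line for MSHR entry m
 * has come back from memory. The three helpers are hypothetical hooks
 * into the rest of the machine. */
void cache_fill_line(uint64_t block_addr, const uint8_t *data);
void writeback_register(uint8_t phys_reg, const uint8_t *data, acc_size_t size);
void merge_store_data(uint8_t store_buf_idx, uint8_t *line, uint8_t offset);

static void handle_fill(int m, uint8_t line_data[LINE_BYTES])
{
    /* Stores waiting on this line merge their data into it first... */
    for (int i = 0; i < NUM_LDST_ENT; i++)
        if (ldst_table[i].valid && ldst_table[i].mshr_idx == m &&
            ldst_table[i].op == OP_STORE)
            merge_store_data(ldst_table[i].dest, line_data,
                             ldst_table[i].offset);

    /* ...then the merged line is deposited into the cache...         */
    cache_fill_line(mshr_file[m].block_addr, line_data);

    /* ...and each waiting load gets its piece of the line, selected
     * by its offset, delivered to its destination register.          */
    for (int i = 0; i < NUM_LDST_ENT; i++) {
        if (ldst_table[i].valid && ldst_table[i].mshr_idx == m) {
            if (ldst_table[i].op == OP_LOAD)
                writeback_register(ldst_table[i].dest,
                                   &line_data[ldst_table[i].offset],
                                   ldst_table[i].size);
            ldst_table[i].valid = false;         /* deallocate entry   */
        }
    }
    mshr_file[m].valid = false;                  /* deallocate the MSHR */
}
```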
Okay, so now a little bit of fun with in-order machines. You can sort of see how this fits relatively logically into an out-of-order pipeline. If you want to fit it into an in-order pipeline, it's not too hard: you can add a scoreboard bit for each individual register. And when I say scoreboard, you're not tracking where the data is coming from; instead there's a special bit saying this register is out to lunch, this register is out in the memory system, and if you try to access it, you just wait, you just stall. Because a miss is a variable-latency sort of thing, your scoreboard can't say the data will be ready in some fixed number of cycles; you just don't know, it's out in the memory system. So on a load miss, you mark the destination register as busy; when the data comes back, you mark it available and unstall the processor. But if no one actually went to use that register in the meantime, while it was out in main memory, the processor never stalls and no one is ever the wiser. There's a small sketch of this idea at the end of the section.

Okay, we're almost out of time, so I want to wrap up here for a second. Non-blocking caches can effectively increase the bandwidth to your lower levels of cache, your L1s: one thing they do is increase bandwidth because they can merge misses to your cache. Now, you probably would have gotten much of that anyway with a blocking cache, since the second access would hit once the line had been filled, but the miss status handling registers let multiple cache misses merge into a single transaction. And your miss penalty is obviously lower because, going back to the earlier picture, we've overlapped the miss penalty with other useful work.
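To close, here is a minimal sketch of the in-order scoreboard idea mentioned above: one busy bit per register, set when a load misses, cleared when the fill writes the register back, and checked (stalling if set) only when an instruction actually reads that register. Everything here, names included, is a hypothetical illustration rather than any particular pipeline.

```c
/* Minimal scoreboard sketch for hooking a non-blocking cache to an
 * in-order pipeline: one busy bit per architectural register. */
#include <stdbool.h>
#include <stdint.h>

#define NUM_REGS 32

static bool reg_busy[NUM_REGS];   /* "out to the memory system" bits */

static void on_load_miss(uint8_t dest_reg)
{
    reg_busy[dest_reg] = true;    /* data is out to lunch; keep executing */
}

static void on_fill_writeback(uint8_t dest_reg)
{
    reg_busy[dest_reg] = false;   /* data arrived; register usable again  */
}

/* Called in decode/issue for each source register of an instruction:
 * returns true if the pipeline must stall this cycle. */
static bool must_stall(uint8_t src_reg)
{
    return reg_busy[src_reg];
}
```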