1 00:00:03,670 --> 00:00:07,884 OK. So let's take a look at the problems with 2 00:00:07,884 --> 00:00:14,690 having a bus and actually, the bigger problems with having a cache. 3 00:00:15,930 --> 00:00:19,530 So here we show a little bit of an example here which has two CPUs. 4 00:00:23,470 --> 00:00:26,730 And two caches. And we have mean memory. 5 00:00:26,730 --> 00:00:30,280 And then, everything sits on a bus. Now, why do we want a cache? 6 00:00:30,280 --> 00:00:32,680 Well, we've discussed this in great detail in 7 00:00:32,680 --> 00:00:35,505 this class that, caches make your program go faster. 8 00:00:35,505 --> 00:00:37,980 because you don't have to go communicate with the distant memory. 9 00:00:37,980 --> 00:00:42,490 To go access some piece of data that you've accessed either temporarily. 10 00:00:42,490 --> 00:00:45,639 Or temporarily, recently or has spatial locality 11 00:00:45,639 --> 00:00:48,840 with some other reference that you've done recently. 12 00:00:50,668 --> 00:00:55,750 So, let's, let's take a look at what, what happens in this basic, basic example. 13 00:00:55,750 --> 00:01:00,834 So, let's say, at address A here, we start off 14 00:01:00,834 --> 00:01:05,918 with everyone having the value of 100 in their 15 00:01:05,918 --> 00:01:11,188 respective caches and main memory. So address 16 00:01:11,188 --> 00:01:15,920 A has value 100. Now, let's say, 17 00:01:15,920 --> 00:01:20,309 let's suppose, that CPU-1 updates 18 00:01:20,309 --> 00:01:24,073 address A to value 200. Okay? 19 00:01:24,073 --> 00:01:28,681 So, let's look at this in, in two cases. The first 20 00:01:28,681 --> 00:01:33,323 is in the write-back case, so all the caches here are write-back. 21 00:01:33,323 --> 00:01:43,480 So CPU-1 Updates this to value 200. But this is a write-back cache. 22 00:01:43,480 --> 00:01:48,940 Okay, so what happens to the values 23 00:01:48,940 --> 00:01:54,460 in memory and in cache-2? Hmm. 24 00:01:54,460 --> 00:01:58,810 Well, all of a sudden we have a stale value. 25 00:02:00,720 --> 00:02:05,310 The newest values here, is going to be 200, but here 26 00:02:05,310 --> 00:02:08,880 in memory and here in the cache 2, we have 27 00:02:08,880 --> 00:02:10,558 the old value. 28 00:02:10,558 --> 00:02:12,790 So, if CPU-2 tries to go read address A, 29 00:02:12,790 --> 00:02:15,510 it's going to continually just get the old value. 30 00:02:16,610 --> 00:02:22,290 Likewise. Main memory has an out of date value. 31 00:02:22,290 --> 00:02:23,620 Now where does this become a problem? 32 00:02:23,620 --> 00:02:29,460 Well CPU-2 never sees the value that got updated for address A. 33 00:02:29,460 --> 00:02:35,172 More to the point, what happens if CPU-2 tries to go update cache-2 34 00:02:35,172 --> 00:02:40,250 here with a value 300, we'll say. Well now we got a big problem. 35 00:02:40,250 --> 00:02:43,252 Now we have three different values and this is 36 00:02:43,252 --> 00:02:48,000 going to cause problems for both our memory consistency model. 37 00:02:48,000 --> 00:02:50,814 But also just to use this system, you know, how do 38 00:02:50,814 --> 00:02:54,600 you know when the value has been updated and is shown. 39 00:02:54,600 --> 00:02:59,179 And we can see that it is because we have, caches here that this problem exists. 40 00:03:00,340 --> 00:03:04,683 And it's because we have two caches that we get 41 00:03:04,683 --> 00:03:09,340 stale values either in other caches or in main memory. 42 00:03:11,678 --> 00:03:13,880 okay, so question comes up. 43 00:03:13,880 --> 00:03:15,314 Maybe this is just a problem with write-back. 44 00:03:15,314 --> 00:03:18,086 Write-back caches, yeah they're optimization. 45 00:03:18,086 --> 00:03:20,354 You know, they, they break only when they 46 00:03:20,354 --> 00:03:23,960 need and they can store dirty values inside themselves. 47 00:03:23,960 --> 00:03:27,400 But maybe we should be using write-through caches instead. 48 00:03:27,400 --> 00:03:30,560 because then, at least everything everything goes out to main memory. 49 00:03:30,560 --> 00:03:32,760 So, let's take a look at the same example here. 50 00:03:34,040 --> 00:03:38,130 So reset, we have 100, 100 and 100. 51 00:03:38,130 --> 00:03:43,132 Now CPU-1 goes and does a write to address A with 52 00:03:43,132 --> 00:03:48,170 value 200. But now it's at least write-through. 53 00:03:48,170 --> 00:03:51,650 Okay, so that writes 200 here and 200 here. 54 00:03:51,650 --> 00:03:59,390 So the question comes up, does cache-2 see this and CPU-2 see this update. 55 00:04:01,910 --> 00:04:04,450 No. We have no mechanism to do this. 56 00:04:04,450 --> 00:04:08,760 And this is going to motivate why we want to build cache coherence protocols. 57 00:04:09,770 --> 00:04:11,210 So we do the update here we're going to 58 00:04:11,210 --> 00:04:13,932 have basically value 200, 200 here because we wrote through. 59 00:04:13,932 --> 00:04:18,162 But CPU-2 will never see that updated value because it 60 00:04:18,162 --> 00:04:22,220 already has address A with value 100 in its cache. 61 00:04:24,428 --> 00:04:28,140 so, two questions here, do these stale values matter? 62 00:04:29,140 --> 00:04:32,200 Yes, the stale values are definitely going to matter. 63 00:04:32,200 --> 00:04:34,400 So, what happens here? 64 00:04:34,400 --> 00:04:37,136 Well, you will never see the updates in the 65 00:04:37,136 --> 00:04:41,399 other CPU, and hence there's basically no way to communicate. 66 00:04:43,110 --> 00:04:44,110 Second question here. 67 00:04:44,110 --> 00:04:47,430 What is the view of shared memory for programming? 68 00:04:47,430 --> 00:04:49,504 Well, our question 69 00:04:49,504 --> 00:04:56,336 here is, you need to have some notion of when a store to a particular 70 00:04:56,336 --> 00:05:03,170 address or particular value shows up in another processor. 71 00:05:03,170 --> 00:05:06,122 And right now if we look at this case here there's 72 00:05:06,122 --> 00:05:11,100 no mechanism for the other processor to ever see that updated value. 73 00:05:11,100 --> 00:05:14,925 So even if CPU-2 does a million reads, it'll never 74 00:05:14,925 --> 00:05:21,146 get that updated value because it already has address A with value 100 in its cache. 75 00:05:21,146 --> 00:05:24,556 So it could it there and read, it could sit there and do a write 76 00:05:24,556 --> 00:05:28,820 that might update main memory but I will still never see the update from CPU-1. 77 00:05:28,820 --> 00:05:33,440 And that's problematic because how do you build programming models for that? 78 00:05:33,440 --> 00:05:36,466 And more to the point, this affects your 79 00:05:36,466 --> 00:05:40,827 consistency models, so let's take a look at write-back 80 00:05:40,827 --> 00:05:45,760 caches with sequential consistency. So this 81 00:05:45,760 --> 00:05:50,590 is using the same example we used from class last, 82 00:05:50,590 --> 00:05:55,582 last time. So just a recap, we have two threads that 83 00:05:55,582 --> 00:06:01,306 are sharing memory space and we're going to call one T1, 84 00:06:01,306 --> 00:06:06,701 the other T2. And the T1 stores 85 00:06:06,701 --> 00:06:13,185 1 to X and 11 to Y. And concurrently exectuting 86 00:06:13,185 --> 00:06:19,660 T2 loads Y into a register and then stores that register out to Y prime. 87 00:06:19,660 --> 00:06:25,490 So a different memory address. And then it loads X into R2 and stores R2 88 00:06:25,490 --> 00:06:32,420 into X prime, a different memory address them X different memory address 89 00:06:32,420 --> 00:06:34,510 than X. That's correct. 90 00:06:34,510 --> 00:06:36,530 So what's going on here? 91 00:06:36,530 --> 00:06:39,463 Well you notice that we, we purposely sort of made a tricky case here. 92 00:06:39,463 --> 00:06:44,413 We made Program 1 or Thread 1 write 93 00:06:44,413 --> 00:06:48,936 X and then Y. And then Thread 2 read Y and then write a 94 00:06:48,936 --> 00:06:55,310 different value, Y prime, and then read X and then write a different value, X prime. 95 00:06:55,310 --> 00:06:57,514 So we've basically purposely flipped those 96 00:06:57,514 --> 00:06:58,600 two values. 97 00:06:58,600 --> 00:07:02,710 And let's take a look what happens with a write-back cache. 98 00:07:02,710 --> 00:07:05,593 And signify the virtue of having a bus 99 00:07:05,593 --> 00:07:10,435 with a right back cache, violates sequential consistency. 100 00:07:10,435 --> 00:07:16,255 Okay, let's, we said all in sequential consistency, all execution 101 00:07:16,255 --> 00:07:22,172 orderings and interweavings where we did not order instructions, 102 00:07:22,172 --> 00:07:27,822 need to be valid. So we can choose a purposely 103 00:07:27,822 --> 00:07:33,270 complex or purposely hard case here. 104 00:07:33,270 --> 00:07:35,270 So, we're going to, we're going to basically do this. 105 00:07:35,270 --> 00:07:39,836 So we're going to have Thread 1, execute first. 106 00:07:39,836 --> 00:07:43,268 So to completion, so what's going to happen is we're 107 00:07:43,268 --> 00:07:47,690 going to have two caches, cache-1, cache-2 and main memory. 108 00:07:47,690 --> 00:07:54,560 And, we're going to look to see what's in these values, as, as time goes on here. 109 00:07:54,560 --> 00:07:57,900 And, time's going to move, down on this graph. 110 00:07:57,900 --> 00:08:01,880 So the first thing that happens is, memory x and memory y. 111 00:08:01,880 --> 00:08:05,420 Memory x has 0 and y has 10 in it. 112 00:08:05,420 --> 00:08:09,510 And, that's just the, the initial conditions of this problem. 113 00:08:11,050 --> 00:08:11,560 Okay. 114 00:08:11,560 --> 00:08:14,410 So, T1 gets executed. 115 00:08:14,410 --> 00:08:20,940 Well, so that writes 1 and 11 to cache-1. 116 00:08:20,940 --> 00:08:27,450 Now note, because this is a write-back cache, we have not updated memory here. 117 00:08:27,450 --> 00:08:29,286 So it has a stale value, and this 118 00:08:29,286 --> 00:08:32,390 is going to cause us problems in our consistency model. 119 00:08:32,390 --> 00:08:37,070 In our sequential consistence sequentially consistent consistency model. 120 00:08:38,780 --> 00:08:40,020 Okay, what happens next? 121 00:08:40,020 --> 00:08:48,250 Well, let's say X1, or excuse me, X and Y are in different cache lines. 122 00:08:48,250 --> 00:08:50,960 So they are able to write back independently. 123 00:08:50,960 --> 00:08:56,438 So, we're going to say that cache-1, writes back Y but not X. 124 00:08:56,438 --> 00:09:00,650 Mmm. 125 00:09:00,650 --> 00:09:03,569 You guys should start seeing if there's going to be a problem here. 126 00:09:03,569 --> 00:09:05,549 So, in main 127 00:09:05,549 --> 00:09:13,770 memory, we have Y has 11 and X has 0. Ooo. 128 00:09:13,770 --> 00:09:18,300 that's a, that's a sort of final case we don't want to see. 129 00:09:18,300 --> 00:09:24,262 But let's keep moving forward here. Okay, so then we run Thread 2, 130 00:09:24,262 --> 00:09:30,180 or Thread T2 here to completion. So it reads the values from main memory, 131 00:09:30,180 --> 00:09:31,879 brings it into its cache. 132 00:09:33,470 --> 00:09:35,971 Does the updates here, but it doesn't actually 133 00:09:35,971 --> 00:09:38,060 do it right back yet of these new values. 134 00:09:41,160 --> 00:09:47,544 So, then let's say, cache-1 writes back 135 00:09:47,544 --> 00:09:53,500 x, so now main memory says, 1 and 11. 136 00:09:53,500 --> 00:09:57,030 Hm. That, that's, that's not good. 137 00:09:57,030 --> 00:10:02,610 But at least that's sort of what program T1 would do if you normally executed. 138 00:10:02,610 --> 00:10:06,310 But, but, what's not good about this is, cache-2 here 139 00:10:06,310 --> 00:10:09,150 is not consistent with main memory. 140 00:10:09,150 --> 00:10:13,548 You can see that X has 1 here but this cache says that X has 0. 141 00:10:13,548 --> 00:10:20,040 And then finally cache-2 does a write-back of X prime and Y prime. 142 00:10:22,090 --> 00:10:26,620 So it's going to write back what the values it had which were 0 and 11. 143 00:10:26,620 --> 00:10:32,871 Now you may recall from the previous class that, 144 00:10:32,871 --> 00:10:39,010 this was a sequentially inconsistent memory output. 145 00:10:39,010 --> 00:10:42,880 This is never supposed, supposed to happen in a sequentially consistent system. 146 00:10:42,880 --> 00:10:47,279 So all of the sudden what we did is we took a, a memory system 147 00:10:47,279 --> 00:10:52,380 we introduced caches and by virtue of having these caches We've took 148 00:10:52,380 --> 00:10:57,810 a system which would potentially implement sequential consistency, and we broke it. 149 00:10:57,810 --> 00:11:01,490 We broke it because of the stale values in caches. 150 00:11:01,490 --> 00:11:03,063 So this is for the write-back case. 151 00:11:03,063 --> 00:11:08,840 So, write-back caches can basically cause sequentially inconsistent values. 152 00:11:08,840 --> 00:11:12,420 To end up in main memory when you're done and likewise in caches. 153 00:11:14,810 --> 00:11:20,039 Okay, let's walk through a case here, sort of similar thing but for write 154 00:11:20,039 --> 00:11:23,193 through caches, because write through caches we 155 00:11:23,193 --> 00:11:26,710 are to find are not particularly better either. 156 00:11:28,290 --> 00:11:32,350 But to get this case correct, we need to be a little bit more contrived. 157 00:11:32,350 --> 00:11:37,390 So, same program here: cache-1, memory, cache-2. 158 00:11:37,390 --> 00:11:39,917 And at the beginning of 159 00:11:39,917 --> 00:11:45,237 time, we're going to say that cache-2, in the 160 00:11:45,237 --> 00:11:50,557 value x here has 0 which is consistent 161 00:11:50,557 --> 00:11:55,351 with main memory. So some reason cache-2, let's 162 00:11:55,351 --> 00:12:00,523 say, read value X in a previous program a long time ago for some other reason. 163 00:12:00,523 --> 00:12:04,699 So cache-2 has X being 0. 164 00:12:04,699 --> 00:12:09,500 And what we're going to show is this x being zero in cache-2. 165 00:12:09,500 --> 00:12:16,730 He's going to cause some stale value to live on beyond the expected time. 166 00:12:16,730 --> 00:12:22,084 So, let's say Thread 1 executes. So Thread 1 executes and updates X 167 00:12:22,084 --> 00:12:26,869 with 1, Y with 11, and it's write through, so this gets pushed to main memory. 168 00:12:29,350 --> 00:12:35,200 So if two caches are on a bus and there is no communication happening on that bus. 169 00:12:35,200 --> 00:12:37,900 Modules are just sort of communicating with main memory. 170 00:12:37,900 --> 00:12:45,510 So the other cache doesn't react. Okay, that's, that's okay. 171 00:12:45,510 --> 00:12:49,820 Let's see what happens now. T2 executes, Thread 2 executes. 172 00:12:49,820 --> 00:12:54,426 Well, it reads from main memory, but you'll notice here 173 00:12:54,426 --> 00:12:58,770 that it doesn't have to read X 0, or X excuse me. 174 00:13:00,490 --> 00:13:06,309 Because X is already sitting in its cache, it has the stale value here. 175 00:13:07,910 --> 00:13:08,410 Hmm. 176 00:13:09,610 --> 00:13:11,270 Well why, why does it not have the need to do anything? 177 00:13:11,270 --> 00:13:15,600 Well it's because we haven't said that we need to actually change how a bus works. 178 00:13:15,600 --> 00:13:18,770 We just said you can go to read something, you can pull your cache. 179 00:13:18,770 --> 00:13:19,550 And if people pull 180 00:13:19,550 --> 00:13:22,410 things into their cache there's no way to ever take things out of a 181 00:13:22,410 --> 00:13:24,074 cache unless you, know, it falls out 182 00:13:24,074 --> 00:13:26,210 because of capacity issues or conflict issues. 183 00:13:28,560 --> 00:13:35,526 Well it goes and reads this and it does a write let's say to x prime 184 00:13:35,526 --> 00:13:43,210 and y prime, and here we get value 0 and 1 for X prime and Y prime. 185 00:13:43,210 --> 00:13:47,352 And this causes sequential consistency to 186 00:13:47,352 --> 00:13:52,830 break this is a non-sequentially consistent execution. 187 00:13:52,830 --> 00:13:53,792 So just because 188 00:13:53,792 --> 00:13:56,530 you have right through caches on the bus does 189 00:13:56,530 --> 00:14:01,559 not guarantee that you're going to have sequentially consistent execution. 190 00:14:02,720 --> 00:14:03,630 Hm. 191 00:14:03,630 --> 00:14:05,582 Okay, so now the question is, we, we 192 00:14:05,582 --> 00:14:09,260 spent all of last class talking about sequential consistency. 193 00:14:09,260 --> 00:14:12,326 What good is this for if, if our by by putting a 194 00:14:12,326 --> 00:14:15,895 cache into it, all of a sudden we break that whole model. 195 00:14:15,895 --> 00:14:18,540 What's...what's the solution to this? 196 00:14:20,210 --> 00:14:25,488 Well, we're going to introduce an idea here, called Cache Coherence 197 00:14:25,488 --> 00:14:31,190 and we're going to contrast that with Memory Consistency models. 198 00:14:31,190 --> 00:14:36,140 So, or be consis, memory consistent somehow. 199 00:14:36,140 --> 00:14:40,890 So, we're going to define a Cache Coherence protocol, as 200 00:14:40,890 --> 00:14:45,640 something that insures that, let's say, all rights 201 00:14:45,640 --> 00:14:48,300 by one processor are some point in 202 00:14:48,300 --> 00:14:52,420 the future eventually seen by another processor. 203 00:14:53,640 --> 00:14:56,105 And, we're going to say that a, you 204 00:14:56,105 --> 00:14:59,250 typically have some sort of cache coherence 205 00:14:59,250 --> 00:15:03,415 model, or cache coherence system, underlying system, 206 00:15:03,415 --> 00:15:07,360 which allows you to maintain some consistency model. 207 00:15:07,360 --> 00:15:10,870 So in effect here, what you're doing is cache 208 00:15:10,870 --> 00:15:15,640 coherence protocols are the underlying implementation which 209 00:15:15,640 --> 00:15:19,278 allows you to have these stronger guarantees. 210 00:15:19,278 --> 00:15:21,540 And the stronger guarantees are what makes the 211 00:15:21,540 --> 00:15:24,560 programmer's life easier as we discussed in last lecture. 212 00:15:24,560 --> 00:15:30,110 A programmer wants some sort of guarantee, and we discussed sequential consistency as 213 00:15:30,110 --> 00:15:36,166 one of the ways to go about doing that. Okay, so we're going to 214 00:15:36,166 --> 00:15:40,990 spend the whole rest of the class talking about how to go build a reasonable cache 215 00:15:40,990 --> 00:15:46,580 coherence protocol. And as I said, memory consistency models 216 00:15:46,580 --> 00:15:52,060 are just the rules that the cache coherence protocol, tries to, observe. 217 00:15:53,060 --> 00:15:55,283 And, an important thing here is, you can 218 00:15:55,283 --> 00:15:59,170 have different cache coherence protocols and different consistency models. 219 00:15:59,170 --> 00:16:01,618 So just because you have a cache coherent 220 00:16:01,618 --> 00:16:06,390 system, doesn't mean you have, for instance, sequential consistency. 221 00:16:06,390 --> 00:16:12,700 In fact sequential consistency is a very strict form of a consistency model. 222 00:16:12,700 --> 00:16:15,010 There are much looser models out there. 223 00:16:15,010 --> 00:16:19,930 In fact most processors go buy, commercial multi-processorers that 224 00:16:19,930 --> 00:16:25,290 you go buy, are not sequential. They do have cache coherence. 225 00:16:25,290 --> 00:16:27,568 But they, those cache coherence, 226 00:16:27,568 --> 00:16:32,620 implements something in, that is typically looser than sequential consistency. 227 00:16:32,620 --> 00:16:35,170 And people will actually define different consistency models. 228 00:16:35,170 --> 00:16:39,410 And we talked about a few of those things last time, like total store ordering. 229 00:16:39,410 --> 00:16:42,880 weak consistency models. Things, things like that. 230 00:16:43,940 --> 00:16:49,598 So it's a combination of the cache coherence protocol 231 00:16:49,598 --> 00:16:50,213 [COUGH] 232 00:16:50,213 --> 00:16:56,978 implementing a sequential consistency model which allows you to 233 00:16:56,978 --> 00:17:04,144 actually go build useful software systems on multi-processors.