1 00:00:04,100 --> 00:00:08,972 So we're going to introduce this notion of snoopy cache coherence 2 00:00:08,972 --> 00:00:13,696 or snoopy cache. And this is work by Jim Goodman and one of 3 00:00:13,696 --> 00:00:19,220 his students who's now I think a UC Irvine faculty member. 4 00:00:19,220 --> 00:00:23,980 And they had this idea that have the cache 5 00:00:23,980 --> 00:00:29,215 watch. Or what they'll call is snoop either 6 00:00:29,215 --> 00:00:36,406 DMA transfers or later it was extended to transactions from other processors. 7 00:00:36,406 --> 00:00:41,180 Other caches. And then do the right thing. 8 00:00:41,180 --> 00:00:43,580 Now, what do we mean by, do the right thing? 9 00:00:44,710 --> 00:00:47,124 Well, I'm purposely being a little bit 10 00:00:47,124 --> 00:00:49,538 vague there, because there's sort of two or 11 00:00:49,538 --> 00:00:54,070 two classes of ideas that you can, you can do for doing the right thing. 12 00:00:54,070 --> 00:01:00,010 But the most basic, do the right thing, is If a cache sees a memory transaction 13 00:01:00,010 --> 00:01:04,880 go by for an address it has inside the cache, it needs to take some action. 14 00:01:05,990 --> 00:01:07,930 Now what is that action, could be? 15 00:01:07,930 --> 00:01:11,155 Well, maybe the most basic thing it could do is if it 16 00:01:11,155 --> 00:01:15,470 sees that transaction go by and it knows that it has an address. 17 00:01:15,470 --> 00:01:18,260 Which is going across the bus and someone is trying to do a. 18 00:01:18,260 --> 00:01:19,180 Memory transaction for 19 00:01:19,180 --> 00:01:21,450 an address, which it's caching. 20 00:01:21,450 --> 00:01:25,430 It might need to at least tell the other entity that something is going on. 21 00:01:25,430 --> 00:01:28,350 That it has a, the more up to date copy, somehow. 22 00:01:28,350 --> 00:01:32,590 And we're going to look at in today's class two different 23 00:01:32,590 --> 00:01:38,100 protocols, or two different classes of protocols to handle this problem. 24 00:01:39,650 --> 00:01:44,210 Now, if we go look at this the, the implementation of this we have a 25 00:01:44,210 --> 00:01:47,260 processor and we have a cache. And what do we have in a cache? 26 00:01:47,260 --> 00:01:53,219 Well, we have the data array, which actually has the data and then you 27 00:01:53,219 --> 00:01:59,520 also have a tag and state array. So the tags have the, the tag match logic. 28 00:01:59,520 --> 00:02:03,770 And it also has the actual tags. the upper bits if you will of the address. 29 00:02:05,010 --> 00:02:10,570 And the state array has whether the data's dirty and whether it's valid. 30 00:02:10,570 --> 00:02:15,380 and usually sort of least recently used or most recently used information. 31 00:02:16,820 --> 00:02:21,750 And the insight that the Snoopy cache work had was that instead of 32 00:02:21,750 --> 00:02:27,150 having only one port into the tag array, the tag and state array. 33 00:02:27,150 --> 00:02:29,460 Instead you add a second port. 34 00:02:29,460 --> 00:02:36,370 So the second port here is attached to the memory bus. 35 00:02:36,370 --> 00:02:42,290 And this second port has to watch all transactions on the bus. 36 00:02:42,290 --> 00:02:45,965 And if it sees a transaction, which has an 37 00:02:45,965 --> 00:02:51,540 address that is it has in its cache already. 38 00:02:51,540 --> 00:02:54,640 It has to somehow signal that there's a case where there's 39 00:02:54,640 --> 00:02:58,390 going to be sterile data and something needs to be done. 40 00:02:58,390 --> 00:03:01,558 So that might include removing the removing the data from 41 00:03:01,558 --> 00:03:02,860 its own cache. 42 00:03:02,860 --> 00:03:05,558 That might include it sending the data across 43 00:03:05,558 --> 00:03:08,895 the bus and telling the other entity data processor 44 00:03:08,895 --> 00:03:13,168 we'll say it or DMA agent is trying to read the data to wait a little bit. 45 00:03:13,168 --> 00:03:15,352 And we're going to look at that over the next 46 00:03:15,352 --> 00:03:18,620 few slides of what different things can be done there. 47 00:03:18,620 --> 00:03:22,400 But what's the downside to something like this snooping protocol? 48 00:03:22,400 --> 00:03:26,630 Well, every entity, every processor or every cache 49 00:03:26,630 --> 00:03:31,880 that sits on the shared bus has to watch all memory transactions. 50 00:03:31,880 --> 00:03:33,820 So they have to watch everything. 51 00:03:33,820 --> 00:03:38,230 And, the problem with this is the processor wants to access this cache. 52 00:03:38,230 --> 00:03:41,350 So it wants to be able to do operations can currently while 53 00:03:41,350 --> 00:03:44,760 the stuff happening at the bus, which is not related to it. 54 00:03:44,760 --> 00:03:49,200 So this requires our Snoopy cache tags to be dual ported. 55 00:03:49,200 --> 00:03:51,675 So, there is two ports to this tag which 56 00:03:51,675 --> 00:03:55,299 makes it bigger and potentially weighs power etcetera. 57 00:03:58,420 --> 00:03:58,780 Okay. 58 00:03:58,780 --> 00:04:04,765 So let's take a look at our shared memory multiprocessor, now that 59 00:04:04,765 --> 00:04:11,130 we are thinking about having snooping caches or snoopy caches. 60 00:04:11,130 --> 00:04:15,108 So in this picture here we now have a shared memory bus, in the 61 00:04:15,108 --> 00:04:20,110 middle we have three processors on the left and we have three snoopy caches. 62 00:04:21,270 --> 00:04:22,885 So now, whenever, 63 00:04:22,885 --> 00:04:29,155 let's say, processor 3, tries to do a transaction, it needs to, let's 64 00:04:29,155 --> 00:04:34,906 say broadcast onto the bus that it's trying to read a certain address. 65 00:04:34,906 --> 00:04:37,930 The other cache needs to be notified that they need 66 00:04:37,930 --> 00:04:41,620 to check their own tags then do the right thing. 67 00:04:41,620 --> 00:04:45,100 So what is do the right thing mean? 68 00:04:45,100 --> 00:04:47,420 So let's, let's, let's take a look at that. 69 00:04:49,905 --> 00:04:55,097 And we're going to broadly put snoopy cache coherence 70 00:04:55,097 --> 00:05:01,360 protocols into two different categories here. 71 00:05:02,700 --> 00:05:05,710 The first one we're going to call is update protocls. 72 00:05:05,710 --> 00:05:10,050 And the second one is going to be invalidation protocols. 73 00:05:11,790 --> 00:05:14,476 So what's, what's the difference? Well, 74 00:05:14,476 --> 00:05:20,170 we're trying to maintain the data to be consistent amongst all of the caches. 75 00:05:20,170 --> 00:05:22,090 So we're trying to get rid of this tail data. 76 00:05:24,330 --> 00:05:26,409 The first thing you do is you can try to do 77 00:05:26,409 --> 00:05:31,220 a write update or what sometimes people will call broadcast based protocols. 78 00:05:31,220 --> 00:05:35,290 So the basic idea is whenever you do a write, 79 00:05:35,290 --> 00:05:39,859 to your cache, you also write that on to the bus. 80 00:05:41,150 --> 00:05:43,020 And everyone is listening on the bus. 81 00:05:43,020 --> 00:05:44,190 Everyone's snooping on the bus. 82 00:05:45,830 --> 00:05:49,786 And if you let's say processor one does a write 83 00:05:49,786 --> 00:05:51,890 to address five. 84 00:05:51,890 --> 00:05:56,154 If processor two has address five in its cache, it's going to 85 00:05:56,154 --> 00:06:01,009 see the write and it's going to take the updated data and update itself. 86 00:06:02,170 --> 00:06:04,414 So this is going to this is sort of the 87 00:06:04,414 --> 00:06:08,010 moral equivalent of the write through case we saw before. 88 00:06:08,010 --> 00:06:11,373 But now on the bus when someone does a write, everyone updates their 89 00:06:11,373 --> 00:06:15,208 local caches by listening on the bus for the particular address and listening 90 00:06:15,208 --> 00:06:18,580 for the new updated data. So its a broadcast. 91 00:06:18,580 --> 00:06:25,906 So every write you do that transitions or every write you do sort of through 92 00:06:25,906 --> 00:06:33,040 a memory transaction is going to update all of the other caches in the system. 93 00:06:33,040 --> 00:06:34,230 Now, why is this good? 94 00:06:34,230 --> 00:06:37,896 Well, it guarantees that when another processor tries 95 00:06:37,896 --> 00:06:40,626 to do a read of, say, that address five, 96 00:06:40,626 --> 00:06:45,420 it now longer has a stale value. It has now the most up to date value. 97 00:06:45,420 --> 00:06:48,400 Because we broadcast it when we update it in place. 98 00:06:48,400 --> 00:06:51,420 We'll talk about that in a little more detail in a second. 99 00:06:51,420 --> 00:06:54,657 Second thing you can do is what is actually more 100 00:06:54,657 --> 00:06:56,815 commonplace today is called an 101 00:06:56,815 --> 00:07:00,760 invalidation protocol or write invalidate protocol. 102 00:07:01,910 --> 00:07:05,042 And a write invalidate protocol, 103 00:07:05,042 --> 00:07:12,030 whenever you do a write, you invalidate all other copies of a piece of data. 104 00:07:12,030 --> 00:07:17,574 And by invalidating all other cache copies, you've effectively 105 00:07:17,574 --> 00:07:22,680 remove the possibility that you could have stale data. 106 00:07:22,680 --> 00:07:26,560 So, let's look at this, these two protocols in more detail here. 107 00:07:27,570 --> 00:07:30,222 So, first of all we'll look at 108 00:07:30,222 --> 00:07:35,580 a write update based protocol or a broadcast based protocol. 109 00:07:35,580 --> 00:07:36,850 And we're going to look at two cases. 110 00:07:36,850 --> 00:07:42,990 A, write miss to the cache and a read miss to the cache. 111 00:07:45,050 --> 00:07:48,220 Okay. First of all, a write miss. 112 00:07:48,220 --> 00:07:50,710 So you go to a write miss. 113 00:07:50,710 --> 00:07:54,987 Well, if your miss in your cache and doing it right, you 114 00:07:54,987 --> 00:08:00,730 tell everyone else in the system that you are doing the write. 115 00:08:00,730 --> 00:08:04,074 And all other processors which are listening 116 00:08:04,074 --> 00:08:07,280 to the bus, update their copies in place. 117 00:08:08,340 --> 00:08:10,134 So, you broadcast on the bus, 118 00:08:10,134 --> 00:08:11,960 I'm running to address five. 119 00:08:11,960 --> 00:08:16,690 Somebody has address five in their cache, they update their copy internally. 120 00:08:18,570 --> 00:08:19,754 Okay. 121 00:08:19,754 --> 00:08:28,750 that, that sounds not, not not so, not so bad but there's a lot of bandwidths there. 122 00:08:28,750 --> 00:08:32,530 You're basically broadcasting rights to everybody. 123 00:08:32,530 --> 00:08:35,049 let's look at the read side. 124 00:08:36,690 --> 00:08:44,980 On the read miss case here, you know that main memory is always up to date. 125 00:08:44,980 --> 00:08:50,930 Because you are forced to basically write through to main memory. 126 00:08:50,930 --> 00:08:53,310 So you just go to main memory without reading this. 127 00:08:53,310 --> 00:08:56,020 And you don't have to even, sort of, check other caches. 128 00:08:56,020 --> 00:08:57,219 On let's go to read hit. 129 00:08:58,524 --> 00:09:01,818 a read hit your cache and you know that the data is in your cache and 130 00:09:01,818 --> 00:09:04,950 you know the data is up to date, so you don't have to worry. 131 00:09:07,090 --> 00:09:07,620 Okay. 132 00:09:07,620 --> 00:09:14,276 So let's say up, update protocol, let's take a look at a right invalidate 133 00:09:14,276 --> 00:09:20,480 protocol, what happens on a write miss and a, and a read miss. 134 00:09:20,480 --> 00:09:26,048 So on a write miss in a invalidation based protocol, you're going to 135 00:09:26,048 --> 00:09:32,288 actually invalidate all of the other caches, which have that address in them 136 00:09:32,288 --> 00:09:37,850 before you're allowed to do the right. Now how does this work? 137 00:09:37,850 --> 00:09:41,126 Well, on that shared multi-drop bus we screen, 138 00:09:41,126 --> 00:09:43,690 I'm going to do a write to address 5. 139 00:09:43,690 --> 00:09:47,540 And everyone else because they have a snoopy cache, is listening 140 00:09:47,540 --> 00:09:51,000 and they hear someone is doing a write to address 5. 141 00:09:51,000 --> 00:09:56,920 And what they do is invalidate their data. They remove it from their cache. 142 00:09:56,920 --> 00:09:57,360 And by virtue 143 00:09:57,360 --> 00:09:59,846 of them removing it from their cache, You can 144 00:09:59,846 --> 00:10:04,180 no longer have a stale value in the, their cache. 145 00:10:04,180 --> 00:10:06,420 Now, I will want to point out here though that, 146 00:10:06,420 --> 00:10:08,730 one of the things you might have to do is. 147 00:10:08,730 --> 00:10:10,930 When you go to do the invalidate, you might actually 148 00:10:10,930 --> 00:10:14,050 have to write back that data to main memory somehow. 149 00:10:14,050 --> 00:10:17,520 Because, if you have a, let's say a right back cache. 150 00:10:17,520 --> 00:10:22,460 And a different processor has a dirty piece of data let's 151 00:10:22,460 --> 00:10:29,280 say for address five and then processor one tries to write that data. 152 00:10:29,280 --> 00:10:32,340 Well, at some point you need to, sort of, merge the data in a cash line. 153 00:10:33,470 --> 00:10:36,300 Processor 2 might have to go and invalidate. 154 00:10:36,300 --> 00:10:39,110 But at least it knows that it has to do this invalidation. 155 00:10:39,110 --> 00:10:42,950 It has to do the invalidation before processor 1 does the write. 156 00:10:45,160 --> 00:10:49,700 But we can do this all on our Snoopy shared bus. 157 00:10:51,740 --> 00:10:55,233 Okay. What happens on a read miss? 158 00:10:55,233 --> 00:11:01,570 well, if no one else has the copy this is easy. 159 00:11:01,570 --> 00:11:06,020 You scream on the bus, you scream in the room saying does anyone have address 5? 160 00:11:06,020 --> 00:11:09,320 And then if it's an empty room no one 161 00:11:09,320 --> 00:11:14,300 yells back and no other caches have that data. 162 00:11:14,300 --> 00:11:16,764 But conversely if someone 163 00:11:16,764 --> 00:11:23,960 has a dirty copy of that data, they need to write it back to main memory. 164 00:11:23,960 --> 00:11:27,440 And you need to read that most up to date copy. 165 00:11:27,440 --> 00:11:30,364 So you need to flush the data and potentially 166 00:11:30,364 --> 00:11:34,130 invalidate the data depending on the cache coherence protocol. 167 00:11:34,130 --> 00:11:36,410 And we're going to look at a couple different cases of that. 168 00:11:38,560 --> 00:11:42,394 So before I move off this, I just want to, sort of, point out when 169 00:11:42,394 --> 00:11:45,447 write invalidate protocols versus write update or 170 00:11:45,447 --> 00:11:48,630 broadcast protocols are good and vice versa. 171 00:11:48,630 --> 00:11:54,290 So most processors you'll see today use some sort of invalidation protocol. 172 00:11:54,290 --> 00:11:56,875 And generally, those are those actually are going to work 173 00:11:56,875 --> 00:12:00,030 better because there's going to be less bandwidth across your bus. 174 00:12:00,030 --> 00:12:03,679 Because you only need to communicate across a bus 175 00:12:03,679 --> 00:12:08,460 when you take a cache miss or write or, or read. 176 00:12:08,460 --> 00:12:15,880 conversely update protocols you basically have to broadcast all, all your updates. 177 00:12:15,880 --> 00:12:19,444 But the, the, in cases where update-based protocol actually 178 00:12:19,444 --> 00:12:23,300 wins As if you have many readers and one writer. 179 00:12:24,450 --> 00:12:28,684 So let's say you have five processors which are reading the output of 180 00:12:28,684 --> 00:12:30,730 one processor via memory. 181 00:12:30,730 --> 00:12:33,830 So you have some sort of producer consumer relationship. 182 00:12:33,830 --> 00:12:38,454 In an update base protocol, you can have that one processor just write And it will 183 00:12:38,454 --> 00:12:42,270 broadcast and push the data to the five processors that are trying to do the read. 184 00:12:43,710 --> 00:12:46,635 In the invalidation protocol what's going to happen is 185 00:12:46,635 --> 00:12:50,080 you're basically going to have to have invalidates bouncing the 186 00:12:50,080 --> 00:12:54,110 data back and forth and back and forth every time you do a read or a write. 187 00:12:54,110 --> 00:12:58,786 So there's actually could be more bus communication in the validation protocol. 188 00:12:58,786 --> 00:13:03,520 For, one reader or excuse me, one writer, multi-reader cases. 189 00:13:05,170 --> 00:13:08,490 But for the rest of today, we're going to focus on 190 00:13:08,490 --> 00:13:14,220 invalidation or right invalidate, right invalidate protocols on a snoopy bus. 191 00:13:23,190 --> 00:13:26,821 Okay. So let's take a look at the extra 192 00:13:26,821 --> 00:13:31,987 information you need to add into a cache in order to 193 00:13:31,987 --> 00:13:37,030 go implement something like a write and validate 194 00:13:37,030 --> 00:13:42,599 protocol. And we're going to look at a base 195 00:13:42,599 --> 00:13:48,560 case here which is called a MSI protocol. 196 00:13:48,560 --> 00:13:52,390 This is one of the more, the basic ones that you'll see. 197 00:13:52,390 --> 00:13:55,820 and there's some extra bits that are added here. 198 00:13:55,820 --> 00:13:57,520 So let's look at the tag information. 199 00:13:57,520 --> 00:14:03,800 So, we have a tag for your processor and typically you have a valid bit. 200 00:14:03,800 --> 00:14:06,070 When you talk about having valid bits or dirty bits. 201 00:14:06,070 --> 00:14:09,080 Valid and dirty bits in something like a processor. 202 00:14:09,080 --> 00:14:13,686 as I was on the cache, excuse me. So, each cache line has this 203 00:14:13,686 --> 00:14:19,200 information on a per cache line basis. And instead of having, 204 00:14:19,200 --> 00:14:24,553 let's say, one valid bit and one dirty bit, we're going 205 00:14:24,553 --> 00:14:29,445 to use these two bits to encode a state in a state machine. 206 00:14:29,445 --> 00:14:36,810 And we're going to add, so, so we have two bits so we can code four things. 207 00:14:36,810 --> 00:14:38,730 We can definitely encode three things 208 00:14:38,730 --> 00:14:39,408 in two bits. 209 00:14:39,408 --> 00:14:43,808 So we're going to actually look at a protocol here that we're going to 210 00:14:43,808 --> 00:14:48,840 call the MSI protocol for this, that's named for the state names. 211 00:14:48,840 --> 00:14:55,455 So we have M, which is modified, S, which is shared and I, which is invalid. 212 00:14:55,455 --> 00:14:58,995 And we're going to say that invalid is 213 00:14:58,995 --> 00:15:03,420 basically whether the data is valid or not. 214 00:15:03,420 --> 00:15:04,059 If you're 215 00:15:04,059 --> 00:15:07,183 in the invalid state it is, it is invalid if you are in 216 00:15:07,183 --> 00:15:10,970 any of the outer states here, the data is valid in your cache. 217 00:15:12,880 --> 00:15:15,548 But now instead of having just a dirty bit, which tells 218 00:15:15,548 --> 00:15:18,490 you whether the data is dirty or not in your cache. 219 00:15:18,490 --> 00:15:23,270 Instead we're going to have two other states here, modified and shared. 220 00:15:23,270 --> 00:15:29,512 And what we're going to see is by adding these states We can guarantee some 221 00:15:29,512 --> 00:15:32,148 level of cache coherence. 222 00:15:32,148 --> 00:15:35,130 And we're going to implement a cache coherence protocol on top of this. 223 00:15:35,130 --> 00:15:41,370 And, we're going to remove some of the communication across 224 00:15:41,370 --> 00:15:47,870 the bus by remembering whether the date is widely 225 00:15:47,870 --> 00:15:53,860 shared or whether we have the sole copy. So, what are these three states. 226 00:15:55,120 --> 00:15:58,684 I is invalid, that's pretty self explanatory there, the data 227 00:15:58,684 --> 00:16:02,540 in your cache is invalid, you're not allowed to read it. 228 00:16:02,540 --> 00:16:05,162 If you try to access somebody who has an invalid bit 229 00:16:05,162 --> 00:16:07,870 set in your state bits, you are going to take a cache miss. 230 00:16:09,360 --> 00:16:11,728 And then your going to have to transition to one of these other two states. 231 00:16:11,728 --> 00:16:16,400 M is modified. 232 00:16:16,400 --> 00:16:20,261 So modified means that the data in the cache has 233 00:16:20,261 --> 00:16:26,230 been modified relative to what is in main, main memory. 234 00:16:26,230 --> 00:16:28,670 So, this is dirty data. 235 00:16:28,670 --> 00:16:34,220 And then, we're going to have this other state here, called shared or S for shared. 236 00:16:34,220 --> 00:16:39,832 Well, what is, what is shared? Well, S or shared state here is going 237 00:16:39,832 --> 00:16:44,890 to mean that you only have a read copy of the data. 238 00:16:46,040 --> 00:16:49,610 And someone else may also have a read-only copy of the data, 239 00:16:49,610 --> 00:16:53,570 hence the term shared. So it's shared amongst multiple people. 240 00:16:53,570 --> 00:16:57,854 But, we're going to say that when you're in the shared state you are not allowed to 241 00:16:57,854 --> 00:17:02,908 do a write. And one of the things I wanted to point 242 00:17:02,908 --> 00:17:11,910 out about this is that each cache line In each cache has this state machine there. 243 00:17:11,910 --> 00:17:17,860 So, different caches have their own copies of this state information. 244 00:17:17,860 --> 00:17:20,030 And the state information only relates to 245 00:17:20,030 --> 00:17:23,480 that particular cache or that particular processor. 246 00:17:23,480 --> 00:17:26,252 And the state diagram we're going to build up 247 00:17:26,252 --> 00:17:29,306 here is only in relationship to processor 1. 248 00:17:29,306 --> 00:17:33,906 And we'll see that because it's a snoopy cache, you might 249 00:17:33,906 --> 00:17:37,586 be in a state and then you might see a transaction 250 00:17:37,586 --> 00:17:39,370 go across the bus. 251 00:17:39,370 --> 00:17:44,130 And you might have to take some action to keep memory coherent. 252 00:17:44,130 --> 00:17:47,640 So, let's walk through this basic state diagram here. 253 00:17:47,640 --> 00:17:50,664 We'll see that when we're done we'll actually have the 254 00:17:50,664 --> 00:17:54,260 data be coherent when we go to implement something like this. 255 00:17:54,260 --> 00:18:00,810 So what's our first arc here? Well, let's say we have a read miss. 256 00:18:02,020 --> 00:18:02,956 You are, 257 00:18:02,956 --> 00:18:08,690 we are, in the invalidate state and we do a read miss. 258 00:18:09,710 --> 00:18:13,490 And, this arc here is basically is from invalid or it is from invalid, I 259 00:18:13,490 --> 00:18:15,821 just didn't draw it looping around because 260 00:18:15,821 --> 00:18:18,900 other wise the, the diagram gets too complex. 261 00:18:18,900 --> 00:18:20,570 So, you do a read miss. 262 00:18:22,120 --> 00:18:24,636 You shout onto the, onto the bus, I want 263 00:18:24,636 --> 00:18:27,966 to read some address or processor one here shouts onto 264 00:18:27,966 --> 00:18:30,760 the bus and says, I want to read an address. 265 00:18:32,060 --> 00:18:32,500 Okay. 266 00:18:32,500 --> 00:18:35,880 So, something happens on the bus. 267 00:18:35,880 --> 00:18:38,726 The other processors might have to do transitions. 268 00:18:38,726 --> 00:18:41,242 But ultimately, you're going to get the data 269 00:18:41,242 --> 00:18:43,510 or get the cache line from main memory. 270 00:18:43,510 --> 00:18:49,360 And your bringing into the shared state. So, you now have a read only copy. 271 00:18:49,360 --> 00:18:51,290 You can read this from your cache as much as you want. 272 00:18:51,290 --> 00:18:54,120 And you don't have to communicate on the bus anymore then. 273 00:18:54,120 --> 00:18:55,279 Because you just have to do a cache hit. 274 00:18:57,040 --> 00:19:00,400 Now, if you're right though, things get a little more complex. 275 00:19:02,680 --> 00:19:06,268 So, so this is, let's, let's take a look at 276 00:19:06,268 --> 00:19:11,030 the case here, that you have the data in shared state. 277 00:19:11,030 --> 00:19:15,110 But some other processor tries to do a read. 278 00:19:15,110 --> 00:19:17,510 So they shot on to the bus, I want to do 279 00:19:17,510 --> 00:19:20,800 a read of the address that is in the shared state. 280 00:19:22,810 --> 00:19:26,720 Well, if we go look at the state diagram here, that's okay. 281 00:19:26,720 --> 00:19:27,810 It's shared. 282 00:19:27,810 --> 00:19:30,294 We can have a read only copy and someone else 283 00:19:30,294 --> 00:19:32,990 can also have a read only copy at the same time. 284 00:19:35,690 --> 00:19:39,620 So, we just transition from S to S, we stay in the S state. 285 00:19:40,840 --> 00:19:42,345 We don't have to do anything when someone 286 00:19:42,345 --> 00:19:44,330 else reads out, some other processor reads the data. 287 00:19:46,640 --> 00:19:51,476 Now, let's say we're in the shared state and some other processor yells 288 00:19:51,476 --> 00:19:55,710 out onto the bus, saying I have an intent to do a write. 289 00:19:55,710 --> 00:19:59,344 I want to do a write to address that you have in your cache 290 00:19:59,344 --> 00:20:04,030 in the shared state or processor one has the cache in the shared state. 291 00:20:05,310 --> 00:20:11,748 Well, we said that stale data causes the incoherence of, of data and 292 00:20:11,748 --> 00:20:17,344 that's what we were trying to prevent. So, if you see some 293 00:20:17,344 --> 00:20:23,670 other processor trying to do a write, you just drop the data from your cache. 294 00:20:23,670 --> 00:20:27,500 Now, because you'll only have a read-only copy, you don't have to write this back. 295 00:20:27,500 --> 00:20:29,640 You don't even have to tell anybody about this. 296 00:20:29,640 --> 00:20:32,840 You just have to guarantee when they say, I want 297 00:20:32,840 --> 00:20:36,050 to do it right, that you transition from S to I. 298 00:20:37,070 --> 00:20:39,632 Now what this means, though, is if processor one 299 00:20:39,632 --> 00:20:41,706 wants to go read that data in the future, 300 00:20:41,706 --> 00:20:43,414 it's going to have to get a copy of it 301 00:20:43,414 --> 00:20:45,699 again, because the copy it had is now invalid. 302 00:20:48,410 --> 00:20:49,382 Okay. 303 00:20:49,382 --> 00:20:53,594 let's say you have the data in the shared state 304 00:20:53,594 --> 00:20:57,410 but now you want to do a write to that data. 305 00:21:00,400 --> 00:21:02,710 Can you write to it while it's in the shared state? 306 00:21:04,100 --> 00:21:05,460 No. 307 00:21:05,460 --> 00:21:06,085 Why? 308 00:21:06,085 --> 00:21:07,470 because someone else may have a copy of it. 309 00:21:07,470 --> 00:21:10,782 But you're in the shared state, that means it's shared 310 00:21:10,782 --> 00:21:13,940 and someone else could have a stale copy of that data. 311 00:21:15,100 --> 00:21:18,412 So you go to do a, a write, what you have to do before you 312 00:21:18,412 --> 00:21:20,572 transition your state is you need to 313 00:21:20,572 --> 00:21:23,370 broadcast your intent to write to that line. 314 00:21:23,370 --> 00:21:25,750 Or processor one has to broadcast 315 00:21:25,750 --> 00:21:29,720 onto the Snoopy bus that it wants to write to that line. 316 00:21:30,840 --> 00:21:31,090 Okay. 317 00:21:31,090 --> 00:21:33,816 So it says, I want to do a write, I want to do a write! 318 00:21:33,816 --> 00:21:36,516 And before transitions it needs to wait or give all the 319 00:21:36,516 --> 00:21:41,120 other processors time to invalidate and potentially write back the data. 320 00:21:41,120 --> 00:21:43,460 And different processors implement this in different ways. 321 00:21:43,460 --> 00:21:46,580 Some processors just wait a certain amount of time. 322 00:21:46,580 --> 00:21:49,850 There's a waiting period so you broadcast saying. 323 00:21:49,850 --> 00:21:50,786 I have an intent 324 00:21:50,786 --> 00:21:53,954 to write, but I'm not going to do the write until I 325 00:21:53,954 --> 00:21:57,940 some period of time in the future, it's like a waiting period. 326 00:21:57,940 --> 00:22:03,032 That gives enough time for the imposters to snoop the transaction, invalidate 327 00:22:03,032 --> 00:22:07,644 the data, and write it back to main memory, if they have copied. 328 00:22:07,644 --> 00:22:11,340 Or, let's say everyone else only has read-only copies of the data. 329 00:22:14,380 --> 00:22:17,188 They need a transition from S to I. 330 00:22:17,188 --> 00:22:18,781 So I want to point out here that, 331 00:22:18,781 --> 00:22:21,318 you know, all different, the different caches in 332 00:22:21,318 --> 00:22:24,920 the system don't have to have the data in the same state at the same time. 333 00:22:24,920 --> 00:22:27,880 This is a per-cache state. 334 00:22:27,880 --> 00:22:30,760 So, when you have an intent to write, you broadcast this 335 00:22:30,760 --> 00:22:34,920 information, and everyone else has to invalidate it from their cache. 336 00:22:34,920 --> 00:22:37,028 So whether they have it in S or I, they 337 00:22:37,028 --> 00:22:39,408 need to somehow or, or S or M, they need to 338 00:22:39,408 --> 00:22:40,440 transition to I. 339 00:22:41,730 --> 00:22:46,700 And only then can you transition to M and actually do it right. 340 00:22:46,700 --> 00:22:48,230 Now, I said there's other ways that you 341 00:22:48,230 --> 00:22:50,992 can go about doing this besides a waiting period. 342 00:22:50,992 --> 00:22:54,944 one potential thing that people do is they'll actually. 343 00:22:54,944 --> 00:22:57,784 broadcast their intents to write but instead of 344 00:22:57,784 --> 00:23:02,400 actually doing the write instead of having waiting period. 345 00:23:02,400 --> 00:23:04,650 They wait for an acknowledgement from all of the other 346 00:23:04,650 --> 00:23:11,008 caches somehow. that, that's functionally the same. 347 00:23:11,008 --> 00:23:14,284 you can either wait and sort of have a proof knowing that you've waited long 348 00:23:14,284 --> 00:23:18,270 enough or you can wait until everyone has responded back saying, yep I'm, I'm done. 349 00:23:21,380 --> 00:23:22,970 Okay. So what's this modified state? 350 00:23:22,970 --> 00:23:24,140 What does this give you? 351 00:23:24,140 --> 00:23:26,641 Well, if you're in the modified state, you can 352 00:23:26,641 --> 00:23:29,650 write the data And you can read the data. 353 00:23:29,650 --> 00:23:33,660 So, let's say processor 1 does a read or write. 354 00:23:33,660 --> 00:23:35,130 We just stay in the modified state. 355 00:23:35,130 --> 00:23:37,260 And you can do as many reads and writes as you want. 356 00:23:37,260 --> 00:23:39,235 And you never have to communicate on the bus. 357 00:23:39,235 --> 00:23:40,030 You have the sole copy. 358 00:23:40,030 --> 00:23:46,390 You have the, the, the token if you will. You can, you can read it, 359 00:23:46,390 --> 00:23:49,210 you can write it, no one, you're guaranteed that no one else has a copy. 360 00:23:52,170 --> 00:23:52,630 Okay. 361 00:23:52,630 --> 00:23:55,480 Let's say you're in the m state and 362 00:23:55,480 --> 00:24:00,110 some other processor broadcasts their intent to write. 363 00:24:02,430 --> 00:24:05,750 You're in the modified state, you have dirty data. 364 00:24:05,750 --> 00:24:07,420 You've done a write to this data. 365 00:24:07,420 --> 00:24:09,530 Main memory is out of date. 366 00:24:09,530 --> 00:24:13,999 So, the most basic thing you can do is invalidate 367 00:24:13,999 --> 00:24:18,530 your copy and give that copy back to main memory. 368 00:24:18,530 --> 00:24:20,760 You'll write back that date to main memory. 369 00:24:20,760 --> 00:24:24,360 And then the other processor will go and read that data 370 00:24:24,360 --> 00:24:28,190 from main memory and pull it in in their modified state. 371 00:24:28,190 --> 00:24:30,450 So you will transition from M to I. 372 00:24:30,450 --> 00:24:32,570 And they will transition, let's say from I to M. 373 00:24:35,400 --> 00:24:39,970 Okay. one more arc in this diagram here. 374 00:24:39,970 --> 00:24:45,026 What happens if you're in the modified state and some other processor tries 375 00:24:45,026 --> 00:24:48,910 to do a read or broadcast an intent to read across the bus? 376 00:24:49,980 --> 00:24:52,828 Well, what's nice here is you don't 377 00:24:52,828 --> 00:24:57,144 actually have to transition to the invalidation state. 378 00:24:57,144 --> 00:24:59,920 And the reason is you have the most up-to-date copy of the data. 379 00:25:01,110 --> 00:25:03,762 They need to see the most up-to-date copy of the data so that 380 00:25:03,762 --> 00:25:07,270 they don't get any stale data and you don't have any stale data. 381 00:25:07,270 --> 00:25:12,680 But you don't have to transition down to the invalidation state. 382 00:25:12,680 --> 00:25:15,639 You, you, you potentially could try to go down to the state. 383 00:25:15,639 --> 00:25:19,019 But it's going to, require you to actually go pull the data 384 00:25:19,019 --> 00:25:21,960 back in if you if you wanted to read in the future. 385 00:25:21,960 --> 00:25:26,140 So instead, in the MSI protocol, if another 386 00:25:26,140 --> 00:25:31,560 processor tries to do a read, you do the write back first. 387 00:25:31,560 --> 00:25:35,000 After you've done the write back, you transition to the shared state. 388 00:25:35,000 --> 00:25:38,498 And they pull it in in a read-only copy and you have a read-only 389 00:25:38,498 --> 00:25:42,550 copy so you both will transition to the shared state or the S state here. 390 00:25:49,260 --> 00:25:53,946 Okay. So, let's look at this in a little bit 391 00:25:53,946 --> 00:25:58,804 more detail here. yes, one more, one more arc 392 00:25:58,804 --> 00:26:03,644 I missed. Write miss so if you're in the 393 00:26:03,644 --> 00:26:09,138 modified state or sorry, you're in the invalid 394 00:26:09,138 --> 00:26:14,378 state. And you want to do it right 395 00:26:14,378 --> 00:26:15,530 to align. 396 00:26:18,150 --> 00:26:20,058 We didn't draw this arc but there's an arc 397 00:26:20,058 --> 00:26:22,300 that goes from invalid all the way around to here. 398 00:26:23,490 --> 00:26:27,108 And on this write miss, you broadcast your intent to write, you 399 00:26:27,108 --> 00:26:31,430 have to wait for everyone to do their invalidations up to main memory. 400 00:26:31,430 --> 00:26:33,839 And in the base case you pull that data in 401 00:26:33,839 --> 00:26:37,320 from main memory and you enter into the modified state. 402 00:26:39,370 --> 00:26:39,730 Okay. 403 00:26:39,730 --> 00:26:42,740 So, let's take a look at a basic two processor example here. 404 00:26:42,740 --> 00:26:46,580 Walking through a set of reads and a set of writes. 405 00:26:46,580 --> 00:26:49,450 And watch everything that happens here. 406 00:26:49,450 --> 00:26:52,810 So we're going to have processor 1 on the top, processor 2 on the bottom. 407 00:26:52,810 --> 00:26:56,970 And we're going to walk through different arc here as processor 1 does read and 408 00:26:56,970 --> 00:27:01,330 processor 2 does read and processor 1 does write and processor 2 does write. 409 00:27:02,400 --> 00:27:04,636 So, let's say the first thing 410 00:27:04,636 --> 00:27:08,280 that happen is processor 1 does a read. 411 00:27:08,280 --> 00:27:10,744 This is all to the same address or all to 412 00:27:10,744 --> 00:27:14,600 the same cache line or addresses within a cache line. 413 00:27:14,600 --> 00:27:15,570 So processor 1 does a read. 414 00:27:16,950 --> 00:27:20,180 We'll assume that all the data starts out invalid in all the caches. 415 00:27:20,180 --> 00:27:22,271 So usually when you reset your computer system, 416 00:27:22,271 --> 00:27:24,760 everyone's caches are invalid, how could they not be? 417 00:27:27,210 --> 00:27:30,030 So you're going to do a read miss, it's going to go up to main 418 00:27:30,030 --> 00:27:34,640 memory and you're going to bring the data into the shared state in processor one. 419 00:27:34,640 --> 00:27:37,370 Processor 2 is going to leave this data invalid for right now. 420 00:27:39,170 --> 00:27:39,910 Okay. 421 00:27:39,910 --> 00:27:45,660 Now processor one does it right. Well, it broadcasts its intent to write 422 00:27:45,660 --> 00:27:50,230 Processor 2 doesn't have this data and no one else in the system has the data. 423 00:27:50,230 --> 00:27:52,393 So no one, no one screams 424 00:27:52,393 --> 00:27:56,750 back I have the data on the bus. please wait. 425 00:27:56,750 --> 00:28:01,350 So instead processor one just goes ahead and transitions into the modified state. 426 00:28:01,350 --> 00:28:06,020 And it, in this case her it even has the most up to date date. 427 00:28:06,020 --> 00:28:09,590 So, it won't even have to go out to main memory potential. 428 00:28:12,310 --> 00:28:18,218 Okay. Then processor 1 does let's say Processor 429 00:28:18,218 --> 00:28:23,810 1 can do reason writes past this point. now processor 2 does a read. 430 00:28:25,290 --> 00:28:26,778 Well, if processor 2 does a read now 431 00:28:26,778 --> 00:28:29,568 we're going to have two different state transitions happening. 432 00:28:29,568 --> 00:28:31,540 We're going to see processor one do a state 433 00:28:31,540 --> 00:28:34,378 transition and processor two do a state transition. 434 00:28:34,378 --> 00:28:37,482 So processor 2 is in the invalid state 435 00:28:37,482 --> 00:28:39,392 right now. 436 00:28:39,392 --> 00:28:43,329 It's going to transition to the shared state on the read miss. 437 00:28:46,100 --> 00:28:52,870 At the same time because processor 1 has it in the modified state. 438 00:28:52,870 --> 00:29:00,070 It's going to have to in write back the data and transition to the shared state. 439 00:29:00,070 --> 00:29:02,930 So, both these processors are now in the shared state, so they both have read 440 00:29:02,930 --> 00:29:03,326 [INAUDIBLE] 441 00:29:03,326 --> 00:29:03,730 copies. 442 00:29:03,730 --> 00:29:11,704 And the most up to date copy has been pushed back or ridden back. 443 00:29:11,704 --> 00:29:12,890 Okay. 444 00:29:12,890 --> 00:29:15,910 So now let's say processor 2 does it right. 445 00:29:17,840 --> 00:29:20,910 It has in the shared state, this has the shared state. 446 00:29:20,910 --> 00:29:28,462 Well, when you go to do this right, the we're going to see an intent 447 00:29:28,462 --> 00:29:32,480 to write coming out of processor 2. 448 00:29:32,480 --> 00:29:34,928 And this is actually going to cause processor 449 00:29:34,928 --> 00:29:37,580 1 here to transition from shared to invalid. 450 00:29:38,720 --> 00:29:41,960 So, you know, this is kind of odd here, processor 2 451 00:29:41,960 --> 00:29:46,940 is doing something that's causing processor 1 states machine to transition. 452 00:29:46,940 --> 00:29:50,436 And it's because they're on the shared bus processor 453 00:29:50,436 --> 00:29:53,120 2 shouts, I'm going to do a write soon. 454 00:29:54,670 --> 00:30:01,196 Processor 1 sees this and it transitions. Once you know that processor 455 00:30:01,196 --> 00:30:07,530 1 has finished the transition, processor 2 now can transition to the modified state. 456 00:30:07,530 --> 00:30:12,205 And actually continue to do reads or writes to that line. 457 00:30:12,205 --> 00:30:14,340 Okay. 458 00:30:14,340 --> 00:30:16,886 So, let's say processor 1 wants to do a read 459 00:30:16,886 --> 00:30:19,767 to this line, this line's getting a lot of traffic 460 00:30:19,767 --> 00:30:20,670 to it. 461 00:30:20,670 --> 00:30:23,750 Processor 2 just wrote it and it's in the modified state. 462 00:30:24,840 --> 00:30:29,115 Processor 1's in the invalid state, all of a sudden what's going to 463 00:30:29,115 --> 00:30:34,620 happen is, it's going to want to try to transition to the shared state. 464 00:30:34,620 --> 00:30:36,650 But this modified state is going to cause a problem. 465 00:30:36,650 --> 00:30:45,230 So we're going to have to wait for the, we're going to have to wait 466 00:30:45,230 --> 00:30:49,663 for the right back to a curve here from 467 00:30:49,663 --> 00:30:54,934 modified to shared. So it's going to, invalidate that data 468 00:30:54,934 --> 00:31:00,269 and then finally, processor 1 is going to be able to bring into the shared state. 469 00:31:02,510 --> 00:31:02,880 Okay. 470 00:31:02,880 --> 00:31:05,360 So let's say processor one now does a read. 471 00:31:05,360 --> 00:31:09,790 So, it's in the shared state. This is in the shared state. 472 00:31:13,160 --> 00:31:17,725 We see the intent to write coming out of processor 1. 473 00:31:17,725 --> 00:31:23,289 So processor 2 is going to transition from shared to invalid and 474 00:31:23,289 --> 00:31:29,270 only then are you allowed to transition from shared to modified. 475 00:31:29,270 --> 00:31:33,098 And then as we get close to the end here let's say 476 00:31:33,098 --> 00:31:38,290 processor 2 tries to do a write to round everything out here. 477 00:31:38,290 --> 00:31:40,946 So we have processor 1 in the modified state. 478 00:31:40,946 --> 00:31:44,426 Processor 2 is invalid and it's going to try to go 479 00:31:44,426 --> 00:31:49,432 this way for the arc will probably come like this. 480 00:31:49,432 --> 00:31:55,450 at the same time processor 1 is going to have to invalidate this data, somehow. 481 00:31:55,450 --> 00:31:58,476 So we see the, the right try to happen and 482 00:31:58,476 --> 00:32:03,820 what's going to happen is, when it sees the intent to write. 483 00:32:03,820 --> 00:32:06,238 Processor 1's going to have to transition from 484 00:32:06,238 --> 00:32:08,780 modify to invalidate and write back that data. 485 00:32:08,780 --> 00:32:13,180 And only then can the rightness occur and processor 2 bring 486 00:32:13,180 --> 00:32:17,910 it into the write state or the modified state, excuse me. 487 00:32:19,490 --> 00:32:24,260 And finally, processor 1 can try to do a write. 488 00:32:25,810 --> 00:32:28,928 Now, why, why are we doing this? Well, it's to get this last 489 00:32:28,928 --> 00:32:33,278 arc in here so we want to transition over all the arcs in this diagram. 490 00:32:33,278 --> 00:32:36,630 So processor 1 tries to do a write to this address. 491 00:32:36,630 --> 00:32:37,790 It has it invalid. 492 00:32:37,790 --> 00:32:40,706 Processor 2 has it in modified and you'll 493 00:32:40,706 --> 00:32:44,430 see that it will broadcast its intent to write. 494 00:32:44,430 --> 00:32:46,590 And while it's doing this broadcast it needs to 495 00:32:46,590 --> 00:32:49,630 move from modified to invalidate, write back the data. 496 00:32:49,630 --> 00:32:51,250 And then processor 1 497 00:32:51,250 --> 00:32:51,700 [COUGH] 498 00:32:51,700 --> 00:32:56,350 brings in the background data and can do a write to that line. 499 00:33:04,700 --> 00:33:07,560 Okay. So a little bit of an observation here. 500 00:33:07,560 --> 00:33:11,735 So we've gone through the basic MSI protocol. 501 00:33:11,735 --> 00:33:17,510 But one of the big ideas here is that 502 00:33:17,510 --> 00:33:23,783 If a line is in the modified state in one cache, there can only 503 00:33:23,783 --> 00:33:29,977 be one copy of that data. And this is the important 504 00:33:29,977 --> 00:33:34,633 invariant that allows the MSI Snoopy based invalidation 505 00:33:34,633 --> 00:33:39,752 protocol for work. So this allows memory to stay 506 00:33:39,752 --> 00:33:44,944 coherent because there's only one copy of writable 507 00:33:44,944 --> 00:33:49,550 data anywhere. And if you, sort of, work through this and 508 00:33:49,550 --> 00:33:55,950 I recommend you do work through this this case a little bit more carefully. 509 00:33:55,950 --> 00:34:00,390 You'll see that you might have multiple people in the shared state. 510 00:34:00,390 --> 00:34:04,683 Or multiple processors, multiple processors caches that can 511 00:34:04,683 --> 00:34:08,810 share data but those are only read-only copies. 512 00:34:08,810 --> 00:34:10,550 So they can read it, they can read it, they can read it. 513 00:34:10,550 --> 00:34:16,070 Well, when someone tries to do a write, you invalidate all of the shares first. 514 00:34:17,300 --> 00:34:21,084 Then you can do the write. And if someone else tries to go read the 515 00:34:21,084 --> 00:34:25,020 data, you need to transition from modified to shared. 516 00:34:25,020 --> 00:34:25,650 Now, why do we do that? 517 00:34:25,650 --> 00:34:29,820 Why can't we leave someone in modified and someone else in shared? 518 00:34:29,820 --> 00:34:32,632 Well, it's because by virtue of being in the 519 00:34:32,632 --> 00:34:36,970 modified state, that processor can continue to do further writes. 520 00:34:38,380 --> 00:34:42,162 So if you continue further right, the cache, which brought it in let's 521 00:34:42,162 --> 00:34:46,220 say the shared state, the read state wouldn't be able to see those updates. 522 00:34:46,220 --> 00:34:49,273 So we guarantee the invariant for this whole system 523 00:34:49,273 --> 00:34:51,870 is that if a line is in the modified state. 524 00:34:51,870 --> 00:34:54,915 Then no other cache has a copy of the data or 525 00:34:54,915 --> 00:34:58,720 no other entity in the system has a copy of the data. 526 00:35:00,590 --> 00:35:01,500 That's really important. 527 00:35:04,410 --> 00:35:09,310 Okay. So let's try to enhance this MSI protocol. 528 00:35:12,370 --> 00:35:14,230 Why do we want to enhance the MSI protocol? 529 00:35:14,230 --> 00:35:17,496 Well, as you might imagine MSI protocol actually 530 00:35:17,496 --> 00:35:21,090 you have to communicate across the buzz relatively often. 531 00:35:21,090 --> 00:35:23,538 We're going to look at a couple from scheme 532 00:35:23,538 --> 00:35:27,400 that reduce the communication traffic across the bus. 533 00:35:27,400 --> 00:35:31,640 So you can either have a, a bus that clocks at a slower speed. 534 00:35:31,640 --> 00:35:34,715 Or you could have more processors on this bus 535 00:35:34,715 --> 00:35:38,015 as you think about better ways to actually implement 536 00:35:38,015 --> 00:35:39,950 a validation base protocol. 537 00:35:41,620 --> 00:35:47,600 And the first one we're going to look at is called MESI. 538 00:35:47,600 --> 00:35:51,500 And sometimes people call this the Illinois protocol because 539 00:35:51,500 --> 00:35:55,950 this, as, comes from University of Illinois Jack Patel. 540 00:35:57,140 --> 00:36:00,428 His research group. And this dates back to, to 1984. 541 00:36:00,428 --> 00:36:03,508 And there's actually, four states so instead 542 00:36:03,508 --> 00:36:05,818 of having three states in the, as we 543 00:36:05,818 --> 00:36:09,650 saw inside the MSI protocol, the insight here was. 544 00:36:09,650 --> 00:36:13,565 If we're using two steep bits to implement three 545 00:36:13,565 --> 00:36:17,670 states well, we have a fourth state we could use. 546 00:36:17,670 --> 00:36:20,750 You know, two to the two is four so we could even do four states. 547 00:36:20,750 --> 00:36:22,880 Instead of three states, for the same number of bets. 548 00:36:24,050 --> 00:36:27,230 Now, the question is, what states do you want to add, you know? 549 00:36:27,230 --> 00:36:28,850 Just because you have four states doesn't 550 00:36:28,850 --> 00:36:30,890 mean your protocol is going to be better. 551 00:36:30,890 --> 00:36:38,070 Well, we're going to have an invalid state, a shared state. 552 00:36:38,070 --> 00:36:38,270 Okay. 553 00:36:38,270 --> 00:36:43,734 Those look similar as what we had before. And now, we're going 554 00:36:43,734 --> 00:36:48,928 to have a new state called the exclusive state or exclusive 555 00:36:48,928 --> 00:36:54,329 but unmodified. And this is very similar to modified and 556 00:36:54,329 --> 00:36:59,744 exclusive state, the M state. And in fact, the 557 00:36:59,744 --> 00:37:05,294 insight here is we're going to take the state diagram from MSI 558 00:37:05,294 --> 00:37:10,658 and take the modified state and split it into two states. 559 00:37:10,658 --> 00:37:13,770 M and E. 560 00:37:13,770 --> 00:37:19,260 Well, modified and exclusive. Now why do we want to do this? 561 00:37:19,260 --> 00:37:24,306 Well, what we're going to see is that we're going to be able to reduce the 562 00:37:24,306 --> 00:37:30,090 communication on the bus, in the case where a processor reads a piece of data. 563 00:37:30,090 --> 00:37:34,270 And no one else reads or rights that data. 564 00:37:34,270 --> 00:37:37,620 And then that same processor later goes to write that data. 565 00:37:39,010 --> 00:37:41,272 So if you look at MSI when you go into 566 00:37:41,272 --> 00:37:44,800 a read, you always brought it into the shared state. 567 00:37:46,420 --> 00:37:50,161 In MESI or MESI protocol, when you go to do a read but 568 00:37:50,161 --> 00:37:56,180 no one else has a copy, you're going to bring it into the exclusive state. 569 00:37:56,180 --> 00:37:58,871 And this allows you to, you're not allowed to 570 00:37:58,871 --> 00:38:01,830 do a write when it's in the exclusive state. 571 00:38:01,830 --> 00:38:04,503 But, you can upgrade from exclusive to 572 00:38:04,503 --> 00:38:08,421 modified without having to communicate across the bus. 573 00:38:08,421 --> 00:38:08,883 But, you can upgrade from exclusive to 574 00:38:08,883 --> 00:38:09,630 modified without having to communicate across the bus. 575 00:38:09,630 --> 00:38:12,000 And this is an important difference 576 00:38:12,000 --> 00:38:14,844 than in the MSI protocol because here in the 577 00:38:14,844 --> 00:38:17,925 MSI protocol if you had something in the shared 578 00:38:17,925 --> 00:38:20,084 state And you want to go to a write to 579 00:38:20,084 --> 00:38:22,250 it, you need to broadcast your intent to write. 580 00:38:22,250 --> 00:38:26,080 And everyone needs to have a Snoop transaction occur on this. 581 00:38:26,080 --> 00:38:30,540 But in the MESI protocol, you don't need that extra work there. 582 00:38:30,540 --> 00:38:34,091 You know that you have the exclusive read only copy of the data so 583 00:38:34,091 --> 00:38:37,106 when you go to do a right you can, you can just go do 584 00:38:37,106 --> 00:38:39,570 the rights. And we're going to go walk through this. 585 00:38:39,570 --> 00:38:40,880 the state diagram now. 586 00:38:40,880 --> 00:38:44,840 And it's going to look pretty similar to the MSI protocol. 587 00:38:44,840 --> 00:38:51,130 And all of the arcs in this diagram of are in relationship to processor 1. 588 00:38:51,130 --> 00:38:54,028 And like I said before, different processors can 589 00:38:54,028 --> 00:38:56,730 have the same cache line in different states. 590 00:38:57,830 --> 00:38:58,250 Okay. 591 00:38:58,250 --> 00:39:02,240 So, let's take a look at this. Step one, you have a read miss. 592 00:39:03,650 --> 00:39:13,440 It's a, a, a read miss but note I have a little word next to here. 593 00:39:13,440 --> 00:39:17,320 So this is going from invalidate out of read miss. 594 00:39:17,320 --> 00:39:18,918 And we're going to see that there's going to 595 00:39:18,918 --> 00:39:20,880 be two arcs coming out of invalidate. 596 00:39:20,880 --> 00:39:21,510 On a read miss. 597 00:39:21,510 --> 00:39:23,398 One that goes into the shared state and 598 00:39:23,398 --> 00:39:26,110 that's actually going to go into the exclusive state. 599 00:39:28,120 --> 00:39:29,635 Now, what does this mean? 600 00:39:29,635 --> 00:39:34,193 Well, on a read miss, processor 1 here is going to 601 00:39:34,193 --> 00:39:38,070 shout out onto the bus, I want to do a read. 602 00:39:41,120 --> 00:39:46,193 On the bus, if someone responds back saying, I have a read or I have a 603 00:39:46,193 --> 00:39:51,692 shared copy of that data or I have a copy of that data, which is read only. 604 00:39:51,692 --> 00:39:55,517 What's going to happen is your gong to transition to the shared 605 00:39:55,517 --> 00:40:00,010 state here, because you don't want to invalidate the other caches data. 606 00:40:01,080 --> 00:40:03,816 And this is actually going to be in contrast to, 607 00:40:03,816 --> 00:40:06,264 if you tried to read miss, but no one else 608 00:40:06,264 --> 00:40:07,630 has a copy right now. 609 00:40:09,030 --> 00:40:10,440 Because if no one has a copy, we're 610 00:40:10,440 --> 00:40:12,940 actually going to transition straight to this exclusive state. 611 00:40:14,930 --> 00:40:16,030 we'll get there in a second. 612 00:40:16,030 --> 00:40:20,618 So, let's say some other processor tries to do a read and you have it in the 613 00:40:20,618 --> 00:40:25,190 shared state. Well, you just stay in the shared state. 614 00:40:26,550 --> 00:40:31,344 But you might need to respond back saying that yes you you 615 00:40:31,344 --> 00:40:33,700 have a copy of that that data. 616 00:40:34,784 --> 00:40:37,742 this is, this is important so that the other processor knows where to 617 00:40:37,742 --> 00:40:41,208 enter into the shared state or exclusive state when the're going to do read miss. 618 00:40:44,020 --> 00:40:44,952 Okay. 619 00:40:44,952 --> 00:40:48,486 let's say you're in the shared state and you 620 00:40:48,486 --> 00:40:51,950 want to do a write to this piece of data. 621 00:40:51,950 --> 00:40:56,710 Well, this looks similar, the exact same thing as what happens in the MSI protocol. 622 00:40:56,710 --> 00:41:00,490 You try to do the write, you broadcast your intent to write, you 623 00:41:00,490 --> 00:41:04,200 have to give everyone a chance to either respond to say that they 624 00:41:06,270 --> 00:41:09,460 Have a copy and they have to invalidate their copies and write back 625 00:41:09,460 --> 00:41:13,360 to main memory and then after that you transposition from the modified state. 626 00:41:13,360 --> 00:41:14,400 And now you can do your write. 627 00:41:16,160 --> 00:41:20,156 And just like as before, if you're in the modified state you can 628 00:41:20,156 --> 00:41:24,180 do a read or write to the line and you stay in the modified state. 629 00:41:25,770 --> 00:41:31,386 Okay, so finally now we come to this interesting case that you, 630 00:41:31,386 --> 00:41:35,943 you read miss. But instead of the data being already 631 00:41:35,943 --> 00:41:41,820 shared in other caches or already being in somebody else' s cache. 632 00:41:41,820 --> 00:41:44,669 You do a remiss, you're in invalid you come 633 00:41:44,669 --> 00:41:48,460 into here, but the date is not shared anywhere. 634 00:41:48,460 --> 00:41:50,360 So the data's not shared. 635 00:41:50,360 --> 00:41:51,080 Well, what does this mean? 636 00:41:51,080 --> 00:41:54,590 Well, you do the read miss here and you can come into the exclusive state. 637 00:41:56,270 --> 00:41:56,486 Now why 638 00:41:56,486 --> 00:41:56,930 is this good? 639 00:41:56,930 --> 00:42:00,760 Well, you have the only readable copy in the system. 640 00:42:00,760 --> 00:42:03,103 And why this is useful is if you try to do a 641 00:42:03,103 --> 00:42:07,640 write, you don't have to go communicate at on the bus anymore. 642 00:42:07,640 --> 00:42:10,492 So you're in the exclusive state, let's say proster 643 00:42:10,492 --> 00:42:13,140 one tries to read, that's fine, you stay here. 644 00:42:14,400 --> 00:42:18,600 Now, as, before, sort of, the exclusive state 645 00:42:18,600 --> 00:42:22,510 functionally looks very similar to the shared state. 646 00:42:22,510 --> 00:42:25,180 You only have a read copy, readable copy of this data. 647 00:42:25,180 --> 00:42:28,690 If you try to do a write, you need to do something. 648 00:42:28,690 --> 00:42:32,806 But this is the big difference here, when you go to 649 00:42:32,806 --> 00:42:37,540 do this write, you don't have to tell anybody about it. 650 00:42:39,020 --> 00:42:41,414 As far as everyone else is concerned, you had 651 00:42:41,414 --> 00:42:44,730 an exclusive copy and now you have a modified copy. 652 00:42:44,730 --> 00:42:46,930 You just transitioned your state diagram. 653 00:42:46,930 --> 00:42:47,830 You transitioned 654 00:42:47,830 --> 00:42:51,260 the state bits here to being in the modified state and do the write. 655 00:42:53,360 --> 00:42:54,870 So, you transition and then do the write. 656 00:42:57,050 --> 00:42:58,080 Now, why is this good? 657 00:42:58,080 --> 00:43:02,360 As I said, you don't have to communicate on the bus to do this transition. 658 00:43:02,360 --> 00:43:05,910 Versus, if you were in the MSI protocol where you only have shared. 659 00:43:05,910 --> 00:43:07,389 You need to go to first broadcaster 660 00:43:07,389 --> 00:43:09,500 or intent to write, which takes bus bandwidth. 661 00:43:09,500 --> 00:43:11,210 It also takes time. 662 00:43:11,210 --> 00:43:12,390 And then new transition. 663 00:43:12,390 --> 00:43:15,630 So this is trying to save us in this one case here. 664 00:43:16,715 --> 00:43:18,310 okay. 665 00:43:18,310 --> 00:43:20,130 Let's fill in the rest of the arcs here. 666 00:43:20,130 --> 00:43:22,262 The rest of the arcs look somewhat similar except 667 00:43:22,262 --> 00:43:25,255 when we sort of come out of the exclusive state and you'll see why. 668 00:43:25,255 --> 00:43:31,320 The write-miss case from invalid. 669 00:43:31,320 --> 00:43:32,690 This looks the same as MSI. 670 00:43:32,690 --> 00:43:34,330 You take a write miss here. 671 00:43:34,330 --> 00:43:39,042 You're going to broadcast your intent to write, wait for everyone to write back 672 00:43:39,042 --> 00:43:42,630 all the data and validate and then you in turn to the modified state. 673 00:43:42,630 --> 00:43:43,265 [COUGH] 674 00:43:43,265 --> 00:43:49,370 Now, here's the little bit more interesting art here. 675 00:43:50,390 --> 00:43:53,440 Let's say processor 1 has the data in the exclusive state. 676 00:43:54,520 --> 00:43:56,740 And then some other processor wants to do a read. 677 00:43:58,178 --> 00:44:01,978 Well, processor 1 needs to say I, I have a copy of that so, 678 00:44:01,978 --> 00:44:07,528 the other processor can't enter into the exclusive state when it tries to read. 679 00:44:07,528 --> 00:44:08,378 Instead it's going to 680 00:44:08,378 --> 00:44:10,700 come in this arc here with some shared data. 681 00:44:10,700 --> 00:44:14,804 Into the shared state but at the same time, processor 1 needs to 682 00:44:14,804 --> 00:44:20,180 transition from being in the exclusive state to being in the shared state. 683 00:44:20,180 --> 00:44:20,890 And why is this? 684 00:44:20,890 --> 00:44:25,197 Well, if some other processor has it in the shared state and you were to 685 00:44:25,197 --> 00:44:28,409 stay in the exclusive state, you could potentially 686 00:44:28,409 --> 00:44:30,980 try to do a write to that data. 687 00:44:30,980 --> 00:44:33,659 While the other person has a read-only copy of that data and you'll 688 00:44:33,659 --> 00:44:35,870 get the data out of sync and you'll have stale values. 689 00:44:37,600 --> 00:44:41,640 So you need to, you need to somehow fix that. 690 00:44:41,640 --> 00:44:44,970 And how we fix that is when someone else tries to do a write. 691 00:44:44,970 --> 00:44:46,994 Or excuse me, when someone else tries to do a read to the 692 00:44:46,994 --> 00:44:50,600 data and another processor tries to read, we transition from exclusive to shared. 693 00:44:50,600 --> 00:44:54,990 And the other processor brings the data in as Shared into their cache. 694 00:44:56,770 --> 00:44:57,200 Okay. 695 00:44:57,200 --> 00:44:58,712 Let's flush out the rest of 696 00:44:58,712 --> 00:45:00,598 the, the arch's here. 697 00:45:00,598 --> 00:45:04,510 this is similar to what we see before here if you're in the shared state. 698 00:45:04,510 --> 00:45:07,110 Some you see in other processors intent to right 699 00:45:07,110 --> 00:45:10,336 go across the bus for the line that you have. 700 00:45:10,336 --> 00:45:13,430 You just have to invalidate it. 701 00:45:13,430 --> 00:45:15,868 Thankfully you don't have to write anything back here 702 00:45:15,868 --> 00:45:19,000 because you are only have read a link copy. 703 00:45:19,000 --> 00:45:21,650 Similar, sort of, thing here for the exclusive state. 704 00:45:21,650 --> 00:45:24,040 You're in the exclusive state. You only have a read a link copy. 705 00:45:24,040 --> 00:45:27,230 You know you have the only copy in the system, though. 706 00:45:27,230 --> 00:45:31,040 But if some other processor tries to do a write. 707 00:45:31,040 --> 00:45:32,420 You need to do an invalid. 708 00:45:34,630 --> 00:45:35,459 Okay. 709 00:45:35,459 --> 00:45:40,310 let's see, is there anything else There are few more arcs. 710 00:45:40,310 --> 00:45:44,934 Modified this looks similar to what we saw before, if you are in the modified 711 00:45:44,934 --> 00:45:47,450 state, then a another processor tries to do 712 00:45:47,450 --> 00:45:50,180 a read, you transition into the shared state. 713 00:45:52,410 --> 00:45:54,280 Now, and you have to write back the data. 714 00:45:54,280 --> 00:45:57,080 But what you'll notice here is you can't 715 00:45:57,080 --> 00:46:01,050 transition to the exclusive state in this case. 716 00:46:01,050 --> 00:46:05,475 And in fact, the only entry point into the exclusive state is this read 717 00:46:05,475 --> 00:46:09,989 miss case here when the data is not shared or in another cache already. 718 00:46:11,430 --> 00:46:16,370 So when another processor tries to do a read, you have to write back the data. 719 00:46:16,370 --> 00:46:17,450 You transition to the shared 720 00:46:17,450 --> 00:46:19,710 state but this is the same as what we did in the MSI protocol. 721 00:46:22,690 --> 00:46:24,390 a couple of other things here. 722 00:46:24,390 --> 00:46:28,394 Another processor tries to do a write, you'll see their 723 00:46:28,394 --> 00:46:33,100 broadcast of their intent to write if you're in modified state. 724 00:46:33,100 --> 00:46:35,333 We need to clearly do an invalidate 725 00:46:35,333 --> 00:46:37,910 because only one cache can have dirty data. 726 00:46:39,050 --> 00:46:43,720 And so you see it, you write it back and you transition to the invalidation state. 727 00:46:43,720 --> 00:46:48,430 And that's, that's, that's the sum of this protocol here. 728 00:46:48,430 --> 00:46:53,920 Now what invariance do we have? What we said in 729 00:46:53,920 --> 00:46:59,100 the MSI protocol that you can only 730 00:46:59,100 --> 00:47:04,910 have one cache or one cache line for an 731 00:47:04,910 --> 00:47:04,911 [INAUDIBLE]. 732 00:47:04,911 --> 00:47:11,827 Or one line only in the modified state in one cache at a time for a 733 00:47:11,827 --> 00:47:19,600 particular cache slide. So there's something similar here. 734 00:47:19,600 --> 00:47:22,170 Well, yes, but it's a little bit different. 735 00:47:22,170 --> 00:47:25,366 Because we took the modified state and we 736 00:47:25,366 --> 00:47:30,320 split it into two similar, sort of, invariant here. 737 00:47:30,320 --> 00:47:36,468 Except now we say that only a, a given cache line can only be 738 00:47:36,468 --> 00:47:43,280 in either modified or exclusive in one cache at any given time. 739 00:47:43,280 --> 00:47:46,010 And, this guarantees that we never have stale data. 740 00:47:46,010 --> 00:47:49,420 And guarantees that we have a useful cache coherence protocol. 741 00:47:54,100 --> 00:48:00,094 Okay. next class we're going to expand this and 742 00:48:00,094 --> 00:48:06,658 talk about a few other cache coherence protocols. 743 00:48:06,658 --> 00:48:10,970 One we're going to talk about is what they used in the original AMD Opteron. 744 00:48:10,970 --> 00:48:13,820 Processors where they actually had more states. 745 00:48:13,820 --> 00:48:17,540 They added another state here called the own state. 746 00:48:17,540 --> 00:48:20,390 And they went from MESI to MOESI. 747 00:48:20,390 --> 00:48:24,790 And this own state is a optimization which allows you to 748 00:48:24,790 --> 00:48:30,029 do cache to cache transfers and effectively track who has the most 749 00:48:31,298 --> 00:48:35,385 up to date written data versus having to always write everything out 750 00:48:35,385 --> 00:48:39,100 to main memory and do memory based out to main memory transfers. 751 00:48:39,100 --> 00:48:42,126 And, we'll also talk a little bit about 752 00:48:42,126 --> 00:48:45,508 MESIF, which is something they use in some of 753 00:48:45,508 --> 00:48:49,602 the Intel processors, where they will also split 754 00:48:49,602 --> 00:48:54,210 the shared state into another state beyond MESIF. 755 00:48:54,210 --> 00:48:54,830 So let's stop here for today.