Okay, so now that we've gone through the beginning exercises of what a directory-based distributed shared memory machine looks like, let's talk about how to actually figure out where the directory is.
So you have an address. In these systems you usually want to do this on physical addresses, not on virtual addresses, because you're sharing data between lots of different systems. At this point you're out on the system bus, and your address is no longer virtual. You've gone through the translation lookaside buffer (TLB) or the MMU and figured out what the physical address is.
So, to figure out which directory to use, you need a number that says which of these directories to go to. In a distributed memory machine this is sometimes called the home node. There are a lot of different ways to do this, but one of the more common ones is to just use some bits out of the address. You take the number of directories in the system and take the log base two of that. Then you use that many high-order bits of the address as the home node number. So when you take a cache miss and need to go do the load of that data, let's say, those bits tell you where the request has to go.
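To make the bit-selection scheme concrete, here is a minimal Python sketch. It assumes a power-of-two number of directories and a fixed physical address width. The function name `home_node` and the 40-bit width in the example are illustrative choices, not taken from any particular machine.

```python
def home_node(paddr: int, num_dirs: int, paddr_bits: int) -> int:
    """Pick the home directory from the high-order bits of a physical address.

    Sketch only: assumes num_dirs is a power of two and addresses are
    paddr_bits wide.
    """
    if num_dirs <= 0 or num_dirs & (num_dirs - 1):
        raise ValueError("this sketch assumes a power-of-two number of directories")
    node_bits = num_dirs.bit_length() - 1  # log2(num_dirs)
    # Shift away everything below the top node_bits bits, then mask for safety.
    return (paddr >> (paddr_bits - node_bits)) & (num_dirs - 1)


# 8 directories, 40-bit physical addresses: bits 39..37 name the home node.
print(home_node((5 << 37) | 0x1234, num_dirs=8, paddr_bits=40))  # 5
```

The request message for a miss would then carry `home_node(...)` as its destination.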
You send a message, and the destination of that message will be the home node. Hopefully your interconnect knows how to route the message to that directory.
Now, taking the high-order bits has some benefits. Let's take a look at that. As we discussed already, in a non-uniform memory access (NUMA) architecture the OS can control placement. It can do this because the high-order bits determine which node, or which directory, in the system an address goes to. So you can allocate memory, your stack, and your instruction space based on the physical address, and the OS controls that, because the OS has absolute authority over how physical addresses get doled out.
The downside is that a directory, or home node, can become a hot spot. Let's say all of a sudden all of the processors in your system try to access one page of memory. There's a hot page that holds all the locks in the system, and you're in some threaded program that has to access those locks a lot. Well, if you look at those addresses, they all differ only in the low-order bits, from here down, whatever your page size is, let's say.
So even if you don't have false sharing or anything like that, you typically try to pack data onto one page or into one structure. It's pretty hard to interleave that data based on the very high-order bits of your address, especially since a program has effectively no control over the high-order bits of a physical address; those are managed by the OS. So if you do this, one node can become a hot spot, because these addresses all alias to the same directory. All the messaging traffic goes to one node, and this almost starts to turn back into a bus: we have one directory, and all traffic has to go there. It's a little better, because we don't necessarily need to invalidate all other locations, but the directory and its bandwidth start to become critical.
Hm, well, that's a tough one. The flip side is to have the low-order bits determine where your directory is, or which home node you're using. You still have the offset within a cache line, but then the bits of the physical address just above that offset, the low-order bits, determine which home node you go to. This ends up being very well load balanced, because you choose different home nodes effectively at random depending on which cache line it is.
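The hot-spot argument and the interleaving alternative can be compared with a small sketch. It assumes 8 directories, 40-bit physical addresses, 64-byte cache lines and a 4 KiB page. Those numbers, and the address chosen for the "hot" page, are hypothetical. The sketch maps every line of that one page to a home node both ways and counts how many lines land on each directory.

```python
from collections import Counter


def log2_exact(n):
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def home_high(paddr, num_dirs, paddr_bits):
    """Home node from the top log2(num_dirs) bits of the physical address."""
    return (paddr >> (paddr_bits - log2_exact(num_dirs))) & (num_dirs - 1)


def home_low(paddr, num_dirs, line_bytes):
    """Home node from the bits just above the cache-line offset (line interleaving)."""
    return (paddr >> log2_exact(line_bytes)) & (num_dirs - 1)


PADDR_BITS, NUM_DIRS, LINE_BYTES, PAGE_BYTES = 40, 8, 64, 4096
hot_page = 3 << 37  # hypothetical page holding every lock in the program
lines = [hot_page + i * LINE_BYTES for i in range(PAGE_BYTES // LINE_BYTES)]

high = Counter(home_high(a, NUM_DIRS, PADDR_BITS) for a in lines)
low = Counter(home_low(a, NUM_DIRS, LINE_BYTES) for a in lines)
print(dict(high))                 # {3: 64}: one directory sees all the traffic
print(dict(sorted(low.items())))  # 8 lines on each of directories 0 through 7
```

With high-order selection, the whole page aliases to one directory. With line interleaving, consecutive lines rotate through all eight, but the OS can no longer steer a page to a node by choosing its physical frame.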
So accesses to the same cache line will go to the same home node, or the same directory. But you usually have many different cache lines in use, because it's pretty hard to fit all the data you want into one cache line. Those lines will be spread across the different controllers, and you'll effectively get a good distribution. The flip side is that the OS loses its placement ability here. So it's a tricky trade-off to think about.
Some people have even built systems where this is configurable. This gets a little more advanced, and I touch on it in the last slide of today's lecture. But you could think about systems that make different choices about how to do the mapping, depending on the actual address and on what comes out of your page table. Everyone has to agree on the mapping, though, which gets a little bit tricky. The directory has to agree on the mapping, and all of the caches in the system have to agree on it too.
Okay, so let's take a look at what is inside a directory. We added this new hardware structure, and whenever we add a new hardware structure, I like to look at all the bits inside it.
So we add a new hardware structure, and this structure has an entry per cache line in the particular memory connected to that directory. If you look across the entire system, there is an extra piece of data for every single cache line in the system. Sorry, not every cache line in the caches: every memory line in the system. So if you have ten terabytes of memory, the naive approach has a directory entry for every cache-block-sized chunk of memory in the system. These entries are held in big tables. Typically they're held in SRAM, though you might try to put them in DRAM.
So what do we have in here? Well, the directory needs to know what state the cache line is in. We're going to look at three different states in our basic protocol: shared, uncached, and exclusive. Everything starts out as uncached; it's out in main memory. When a line gets pulled into a cache read-only, the directory notes that it is now shared. If it gets pulled into a cache read/write, the directory notes it as exclusive.
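As a rough sizing exercise for the naive approach, the sketch below computes the size of a full-map directory with one entry per memory line. It assumes each entry holds 2 state bits (enough for U, S and E; transient states would need more) plus one sharer bit per cache. It also assumes 64-byte lines, 64 caches, and 10 TiB of memory. The lecture gives only the ten-terabyte figure; the other parameters are illustrative.

```python
def full_map_directory_size(mem_bytes, line_bytes, num_caches, state_bits=2):
    """Entries, bits per entry, and total bytes for a naive full-map directory."""
    entries = mem_bytes // line_bytes          # one entry per memory line
    bits_per_entry = state_bits + num_caches   # state + one sharer bit per cache
    total_bytes = entries * bits_per_entry // 8
    return entries, bits_per_entry, total_bytes


TiB = 2**40
entries, bits, size = full_map_directory_size(10 * TiB, line_bytes=64, num_caches=64)
print(entries, bits, size / TiB, size / (10 * TiB))
# 171798691840 66 1.2890625 0.12890625
```

Under these assumptions the directory adds about 1.29 TiB, roughly 13% on top of main memory. The fraction grows linearly with the number of caches, because each entry gains one bit per cache.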
Now, if a line is shared or exclusive, we need to know which nodes have it. If it's exclusive, we need to know uniquely which node has it, so we can message that node when we need to invalidate it. If it's shared, we need to know the list of all the places the line could be, because we'll have to send messages to all of them. This is better than having to broadcast messages to all the nodes in the system. So we're going to have what's called a sharer list here. In a naive full-map directory, the sharer list has one bit per core in the system, or per cache in the system. Each bit is just a one or a zero. A one means that core has a shared, read-only copy of the data. When some other cache goes to get the line writable, the directory has to invalidate the copy in every cache whose bit is one.
Now, if the line is exclusive, you're not going to have multiple bits set here. Exclusive means that core has a writable copy, and if we want to keep the data coherent, we don't want multiple writable copies in the system. So, as you can see here, only one bit is set. And if the line is uncached, we don't need to track anything there; those bits are just don't-cares.
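Here is a sketch of the full-map sharer list as a bit vector. It includes a check of the invariants described above: exactly one bit set in E, and don't-care bits in U. The class and function names are hypothetical. Requiring at least one sharer in S is an assumption of this sketch.

```python
class SharerList:
    """Full-map sharer list: bit i set means cache i may hold a copy of the line."""

    def __init__(self, num_caches):
        self.num_caches = num_caches
        self.bits = 0

    def add(self, cache_id):
        if not 0 <= cache_id < self.num_caches:
            raise ValueError(f"no such cache: {cache_id}")
        self.bits |= 1 << cache_id

    def members(self):
        return [i for i in range(self.num_caches) if (self.bits >> i) & 1]

    def invalidation_targets(self, requester):
        """Caches that must drop their copy before `requester` may write."""
        return [i for i in self.members() if i != requester]


def consistent(state, sharers):
    """Directory invariant: one bit in E, at least one in S, anything in U."""
    n = len(sharers.members())
    if state == "E":
        return n == 1
    if state == "S":
        return n >= 1
    return True  # U: the bits are don't-cares
```

Using a bit vector means adding a sharer is one OR. Finding whom to invalidate is a scan of the set bits, which is exactly the list of messages the directory must send.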
There's one other state here that I have, and it's pending. This state usually turns into a couple of sub-states, and there are different ways to track it. At the directory, these transactions take multiple steps: you send some messages and start transitioning. Let's say a cache wants to get some data writable. The directory is going to have to invalidate all the other copies. It can't do this instantaneously, but we want to provide the appearance of atomicity, so that the operations look atomic in some way. So typically you'll have some sub-states stored here that say something like, "this cache line is currently transitioning from, I don't know, U to E. Don't allow some other transaction to happen to it right now; just block that."
One way is to store this in the directory itself, as a state bit. Another way is to have a fully associative side structure that holds all of the cache lines currently in flux. The directory is smart enough to know that if some other request comes in for a line while it's in flux, it should NACK that request. It sends a negative acknowledgment and tells the other cache to retry. So you can do it either way.
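The sketch below models the side-structure option. It is a small associative table of lines whose directory transaction is in flight, and any new request for a busy line is NACKed so the requester retries. A Python dict stands in for the associative lookup. The capacity limit, and NACKing when the table is full, are assumptions added for illustration.

```python
class InFlightTable:
    """Side table of lines whose directory transaction is still in progress."""

    def __init__(self, capacity=4):
        self.capacity = capacity  # hypothetical size of the associative structure
        self.busy = {}            # line address -> transition label, e.g. "U->E"

    def begin(self, line, transition):
        """Return "ACCEPT" if the directory may start work on `line`, else "NACK"."""
        if line in self.busy or len(self.busy) >= self.capacity:
            return "NACK"  # requester must retry later
        self.busy[line] = transition
        return "ACCEPT"

    def finish(self, line):
        """Called once every message of the transition has completed."""
        del self.busy[line]


table = InFlightTable(capacity=2)
print(table.begin(0x40, "U->E"))  # ACCEPT
print(table.begin(0x40, "U->S"))  # NACK: the line is still in flux
```

Keeping in-flight lines in a separate table avoids spending extra state bits on every directory entry. The trade-off is that the table's size bounds how many transactions can be outstanding at once.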
Either way, it gets pretty complicated. We're not going to talk about all the details of that, but we'll talk about the high-level transitions, assuming that they are somehow atomic.
So here we're going to look at how MSI fits together with this. You could think about doing this with MESI or some other protocol, but MSI is a little bit simpler, so we're going to look at that. Also, the benefit of something like a MESI protocol is lessened with a directory. Suppose you pull something in in the exclusive state, which is unmodified at the beginning, and someone else wants a read-only copy. You're going to have to send a message to that core. That was inexpensive on a bus, because the core could just see the transaction going across, snoop it, and demote itself from E to S. But now it turns into actual work. The directory has to generate messages, and you have to wait for responses to come back from the cache that had the line in exclusive. So full MESI is a little less common when you go to these distributed shared memory protocols.
Okay, so this is a slide we had before. This is MSI on a bus.
Well, things change a little bit when we go to MSI for directory coherence. Before we go through this, I want to point out that there are actually two different state machines going on here. One state machine runs in the cache controllers, that is, in the cache of each respective processor. A different state machine runs in the directory. You'll see that they use different letters: U, S, and E versus M, S, and I. We label these differently on purpose, just so we don't get totally confused. These state machines interact by sending messages to each other. As messages flow between the directory and the cache, both go through state transitions in these two tables.
Okay, so let's jump into this. These are the same modified, shared, and invalid states that we have in our bus-based snoopy MSI protocol. We didn't change anything here, and the rules are the same. If you have the line modified, you can write to it without sending any messages. If you have it shared, you can read the data without contacting anybody. If you have it invalid and want to do anything with it, you need to contact somebody, specifically the directory.
Before, we would have sent the transaction on the bus. Likewise, the transitions from S to M or from M to S, where you used to communicate, are the same. So think of this as the same state machine. Before, we sent transactions across the bus. Now we turn those transactions into messages that we send to the directory, plus messages that we receive from the directory and have to respond to. Before, we were snooping traffic crossing the bus, which caused us to transition between states. Here, another processor had intent to write, we saw that across the bus, and so we had to transition ourselves to the invalid state. Now, instead, we're going to get a message from the directory controller.
So let's walk through this; it's almost exactly the same as what we saw before. This is the cache state for a particular line in processor P1. We'll start with the entry points. We start off in invalid, and let's say we want to get a readable copy of this line, so we take a read miss. P1 sends a read miss message to the directory controller. During that time it does not have a readable copy, so it cannot go and access the data.
It's effectively still in the I state. Sometimes people have a pending state here. Depending on how you implement this, you might track it in a side structure, something like a miss status handling register (MSHR), or you can track it in the cache state itself.
So you take the read miss, you send the read miss message, and you wait for a response. This response carries the data that you need, and it's also the synchronization point saying, "okay, you're safe to transition to S." That seems pretty simple.
It's a similar sort of thing for a write miss. If we're in the invalid state and we do a write, we send a write miss request to the directory controller. The directory is going to do something, and we may have to wait for a while, because it may have to go invalidate all of the other copies in the system. Then we get a response, and once we have the data, we can transition to the modified state.
As we said, these arcs are pretty easy. P1 can read in S and nothing changes, or P1 can read or write in the M state without communicating with anybody. But now we have a few different messages coming in here.
If we're in the shared state, we have to respond to an invalidation message, which is a little bit different from a bus snoop. Before, we saw another processor trying to write, and that's when we transitioned to I. Now the directory controller sends us a message that says "invalidate this line," and that transitions us to I. Note that we will probably have to send a reply, because the directory controller wants to know when all of the copies in the system have been invalidated. That may take a variable amount of time, and since the directory is sending messages, it waits for replies to come back.
This arc here is similar, except that we need to write back data, because we had modified, writable data. We get an invalidate message from the directory controller, so we write back the data and then reply afterwards. It's the same sort of idea.
Okay, so that leaves two arcs in the middle. We're in shared, and we want to write to that cache line. Before we can actually do the write, we have to send a message to the directory saying that we're doing a write miss here.
We want to get this data writable, and we have to wait for a reply before we transition. The directory controller has to communicate with all the other caches so that they no longer have readable copies and we can have a writable copy. It invalidates all the other readable copies in the meantime.
And finally, we have an edge from modified down to shared. This one is a little bit different, though it's the same idea: another processor is trying to do a read. We have the line in the modified state, another processor tries to read it, and we receive a read miss message. We don't need to invalidate our copy, but we do need to write back the data, because we had it modified and so we have the most up-to-date copy. So we write back the data, which serves as the response, and then we transition to the shared state. We can keep a read-only copy, because the other core is also only getting a readable copy.
Okay. Any questions about that so far?
Okay, so there are two more interesting arcs that we're going to add in here, this one and this one, which we didn't have in our base MSI protocol. And, you know, you may not need these.
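The cache-side arcs walked through above can be sketched as a per-line controller. It turns processor accesses into messages to the directory and reacts to messages from it. This is a simplified model that assumes messages arrive one at a time with no races. Message names like `ReadMiss`, `Fetch` and `InvalidateAck` are illustrative labels. The `pending` field plays the role of the pending state or MSHR entry mentioned earlier.

```python
class CacheLine:
    """One line's MSI state in one cache controller (toy model, no races)."""

    def __init__(self, cache_id):
        self.cache_id = cache_id
        self.state = "I"     # "M", "S" or "I"
        self.pending = None  # state to enter when the directory's data reply arrives
        self.outbox = []     # (message, sender) pairs sent to the directory

    def _send(self, msg):
        self.outbox.append((msg, self.cache_id))

    def processor_read(self):
        """Return True on a hit; on a miss, request the line and return False."""
        if self.state in ("S", "M"):
            return True
        if self.pending is None:
            self._send("ReadMiss")
            self.pending = "S"
        return False

    def processor_write(self):
        """Only M may be written without contacting the directory."""
        if self.state == "M":
            return True
        if self.pending is None:  # from I or S: ask for a writable copy
            self._send("WriteMiss")
            self.pending = "M"
        return False

    def receive(self, msg):
        if msg == "DataReply":  # synchronization point: safe to enter the new state
            if self.pending is None:
                raise RuntimeError("unexpected data reply")
            self.state, self.pending = self.pending, None
        elif msg == "Invalidate":  # S -> I or M -> I; just acknowledged in I
            if self.state == "M":
                self._send("DataWriteBack")  # dirty data must go home first
            self._send("InvalidateAck")
            self.state = "I"
        elif msg == "Fetch":  # another cache read-missed on our modified line
            if self.state == "M":
                self._send("DataWriteBack")
                self.state = "S"  # keep a read-only copy
        else:
            raise ValueError(f"unknown message {msg!r}")
```

For example, a read from I returns `False` and queues `("ReadMiss", id)`. The line becomes readable only after `receive("DataReply")`, which mirrors the "still effectively in I" behavior described for the read-miss arc.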
These two extra arcs correspond to evictions. Our cache has the data in it, and then, because of, let's say, a conflict miss or a capacity miss, the line gets bumped out. It might be a good idea to update the directory and tell it that in the future, if some other cache wants that data, the directory doesn't need to contact you again. So if the line is in the modified state, we write back the data, because we have dirty data. Then we notify the directory that we don't have a copy anymore, so it can transition the line to uncached.
Likewise, if we have a read-only copy, we may or may not want to do this. If there's extra bandwidth on the interconnect, we might want to send a message when we do an invalidation here. This is not an invalidation caused by an invalidation message; it's an invalidation because the line just gets bumped out of the cache. We may want to notify the directory, saying "please remove us from the sharer list." And if the sharer list then becomes empty, the directory might change the line from shared to completely uncached. But I do want to point out that these arcs are not strictly necessary.
The reason they're not strictly necessary is that we can build the cache controller to handle stale messages. Suppose a cache is in the invalid state for a particular line, and a message comes in that would have belonged to, let's say, this arc or that arc. The controller can just reply, "yeah, we don't have it anymore; we're invalid." We don't really care about that transition. If you were here, the only message that's really going to come to you is an invalidation message, which would just take you to this state anyway. So we can just reply the same way we would to a normal invalidation message.
Okay, so the directory state transitions look a little different here. We have uncached, shared, and exclusive. As we said, shared means there can be multiple read-only copies in the system. Exclusive means there's only one cache in the system with that data. What's interesting is that running a MESI protocol in the caches would not change the protocol running in the directory. Exclusive here is effectively the same state with respect to how the directory sees the line, so you wouldn't have to do anything different.
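Going back to the optional eviction arcs for a moment, this sketch shows what a cache might send when it evicts a line. It also shows how the directory could handle the optional notice for a clean line. The `EvictNotice` message name and the `notify_clean` switch are hypothetical. Dirty data has to be written back regardless, so in M the write-back doubles as the notification.

```python
def eviction_messages(state, cache_id, notify_clean=False):
    """Messages a cache sends when it evicts a line (conflict or capacity miss)."""
    if state == "M":
        return [("DataWriteBack", cache_id)]  # required: memory holds a stale copy
    if state == "S" and notify_clean:
        return [("EvictNotice", cache_id)]    # optional hint: drop me from the sharer list
    return []                                 # silent eviction, or nothing cached


def directory_on_evict_notice(sharers, cache_id):
    """Directory side of the optional notice: stay S unless the list empties."""
    remaining = set(sharers) - {cache_id}
    return ("S" if remaining else "U"), remaining


print(eviction_messages("S", 2))                  # []
print(directory_on_evict_notice({1, 2}, 1))       # ('S', {2})
```

If a cache evicts silently, a later `Invalidate` for that line simply gets acknowledged from I, so correctness does not depend on the notice. The notice only saves the directory an unnecessary message later.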
Okay, so let's walk through a few transitions of the state of the cache line in the directory, not in the cache. Let's start off uncached, and say we get a read miss message from processor P. We should transition to S. We give P a readable copy and reply with the actual data. We also put P on the sharer list, so that if someone else needs to invalidate that line, we know we need to contact P. Now that we're in the shared state, let's say there are other read misses from other processors. We give them the data and add each of them to the sharer list, so the sharer list just grows.
Okay, let's start here and go the other way: we're uncached, and we get a write miss from processor P. We give it the data, and the sharer list, or the owner, becomes uniquely P. We can give it the line in the exclusive state because the line was uncached, so we don't have to contact anybody else.
Let's look at this arc here before we go to these. It's quite a bit different from what we had on the previous slides, because it's doing something different.
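The transitions out of uncached and shared just described need no invalidations. So they can be captured as a pure function that maps the current state, the sharer set and the request to the next state, the new sharer set and the reply. In this sketch, a Python set stands in for the sharer bit vector, and `DataValueReply` is an illustrative message name. Arcs that need invalidations or fetches are deliberately left out.

```python
def directory_on_miss(state, sharers, kind, p):
    """Return (new_state, new_sharers, reply) for a miss from processor p.

    Only the arcs that need no invalidations are modeled:
    U --read--> S, S --read--> S, and U --write--> E.
    """
    if kind == "read" and state in ("U", "S"):
        return "S", sharers | {p}, ("DataValueReply", p)  # sharer list grows
    if kind == "write" and state == "U":
        return "E", {p}, ("DataValueReply", p)            # p is the unique owner
    raise NotImplementedError("this arc needs invalidations or a fetch")


state, sharers = "U", set()
state, sharers, reply = directory_on_miss(state, sharers, "read", 0)
state, sharers, reply = directory_on_miss(state, sharers, "read", 2)
print(state, sorted(sharers), reply)  # S [0, 2] ('DataValueReply', 2)
```

Because these arcs never wait on other caches, the directory can answer them in a single round trip. That is what makes them cheap compared with the arcs that follow.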
Now take the arc where the line stays exclusive. In this state, let's say processor P0 has the data exclusively. All of a sudden, a different processor, let's say processor P2, wants to write the data. Well, the line is already in the exclusive state, and it's going to stay in the exclusive state, because the new cache also wants it exclusive; it's just a different cache. So what has to happen is that we invalidate the data in P0. P0 writes back the data and transitions to the invalid state. We then provide the data to the new processor, P2, and make P2 the only entry on the sharer list. That's how we make this transition.
Finally, let's look at the edges between these two states. Actually, let's go this way first: suppose you have data that gets written back. This is the arc that I said is similar to the optional arc in the cache. Actually, this arc may not be optional; let's think about that for a second. No, it's still optional, because you can effectively just NACK the message and tell the requester the data is in main memory. Okay, so let's say you see a data write-back happening.
So a message gets sent to you, which is the 332 00:25:15,000 --> 00:25:19,174 equivalent of this arc here. The data was writable, was exclusive to 333 00:25:19,174 --> 00:25:24,085 some cache, and it's no longer writable. It's probably a good idea to go contact 334 00:25:24,085 --> 00:25:27,768 the directory, write back the data, and clear the sharer list. 335 00:25:27,768 --> 00:25:32,311 With the sharer list empty, the directory knows that no one has a copy at that 336 00:25:32,311 --> 00:25:38,012 point. Okay, a few other transitions here. Okay, we 337 00:25:38,012 --> 00:25:43,710 are in the shared state, so we have multiple read-only copies. 338 00:25:43,710 --> 00:25:49,813 And one cache comes along and says, "Oh, I need to send a write miss message. I need 339 00:25:49,813 --> 00:25:54,758 to get a writable copy." Well, now we actually have to go through 340 00:25:54,758 --> 00:25:57,801 a pretty long process. We're going to walk through the entire 341 00:25:57,801 --> 00:26:01,736 sharer list and send a message to every sharer on it saying, 342 00:26:01,736 --> 00:26:04,960 "Invalidate this copy and tell me when you're done." 343 00:26:04,960 --> 00:26:07,792 We're going to collect all the responses at the directory. 344 00:26:07,792 --> 00:26:11,455 And once all the responses have come back, we know no one else has a readable 345 00:26:11,455 --> 00:26:16,324 copy. We can give the data value to the 346 00:26:16,324 --> 00:26:21,680 requester and make it the only entry on the sharer, or owner, list. 347 00:26:24,100 --> 00:26:27,309 Okay, the last arc here is from E to S. 348 00:26:27,309 --> 00:26:33,078 This is the orange arc, and it happens if we have a particular line writable in one 349 00:26:33,078 --> 00:26:36,220 cache, and another cache now wants to read it. 350 00:26:36,220 --> 00:26:41,174 We'll send a read miss, and the other cache is going to downgrade from E to S, excuse me, 351 00:26:42,236 --> 00:26:47,262 from M to S, in its local cache.
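The Shared-state write miss is the long process just described: invalidate the sharers, count the acknowledgements at the directory, and only then reply. A minimal sketch, assuming hypothetical names (`write_miss`, `invalidate_ack`, the `"invalidate"` message tag) and one design choice the lecture does not spell out: if the requester is already on the sharer list, it is not sent an invalidation of its own copy.

```python
from enum import Enum

class DirState(Enum):
    UNCACHED = "U"
    SHARED = "S"
    EXCLUSIVE = "E"

class DirectoryEntry:
    """Hypothetical entry that starts in S with the given read-only sharers."""

    def __init__(self, sharers):
        self.state = DirState.SHARED
        self.sharers = set(sharers)
        self.pending_requester = None
        self.acks_outstanding = 0

    def write_miss(self, p):
        """Shared + write miss from p: invalidate every other sharer first."""
        assert self.state is DirState.SHARED and self.pending_requester is None
        targets = sorted(self.sharers - {p})   # assumption: skip p's own copy
        self.pending_requester = p
        self.acks_outstanding = len(targets)
        msgs = [("invalidate", q) for q in targets]
        if self.acks_outstanding == 0:          # no one else to wait for
            msgs += self._grant()
        return msgs

    def invalidate_ack(self, q):
        """Collect one acknowledgement; reply to the requester after the last one."""
        self.sharers.discard(q)
        self.acks_outstanding -= 1
        return self._grant() if self.acks_outstanding == 0 else []

    def _grant(self):
        # Every other copy is gone: hand the data over and record the sole owner.
        p = self.pending_requester
        self.pending_requester = None
        self.sharers = {p}
        self.state = DirState.EXCLUSIVE
        return [("data_value_reply", p)]
```

While `acks_outstanding` is non-zero the entry is effectively in a transient state: neither S nor E describes it accurately yet.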
But the directory is going to transition 352 00:26:47,262 --> 00:26:52,429 from E to S here, and we have to go get the most up-to-date data from that node. 353 00:26:52,429 --> 00:26:58,091 So we're going to send a fetch request to the node that previously had it 354 00:26:58,091 --> 00:27:03,825 exclusive. Once you get the most up-to-date data, you can forward 355 00:27:03,825 --> 00:27:09,346 that to the new reader, and we add that processor to the sharer 356 00:27:09,346 --> 00:27:11,750 list. Okay, 357 00:27:11,750 --> 00:27:17,058 any questions about that one so far? These do start to get a little 358 00:27:17,058 --> 00:27:19,620 complicated because you have multiple state machines interacting. 359 00:27:24,640 --> 00:27:27,297 Okay, so we're going to speed up a little bit here. 360 00:27:27,297 --> 00:27:31,342 I include this chart from your book just to give you an example. 361 00:27:31,342 --> 00:27:34,867 We went through all the different messages very quickly here, 362 00:27:34,867 --> 00:27:38,334 and this chart sums up all the different message types, 363 00:27:38,334 --> 00:27:41,627 which node each one can be sent from, and which node it can be sent to. 364 00:27:41,627 --> 00:27:45,730 This is in your textbook. Sometimes messages need to 365 00:27:45,730 --> 00:27:49,312 communicate addresses. Sometimes they need to communicate data. 366 00:27:49,312 --> 00:27:53,530 Sometimes they need to communicate which node the message is coming from, 367 00:27:53,530 --> 00:27:57,575 to add it to the sharer list. But I'm not going to go through this 368 00:27:57,575 --> 00:28:00,664 in great detail. One thing I did want to say is that these 369 00:28:00,664 --> 00:28:04,034 message types here do not include [INAUDIBLE]. 370 00:28:04,034 --> 00:28:12,020 So when you go to request something, there are replies that come back.
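The E-to-S arc involves three parties (the reader, the home directory, and the old owner), and the chart's point about what each message carries (address, data, source node) shows up naturally in code. A minimal sketch with a hypothetical `Msg` record and hypothetical message kinds `"read_miss"`, `"fetch"`, `"data_write_back"`, and `"data_value_reply"`; these strings are illustrations, not the textbook's exact message names. The old owner stays on the sharer list because it keeps a read-only copy after downgrading from M to S.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class DirState(Enum):
    UNCACHED = "U"
    SHARED = "S"
    EXCLUSIVE = "E"

@dataclass
class Msg:
    """Hypothetical message record; not every kind uses every field."""
    kind: str
    src: int                    # sender node, needed to update the sharer list
    dst: int
    addr: int
    data: Optional[int] = None  # only carried by data-bearing messages

class DirectoryEntry:
    def __init__(self, home, addr, owner):
        self.home, self.addr = home, addr
        self.state = DirState.EXCLUSIVE
        self.sharers = {owner}
        self.memory = None          # stale while a cache owns the line
        self.waiting_reader = None  # set while the fetch is outstanding

    def on_read_miss(self, m):
        assert m.kind == "read_miss" and self.state is DirState.EXCLUSIVE
        (owner,) = self.sharers
        self.waiting_reader = m.src
        # Ask the owner for the up-to-date data; it downgrades M -> S locally.
        return [Msg("fetch", self.home, owner, self.addr)]

    def on_data_write_back(self, m):
        assert m.kind == "data_write_back" and self.waiting_reader is not None
        self.memory = m.data
        reader, self.waiting_reader = self.waiting_reader, None
        self.sharers.add(reader)    # the old owner stays on the list as a sharer
        self.state = DirState.SHARED
        return [Msg("data_value_reply", self.home, reader, self.addr, m.data)]
```

Note that the `fetch` message carries no data and the `data_value_reply` carries no meaningful source for the sharer list, which mirrors the chart's point that different message types need different fields.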
371 00:28:13,120 --> 00:28:19,880 These replies aren't drawn in this 372 00:28:19,880 --> 00:28:22,853 diagram. We see a data value reply, but that's 373 00:28:22,853 --> 00:28:27,765 just for carrying actual data. There's no response shown coming back 374 00:28:27,765 --> 00:28:32,937 from a sharer to acknowledge the invalidation to the invalidator, or anything 375 00:28:32,937 --> 00:28:36,104 like that. Another type of message that is pretty 376 00:28:36,104 --> 00:28:40,176 common but not drawn here is a negative acknowledgement, or NACK. 377 00:28:40,176 --> 00:28:45,477 It's pretty common to have a cache line that is being transitioned, so it's in 378 00:28:45,477 --> 00:28:50,040 a pending state at the directory, and then get a request coming in for it. 379 00:28:50,040 --> 00:28:53,803 You might need to tell that cache to retry later: 380 00:28:53,803 --> 00:28:56,180 "I can't handle this request right now."
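The NACK case can be sketched with a single pending bit per directory entry. This is a hedged illustration with hypothetical names (`Reply`, `begin_transition`, `handle_request`, `request_with_retry`); it shows only the refuse-and-retry idea, and the `on_nack` callback stands in for whatever delay or backoff a real requester would use before resending.

```python
from enum import Enum

class Reply(Enum):
    ACCEPTED = "accepted"
    NACK = "nack"

class DirectoryEntry:
    """Hypothetical entry with a single 'pending' bit for in-flight transitions."""

    def __init__(self):
        self.pending = False

    def begin_transition(self):
        self.pending = True   # e.g. waiting on invalidation acks or a fetch

    def end_transition(self):
        self.pending = False

    def handle_request(self, src):
        # While the line is mid-transition, refuse instead of queueing.
        if self.pending:
            return (Reply.NACK, src)
        return (Reply.ACCEPTED, src)

def request_with_retry(directory, src, max_tries=4, on_nack=None):
    """Requester side: resend after each NACK; return the winning attempt or None."""
    for attempt in range(1, max_tries + 1):
        reply, _ = directory.handle_request(src)
        if reply is Reply.ACCEPTED:
            return attempt
        if on_nack is not None:
            on_nack(attempt)  # stand-in for backoff while the transition finishes
    return None
```

The appeal of NACKing is that the directory never has to buffer requests for a line that is mid-transition; the cost is extra network traffic from the retries.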