1
00:00:04,060 --> 00:00:10,883
Okay, so now that we've gone through the 
beginning exercises of what a directory 

2
00:00:10,883 --> 00:00:14,979
based distributed shared memory machine 
looks like. 

3
00:00:14,979 --> 00:00:20,360
Let's talk about how to actually figure 
out where the directory is. 

4
00:00:22,520 --> 00:00:25,600
So you have an address. 
And usually in these systems, you want 

5
00:00:25,600 --> 00:00:29,258
to do it on the physical address space. 
You're not going to want to do this on 

6
00:00:29,258 --> 00:00:32,098
virtual addresses. 
You don't want to have to translate it, 

7
00:00:32,098 --> 00:00:35,467
because you're sharing data 
between lots of different systems. 

8
00:00:35,467 --> 00:00:37,826
At this point you're sort of out on the 
system bus. 

9
00:00:37,826 --> 00:00:41,195
Your address is no longer virtual. 
You've gone through the 

10
00:00:41,195 --> 00:00:44,757
translation lookaside buffer or the 
MMU and you've figured out what the 

11
00:00:44,757 --> 00:00:51,360
physical address is. 
So, 

12
00:00:51,360 --> 00:00:54,810
to figure out the directory, or what is 
sometimes called, in a 

13
00:00:54,810 --> 00:01:02,613
distributed memory machine, the home node, 
you need a number for which one of 

14
00:01:02,613 --> 00:01:10,196
these directories to go to. 
And there are a lot of different ways to do 

15
00:01:10,196 --> 00:01:12,608
this. 
But one of the more common ones is to 

16
00:01:12,608 --> 00:01:19,160
just use some bits out of the address. 
So you take the number of directories in 

17
00:01:19,160 --> 00:01:22,495
the system. 
Take the log base two of that. 

18
00:01:22,495 --> 00:01:26,603
And then you take that many of the high-order bits to 
be the home node number. 

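A minimal sketch of the high-order-bit mapping just described, in Python. It assumes a power-of-two number of directories and a 40-bit physical address; the function name and both parameters are illustrative, not taken from any particular machine.

```python
def home_node_high_bits(paddr: int, num_dirs: int, paddr_bits: int = 40) -> int:
    """Return the home node (directory) for a physical address.

    Uses the top log2(num_dirs) bits of the physical address.
    """
    if num_dirs <= 0 or num_dirs & (num_dirs - 1):
        raise ValueError("num_dirs must be a power of two")
    if not 0 <= paddr < (1 << paddr_bits):
        raise ValueError("address does not fit in paddr_bits")
    node_bits = num_dirs.bit_length() - 1   # log2(num_dirs) for a power of two
    return paddr >> (paddr_bits - node_bits)


# 16 directories: the top 4 bits of a 40-bit address pick the home node.
print(home_node_high_bits(0x30_0000_0000, 16))  # prints 3
```

The request message for a miss on that address would then be sent to the returned node number.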
19
00:01:26,603 --> 00:01:30,205
So when you take a cache miss, and it's 
not in your cache. 

20
00:01:30,205 --> 00:01:34,755
And you need to go figure out and do the 
load of that data, we'll say. 

21
00:01:34,755 --> 00:01:39,747
You send a message, and the destination 
ID of that message will 

22
00:01:39,747 --> 00:01:43,792
actually be the home node. 
And hopefully, your interconnect knows 

23
00:01:43,792 --> 00:01:48,580
how to route the data to that directory. 
Now, 

24
00:01:48,580 --> 00:01:51,980
taking the high-order bits has some 
benefits. 

25
00:01:51,980 --> 00:01:55,975
Let's take a look at that. 
As we discussed already, in a 

26
00:01:55,975 --> 00:02:00,095
non-uniform memory access (NUMA) 
architecture, the OS can control the 

27
00:02:00,095 --> 00:02:04,034
placement. 
It can do this because, based on these high 

28
00:02:04,034 --> 00:02:09,298
order bits, you can actually determine 
which node in the system or which 

29
00:02:09,298 --> 00:02:14,630
directory in the system you're going to. 
So you can actually basically allocate 

30
00:02:14,630 --> 00:02:19,766
memory, allocate your stack, allocate 
your instruction space, based on the 

31
00:02:19,766 --> 00:02:22,350
physical address, and the OS controls 
that, 

32
00:02:22,350 --> 00:02:27,089
because the OS has absolute authority 
over where physical addresses get doled 

33
00:02:27,089 --> 00:02:32,260
out to. 
The downside is that a directory or a home node 

34
00:02:32,260 --> 00:02:38,036
can become a hot spot. 
So let's say all of a sudden, all of the 

35
00:02:38,036 --> 00:02:44,225
processors in your system try to access 
one page of memory. 

36
00:02:44,225 --> 00:02:48,426
There's, like, a hot page which has all 
the locks in the system. 

37
00:02:48,426 --> 00:02:54,440
And, you're in some threaded program and 
you have to access those locks a lot. 

38
00:02:54,440 --> 00:02:57,922
Well, if you look at that, that's all 
going to be down here. 

39
00:02:57,922 --> 00:03:00,581
It's going to be sort of low order 
addresses. 

40
00:03:00,581 --> 00:03:05,077
It might be from sort of here down, 
whatever your page size is, we'll say. 

41
00:03:05,077 --> 00:03:09,320
So even if you're not having false 
sharing or anything like that, 

42
00:03:10,740 --> 00:03:15,147
you typically would try to pack all the 
data onto a page or a structure or 

43
00:03:15,147 --> 00:03:19,908
something like that, and it's pretty hard 
to interleave it based on the very high 

44
00:03:19,908 --> 00:03:24,374
order bits of your address. 
And especially considering a program has 

45
00:03:24,374 --> 00:03:28,370
effectively no control of the high order 
bits of a physical address, 

46
00:03:28,370 --> 00:03:33,592
that's managed by the OS. 
So if you do this, one node can become a 

47
00:03:33,592 --> 00:03:37,004
hot spot, because these are all aliased to 
the same directory. 

48
00:03:37,004 --> 00:03:41,687
So all the messaging traffic goes to 
one node and this almost starts to turn 

49
00:03:41,687 --> 00:03:44,810
back into a bus. 
Now, we have one directory, all traffic 

50
00:03:44,810 --> 00:03:47,816
has to go there. 
It's a little better, because we don't 

51
00:03:47,816 --> 00:03:52,095
necessarily need to invalidate all other 
locations, but the directory and the 

52
00:03:52,095 --> 00:03:55,160
bandwidth in the directory starts to 
become critical. 

53
00:03:56,860 --> 00:04:03,143
Hm, well that's a tough one. 
The flip side is you can start to try to 

54
00:04:03,143 --> 00:04:07,562
have the low order bits determine where 
your directory is, 

55
00:04:07,562 --> 00:04:13,651
or which home node you're using. 
So you still have the offset within 

56
00:04:13,651 --> 00:04:17,615
a cache line. 
But then the 

57
00:04:17,615 --> 00:04:21,554
bits of the physical address that 
determine which home node you go to are 

58
00:04:21,554 --> 00:04:25,528
the low-order bits. 
Well, this ends up being very well load 

59
00:04:25,528 --> 00:04:29,058
balanced, 
because you choose different home nodes 

60
00:04:29,058 --> 00:04:32,761
effectively at random depending on which 
cache line it is. 

61
00:04:32,761 --> 00:04:38,485
So, you know, accesses to the same 
cache line will go to the same home node 

62
00:04:38,485 --> 00:04:42,862
or the same directory. 
But if you have different cache 

63
00:04:42,862 --> 00:04:47,979
lines, which is pretty common 
because it is pretty hard to fit all the 

64
00:04:47,979 --> 00:04:52,020
contended data, that much data, into 
one cache line, 

65
00:04:52,020 --> 00:04:56,862
[COUGH] you'll spread across the 
different controllers and you'll 

66
00:04:56,862 --> 00:05:02,780
effectively have some good distribution. 
The flip side, though, is the OS loses placement 

67
00:05:02,780 --> 00:05:06,831
ability here. 
So it's tricky, 

68
00:05:06,831 --> 00:05:09,629
it's a tricky trade-off here to think 
about. 

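To make the trade-off concrete, here is a small Python comparison of the two mappings on one hot 4 KiB page, such as the page full of locks from the earlier example. The 64-byte lines, 40-bit physical addresses, four directories, and the page address are all assumptions chosen for illustration.

```python
from collections import Counter

LINE_BYTES = 64    # assumed cache line size
PADDR_BITS = 40    # assumed physical address width
NUM_DIRS = 4       # assumed number of directories (home nodes), a power of two
NODE_BITS = NUM_DIRS.bit_length() - 1      # log2(NUM_DIRS)
OFFSET_BITS = LINE_BYTES.bit_length() - 1  # log2(LINE_BYTES)


def home_high(paddr: int) -> int:
    """Home node from the top NODE_BITS of the physical address."""
    return paddr >> (PADDR_BITS - NODE_BITS)


def home_low(paddr: int) -> int:
    """Home node from the bits just above the cache-line offset."""
    return (paddr >> OFFSET_BITS) & (NUM_DIRS - 1)


def lines_per_home(page_base: int, page_bytes: int, mapping) -> Counter:
    """Count how many cache lines of one page map to each home node."""
    return Counter(mapping(page_base + off) for off in range(0, page_bytes, LINE_BYTES))


hot_page = 0x12_3456_7000  # hypothetical page holding all the locks
print(lines_per_home(hot_page, 4096, home_high))
print(lines_per_home(hot_page, 4096, home_low))
```

With high-order bits, all 64 lines of the page land on one directory (the hot spot); with low-order interleaving, each of the four directories gets 16 of them, but the OS can no longer steer a page to a node of its choosing.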
69
00:05:09,629 --> 00:05:13,635
Some people have even built systems where 
it's configurable. 

70
00:05:13,635 --> 00:05:18,658
This gets a little more advanced. 
And I touched on this in the last slide 

71
00:05:18,658 --> 00:05:22,664
of today's lecture. 
But, [COUGH] you could think about having 

72
00:05:22,664 --> 00:05:26,161
some systems where, 
depending on the actual address and 

73
00:05:26,161 --> 00:05:28,959
depending on what comes out of your page 
table, 

74
00:05:28,959 --> 00:05:32,520
you might make different choices of how to 
do the mapping. 

75
00:05:32,520 --> 00:05:36,000
But everyone has to agree on the mapping. 
Which gets a little bit tricky because 

76
00:05:36,000 --> 00:05:37,923
the directory has to agree on the 
mapping. 

77
00:05:37,923 --> 00:05:40,900
And all of the caches in the system have 
to agree on the mapping. 

78
00:05:42,100 --> 00:05:45,061
Okay, so let's take a look at what is 
inside of a directory. 

79
00:05:45,061 --> 00:05:48,825
So we added this new hardware structure, 
and whenever we add a new hardware 

80
00:05:48,825 --> 00:05:53,460
structure I like to look at all the bits 
inside of the hardware structure. 

81
00:05:53,460 --> 00:05:58,836
So we add a new hardware structure, and this 
hardware structure has an entry per cache 

82
00:05:58,836 --> 00:06:02,398
line 
in that particular memory connected to 

83
00:06:02,398 --> 00:06:06,095
the directory. 
So if you were to look across the entire 

84
00:06:06,095 --> 00:06:11,539
system, there will actually be an extra 
piece of data for every single cache line 

85
00:06:11,539 --> 00:06:15,228
in the system. 
And the naive approach to this will have it 

86
00:06:15,228 --> 00:06:19,219
such that every single cache line in the 
system, whether it's. 

87
00:06:19,219 --> 00:06:23,996
Sorry, not every single cache line, every 
single memory line in the system. 

88
00:06:23,996 --> 00:06:29,166
So if you have ten terabytes of memory in 
the system, the naive approach is going 

89
00:06:29,166 --> 00:06:34,687
to have a directory entry for every 
single block size chunk of memory, a 

90
00:06:34,687 --> 00:06:39,140
cache block size chunk of memory in the 
system. 

91
00:06:39,140 --> 00:06:42,862
And these are held in big tables, 
typically they're held in SRAM. 

92
00:06:42,862 --> 00:06:47,871
You might try to put them in DRAM. 
And what do we have here? Well, the 

93
00:06:47,871 --> 00:06:53,451
directory needs to know what state the 
cache line is in and we're going to look 

94
00:06:53,451 --> 00:06:58,542
at three different states in our basic 
protocol here: shared, uncached, and 

95
00:06:58,542 --> 00:07:02,709
exclusive. 
So everything starts out as uncached; it's 

96
00:07:02,709 --> 00:07:07,602
out in main memory. 
When it gets pulled into a cache read 

97
00:07:07,602 --> 00:07:12,443
only, 
the directory is going to note that it is 

98
00:07:12,443 --> 00:07:18,689
now shared. 
If it gets pulled into a cache 

99
00:07:18,689 --> 00:07:23,880
read/write, the directory is going to 
note that as exclusive. 

100
00:07:23,880 --> 00:07:31,674
Now, if it's in shared or exclusive, we 
need to know what node, well if it's 

101
00:07:31,674 --> 00:07:34,603
exclusive, we need to know uniquely which node 
has it, 

102
00:07:34,603 --> 00:07:38,309
so we can go message it when we 
need to go invalidate it. 

103
00:07:38,309 --> 00:07:43,150
And if it's shared, we need to know the 
list of all possible places that it could 

104
00:07:43,150 --> 00:07:46,180
be, that we're going to have to send 
messages to. 

105
00:07:46,180 --> 00:07:49,607
And this is better than having to 
broadcast or send messages to all the 

106
00:07:49,607 --> 00:07:53,544
nodes in the system. 
So we're going to have what's called a 

107
00:07:53,544 --> 00:07:57,970
sharer list here, 
which, in a naive full-map directory, 

108
00:07:57,970 --> 00:08:03,988
is going to have one bit per core in the 
system, or per cache in the system. 

109
00:08:03,988 --> 00:08:08,161
And it's just going to have either a one or 
a zero in it. 

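Since the naive full-map directory keeps one entry per memory block, with one sharer bit per cache plus a few state bits, its size is easy to estimate. A rough Python sketch, assuming 2 state bits per entry (enough for U, S, and E; pending sub-states would need more), 64 caches, and 64-byte blocks; none of these numbers come from the lecture.

```python
def full_map_overhead(num_caches: int, block_bytes: int, state_bits: int = 2) -> float:
    """Directory bits per entry divided by data bits per memory block."""
    entry_bits = state_bits + num_caches    # state + one sharer bit per cache
    return entry_bits / (block_bytes * 8)


def full_map_bytes(mem_bytes: int, num_caches: int, block_bytes: int, state_bits: int = 2) -> int:
    """Total directory storage for one entry per block-sized chunk of memory."""
    entries = mem_bytes // block_bytes
    return entries * (state_bits + num_caches) // 8


GiB = 1 << 30
TiB = 1 << 40
# Assumed: 64 caches, 64-byte blocks.
print(f"{full_map_overhead(64, 64):.1%}")               # 66 bits per 512 data bits
print(full_map_bytes(10 * TiB, 64, 64) // GiB, "GiB")    # for ten terabytes of memory
```

Under these assumptions the directory costs about 12.9% extra storage, roughly 1320 GiB for ten terabytes of memory, and the sharer field grows linearly with the number of caches, which is what makes the naive full map expensive on large machines.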
110
00:08:08,161 --> 00:08:14,822
So if it's a one that means that core has 
a shared, or read-only, copy of the data. 

111
00:08:14,822 --> 00:08:21,724
And when some other cache goes to get it 
writable in its cache, it's going to 

112
00:08:21,724 --> 00:08:26,860
have to invalidate, let's say, this one, 
core zero's cache. 

113
00:08:28,640 --> 00:08:36,081
Now if you're exclusive, 
you're not going to have multiple bits set 

114
00:08:36,081 --> 00:08:38,476
here. 
Because this basically means that that 

115
00:08:38,476 --> 00:08:43,039
core has a writable copy, and if we 
want to keep the data coherent, 

116
00:08:43,039 --> 00:08:47,540
we don't want 
multiple writable copies in the system. 

117
00:08:47,540 --> 00:08:50,673
So as you can see here, there's only a single 
one here. 

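A directory entry as described so far can be sketched as a state plus a full-map sharer bit vector. This Python version is illustrative only: the class and method names are made up, and the pending sub-states are left out.

```python
from dataclasses import dataclass
from enum import Enum


class DirState(Enum):
    UNCACHED = "U"
    SHARED = "S"
    EXCLUSIVE = "E"


@dataclass
class DirEntry:
    state: DirState = DirState.UNCACHED
    sharers: int = 0   # full-map bit vector: bit i set means cache i may hold the line

    def add_sharer(self, cache_id: int) -> None:
        self.sharers |= 1 << cache_id

    def holders(self) -> list:
        """Caches the directory would have to message, e.g. to invalidate."""
        return [i for i in range(self.sharers.bit_length()) if (self.sharers >> i) & 1]

    def check_invariant(self) -> None:
        if self.state is DirState.UNCACHED:
            return                      # sharer bits are don't-cares
        count = bin(self.sharers).count("1")
        if self.state is DirState.EXCLUSIVE and count != 1:
            raise ValueError("an exclusive line must have exactly one owner")
        if self.state is DirState.SHARED and count == 0:
            raise ValueError("a shared line must have at least one sharer")


entry = DirEntry(DirState.SHARED)
entry.add_sharer(0)
entry.add_sharer(3)
entry.check_invariant()
print(entry.holders())   # [0, 3]
```

Invalidating a shared line then means sending one message per entry in `holders()` instead of broadcasting to every node.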
118
00:08:50,673 --> 00:08:55,596
And if it's uncached, we don't need to 
track anything there; those bits are just 

119
00:08:55,596 --> 00:09:00,459
don't-cares. 
[COUGH] There's one other state here that 

120
00:09:00,459 --> 00:09:05,330
I have, and it's pending. 
And this usually actually turns into a 

121
00:09:05,330 --> 00:09:08,910
couple of sub-states; there are different ways 
to track this. 

122
00:09:08,910 --> 00:09:13,013
At the directory, 
these transactions take multiple steps. 

123
00:09:13,013 --> 00:09:16,391
You're going to send some data and start 
transitioning. 

124
00:09:16,391 --> 00:09:19,444
Let's say you want to get data 
writable. 

125
00:09:19,444 --> 00:09:24,316
Well, the directory's going to 
have to invalidate all the other copies. 

126
00:09:24,316 --> 00:09:29,642
It can't do this instantaneously, but we 
want to provide the appearance of 

127
00:09:29,642 --> 00:09:33,670
atomicity, or that the operations 
are atomic in some way. 

128
00:09:33,670 --> 00:09:38,778
So typically, you'll actually have some 
sub-states that are 

129
00:09:38,778 --> 00:09:44,362
stored here, which are something like, oh 
this cache line is currently transitioning 

130
00:09:44,362 --> 00:09:48,653
from, I don't know, U to E. 
Don't allow some other transaction to 

131
00:09:48,653 --> 00:09:52,131
happen to it right now. 
Just kind of block that. 

132
00:09:52,131 --> 00:09:57,632
There are a couple of ways to do that. One way is to 
store it actually in the directory, as a 

133
00:09:57,632 --> 00:10:00,579
state bit. 
Another way is you have some fully 

134
00:10:00,579 --> 00:10:05,621
associative structure, a side structure, 
which just has all of the cache lines 

135
00:10:05,621 --> 00:10:09,943
currently in flux. 
And the directory's smart enough to know 

136
00:10:09,943 --> 00:10:15,414
that if some other request comes in for 
that line while it's in flux, just to 

137
00:10:15,414 --> 00:10:20,064
NACK that request, or negative 
acknowledge that request and tell the 

138
00:10:20,064 --> 00:10:23,295
other cache to retry. 
So you can do it either way. 

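The side-structure option can be sketched as a small table of lines that currently have a transaction in flight; any new request for one of those lines is NACKed and the requester retries. Everything here (the class, the capacity limit, the reply strings) is a hypothetical illustration in Python, not a specific machine's design.

```python
class InFluxTable:
    """Hypothetical side structure holding the cache lines currently in flux."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.busy = {}   # line address -> transition in progress, e.g. "U->E"

    def try_begin(self, line: int, transition: str) -> bool:
        """Start a transaction unless the line is busy or the table is full."""
        if line in self.busy or len(self.busy) >= self.capacity:
            return False
        self.busy[line] = transition
        return True

    def finish(self, line: int) -> None:
        """The transaction completed; the line can accept new requests."""
        del self.busy[line]


def handle_request(table: InFluxTable, line: int, transition: str) -> str:
    # A NACK tells the requesting cache to retry later.
    return "PROCEED" if table.try_begin(line, transition) else "NACK"


table = InFluxTable(capacity=8)
print(handle_request(table, 0x40, "U->E"))   # first request proceeds
print(handle_request(table, 0x40, "S->E"))   # same line still in flux: NACK
table.finish(0x40)
print(handle_request(table, 0x40, "S->E"))   # the retry now proceeds
```

Treating a full table as a NACK as well is a design assumption of this sketch; it keeps the structure small at the cost of extra retries under heavy load.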
139
00:10:23,295 --> 00:10:27,191
But it gets pretty complicated. 
We're not going to talk about all the 

140
00:10:27,191 --> 00:10:31,845
details of that but we'll talk about the 
high level transitions assuming that they 

141
00:10:31,845 --> 00:10:38,840
are somehow atomic. 
So here we're going to look at how MSI 

142
00:10:38,840 --> 00:10:44,217
fits together with this. 
But you could actually think about doing 

143
00:10:44,217 --> 00:10:49,532
this with MESI or some other protocol. 
MSI is a little bit simpler, and I emphasize a 

144
00:10:49,532 --> 00:10:52,818
little bit, so we're going to look 
at that. 

145
00:10:52,818 --> 00:10:58,273
Also the benefit of something like a MESI 
protocol is lessened in a directory 

146
00:10:58,273 --> 00:11:04,030
because if you pull something in, in the 
exclusive state, which is unmodified at 

147
00:11:04,030 --> 00:11:07,465
the beginning. 
[COUGH] And someone else wants to get a 

148
00:11:07,465 --> 00:11:10,532
read only copy. 
You're basically going to have to send a 

149
00:11:10,532 --> 00:11:13,716
message to that core. 
And that was inexpensive on a bus, 

150
00:11:13,716 --> 00:11:16,899
because it could just see the transaction 
going across. 

151
00:11:16,899 --> 00:11:21,645
And it would just snoop it and would 
demote from E to shared or something like 

152
00:11:21,645 --> 00:11:24,540
that, E to S. 
But now it actually turns into real 

153
00:11:24,540 --> 00:11:27,086
work. 
The directory's going to have to generate 

154
00:11:27,086 --> 00:11:29,575
messages. 
And you're going to have to wait for 

155
00:11:29,575 --> 00:11:33,280
responses coming back from a cache which 
had it in exclusive, so. 

156
00:11:33,280 --> 00:11:37,972
[COUGH] full MESI is a little bit less 
common when you go to these 

157
00:11:37,972 --> 00:11:45,116
distributed shared memory protocols. 
Okay, so this is a slide we had before. 

158
00:11:45,116 --> 00:11:50,527
This is MSI on a bus. 
Well things change a little bit when we 

159
00:11:50,527 --> 00:11:55,067
go to MSI for directory coherence. 
And before we go through this, I wanted 

160
00:11:55,067 --> 00:12:00,880
to point out that there are actually two 
different state machines going on here. 

161
00:12:00,880 --> 00:12:06,061
There's one state machine that is 
happening in the cache controllers, so 

162
00:12:06,061 --> 00:12:11,134
actually, in the cache of a respective 
processor. And then there's a different 

163
00:12:11,134 --> 00:12:14,471
state machine which is happening in the 
directory. 

164
00:12:14,471 --> 00:12:18,008
And you'll see that they have different 
letters here. 

165
00:12:18,008 --> 00:12:22,747
This is U, S, and E versus M, S, and I. 
And we label these differently on 

166
00:12:22,747 --> 00:12:25,750
purpose just to not get totally 
confused. 

167
00:12:25,750 --> 00:12:30,889
And these state machines interact by 
sending messages between each other, and 

168
00:12:30,889 --> 00:12:34,940
as messages flow between the directory 
and the cache, 

169
00:12:34,940 --> 00:12:40,258
they will both be going through 
different state transitions in 

170
00:12:40,258 --> 00:12:46,840
these two tables. 
Okay, so let's jump into this. 

171
00:12:46,840 --> 00:12:52,570
This is the same modified, shared and 
invalid states that we have in our 

172
00:12:52,570 --> 00:12:57,320
bus-based snoopy MSI protocol. 
We didn't change anything here. 

173
00:12:57,320 --> 00:13:02,597
And the rules are the same. 
If you have it modified, you can do a 

174
00:13:02,597 --> 00:13:05,840
write to this and not have to send any 
messages. 

175
00:13:07,500 --> 00:13:11,996
If you have it shared, you can read the 
data and not have to contact anybody. 

176
00:13:11,996 --> 00:13:16,672
If you have it invalid, and you want to 
do anything with it, you probably need to 

177
00:13:16,672 --> 00:13:19,849
contact somebody. 
Or you probably need to contact the 

178
00:13:19,849 --> 00:13:22,607
directory. 
Before, we would have to send the 

179
00:13:22,607 --> 00:13:26,444
transaction on the bus. 
Likewise, the transition from S to M or M 

180
00:13:26,444 --> 00:13:29,681
to S, where you used to communicate, 
is the same. 

181
00:13:29,681 --> 00:13:35,402
So think about this as the same state 
machine running, except that where 

182
00:13:35,402 --> 00:13:38,395
before we would send transactions 
across the bus, 

183
00:13:38,395 --> 00:13:42,094
now we're going to take those 
transactions and turn them into messages 

184
00:13:42,094 --> 00:13:45,794
that we send to the directory and 
messages that we receive from the 

185
00:13:45,794 --> 00:13:49,983
directory that we have to respond to. 
So before, we were snooping traffic 

186
00:13:49,983 --> 00:13:53,574
across the bus, which caused us to 
transition to different states. 

187
00:13:53,574 --> 00:13:57,437
So here, another processor has intent to 
write and we saw that across the bus. 

188
00:13:57,437 --> 00:14:00,430
So we had to transition ourselves to the 
invalid state. 

189
00:14:00,430 --> 00:14:04,825
Now, we're actually going to get a 
message from the directory controller. 

190
00:14:04,825 --> 00:14:09,538
So let's walk through this. 
It's almost exactly the same as 

191
00:14:09,538 --> 00:14:12,978
what we saw before. 
So this is the cache state for a 

192
00:14:12,978 --> 00:14:22,040
particular line for processor P1. 
We'll start with the entry points. 

193
00:14:22,040 --> 00:14:28,215
We start off in invalid and let's say we 
want to get a read, a readable copy of 

194
00:14:28,215 --> 00:14:32,720
this line. 
So we're going to take a read miss. 

195
00:14:32,720 --> 00:14:37,829
So what's going to happen is processor 
one is actually going to send a read miss 

196
00:14:37,829 --> 00:14:42,876
message to the directory controller. 
And during that time, it does not have a 

197
00:14:42,876 --> 00:14:45,804
readable copy. 
It cannot go and access the data. 

198
00:14:45,804 --> 00:14:49,800
It's effectively still in 
the I state. 

199
00:14:49,800 --> 00:14:53,387
[COUGH] Sometimes people will actually 
have sort of a pending state here 

200
00:14:53,387 --> 00:14:57,662
depending on how you go to implement this, 
whether you have a side structure, sort 

201
00:14:57,662 --> 00:15:01,053
of like a miss status handling register (MSHR), 
where you'll track that. 

202
00:15:01,053 --> 00:15:04,400
Or you can track that in the cache 
data itself. 

203
00:15:04,400 --> 00:15:10,041
[COUGH] So you're going to read miss. 
You send the read miss message, and 

204
00:15:10,041 --> 00:15:15,526
you're waiting for a response. 
This response is going to have the data 

205
00:15:15,526 --> 00:15:19,522
that you need. 
And it's going to be a synchronization 

206
00:15:19,522 --> 00:15:24,100
point saying, okay, you're safe to 
transition to S. 

207
00:15:24,100 --> 00:15:28,412
Okay that seems pretty simple. 
Similar sort of thing here for write miss 

208
00:15:28,412 --> 00:15:33,324
if we're in the invalid state and we 
do a write we're going to send a write 

209
00:15:33,324 --> 00:15:38,441
miss request to the directory controller. 
It's going to do something, and we 

210
00:15:38,441 --> 00:15:42,473
may have to be waiting for a while here 
because it may have to go invalidate all of 

211
00:15:42,473 --> 00:15:45,958
the other copies in the system. 
[COUGH] And then it gets a response and 

212
00:15:45,958 --> 00:15:49,841
once it gets a response, we have the data, 
and we can transition to the modified 

213
00:15:49,841 --> 00:15:53,704
state. 
So as we said, these arcs are pretty easy 

214
00:15:53,704 --> 00:15:59,807
we can read from the S state by P1 and nothing changes, or 
we can read or write from the M state by 

215
00:15:59,807 --> 00:16:06,009
P1 and also not communicate with anybody. 
But now we have a few different messages 

216
00:16:06,009 --> 00:16:09,420
coming in here. 
If we're in the shared state, 

217
00:16:09,420 --> 00:16:15,080
we have to be responsive to an 
invalidation message. 

218
00:16:15,080 --> 00:16:18,774
Which is a little bit different than a 
bus snoop. 

219
00:16:18,774 --> 00:16:22,580
So before, we saw another processor 
trying to write. 

220
00:16:22,580 --> 00:16:27,873
That's when we transitioned to I, but now 
the directory controller sends us a 

221
00:16:27,873 --> 00:16:33,236
message which says, invalidate this line 
and that will transition us to I here. 

222
00:16:33,236 --> 00:16:35,579
Note, 
there will probably be a reply. 

223
00:16:35,579 --> 00:16:40,357
We will probably have to send a reply, 
because the directory controller wants to 

224
00:16:40,357 --> 00:16:43,364
know 
when all of the copies in the system 

225
00:16:43,364 --> 00:16:47,308
have been invalidated, and it may take a 
variable amount of time, and it's sending 

226
00:16:47,308 --> 00:16:51,202
messages so it wants to wait for a reply 
to come back so we're going to have to 

227
00:16:51,202 --> 00:16:55,306
send a reply. 
So this arc here is similar, 

228
00:16:55,306 --> 00:16:59,659
except we need to write back data, because 
we had modified data. 

229
00:16:59,659 --> 00:17:03,802
We had writable data. 
We get an invalidate message from the 

230
00:17:03,802 --> 00:17:07,945
directory controller. 
So we need to write back the data, and 

231
00:17:07,945 --> 00:17:11,878
then reply afterwards. 
Similar sort of idea here. 

232
00:17:11,878 --> 00:17:15,108
Okay, 
so that leaves two arcs left here in the 

233
00:17:15,108 --> 00:17:19,201
middle. 
We're in shared, and we want to do a 

234
00:17:19,201 --> 00:17:23,611
write to that cache line. 
So, our cache has it in the shared 

235
00:17:23,611 --> 00:17:25,917
state. 
We want to do a write to it. 

236
00:17:25,917 --> 00:17:31,917
[COUGH] Before we can actually do a write 
we have to send a message to the 

237
00:17:31,917 --> 00:17:35,327
directory saying, I'm doing a write miss 
here. 

238
00:17:35,327 --> 00:17:41,842
I want to get this data writable. 
And we have to wait for a reply before we 

239
00:17:41,842 --> 00:17:46,037
transition here. 
[COUGH] because we have to wait for the 

240
00:17:46,037 --> 00:17:50,989
directory controller to communicate with all 
the other caches so that they don't have 

241
00:17:50,989 --> 00:17:53,760
readable copies and we can have a writable 
copy. 

242
00:17:55,560 --> 00:17:59,819
So it's going to invalidate all the other 
readable copies in the meantime. 

243
00:17:59,819 --> 00:18:05,129
And then finally, we have an edge coming 
this way which is from modified down to 

244
00:18:05,129 --> 00:18:07,852
shared. 
And this is a little bit different. 

245
00:18:07,852 --> 00:18:12,520
Well, it's the same idea here. 
Another processor is trying to do a read. 

246
00:18:13,540 --> 00:18:18,319
So we have it in the modified state when 
another processor tries to do a read. 

247
00:18:18,319 --> 00:18:22,968
So we receive a read miss message. 
We don't need to invalidate the data, 

248
00:18:22,968 --> 00:18:27,682
but we need to write back the data, 
because we have the most up-to-date copy, 

249
00:18:27,682 --> 00:18:31,675
because we had it modified. 
So we're going to go and write back the 

250
00:18:31,675 --> 00:18:36,193
data, and that's going to be the response, 
and then we're going to transition to 

251
00:18:36,193 --> 00:18:40,120
the shared state. 
We can keep a read copy of this, because 

252
00:18:40,120 --> 00:18:45,540
the other core is only 
getting a readable copy of it also. 

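The cache-side arcs walked through above can be summarized as a transition table. A minimal Python sketch, assuming (as the lecture does) that each transaction is atomic, so the wait for the directory's reply is folded into a single step; the event and message names are made up for illustration.

```python
# (current state, event) -> (next state, messages sent by this cache).
# PrRd/PrWr come from the local processor; Inv and ReadMiss come from the directory.
CACHE_FSM = {
    ("I", "PrRd"): ("S", ["ReadMiss"]),       # wait for the data reply, then S
    ("I", "PrWr"): ("M", ["WriteMiss"]),      # directory may invalidate others first
    ("S", "PrRd"): ("S", []),                 # hit: no messages
    ("S", "PrWr"): ("M", ["WriteMiss"]),      # need a writable copy
    ("M", "PrRd"): ("M", []),
    ("M", "PrWr"): ("M", []),
    ("S", "Inv"): ("I", ["InvAck"]),          # directory waits for this reply
    ("M", "Inv"): ("I", ["WriteBack", "InvAck"]),  # write back dirty data, then reply
    ("M", "ReadMiss"): ("S", ["WriteBack"]),  # another processor reads: keep a copy
}


def run(events, state="I"):
    """Apply a sequence of events to one cache line; return final state and messages."""
    sent = []
    for event in events:
        state, msgs = CACHE_FSM[(state, event)]
        sent.extend(msgs)
    return state, sent


print(run(["PrRd", "PrWr", "ReadMiss"]))
```

Pairs missing from the table raise `KeyError`, which in this sketch stands in for "cannot happen under atomic transactions".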
253
00:18:47,000 --> 00:18:48,430
Okay, 
so that's the cache side. 

254
00:18:48,430 --> 00:18:53,850
Any questions about that so far? 
Okay, 

255
00:18:53,850 --> 00:19:07,220
so two interesting arcs that we're going 
to add in here are this one and this one. 

256
00:19:07,220 --> 00:19:10,800
Which we didn't have in our base MSI 
protocol. 

257
00:19:11,860 --> 00:19:19,088
And you know, you may not need these. 
But what these correspond to is, if our 

258
00:19:19,088 --> 00:19:27,087
cache has the data in it and then because 
of let's say a conflict miss, or capacity 

259
00:19:27,087 --> 00:19:33,185
miss it gets bumped out. 
It might be a good idea to go update the 

260
00:19:33,185 --> 00:19:36,889
directory, 
and tell the directory that in the future 

261
00:19:36,889 --> 00:19:41,883
if some other cache wants to go get that 
data, that it doesn't need to go contact 

262
00:19:41,883 --> 00:19:45,703
you again. 
[COUGH] So, if it's in the modified state 

263
00:19:45,703 --> 00:19:50,258
we can write back the data because we 
have dirty data; we write it back to the 

264
00:19:50,258 --> 00:19:54,271
directory and then we notify the 
directory saying we don't have a copy of 

265
00:19:54,271 --> 00:19:57,200
this anymore; you can transition it to 
uncached. 

266
00:19:58,240 --> 00:20:02,268
Likewise here, if we have a read-only 
copy we may or may not want to do this. 

267
00:20:02,268 --> 00:20:06,085
If there's, you know, extra 
bandwidth on the interconnect, we 

268
00:20:06,085 --> 00:20:09,318
might want to send a message when we do 
an invalidation here. 

269
00:20:09,318 --> 00:20:13,506
And this is not an invalidation because 
of an invalidation message, but this is 

270
00:20:13,506 --> 00:20:17,260
an invalidation, because it just gets 
bumped out of the cache. 

271
00:20:17,260 --> 00:20:21,275
We may want to notify the directory 
saying please remove us from the sharer 

272
00:20:21,275 --> 00:20:25,444
list. 
And if the sharer list is then empty, 

273
00:20:25,444 --> 00:20:30,507
the directory might change the cache 
line from shared to being uncached 

274
00:20:30,507 --> 00:20:35,143
completely. 
But I do want to point out that these are 

275
00:20:35,143 --> 00:20:40,072
not strictly necessary. 
The reason they're not strictly necessary 

276
00:20:40,072 --> 00:20:44,202
is, if we build the cache controller 
system such that if you're in the invalid 

277
00:20:44,202 --> 00:20:48,439
state for a particular cache line, and 
you get some message coming in that would 

278
00:20:48,439 --> 00:20:52,032
have been let's say this message, or that 
message, or some other arc. 

279
00:20:52,032 --> 00:20:55,411
We can just reply back saying yeah, we 
don't have it anymore. 

280
00:20:55,411 --> 00:20:59,380
We're invalid, you know, we don't really 
care about that, that transition. 

281
00:21:00,560 --> 00:21:04,624
So if you were here, the only 
message that's going to come really to 

282
00:21:04,624 --> 00:21:08,689
you is an invalidation message that would 
just take you to this state anyway. 

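The optional eviction arcs, and the reason the shared one can be skipped, fit in a couple of helper functions. Again a hedged Python sketch with made-up message names. Note that in this sketch a dirty M line is always written back when evicted, since otherwise the only up-to-date copy would be lost; only the notification for a clean shared copy is treated as optional.

```python
def evict(state: str, notify_shared: bool = True):
    """Conflict or capacity eviction of one line; returns (next state, messages)."""
    if state == "M":
        return "I", ["WriteBack"]       # dirty data goes home; directory can go to U
    if state == "S":
        # Optional hint so the directory can drop us from the sharer list.
        return "I", (["EvictNotice"] if notify_shared else [])
    return "I", []                      # already invalid: nothing to do


def on_invalidate(state: str):
    """Reply to an invalidation from the directory."""
    if state == "M":
        return "I", ["WriteBack", "InvAck"]
    # S, or I after a silent eviction: just acknowledge, nothing to write back.
    return "I", ["InvAck"]


# Silent eviction of a shared copy, followed by a stale invalidation.
state, msgs = evict("S", notify_shared=False)
print(state, msgs, on_invalidate(state))
```

Because the I-state reply is the same acknowledgment a sharer would send, the directory cannot tell a silent eviction apart from a normal invalidation, which is why the notification is not strictly needed.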
283
00:21:08,689 --> 00:21:12,701
So, we can just ignore the message or 
just reply the same as we would to the 

284
00:21:12,701 --> 00:21:20,951
normal invalidation message. 
Okay, so directory state transition looks 

285
00:21:20,951 --> 00:21:29,960
a little different here. 
We have uncached, shared, and exclusive. 

286
00:21:31,780 --> 00:21:37,075
[COUGH] As we said, shared means there 
can be multiple read-only copies in the 

287
00:21:37,075 --> 00:21:40,402
system. 
Exclusive means there's only one cache in 

288
00:21:40,402 --> 00:21:45,019
the system with that data. 
What's interesting here is if you were to 

289
00:21:45,019 --> 00:21:50,662
actually have a MESI protocol running, 
that would not change the protocol 

290
00:21:50,662 --> 00:21:55,875
running in the directory. 
Because exclusive here is effectively the 

291
00:21:55,875 --> 00:22:00,824
same state, 
with respect to how the directory sees 

292
00:22:00,824 --> 00:22:03,620
the line you won't have to do anything 
different. 

293
00:22:03,620 --> 00:22:08,070
Okay, so let's walk through a few 
transitions here of the state of the cache 

294
00:22:08,070 --> 00:22:10,980
line in the directory and this is not in 
the cache. 

295
00:22:12,060 --> 00:22:17,503
Let's start off uncached and let's say 
we're getting a message which is a read 

296
00:22:17,503 --> 00:22:22,484
miss from processor P. 
Well, we should transition to S now. 

297
00:22:22,484 --> 00:22:27,838
We should give it a readable copy and we 
should reply with the actual data and we 

298
00:22:27,838 --> 00:22:33,322
should put P on the sharer list, so that 
we know that if someone else needs to go 

299
00:22:33,322 --> 00:22:36,760
invalidate that line we need to go 
contact P. 

300
00:22:36,760 --> 00:22:41,169
Now that we're in the shared state, let's 
say there are other read misses from other 

301
00:22:41,169 --> 00:22:44,664
processors here. 
Well, we're going to give it the data 

302
00:22:44,664 --> 00:22:49,720
and we're going to add it to the sharer 
list. So we take the sharers and add to it. 

303
00:22:49,720 --> 00:22:53,340
The sharer list is just 
going to grow. 

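The two read-miss arcs on the directory side are simple enough to sketch directly. This hypothetical Python handler covers only the uncached and shared states walked through so far; the function name and reply format are invented for illustration.

```python
def dir_read_miss(state: str, sharers: frozenset, requester: int):
    """Directory handling of a read miss; returns (next state, sharers, replies)."""
    if state in ("U", "S"):
        # Memory has a valid copy: send the data and record the new sharer.
        return "S", sharers | {requester}, [("DataReply", requester)]
    raise NotImplementedError("read miss to an exclusive line is not covered here")


state, sharers = "U", frozenset()
for p in (1, 2, 5):                 # read misses from processors 1, 2, and 5
    state, sharers, replies = dir_read_miss(state, sharers, p)
print(state, sorted(sharers))
```

The first miss moves the line from U to S; later misses leave it in S and just grow the sharer set, matching the two arcs described above.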
304
00:22:53,340 --> 00:22:59,165
Okay, let's start here and go the 
other way, where we're uncached and all of a 

305
00:22:59,165 --> 00:23:05,065
sudden we get a write miss from processor P. 
Well, we give it the data, and the sharer 

306
00:23:05,065 --> 00:23:10,670
list, or the owner, is going to get P 
uniquely on it. We're going to give it 

307
00:23:10,670 --> 00:23:14,357
the line in the exclusive state, because it 
was uncached before. 

308
00:23:14,357 --> 00:23:23,442
We don't want to contact anybody else. 
Let's look at this arc here before we go 

309
00:23:23,442 --> 00:23:26,573
to these. 
So this is a little bit different. 

310
00:23:26,573 --> 00:23:32,178
Quite a bit different than what we had in 
these slides, because it's doing 

311
00:23:32,178 --> 00:23:36,327
something different. 
But in this state here, we know, let's 

312
00:23:36,327 --> 00:23:44,480
say, processor P zero has the data 
exclusively. 

313
00:23:44,480 --> 00:23:49,057
But all of a sudden, a different 
processor, let's say processor two goes 

314
00:23:49,057 --> 00:23:52,668
to access the data. 
Well, we already have the data in the 

315
00:23:52,668 --> 00:23:56,402
exclusive state. 
So we're going to stay in this exclusive 

316
00:23:56,402 --> 00:23:59,913
state, because some other cache is going to 
want to get it exclusive, but it's a 

317
00:23:59,913 --> 00:24:04,298
different cache. 
So what has to happen here is we need to 

318
00:24:04,298 --> 00:24:09,818
go invalidate the data out of P zero. 
P zero is going to write back the data, 

319
00:24:09,818 --> 00:24:13,480
it's going to transition to the invalid 
state. 
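
The full sequence for this arc (fetch/invalidate the old owner, write its data back, then hand the line to the new writer) can be sketched as follows. This is an assumption-laden model, not the lecture's implementation: caches are plain dicts mapping an address to a `(state, value)` pair, the network is ignored, and the requester installing the line in M is an illustrative choice.

```python
from dataclasses import dataclass, field


@dataclass
class DirEntry:
    """Hypothetical per-line directory entry at the home node."""
    state: str = "U"                            # "U" uncached, "S" shared, "E" exclusive
    sharers: set = field(default_factory=set)   # processors holding a copy


def write_miss_exclusive(entry, memory, caches, addr, p):
    """Write miss from p while a different cache owns the line exclusively."""
    assert entry.state == "E"
    (owner,) = entry.sharers            # in E the list holds exactly one owner
    assert owner != p
    # Fetch/invalidate: the owner writes back its possibly dirty copy
    # and drops the line, i.e. moves to Invalid.
    _state, value = caches[owner].pop(addr)
    memory[addr] = value
    # The new writer gets the up-to-date data and becomes the sole owner.
    entry.sharers = {p}                 # directory state stays E
    caches[p][addr] = ("M", value)
    return ("data_value_reply", p, value)


memory = {0x40: 7}
caches = {"P0": {0x40: ("M", 42)}, "P2": {}}   # P0 holds a dirty copy
entry = DirEntry(state="E", sharers={"P0"})
write_miss_exclusive(entry, memory, caches, 0x40, "P2")
print(memory[0x40], entry.sharers, caches)
```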

320
00:24:13,480 --> 00:24:21,960
[COUGH] We then need to provide the 
data to the new processor, P2 we'll say, 

321
00:24:21,960 --> 00:24:29,036
and add P2 to the sharer list. 
So we can transition back to this 

322
00:24:29,036 --> 00:24:33,863
state. And then, finally, let's look at the 
edges between these two points. Oh, 

323
00:24:33,863 --> 00:24:41,440
actually, let's go this way first: 
if you have data that gets written back. 

324
00:24:41,440 --> 00:24:46,631
So this is that arc, which I said is 
similar to the arc here, which is 

325
00:24:46,631 --> 00:24:50,935
optional. 
Let's say you have data that gets written 

326
00:24:50,935 --> 00:24:55,213
back here. 
Actually, this arc may not be 

327
00:24:55,213 --> 00:24:58,010
optional. 
Let's think about that for a second. 

328
00:24:58,010 --> 00:25:03,151
This arc may not be optional. 
No, it's still optional, 

329
00:25:03,151 --> 00:25:07,633
because you can effectively just NACK the message 
and tell it that it's in 

330
00:25:07,633 --> 00:25:11,193
main memory. 
Okay, so let's say you see a data 

331
00:25:11,193 --> 00:25:15,000
write-back happening. 
So a message gets sent to you, which is the 

332
00:25:15,000 --> 00:25:19,174
equivalent of this arc here. 
The data was writable, was exclusive to 

333
00:25:19,174 --> 00:25:24,085
some cache, and it's no longer writable. 
It's probably a good idea to go contact 

334
00:25:24,085 --> 00:25:27,768
the directory, write back the data, and 
clear the sharer list. 
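
A sketch of the data write-back arc under the same hypothetical `DirEntry` model: the owner's data goes back to memory, the sharer list is emptied, and the line returns to uncached.

```python
from dataclasses import dataclass, field


@dataclass
class DirEntry:
    """Hypothetical per-line directory entry at the home node."""
    state: str = "U"                            # "U" uncached, "S" shared, "E" exclusive
    sharers: set = field(default_factory=set)   # processors holding a copy


def data_write_back(entry, memory, addr, p, value):
    """Owner p evicts its exclusive copy and sends the data home."""
    assert entry.state == "E" and entry.sharers == {p}
    memory[addr] = value      # memory holds the latest value again
    entry.sharers.clear()     # empty list: no cache has a copy
    entry.state = "U"


memory = {0x40: 7}
entry = DirEntry(state="E", sharers={"P0"})
data_write_back(entry, memory, 0x40, "P0", 42)
print(memory[0x40], entry)
```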

335
00:25:27,768 --> 00:25:32,311
The sharer list is empty, so it knows 
that no one has a copy of it, at that 

336
00:25:32,311 --> 00:25:38,012
point. 
Okay, a few other transitions here. Okay, we 

337
00:25:38,012 --> 00:25:43,710
are in the shared state. 
So we have multiple read-only copies. 

338
00:25:43,710 --> 00:25:49,813
And one cache comes along and says, "Oh, I 
need to send a write miss message." I need 

339
00:25:49,813 --> 00:25:54,758
to get a writable copy. 
Well, now we actually have to go through 

340
00:25:54,758 --> 00:25:57,801
a pretty long process. 
We're going to walk through the entire 

341
00:25:57,801 --> 00:26:01,736
sharer list and send messages to all the 
sharers in the sharer list saying, 

342
00:26:01,736 --> 00:26:04,960
invalidate this copy and tell me when 
you're done. 

343
00:26:04,960 --> 00:26:07,792
We're going to collect all the responses 
at the directory. 

344
00:26:07,792 --> 00:26:11,455
And once all the responses have come 
back, we know no one else has a readable 

345
00:26:11,455 --> 00:26:16,324
copy. 
We can give the data value to the 

346
00:26:16,324 --> 00:26:21,680
requester. 
And add it to the sharer or owner list. 
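
The walk over the sharer list and the ack collection can be sketched like this. It is a simplified, single-threaded model: the "network" is just a queue processed in place, whereas real acknowledgements arrive asynchronously and in any order. Skipping the requester if it is already a sharer is an assumption of this sketch, as are the `DirEntry` model and the dict-based caches.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DirEntry:
    """Hypothetical per-line directory entry at the home node."""
    state: str = "U"                            # "U" uncached, "S" shared, "E" exclusive
    sharers: set = field(default_factory=set)   # processors holding a copy


def write_miss_shared(entry, memory, caches, addr, p):
    """Write miss from p while one or more caches hold read-only copies."""
    assert entry.state == "S"
    targets = entry.sharers - {p}       # every other read-only copy
    # Send an invalidate to each sharer on the list.
    network = deque(("invalidate", s, addr) for s in sorted(targets))
    acks = deque()
    while network:                      # each sharer handles its invalidate
        _kind, s, a = network.popleft()
        caches[s].pop(a, None)          # drop the copy
        acks.append(("inv_ack", s))     # and tell the directory it is done
    # Collect responses; only with none outstanding is it safe to grant write access.
    outstanding = set(targets)
    while acks:
        _kind, s = acks.popleft()
        outstanding.discard(s)
    assert not outstanding
    entry.state = "E"
    entry.sharers = {p}                 # requester becomes the sole owner
    return ("data_value_reply", p, memory[addr])


memory = {0x40: 7}
caches = {"P0": {0x40: ("S", 7)}, "P1": {0x40: ("S", 7)}, "P3": {}}
entry = DirEntry(state="S", sharers={"P0", "P1"})
reply = write_miss_shared(entry, memory, caches, 0x40, "P3")
print(reply, entry)
```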

347
00:26:24,100 --> 00:26:27,309
Okay, 
The last arc here is from E to S. 

348
00:26:27,309 --> 00:26:33,078
This orange arc happens if we 
have a particular line as writable in one 

349
00:26:33,078 --> 00:26:36,220
cache, and another cache wants to go read 
it now. 

350
00:26:36,220 --> 00:26:41,174
It will send a read miss. The other cache is 
going to downgrade from E to S, excuse me, 

351
00:26:42,236 --> 00:26:47,262
from M to S in its local cache. 
But the directory is going to transition 

352
00:26:47,262 --> 00:26:52,429
from E to S here, and we have to go get 
the most up-to-date data from the node. 

353
00:26:52,429 --> 00:26:58,091
So we're going to send a fetch, a 
fetch request, to the node that had it 

354
00:26:58,091 --> 00:27:03,825
before in the exclusive state. Once we get the 
most up-to-date data, we can forward 

355
00:27:03,825 --> 00:27:09,346
that to the new reader, and 
we add that processor to the sharer 

356
00:27:09,346 --> 00:27:11,750
list. 
Okay, 

357
00:27:11,750 --> 00:27:17,058
so, any questions about that one so far? 
These do start to get a little 

358
00:27:17,058 --> 00:27:19,620
complicated because you have multiple 
state machines interacting. 
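
Going back to the last arc, E to S on a read miss, here is a sketch with the interacting pieces (directory entry plus two caches) in one place. As before, `DirEntry`, the dict-based caches, and the `(state, value)` pairs are hypothetical modeling choices; the old owner staying on the sharer list after the downgrade follows the description above.

```python
from dataclasses import dataclass, field


@dataclass
class DirEntry:
    """Hypothetical per-line directory entry at the home node."""
    state: str = "U"                            # "U" uncached, "S" shared, "E" exclusive
    sharers: set = field(default_factory=set)   # processors holding a copy


def read_miss_exclusive(entry, memory, caches, addr, p):
    """Read miss from p while another cache holds the line exclusively."""
    assert entry.state == "E"
    (owner,) = entry.sharers
    # Fetch: the owner sends its up-to-date data home and downgrades M -> S.
    _state, value = caches[owner][addr]
    caches[owner][addr] = ("S", value)
    memory[addr] = value
    # Forward the data to the reader; the old owner stays on the list.
    entry.state = "S"
    entry.sharers.add(p)
    caches[p][addr] = ("S", value)
    return ("data_value_reply", p, value)


memory = {0x40: 7}
caches = {"P0": {0x40: ("M", 42)}, "P2": {}}
entry = DirEntry(state="E", sharers={"P0"})
reply = read_miss_exclusive(entry, memory, caches, 0x40, "P2")
print(reply, sorted(entry.sharers))
```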

359
00:27:24,640 --> 00:27:27,297
Okay, so we're going to speed up a little 
bit here. 

360
00:27:27,297 --> 00:27:31,342
I included this chart from your book just 
to give you an example. 

361
00:27:31,342 --> 00:27:34,867
We went through, very quickly here, all 
the different messages. 

362
00:27:34,867 --> 00:27:38,334
And this chart here sums up all the 
different message types. 

363
00:27:38,334 --> 00:27:41,627
And who they can come from and who 
they can go to. 
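
A rough transcription of the kind of message table being described, as a Python data structure. The rows and the contents letters (P for the requesting node, A for the address, D for data) follow the textbook chart as best recalled here, so treat the exact rows as an approximation; the `MsgType` name and the helper function are hypothetical.

```python
from typing import NamedTuple


class MsgType(NamedTuple):
    name: str
    source: str
    destination: str
    contents: tuple   # P = requesting node, A = address, D = data


MESSAGES = [
    MsgType("read miss", "local cache", "home directory", ("P", "A")),
    MsgType("write miss", "local cache", "home directory", ("P", "A")),
    MsgType("invalidate", "local cache", "home directory", ("A",)),
    MsgType("invalidate", "home directory", "remote cache", ("A",)),
    MsgType("fetch", "home directory", "remote cache", ("A",)),
    MsgType("fetch/invalidate", "home directory", "remote cache", ("A",)),
    MsgType("data value reply", "home directory", "local cache", ("D",)),
    MsgType("data write-back", "remote cache", "home directory", ("A", "D")),
]


def carries_node_id(msg):
    """True if the directory can learn the sender's node number from this message."""
    return "P" in msg.contents


print([m.name for m in MESSAGES if carries_node_id(m)])   # ['read miss', 'write miss']
```

Only the two miss requests carry a node number, which is what lets the directory add the requester to the sharer list.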

364
00:27:41,627 --> 00:27:45,730
And this is in your textbook. 
Sometimes messages need to 

365
00:27:45,730 --> 00:27:49,312
communicate addresses. 
Sometimes they need to communicate data. 

366
00:27:49,312 --> 00:27:53,530
Sometimes they need to communicate which 
node the message is coming from. 

367
00:27:53,530 --> 00:27:57,575
That's so the node can be added to the sharer list. 
But I'm not going to go through this 

368
00:27:57,575 --> 00:28:00,664
in great detail. 
One thing I did want to say is that these 

369
00:28:00,664 --> 00:28:04,034
message types here do not include 
[INAUDIBLE]. 

370
00:28:04,034 --> 00:28:12,020
So, when you go to request something, 
there's replies that come back. 

371
00:28:13,120 --> 00:28:19,880
These replies aren't drawn in 
this 

372
00:28:19,880 --> 00:28:22,853
diagram. 
We see a data value reply, but that's 

373
00:28:22,853 --> 00:28:27,765
just for the actual data. 
There's not, like, a response coming back 

374
00:28:27,765 --> 00:28:32,937
from the sharer, acking the 
invalidator, or something 

375
00:28:32,937 --> 00:28:36,104
like that. 
Another type of message that is pretty 

376
00:28:36,104 --> 00:28:40,176
common that is not drawn here is a 
negative acknowledgement, or NACK. 

377
00:28:40,176 --> 00:28:45,477
So it's pretty common if you have a cache 
line that is being transitioned, it's in 

378
00:28:45,477 --> 00:28:50,040
a pending state at the directory, and you 
get a request coming in. 

379
00:28:50,040 --> 00:28:53,803
You might need to tell that cache to retry 
later. 

380
00:28:53,803 --> 00:28:56,180
I can't handle this case right now.
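
A minimal sketch of the NACK-and-retry behavior, assuming a hypothetical home node that tracks which addresses have a transition in flight and NACKs any request for them. The `HomeDirectory` class, its method names, and the `on_nack` hook are illustrative only; a real requester would typically back off before retrying.

```python
class HomeDirectory:
    """Hypothetical home node that NACKs requests for lines mid-transition."""

    def __init__(self):
        self.pending = set()          # addresses with a transition in flight

    def handle(self, addr, request):
        # The request kind is ignored in this sketch; only the line's status matters.
        if addr in self.pending:
            return "NACK"             # busy: requester must retry later
        self.pending.add(addr)        # start the transition (e.g. waiting on acks)
        return "ACCEPTED"

    def complete(self, addr):
        self.pending.discard(addr)    # transition finished; line is stable again


def request_with_retry(directory, addr, request, max_tries=5, on_nack=None):
    """Resend a request until the directory stops NACKing it."""
    for attempt in range(1, max_tries + 1):
        reply = directory.handle(addr, request)
        if reply != "NACK":
            return reply, attempt
        if on_nack is not None:
            on_nack(attempt)          # hook for backoff; the demo uses it to finish
    raise RuntimeError("request still NACKed after max_tries attempts")


home = HomeDirectory()
home.handle(0x40, "write miss from P1")      # line 0x40 is now pending


def finish_after_two_nacks(attempt):
    if attempt == 2:
        home.complete(0x40)


result = request_with_retry(home, 0x40, "read miss from P2",
                            on_nack=finish_after_two_nacks)
print(result)   # ('ACCEPTED', 3)
```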