1
00:00:04,100 --> 00:00:08,972
So we're going to introduce this notion of
snoopy cache coherence

2
00:00:08,972 --> 00:00:13,696
or snoopy cache.
And this is work by Jim Goodman and one of

3
00:00:13,696 --> 00:00:19,220
his students who's now I think a UC Irvine
faculty member.

4
00:00:19,220 --> 00:00:23,980
And they had this idea that have the cache

5
00:00:23,980 --> 00:00:29,215
watch.
Or what they'll call is snoop either

6
00:00:29,215 --> 00:00:36,406
DMA transfers or later it was extended to
transactions from other processors.

7
00:00:36,406 --> 00:00:41,180
Other caches.
And then do the right thing.

8
00:00:41,180 --> 00:00:43,580
Now, what do we mean by, do the right
thing?

9
00:00:44,710 --> 00:00:47,124
Well, I'm purposely being a little bit

10
00:00:47,124 --> 00:00:49,538
vague there, because there's sort of two
or

11
00:00:49,538 --> 00:00:54,070
two classes of ideas that you can, you can
do for doing the right thing.

12
00:00:54,070 --> 00:01:00,010
But the most basic, do the right thing, is
If a cache sees a memory transaction

13
00:01:00,010 --> 00:01:04,880
go by for an address it has inside the
cache, it needs to take some action.

14
00:01:05,990 --> 00:01:07,930
Now what is that action, could be?

15
00:01:07,930 --> 00:01:11,155
Well, maybe the most basic thing it could
do is if it

16
00:01:11,155 --> 00:01:15,470
sees that transaction go by and it knows
that it has an address.

17
00:01:15,470 --> 00:01:18,260
Which is going across the bus and someone
is trying to do a.

18
00:01:18,260 --> 00:01:19,180
Memory transaction for

19
00:01:19,180 --> 00:01:21,450
an address, which it's caching.

20
00:01:21,450 --> 00:01:25,430
It might need to at least tell the other
entity that something is going on.

21
00:01:25,430 --> 00:01:28,350
That it has a, the more up to date copy,
somehow.

22
00:01:28,350 --> 00:01:32,590
And we're going to look at in today's
class two different

23
00:01:32,590 --> 00:01:38,100
protocols, or two different classes of
protocols to handle this problem.

24
00:01:39,650 --> 00:01:44,210
Now, if we go look at this the, the
implementation of this we have a

25
00:01:44,210 --> 00:01:47,260
processor and we have a cache.
And what do we have in a cache?

26
00:01:47,260 --> 00:01:53,219
Well, we have the data array, which
actually has the data and then you

27
00:01:53,219 --> 00:01:59,520
also have a tag and state array.
So the tags have the, the tag match logic.

28
00:01:59,520 --> 00:02:03,770
And it also has the actual tags.
the upper bits if you will of the address.

29
00:02:05,010 --> 00:02:10,570
And the state array has whether the data's
dirty and whether it's valid.

30
00:02:10,570 --> 00:02:15,380
and usually sort of least recently used or
most recently used information.

31
00:02:16,820 --> 00:02:21,750
And the insight that the Snoopy cache work
had was that instead of

32
00:02:21,750 --> 00:02:27,150
having only one port into the tag array,
the tag and state array.

33
00:02:27,150 --> 00:02:29,460
Instead you add a second port.

34
00:02:29,460 --> 00:02:36,370
So the second port here is attached to the
memory bus.

35
00:02:36,370 --> 00:02:42,290
And this second port has to watch all
transactions on the bus.

36
00:02:42,290 --> 00:02:45,965
And if it sees a transaction, which has an

37
00:02:45,965 --> 00:02:51,540
address that is it has in its cache
already.

38
00:02:51,540 --> 00:02:54,640
It has to somehow signal that there's a
case where there's

39
00:02:54,640 --> 00:02:58,390
going to be sterile data and something
needs to be done.

40
00:02:58,390 --> 00:03:01,558
So that might include removing the
removing the data from

41
00:03:01,558 --> 00:03:02,860
its own cache.

42
00:03:02,860 --> 00:03:05,558
That might include it sending the data
across

43
00:03:05,558 --> 00:03:08,895
the bus and telling the other entity data
processor

44
00:03:08,895 --> 00:03:13,168
we'll say it or DMA agent is trying to
read the data to wait a little bit.

45
00:03:13,168 --> 00:03:15,352
And we're going to look at that over the
next

46
00:03:15,352 --> 00:03:18,620
few slides of what different things can be
done there.

47
00:03:18,620 --> 00:03:22,400
But what's the downside to something like
this snooping protocol?

48
00:03:22,400 --> 00:03:26,630
Well, every entity, every processor or
every cache

49
00:03:26,630 --> 00:03:31,880
that sits on the shared bus has to watch
all memory transactions.

50
00:03:31,880 --> 00:03:33,820
So they have to watch everything.

51
00:03:33,820 --> 00:03:38,230
And, the problem with this is the
processor wants to access this cache.

52
00:03:38,230 --> 00:03:41,350
So it wants to be able to do operations
can currently while

53
00:03:41,350 --> 00:03:44,760
the stuff happening at the bus, which is
not related to it.

54
00:03:44,760 --> 00:03:49,200
So this requires our Snoopy cache tags to
be dual ported.

55
00:03:49,200 --> 00:03:51,675
So, there is two ports to this tag which

56
00:03:51,675 --> 00:03:55,299
makes it bigger and potentially weighs
power etcetera.

57
00:03:58,420 --> 00:03:58,780
Okay.

58
00:03:58,780 --> 00:04:04,765
So let's take a look at our shared memory
multiprocessor, now that

59
00:04:04,765 --> 00:04:11,130
we are thinking about having snooping
caches or snoopy caches.

60
00:04:11,130 --> 00:04:15,108
So in this picture here we now have a
shared memory bus, in the

61
00:04:15,108 --> 00:04:20,110
middle we have three processors on the
left and we have three snoopy caches.

62
00:04:21,270 --> 00:04:22,885
So now, whenever,

63
00:04:22,885 --> 00:04:29,155
let's say, processor 3, tries to do a
transaction, it needs to, let's

64
00:04:29,155 --> 00:04:34,906
say broadcast onto the bus that it's
trying to read a certain address.

65
00:04:34,906 --> 00:04:37,930
The other cache needs to be notified that
they need

66
00:04:37,930 --> 00:04:41,620
to check their own tags then do the right
thing.

67
00:04:41,620 --> 00:04:45,100
So what is do the right thing mean?

68
00:04:45,100 --> 00:04:47,420
So let's, let's, let's take a look at
that.

69
00:04:49,905 --> 00:04:55,097
And we're going to broadly put snoopy
cache coherence

70
00:04:55,097 --> 00:05:01,360
protocols into two different categories
here.

71
00:05:02,700 --> 00:05:05,710
The first one we're going to call is
update protocls.

72
00:05:05,710 --> 00:05:10,050
And the second one is going to be
invalidation protocols.

73
00:05:11,790 --> 00:05:14,476
So what's, what's the difference?
Well,

74
00:05:14,476 --> 00:05:20,170
we're trying to maintain the data to be
consistent amongst all of the caches.

75
00:05:20,170 --> 00:05:22,090
So we're trying to get rid of this tail
data.

76
00:05:24,330 --> 00:05:26,409
The first thing you do is you can try to
do

77
00:05:26,409 --> 00:05:31,220
a write update or what sometimes people
will call broadcast based protocols.

78
00:05:31,220 --> 00:05:35,290
So the basic idea is whenever you do a
write,

79
00:05:35,290 --> 00:05:39,859
to your cache, you also write that on to
the bus.

80
00:05:41,150 --> 00:05:43,020
And everyone is listening on the bus.

81
00:05:43,020 --> 00:05:44,190
Everyone's snooping on the bus.

82
00:05:45,830 --> 00:05:49,786
And if you let's say processor one does a
write

83
00:05:49,786 --> 00:05:51,890
to address five.

84
00:05:51,890 --> 00:05:56,154
If processor two has address five in its
cache, it's going to

85
00:05:56,154 --> 00:06:01,009
see the write and it's going to take the
updated data and update itself.

86
00:06:02,170 --> 00:06:04,414
So this is going to this is sort of the

87
00:06:04,414 --> 00:06:08,010
moral equivalent of the write through case
we saw before.

88
00:06:08,010 --> 00:06:11,373
But now on the bus when someone does a
write, everyone updates their

89
00:06:11,373 --> 00:06:15,208
local caches by listening on the bus for
the particular address and listening

90
00:06:15,208 --> 00:06:18,580
for the new updated data.
So its a broadcast.

91
00:06:18,580 --> 00:06:25,906
So every write you do that transitions or
every write you do sort of through

92
00:06:25,906 --> 00:06:33,040
a memory transaction is going to update
all of the other caches in the system.

93
00:06:33,040 --> 00:06:34,230
Now, why is this good?

94
00:06:34,230 --> 00:06:37,896
Well, it guarantees that when another
processor tries

95
00:06:37,896 --> 00:06:40,626
to do a read of, say, that address five,

96
00:06:40,626 --> 00:06:45,420
it now longer has a stale value.
It has now the most up to date value.

97
00:06:45,420 --> 00:06:48,400
Because we broadcast it when we update it
in place.

98
00:06:48,400 --> 00:06:51,420
We'll talk about that in a little more
detail in a second.

99
00:06:51,420 --> 00:06:54,657
Second thing you can do is what is
actually more

100
00:06:54,657 --> 00:06:56,815
commonplace today is called an

101
00:06:56,815 --> 00:07:00,760
invalidation protocol or write invalidate
protocol.

102
00:07:01,910 --> 00:07:05,042
And a write invalidate protocol,

103
00:07:05,042 --> 00:07:12,030
whenever you do a write, you invalidate
all other copies of a piece of data.

104
00:07:12,030 --> 00:07:17,574
And by invalidating all other cache
copies, you've effectively

105
00:07:17,574 --> 00:07:22,680
remove the possibility that you could have
stale data.

106
00:07:22,680 --> 00:07:26,560
So, let's look at this, these two
protocols in more detail here.

107
00:07:27,570 --> 00:07:30,222
So, first of all we'll look at

108
00:07:30,222 --> 00:07:35,580
a write update based protocol or a
broadcast based protocol.

109
00:07:35,580 --> 00:07:36,850
And we're going to look at two cases.

110
00:07:36,850 --> 00:07:42,990
A, write miss to the cache and a read miss
to the cache.

111
00:07:45,050 --> 00:07:48,220
Okay.
First of all, a write miss.

112
00:07:48,220 --> 00:07:50,710
So you go to a write miss.

113
00:07:50,710 --> 00:07:54,987
Well, if your miss in your cache and doing
it right, you

114
00:07:54,987 --> 00:08:00,730
tell everyone else in the system that you
are doing the write.

115
00:08:00,730 --> 00:08:04,074
And all other processors which are
listening

116
00:08:04,074 --> 00:08:07,280
to the bus, update their copies in place.

117
00:08:08,340 --> 00:08:10,134
So, you broadcast on the bus,

118
00:08:10,134 --> 00:08:11,960
I'm running to address five.

119
00:08:11,960 --> 00:08:16,690
Somebody has address five in their cache,
they update their copy internally.

120
00:08:18,570 --> 00:08:19,754
Okay.

121
00:08:19,754 --> 00:08:28,750
that, that sounds not, not not so, not so
bad but there's a lot of bandwidths there.

122
00:08:28,750 --> 00:08:32,530
You're basically broadcasting rights to
everybody.

123
00:08:32,530 --> 00:08:35,049
let's look at the read side.

124
00:08:36,690 --> 00:08:44,980
On the read miss case here, you know that
main memory is always up to date.

125
00:08:44,980 --> 00:08:50,930
Because you are forced to basically write
through to main memory.

126
00:08:50,930 --> 00:08:53,310
So you just go to main memory without
reading this.

127
00:08:53,310 --> 00:08:56,020
And you don't have to even, sort of, check
other caches.

128
00:08:56,020 --> 00:08:57,219
On let's go to read hit.

129
00:08:58,524 --> 00:09:01,818
a read hit your cache and you know that
the data is in your cache and

130
00:09:01,818 --> 00:09:04,950
you know the data is up to date, so you
don't have to worry.

131
00:09:07,090 --> 00:09:07,620
Okay.

132
00:09:07,620 --> 00:09:14,276
So let's say up, update protocol, let's
take a look at a right invalidate

133
00:09:14,276 --> 00:09:20,480
protocol, what happens on a write miss and
a, and a read miss.

134
00:09:20,480 --> 00:09:26,048
So on a write miss in a invalidation based
protocol, you're going to

135
00:09:26,048 --> 00:09:32,288
actually invalidate all of the other
caches, which have that address in them

136
00:09:32,288 --> 00:09:37,850
before you're allowed to do the right.
Now how does this work?

137
00:09:37,850 --> 00:09:41,126
Well, on that shared multi-drop bus we
screen,

138
00:09:41,126 --> 00:09:43,690
I'm going to do a write to address 5.

139
00:09:43,690 --> 00:09:47,540
And everyone else because they have a
snoopy cache, is listening

140
00:09:47,540 --> 00:09:51,000
and they hear someone is doing a write to
address 5.

141
00:09:51,000 --> 00:09:56,920
And what they do is invalidate their data.
They remove it from their cache.

142
00:09:56,920 --> 00:09:57,360
And by virtue

143
00:09:57,360 --> 00:09:59,846
of them removing it from their cache, You
can

144
00:09:59,846 --> 00:10:04,180
no longer have a stale value in the, their
cache.

145
00:10:04,180 --> 00:10:06,420
Now, I will want to point out here though
that,

146
00:10:06,420 --> 00:10:08,730
one of the things you might have to do is.

147
00:10:08,730 --> 00:10:10,930
When you go to do the invalidate, you
might actually

148
00:10:10,930 --> 00:10:14,050
have to write back that data to main
memory somehow.

149
00:10:14,050 --> 00:10:17,520
Because, if you have a, let's say a right
back cache.

150
00:10:17,520 --> 00:10:22,460
And a different processor has a dirty
piece of data let's

151
00:10:22,460 --> 00:10:29,280
say for address five and then processor
one tries to write that data.

152
00:10:29,280 --> 00:10:32,340
Well, at some point you need to, sort of,
merge the data in a cash line.

153
00:10:33,470 --> 00:10:36,300
Processor 2 might have to go and
invalidate.

154
00:10:36,300 --> 00:10:39,110
But at least it knows that it has to do
this invalidation.

155
00:10:39,110 --> 00:10:42,950
It has to do the invalidation before
processor 1 does the write.

156
00:10:45,160 --> 00:10:49,700
But we can do this all on our Snoopy
shared bus.

157
00:10:51,740 --> 00:10:55,233
Okay.
What happens on a read miss?

158
00:10:55,233 --> 00:11:01,570
well, if no one else has the copy this is
easy.

159
00:11:01,570 --> 00:11:06,020
You scream on the bus, you scream in the
room saying does anyone have address 5?

160
00:11:06,020 --> 00:11:09,320
And then if it's an empty room no one

161
00:11:09,320 --> 00:11:14,300
yells back and no other caches have that
data.

162
00:11:14,300 --> 00:11:16,764
But conversely if someone

163
00:11:16,764 --> 00:11:23,960
has a dirty copy of that data, they need
to write it back to main memory.

164
00:11:23,960 --> 00:11:27,440
And you need to read that most up to date
copy.

165
00:11:27,440 --> 00:11:30,364
So you need to flush the data and
potentially

166
00:11:30,364 --> 00:11:34,130
invalidate the data depending on the cache
coherence protocol.

167
00:11:34,130 --> 00:11:36,410
And we're going to look at a couple
different cases of that.

168
00:11:38,560 --> 00:11:42,394
So before I move off this, I just want to,
sort of, point out when

169
00:11:42,394 --> 00:11:45,447
write invalidate protocols versus write
update or

170
00:11:45,447 --> 00:11:48,630
broadcast protocols are good and vice
versa.

171
00:11:48,630 --> 00:11:54,290
So most processors you'll see today use
some sort of invalidation protocol.

172
00:11:54,290 --> 00:11:56,875
And generally, those are those actually
are going to work

173
00:11:56,875 --> 00:12:00,030
better because there's going to be less
bandwidth across your bus.

174
00:12:00,030 --> 00:12:03,679
Because you only need to communicate
across a bus

175
00:12:03,679 --> 00:12:08,460
when you take a cache miss or write or, or
read.

176
00:12:08,460 --> 00:12:15,880
conversely update protocols you basically
have to broadcast all, all your updates.

177
00:12:15,880 --> 00:12:19,444
But the, the, in cases where update-based
protocol actually

178
00:12:19,444 --> 00:12:23,300
wins As if you have many readers and one
writer.

179
00:12:24,450 --> 00:12:28,684
So let's say you have five processors
which are reading the output of

180
00:12:28,684 --> 00:12:30,730
one processor via memory.

181
00:12:30,730 --> 00:12:33,830
So you have some sort of producer consumer
relationship.

182
00:12:33,830 --> 00:12:38,454
In an update base protocol, you can have
that one processor just write And it will

183
00:12:38,454 --> 00:12:42,270
broadcast and push the data to the five
processors that are trying to do the read.

184
00:12:43,710 --> 00:12:46,635
In the invalidation protocol what's
going to happen is

185
00:12:46,635 --> 00:12:50,080
you're basically going to have to have
invalidates bouncing the

186
00:12:50,080 --> 00:12:54,110
data back and forth and back and forth
every time you do a read or a write.

187
00:12:54,110 --> 00:12:58,786
So there's actually could be more bus
communication in the validation protocol.

188
00:12:58,786 --> 00:13:03,520
For, one reader or excuse me, one writer,
multi-reader cases.

189
00:13:05,170 --> 00:13:08,490
But for the rest of today, we're going to
focus on

190
00:13:08,490 --> 00:13:14,220
invalidation or right invalidate, right
invalidate protocols on a snoopy bus.

191
00:13:23,190 --> 00:13:26,821
Okay.
So let's take a look at the extra

192
00:13:26,821 --> 00:13:31,987
information you need to add into a cache
in order to

193
00:13:31,987 --> 00:13:37,030
go implement something like a write and
validate

194
00:13:37,030 --> 00:13:42,599
protocol.
And we're going to look at a base

195
00:13:42,599 --> 00:13:48,560
case here which is called a MSI protocol.

196
00:13:48,560 --> 00:13:52,390
This is one of the more, the basic ones
that you'll see.

197
00:13:52,390 --> 00:13:55,820
and there's some extra bits that are added
here.

198
00:13:55,820 --> 00:13:57,520
So let's look at the tag information.

199
00:13:57,520 --> 00:14:03,800
So, we have a tag for your processor and
typically you have a valid bit.

200
00:14:03,800 --> 00:14:06,070
When you talk about having valid bits or
dirty bits.

201
00:14:06,070 --> 00:14:09,080
Valid and dirty bits in something like a
processor.

202
00:14:09,080 --> 00:14:13,686
as I was on the cache, excuse me.
So, each cache line has this

203
00:14:13,686 --> 00:14:19,200
information on a per cache line basis.
And instead of having,

204
00:14:19,200 --> 00:14:24,553
let's say, one valid bit and one dirty
bit, we're going

205
00:14:24,553 --> 00:14:29,445
to use these two bits to encode a state in
a state machine.

206
00:14:29,445 --> 00:14:36,810
And we're going to add, so, so we have two
bits so we can code four things.

207
00:14:36,810 --> 00:14:38,730
We can definitely encode three things

208
00:14:38,730 --> 00:14:39,408
in two bits.

209
00:14:39,408 --> 00:14:43,808
So we're going to actually look at a
protocol here that we're going to

210
00:14:43,808 --> 00:14:48,840
call the MSI protocol for this, that's
named for the state names.

211
00:14:48,840 --> 00:14:55,455
So we have M, which is modified, S, which
is shared and I, which is invalid.

212
00:14:55,455 --> 00:14:58,995
And we're going to say that invalid is

213
00:14:58,995 --> 00:15:03,420
basically whether the data is valid or
not.

214
00:15:03,420 --> 00:15:04,059
If you're

215
00:15:04,059 --> 00:15:07,183
in the invalid state it is, it is invalid
if you are in

216
00:15:07,183 --> 00:15:10,970
any of the outer states here, the data is
valid in your cache.

217
00:15:12,880 --> 00:15:15,548
But now instead of having just a dirty
bit, which tells

218
00:15:15,548 --> 00:15:18,490
you whether the data is dirty or not in
your cache.

219
00:15:18,490 --> 00:15:23,270
Instead we're going to have two other
states here, modified and shared.

220
00:15:23,270 --> 00:15:29,512
And what we're going to see is by adding
these states We can guarantee some

221
00:15:29,512 --> 00:15:32,148
level of cache coherence.

222
00:15:32,148 --> 00:15:35,130
And we're going to implement a cache
coherence protocol on top of this.

223
00:15:35,130 --> 00:15:41,370
And, we're going to remove some of the
communication across

224
00:15:41,370 --> 00:15:47,870
the bus by remembering whether the date is
widely

225
00:15:47,870 --> 00:15:53,860
shared or whether we have the sole copy.
So, what are these three states.

226
00:15:55,120 --> 00:15:58,684
I is invalid, that's pretty self
explanatory there, the data

227
00:15:58,684 --> 00:16:02,540
in your cache is invalid, you're not
allowed to read it.

228
00:16:02,540 --> 00:16:05,162
If you try to access somebody who has an
invalid bit

229
00:16:05,162 --> 00:16:07,870
set in your state bits, you are going to
take a cache miss.

230
00:16:09,360 --> 00:16:11,728
And then your going to have to transition
to one of these other two states.

231
00:16:11,728 --> 00:16:16,400
M is modified.

232
00:16:16,400 --> 00:16:20,261
So modified means that the data in the
cache has

233
00:16:20,261 --> 00:16:26,230
been modified relative to what is in main,
main memory.

234
00:16:26,230 --> 00:16:28,670
So, this is dirty data.

235
00:16:28,670 --> 00:16:34,220
And then, we're going to have this other
state here, called shared or S for shared.

236
00:16:34,220 --> 00:16:39,832
Well, what is, what is shared?
Well, S or shared state here is going

237
00:16:39,832 --> 00:16:44,890
to mean that you only have a read copy of
the data.

238
00:16:46,040 --> 00:16:49,610
And someone else may also have a read-only
copy of the data,

239
00:16:49,610 --> 00:16:53,570
hence the term shared.
So it's shared amongst multiple people.

240
00:16:53,570 --> 00:16:57,854
But, we're going to say that when you're
in the shared state you are not allowed to

241
00:16:57,854 --> 00:17:02,908
do a write.
And one of the things I wanted to point

242
00:17:02,908 --> 00:17:11,910
out about this is that each cache line In
each cache has this state machine there.

243
00:17:11,910 --> 00:17:17,860
So, different caches have their own copies
of this state information.

244
00:17:17,860 --> 00:17:20,030
And the state information only relates to

245
00:17:20,030 --> 00:17:23,480
that particular cache or that particular
processor.

246
00:17:23,480 --> 00:17:26,252
And the state diagram we're going to build
up

247
00:17:26,252 --> 00:17:29,306
here is only in relationship to processor
1.

248
00:17:29,306 --> 00:17:33,906
And we'll see that because it's a snoopy
cache, you might

249
00:17:33,906 --> 00:17:37,586
be in a state and then you might see a
transaction

250
00:17:37,586 --> 00:17:39,370
go across the bus.

251
00:17:39,370 --> 00:17:44,130
And you might have to take some action to
keep memory coherent.

252
00:17:44,130 --> 00:17:47,640
So, let's walk through this basic state
diagram here.

253
00:17:47,640 --> 00:17:50,664
We'll see that when we're done we'll
actually have the

254
00:17:50,664 --> 00:17:54,260
data be coherent when we go to implement
something like this.

255
00:17:54,260 --> 00:18:00,810
So what's our first arc here?
Well, let's say we have a read miss.

256
00:18:02,020 --> 00:18:02,956
You are,

257
00:18:02,956 --> 00:18:08,690
we are, in the invalidate state and we do
a read miss.

258
00:18:09,710 --> 00:18:13,490
And, this arc here is basically is from
invalid or it is from invalid, I

259
00:18:13,490 --> 00:18:15,821
just didn't draw it looping around because

260
00:18:15,821 --> 00:18:18,900
other wise the, the diagram gets too
complex.

261
00:18:18,900 --> 00:18:20,570
So, you do a read miss.

262
00:18:22,120 --> 00:18:24,636
You shout onto the, onto the bus, I want

263
00:18:24,636 --> 00:18:27,966
to read some address or processor one here
shouts onto

264
00:18:27,966 --> 00:18:30,760
the bus and says, I want to read an
address.

265
00:18:32,060 --> 00:18:32,500
Okay.

266
00:18:32,500 --> 00:18:35,880
So, something happens on the bus.

267
00:18:35,880 --> 00:18:38,726
The other processors might have to do
transitions.

268
00:18:38,726 --> 00:18:41,242
But ultimately, you're going to get the
data

269
00:18:41,242 --> 00:18:43,510
or get the cache line from main memory.

270
00:18:43,510 --> 00:18:49,360
And your bringing into the shared state.
So, you now have a read only copy.

271
00:18:49,360 --> 00:18:51,290
You can read this from your cache as much
as you want.

272
00:18:51,290 --> 00:18:54,120
And you don't have to communicate on the
bus anymore then.

273
00:18:54,120 --> 00:18:55,279
Because you just have to do a cache hit.

274
00:18:57,040 --> 00:19:00,400
Now, if you're right though, things get a
little more complex.

275
00:19:02,680 --> 00:19:06,268
So, so this is, let's, let's take a look
at

276
00:19:06,268 --> 00:19:11,030
the case here, that you have the data in
shared state.

277
00:19:11,030 --> 00:19:15,110
But some other processor tries to do a
read.

278
00:19:15,110 --> 00:19:17,510
So they shot on to the bus, I want to do

279
00:19:17,510 --> 00:19:20,800
a read of the address that is in the
shared state.

280
00:19:22,810 --> 00:19:26,720
Well, if we go look at the state diagram
here, that's okay.

281
00:19:26,720 --> 00:19:27,810
It's shared.

282
00:19:27,810 --> 00:19:30,294
We can have a read only copy and someone
else

283
00:19:30,294 --> 00:19:32,990
can also have a read only copy at the same
time.

284
00:19:35,690 --> 00:19:39,620
So, we just transition from S to S, we
stay in the S state.

285
00:19:40,840 --> 00:19:42,345
We don't have to do anything when someone

286
00:19:42,345 --> 00:19:44,330
else reads out, some other processor reads
the data.

287
00:19:46,640 --> 00:19:51,476
Now, let's say we're in the shared state
and some other processor yells

288
00:19:51,476 --> 00:19:55,710
out onto the bus, saying I have an intent
to do a write.

289
00:19:55,710 --> 00:19:59,344
I want to do a write to address that you
have in your cache

290
00:19:59,344 --> 00:20:04,030
in the shared state or processor one has
the cache in the shared state.

291
00:20:05,310 --> 00:20:11,748
Well, we said that stale data causes the
incoherence of, of data and

292
00:20:11,748 --> 00:20:17,344
that's what we were trying to prevent.
So, if you see some

293
00:20:17,344 --> 00:20:23,670
other processor trying to do a write, you
just drop the data from your cache.

294
00:20:23,670 --> 00:20:27,500
Now, because you'll only have a read-only
copy, you don't have to write this back.

295
00:20:27,500 --> 00:20:29,640
You don't even have to tell anybody about
this.

296
00:20:29,640 --> 00:20:32,840
You just have to guarantee when they say,
I want

297
00:20:32,840 --> 00:20:36,050
to do it right, that you transition from S
to I.

298
00:20:37,070 --> 00:20:39,632
Now what this means, though, is if
processor one

299
00:20:39,632 --> 00:20:41,706
wants to go read that data in the future,

300
00:20:41,706 --> 00:20:43,414
it's going to have to get a copy of it

301
00:20:43,414 --> 00:20:45,699
again, because the copy it had is now
invalid.

302
00:20:48,410 --> 00:20:49,382
Okay.

303
00:20:49,382 --> 00:20:53,594
let's say you have the data in the shared
state

304
00:20:53,594 --> 00:20:57,410
but now you want to do a write to that
data.

305
00:21:00,400 --> 00:21:02,710
Can you write to it while it's in the
shared state?

306
00:21:04,100 --> 00:21:05,460
No.

307
00:21:05,460 --> 00:21:06,085
Why?

308
00:21:06,085 --> 00:21:07,470
because someone else may have a copy of
it.

309
00:21:07,470 --> 00:21:10,782
But you're in the shared state, that means
it's shared

310
00:21:10,782 --> 00:21:13,940
and someone else could have a stale copy
of that data.

311
00:21:15,100 --> 00:21:18,412
So you go to do a, a write, what you have
to do before you

312
00:21:18,412 --> 00:21:20,572
transition your state is you need to

313
00:21:20,572 --> 00:21:23,370
broadcast your intent to write to that
line.

314
00:21:23,370 --> 00:21:25,750
Or processor one has to broadcast

315
00:21:25,750 --> 00:21:29,720
onto the Snoopy bus that it wants to write
to that line.

316
00:21:30,840 --> 00:21:31,090
Okay.

317
00:21:31,090 --> 00:21:33,816
So it says, I want to do a write, I want
to do a write!

318
00:21:33,816 --> 00:21:36,516
And before transitions it needs to wait or
give all the

319
00:21:36,516 --> 00:21:41,120
other processors time to invalidate and
potentially write back the data.

320
00:21:41,120 --> 00:21:43,460
And different processors implement this in
different ways.

321
00:21:43,460 --> 00:21:46,580
Some processors just wait a certain amount
of time.

322
00:21:46,580 --> 00:21:49,850
There's a waiting period so you broadcast
saying.

323
00:21:49,850 --> 00:21:50,786
I have an intent

324
00:21:50,786 --> 00:21:53,954
to write, but I'm not going to do the
write until I

325
00:21:53,954 --> 00:21:57,940
some period of time in the future, it's
like a waiting period.

326
00:21:57,940 --> 00:22:03,032
That gives enough time for the imposters
to snoop the transaction, invalidate

327
00:22:03,032 --> 00:22:07,644
the data, and write it back to main
memory, if they have copied.

328
00:22:07,644 --> 00:22:11,340
Or, let's say everyone else only has
read-only copies of the data.

329
00:22:14,380 --> 00:22:17,188
They need a transition from S to I.

330
00:22:17,188 --> 00:22:18,781
So I want to point out here that,

331
00:22:18,781 --> 00:22:21,318
you know, all different, the different
caches in

332
00:22:21,318 --> 00:22:24,920
the system don't have to have the data in
the same state at the same time.

333
00:22:24,920 --> 00:22:27,880
This is a per-cache state.

334
00:22:27,880 --> 00:22:30,760
So, when you have an intent to write, you
broadcast this

335
00:22:30,760 --> 00:22:34,920
information, and everyone else has to
invalidate it from their cache.

336
00:22:34,920 --> 00:22:37,028
So whether they have it in S or I, they

337
00:22:37,028 --> 00:22:39,408
need to somehow or, or S or M, they need
to

338
00:22:39,408 --> 00:22:40,440
transition to I.

339
00:22:41,730 --> 00:22:46,700
And only then can you transition to M and
actually do it right.

340
00:22:46,700 --> 00:22:48,230
Now, I said there's other ways that you

341
00:22:48,230 --> 00:22:50,992
can go about doing this besides a waiting
period.

342
00:22:50,992 --> 00:22:54,944
one potential thing that people do is
they'll actually.

343
00:22:54,944 --> 00:22:57,784
broadcast their intents to write but
instead of

344
00:22:57,784 --> 00:23:02,400
actually doing the write instead of having
waiting period.

345
00:23:02,400 --> 00:23:04,650
They wait for an acknowledgement from all
of the other

346
00:23:04,650 --> 00:23:11,008
caches somehow.
that, that's functionally the same.

347
00:23:11,008 --> 00:23:14,284
you can either wait and sort of have a
proof knowing that you've waited long

348
00:23:14,284 --> 00:23:18,270
enough or you can wait until everyone has
responded back saying, yep I'm, I'm done.

349
00:23:21,380 --> 00:23:22,970
Okay.
So what's this modified state?

350
00:23:22,970 --> 00:23:24,140
What does this give you?

351
00:23:24,140 --> 00:23:26,641
Well, if you're in the modified state, you
can

352
00:23:26,641 --> 00:23:29,650
write the data And you can read the data.

353
00:23:29,650 --> 00:23:33,660
So, let's say processor 1 does a read or
write.

354
00:23:33,660 --> 00:23:35,130
We just stay in the modified state.

355
00:23:35,130 --> 00:23:37,260
And you can do as many reads and writes as
you want.

356
00:23:37,260 --> 00:23:39,235
And you never have to communicate on the
bus.

357
00:23:39,235 --> 00:23:40,030
You have the sole copy.

358
00:23:40,030 --> 00:23:46,390
You have the, the, the token if you will.
You can, you can read it,

359
00:23:46,390 --> 00:23:49,210
you can write it, no one, you're
guaranteed that no one else has a copy.

360
00:23:52,170 --> 00:23:52,630
Okay.

361
00:23:52,630 --> 00:23:55,480
Let's say you're in the m state and

362
00:23:55,480 --> 00:24:00,110
some other processor broadcasts their
intent to write.

363
00:24:02,430 --> 00:24:05,750
You're in the modified state, you have
dirty data.

364
00:24:05,750 --> 00:24:07,420
You've done a write to this data.

365
00:24:07,420 --> 00:24:09,530
Main memory is out of date.

366
00:24:09,530 --> 00:24:13,999
So, the most basic thing you can do is
invalidate

367
00:24:13,999 --> 00:24:18,530
your copy and give that copy back to main
memory.

368
00:24:18,530 --> 00:24:20,760
You'll write back that date to main
memory.

369
00:24:20,760 --> 00:24:24,360
And then the other processor will go and
read that data

370
00:24:24,360 --> 00:24:28,190
from main memory and pull it in in their
modified state.

371
00:24:28,190 --> 00:24:30,450
So you will transition from M to I.

372
00:24:30,450 --> 00:24:32,570
And they will transition, let's say from I
to M.

373
00:24:35,400 --> 00:24:39,970
Okay.
one more arc in this diagram here.

374
00:24:39,970 --> 00:24:45,026
What happens if you're in the modified
state and some other processor tries

375
00:24:45,026 --> 00:24:48,910
to do a read or broadcast an intent to
read across the bus?

376
00:24:49,980 --> 00:24:52,828
Well, what's nice here is you don't

377
00:24:52,828 --> 00:24:57,144
actually have to transition to the
invalidation state.

378
00:24:57,144 --> 00:24:59,920
And the reason is you have the most
up-to-date copy of the data.

379
00:25:01,110 --> 00:25:03,762
They need to see the most up-to-date copy
of the data so that

380
00:25:03,762 --> 00:25:07,270
they don't get any stale data and you
don't have any stale data.

381
00:25:07,270 --> 00:25:12,680
But you don't have to transition down to
the invalidation state.

382
00:25:12,680 --> 00:25:15,639
You, you, you potentially could try to go
down to the state.

383
00:25:15,639 --> 00:25:19,019
But it's going to, require you to actually
go pull the data

384
00:25:19,019 --> 00:25:21,960
back in if you if you wanted to read in
the future.

385
00:25:21,960 --> 00:25:26,140
So instead, in the MSI protocol, if
another

386
00:25:26,140 --> 00:25:31,560
processor tries to do a read, you do the
write back first.

387
00:25:31,560 --> 00:25:35,000
After you've done the write back, you
transition to the shared state.

388
00:25:35,000 --> 00:25:38,498
And they pull it in in a read-only copy
and you have a read-only

389
00:25:38,498 --> 00:25:42,550
copy so you both will transition to the
shared state or the S state here.

390
00:25:49,260 --> 00:25:53,946
Okay.
So, let's look at this in a little bit

391
00:25:53,946 --> 00:25:58,804
more detail here.
yes, one more, one more arc

392
00:25:58,804 --> 00:26:03,644
I missed.
Write miss so if you're in the

393
00:26:03,644 --> 00:26:09,138
modified state or sorry, you're in the
invalid

394
00:26:09,138 --> 00:26:14,378
state.
And you want to do it right

395
00:26:14,378 --> 00:26:15,530
to align.

396
00:26:18,150 --> 00:26:20,058
We didn't draw this arc but there's an arc

397
00:26:20,058 --> 00:26:22,300
that goes from invalid all the way around
to here.

398
00:26:23,490 --> 00:26:27,108
And on this write miss, you broadcast your
intent to write, you

399
00:26:27,108 --> 00:26:31,430
have to wait for everyone to do their
invalidations up to main memory.

400
00:26:31,430 --> 00:26:33,839
And in the base case you pull that data in

401
00:26:33,839 --> 00:26:37,320
from main memory and you enter into the
modified state.

402
00:26:39,370 --> 00:26:39,730
Okay.

403
00:26:39,730 --> 00:26:42,740
So, let's take a look at a basic two
processor example here.

404
00:26:42,740 --> 00:26:46,580
Walking through a set of reads and a set
of writes.

405
00:26:46,580 --> 00:26:49,450
And watch everything that happens here.

406
00:26:49,450 --> 00:26:52,810
So we're going to have processor 1 on the
top, processor 2 on the bottom.

407
00:26:52,810 --> 00:26:56,970
And we're going to walk through different
arc here as processor 1 does read and

408
00:26:56,970 --> 00:27:01,330
processor 2 does read and processor 1 does
write and processor 2 does write.

409
00:27:02,400 --> 00:27:04,636
So, let's say the first thing

410
00:27:04,636 --> 00:27:08,280
that happen is processor 1 does a read.

411
00:27:08,280 --> 00:27:10,744
This is all to the same address or all to

412
00:27:10,744 --> 00:27:14,600
the same cache line or addresses within a
cache line.

413
00:27:14,600 --> 00:27:15,570
So processor 1 does a read.

414
00:27:16,950 --> 00:27:20,180
We'll assume that all the data starts out
invalid in all the caches.

415
00:27:20,180 --> 00:27:22,271
So usually when you reset your computer
system,

416
00:27:22,271 --> 00:27:24,760
everyone's caches are invalid, how could
they not be?

417
00:27:27,210 --> 00:27:30,030
So you're going to do a read miss, it's
going to go up to main

418
00:27:30,030 --> 00:27:34,640
memory and you're going to bring the data
into the shared state in processor one.

419
00:27:34,640 --> 00:27:37,370
Processor 2 is going to leave this data
invalid for right now.

420
00:27:39,170 --> 00:27:39,910
Okay.

421
00:27:39,910 --> 00:27:45,660
Now processor one does it right.
Well, it broadcasts its intent to write

422
00:27:45,660 --> 00:27:50,230
Processor 2 doesn't have this data and no
one else in the system has the data.

423
00:27:50,230 --> 00:27:52,393
So no one, no one screams

424
00:27:52,393 --> 00:27:56,750
back I have the data on the bus.
please wait.

425
00:27:56,750 --> 00:28:01,350
So instead processor one just goes ahead
and transitions into the modified state.

426
00:28:01,350 --> 00:28:06,020
And it, in this case her it even has the
most up to date date.

427
00:28:06,020 --> 00:28:09,590
So, it won't even have to go out to main
memory potential.

428
00:28:12,310 --> 00:28:18,218
Okay.
Then processor 1 does let's say Processor

429
00:28:18,218 --> 00:28:23,810
1 can do reason writes past this point.
now processor 2 does a read.

430
00:28:25,290 --> 00:28:26,778
Well, if processor 2 does a read now

431
00:28:26,778 --> 00:28:29,568
we're going to have two different state
transitions happening.

432
00:28:29,568 --> 00:28:31,540
We're going to see processor one do a
state

433
00:28:31,540 --> 00:28:34,378
transition and processor two do a state
transition.

434
00:28:34,378 --> 00:28:37,482
So processor 2 is in the invalid state

435
00:28:37,482 --> 00:28:39,392
right now.

436
00:28:39,392 --> 00:28:43,329
It's going to transition to the shared
state on the read miss.

437
00:28:46,100 --> 00:28:52,870
At the same time because processor 1 has
it in the modified state.

438
00:28:52,870 --> 00:29:00,070
It's going to have to in write back the
data and transition to the shared state.

439
00:29:00,070 --> 00:29:02,930
So, both these processors are now in the
shared state, so they both have read

440
00:29:02,930 --> 00:29:03,326
[INAUDIBLE]

441
00:29:03,326 --> 00:29:03,730
copies.

442
00:29:03,730 --> 00:29:11,704
And the most up to date copy has been
pushed back or ridden back.

443
00:29:11,704 --> 00:29:12,890
Okay.

444
00:29:12,890 --> 00:29:15,910
So now let's say processor 2 does it
right.

445
00:29:17,840 --> 00:29:20,910
It has in the shared state, this has the
shared state.

446
00:29:20,910 --> 00:29:28,462
Well, when you go to do this right, the
we're going to see an intent

447
00:29:28,462 --> 00:29:32,480
to write coming out of processor 2.

448
00:29:32,480 --> 00:29:34,928
And this is actually going to cause
processor

449
00:29:34,928 --> 00:29:37,580
1 here to transition from shared to
invalid.

450
00:29:38,720 --> 00:29:41,960
So, you know, this is kind of odd here,
processor 2

451
00:29:41,960 --> 00:29:46,940
is doing something that's causing
processor 1 states machine to transition.

452
00:29:46,940 --> 00:29:50,436
And it's because they're on the shared bus
processor

453
00:29:50,436 --> 00:29:53,120
2 shouts, I'm going to do a write soon.

454
00:29:54,670 --> 00:30:01,196
Processor 1 sees this and it transitions.
Once you know that processor

455
00:30:01,196 --> 00:30:07,530
1 has finished the transition, processor 2
now can transition to the modified state.

456
00:30:07,530 --> 00:30:12,205
And actually continue to do reads or
writes to that line.

457
00:30:12,205 --> 00:30:14,340
Okay.

458
00:30:14,340 --> 00:30:16,886
So, let's say processor 1 wants to do a
read

459
00:30:16,886 --> 00:30:19,767
to this line, this line's getting a lot of
traffic

460
00:30:19,767 --> 00:30:20,670
to it.

461
00:30:20,670 --> 00:30:23,750
Processor 2 just wrote it and it's in the
modified state.

462
00:30:24,840 --> 00:30:29,115
Processor 1's in the invalid state, all of
a sudden what's going to

463
00:30:29,115 --> 00:30:34,620
happen is, it's going to want to try to
transition to the shared state.

464
00:30:34,620 --> 00:30:36,650
But this modified state is going to cause
a problem.

465
00:30:36,650 --> 00:30:45,230
So we're going to have to wait for the,
we're going to have to wait

466
00:30:45,230 --> 00:30:49,663
for the right back to a curve here from

467
00:30:49,663 --> 00:30:54,934
modified to shared.
So it's going to, invalidate that data

468
00:30:54,934 --> 00:31:00,269
and then finally, processor 1 is going to
be able to bring into the shared state.

469
00:31:02,510 --> 00:31:02,880
Okay.

470
00:31:02,880 --> 00:31:05,360
So let's say processor one now does a
read.

471
00:31:05,360 --> 00:31:09,790
So, it's in the shared state.
This is in the shared state.

472
00:31:13,160 --> 00:31:17,725
We see the intent to write coming out of
processor 1.

473
00:31:17,725 --> 00:31:23,289
So processor 2 is going to transition from
shared to invalid and

474
00:31:23,289 --> 00:31:29,270
only then are you allowed to transition
from shared to modified.

475
00:31:29,270 --> 00:31:33,098
And then as we get close to the end here
let's say

476
00:31:33,098 --> 00:31:38,290
processor 2 tries to do a write to round
everything out here.

477
00:31:38,290 --> 00:31:40,946
So we have processor 1 in the modified
state.

478
00:31:40,946 --> 00:31:44,426
Processor 2 is invalid and it's going to
try to go

479
00:31:44,426 --> 00:31:49,432
this way for the arc will probably come
like this.

480
00:31:49,432 --> 00:31:55,450
at the same time processor 1 is going to
have to invalidate this data, somehow.

481
00:31:55,450 --> 00:31:58,476
So we see the, the right try to happen and

482
00:31:58,476 --> 00:32:03,820
what's going to happen is, when it sees
the intent to write.

483
00:32:03,820 --> 00:32:06,238
Processor 1's going to have to transition
from

484
00:32:06,238 --> 00:32:08,780
modify to invalidate and write back that
data.

485
00:32:08,780 --> 00:32:13,180
And only then can the rightness occur and
processor 2 bring

486
00:32:13,180 --> 00:32:17,910
it into the write state or the modified
state, excuse me.

487
00:32:19,490 --> 00:32:24,260
And finally, processor 1 can try to do a
write.

488
00:32:25,810 --> 00:32:28,928
Now, why, why are we doing this?
Well, it's to get this last

489
00:32:28,928 --> 00:32:33,278
arc in here so we want to transition over
all the arcs in this diagram.

490
00:32:33,278 --> 00:32:36,630
So processor 1 tries to do a write to this
address.

491
00:32:36,630 --> 00:32:37,790
It has it invalid.

492
00:32:37,790 --> 00:32:40,706
Processor 2 has it in modified and you'll

493
00:32:40,706 --> 00:32:44,430
see that it will broadcast its intent to
write.

494
00:32:44,430 --> 00:32:46,590
And while it's doing this broadcast it
needs to

495
00:32:46,590 --> 00:32:49,630
move from modified to invalidate, write
back the data.

496
00:32:49,630 --> 00:32:51,250
And then processor 1

497
00:32:51,250 --> 00:32:51,700
[COUGH]

498
00:32:51,700 --> 00:32:56,350
brings in the background data and can do a
write to that line.

499
00:33:04,700 --> 00:33:07,560
Okay.
So a little bit of an observation here.

500
00:33:07,560 --> 00:33:11,735
So we've gone through the basic MSI
protocol.

501
00:33:11,735 --> 00:33:17,510
But one of the big ideas here is that

502
00:33:17,510 --> 00:33:23,783
If a line is in the modified state in one
cache, there can only

503
00:33:23,783 --> 00:33:29,977
be one copy of that data.
And this is the important

504
00:33:29,977 --> 00:33:34,633
invariant that allows the MSI Snoopy based
invalidation

505
00:33:34,633 --> 00:33:39,752
protocol for work.
So this allows memory to stay

506
00:33:39,752 --> 00:33:44,944
coherent because there's only one copy of
writable

507
00:33:44,944 --> 00:33:49,550
data anywhere.
And if you, sort of, work through this and

508
00:33:49,550 --> 00:33:55,950
I recommend you do work through this this
case a little bit more carefully.

509
00:33:55,950 --> 00:34:00,390
You'll see that you might have multiple
people in the shared state.

510
00:34:00,390 --> 00:34:04,683
Or multiple processors, multiple
processors caches that can

511
00:34:04,683 --> 00:34:08,810
share data but those are only read-only
copies.

512
00:34:08,810 --> 00:34:10,550
So they can read it, they can read it,
they can read it.

513
00:34:10,550 --> 00:34:16,070
Well, when someone tries to do a write,
you invalidate all of the shares first.

514
00:34:17,300 --> 00:34:21,084
Then you can do the write.
And if someone else tries to go read the

515
00:34:21,084 --> 00:34:25,020
data, you need to transition from modified
to shared.

516
00:34:25,020 --> 00:34:25,650
Now, why do we do that?

517
00:34:25,650 --> 00:34:29,820
Why can't we leave someone in modified and
someone else in shared?

518
00:34:29,820 --> 00:34:32,632
Well, it's because by virtue of being in
the

519
00:34:32,632 --> 00:34:36,970
modified state, that processor can
continue to do further writes.

520
00:34:38,380 --> 00:34:42,162
So if you continue further right, the
cache, which brought it in let's

521
00:34:42,162 --> 00:34:46,220
say the shared state, the read state
wouldn't be able to see those updates.

522
00:34:46,220 --> 00:34:49,273
So we guarantee the invariant for this
whole system

523
00:34:49,273 --> 00:34:51,870
is that if a line is in the modified
state.

524
00:34:51,870 --> 00:34:54,915
Then no other cache has a copy of the data
or

525
00:34:54,915 --> 00:34:58,720
no other entity in the system has a copy
of the data.

526
00:35:00,590 --> 00:35:01,500
That's really important.

527
00:35:04,410 --> 00:35:09,310
Okay.
So let's try to enhance this MSI protocol.

528
00:35:12,370 --> 00:35:14,230
Why do we want to enhance the MSI
protocol?

529
00:35:14,230 --> 00:35:17,496
Well, as you might imagine MSI protocol
actually

530
00:35:17,496 --> 00:35:21,090
you have to communicate across the buzz
relatively often.

531
00:35:21,090 --> 00:35:23,538
We're going to look at a couple from
scheme

532
00:35:23,538 --> 00:35:27,400
that reduce the communication traffic
across the bus.

533
00:35:27,400 --> 00:35:31,640
So you can either have a, a bus that
clocks at a slower speed.

534
00:35:31,640 --> 00:35:34,715
Or you could have more processors on this
bus

535
00:35:34,715 --> 00:35:38,015
as you think about better ways to actually
implement

536
00:35:38,015 --> 00:35:39,950
a validation base protocol.

537
00:35:41,620 --> 00:35:47,600
And the first one we're going to look at
is called MESI.

538
00:35:47,600 --> 00:35:51,500
And sometimes people call this the
Illinois protocol because

539
00:35:51,500 --> 00:35:55,950
this, as, comes from University of
Illinois Jack Patel.

540
00:35:57,140 --> 00:36:00,428
His research group.
And this dates back to, to 1984.

541
00:36:00,428 --> 00:36:03,508
And there's actually, four states so
instead

542
00:36:03,508 --> 00:36:05,818
of having three states in the, as we

543
00:36:05,818 --> 00:36:09,650
saw inside the MSI protocol, the insight
here was.

544
00:36:09,650 --> 00:36:13,565
If we're using two steep bits to implement
three

545
00:36:13,565 --> 00:36:17,670
states well, we have a fourth state we
could use.

546
00:36:17,670 --> 00:36:20,750
You know, two to the two is four so we
could even do four states.

547
00:36:20,750 --> 00:36:22,880
Instead of three states, for the same
number of bets.

548
00:36:24,050 --> 00:36:27,230
Now, the question is, what states do you
want to add, you know?

549
00:36:27,230 --> 00:36:28,850
Just because you have four states doesn't

550
00:36:28,850 --> 00:36:30,890
mean your protocol is going to be better.

551
00:36:30,890 --> 00:36:38,070
Well, we're going to have an invalid
state, a shared state.

552
00:36:38,070 --> 00:36:38,270
Okay.

553
00:36:38,270 --> 00:36:43,734
Those look similar as what we had before.
And now, we're going

554
00:36:43,734 --> 00:36:48,928
to have a new state called the exclusive
state or exclusive

555
00:36:48,928 --> 00:36:54,329
but unmodified.
And this is very similar to modified and

556
00:36:54,329 --> 00:36:59,744
exclusive state, the M state.
And in fact, the

557
00:36:59,744 --> 00:37:05,294
insight here is we're going to take the
state diagram from MSI

558
00:37:05,294 --> 00:37:10,658
and take the modified state and split it
into two states.

559
00:37:10,658 --> 00:37:13,770
M and E.

560
00:37:13,770 --> 00:37:19,260
Well, modified and exclusive.
Now why do we want to do this?

561
00:37:19,260 --> 00:37:24,306
Well, what we're going to see is that
we're going to be able to reduce the

562
00:37:24,306 --> 00:37:30,090
communication on the bus, in the case
where a processor reads a piece of data.

563
00:37:30,090 --> 00:37:34,270
And no one else reads or rights that data.

564
00:37:34,270 --> 00:37:37,620
And then that same processor later goes to
write that data.

565
00:37:39,010 --> 00:37:41,272
So if you look at MSI when you go into

566
00:37:41,272 --> 00:37:44,800
a read, you always brought it into the
shared state.

567
00:37:46,420 --> 00:37:50,161
In MESI or MESI protocol, when you go to
do a read but

568
00:37:50,161 --> 00:37:56,180
no one else has a copy, you're going to
bring it into the exclusive state.

569
00:37:56,180 --> 00:37:58,871
And this allows you to, you're not allowed
to

570
00:37:58,871 --> 00:38:01,830
do a write when it's in the exclusive
state.

571
00:38:01,830 --> 00:38:04,503
But, you can upgrade from exclusive to

572
00:38:04,503 --> 00:38:08,421
modified without having to communicate
across the bus.

573
00:38:08,421 --> 00:38:08,883
But, you can upgrade from exclusive to

574
00:38:08,883 --> 00:38:09,630
modified without having to communicate
across the bus.

575
00:38:09,630 --> 00:38:12,000
And this is an important difference

576
00:38:12,000 --> 00:38:14,844
than in the MSI protocol because here in
the

577
00:38:14,844 --> 00:38:17,925
MSI protocol if you had something in the
shared

578
00:38:17,925 --> 00:38:20,084
state And you want to go to a write to

579
00:38:20,084 --> 00:38:22,250
it, you need to broadcast your intent to
write.

580
00:38:22,250 --> 00:38:26,080
And everyone needs to have a Snoop
transaction occur on this.

581
00:38:26,080 --> 00:38:30,540
But in the MESI protocol, you don't need
that extra work there.

582
00:38:30,540 --> 00:38:34,091
You know that you have the exclusive read
only copy of the data so

583
00:38:34,091 --> 00:38:37,106
when you go to do a right you can, you can
just go do

584
00:38:37,106 --> 00:38:39,570
the rights.
And we're going to go walk through this.

585
00:38:39,570 --> 00:38:40,880
the state diagram now.

586
00:38:40,880 --> 00:38:44,840
And it's going to look pretty similar to
the MSI protocol.

587
00:38:44,840 --> 00:38:51,130
And all of the arcs in this diagram of are
in relationship to processor 1.

588
00:38:51,130 --> 00:38:54,028
And like I said before, different
processors can

589
00:38:54,028 --> 00:38:56,730
have the same cache line in different
states.

590
00:38:57,830 --> 00:38:58,250
Okay.

591
00:38:58,250 --> 00:39:02,240
So, let's take a look at this.
Step one, you have a read miss.

592
00:39:03,650 --> 00:39:13,440
It's a, a, a read miss but note I have a
little word next to here.

593
00:39:13,440 --> 00:39:17,320
So this is going from invalidate out of
read miss.

594
00:39:17,320 --> 00:39:18,918
And we're going to see that there's
going to

595
00:39:18,918 --> 00:39:20,880
be two arcs coming out of invalidate.

596
00:39:20,880 --> 00:39:21,510
On a read miss.

597
00:39:21,510 --> 00:39:23,398
One that goes into the shared state and

598
00:39:23,398 --> 00:39:26,110
that's actually going to go into the
exclusive state.

599
00:39:28,120 --> 00:39:29,635
Now, what does this mean?

600
00:39:29,635 --> 00:39:34,193
Well, on a read miss, processor 1 here is
going to

601
00:39:34,193 --> 00:39:38,070
shout out onto the bus, I want to do a
read.

602
00:39:41,120 --> 00:39:46,193
On the bus, if someone responds back
saying, I have a read or I have a

603
00:39:46,193 --> 00:39:51,692
shared copy of that data or I have a copy
of that data, which is read only.

604
00:39:51,692 --> 00:39:55,517
What's going to happen is your gong to
transition to the shared

605
00:39:55,517 --> 00:40:00,010
state here, because you don't want to
invalidate the other caches data.

606
00:40:01,080 --> 00:40:03,816
And this is actually going to be in
contrast to,

607
00:40:03,816 --> 00:40:06,264
if you tried to read miss, but no one else

608
00:40:06,264 --> 00:40:07,630
has a copy right now.

609
00:40:09,030 --> 00:40:10,440
Because if no one has a copy, we're

610
00:40:10,440 --> 00:40:12,940
actually going to transition straight to
this exclusive state.

611
00:40:14,930 --> 00:40:16,030
we'll get there in a second.

612
00:40:16,030 --> 00:40:20,618
So, let's say some other processor tries
to do a read and you have it in the

613
00:40:20,618 --> 00:40:25,190
shared state.
Well, you just stay in the shared state.

614
00:40:26,550 --> 00:40:31,344
But you might need to respond back saying
that yes you you

615
00:40:31,344 --> 00:40:33,700
have a copy of that that data.

616
00:40:34,784 --> 00:40:37,742
this is, this is important so that the
other processor knows where to

617
00:40:37,742 --> 00:40:41,208
enter into the shared state or exclusive
state when the're going to do read miss.

618
00:40:44,020 --> 00:40:44,952
Okay.

619
00:40:44,952 --> 00:40:48,486
let's say you're in the shared state and
you

620
00:40:48,486 --> 00:40:51,950
want to do a write to this piece of data.

621
00:40:51,950 --> 00:40:56,710
Well, this looks similar, the exact same
thing as what happens in the MSI protocol.

622
00:40:56,710 --> 00:41:00,490
You try to do the write, you broadcast
your intent to write, you

623
00:41:00,490 --> 00:41:04,200
have to give everyone a chance to either
respond to say that they

624
00:41:06,270 --> 00:41:09,460
Have a copy and they have to invalidate
their copies and write back

625
00:41:09,460 --> 00:41:13,360
to main memory and then after that you
transposition from the modified state.

626
00:41:13,360 --> 00:41:14,400
And now you can do your write.

627
00:41:16,160 --> 00:41:20,156
And just like as before, if you're in the
modified state you can

628
00:41:20,156 --> 00:41:24,180
do a read or write to the line and you
stay in the modified state.

629
00:41:25,770 --> 00:41:31,386
Okay, so finally now we come to this
interesting case that you,

630
00:41:31,386 --> 00:41:35,943
you read miss.
But instead of the data being already

631
00:41:35,943 --> 00:41:41,820
shared in other caches or already being in
somebody else' s cache.

632
00:41:41,820 --> 00:41:44,669
You do a remiss, you're in invalid you
come

633
00:41:44,669 --> 00:41:48,460
into here, but the date is not shared
anywhere.

634
00:41:48,460 --> 00:41:50,360
So the data's not shared.

635
00:41:50,360 --> 00:41:51,080
Well, what does this mean?

636
00:41:51,080 --> 00:41:54,590
Well, you do the read miss here and you
can come into the exclusive state.

637
00:41:56,270 --> 00:41:56,486
Now why

638
00:41:56,486 --> 00:41:56,930
is this good?

639
00:41:56,930 --> 00:42:00,760
Well, you have the only readable copy in
the system.

640
00:42:00,760 --> 00:42:03,103
And why this is useful is if you try to do
a

641
00:42:03,103 --> 00:42:07,640
write, you don't have to go communicate at
on the bus anymore.

642
00:42:07,640 --> 00:42:10,492
So you're in the exclusive state, let's
say proster

643
00:42:10,492 --> 00:42:13,140
one tries to read, that's fine, you stay
here.

644
00:42:14,400 --> 00:42:18,600
Now, as, before, sort of, the exclusive
state

645
00:42:18,600 --> 00:42:22,510
functionally looks very similar to the
shared state.

646
00:42:22,510 --> 00:42:25,180
You only have a read copy, readable copy
of this data.

647
00:42:25,180 --> 00:42:28,690
If you try to do a write, you need to do
something.

648
00:42:28,690 --> 00:42:32,806
But this is the big difference here, when
you go to

649
00:42:32,806 --> 00:42:37,540
do this write, you don't have to tell
anybody about it.

650
00:42:39,020 --> 00:42:41,414
As far as everyone else is concerned, you
had

651
00:42:41,414 --> 00:42:44,730
an exclusive copy and now you have a
modified copy.

652
00:42:44,730 --> 00:42:46,930
You just transitioned your state diagram.

653
00:42:46,930 --> 00:42:47,830
You transitioned

654
00:42:47,830 --> 00:42:51,260
the state bits here to being in the
modified state and do the write.

655
00:42:53,360 --> 00:42:54,870
So, you transition and then do the write.

656
00:42:57,050 --> 00:42:58,080
Now, why is this good?

657
00:42:58,080 --> 00:43:02,360
As I said, you don't have to communicate
on the bus to do this transition.

658
00:43:02,360 --> 00:43:05,910
Versus, if you were in the MSI protocol
where you only have shared.

659
00:43:05,910 --> 00:43:07,389
You need to go to first broadcaster

660
00:43:07,389 --> 00:43:09,500
or intent to write, which takes bus
bandwidth.

661
00:43:09,500 --> 00:43:11,210
It also takes time.

662
00:43:11,210 --> 00:43:12,390
And then new transition.

663
00:43:12,390 --> 00:43:15,630
So this is trying to save us in this one
case here.

664
00:43:16,715 --> 00:43:18,310
okay.

665
00:43:18,310 --> 00:43:20,130
Let's fill in the rest of the arcs here.

666
00:43:20,130 --> 00:43:22,262
The rest of the arcs look somewhat similar
except

667
00:43:22,262 --> 00:43:25,255
when we sort of come out of the exclusive
state and you'll see why.

668
00:43:25,255 --> 00:43:31,320
The write-miss case from invalid.

669
00:43:31,320 --> 00:43:32,690
This looks the same as MSI.

670
00:43:32,690 --> 00:43:34,330
You take a write miss here.

671
00:43:34,330 --> 00:43:39,042
You're going to broadcast your intent to
write, wait for everyone to write back

672
00:43:39,042 --> 00:43:42,630
all the data and validate and then you in
turn to the modified state.

673
00:43:42,630 --> 00:43:43,265
[COUGH]

674
00:43:43,265 --> 00:43:49,370
Now, here's the little bit more
interesting art here.

675
00:43:50,390 --> 00:43:53,440
Let's say processor 1 has the data in the
exclusive state.

676
00:43:54,520 --> 00:43:56,740
And then some other processor wants to do
a read.

677
00:43:58,178 --> 00:44:01,978
Well, processor 1 needs to say I, I have a
copy of that so,

678
00:44:01,978 --> 00:44:07,528
the other processor can't enter into the
exclusive state when it tries to read.

679
00:44:07,528 --> 00:44:08,378
Instead it's going to

680
00:44:08,378 --> 00:44:10,700
come in this arc here with some shared
data.

681
00:44:10,700 --> 00:44:14,804
Into the shared state but at the same
time, processor 1 needs to

682
00:44:14,804 --> 00:44:20,180
transition from being in the exclusive
state to being in the shared state.

683
00:44:20,180 --> 00:44:20,890
And why is this?

684
00:44:20,890 --> 00:44:25,197
Well, if some other processor has it in
the shared state and you were to

685
00:44:25,197 --> 00:44:28,409
stay in the exclusive state, you could
potentially

686
00:44:28,409 --> 00:44:30,980
try to do a write to that data.

687
00:44:30,980 --> 00:44:33,659
While the other person has a read-only
copy of that data and you'll

688
00:44:33,659 --> 00:44:35,870
get the data out of sync and you'll have
stale values.

689
00:44:37,600 --> 00:44:41,640
So you need to, you need to somehow fix
that.

690
00:44:41,640 --> 00:44:44,970
And how we fix that is when someone else
tries to do a write.

691
00:44:44,970 --> 00:44:46,994
Or excuse me, when someone else tries to
do a read to the

692
00:44:46,994 --> 00:44:50,600
data and another processor tries to read,
we transition from exclusive to shared.

693
00:44:50,600 --> 00:44:54,990
And the other processor brings the data in
as Shared into their cache.

694
00:44:56,770 --> 00:44:57,200
Okay.

695
00:44:57,200 --> 00:44:58,712
Let's flush out the rest of

696
00:44:58,712 --> 00:45:00,598
the, the arch's here.

697
00:45:00,598 --> 00:45:04,510
this is similar to what we see before here
if you're in the shared state.

698
00:45:04,510 --> 00:45:07,110
Some you see in other processors intent to
right

699
00:45:07,110 --> 00:45:10,336
go across the bus for the line that you
have.

700
00:45:10,336 --> 00:45:13,430
You just have to invalidate it.

701
00:45:13,430 --> 00:45:15,868
Thankfully you don't have to write
anything back here

702
00:45:15,868 --> 00:45:19,000
because you are only have read a link
copy.

703
00:45:19,000 --> 00:45:21,650
Similar, sort of, thing here for the
exclusive state.

704
00:45:21,650 --> 00:45:24,040
You're in the exclusive state.
You only have a read a link copy.

705
00:45:24,040 --> 00:45:27,230
You know you have the only copy in the
system, though.

706
00:45:27,230 --> 00:45:31,040
But if some other processor tries to do a
write.

707
00:45:31,040 --> 00:45:32,420
You need to do an invalid.

708
00:45:34,630 --> 00:45:35,459
Okay.

709
00:45:35,459 --> 00:45:40,310
let's see, is there anything else There
are few more arcs.

710
00:45:40,310 --> 00:45:44,934
Modified this looks similar to what we saw
before, if you are in the modified

711
00:45:44,934 --> 00:45:47,450
state, then a another processor tries to
do

712
00:45:47,450 --> 00:45:50,180
a read, you transition into the shared
state.

713
00:45:52,410 --> 00:45:54,280
Now, and you have to write back the data.

714
00:45:54,280 --> 00:45:57,080
But what you'll notice here is you can't

715
00:45:57,080 --> 00:46:01,050
transition to the exclusive state in this
case.

716
00:46:01,050 --> 00:46:05,475
And in fact, the only entry point into the
exclusive state is this read

717
00:46:05,475 --> 00:46:09,989
miss case here when the data is not shared
or in another cache already.

718
00:46:11,430 --> 00:46:16,370
So when another processor tries to do a
read, you have to write back the data.

719
00:46:16,370 --> 00:46:17,450
You transition to the shared

720
00:46:17,450 --> 00:46:19,710
state but this is the same as what we did
in the MSI protocol.

721
00:46:22,690 --> 00:46:24,390
a couple of other things here.

722
00:46:24,390 --> 00:46:28,394
Another processor tries to do a write,
you'll see their

723
00:46:28,394 --> 00:46:33,100
broadcast of their intent to write if
you're in modified state.

724
00:46:33,100 --> 00:46:35,333
We need to clearly do an invalidate

725
00:46:35,333 --> 00:46:37,910
because only one cache can have dirty
data.

726
00:46:39,050 --> 00:46:43,720
And so you see it, you write it back and
you transition to the invalidation state.

727
00:46:43,720 --> 00:46:48,430
And that's, that's, that's the sum of this
protocol here.

728
00:46:48,430 --> 00:46:53,920
Now what invariance do we have?
What we said in

729
00:46:53,920 --> 00:46:59,100
the MSI protocol that you can only

730
00:46:59,100 --> 00:47:04,910
have one cache or one cache line for an

731
00:47:04,910 --> 00:47:04,911
[INAUDIBLE].

732
00:47:04,911 --> 00:47:11,827
Or one line only in the modified state in
one cache at a time for a

733
00:47:11,827 --> 00:47:19,600
particular cache slide.
So there's something similar here.

734
00:47:19,600 --> 00:47:22,170
Well, yes, but it's a little bit
different.

735
00:47:22,170 --> 00:47:25,366
Because we took the modified state and we

736
00:47:25,366 --> 00:47:30,320
split it into two similar, sort of,
invariant here.

737
00:47:30,320 --> 00:47:36,468
Except now we say that only a, a given
cache line can only be

738
00:47:36,468 --> 00:47:43,280
in either modified or exclusive in one
cache at any given time.

739
00:47:43,280 --> 00:47:46,010
And, this guarantees that we never have
stale data.

740
00:47:46,010 --> 00:47:49,420
And guarantees that we have a useful cache
coherence protocol.

741
00:47:54,100 --> 00:48:00,094
Okay.
next class we're going to expand this and

742
00:48:00,094 --> 00:48:06,658
talk about a few other cache coherence
protocols.

743
00:48:06,658 --> 00:48:10,970
One we're going to talk about is what they
used in the original AMD Opteron.

744
00:48:10,970 --> 00:48:13,820
Processors where they actually had more
states.

745
00:48:13,820 --> 00:48:17,540
They added another state here called the
own state.

746
00:48:17,540 --> 00:48:20,390
And they went from MESI to MOESI.

747
00:48:20,390 --> 00:48:24,790
And this own state is a optimization which
allows you to

748
00:48:24,790 --> 00:48:30,029
do cache to cache transfers and
effectively track who has the most

749
00:48:31,298 --> 00:48:35,385
up to date written data versus having to
always write everything out

750
00:48:35,385 --> 00:48:39,100
to main memory and do memory based out to
main memory transfers.

751
00:48:39,100 --> 00:48:42,126
And, we'll also talk a little bit about

752
00:48:42,126 --> 00:48:45,508
MESIF, which is something they use in some
of

753
00:48:45,508 --> 00:48:49,602
the Intel processors, where they will also
split

754
00:48:49,602 --> 00:48:54,210
the shared state into another state beyond
MESIF.

755
00:48:54,210 --> 00:48:54,830
So let's stop here for today.