1
00:00:03,830 --> 00:00:04,340
Okay.

2
00:00:04,340 --> 00:00:10,732
So let's change gears here and start
talking about implementation of different

3
00:00:10,732 --> 00:00:17,330
types of coherence and cache coherence
systems for multiprocessor systems.

4
00:00:17,330 --> 00:00:20,074
So, the first thing we're going to

5
00:00:20,074 --> 00:00:24,540
start off with is small symmetric
multiprocessors.

6
00:00:24,540 --> 00:00:27,860
Now, why do I call these things symmetric
multiprocessors?

7
00:00:27,860 --> 00:00:28,850
Well, in a symmetric

8
00:00:28,850 --> 00:00:32,660
multiprocessor, everything is the same
distance away from memory.

9
00:00:32,660 --> 00:00:36,060
So, we have processors across the top
here.

10
00:00:36,060 --> 00:00:41,750
They have a shared CPU memory bus here,
and memory is sitting over here.

11
00:00:41,750 --> 00:00:44,280
And these processors are all equally
distanced away from this memory.

12
00:00:45,840 --> 00:00:49,653
And this shared memory bus here also goes
and

13
00:00:49,653 --> 00:00:53,931
communicates with the I/O bus, where you
have things

14
00:00:53,931 --> 00:00:59,046
like disks, graphics controllers,
networking, and

15
00:00:59,046 --> 00:01:01,500
any processor can do any I/O.

16
00:01:01,500 --> 00:01:04,500
And any processor can communicate with
memory.

17
00:01:04,500 --> 00:01:05,720
And they're symmetric.

18
00:01:07,170 --> 00:01:11,500
Now, let's zoom in on what this bus looks
like here,

19
00:01:11,500 --> 00:01:13,660
because it's going to actually influence
our design.

20
00:01:13,660 --> 00:01:16,067
And I want to point out that buses are

21
00:01:16,067 --> 00:01:18,972
only one design that you could come up
with

22
00:01:18,972 --> 00:01:21,210
for a multiprocessor system.

23
00:01:21,210 --> 00:01:25,590
You could also think about having point to
point interconnect.

24
00:01:25,590 --> 00:01:30,170
So what I mean by that is one processor
connects to another processor directly.

25
00:01:30,170 --> 00:01:32,760
But then a third processor connects to the

26
00:01:32,760 --> 00:01:36,240
first processor but not vice
versa.

27
00:01:36,240 --> 00:01:39,600
So you could have some sort of routing
needed.

28
00:01:39,600 --> 00:01:41,640
And this is what you'll see when we start

29
00:01:41,640 --> 00:01:45,500
to talk about large multi-cores or large
multiprocessor systems.

30
00:01:45,500 --> 00:01:49,198
But for today, we're going to constrain

31
00:01:49,198 --> 00:01:54,100
ourselves to thinking about small
symmetric multi-cores, where

32
00:01:54,100 --> 00:02:00,770
all of the processors are equidistant away
from memory, and they sit on a shared bus.

33
00:02:00,770 --> 00:02:03,480
So let's take a look at what a shared bus
looks like.

34
00:02:04,880 --> 00:02:11,642
So, here we have a diagram representing a
multi-drop

35
00:02:11,642 --> 00:02:13,300
memory bus.

36
00:02:13,300 --> 00:02:16,675
And let's start off by looking at all of
the different

37
00:02:16,675 --> 00:02:21,540
signal types that you need in this
multi-drop memory bus.

38
00:02:21,540 --> 00:02:25,370
And before we do that, let's describe what
multi-drop means.

39
00:02:25,370 --> 00:02:29,258
So, multi-drop just means that it's a
shared medium,

40
00:02:29,258 --> 00:02:32,530
it's a shared wire that all of the
processors,

41
00:02:32,530 --> 00:02:34,420
so here we have processor one, processor

42
00:02:34,420 --> 00:02:37,490
two, and main memory, connect into this
bus.

43
00:02:37,490 --> 00:02:40,960
So it's just a wire and then you have taps
coming off the wires.

44
00:02:40,960 --> 00:02:43,346
And this is why we call it a multi-drop
bus.

45
00:02:44,370 --> 00:02:46,410
And when you go to look at this, there's

46
00:02:46,410 --> 00:02:49,700
some sort of positives and negatives to a
multi-drop bus.

47
00:02:49,700 --> 00:02:52,620
The positive here is that you don't have
to route.

48
00:02:52,620 --> 00:02:56,448
If you wanted to have one processor
communicate with main memory, or

49
00:02:56,448 --> 00:03:00,165
read main memory, it can just shout saying
where is address five.

50
00:03:00,165 --> 00:03:02,400
And main memory can just say, I have
that here in terms of data.

51
00:03:03,940 --> 00:03:06,775
But the downside to this is processor one
and

52
00:03:06,775 --> 00:03:10,660
processor two can't go shout at the same
time.

53
00:03:10,660 --> 00:03:13,498
So, as we start to add more processors,
something

54
00:03:13,498 --> 00:03:17,050
like a shared multi-drop bus might become
a problem.

55
00:03:17,050 --> 00:03:19,682
And we're going to talk about that once we
start to get to

56
00:03:19,682 --> 00:03:23,660
large multiprocessor systems or large
parallel systems at the end of this class.

57
00:03:24,710 --> 00:03:29,840
But for now, let's focus on
multi-drop memory buses.

58
00:03:29,840 --> 00:03:33,080
And let's look at all the different
wires you're going to need here.

59
00:03:34,140 --> 00:03:35,880
So, we'll start from the bottom here.

60
00:03:35,880 --> 00:03:37,670
So in the bottom, we just have a clock.

61
00:03:37,670 --> 00:03:40,290
And this is basically driven externally.

62
00:03:40,290 --> 00:03:44,605
Neither processor 1 nor
processor 2

63
00:03:44,605 --> 00:03:46,120
nor main memory is going to be driving
this.

64
00:03:46,120 --> 00:03:48,911
This is just something they all receive to
keep everybody synchronized.

65
00:03:50,160 --> 00:03:51,500
Now, let's start at the top here.

66
00:03:52,760 --> 00:03:55,750
Arbitration.
What does arbitration mean?

67
00:03:55,750 --> 00:04:00,752
Well, arbitration means you need some way
to determine who is allowed to

68
00:04:00,752 --> 00:04:04,866
shout, or who is allowed to utilize the
bus at a given time.

69
00:04:04,866 --> 00:04:08,232
So, these sets of wires are going to be
used

70
00:04:08,232 --> 00:04:11,994
to have one of the three things for
instance on

71
00:04:11,994 --> 00:04:15,954
this bus determine who is allowed to use
the

72
00:04:15,954 --> 00:04:19,331
bus or shout on the bus at any given time.

73
00:04:19,331 --> 00:04:20,819
And how do

74
00:04:20,819 --> 00:04:23,310
we go about doing this?

75
00:04:23,310 --> 00:04:26,395
Well, there's a couple different ways you
can go build arbitration logic.

76
00:04:26,395 --> 00:04:30,550
One way is you could actually have what's
known as a pull-down bus.

77
00:04:30,550 --> 00:04:33,666
So, let's say you have a wire per
processor, or

78
00:04:33,666 --> 00:04:37,720
wire per entity that wants to communicate
on this bus.

79
00:04:37,720 --> 00:04:43,390
And when you want to go use it, you pull
down a wire and this imposes a priority.

80
00:04:43,390 --> 00:04:46,398
If you see that processor 1 is pulling

81
00:04:46,398 --> 00:04:51,660
down a wire, and processor 2 is also
pulling it down,

82
00:04:51,660 --> 00:04:52,820
processor 1 always wins.

83
00:04:52,820 --> 00:04:55,004
But usually that's not the best thing to
do, because then you

84
00:04:55,004 --> 00:04:58,430
are basically required to
have some sort of fixed priority.

85
00:04:58,430 --> 00:05:06,770
Instead, you could think about having a
request and grant system.

86
00:05:06,770 --> 00:05:09,434
So in the request and grant system, let's

87
00:05:09,434 --> 00:05:12,160
say you have a chip which is an
arbitrator.

88
00:05:14,860 --> 00:05:21,068
And you have let's say three entities

89
00:05:21,068 --> 00:05:26,112
on here that have three request

90
00:05:26,112 --> 00:05:32,601
signals: REQ1, REQ2, and REQ3.

91
00:05:32,601 --> 00:05:35,835
This arbitrator can try to do something
like a round

92
00:05:35,835 --> 00:05:39,920
robin scheme, or try to enforce some
sort of fairness.

93
00:05:39,920 --> 00:05:45,740
And what would happen is at the beginning
of a memory bus cycle,

94
00:05:45,740 --> 00:05:48,996
you'll actually have whoever needs to use
the bus

95
00:05:48,996 --> 00:05:52,820
that cycle will all, let's say, assert
their wire.

96
00:05:52,820 --> 00:05:54,690
Assert their request wire.

97
00:05:54,690 --> 00:05:59,150
And then the arbitrator will take it all in,
and take it all into consideration.

98
00:05:59,150 --> 00:06:01,140
Might have some state inside of here.

99
00:06:01,140 --> 00:06:04,967
And then it will notify only one of the
entities on this

100
00:06:04,967 --> 00:06:06,850
bus with a grant signal.

101
00:06:14,780 --> 00:06:17,882
There are three grant signals here; it will only
assert

102
00:06:17,882 --> 00:06:20,390
one of these grant signals and make a
decision

103
00:06:20,390 --> 00:06:23,294
and say, you know, processor one wins, or
processor

104
00:06:23,294 --> 00:06:26,860
two wins depending on which wire here gets
asserted.

105
00:06:26,860 --> 00:06:29,790
So, multiple people can request but only
one wins.

106
00:06:30,840 --> 00:06:33,738
So the first thing you're going to want to
do to try

107
00:06:33,738 --> 00:06:37,750
to use this bus, is actually try to
arbitrate for the bus.

108
00:06:37,750 --> 00:06:38,860
And there's a set of wires for that.

109
00:06:40,380 --> 00:06:41,830
Okay, what happens next?

110
00:06:41,830 --> 00:06:49,650
Well, on the control wires, you're
going to say what you want to achieve.

111
00:06:49,650 --> 00:06:57,590
So you might have a request that says, I
want to do a read on the bus.

112
00:06:57,590 --> 00:07:00,460
Now we haven't yet said where we want to
do a read of.

113
00:07:00,460 --> 00:07:03,570
Because, if you look at this multi-drop
bus, we have wires for that.

114
00:07:03,570 --> 00:07:04,980
We have an address bus.

115
00:07:04,980 --> 00:07:05,640
So you first

116
00:07:05,640 --> 00:07:07,356
will say, I want to do a read and I

117
00:07:07,356 --> 00:07:09,920
want to do a read of address five, we'll
say.

118
00:07:12,350 --> 00:07:18,680
Then in a traditional multi-drop bus
you'll actually wait.

119
00:07:18,680 --> 00:07:21,591
So you'll assert the arbitration, the
control

120
00:07:21,591 --> 00:07:24,950
and the address, and you'll be waiting
around.

121
00:07:24,950 --> 00:07:26,860
You'll say, I want to read address
five.

122
00:07:26,860 --> 00:07:32,196
And then main memory will say I have
address five and it will assert

123
00:07:32,196 --> 00:07:38,026
onto the data bus here, we'll say, the
data for address five.

124
00:07:38,026 --> 00:07:41,200
And then processor 1 can read in that data
then.

125
00:07:42,940 --> 00:07:46,060
Now, what are some downsides of doing
something like this?

126
00:07:46,060 --> 00:07:51,845
Well, as you go to build this multi-drop
bus, you're basically going

127
00:07:51,845 --> 00:07:57,680
to reserve the bus the entire time that
you are doing one memory transaction.

128
00:07:58,770 --> 00:08:04,164
And you need to hold the bus the whole
time while you do the arbitration,

129
00:08:04,164 --> 00:08:07,480
control, and address, and the data comes back.

130
00:08:07,480 --> 00:08:09,610
And it could be a long time, because main

131
00:08:09,610 --> 00:08:13,018
memory can take a long time to return
data.

132
00:08:13,018 --> 00:08:16,356
And this is a problem, so what
did people think about doing?

133
00:08:16,356 --> 00:08:21,460
Well, they applied ideas from processor
design and

134
00:08:21,460 --> 00:08:26,600
said, maybe we can try to pipeline the
bus.

135
00:08:26,600 --> 00:08:29,420
So note, and let's flip back and forth
here for a second.

136
00:08:29,420 --> 00:08:33,480
The title of the slide changes, but the
content doesn't.

137
00:08:33,480 --> 00:08:37,010
So this pipelined bus actually looks the
same.

138
00:08:37,010 --> 00:08:41,714
So it has the same data, but now instead
of arbitrating and winning

139
00:08:41,714 --> 00:08:47,100
the entire bus, and holding the entire bus
for a long period of time,

140
00:08:47,100 --> 00:08:51,918
we subdivide all these different
categories and actually pipeline

141
00:08:51,918 --> 00:08:56,300
the access to them and use them only when
they're needed.

142
00:08:56,300 --> 00:08:59,090
So, we can actually take a look at this as
a picture here.

143
00:08:59,090 --> 00:09:02,402
And we can see, for instance, on a
pipelined bus, you

144
00:09:02,402 --> 00:09:07,106
might first let's say processor 1 is
trying to do something.

145
00:09:07,106 --> 00:09:11,360
It'll assert processor 1 onto the
arbitration lines and let's say it wins.

146
00:09:12,680 --> 00:09:17,630
And then in the next cycle, it'll assert
that it wants to do a load.

147
00:09:18,810 --> 00:09:21,156
And then in the next cycle,

148
00:09:21,156 --> 00:09:26,296
it asserts the address.
And finally, let's say the

149
00:09:26,296 --> 00:09:31,392
main memory returns the data quickly here
and returns the data

150
00:09:31,392 --> 00:09:36,910
over here very quickly.
Now, why is this good?

151
00:09:36,910 --> 00:09:39,020
Well, because it's pipelined.

152
00:09:39,020 --> 00:09:41,650
The next cycle here someone else can be
arbitrating for the bus.

153
00:09:43,250 --> 00:09:46,250
The cycle after the load or the control
data signals are used

154
00:09:46,250 --> 00:09:49,500
here, someone else can be putting a
different transaction on.

155
00:09:50,720 --> 00:09:54,014
Likewise in the address here, the next
cycle someone can be

156
00:09:54,014 --> 00:09:57,290
putting something here and data can be
coming the next cycle.

157
00:09:57,290 --> 00:10:01,050
So you can basically not have to hold all
the wires for the

158
00:10:01,050 --> 00:10:03,690
whole time of one memory transaction,

159
00:10:03,690 --> 00:10:07,170
but instead you can pipeline those
transactions.
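
To make the benefit concrete, here is a small Python sketch comparing the two buses. It assumes, purely for illustration, that the bus has four wire groups (arbitration, control, address, data), that each is used for exactly one cycle per transaction, and that memory always answers in the next cycle. The function names are hypothetical.

```python
# Assumed wire groups, in the order a transaction uses them.
STAGES = ["ARB", "CTRL", "ADDR", "DATA"]


def unpipelined_cycles(n):
    """Cycles for n transactions when one transaction holds the whole bus."""
    return n * len(STAGES)


def pipelined_cycles(n):
    """Cycles for n transactions when a new one can start every cycle."""
    return len(STAGES) + n - 1 if n > 0 else 0


def pipelined_schedule(n):
    """Return one dict per cycle mapping wire group -> transaction index."""
    schedule = []
    for cycle in range(pipelined_cycles(n)):
        row = {}
        for s, stage in enumerate(STAGES):
            txn = cycle - s  # transaction that reaches this stage now
            if 0 <= txn < n:
                row[stage] = txn
        schedule.append(row)
    return schedule


if __name__ == "__main__":
    for cycle, row in enumerate(pipelined_schedule(3)):
        print(cycle, row)
```

Under these assumptions, three transactions take 12 cycles on the unpipelined bus and 6 on the pipelined one, because different transactions occupy different wire groups in the same cycle.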

160
00:10:07,170 --> 00:10:11,310
And this is just to give you an idea of
how the physical implementation of

161
00:10:11,310 --> 00:10:15,190
the wiring of small symmetric
multiprocessors works.

162
00:10:15,190 --> 00:10:16,650
In reality, they're a little more complex.

163
00:10:16,650 --> 00:10:19,082
So something, we're not going to talk
about

164
00:10:19,082 --> 00:10:19,658
[INAUDIBLE]

165
00:10:19,658 --> 00:10:20,234
[INAUDIBLE]

166
00:10:20,234 --> 00:10:23,882
what you'll see people do when they go
build these pipelined buses,

167
00:10:23,882 --> 00:10:28,190
is they'll actually do what is called a
split-phase transaction bus.

168
00:10:28,190 --> 00:10:33,030
Where instead of let's say waiting for the
data to come back.

169
00:10:33,030 --> 00:10:36,358
For instance in this example here it's
very possible that the

170
00:10:36,358 --> 00:10:39,700
data from main memory might take a couple
cycles to come back.

171
00:10:40,780 --> 00:10:45,330
Instead of just waiting there which would
slow down your pipeline,

172
00:10:45,330 --> 00:10:48,200
if you have to stall for instance.

173
00:10:48,200 --> 00:10:52,271
Instead of doing that, you can issue a
request and then some time in the

174
00:10:52,271 --> 00:10:57,310
future, the main memory might arbitrate
for the bus again and the return the data.

175
00:10:57,310 --> 00:10:59,112
So that's why it's called a split-phase

176
00:10:59,112 --> 00:11:02,350
transaction, so it's multiple phases to
one transaction.

177
00:11:02,350 --> 00:11:05,500
So the first phase might be request the
data, where you

178
00:11:05,500 --> 00:11:08,500
might have to use all of the portions of
the bus.

179
00:11:08,500 --> 00:11:10,375
And then the response for

180
00:11:10,375 --> 00:11:15,775
the data will be the main memory
arbitrating for the bus, saying that it's

181
00:11:15,775 --> 00:11:18,925
going to do a data response, and
reasserting

182
00:11:18,925 --> 00:11:22,100
the address and then giving the data back.

183
00:11:22,100 --> 00:11:25,070
So you can see that that's a better way
to use the bus,

184
00:11:25,070 --> 00:11:27,890
because you don't have to hold the bus for
a long period of time.
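
A split-phase transaction can be sketched as a tiny cycle-by-cycle simulation in Python. Everything here is an assumption made for illustration: the function name `simulate_split_phase` is hypothetical, each phase occupies the bus for a single cycle, memory has a fixed latency, and a ready memory response wins arbitration over a new request.

```python
from collections import deque


def simulate_split_phase(requests, memory, latency):
    """Simulate a split-phase bus.

    requests: list of (issue_cycle, requester, address) tuples.
    memory:   dict mapping address -> data.
    latency:  cycles between a request phase and its data being ready.
    Returns a bus log of (cycle, phase, requester, address, data).
    """
    pending = deque(sorted(requests))
    in_flight = []  # (ready_cycle, requester, address) awaiting a response
    log = []
    cycle = 0
    while pending or in_flight:
        ready = [t for t in in_flight if t[0] <= cycle]
        if ready:
            # Response phase: memory arbitrates for the bus, re-asserts
            # the address, and drives the data.
            t = min(ready)
            in_flight.remove(t)
            _, who, addr = t
            log.append((cycle, "RESP", who, addr, memory[addr]))
        elif pending and pending[0][0] <= cycle:
            # Request phase: the processor uses the bus for one cycle
            # and then releases it instead of waiting for the data.
            _, who, addr = pending.popleft()
            in_flight.append((cycle + latency, who, addr))
            log.append((cycle, "REQ", who, addr, None))
        cycle += 1
    return log
```

For example, with two processors reading addresses 5 and 9 and a three-cycle memory latency, the second request goes out on the bus while the first one's data is still outstanding, which is exactly what holding the bus for the whole transaction would prevent.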

185
00:11:31,930 --> 00:11:34,922
So, one of the challenges with this is that you
still have

186
00:11:34,922 --> 00:11:38,650
everybody trying to scream on the bus at
the same time.

187
00:11:38,650 --> 00:11:42,390
And if you were to take everyone in this
room and try to scream all at

188
00:11:42,390 --> 00:11:44,362
the same time, we would not be able

189
00:11:44,362 --> 00:11:47,710
to understand what each other is
saying.

190
00:11:47,710 --> 00:11:51,480
So, that's why we need arbitration here:
to, if you will, sort

191
00:11:51,480 --> 00:11:55,470
of pass around the token so only one
person can speak at a time.

192
00:11:55,470 --> 00:11:57,160
But if you want to have multiple

193
00:11:57,160 --> 00:12:00,280
people speaking at a time, we're going to
have to look at

194
00:12:00,280 --> 00:12:04,634
more complex systems and we're going to
talk about that in two lectures.