1 00:00:03,830 --> 00:00:04,340 Okay. 2 00:00:04,340 --> 00:00:10,732 So let's change gears here and start talking about implementation of different 3 00:00:10,732 --> 00:00:17,330 types of coherence and cache coherence systems for multiprocessor systems. 4 00:00:17,330 --> 00:00:20,074 So, the first thing we're going to 5 00:00:20,074 --> 00:00:24,540 start off with is small symmetric multiprocessors. 6 00:00:24,540 --> 00:00:27,860 Now, why do I call these things symmetric multiprocessors? 7 00:00:27,860 --> 00:00:28,850 Well, in a symmetric 8 00:00:28,850 --> 00:00:32,660 multiprocessor, everything is the same distance away from memory. 9 00:00:32,660 --> 00:00:36,060 So, we have processors across the top here. 10 00:00:36,060 --> 00:00:41,750 They have a shared CPU memory bus here, and memory is sitting over here. 11 00:00:41,750 --> 00:00:44,280 And these processors are all equally distant from this memory. 12 00:00:45,840 --> 00:00:49,653 And this shared memory bus here also goes and 13 00:00:49,653 --> 00:00:53,931 communicates with the I/O bus, where you have things 14 00:00:53,931 --> 00:00:59,046 like disks, graphics controllers, networking, and 15 00:00:59,046 --> 00:01:01,500 any processor can do any I/O. 16 00:01:01,500 --> 00:01:04,500 And any processor can communicate with memory. 17 00:01:04,500 --> 00:01:05,720 And they're symmetric. 18 00:01:07,170 --> 00:01:11,500 Now, let's zoom in on what this bus looks like here, 19 00:01:11,500 --> 00:01:13,660 because it's going to actually influence our design. 20 00:01:13,660 --> 00:01:16,067 And I want to point out that buses are 21 00:01:16,067 --> 00:01:18,972 only one design that you could come up with 22 00:01:18,972 --> 00:01:21,210 for a multiprocessor system. 23 00:01:21,210 --> 00:01:25,590 You could also think about having a point-to-point interconnect. 24 00:01:25,590 --> 00:01:30,170 So what I mean by that is one processor connects to another processor directly. 
25 00:01:30,170 --> 00:01:32,760 But then a third processor connects to the 26 00:01:32,760 --> 00:01:36,240 first processor but not vice versa. 27 00:01:36,240 --> 00:01:39,600 So you could have some sort of routing needed. 28 00:01:39,600 --> 00:01:41,640 And this is what you'll see when we start 29 00:01:41,640 --> 00:01:45,500 to talk about large multi-cores or large multiprocessor systems. 30 00:01:45,500 --> 00:01:49,198 But for today, we're going to constrain 31 00:01:49,198 --> 00:01:54,100 ourselves to thinking about small symmetric multi-cores, where 32 00:01:54,100 --> 00:02:00,770 all of the processors are equidistant from memory, and they sit on a shared bus. 33 00:02:00,770 --> 00:02:03,480 So let's take a look at what a shared bus looks like. 34 00:02:04,880 --> 00:02:11,642 So, here we have a diagram representing a multi-drop 35 00:02:11,642 --> 00:02:13,300 memory bus. 36 00:02:13,300 --> 00:02:16,675 And let's start off by looking at all of the different 37 00:02:16,675 --> 00:02:21,540 signal types that you need in this multi-drop memory bus. 38 00:02:21,540 --> 00:02:25,370 And before we do that, let's describe what multi-drop means. 39 00:02:25,370 --> 00:02:29,258 So, multi-drop just means that it's a shared medium, 40 00:02:29,258 --> 00:02:32,530 it's a shared wire that all of the processors, 41 00:02:32,530 --> 00:02:34,420 so here we have processor one, processor 42 00:02:34,420 --> 00:02:37,490 two, and main memory, connect into. 43 00:02:37,490 --> 00:02:40,960 So it's just a wire and then you have taps coming off the wire. 44 00:02:40,960 --> 00:02:43,346 And this is why we call it a multi-drop bus. 45 00:02:44,370 --> 00:02:46,410 And when you go to look at this, there's 46 00:02:46,410 --> 00:02:49,700 some sort of positives and negatives in a multi-drop bus. 47 00:02:49,700 --> 00:02:52,620 The positive here is that you don't have to route. 
48 00:02:52,620 --> 00:02:56,448 If you wanted to have one processor communicate with main memory, or 49 00:02:56,448 --> 00:03:00,165 read main memory, it can just shout, saying, where is address five? 50 00:03:00,165 --> 00:03:02,400 And main memory can just say, I have that here, in terms of data. 51 00:03:03,940 --> 00:03:06,775 But the downside to this is processor one and 52 00:03:06,775 --> 00:03:10,660 processor two can't both shout at the same time. 53 00:03:10,660 --> 00:03:13,498 So, as we start to add more processors, something 54 00:03:13,498 --> 00:03:17,050 like a shared multi-drop bus might become a problem. 55 00:03:17,050 --> 00:03:19,682 And we're going to talk about that once we start to get to 56 00:03:19,682 --> 00:03:23,660 large multiprocessor systems or large parallel systems at the end of this class. 57 00:03:24,710 --> 00:03:29,840 But for now, let's focus on multi-drop memory buses. 58 00:03:29,840 --> 00:03:33,080 And let's look at all the different wires you're going to need here. 59 00:03:34,140 --> 00:03:35,880 So, we'll start from the bottom here. 60 00:03:35,880 --> 00:03:37,670 So at the bottom, we just have a clock. 61 00:03:37,670 --> 00:03:40,290 And this is basically driven externally. 62 00:03:40,290 --> 00:03:44,605 You don't need processor 1, processor 2, 63 00:03:44,605 --> 00:03:46,120 or main memory to be driving this. 64 00:03:46,120 --> 00:03:48,911 This is just something they all receive to keep everybody synchronized. 65 00:03:50,160 --> 00:03:51,500 Now, let's start at the top here. 66 00:03:52,760 --> 00:03:55,750 Arbitration. What does arbitration mean? 67 00:03:55,750 --> 00:04:00,752 Well, arbitration means you need some way to determine who is allowed to 68 00:04:00,752 --> 00:04:04,866 shout, or who is allowed to utilize the bus at a given time. 
69 00:04:04,866 --> 00:04:08,232 So, these sets of wires are going to be used 70 00:04:08,232 --> 00:04:11,994 to have one of the three things, for instance, on 71 00:04:11,994 --> 00:04:15,954 this bus determine who is allowed to use the 72 00:04:15,954 --> 00:04:19,331 bus or shout on the bus at any given time. 73 00:04:19,331 --> 00:04:20,819 And how do 74 00:04:20,819 --> 00:04:23,310 we go about doing this? 75 00:04:23,310 --> 00:04:26,395 Well, there's a couple different ways you can go build arbitration logic. 76 00:04:26,395 --> 00:04:30,550 One way is you could actually have what's known as a pull-down bus. 77 00:04:30,550 --> 00:04:33,666 So, let's say you have a wire per processor, or a 78 00:04:33,666 --> 00:04:37,720 wire per entity that wants to communicate on this bus. 79 00:04:37,720 --> 00:04:43,390 And when you want to go use it, you pull down a wire and this implies priority. 80 00:04:43,390 --> 00:04:46,398 If you see that processor 1 is pulling 81 00:04:46,398 --> 00:04:51,660 down a wire, and processor 2 is also pulling it down, 82 00:04:51,660 --> 00:04:52,820 processor 1 always wins. 83 00:04:52,820 --> 00:04:55,004 But usually that's not the best thing to do, because then 84 00:04:55,004 --> 00:04:58,430 you're required to basically have some sort of fixed priority. 85 00:04:58,430 --> 00:05:06,770 Instead, you could think about having a request and grant system. 86 00:05:06,770 --> 00:05:09,434 So in the request and grant system, let's 87 00:05:09,434 --> 00:05:12,160 say you have a chip which is an arbitrator. 88 00:05:14,860 --> 00:05:21,068 And you have, let's say, three entities 89 00:05:21,068 --> 00:05:26,112 on here that have three request 90 00:05:26,112 --> 00:05:32,601 signals: REQ1, REQ2, and REQ3. 91 00:05:32,601 --> 00:05:35,835 This arbitrator can try to do something like a round 92 00:05:35,835 --> 00:05:39,920 robin scheme, or try to enforce some sort of fairness. 
93 00:05:39,920 --> 00:05:45,740 And what would happen is, at the beginning of a memory bus cycle, 94 00:05:45,740 --> 00:05:48,996 you'll actually have whoever needs to use the bus 95 00:05:48,996 --> 00:05:52,820 that cycle all, let's say, assert their wire. 96 00:05:52,820 --> 00:05:54,690 Assert their request wire. 97 00:05:54,690 --> 00:05:59,150 And then the arbitrator will take it all in, and take it all into consideration. 98 00:05:59,150 --> 00:06:01,140 It might have some state inside of here. 99 00:06:01,140 --> 00:06:04,967 And then it will tell only one of the entities on this 100 00:06:04,967 --> 00:06:06,850 bus with a grant signal. 101 00:06:14,780 --> 00:06:17,882 With three grant signals here, it will only assert 102 00:06:17,882 --> 00:06:20,390 one of these grant signals and make a decision 103 00:06:20,390 --> 00:06:23,294 and say, you know, processor one wins, or processor 104 00:06:23,294 --> 00:06:26,860 two wins, depending on which wire here gets asserted. 105 00:06:26,860 --> 00:06:29,790 So, multiple people can request but only one wins. 106 00:06:30,840 --> 00:06:33,738 So the first thing you're going to want to do to try 107 00:06:33,738 --> 00:06:37,750 to use this bus is you actually try to arbitrate for the bus. 108 00:06:37,750 --> 00:06:38,860 And there's a set of wires for that. 109 00:06:40,380 --> 00:06:41,830 Okay, what happens next? 110 00:06:41,830 --> 00:06:49,650 Well, on the control wires, you're going to say what you want to achieve. 111 00:06:49,650 --> 00:06:57,590 So you might have a request that says, I want to do a read on the bus. 112 00:06:57,590 --> 00:07:00,460 Now we haven't yet said where we want to do a read of. 113 00:07:00,460 --> 00:07:03,570 Because, if you look at this multi-drop bus, we have wires for that. 114 00:07:03,570 --> 00:07:04,980 We have an address bus. 
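Stepping back to the request and grant arbitrator described a moment ago, here is a minimal Python sketch of one way a round robin policy could be modeled. The class name `RoundRobinArbiter`, the `grant` method, and the choice to start with agent 0 at top priority are all assumptions made for illustration; the lecture does not specify how the arbitrator's internal state is kept.

```python
class RoundRobinArbiter:
    """Hypothetical request/grant arbiter for a small shared bus."""

    def __init__(self, n_agents):
        self.n_agents = n_agents
        # Assumed starting state: pretend the last agent was just served,
        # so agent 0 has the highest priority on the first cycle.
        self.last_granted = n_agents - 1

    def grant(self, requests):
        """Take one bool per agent (its request wire) and return the index
        of the single agent whose grant wire is asserted, or None."""
        for offset in range(1, self.n_agents + 1):
            candidate = (self.last_granted + offset) % self.n_agents
            if requests[candidate]:
                self.last_granted = candidate
                return candidate
        return None  # nobody asked for the bus this cycle


arbiter = RoundRobinArbiter(3)
# Agents 0 and 1 both request on every cycle; agent 2 stays idle.
grants = [arbiter.grant([True, True, False]) for _ in range(4)]
print(grants)
```

Because the search always starts just past the most recent winner, two agents that keep requesting take turns, which is the fairness a fixed-priority pull-down scheme cannot give.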
115 00:07:04,980 --> 00:07:05,640 So you first 116 00:07:05,640 --> 00:07:07,356 will say, I want to do a read and I 117 00:07:07,356 --> 00:07:09,920 want to do a read of address five, we'll say. 118 00:07:12,350 --> 00:07:18,680 Then in a traditional multi-drop bus you'll actually wait. 119 00:07:18,680 --> 00:07:21,591 So you'll assert the arbitration, the control, 120 00:07:21,591 --> 00:07:24,950 and the address, and you'll be waiting around. 121 00:07:24,950 --> 00:07:26,860 You'll say, I want to read address five. 122 00:07:26,860 --> 00:07:32,196 And then main memory will say, I have address five, and it will assert 123 00:07:32,196 --> 00:07:38,026 onto the data bus here, we'll say, the data for address five. 124 00:07:38,026 --> 00:07:41,200 And then processor 1 can read in that data. 125 00:07:42,940 --> 00:07:46,060 Now, what are some downsides of doing something like this? 126 00:07:46,060 --> 00:07:51,845 Well, as you go to build this multi-drop bus, you're basically going 127 00:07:51,845 --> 00:07:57,680 to reserve the bus the entire time that you are doing one memory transaction. 128 00:07:58,770 --> 00:08:04,164 And you need to hold the bus the whole time while you do the arbitration, 129 00:08:04,164 --> 00:08:07,480 control, and address, and the data comes back. 130 00:08:07,480 --> 00:08:09,610 And it could be a long time, because main 131 00:08:09,610 --> 00:08:13,018 memory can take a long time to return data. 132 00:08:13,018 --> 00:08:16,356 And this is a problem, so what did people think about doing? 133 00:08:16,356 --> 00:08:21,460 Well, they applied ideas from processor design and 134 00:08:21,460 --> 00:08:26,600 said, maybe we can try to pipeline the bus. 135 00:08:26,600 --> 00:08:29,420 So note, and let's flip back and forth here for a second. 136 00:08:29,420 --> 00:08:33,480 The title of the slide changes, but the content doesn't. 
137 00:08:33,480 --> 00:08:37,010 So this pipelined bus actually looks the same. 138 00:08:37,010 --> 00:08:41,714 So it has the same wires, but now instead of arbitrating and winning 139 00:08:41,714 --> 00:08:47,100 the entire bus, and holding the entire bus for a long period of time, 140 00:08:47,100 --> 00:08:51,918 we subdivide all these different categories and actually pipeline 141 00:08:51,918 --> 00:08:56,300 the access to them and use them only when they're needed. 142 00:08:56,300 --> 00:08:59,090 So, we can actually take a look at this as a picture here. 143 00:08:59,090 --> 00:09:02,402 And we can see, for instance, on a pipelined bus, 144 00:09:02,402 --> 00:09:07,106 first, let's say processor 1 is trying to do something. 145 00:09:07,106 --> 00:09:11,360 It'll assert processor 1 onto the arbitration lines and let's say it wins. 146 00:09:12,680 --> 00:09:17,630 And then in the next cycle, it'll assert that it wants to do a load. 147 00:09:18,810 --> 00:09:21,156 And then in the next cycle, 148 00:09:21,156 --> 00:09:26,296 it asserts the address. And finally, let's say the 149 00:09:26,296 --> 00:09:31,392 main memory returns the data quickly here and returns the data 150 00:09:31,392 --> 00:09:36,910 over here very quickly. Now, why is this good? 151 00:09:36,910 --> 00:09:39,020 Well, because it's pipelined. 152 00:09:39,020 --> 00:09:41,650 The next cycle here someone else can be arbitrating for the bus. 153 00:09:43,250 --> 00:09:46,250 The cycle after the load or the control signals are used 154 00:09:46,250 --> 00:09:49,500 here, someone else can be putting a different transaction on. 155 00:09:50,720 --> 00:09:54,014 Likewise in the address here, the next cycle someone can be 156 00:09:54,014 --> 00:09:57,290 putting something here and data can be coming the next cycle. 
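To make the overlap concrete, here is a small Python sketch of which transaction occupies each group of wires on each cycle. It assumes, purely for illustration, that a new transaction wins arbitration every cycle and that main memory returns data one cycle after the address; the names `STAGES`, `pipelined_schedule`, and the two cycle-count helpers are hypothetical.

```python
# The four groups of wires, in the order one transaction uses them.
STAGES = ["arbitration", "control", "address", "data"]


def pipelined_schedule(n_transactions):
    """Return {cycle: {stage: transaction_id}} for a fully pipelined bus,
    assuming one new transaction starts per cycle and no stalls."""
    schedule = {}
    for txn in range(n_transactions):
        for stage_index, stage in enumerate(STAGES):
            cycle = txn + stage_index
            schedule.setdefault(cycle, {})[stage] = txn
    return schedule


def unpipelined_cycles(n_transactions):
    # Each transaction holds every wire for all four steps.
    return n_transactions * len(STAGES)


def pipelined_cycles(n_transactions):
    # The first transaction fills the pipeline, then one finishes per cycle.
    return n_transactions + len(STAGES) - 1 if n_transactions else 0


schedule = pipelined_schedule(3)
for cycle in sorted(schedule):
    print(cycle, schedule[cycle])
print(unpipelined_cycles(3), pipelined_cycles(3))
```

In cycle 2 of this sketch, transaction 0 is on the address wires while transaction 1 is on the control wires and transaction 2 is arbitrating, which is the picture the slide is showing.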
157 00:09:57,290 --> 00:10:01,050 So you basically don't have to hold all the wires for the 158 00:10:01,050 --> 00:10:03,690 whole time of one memory transaction, 159 00:10:03,690 --> 00:10:07,170 but instead you can pipeline those transactions. 160 00:10:07,170 --> 00:10:11,310 And this is just to give you an idea of how the physical implementation of 161 00:10:11,310 --> 00:10:15,190 the wiring of small symmetric multiprocessors works. 162 00:10:15,190 --> 00:10:16,650 In reality, they're a little more complex. 163 00:10:16,650 --> 00:10:19,082 So something we're not going to talk about 164 00:10:19,082 --> 00:10:19,658 [INAUDIBLE] 165 00:10:19,658 --> 00:10:20,234 [INAUDIBLE] 166 00:10:20,234 --> 00:10:23,882 what you'll see people do when they go build these pipelined buses, 167 00:10:23,882 --> 00:10:28,190 is they'll actually do what is called a split-phase transaction bus, 168 00:10:28,190 --> 00:10:33,030 where you avoid, let's say, waiting for the data to come back. 169 00:10:33,030 --> 00:10:36,358 For instance, in this example here it's very possible that the 170 00:10:36,358 --> 00:10:39,700 data from main memory might take a couple cycles to come back. 171 00:10:40,780 --> 00:10:45,330 Instead of just waiting there, which would slow down your pipeline, 172 00:10:45,330 --> 00:10:48,200 if you have to stall for instance, 173 00:10:48,200 --> 00:10:52,271 you can issue a request and then some time in the 174 00:10:52,271 --> 00:10:57,310 future, the main memory might arbitrate for the bus again and then return the data. 175 00:10:57,310 --> 00:10:59,112 So that's why it's called a split-phase 176 00:10:59,112 --> 00:11:02,350 transaction, so there are multiple phases to one transaction. 177 00:11:02,350 --> 00:11:05,500 So the first phase might be requesting the data, where you 178 00:11:05,500 --> 00:11:08,500 might have to use all of the portions of the bus. 
179 00:11:08,500 --> 00:11:10,375 And then the response for 180 00:11:10,375 --> 00:11:15,775 the data will be the main memory arbitrating for the bus, saying that it's 181 00:11:15,775 --> 00:11:18,925 going to do a data response, and reasserting 182 00:11:18,925 --> 00:11:22,100 the address and then giving the data back. 183 00:11:22,100 --> 00:11:25,070 So you can see that that's a better way to use the bus, 184 00:11:25,070 --> 00:11:27,890 because you don't have to hold the bus for a long period of time. 185 00:11:31,930 --> 00:11:34,922 So, one of the challenges with this is that you still have 186 00:11:34,922 --> 00:11:38,650 everybody trying to scream on the bus at the same time. 187 00:11:38,650 --> 00:11:42,390 And if you were to take everyone in this room and try to scream all at 188 00:11:42,390 --> 00:11:44,362 the same time, we would not be able 189 00:11:44,362 --> 00:11:47,710 to understand what each other is saying. 190 00:11:47,710 --> 00:11:51,480 So, that's why we need arbitration here, if you will, to sort 191 00:11:51,480 --> 00:11:55,470 of pass around the token so only one person can speak at a time. 192 00:11:55,470 --> 00:11:57,160 But if you want to have multiple 193 00:11:57,160 --> 00:12:00,280 people speaking at a time, we're going to have to look at 194 00:12:00,280 --> 00:12:04,634 more complex systems and we're going to talk about that in two lectures.
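As a recap of the split-phase idea, here is a minimal Python sketch of the bus events for a few reads when main memory is slow. The function name `split_phase_trace`, the fixed `memory_latency`, the one-event-per-cycle bus, and the rule that a ready memory response wins arbitration over a new request are all assumptions for illustration, not details given in the lecture.

```python
from collections import deque


def split_phase_trace(read_addresses, memory_latency):
    """Return a list of (cycle, phase, address) bus events.

    Phase "request": a processor wins the bus and sends a read address.
    Phase "response": main memory wins the bus, re-asserts the address,
    and returns the data. The bus is free in between.
    """
    pending_requests = deque(read_addresses)
    in_flight = deque()  # (cycle when data is ready, address)
    trace = []
    cycle = 0
    while pending_requests or in_flight:
        if in_flight and in_flight[0][0] <= cycle:
            # Assumed policy: a ready response beats a new request.
            _, address = in_flight.popleft()
            trace.append((cycle, "response", address))
        elif pending_requests:
            address = pending_requests.popleft()
            in_flight.append((cycle + memory_latency, address))
            trace.append((cycle, "request", address))
        # Otherwise the bus sits idle this cycle while memory works.
        cycle += 1
    return trace


trace = split_phase_trace([5, 9], memory_latency=3)
for event in trace:
    print(event)
```

In this run the read of address 9 goes out at cycle 1, while the read of address 5 is still outstanding; on a bus that is held for the whole transaction, the second read could not start until the first data had come back.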