1 00:00:03,240 --> 00:00:07,521 And I just wanted to give a quick note about interconnection networks. 2 00:00:07,521 --> 00:00:12,036 If you guys get interested in interconnection networks as we go along, 3 00:00:12,036 --> 00:00:16,024 I highly recommend this book. I have not assigned anything from this 4 00:00:16,024 --> 00:00:19,778 book for this class. But this is Bill Dally and Brian Towles' 5 00:00:19,778 --> 00:00:22,006 interconnection networks book. It's quite good. 6 00:00:22,006 --> 00:00:25,290 It's kind of the definitive guide on the subject matter. 7 00:00:25,290 --> 00:00:26,080 Okay. So, 8 00:00:26,080 --> 00:00:29,260 interconnection networks. 9 00:00:29,260 --> 00:00:36,038 We talked about buses and we talked about memory protocols on buses. 10 00:00:36,038 --> 00:00:45,060 That's only one way to share information. Now, people may argue that's an intuitive 11 00:00:45,060 --> 00:00:49,822 way to share information because we had the ability to do loads and stores from 12 00:00:49,822 --> 00:00:53,739 our processor. But there are other ways to share 13 00:00:53,739 --> 00:00:57,276 information. So, in today's lecture, we are going to 14 00:00:57,276 --> 00:01:02,319 talk about two main topics. One, how do you share information 15 00:01:02,319 --> 00:01:08,039 in a different way, which commingles the movement of data with a 16 00:01:08,039 --> 00:01:13,340 synchronization primitive. We are going to call that messaging, 17 00:01:13,340 --> 00:01:20,755 [COUGH] which is in contrast to shared memory, or communicating via memory addresses. 18 00:01:20,755 --> 00:01:25,974 We're also going to talk about different ways to connect processors together, 19 00:01:25,974 --> 00:01:30,786 which can either have better performance or better scalability, i.e., 20 00:01:30,786 --> 00:01:37,565 have more nodes in the system.
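The contrast drawn above between messaging and communicating via memory addresses can be illustrated with a small sketch. This is not code from the lecture: it uses Python threads as stand-ins for cores, and the names `run_messaging`, `run_shared_memory`, `channel`, `shared`, and `ready` are all hypothetical. The point is only that a message receive delivers the data and the synchronization in one step, while shared memory needs a separate synchronization primitive next to the store and the load.

```python
import queue
import threading


def run_messaging():
    """Messaging: receiving the data is also the synchronization point."""
    channel = queue.Queue()  # hypothetical stand-in for a network channel
    received = []

    def consumer():
        # get() blocks until a message arrives, so no separate flag is needed.
        received.append(channel.get())

    t = threading.Thread(target=consumer)
    t.start()
    channel.put(42)  # send: moves the data and wakes the receiver
    t.join()
    return received[0]


def run_shared_memory():
    """Shared memory: the data write and the synchronization are separate."""
    shared = {"value": None}   # hypothetical shared memory location
    ready = threading.Event()  # explicit synchronization primitive
    received = []

    def consumer():
        ready.wait()                      # synchronize first...
        received.append(shared["value"])  # ...then load the data

    t = threading.Thread(target=consumer)
    t.start()
    shared["value"] = 42  # store the data
    ready.set()           # separately signal that the data is valid
    t.join()
    return received[0]
```

Both functions deliver the same value; the difference is whether the synchronization travels with the data or has to be added alongside it.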
And, okay, so let's compare and 21 00:01:37,565 --> 00:01:44,767 contrast buses with other forms of network and see why we might want to change. 22 00:01:44,767 --> 00:01:48,736 So, this is going back to what we had just talked about. 23 00:01:48,736 --> 00:01:53,697 Let's say you have two cores on a bus. And let's forget about how you want to 24 00:01:53,697 --> 00:01:57,137 communicate. It could either be via shared memory or 25 00:01:57,137 --> 00:02:01,437 it could be via messaging or it could be via some other protocol, 26 00:02:01,437 --> 00:02:04,480 Ethernet, whatever you want to put here, 27 00:02:04,480 --> 00:02:09,208 which is basically a form of messaging. We have one core and it wants to 28 00:02:09,208 --> 00:02:13,229 communicate with another core. Note, I don't draw any caches here 29 00:02:13,229 --> 00:02:16,301 because there may or may not be caches. 30 00:02:16,301 --> 00:02:18,882 It's kind of immaterial here. 31 00:02:18,882 --> 00:02:23,676 If one person wants to talk to another person, they can just yell at the other 32 00:02:23,676 --> 00:02:26,502 person. It's two people, so it's pretty 33 00:02:26,502 --> 00:02:29,145 easy to do. Now, there are some challenges: 34 00:02:29,145 --> 00:02:33,692 we can't both talk at the same time. We might not be able to understand each 35 00:02:33,692 --> 00:02:36,457 other. So, there's some arbitration that needs 36 00:02:36,457 --> 00:02:39,223 to happen. But, in general, that arbitration is 37 00:02:39,223 --> 00:02:42,234 pretty simple. There are only two cores or entities on this 38 00:02:42,234 --> 00:02:43,656 bus. Okay. 39 00:02:43,656 --> 00:02:49,006 Now we go to more cores. So, we have four people in a room trying 40 00:02:49,006 --> 00:02:53,676 to shout to each other. Well, or four people on a bus trying to 41 00:02:53,676 --> 00:02:57,254 shout to each other.
And as we just talked about, the 42 00:02:57,254 --> 00:03:01,872 bandwidth can be a challenge here. The arbitration for the bus can be a 43 00:03:01,872 --> 00:03:04,603 challenge. And because we're talking about 44 00:03:04,603 --> 00:03:09,232 interconnection networks, the wire delay and capacitance of the 45 00:03:09,232 --> 00:03:13,263 network can be worse, which can be a challenge here. 46 00:03:13,263 --> 00:03:18,478 So, if we've got one core, it needs to drive the shared multidrop bus. 47 00:03:18,478 --> 00:03:24,563 There's a lot of capacitance on this bus, much more so than in the two-core case, because all 48 00:03:24,563 --> 00:03:30,173 of a sudden, we've doubled the length of the bus, so the wires are 49 00:03:30,173 --> 00:03:33,966 longer and we've also put more loads on the bus. 50 00:03:33,966 --> 00:03:37,760 So, there's actually more capacitance on this bus. 51 00:03:40,040 --> 00:03:45,557 Okay. Now, we start to think about trying to build a bus that has a lot more cores. 52 00:03:45,557 --> 00:03:50,114 In this case, twelve. And from this core, if you go to shout to 53 00:03:50,114 --> 00:03:54,395 that core, there's no pipelining on this bus or anything. 54 00:03:54,395 --> 00:03:59,137 You go to shout and it has to propagate all the way down here and, you know, 55 00:03:59,137 --> 00:04:02,298 we're talking about high rates of communication. 56 00:04:02,298 --> 00:04:07,764 You actually have to wait for the time of flight from here to get down to 57 00:04:07,764 --> 00:04:10,680 there. And if we're using 58 00:04:10,680 --> 00:04:14,861 something, let's say, like a snoopy protocol or a broadcast protocol, because 59 00:04:14,861 --> 00:04:20,543 that's all we have here, we have to wait for a node here to 60 00:04:20,543 --> 00:04:23,863 communicate with every other node. So, we have to wait for the worst-case 61 00:04:23,863 --> 00:04:26,860 time for this node to communicate with that node, every clock cycle.
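The argument above, that a longer bus with more loads has more capacitance and that the worst-case end-to-end time bounds every clock cycle, can be made concrete with a rough first-order model. This is only a sketch under stated assumptions: the delay formula is a simple lumped-driver plus distributed-wire RC estimate, and every parameter value (`spacing_mm`, `r_per_mm`, `c_per_mm`, `c_load`, `r_driver`) is a made-up illustrative number, not a figure from the lecture.

```python
def bus_delay_seconds(num_cores, spacing_mm=1.0, r_per_mm=100.0,
                      c_per_mm=0.2e-12, c_load=0.05e-12, r_driver=200.0):
    """Rough worst-case end-to-end delay of an unpipelined multidrop bus.

    All default values are illustrative assumptions.
    """
    length_mm = spacing_mm * (num_cores - 1)  # wire grows with core count
    wire_cap = c_per_mm * length_mm           # capacitance of the wire itself
    load_cap = c_load * num_cores             # one receiver load per core
    # The driver has to charge the whole wire plus every attached load.
    driver_term = r_driver * (wire_cap + load_cap)
    # A distributed RC wire adds a term that grows with length squared.
    wire_term = 0.5 * (r_per_mm * length_mm) * wire_cap
    return driver_term + wire_term


def max_bus_clock_hz(num_cores):
    """Upper bound on the bus clock if every cycle waits for the worst case."""
    return 1.0 / bus_delay_seconds(num_cores)
```

Under these assumptions the delay grows faster than linearly with the number of cores, which is the trend the two-, four-, and twelve-core pictures are meant to convey.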
62 00:04:28,280 --> 00:04:31,717 Hm, okay. And as I said, there is capacitance, 63 00:04:31,717 --> 00:04:36,609 so it's not just a transmission 64 00:04:36,609 --> 00:04:40,823 line problem here. We also have to worry about [COUGH] the 65 00:04:40,823 --> 00:04:45,214 capacitance in trying to drive all of these different receivers. 66 00:04:45,214 --> 00:04:50,841 And it's a multidirectional bus, so we effectively have to have tri-states and 67 00:04:50,841 --> 00:04:56,396 the ability to drive or just receive. Well, all of a sudden, we have twelve 68 00:04:56,396 --> 00:04:59,782 people, and actually, we have twelve people in this room. 69 00:04:59,782 --> 00:05:04,215 So, let's all try to pick a number between one and ten and shout it real 70 00:05:04,215 --> 00:05:06,616 fast on the count of three. One, two, three, 71 00:05:06,616 --> 00:05:07,232 five. Okay. 72 00:05:07,232 --> 00:05:11,357 I shouted five; I don't know what everyone else said. 73 00:05:11,357 --> 00:05:14,866 So, [LAUGH] could everyone hear everyone else's? 74 00:05:14,866 --> 00:05:19,484 Does everyone know exactly what all the other ten people said at the same time, 75 00:05:19,484 --> 00:05:23,855 or twelve people said at the same time? You heard your nearest neighbor. 76 00:05:23,855 --> 00:05:27,180 Okay, but do you know what Yankey said? 77 00:05:28,860 --> 00:05:31,480 Yeah. Okay. So, 78 00:05:31,480 --> 00:05:35,568 this is the challenge. And if we need to guarantee that only one 79 00:05:35,568 --> 00:05:39,089 person can yell on the bus at a time, we need some arbitration.
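The requirement that only one sender drives the bus at a time can be sketched as a small arbiter. The lecture does not say which arbitration policy is used; round-robin is assumed here purely for illustration, and the function name `round_robin_grant` is hypothetical.

```python
def round_robin_grant(requests, last_granted):
    """Pick the single requester allowed to drive the bus this cycle.

    requests: list of booleans, one per core (True = wants the bus).
    last_granted: index of the core that was granted most recently.
    Returns the index of the granted core, or None if nobody is requesting.
    """
    n = len(requests)
    # Scan starting just after the last winner so that no core is starved.
    for offset in range(1, n + 1):
        candidate = (last_granted + offset) % n
        if requests[candidate]:
            return candidate
    return None
```

With twelve cores all requesting and core 4 granted last, the next grant goes to core 5. In hardware the scan is combinational logic rather than a loop, so its depth grows with the number of requesters.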
80 00:05:39,089 --> 00:05:43,291 But the arbitration logic is slower now because we have lots of people 81 00:05:43,291 --> 00:05:47,753 communicating, so we have to run a wire from this node down to this node and 82 00:05:47,753 --> 00:05:52,202 then we have to come back to the arbitration logic, say, over here, 83 00:05:52,202 --> 00:05:56,717 that needs to make some decision. And the decision is slower because there are 84 00:05:56,717 --> 00:06:01,886 more layers of logic, more combinational logic, we will say, to make the arbitration 85 00:06:01,886 --> 00:06:04,504 decision. Hm, okay. Now, if we go to a thousand 86 00:06:04,504 --> 00:06:07,840 processors or a thousand cores on a bus, 87 00:06:07,840 --> 00:06:11,418 you know, we couldn't even have twelve people in the room shout at the same 88 00:06:11,418 --> 00:06:13,357 time. You can't have a thousand people in the 89 00:06:13,357 --> 00:06:16,836 room shout at the same time, and physically, the distance of the wiring 90 00:06:16,836 --> 00:06:20,614 between these thousand different nodes is going to decrease the speed of the bus 91 00:06:20,614 --> 00:06:23,100 significantly. So, it's something to think about. 92 00:06:25,720 --> 00:06:31,484 So, this motivates us to take the same twelve cores and think about some 93 00:06:31,484 --> 00:06:37,312 other ways to connect them. Now, what I'm going to show here is 94 00:06:37,312 --> 00:06:45,900 what's known as a switched interconnect or sometimes known as a point-to-point 95 00:06:45,900 --> 00:06:49,218 link solution. Now, point-to-point does not mean that 96 00:06:49,218 --> 00:06:52,856 this core can communicate directly with every other core. 97 00:06:52,856 --> 00:06:57,451 That has a different name; we'll talk about that later today. 98 00:06:57,451 --> 00:07:01,000 Instead, point-to-point just means each link 99 00:07:01,000 --> 00:07:04,935 only has one sender and one receiver. And then
100 00:07:04,935 --> 00:07:10,240 you use switches along the way to make decisions and to route. 101 00:07:11,660 --> 00:07:15,676 So, if we look at this, we can actually have multiple nearest-neighbor 102 00:07:15,676 --> 00:07:19,051 communications happening. So, all of a sudden, by adding this 103 00:07:19,051 --> 00:07:23,649 switching, we can have connectivity between all the different nodes, but we 104 00:07:23,649 --> 00:07:27,080 can also have sort of subconversations happening. 105 00:07:27,080 --> 00:07:32,230 But this still allows for this processor here to go communicate with the one 106 00:07:32,230 --> 00:07:36,644 that's at the farthest extent. And we need to decide how to do that. 107 00:07:36,644 --> 00:07:41,928 Whether it communicates this way or this way or that way or some other 108 00:07:41,928 --> 00:07:45,139 squiggly line. [COUGH] We can also take the same 109 00:07:45,139 --> 00:07:48,015 point-to-point switched interconnection network. 110 00:07:48,015 --> 00:07:51,895 And like a bus, where we can increase the width of the bus, 111 00:07:51,895 --> 00:07:55,900 which does not help us with the occupancy on the bus, 112 00:07:55,900 --> 00:08:00,476 we can add more networks, or we can effectively add multiple concurrent 113 00:08:00,476 --> 00:08:05,311 switched interconnection networks, or we can increase the bandwidth on these 114 00:08:05,311 --> 00:08:09,307 buses. So, it's similar sorts of ideas there, and 115 00:08:09,307 --> 00:08:14,399 the same sorts of tricks you can do to increase bandwidth on buses, 116 00:08:14,399 --> 00:08:17,880 you can play on switched interconnection networks. 117 00:08:17,880 --> 00:08:22,392 Okay. So, this is just a very broad overview. And now, we're getting into 118 00:08:22,392 --> 00:08:24,520 some more specific ideas.
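The two properties described above, concurrent nearest-neighbor conversations and a routing decision for distant nodes, can be sketched in a few lines. This sketch assumes the twelve cores are laid out as a 4-by-3 mesh and that routes go along X first and then Y (dimension-ordered routing); neither the layout nor the routing policy is stated in the lecture, and the names `COLS`, `ROWS`, `xy_route`, and `can_run_concurrently` are hypothetical.

```python
COLS, ROWS = 4, 3  # assumed layout: 12 cores, numbered row by row


def coords(node):
    """Map a node number to its (x, y) position in the mesh."""
    return node % COLS, node // COLS


def xy_route(src, dst):
    """Return the directed links (from_node, to_node) used from src to dst.

    Travels along X until the column matches, then along Y. Each link has
    exactly one sender and one receiver, i.e. it is point-to-point.
    """
    x, y = coords(src)
    dst_x, dst_y = coords(dst)
    links = []
    node = src
    while x != dst_x:
        x += 1 if dst_x > x else -1
        next_node = y * COLS + x
        links.append((node, next_node))
        node = next_node
    while y != dst_y:
        y += 1 if dst_y > y else -1
        next_node = y * COLS + x
        links.append((node, next_node))
        node = next_node
    return links


def can_run_concurrently(route_a, route_b):
    """Two transfers can overlap in time if they share no directed link."""
    return not set(route_a) & set(route_b)
```

Under these assumptions, core 0 talking to core 1 and core 2 talking to core 3 use disjoint links and can proceed at the same time, while core 0 reaching core 11 in the far corner takes five hops through intermediate switches.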