Okay, so now we start talking about the performance of our networks, and there are two main thoughts I want to get across here: there are two really good ways to measure our networks. Before, when we talked about performance, we were talking about parameters of the topology; now we're going to look at overall network performance.

First thing is bandwidth. Bandwidth is the rate of data that can be transmitted over a given network link: an amount of data divided by an amount of time. Okay, that sounds pretty reasonable. Latency is how long it takes to communicate, to send a complete message between a sender and a receiver, in seconds. So the unit on latency is seconds, and the unit on bandwidth is something like bits per second or bytes per second, the amount of data per second.

These two things are linked. If we take a look at something like bandwidth, it can actually affect our latency. And the reason for this is, if you increase the bandwidth, you're going to have to send fewer pieces of data for a long message, because you can send it in wider chunks, or faster chunks, or something like that. So it can actually help with latency. It can also help with latency because it can help reduce the congestion on your network.
Now, we haven't talked about congestion yet; we'll talk about it in a few more slides. But by having more bandwidth, you can effectively reduce the load on your network, and that will decrease the probability that you're actually going to have two different messages contending for the same link in the network.

Latency can also affect our bandwidth, which is interesting, or rather it can affect our delivered bandwidth. Changing the latency is not going to make our links wider or the clock speed of our links faster, but it can make the delivered bandwidth higher or lower. Now, how this can happen: let's say you have something like a round trip. You're trying to communicate from point A to point B, and back to point A. And this is pretty common: you want to send a message from one node to another node, it's going to do some work on it, and it's going to send back the reply. If you can't cover that latency, if the latency were to get longer, the sender will sit there and just stall more, and that will effectively decrease the bandwidth, the amount of data that can be sent.
Now, if you're good at hiding this latency by doing other work, that may not happen; you may not be limited by latency. But another good example is if you're worried about end-to-end flow control. A good example of this is in TCP/IP networks, like our Ethernet networks. There's actually round-trip flow control between the two endpoints, which rate-limits the bandwidth, and it's actually tied to the latency, because you need to have more traffic in flight to cover the round-trip latency. This starts to involve what's called the bandwidth-delay product, where you multiply your bandwidth by the delay, or the latency, of your network. If you increase the latency, the delivered bandwidth will effectively go down if you do not allow for more traffic in flight before you wait to hear a flow control response.

So you'll see this if you have, let's say, two points on the internet, and you put them farther apart, and you keep the same amount of in-flight data, or what's called the window, the same. The bandwidth is going to go down as you increase the latency. But if you were to increase the window, it would actually stay high, because of the bandwidth-delay product.
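The bandwidth-delay product can be sketched with a quick back-of-the-envelope calculation. This is a simplified model (one window per round trip, no slow start or loss), and the link numbers are made up for illustration:

```python
# Bandwidth-delay product: how much data must be "in flight" to keep a
# link busy for one full round trip. Numbers here are illustrative.

def bandwidth_delay_product(bandwidth_bps, rtt_s):
    """Bytes that must be unacknowledged at once to fill the pipe."""
    return bandwidth_bps * rtt_s / 8  # bits -> bytes

def delivered_bandwidth(window_bytes, rtt_s):
    """Simplification: with a fixed window, at most one window per RTT."""
    return window_bytes * 8 / rtt_s  # bytes -> bits per second

link_bps = 100e6   # 100 Mb/s link
rtt = 0.050        # 50 ms round trip

bdp = bandwidth_delay_product(link_bps, rtt)
print(f"window needed to fill the link: {bdp:.0f} bytes")  # 625000 bytes

# Keep the window fixed at 64 KiB and double the latency: delivered
# bandwidth is cut in half, even though the link itself didn't change.
for rtt_s in (0.050, 0.100):
    bw = delivered_bandwidth(64 * 1024, rtt_s)
    print(f"RTT {rtt_s * 1000:.0f} ms -> {bw / 1e6:.1f} Mb/s delivered")
```

This is exactly the two-points-on-the-internet scenario: same window, longer latency, lower delivered bandwidth.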
And the reason for that is you'd be waiting for ACKs to come back from the receive side.

Okay, so let's take a look at an example here to understand these different parameters. We have a four-node omega network here, with two-input, two-output routers. Each of these circles represents an input node, and these are the output nodes, and they basically wrap around; they're the same sort of thing. We have little slashes here, which represent serializers and deserializers. What this means is, you're transmitting some long piece of data, and it gets sent as smaller flits, if you will. So we're sending, let's say, a 32-bit word, and it gets serialized into four 8-bit chunks across the links, because the links in the network are only eight bits wide, we'll say.

And in this network we're going to have our latencies be non-unit. So let's say each link traversal here takes two cycles, L0 and L1, and our routers take three cycles, R0, R1, and R2. And to go from any point to any other point in this network, you have to go through two routers and one link. So we can draw a pipeline diagram for this.
For a given packet, we can see it split into four flits here: a head flit, two body flits, and a tail flit. We start at the source, and it takes three cycles to make a routing decision through here, two cycles across the link, three cycles across one of these routers here, and then we get to the destination.

And if we look at this in time, it's pipelined; we can have multiple of these flits go down the network at the same time, with each of the next flits one cycle delayed. And the reason we want to draw this is we want to look at what our latency is for sending this one packet, because it's a little bit hard to reason about: we effectively have a pipeline here, we're overlapping different things.

And we'll see that one of the things you'd think would be in there doesn't show up. First let's take a look at this: we have four cycles here at the beginning, which is just our serialization latency, the length of the packet divided by the bandwidth of the link. If you were to increase the bandwidth here, the serialization latency would go down. Then we have time in the router, which is our router pipeline latency, so that's three cycles here, and another three cycles in the second router.
And if we have more hops, this will go up. And then two cycles here for the channel latency, which we'll call Tc.

So you can see that the summation of all of these different latencies is our latency, but what is interesting to see is that there is no deserialization latency here. That's the one that's missing, and it's because we've overlapped it: because it's pipelined, we're counting it in the serialization latency. Questions about that so far?

Okay, so now let's take a look at our message latency and go into a little more detail here. If you look at our overall latency, which we'll denote as T, it's the latency for the head to get to the receiver, so that's all of this stuff here, plus the serialization latency. Now, T_head has our Tc and our Tr, and a number of hops, but it also has something here that is contention, which we haven't shown. In this number here, there was no contention; this is an unloaded network. There were not multiple nodes or multiple messages trying to use any one given link in this design. But it can happen. Let's say these two nodes here send at the same time and they both need to use this link. You're going to get contention, and that will increase our latency.
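The unloaded latency of the example above can be tallied in a short sketch, using the cycle counts from the slide (three cycles per router, two per link, four flits on 8-bit links):

```python
# Unloaded latency of the omega-network example: two 3-cycle routers,
# one 2-cycle link, and a 32-bit packet serialized onto 8-bit links.

t_r = 3          # cycles per router hop
t_c = 2          # cycles per link (channel) hop
h_r, h_c = 2, 1  # router hops and channel hops between any two nodes

packet_bits = 32
link_bits = 8
t_ser = packet_bits // link_bits  # serialization: one flit per cycle -> 4

# Head latency: time for the head flit to reach the receiver.
t_head = h_r * t_r + h_c * t_c
# Total latency: head latency plus serialization. Deserialization does
# not appear as a separate term -- it overlaps with the pipeline.
t_total = t_head + t_ser
print(t_head, t_total)  # 8 12
```

Note that widening the links (say, to 16 bits) would cut `t_ser` but leave `t_head` alone, which is why the decomposition is useful.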
But if we rule out contention for a little bit of time, we'll start to see the unloaded latency here. And we can decompose it into sub-components: we have the routing time times the number of router hops we need to go through, plus the channel latency times the number of channel links we need to hop across, plus the serialization latency. So T0 = Hr·Tr + Hc·Tc + Tser.

And the reason we decomposed this is it lets us reason about how to make networks faster. We can see that there are a couple of different ways to make our networks faster. The first thing we can do is make shorter routes. That'll decrease both of these uppercase H's here. The reason I have two different uppercase H's is, as you can see in this example, we basically went two router hops and one link hop. Usually they're connected, though: if you have to go farther, you need more links, and you need more router hops.

You can make the routers faster. So you can either increase the clock frequency of the routers, or you can make them wider if they take multiple cycles. Now, if they're already, sort of, as fast as you can go, it may be hard.
You might be able to increase the clock frequency somehow, but it could start to get hard at some point, if there are already wide channels and wide muxes and a fast clock rate.

Faster channels. If you go in between multiple chips, usually you're limited by the signal integrity of the communication links between the different chips, and this sometimes even happens on chip. So you have to think about that: going to a higher clock frequency could be problematic. But if you make a faster channel, your latency is going to go down.

And then, finally, there's our serialization cost here. And we bake into that either wider channels or shorter messages. Maybe you have a lot of overhead on each message; you have a really big header for the message. If you try to shrink that, that'll make your network go faster and reduce your latency, just by sending less data. But that may not always be possible. I'll give you an example of this. If you look at something like TCP on top of IP networks, in our internet-class networks, people have proposed a whole bunch of revisions to that where they try to squeeze out some bytes, or use encoding standards to reduce the amount of data in the headers.
Because the TCP header is pretty long, for instance. And you already see a good example of that: there's actually an options field in TCP headers which is typically not sent, thereby reducing the header size in the common case.

Okay, so now let's talk about the effects of congestion. What I drew here is a plot of our latency versus the amount of bandwidth that is achieved, or offered bandwidth. And this is for a given network. It's pretty common that as you increase the bandwidth that you're using on any given network, the latency of the network goes up, because you start to see more congestion in the network. So the probability that any two points are contended for will go up as you get closer to the maximum achievable bandwidth.

Now, there are some networks that people build where the graph does not look like this. For instance, if you have a star topology, you don't have any congestion, so you're going to get something that looks much more like the ideal plot here: a straight line and then another straight line. Because as you increase your load on the network, everyone can send to everyone else, so there's not going to be congestion in the network.
And I have a few lines here that show interesting things that sort of chip away at this. In a perfect world, you'd have your zero-load latency, so this is the latency of the unloaded network, and as you increase the bandwidth, it wouldn't change, if you had no congestion in the network. But that's not usually what you see on real-world networks.

A couple of things also increase the latency and decrease the bandwidth of a network. Usually you have some routing delay that gets introduced into the network, and that's going to push us away from higher bandwidth and lower latency; you want to be farther down in this plot, because that's lower latency. And also, if you have flow control in the network, so local flow control, that also looks like some form of congestion; it'll actually slow down your network in certain cases. But I just wanted to give you guys the idea here that for any real-world network, it usually looks something like this: as you get closer and closer to using the whole network, you're using all the bits available in the network.
The latency starts to shoot asymptotically through the roof.
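That hockey-stick shape of latency versus offered bandwidth can be illustrated with a toy queueing model. The 1/(1 - ρ) blow-up of a simple single-server queue is my illustration here, not a formula derived in the lecture, but it captures the asymptote:

```python
# Toy model of latency vs. offered load: as offered bandwidth
# approaches the network's capacity, queueing delay grows without
# bound: latency ~ t0 / (1 - rho), where rho = offered / capacity.

def latency(offered, capacity, t0):
    """Zero-load latency t0, inflated by congestion as load rises."""
    rho = offered / capacity
    if rho >= 1.0:
        return float("inf")  # past saturation the queues never drain
    return t0 / (1.0 - rho)

T0 = 12.0        # zero-load latency in cycles (from the earlier example)
capacity = 1.0   # normalized peak bandwidth

# Latency stays near T0 at light load, then shoots up near saturation.
for load in (0.1, 0.5, 0.9, 0.99):
    print(f"offered {load:.2f} -> {latency(load, capacity, T0):.1f} cycles")
```

A congestion-free star topology would instead behave like the `load < 1` branch with ρ effectively zero: a flat line at T0 until the links themselves run out of bits.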