Okay, so now we start talking about the performance of our networks, and there are two main thoughts I want to get across here: there are two really good ways to measure our networks. Before, when we talked about performance, we were talking about parameters of the topology; now we're going to look at overall network performance.

First thing is bandwidth. Bandwidth is the rate of data that can be transmitted over a given network link: an amount of data divided by an amount of time. Okay, that sounds pretty reasonable. Latency is how long it takes to communicate, to send a complete message between a sender and a receiver, in seconds. So the unit on latency is seconds, and the unit on bandwidth is something like bits per second or bytes per second, the amount of data per second.

These two things are linked. If we take a look at something like bandwidth, it can actually affect our latency. And the reason for this is, if you increase the bandwidth, you're going to have to send fewer pieces of data for a long message, because you can send it in wider chunks, or faster chunks, or something like that. So it can actually help with latency. It can also help with latency because it can help reduce the congestion on your network.
Now, we haven't talked about congestion yet; we'll talk about it in a few more slides. But by having more bandwidth, you can effectively reduce the load on your network, and that will decrease the probability that you're actually going to have two different messages contending for the same link in the network.

Latency can also affect our bandwidth, which is interesting, or rather it can affect our delivered bandwidth. Changing the latency is not going to make our links wider or the clock speed of our links faster, but it can make the delivered bandwidth higher or lower. Now, how this can happen: let's say you have something like a round trip. You're trying to communicate from point A to point B, and back to point A. And this is pretty common: you want to send a message from one node to another node, it's going to do some work on it, and it's going to send back the reply. If you can't cover that latency, if the latency were to get longer, the sender will sit there and just stall more, and that will effectively decrease the bandwidth, the amount of data that can be sent.
Now, if you're good at hiding this latency by doing other work, that may not happen; you may not be limited by latency. But another good example is if you're worried about end-to-end flow control. A good example of this is in TCP/IP networks, like our Ethernet networks. There's actually round-trip flow control between the two endpoints, which rate-limits the bandwidth, and it's actually tied to the latency, because you need to have more traffic in flight to cover the round-trip latency. This starts to involve what's called the bandwidth-delay product, where you multiply your bandwidth by the delay, or the latency, of your network. If you increase the latency, the delivered bandwidth will effectively go down if you do not allow for more traffic in flight before you wait to hear a flow control response.

So you'll see this if you have, let's say, two points on the internet, and you put them farther apart, and you keep the same amount of in-flight data, or what's called the window, the same. The bandwidth is going to go down as you increase the latency. But if you were to increase the window, it would actually stay high, because of the bandwidth-delay product.
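The bandwidth-delay product can be sketched with a quick back-of-the-envelope calculation. This is a simplified model (one window per round trip, no slow start or loss), and the link numbers are made up for illustration:

```python
# Bandwidth-delay product: how much data must be "in flight" to keep a
# link busy for one full round trip. Numbers here are illustrative.

def bandwidth_delay_product(bandwidth_bps, rtt_s):
    """Bytes that must be unacknowledged at once to fill the pipe."""
    return bandwidth_bps * rtt_s / 8  # bits -> bytes

def delivered_bandwidth(window_bytes, rtt_s):
    """Simplification: with a fixed window, at most one window per RTT."""
    return window_bytes * 8 / rtt_s  # bytes -> bits per second

link_bps = 100e6   # 100 Mb/s link
rtt = 0.050        # 50 ms round trip

bdp = bandwidth_delay_product(link_bps, rtt)
print(f"window needed to fill the link: {bdp:.0f} bytes")  # 625000 bytes

# Keep the window fixed at 64 KiB and double the latency: delivered
# bandwidth is cut in half, even though the link itself didn't change.
for rtt_s in (0.050, 0.100):
    bw = delivered_bandwidth(64 * 1024, rtt_s)
    print(f"RTT {rtt_s * 1000:.0f} ms -> {bw / 1e6:.1f} Mb/s delivered")
```

This is exactly the two-points-on-the-internet scenario: same window, longer latency, lower delivered bandwidth.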
And the reason for that is you'd be waiting for ACKs to come back from the receive side.

Okay, so let's take a look at an example here to understand these different parameters. We have a four-node omega network here, with two-input, two-output routers. Each of these circles represents an input node, and these are the output nodes, and they basically wrap around; they're the same sort of thing. We have little slashes here, which represent serializers and deserializers. What this means is, you're transmitting some long piece of data, and it gets sent as smaller flits, if you will. So we're sending, let's say, a 32-bit word, and it gets serialized into four 8-bit chunks across the links, because the links in the network are only eight bits wide, we'll say.

And in this network we're going to have our latencies be non-unit. So let's say each link traversal here takes two cycles, L0 and L1, and our routers take three cycles, R0, R1, and R2. And to go from any point to any other point in this network, you have to go through two routers and one link. So we can draw a pipeline diagram for this.
For a given packet, we can see it split into four flits here: a head flit, two body flits, and a tail flit. We start at the source, and it takes three cycles to make a routing decision through here, two cycles across the link, three cycles across one of these routers here, and then we get to the destination.

And if we look at this in time, it's pipelined; we can have multiple of these flits go down the network at the same time, with each of the next flits one cycle delayed. And the reason we want to draw this is we want to look at what our latency is for sending this one packet, because it's a little bit hard to reason about: we effectively have a pipeline here, we're overlapping different things.

And we'll see that one of the things you'd think would be in there doesn't show up. First let's take a look at this: we have four cycles here at the beginning, which is just our serialization latency, the length of the packet divided by the bandwidth of the link. If you were to increase the bandwidth here, the serialization latency would go down. Then we have time in the router, which is our router pipeline latency, so that's three cycles here, and another three cycles in the second router.
And if we have more hops, this will go up. And then two cycles here for the channel latency, which we'll call Tc.

So you can see that the summation of all of these different latencies is our latency, but what is interesting to see is that there is no deserialization latency here. That's the one that's missing, and it's because we've overlapped it: because it's pipelined, we're counting it in the serialization latency. Questions about that so far?

Okay, so now let's take a look at our message latency and go into a little more detail here. If you look at our overall latency, which we'll denote as T, it's the latency for the head to get to the receiver, so that's all of this stuff here, plus the serialization latency. Now, T_head has our Tc and our Tr, and a number of hops, but it also has something here that is contention, which we haven't shown. In this number here, there was no contention; this is an unloaded network. There were not multiple nodes or multiple messages trying to use any one given link in this design. But it can happen. Let's say these two nodes here send at the same time and they both need to use this link. You're going to get contention, and that will increase our latency.
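The unloaded latency of the example above can be tallied in a short sketch, using the cycle counts from the slide (three cycles per router, two per link, four flits on 8-bit links):

```python
# Unloaded latency of the omega-network example: two 3-cycle routers,
# one 2-cycle link, and a 32-bit packet serialized onto 8-bit links.

t_r = 3          # cycles per router hop
t_c = 2          # cycles per link (channel) hop
h_r, h_c = 2, 1  # router hops and channel hops between any two nodes

packet_bits = 32
link_bits = 8
t_ser = packet_bits // link_bits  # serialization: one flit per cycle -> 4

# Head latency: time for the head flit to reach the receiver.
t_head = h_r * t_r + h_c * t_c
# Total latency: head latency plus serialization. Deserialization does
# not appear as a separate term -- it overlaps with the pipeline.
t_total = t_head + t_ser
print(t_head, t_total)  # 8 12
```

Note that widening the links (say, to 16 bits) would cut `t_ser` but leave `t_head` alone, which is why the decomposition is useful.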
But if we rule out contention for a little bit of time, we'll start to see the unloaded latency here. And we can decompose it into sub-components: we have the routing time times the number of router hops we need to go through, plus the channel latency times the number of channel links we need to hop across, plus the serialization latency. So T0 = Hr·Tr + Hc·Tc + Tser.

And the reason we decomposed this is it lets us reason about how to make networks faster. We can see that there are a couple of different ways to make our networks faster. The first thing we can do is make shorter routes. That'll decrease both of these uppercase H's here. The reason I have two different uppercase H's is, as you can see in this example, we basically went two router hops and one link hop. Usually they're connected, though: if you have to go farther, you need more links, and you need more router hops.

You can make the routers faster. So you can either increase the clock frequency of the routers, or you can make them wider if they take multiple cycles. Now, if they're already, sort of, as fast as you can go, it may be hard.
You might be able to increase the clock frequency somehow, but it could start to get hard at some point, if there are already wide channels and wide muxes and a fast clock rate.

Faster channels. If you go in between multiple chips, usually you're limited by the signal integrity of the communication links between the different chips, and this sometimes even happens on chip. So you have to think about that: going to a higher clock frequency could be problematic. But if you make a faster channel, your latency is going to go down.

And then, finally, there's our serialization cost here. And we bake into that either wider channels or shorter messages. Maybe you have a lot of overhead on each message; you have a really big header for the message. If you try to shrink that, that'll make your network go faster and reduce your latency, just by sending less data. But that may not always be possible. I'll give you an example of this. If you look at something like TCP on top of IP networks, in our internet-class networks, people have proposed a whole bunch of revisions to that where they try to squeeze out some bytes, or use encoding standards to reduce the amount of data in the headers.
Because the TCP header is pretty long, for instance. And you already see a good example of that: there's actually an options field in TCP headers which is typically not sent, thereby reducing the header size in the common case.

Okay, so now let's talk about the effects of congestion. What I drew here is a plot of our latency versus the amount of bandwidth that is achieved, or offered bandwidth. And this is for a given network. It's pretty common that as you increase the bandwidth that you're using on any given network, the latency of the network goes up, because you start to see more congestion in the network. So the probability that any two points are contended for will go up as you get closer to the maximum achievable bandwidth.

Now, there are some networks that people build where the graph does not look like this. For instance, if you have a star topology, you don't have any congestion, so you're going to get something that looks much more like the ideal plot here: a straight line and then another straight line. Because as you increase your load on the network, everyone can send to everyone else, so there's not going to be congestion in the network.
And I have a few lines here that show interesting things that sort of chip away at this. In a perfect world, you'd have your zero-load latency, so this is the latency of the unloaded network, and as you increase the bandwidth, it wouldn't change, if you had no congestion in the network. But that's not usually what you see on real-world networks.

A couple of things also increase the latency and decrease the bandwidth of a network. Usually you have some routing delay that gets introduced into the network, and that's going to push us away from higher bandwidth and lower latency; you want to be farther down in this plot, because that's lower latency. And also, if you have flow control in the network, so local flow control, that also looks like some form of congestion; it'll actually slow down your network in certain cases. But I just wanted to give you guys the idea here that for any real-world network, it usually looks something like this: as you get closer and closer to using the whole network, you're using all the bits available in the network.
The latency starts to shoot asymptotically through the roof.
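That hockey-stick shape of latency versus offered bandwidth can be illustrated with a toy queueing model. The 1/(1 - ρ) blow-up of a simple single-server queue is my illustration here, not a formula derived in the lecture, but it captures the asymptote:

```python
# Toy model of latency vs. offered load: as offered bandwidth
# approaches the network's capacity, queueing delay grows without
# bound: latency ~ t0 / (1 - rho), where rho = offered / capacity.

def latency(offered, capacity, t0):
    """Zero-load latency t0, inflated by congestion as load rises."""
    rho = offered / capacity
    if rho >= 1.0:
        return float("inf")  # past saturation the queues never drain
    return t0 / (1.0 - rho)

T0 = 12.0        # zero-load latency in cycles (from the earlier example)
capacity = 1.0   # normalized peak bandwidth

# Latency stays near T0 at light load, then shoots up near saturation.
for load in (0.1, 0.5, 0.9, 0.99):
    print(f"offered {load:.2f} -> {latency(load, capacity, T0):.1f} cycles")
```

A congestion-free star topology would instead behave like the `load < 1` branch with ρ effectively zero: a flat line at T0 until the links themselves run out of bits.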