Okay, so now we start talking about the performance of our networks, and there are two main thoughts I want to get across here: there are two really good ways to measure our networks. Before, when we talked about performance, we were talking about parameters of the topology; now we're going to look at overall network performance.

The first thing is bandwidth. Bandwidth is the rate of data that can be transmitted over a given network link: the amount of data divided by the amount of time. Okay, that sounds pretty reasonable. Latency is how long it takes to communicate and send a complete message between a sender and a receiver, in seconds. So the unit on latency is seconds, and the unit on bandwidth is something like bits per second or bytes per second, the amount of data per second.

These two things are linked. If we take a look at something like bandwidth, it can actually affect our latency. The reason for this is that if you increase the bandwidth, you're going to have to send fewer pieces of data for a long message, because you can send it in wider chunks or faster chunks, so it can actually help with latency. It can also help with latency because it can reduce congestion in your network. Now, we haven't talked about congestion yet; we'll talk about it in a few more slides. But by having more bandwidth, you can effectively reduce the load on your network, and that will decrease the probability that two different messages are contending for the same link in the network.

Latency can actually affect our bandwidth, too, which is interesting, or rather it can affect our delivered bandwidth. Changing the latency is not going to make our links wider or the clock speed faster, but it can make the delivered bandwidth higher or lower. How can this happen? Let's say you have something like a round trip: you're trying to communicate from point A to point B and back to point A. This is pretty common: you want to send a message from one node to another node, it's going to do some work on it, and it's going to send back a reply. If you can't cover that latency, then as the latency gets longer, the sender will sit there and stall more, and that will effectively decrease the bandwidth, the amount of data that can be sent. Now, if you are good at hiding this latency by doing other work, that may not happen; you may not be limited by latency.

Another good example is if you have end-to-end flow control. A good example of this is TCP/IP networks, like our Ethernet-based networks. There's actually a round-trip flow control between the two endpoints, which rate-limits the bandwidth, and it's tied to the latency, because you need to have more traffic in flight to cover the round-trip latency. This starts to be called the bandwidth-delay product, where you multiply your bandwidth by the delay, or the latency, of your network. If you increase the latency, the delivered bandwidth will effectively go down if you do not allow for more traffic in flight before you have to wait to hear a flow control response. You'll see this if you have, let's say, two points on the internet and you put them farther apart, and you have the same amount of in-flight data, or what's called the window, stays the same: the bandwidth is going to go down as you increase the latency. But if you were to increase the window, it would actually stay high, because of the bandwidth-delay product. And the reason for that is that otherwise you'd be waiting for ACKs to come back from the receive side.
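To make that concrete, here is a small back-of-the-envelope sketch. It is not from the lecture: the helper function and the specific numbers (a 1 Gb/s link, a 64 KiB window) are made up for illustration. It just shows that with a fixed flow-control window, delivered bandwidth is capped at roughly window divided by round-trip time.

```python
# Illustrative bandwidth-delay product arithmetic (hypothetical numbers).
# With end-to-end flow control, at most `window` bytes can be in flight,
# so delivered bandwidth is capped at roughly window / round-trip time.

def delivered_bandwidth(window_bytes, rtt_seconds, link_bandwidth):
    """Delivered bandwidth is the smaller of the link rate and window/RTT."""
    return min(link_bandwidth, window_bytes / rtt_seconds)

link = 1e9 / 8          # 1 Gb/s link, in bytes per second
window = 64 * 1024      # 64 KiB flow-control window

for rtt_ms in (1, 10, 100):
    bw = delivered_bandwidth(window, rtt_ms / 1000, link)
    print(f"RTT {rtt_ms:3d} ms: {bw / 1e6:8.2f} MB/s delivered")

# To keep the link full, the window must be at least the bandwidth-delay
# product: 1 Gb/s * 100 ms = 100 Mb, about 12.5 MB of data in flight.
print("BDP at 100 ms:", link * 0.1 / 1e6, "MB")
```

With the same window, a longer round trip directly cuts the delivered bandwidth; growing the window in proportion to the delay keeps the link full.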
Okay, so let's take a look at an example here to understand these different parameters. We have a four-node omega network with two-input, two-output routers. Each of the circles represents an input node, and these are the output nodes; they basically wrap around, they're the same nodes. We have little slashes here, which represent serializers and deserializers. What this means is that you're transmitting some long piece of data, and it gets sent as smaller flits. So we're sending, let's say, a 32-bit word, and it gets serialized into four 8-bit chunks across the links, because the links in the network are only eight bits wide, we'll say.

In this network we're going to have our latencies be non-unit. Let's say each link traversal takes two cycles, L0 and L1, and our routers take three cycles, R0, R1, and R2. To go from any point to any other point in this network, you have to go through two routers and one link.

So we can draw a pipeline diagram for this. For a given packet, we can see it split into four flits: a head flit, two body flits, and a tail flit. We start at the source, it takes three cycles to make a routing decision through the first router, two cycles across the link, three cycles across the second router, and then we get to the destination. And if we look at this in time, it's pipelined: we can have multiple of these things going down the network at the same time, with each subsequent flit one cycle delayed.

The reason we want to draw this is to look at what our latency is for sending this one packet, because it's a little bit hard to reason about: we effectively have a pipeline here, we're overlapping different things. And we'll see that one of the terms you'd think would show up doesn't show up. First, we have four cycles at the beginning, which is just our serialization latency, the length of the packet divided by the bandwidth of the link. If you were to increase the bandwidth here, the serialization latency would go down. Then we have time in the router, which is our router pipeline latency: three cycles here, and another three cycles in the second router, and if we have more hops, this will go up. And then two cycles for the channel latency, which we'll call t_c. So you can see that the summation of all of these different latencies is our latency, but what is interesting is that there is no deserialization latency here. That's the term that's missing, and it's because we've overlapped it: because it's pipelined, we're counting it in the serialization latency. Questions about that so far?
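As a rough check on that pipeline diagram, here is a small sketch that just adds up the pieces. The cycle counts are the ones from the example above (3-cycle routers, a 2-cycle link, four flits injected one per cycle); the variable names are mine.

```python
# Latency of one packet in the example network (values from the example:
# two 3-cycle routers, one 2-cycle link, 32-bit packet over 8-bit links).

t_r = 3          # cycles per router
t_c = 2          # cycles per link (channel)
hops_r = 2       # router hops
hops_c = 1       # channel hops
flits = 4        # 32-bit packet serialized into four 8-bit flits

serialization = flits                        # L / b: one flit injected per cycle
head_latency = hops_r * t_r + hops_c * t_c   # 3 + 2 + 3 = 8 cycles
total = serialization + head_latency         # 12 cycles

print("serialization:", serialization)   # 4
print("head latency :", head_latency)    # 8
print("total latency:", total)           # 12
# Note there is no separate deserialization term: because the flits are
# pipelined, draining them at the destination overlaps with the
# serialization already counted at the source.
```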
Okay, so now let's take a look at our message latency and go into a little more detail. If you look at our overall latency, which we'll denote T, it's the latency for the head to get to the receiver, plus the serialization latency.

Now, T_head has our t_c and our t_r and a number of hops, but it also has a contention term, which we haven't shown. In the number we just computed there was no contention: it was an unloaded network. There were not multiple nodes or multiple messages trying to use one outbound link, or any one given link, in this design. But that can happen. Let's say these two nodes send at the same time and they both need to use this link: you're going to get contention, and that will increase our latency.

If we rule out contention for a little while, we're left with the unloaded latency, and we can decompose it into sub-components: the routing time t_r times the number of router hops H_r, plus the channel latency t_c times the number of channel hops H_c, plus the serialization latency L/b. The reason we decompose it this way is that it lets us reason about how to make networks faster.

So we can see there are a couple of different ways to make our networks faster. The first thing we can do is make shorter routes; that will decrease both of the capital-H hop counts. The reason there are two different H's is that, as you saw in this example, we went two router hops but only one link hop. Usually, though, they're connected: if you have to go farther, you need more links and you need more router hops. We can make the routers faster: you can increase the clock frequency of the routers, or you can make them wider if they take multiple cycles. Now, if they're already about as fast as they can go, that may be hard. You might be able to increase the clock frequency somehow, but it starts to get difficult at some point if you already have wide channels, wide muxes, and a fast clock rate. We can make faster channels. If you go between multiple chips, you're usually limited by the signal integrity of the communication links between the different chips, and this sometimes even happens on chip, so you have to keep in mind that going to a higher clock frequency could be problematic. But if you make a faster channel, your latency is going to go down. And then, finally, there's the serialization cost, and baked into that term we have either wider channels or shorter messages. Maybe you have a lot of overhead on each message, a really big header; if you can shrink that, it will make your network go faster and reduce your latency, just by sending less data, though that may not always be possible. I'll give you an example of this: if you look at something like TCP on top of IP, in our internet-class networks, people have proposed a whole bunch of revisions where they try to squeeze out some bytes, or use encoding schemes to reduce the amount of data in the headers, because the TCP header is pretty long, for instance. And you already see a good example of that: there's an optional field in TCP headers which is typically not sent, thereby reducing the header size in the common case.
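To pull those knobs together, here is a small sketch, again not from the lecture, that plugs the example's numbers into the unloaded-latency decomposition T0 = H_r·t_r + H_c·t_c + L/b. The `unloaded_latency` helper is hypothetical, just to show how each optimization moves one term.

```python
# Unloaded (zero-contention) latency decomposition, illustrative numbers only.

def unloaded_latency(hops_r, t_r, hops_c, t_c, packet_bits, link_bits):
    """T0 = H_r*t_r + H_c*t_c + L/b (serialization), in cycles."""
    serialization = packet_bits // link_bits
    return hops_r * t_r + hops_c * t_c + serialization

base = unloaded_latency(hops_r=2, t_r=3, hops_c=1, t_c=2,
                        packet_bits=32, link_bits=8)      # 12 cycles
wider = unloaded_latency(hops_r=2, t_r=3, hops_c=1, t_c=2,
                         packet_bits=32, link_bits=16)    # 10 cycles: wider channels
shorter = unloaded_latency(hops_r=1, t_r=3, hops_c=0, t_c=2,
                           packet_bits=32, link_bits=8)   # 7 cycles: shorter route

print(base, wider, shorter)
```

Shorter routes shrink the two hop terms, faster routers and channels shrink t_r and t_c, and wider channels or shorter messages shrink the serialization term.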
Okay, so now let's talk about the effects of congestion. What I've drawn here is a plot of our latency versus the amount of bandwidth that is achieved, or offered bandwidth, for a given network.

It's pretty common that as you increase the bandwidth you're using on a given network, the latency of the network goes up, because you start to see more congestion in the network. The probability that any two points are contended for goes up as you get closer to the maximum achievable bandwidth.

Now, there are some networks people build where the graph does not look like this. For instance, if you have a star topology, you don't have any congestion, so you're going to get something that looks much more like the ideal plot here: a straight line and then another straight line. Because as you increase your load on the network, everyone can send to everyone else, so there's not going to be congestion in the network.

I have a few lines here that show the interesting effects that cut into this. In a perfect world you'd have your zero-load latency, the latency of the unloaded network, and as you increase the bandwidth it wouldn't change, if you had no congestion in the network. But that's not usually what you see in real-world networks. A couple of things also increase the latency and decrease the bandwidth of a network. Usually you have some routing delay that gets introduced into the network, and that's going to push us away from higher bandwidth and lower latency; you want to be farther down in this plot, because that's lower latency. And also, if you have flow control in the network, local flow control, that also looks like some form of congestion; it will actually slow down your network in certain cases. But I just wanted to give you the idea that for any real-world network it usually looks something like this: as you get closer and closer to using the whole network, using all the bits the network can provide, the latency starts to shoot asymptotically through the roof.
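One common way to see that asymptotic shape is a generic queueing-style approximation; this is not a formula from the lecture, just a toy model where latency scales with the zero-load latency divided by (1 - offered load).

```python
# Toy latency-vs-offered-bandwidth curve, purely illustrative.
# T(rho) ~ T0 / (1 - rho), where rho is offered load as a fraction of the
# network's saturation bandwidth; latency blows up as rho approaches 1.

T0 = 12.0  # zero-load latency in cycles (the example's unloaded latency)

for rho in (0.1, 0.5, 0.8, 0.9, 0.95, 0.99):
    latency = T0 / (1.0 - rho)
    print(f"offered load {rho:4.2f} of saturation -> ~{latency:7.1f} cycles")
```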