So today we're going to start off, and it is our final installment of ELE475. We have to cover all of the rest of computer architecture in this one lecture, so there's a lot to cover, a lot of things to discuss. But more seriously, today we are going to be finishing up what we were talking about with interconnection networks, mainly credit-based flow control and a little bit about deadlock, and that will complete our interconnection networks. And then we'll go on to more scalable cache-coherent systems. So, cache-coherent systems that have more than, let's say, eight nodes. We'll look at how to scale up to thousands of nodes, and we'll touch on one coherence protocol that works for that, and that's called directory-based cache coherence. So, where we left off last time, we were talking about flow control between two separate nodes in an interconnection network. And we talked about sort of local, link-based or hop-based flow control, which is where we spent the end of last class. We also mentioned end-to-end flow control, and end-to-end flow control is important. A good example of this is something where you have a core which is trying to communicate with a memory controller. You don't want to overrun the buffer in the memory controller, because if you overrun the buffer in the memory controller, your memory transactions just drop on the floor. So it's possible that your network connection is link-level flow controlled, or hop-based flow controlled, but you still need end-to-end flow control inside of your chip, or your set of chips in your system, to prevent you from overrunning some other buffer that's farther away. Now you could, for instance, back up into the network and have the local flow control back up all the way to the core. You may not want to do that for a variety of reasons. One:
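To make the end-to-end idea concrete, here's a rough Python sketch of a core that limits its outstanding requests to a memory controller instead of relying on the network to back up. All of the names, the buffer depth, and the request strings are made up for illustration; real designs would do this in hardware.

```python
# Hypothetical sketch of end-to-end flow control between a core and a
# memory controller: the core tracks outstanding requests against the
# controller's known buffer depth, so it never overruns that far-away
# buffer even if every link in between is itself flow controlled.

class MemoryController:
    def __init__(self, buffer_depth):
        self.buffer_depth = buffer_depth
        self.buffer = []

    def accept(self, req):
        # Without end-to-end flow control, an overrun here would mean
        # the transaction is silently dropped on the floor.
        assert len(self.buffer) < self.buffer_depth, "buffer overrun"
        self.buffer.append(req)

    def retire_one(self):
        # Finishing a request frees a slot; the completion acts as an
        # acknowledgment that returns a credit to the core.
        return self.buffer.pop(0)

class Core:
    def __init__(self, controller):
        self.controller = controller
        # Credits start at the controller's buffer depth.
        self.credits = controller.buffer_depth

    def try_send(self, req):
        if self.credits == 0:
            return False   # stall at the source rather than back up the network
        self.credits -= 1
        self.controller.accept(req)
        return True

    def on_ack(self):
        self.credits += 1

mc = MemoryController(buffer_depth=2)
core = Core(mc)
assert core.try_send("load A")
assert core.try_send("load B")
assert not core.try_send("load C")   # out of credits: stall, no overrun
mc.retire_one(); core.on_ack()
assert core.try_send("load C")       # credit came back, safe to send again
```

The point of the sketch is only that the sender, not the network, decides when to stop, which is exactly the "preemptively back off" behavior discussed next.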
If you look at these memory protocols very carefully, you could end up with something that actually starts to look like a deadlock pretty quickly as you start to back up into the network and get priorities mixed. Also, more insidiously, as you back up, this is probably not good for performance. You probably want to stem the flow of traffic as soon as you can, because if you start jamming more data in there, you're just going to increase the contention on your network, and the latency will shoot through the roof, and all of a sudden you're in a very poor operating regime. So it's probably better just to preemptively back off and not overrun the buffers that are far away. So you have to worry about end-to-end flow control, and there are lots of different schemes for this. Probably one of the better ones is that you send some data and you wait for acknowledgments to come back, and you count your acknowledgments; this is effectively a credit-based flow control. We talked a little bit about different ways to flow control at the link level. So just to recall, here we had one queue, another queue, and some link in the middle. This link may be pipelined. And we sent data this way, and at some point the receiver says, oh, I can't take any more data, so it asserts a stall wire. But if you do this around your entire chip, where it's all combinational, where all these little blobs here are combinational logic, your critical path gets very long. So you can start to think about trying to put registers on this path. Unfortunately, when you do that, all of a sudden this FIFO and this register can't react in time if a stall signal comes back. So if a stall signal is asserted, the sender is going to send the data no matter what; it takes a cycle for the stall to show up. So you end up with something where you need to queue this last piece of data into a buffer, because the stall is not seen until a cycle later.
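A tiny cycle-by-cycle model makes that one-cycle stall delay visible. This is a toy sketch, not any real design: the receiver raises stall when its main FIFO is full, the sender only sees that stall a cycle later, and the word already in flight either lands in a skid entry or gets dropped. The FIFO depth and cycle count are arbitrary.

```python
# Toy model of why a registered stall needs skid buffering: the sender
# sees the receiver's stall one cycle late, so one extra word is
# already in flight when it finally stops sending.

def run(skid_entries, cycles=6, fifo_capacity=2):
    fifo = []                           # receiver FIFO plus skid space
    capacity = fifo_capacity + skid_entries
    stall_reg = False                   # stall as the sender sees it (one cycle late)
    sent = dropped = 0
    for _ in range(cycles):
        # Receiver raises stall when its main FIFO entries are full.
        stall_wire = len(fifo) >= fifo_capacity
        # Sender acts on *last* cycle's stall, so one word is still in flight.
        if not stall_reg:
            sent += 1
            if len(fifo) < capacity:
                fifo.append(sent)
            else:
                dropped += 1            # no skid entry to land in: data lost
        stall_reg = stall_wire
    return sent, dropped
```

With one skid entry, the in-flight word still has a slot and nothing is lost; with none, that word is dropped, which is the failure mode described next.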
And we call this skid buffering. And you can have similar sorts of things where, if you have, let's say, a flip-flop here but you don't feed into this register, you might need multiple entries of skid buffering. Now, if you have the wrong number of buffers here on the receiver in your skid buffering, what's going to happen is you actually end up dropping data. So if your protocol means, let's say, two buffers, and instead you put one buffer, and you assert the stall as data is trying to transmit across the link at that time, you're going to lose a piece of data, and that's not very desirable. So this brings us to the end of what we were talking about last time, which was credit-based flow control. In credit-based flow control, instead of having a stop signal, or an on/off flow control signal, or a stall signal coming back, you keep a counter at the sender side which keeps track of how many entries there are over here on the receiver side. And this can take into account, you know, this register here doesn't get counted; it's the endpoint FIFO space that can back up and that the data can be stored into. So when it starts out, you set the counter (if you want full bandwidth, to the same number of entries as you have in the receiver), and you just send data. Whenever you send a word, you decrement your counter. When the counter reaches zero, you stop sending, because you know that if the stall signal were to be asserted, or if you were not to get a credit back that instant, you would need all of those entries, covering the round-trip latency of the data going out and the credits coming back, to skid into. When a word gets read out of this FIFO here, you send back a credit, and this will increment your counter. And depending on how you implement this, you could have multiple flip-flops here and multiple flip-flops there.
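Here's a small simulation of that counter in action, again purely as a sketch: the data and credit wires are each modeled as a short pipeline, the sender decrements on send and increments when a credit arrives, and the receiver's FIFO can never be overrun no matter whether it drains or not. The depths and delays below are illustrative assumptions.

```python
from collections import deque

# Minimal sketch of link-level credit-based flow control, with the
# data and credit wires each pipelined by link_delay register stages.

def simulate(cycles=12, rx_entries=4, link_delay=2, drain=True):
    credits = rx_entries                   # counter starts at receiver depth
    data_pipe = deque([None] * link_delay)   # words in flight toward receiver
    credit_pipe = deque([0] * link_delay)    # credits in flight back
    rx_fifo = []
    max_occupancy = 0
    for _ in range(cycles):
        # Sender: transmit only while it holds credits.
        send = credits > 0
        if send:
            credits -= 1
        # Advance both pipelines one stage.
        arriving = data_pipe.popleft()
        data_pipe.append("word" if send else None)
        credits += credit_pipe.popleft()
        if arriving is not None:
            rx_fifo.append(arriving)
        # Receiver: reading a word out of the FIFO sends a credit back.
        credit_returned = 0
        if drain and rx_fifo:
            rx_fifo.pop(0)
            credit_returned = 1
        credit_pipe.append(credit_returned)
        max_occupancy = max(max_occupancy, len(rx_fifo))
    return max_occupancy

# Even if the receiver stops draining entirely, occupancy never
# exceeds rx_entries, because the counter accounts for every word in
# flight across the round trip.
assert simulate(drain=True) <= 4
assert simulate(drain=False) <= 4
```

Contrast this with the registered on/off stall: here correctness does not depend on getting a skid-buffer count exactly right, only on initializing the counter to the receiver's real depth.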
And really, all this ends up doing is determining your credit loop and how big this counter needs to be. One other nice benefit of this credit-based flow control system is that you can actually size the credit counter differently than the number of actual entries. Now, why would you want to do this? Well, one reason is you could actually build a network which has only, let's say, half the bandwidth, by reducing the number of entries over here and reducing the credit counter. Now the round-trip latency is longer than the number of credits that you can have outstanding, so what's going to happen is you're going to send some data, stall early, wait for some credits to come back, and then start sending more data. So you can effectively get less than the ideal bandwidth of the link, but you can do it with less buffer space on the receive side. And this is a lot better than the on/off-based flow control, where if you don't have the right number of buffers, you actually end up losing data, so it's an incorrect design. Here, it's only a performance concern.
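The trade-off above reduces to simple arithmetic: with C credits and a round trip of T cycles (data out plus credit back), the link can carry at most C words every T cycles. A quick sketch, with illustrative numbers that aren't from any real design:

```python
# Back-of-the-envelope version of the credit/bandwidth trade-off: a
# link with `credits` outstanding credits and a `round_trip_cycles`
# credit loop sustains at most credits/round_trip_cycles of its peak.

def effective_bandwidth(credits, round_trip_cycles):
    # Fraction of the link's peak bandwidth actually usable.
    return min(1.0, credits / round_trip_cycles)

# Full bandwidth needs at least one credit per round-trip cycle.
assert effective_bandwidth(8, 8) == 1.0
# Halving the receiver buffering (and the counter) halves throughput:
# send 4 words, stall early, wait for credits, send 4 more.
assert effective_bandwidth(4, 8) == 0.5
```

And as the lecture notes, undersizing the counter here only costs bandwidth; it never drops data the way an undersized skid buffer does.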