So today we're going to start off, and it is our final installment of ELE475. We have to cover all of the rest of computer architecture in this one lecture, so there's a lot to cover, a lot of things to discuss. But more seriously, today we are going to be finishing up what we were talking about with interconnection networks, mainly credit-based flow control and a little bit about deadlock, and that will complete our interconnection networks. And then we'll go on to more scalable cache-coherent systems, so cache-coherent systems that have more than, let's say, eight nodes. We'll look at how to scale up to thousands of nodes, and we'll touch on one coherence protocol that works for that, called directory-based cache coherence. So where we left off last time, we were talking about flow control between two separate nodes in an interconnection network. We talked about local link-based or hop-based flow control, which is what we spent the end of last class talking about. We also mentioned end-to-end flow control, and end-to-end flow control is important. A good example of this is something where you have a core which is trying to communicate with a memory controller.
And you don't want to overrun the buffer in the memory controller, because if you overrun the buffer in the memory controller, your memory transactions just drop on the floor. So it's possible that your network connection is link-level flow controlled or hop-based flow controlled, but you still need end-to-end flow control inside of your chip, or your set of chips in your system, to prevent you from overrunning some other buffer that's farther away. Now you could, for instance, back up into the network and have the local flow control back up all the way to the core. You may not want to do that for a variety of reasons. One: if you look at these memory protocols very carefully, you could end up with something that actually starts to look like a deadlock pretty quickly as you start to back up into the network and get priorities mixed. Also, more insidiously, backing up like this is probably not good for performance. You probably want to stem the flow of traffic as soon as you can, because if you start jamming more data in there, you're just going to increase the contention on your network, the latency on your network will shoot through the roof, and all of a sudden you're in a very poor operating regime.
So it's probably better just to preemptively back off and not overrun the buffers that are far away. So you have to worry about end-to-end flow control, and there are lots of different schemes for this. Probably one of the better ones is that you send some data, wait for acknowledgments to come back, and count your acknowledgments; this is effectively a form of credit-based flow control. We talked a little bit about different ways to do flow control at the link level. So just to recall, here we had one queue, another queue, and some link in the middle; this link may be pipelined. We sent data this way, and at some point the receiver says, "Oh, I can't take any more data," so it asserts a stall wire. But if you do this around your entire chip, where it's all combinational (where all these little blobs here are combinational logic), your critical path gets very long, so you can start to think about trying to put registers on this path. Unfortunately, when you do that, all of a sudden this FIFO and this register can't react in time if a stall signal comes back. So if a stall signal is asserted, the sender is going to send the data no matter what.
It takes a cycle for that stall to show up, so you end up with something where you need to queue this last piece of data into a buffer, because the stall is not seen until a cycle later. We call this skid buffering. And you can have similar sorts of things where, if you have, let's say, a flip-flop here but you don't feed into this register, you might need multiple entries of skid buffering. Now, if you have the wrong number of buffers here on the receiver in your skid buffering, what's going to happen is you actually end up dropping data. So if your protocol means, let's say, two buffers, and instead you put one buffer, and you assert the stall as data is trying to transmit across the link at that time, you're going to lose a piece of data, and that's not very desirable. So this brings us to the end of what we were talking about last time, which was credit-based flow control. In credit-based flow control, instead of having a stop signal, or an on/off flow control signal, or a stall signal coming back, you keep a counter at the sender side which keeps track of how many entries there are over here on the receiver side. And this can take into account that this register here doesn't get counted; it's the endpoint FIFO space that will back up and that the data can be stored into.
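To make the sizing argument concrete, here is a small cycle-by-cycle sketch in Python (the lecture describes hardware, so this model and all its names are invented for illustration): the receiver's stall signal reaches the sender one cycle late, so the receiver must reserve one skid-buffer entry per cycle of stall delay or an in-flight word overruns the FIFO.

```python
def run(capacity, skid_reserve, stall_delay=1, cycles=20):
    """Simulate a link whose stall signal reaches the sender
    `stall_delay` cycles late. The receiver asserts stall once only
    `skid_reserve` FIFO entries remain free. The receiver never
    drains, which is the worst case. Returns the number of words
    dropped. (Hypothetical model, not from the lecture.)"""
    fifo = []
    stall_pipe = [False] * stall_delay   # stall signal in flight to the sender
    dropped = 0
    for _ in range(cycles):
        # Receiver asserts stall based on occupancy at the start of the cycle.
        stall_pipe.append(len(fifo) >= capacity - skid_reserve)
        stall_seen = stall_pipe.pop(0)   # sender sees a stale copy
        if not stall_seen:
            # Sender transmits blindly; it cannot see the FIFO directly.
            if len(fifo) < capacity:
                fifo.append("word")
            else:
                dropped += 1             # buffer overrun: data lost
    return dropped

# One cycle of stall delay needs one reserved skid entry:
assert run(capacity=4, skid_reserve=1) == 0
# Reserving nothing means the in-flight word overruns the buffer:
assert run(capacity=4, skid_reserve=0) > 0
```

The same model shows that a two-cycle stall delay needs two reserved entries, which is the "multiple entries of skid buffering" case mentioned above.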
So when it starts out, you set the counter (if you want full bandwidth) to be the same number of entries as you have in the receiver, and then you just send data. Whenever you send a word, you decrement your counter. When the counter reaches zero, you stop sending, because you know that, across all of the round-trip latency of the data going out and the credits coming back, if you were not to get a credit back in time, you would need all those remaining entries to skid into. [COUGH] When a word gets read out of this buffer, or out of this FIFO, the receiver sends back a credit, and this increments your counter. And depending on how you implement this, you could have multiple flip-flops here and multiple flip-flops there; really, all this ends up doing is determining your credit loop and how big this counter needs to be. One other nice benefit of this credit-based flow control system is that you can actually size the credit counter differently than the number of actual entries. Now, why would you want to do this? Well, one reason is, you could actually build a network which has only, let's say, half the bandwidth, by reducing the number of entries over here and reducing the credit counter.
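The sender-side bookkeeping just described (initialize to the receiver's entry count, decrement on send, increment on returned credit) can be sketched like this; the class and method names are invented for illustration, not part of any real protocol:

```python
class CreditSender:
    """Sender side of credit-based flow control (illustrative sketch).
    The counter starts at the number of receiver FIFO entries, so every
    outstanding word has a reserved slot and the FIFO cannot overflow."""

    def __init__(self, receiver_entries):
        self.credits = receiver_entries

    def can_send(self):
        return self.credits > 0

    def send(self):
        assert self.credits > 0, "protocol error: sent with no credits"
        self.credits -= 1        # one receiver entry is now spoken for

    def credit_returned(self):
        self.credits += 1        # receiver drained an entry

# The sender can emit exactly `receiver_entries` words before it must
# wait for a credit to come back.
s = CreditSender(receiver_entries=3)
sent = 0
while s.can_send():
    s.send()
    sent += 1
assert sent == 3 and not s.can_send()
s.credit_returned()
assert s.can_send()
```

The invariant is that `credits` never exceeds the free space the sender is entitled to, which is why no stall wire is needed.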
Now the round-trip latency is longer than the number of credits that you can have outstanding, so what's going to happen is, you're going to send some data, stall early, wait for some credits to come back, and then start sending more data. So you can effectively offer less than the ideal bandwidth of the link, but you can do it with less buffer space on the receive side. And this is a lot better than the on/off-based flow control, where if you don't have the right number of buffers, you actually end up losing data, so it's an incorrect design; here, it's only a performance concern.
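The send-stall-resume pattern above can be quantified with a small throughput model (again an invented sketch, assuming the receiver drains every word immediately so each credit returns exactly one round-trip after its send): with fewer credits than the round-trip latency, the link settles into a duty cycle of `credits / round_trip`.

```python
def throughput(credits, round_trip, cycles=1000):
    """Fraction of cycles on which a word is sent, when each credit
    returns `round_trip` cycles after the corresponding send.
    Illustrative model; the receiver is assumed to drain instantly."""
    in_flight = [0] * round_trip          # credits returning, slot per future cycle
    sent = 0
    for t in range(cycles):
        credits += in_flight[t % round_trip]   # credit arrives from round_trip ago
        in_flight[t % round_trip] = 0
        if credits > 0:
            credits -= 1
            in_flight[t % round_trip] = 1      # its credit returns in round_trip cycles
            sent += 1
    return sent / cycles

# Enough credits to cover the round trip: full bandwidth.
assert throughput(credits=4, round_trip=4) == 1.0
# Half the credits: half the bandwidth, but half the receiver buffering.
assert throughput(credits=2, round_trip=4) == 0.5
```

This is the trade the lecture describes: undersizing the credit counter (and the receive buffers) degrades bandwidth gracefully instead of dropping data.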