So today we're going to start off, and it is our final installment of ELE475. We have to cover all of the rest of computer architecture in this one lecture, so there's a lot to cover, a lot of things to discuss. But more seriously, today we are going to be finishing up what we were talking about with interconnection networks, mainly credit-based flow control and a little bit about deadlock, and that will complete our interconnection networks. And then we'll go on to more scalable cache-coherent systems, so cache-coherent systems that have more than, let's say, eight nodes. We'll look at how to scale up to thousands of nodes, and we'll touch on one coherence protocol that works for that, called directory-based cache coherence. So where we left off last time, we were talking about flow control between two separate nodes in an interconnection network. We talked about local link-based or hop-based flow control, which is what we spent the end of last class talking about. We also mentioned end-to-end flow control, and end-to-end flow control is important. A good example of this is something where you have a core which is trying to communicate with a memory controller.
And you don't want to overrun the buffer in the memory controller, because if you overrun the buffer in the memory controller, your memory transactions just drop on the floor. So it's possible that your network connection is link-level flow controlled or hop-based flow controlled, but you still need end-to-end flow control inside of your chip, or your set of chips in your system, to prevent you from overrunning some other buffer that's farther away. Now you could, for instance, back up into the network and have the local flow control back up all the way to the core. You may not want to do that for a variety of reasons. One: if you look at these memory protocols very carefully, you could end up with something that actually starts to look like a deadlock pretty quickly as you start to back up into the network and get priorities mixed. Also, more insidiously, backing up like this is probably not good for performance. You probably want to stem the flow of traffic as soon as you can, because if you start jamming more data in there, you're just going to increase the contention on your network, the latency on your network will shoot through the roof, and all of a sudden you're in a very poor operating regime.
So it's probably better just to preemptively back off and not overrun the buffers that are far away. So you have to worry about end-to-end flow control, and there are lots of different schemes for this. Probably one of the better ones is that you send some data, wait for acknowledgments to come back, and count your acknowledgments; this is effectively a form of credit-based flow control. We talked a little bit about different ways to do flow control at the link level. So just to recall, here we had one queue, another queue, and some link in the middle; this link may be pipelined. We sent data this way, and at some point the receiver says, "Oh, I can't take any more data," so it asserts a stall wire. But if you do this around your entire chip, where it's all combinational (where all these little blobs here are combinational logic), your critical path gets very long, so you can start to think about trying to put registers on this path. Unfortunately, when you do that, all of a sudden this FIFO and this register can't react in time if a stall signal comes back. So if a stall signal is asserted, the sender is going to send the data no matter what.
It takes a cycle for that stall to show up, so you end up with something where you need to queue this last piece of data into a buffer, because the stall is not seen until a cycle later. We call this skid buffering. And you can have similar sorts of things where, if you have, let's say, a flip-flop here but you don't feed into this register, you might need multiple entries of skid buffering. Now, if you have the wrong number of buffers here on the receiver in your skid buffering, what's going to happen is you actually end up dropping data. So if your protocol means, let's say, two buffers, and instead you put one buffer, and you assert the stall as data is trying to transmit across the link at that time, you're going to lose a piece of data, and that's not very desirable. So this brings us to the end of what we were talking about last time, which was credit-based flow control. In credit-based flow control, instead of having a stop signal, or an on/off flow control signal, or a stall signal coming back, you keep a counter at the sender side which keeps track of how many entries there are over here on the receiver side. And this can take into account that this register here doesn't get counted; it's the endpoint FIFO space that will back up and that the data can be stored into.
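To make the sizing argument concrete, here is a small cycle-by-cycle sketch in Python (the lecture describes hardware, so this model and all its names are invented for illustration): the receiver's stall signal reaches the sender one cycle late, so the receiver must reserve one skid-buffer entry per cycle of stall delay or an in-flight word overruns the FIFO.

```python
def run(capacity, skid_reserve, stall_delay=1, cycles=20):
    """Simulate a link whose stall signal reaches the sender
    `stall_delay` cycles late. The receiver asserts stall once only
    `skid_reserve` FIFO entries remain free. The receiver never
    drains, which is the worst case. Returns the number of words
    dropped. (Hypothetical model, not from the lecture.)"""
    fifo = []
    stall_pipe = [False] * stall_delay   # stall signal in flight to the sender
    dropped = 0
    for _ in range(cycles):
        # Receiver asserts stall based on occupancy at the start of the cycle.
        stall_pipe.append(len(fifo) >= capacity - skid_reserve)
        stall_seen = stall_pipe.pop(0)   # sender sees a stale copy
        if not stall_seen:
            # Sender transmits blindly; it cannot see the FIFO directly.
            if len(fifo) < capacity:
                fifo.append("word")
            else:
                dropped += 1             # buffer overrun: data lost
    return dropped

# One cycle of stall delay needs one reserved skid entry:
assert run(capacity=4, skid_reserve=1) == 0
# Reserving nothing means the in-flight word overruns the buffer:
assert run(capacity=4, skid_reserve=0) > 0
```

The same model shows that a two-cycle stall delay needs two reserved entries, which is the "multiple entries of skid buffering" case mentioned above.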
So when it starts out, you set the counter (if you want full bandwidth) to be the same number of entries as you have in the receiver, and then you just send data. Whenever you send a word, you decrement your counter. When the counter reaches zero, you stop sending, because you know that, across all of the round-trip latency of the data going out and the credits coming back, if you were not to get a credit back in time, you would need all those remaining entries to skid into. [COUGH] When a word gets read out of this buffer, or out of this FIFO, the receiver sends back a credit, and this increments your counter. And depending on how you implement this, you could have multiple flip-flops here and multiple flip-flops there; really, all this ends up doing is determining your credit loop and how big this counter needs to be. One other nice benefit of this credit-based flow control system is that you can actually size the credit counter differently than the number of actual entries. Now, why would you want to do this? Well, one reason is, you could actually build a network which has only, let's say, half the bandwidth, by reducing the number of entries over here and reducing the credit counter.
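The sender-side bookkeeping just described (initialize to the receiver's entry count, decrement on send, increment on returned credit) can be sketched like this; the class and method names are invented for illustration, not part of any real protocol:

```python
class CreditSender:
    """Sender side of credit-based flow control (illustrative sketch).
    The counter starts at the number of receiver FIFO entries, so every
    outstanding word has a reserved slot and the FIFO cannot overflow."""

    def __init__(self, receiver_entries):
        self.credits = receiver_entries

    def can_send(self):
        return self.credits > 0

    def send(self):
        assert self.credits > 0, "protocol error: sent with no credits"
        self.credits -= 1        # one receiver entry is now spoken for

    def credit_returned(self):
        self.credits += 1        # receiver drained an entry

# The sender can emit exactly `receiver_entries` words before it must
# wait for a credit to come back.
s = CreditSender(receiver_entries=3)
sent = 0
while s.can_send():
    s.send()
    sent += 1
assert sent == 3 and not s.can_send()
s.credit_returned()
assert s.can_send()
```

The invariant is that `credits` never exceeds the free space the sender is entitled to, which is why no stall wire is needed.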
Now the round-trip latency is longer than the number of credits that you can have outstanding, so what's going to happen is, you're going to send some data, stall early, wait for some credits to come back, and then start sending more data. So you can effectively offer less than the ideal bandwidth of the link, but you can do it with less buffer space on the receive side. And this is a lot better than the on/off-based flow control, where if you don't have the right number of buffers, you actually end up losing data, so it's an incorrect design; here, it's only a performance concern.
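The send-stall-resume pattern above can be quantified with a small throughput model (again an invented sketch, assuming the receiver drains every word immediately so each credit returns exactly one round-trip after its send): with fewer credits than the round-trip latency, the link settles into a duty cycle of `credits / round_trip`.

```python
def throughput(credits, round_trip, cycles=1000):
    """Fraction of cycles on which a word is sent, when each credit
    returns `round_trip` cycles after the corresponding send.
    Illustrative model; the receiver is assumed to drain instantly."""
    in_flight = [0] * round_trip          # credits returning, slot per future cycle
    sent = 0
    for t in range(cycles):
        credits += in_flight[t % round_trip]   # credit arrives from round_trip ago
        in_flight[t % round_trip] = 0
        if credits > 0:
            credits -= 1
            in_flight[t % round_trip] = 1      # its credit returns in round_trip cycles
            sent += 1
    return sent / cycles

# Enough credits to cover the round trip: full bandwidth.
assert throughput(credits=4, round_trip=4) == 1.0
# Half the credits: half the bandwidth, but half the receiver buffering.
assert throughput(credits=2, round_trip=4) == 0.5
```

This is the trade the lecture describes: undersizing the credit counter (and the receive buffers) degrades bandwidth gracefully instead of dropping data.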