1 00:00:03,081 --> 00:00:05,832 Okay. So, let's take a look at our datapath 2 00:00:05,832 --> 00:00:08,911 here and see where predicates fit in the datapath. 3 00:00:08,911 --> 00:00:14,476 And we're going to focus, actually, just on the conditional move predicate, 4 00:00:14,476 --> 00:00:19,474 or predication instruction, here. We're not going to look at full 5 00:00:19,474 --> 00:00:25,052 predication just yet on the datapath. But it follows a similar idea. 6 00:00:25,052 --> 00:00:29,038 Okay. So, what do we need to do? 7 00:00:29,038 --> 00:00:36,211 What do we add to our, sort of, boring MIPS-style five-stage pipeline to 8 00:00:36,211 --> 00:00:41,070 add this instruction? Hm. 9 00:00:42,052 --> 00:00:46,094 Okay. Instruction comes in, moves down the pipe. 10 00:00:46,094 --> 00:00:49,083 Oh, this is interesting. I know. 11 00:00:49,083 --> 00:00:56,027 This is a really cool trick. Let's just, if this condition is not 12 00:00:56,027 --> 00:01:01,075 true, let's just kill the write back to the register file. 13 00:01:02,025 --> 00:01:05,084 It's brilliant. We just suppress the write 14 00:01:05,084 --> 00:01:08,605 back. We don't have to actually change the datapath 15 00:01:08,605 --> 00:01:13,923 at all, we just put an AND gate in here. And this AND gate depends on, you know, 16 00:01:13,923 --> 00:01:16,087 this condition. Simple, it's easy. 17 00:01:16,087 --> 00:01:21,718 Maybe this is what we should do. Looks simple, just add an AND gate where 18 00:01:21,718 --> 00:01:30,343 the big X is, and life is done. Okay. 19 00:01:30,343 --> 00:01:34,098 Well, that looks good. Can we bypass this value? 20 00:01:38,051 --> 00:01:45,002 So, can we have an instruction that directly follows this move zero, or this 21 00:01:45,002 --> 00:01:51,071 conditional move zero, and reads rd? Well, where are we changing rd, or not 22 00:01:51,071 --> 00:01:54,074 changing rd? Where are we making that decision?
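The AND gate on the write enable described above can be sketched in a few lines of Python. This is only an illustrative model, not the lecture's actual datapath: the function name `write_enable` and the signal names `reg_write` and `is_movz` are assumptions introduced here.

```python
def write_enable(reg_write: bool, is_movz: bool, rt_value: int) -> bool:
    """Model of the AND gate placed on the register-file write enable.

    reg_write: the normal write-enable control signal (hypothetical name).
    is_movz:   True when the instruction is a conditional move on zero
               (hypothetical decode signal).
    rt_value:  the value read from register rt, i.e. the condition.
    """
    condition_true = (rt_value == 0)
    # For ordinary instructions the gate passes reg_write through unchanged.
    # For the conditional move, the write is suppressed unless rt == 0.
    return reg_write and (condition_true or not is_movz)


if __name__ == "__main__":
    print(write_enable(True, True, 0))   # conditional move, condition holds
    print(write_enable(True, True, 5))   # conditional move, write suppressed
    print(write_enable(True, False, 5))  # ordinary instruction, unaffected
```

Note that this gate only controls what reaches the register file at the end of the pipe; it says nothing about values taken off the bypass network earlier.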
23 00:01:54,074 --> 00:01:59,535 Well, in this pipeline, because we did it in the write back stage, it doesn't happen 24 00:01:59,535 --> 00:02:03,084 'till down here. Down here, in the write back stage, and this 25 00:02:03,084 --> 00:02:09,213 wire, the write back wire, runs all the way back into the write enable on the register 26 00:02:09,213 --> 00:02:10,439 file. Huh, okay. 27 00:02:10,439 --> 00:02:14,330 Well, that doesn't really help us a whole lot. 28 00:02:14,582 --> 00:02:20,897 Especially if we're trying to bypass, and we're trying to bypass out of here, our 29 00:02:20,897 --> 00:02:25,560 ALU, back around. Because at this point, we haven't figured 30 00:02:25,560 --> 00:02:31,182 out any way to suppress this. So, we're not able to 31 00:02:31,182 --> 00:02:34,193 actually suppress that. Hm. 32 00:02:34,193 --> 00:02:36,422 So, what do we think about this? 33 00:02:36,422 --> 00:02:39,295 So, how do we go about doing this? 34 00:02:39,295 --> 00:02:43,855 So, let's think about how to actually bypass out for this conditional 35 00:02:43,855 --> 00:02:47,879 move instruction. Cuz conditional move, it's just a simple 36 00:02:47,879 --> 00:02:51,900 comparison to zero, which we could do in one cycle. 37 00:02:51,900 --> 00:02:56,948 We don't really have to wait until the end of the pipe to do that. 38 00:02:56,948 --> 00:03:01,030 And we will [inaudible] to go bypass it into a back to back instruction. 39 00:03:01,030 --> 00:03:03,271 Okay. So, how do we do that? 40 00:03:03,271 --> 00:03:16,069 Well, bypassing doesn't work. What if we somehow pipe forward the 41 00:03:16,069 --> 00:03:22,470 original value and the new value? Okay. 42 00:03:22,470 --> 00:03:26,449 So, what do I mean by this? So, this instruction is very 43 00:03:26,449 --> 00:03:30,040 interesting. It is much more interesting than your 44 00:03:30,040 --> 00:03:34,058 standard, like, add instruction. So, why is it interesting?
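To make the bypass problem concrete, here is a small Python sketch comparing what a naive bypass out of the ALU would hand to a back-to-back instruction with what rd should actually hold. The function names `naive_bypass_value` and `architectural_rd` are hypothetical, and the model assumes the ALU simply passes rs through for the conditional move.

```python
def naive_bypass_value(rs_value: int) -> int:
    """Value on the ALU bypass path if the condition is only applied
    at write back: the ALU output (rs) is forwarded unconditionally."""
    return rs_value


def architectural_rd(old_rd: int, rs_value: int, rt_value: int) -> int:
    """Value rd should have after the conditional move:
    rs if rt == 0, otherwise rd keeps its old value."""
    return rs_value if rt_value == 0 else old_rd


if __name__ == "__main__":
    old_rd, rs_value = 3, 7
    for rt_value in (0, 1):
        forwarded = naive_bypass_value(rs_value)
        correct = architectural_rd(old_rd, rs_value, rt_value)
        # When rt != 0 the write is suppressed, so the consumer should
        # have seen old_rd, but the bypass delivered rs instead.
        print(rt_value, forwarded, correct, forwarded == correct)
```

In this sketch the two values agree when the condition holds and disagree when it does not, which is the case where suppressing only the write back is not enough.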
45 00:03:34,058 --> 00:03:38,025 Well, let's look at the semantics very closely here. 46 00:03:39,093 --> 00:03:53,721 Move zero is going to write rs to rd, or it's going to write rd to rd. 47 00:03:53,721 --> 00:03:56,379 You could say, why do I need to write rd to rd? 48 00:03:56,379 --> 00:04:03,858 Well, in the bypass path, when we provide this value around back to our bypass 49 00:04:03,858 --> 00:04:09,383 registers or forwarding logic here, our bypass muxes or forwarding logic, we 50 00:04:09,383 --> 00:04:16,739 need the old value of rd. So, the traditional, sort of, something 51 00:04:16,739 --> 00:04:21,619 sort of RISC pipeline here. We're only going to fetch our two sources, 52 00:04:21,619 --> 00:04:26,710 and we can only write one location. So, we're going to fetch rs and rt, and 53 00:04:26,710 --> 00:04:33,734 then we're going to write to rd. Now, all of a sudden, in this instruction, 54 00:04:33,734 --> 00:04:38,048 we need to read rs, okay? We need to read that cuz we need to 55 00:04:38,048 --> 00:04:44,284 overwrite rd with rs if the condition is true, and we need to read 56 00:04:44,284 --> 00:04:47,063 the condition, rt. But, aha. 57 00:04:48,080 --> 00:04:57,076 We may also need to read rd here. This is because when we get to this stage 58 00:04:57,076 --> 00:05:04,789 here and we're going to use this bypass path to forward the value of what rd is 59 00:05:04,789 --> 00:05:08,069 going to be in the future, we need the original rd. 60 00:05:10,000 --> 00:05:18,501 So, sort of to draw this a little bit more succinctly because this is pretty 61 00:05:18,501 --> 00:05:30,160 important. We have: if the register value of rt 62 00:05:30,160 --> 00:05:49,837 equals zero, we have R of rd gets rs. 63 00:05:49,837 --> 00:05:57,578 That's the easy one. We can count the registers here. 64 00:05:57,578 --> 00:06:01,076 One source, two sources, one destination. 65 00:06:01,076 --> 00:06:04,548 That's simple.
And what everyone 66 00:06:04,548 --> 00:06:12,420 always forgets is the else case here. And what does this else case say? 67 00:06:12,420 --> 00:06:23,025 Well, the else case is going to say, register rd gets register rd. 68 00:06:23,069 --> 00:06:27,027 And you might say, well, rd already had rd. 69 00:06:27,027 --> 00:06:33,025 That's true, but our bypassing, or our forwarding logic, didn't have that. 70 00:06:33,025 --> 00:06:39,040 So, we need to actually read this rd. So, that means we have to read one, two, 71 00:06:39,040 --> 00:06:42,098 three, and we need to write one location. 72 00:06:44,019 --> 00:06:47,000 Okay. So, that's going to cause us some problems 73 00:06:47,000 --> 00:06:52,004 over here because all of a sudden, we had a register file which had two read ports, 74 00:06:52,004 --> 00:06:56,789 and we need to now have three read ports. So, we need to add an extra read port on 75 00:06:56,789 --> 00:06:59,494 our register file, and this can be expensive. 76 00:06:59,494 --> 00:07:03,708 So, if we actually want to build predication, it's going to have some 77 00:07:03,708 --> 00:07:06,836 costs. If we want to build predication and 78 00:07:06,836 --> 00:07:11,822 actually bypass something like a predicated conditional move, we're going 79 00:07:11,822 --> 00:07:15,363 to have to add another read port to our register file. 80 00:07:15,363 --> 00:07:21,170 And that actually has some cost. And this is especially costly if you look 81 00:07:21,170 --> 00:07:25,640 at something like a VLIW. So, let's take, for example, a 82 00:07:25,640 --> 00:07:29,942 three-way VLIW, something like the Tilera processor. 83 00:07:29,942 --> 00:07:36,532 So, it's a three-wide VLIW. Each of those ways, 84 00:07:36,532 --> 00:07:42,853 or each of those pipelines, is going to read two, if you don't have conditional 85 00:07:42,853 --> 00:07:46,294 move, we'll say.
And it's going to write one value. 86 00:07:46,294 --> 00:07:50,110 So, it's going to have six read ports and three write ports. 87 00:07:50,110 --> 00:07:55,403 So, it's a ten port register file. No, excuse me, it's a nine port register 88 00:07:55,403 --> 00:07:59,784 file to begin with. And all of a sudden, we add something 89 00:07:59,784 --> 00:08:04,389 like conditional move here, and we need to add these extra read ports. 90 00:08:04,389 --> 00:08:08,742 We're going to go from a nine port register file to a twelve port register 91 00:08:08,742 --> 00:08:11,650 file. We're going to have, let's see, we're going 92 00:08:11,650 --> 00:08:14,863 to have three write ports and nine read ports. 93 00:08:14,863 --> 00:08:19,577 That's hard to do. It's, you know, it's hard to build these 94 00:08:19,577 --> 00:08:23,426 really heavily ported register files. Okay. 95 00:08:23,426 --> 00:08:29,116 So, to sum up here, the problems with full predication are that you 96 00:08:29,116 --> 00:08:35,466 need to add another port to the register file, and you need to bypass the predicates. 97 00:08:35,466 --> 00:08:40,424 So, what I mean by that is you're computing predicates, and you want to use 98 00:08:40,424 --> 00:08:45,012 them in the next instruction. So, if we go back to this instruction 99 00:08:45,012 --> 00:08:49,200 sequence here. We compute these predicates and then we 100 00:08:49,200 --> 00:08:51,939 use them very, very quickly after. 101 00:08:51,939 --> 00:08:55,668 We don't want to have to wait until the end of the pipeline for these predicates to be 102 00:08:55,668 --> 00:08:58,553 computed.
So, effectively it's going to make it 103 00:08:58,553 --> 00:09:02,926 so that we are going to have a predicate register file sitting 104 00:09:02,926 --> 00:09:06,746 somewhere here, and have bypassing around the predicate register file, 105 00:09:06,746 --> 00:09:12,082 forwarding of the predicates, to get the predicates there faster. 106 00:09:12,082 --> 00:09:16,027 Or, get the predicates to be used in the next instruction. 107 00:09:19,385 --> 00:09:23,588 And, you're going to have to add extra pipeline registers to pipe forward the old 108 00:09:23,588 --> 00:09:27,444 value cuz you might need to keep the old value in the bypass. 109 00:09:27,620 --> 00:09:31,982 And, in fact, actually a lot of times when people do these things, they actually 110 00:09:31,982 --> 00:09:36,701 always write the register file and just pipe forward both and at the end make the 111 00:09:36,701 --> 00:09:39,543 decision. Or along the way they make the 112 00:09:39,543 --> 00:09:43,428 decision to go into the bypass or not, but then, sort of, when the 113 00:09:43,428 --> 00:09:47,028 instruction finishes, that's going to make the decision. 114 00:09:47,028 --> 00:09:50,413 So, we're actually going to have to add more pipeline registers to pipe 115 00:09:50,413 --> 00:10:08,055 forward the old value that was in, in this case, rd.
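The two costs summed up above, reading the old rd so the correct value can be chosen early, and the extra register-file ports that read requires, can be sketched in Python. This is a minimal model under stated assumptions: `MovzOperands`, `movz_result`, and `register_file_ports` are hypothetical names introduced for illustration, and the port count simply multiplies per-way reads and writes by the issue width as in the three-wide VLIW example.

```python
from dataclasses import dataclass


@dataclass
class MovzOperands:
    """Values piped forward with a conditional move (illustrative only)."""
    rs_value: int      # new value, written if the condition holds
    rt_value: int      # condition register, compared against zero
    old_rd_value: int  # original rd, obtained through the third read port


def movz_result(ops: MovzOperands) -> int:
    """Select the value rd will hold: rs if rt == 0, else the old rd.

    Because the selection is made from values already in the pipeline,
    the result can be placed on the bypass path for the next instruction.
    """
    return ops.rs_value if ops.rt_value == 0 else ops.old_rd_value


def register_file_ports(ways: int, reads_per_way: int, writes_per_way: int = 1):
    """Return (read_ports, write_ports, total_ports) for a VLIW with
    `ways` pipelines, each reading and writing the given number of registers."""
    read_ports = ways * reads_per_way
    write_ports = ways * writes_per_way
    return read_ports, write_ports, read_ports + write_ports


if __name__ == "__main__":
    print(movz_result(MovzOperands(rs_value=7, rt_value=0, old_rd_value=3)))
    print(movz_result(MovzOperands(rs_value=7, rt_value=1, old_rd_value=3)))
    print(register_file_ports(ways=3, reads_per_way=2))  # without conditional move
    print(register_file_ports(ways=3, reads_per_way=3))  # with conditional move
```

With three ways, two reads per way gives the nine-port file mentioned in the lecture, and three reads per way gives the twelve-port file.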