1 00:00:03,027 --> 00:00:08,032 Let's look at a baseline 2-way in-order superscalar. 2 00:00:08,055 --> 00:00:13,075 That's a mouthful to say. So, the difference from the pipelines you've 3 00:00:13,075 --> 00:00:15,290 seen before: we have two ALUs. 4 00:00:15,290 --> 00:00:19,829 It's a big difference. We can execute two integer ops at the same 5 00:00:19,829 --> 00:00:23,286 time in this pipe. As drawn here, we are going to actually 6 00:00:23,286 --> 00:00:27,552 differentiate these two pipes. We are going to call this pipeline A and 7 00:00:27,552 --> 00:00:31,919 this pipeline B, and pipeline A, let's say, can do integer ops and branches, and 8 00:00:31,919 --> 00:00:34,561 pipeline B can do integer ops and memory access. 9 00:00:34,561 --> 00:00:39,769 But you can't do memory up here and you can't do branches down there. 10 00:00:39,769 --> 00:00:44,508 There's nothing fundamental about that. We're just going to look at it as a sort of 11 00:00:44,508 --> 00:00:47,542 basic example here, where we have two asymmetric pipelines. 12 00:00:47,542 --> 00:00:51,643 An important point of this, compared to our 5-stage 13 00:00:51,643 --> 00:00:56,077 MIPS processor, is that we will fetch two instructions at the same time. 14 00:00:56,077 --> 00:01:00,099 If we want to actually be able to execute two things, we need to be able to somehow get 15 00:01:00,099 --> 00:01:04,002 them out of the instruction cache or the instruction memory. 16 00:01:04,002 --> 00:01:05,673 Hm, okay. Well, that's interesting. 17 00:01:05,673 --> 00:01:09,456 So, the program counter goes in here and, instead of getting one 18 00:01:09,456 --> 00:01:14,270 instruction, now we actually get two, which go into these two different instruction 19 00:01:14,270 --> 00:01:19,040 registers. We also need to add more ports to our 20 00:01:19,040 --> 00:01:21,831 register file.
In the basic pipeline that we 21 00:01:21,831 --> 00:01:25,063 have talked about earlier, we had only two read ports. 22 00:01:25,063 --> 00:01:29,050 You gave two different addresses and it output two registers. 23 00:01:29,050 --> 00:01:34,010 Now, because we have two different instructions at the same time, we actually 24 00:01:34,010 --> 00:01:39,013 need four different read ports, to read four different registers at the 25 00:01:39,013 --> 00:01:41,878 same time. And, if we want to be able to retire, or 26 00:01:42,074 --> 00:01:46,068 commit, instructions two at a time, we need to add more write ports. 27 00:01:46,068 --> 00:01:50,035 So I show the register file here sort of split into two. 28 00:01:50,035 --> 00:01:55,027 But the register file is really one structure; I just drew it 29 00:01:55,027 --> 00:01:59,060 apart so that you can actually make heads or tails of the drawing. 30 00:01:59,080 --> 00:02:04,092 So that's something interesting to think about; you have to worry 31 00:02:04,092 --> 00:02:07,633 about that. Okay, so the first question I have here: 32 00:02:07,846 --> 00:02:12,033 is this good enough? Is this pipeline diagram good enough in, 33 00:02:12,033 --> 00:02:17,003 let's say, the fetch stage? We stick some addresses in, we get two 34 00:02:17,003 --> 00:02:19,090 instructions out. So, that's a good question. 35 00:02:19,090 --> 00:02:24,278 Do we do PC and PC + 4? So, let's say there's some logic 36 00:02:24,278 --> 00:02:28,150 with which we pull out the instructions at PC and PC + 4 at the same time. 37 00:02:28,150 --> 00:02:33,659 So, we're executing two instructions. So, roughly, we need to worry 38 00:02:33,659 --> 00:02:37,954 about alignment issues here. We need to worry about a branch being, let's 39 00:02:37,954 --> 00:02:43,011 say, the first instruction that we pull out of the two instructions.
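To make the "PC and PC + 4" idea concrete, here is a minimal sketch of a dual fetch. It is only an illustration: the name `fetch_pair`, the list-based instruction memory, and the fixed 4-byte instruction size are assumptions, and it deliberately ignores the alignment and branch issues just mentioned.

```python
# Illustrative sketch only: fetch_pair and the list-based memory are
# assumptions, not part of the lecture's design.

INSTR_BYTES = 4  # assumed fixed 4-byte instructions, as in MIPS


def fetch_pair(imem, pc):
    """Return the two instruction words at pc and pc + 4.

    imem is a list indexed by word, so byte address pc maps to
    index pc // INSTR_BYTES. Alignment and branches are not handled.
    """
    first = imem[pc // INSTR_BYTES]
    second = imem[(pc + INSTR_BYTES) // INSTR_BYTES]
    return first, second


# Usage: the pair fetched at PC = 4 fills the two instruction registers.
imem = ["add", "lw", "beq", "sub"]
ir0, ir1 = fetch_pair(imem, 4)
```

In this sketch the first word goes to IR0 and the second to IR1; a real fetch unit would also have to decide what happens when the pair straddles a cache line or when the first instruction is a taken branch.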
40 00:02:43,011 --> 00:02:48,029 In this next part here, our pipes are not symmetric. 41 00:02:48,029 --> 00:02:57,928 So, is this good enough? So, what happens if the first instruction 42 00:02:57,928 --> 00:03:03,501 that comes out here is a load? So, instruction IR0 here. 43 00:03:03,501 --> 00:03:07,545 The instruction register is just loaded with the bits from the load. 44 00:03:07,545 --> 00:03:12,076 What happens down the stream here? Can the load happen here? 45 00:03:12,076 --> 00:03:13,284 [inaudible]. Yeah. 46 00:03:13,284 --> 00:03:16,059 That's a problem. So, once we start going 47 00:03:16,059 --> 00:03:19,486 superscalar here, we need to start thinking about this. 48 00:03:19,486 --> 00:03:23,029 Let's look back and forth here and take a look at this. 49 00:03:23,029 --> 00:03:28,515 You need something here that can take an instruction that shows up here and 50 00:03:28,515 --> 00:03:31,348 route all the operand values down over here. 51 00:03:31,348 --> 00:03:36,045 A lot of times people call this issue logic or instruction steering logic. 52 00:03:36,045 --> 00:03:40,717 So, you have to sort of steer the operands, and you can basically 53 00:03:40,717 --> 00:03:45,000 swap the two instructions that are going down the pipe 54 00:03:45,000 --> 00:03:49,085 at the same time. Okay, so that's interesting, 55 00:03:49,085 --> 00:03:52,060 and it could actually cost some time to do this. 56 00:03:52,060 --> 00:03:55,815 So, this might motivate us to have longer pipelines. 57 00:03:55,815 --> 00:04:01,068 We'll talk about that in a second. Another thing you have to do, on the 58 00:04:01,068 --> 00:04:06,011 control side, is actually start thinking about duplicating control. 59 00:04:06,204 --> 00:04:10,933 So here, we actually have two decoders because we're decoding two instructions at 60 00:04:10,933 --> 00:04:14,018 the same time.
So, this instruction register wires up to 61 00:04:14,018 --> 00:04:18,484 Decode A and this instruction register wires up to Decode B, and then they're 62 00:04:18,484 --> 00:04:22,047 going to drive signals down across the respective A and B data paths. 63 00:04:22,047 --> 00:04:27,007 Something not drawn here: if you have to interchange 64 00:04:27,176 --> 00:04:29,926 these, say to send instruction register zero to the B pipe, 65 00:04:29,926 --> 00:04:33,986 you definitely need some communication or some 66 00:04:34,155 --> 00:04:43,043 swapping of the instruction inputs here. So that's sort of the 67 00:04:43,043 --> 00:04:46,022 baseline 2-way processor.
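The steering idea for the two asymmetric pipes can be sketched as follows. This is a hedged illustration, not the lecture's actual logic: the instruction classes (`"int"`, `"branch"`, `"mem"`), the function name `steer`, and the policy of issuing only the first instruction when both need the same pipe are all assumptions, and data dependences between the two instructions are ignored.

```python
# Illustrative sketch only: class names and the conflict policy are assumptions.

PIPE_A = {"int", "branch"}  # pipeline A: integer ops and branches
PIPE_B = {"int", "mem"}     # pipeline B: integer ops and memory access


def steer(ir0, ir1):
    """Route the instruction classes in IR0 and IR1 to (pipe A, pipe B).

    None in a slot means that pipe receives no instruction this cycle.
    """
    if ir0 in PIPE_A and ir1 in PIPE_B:
        return ir0, ir1  # straight through, no swap needed
    if ir0 in PIPE_B and ir1 in PIPE_A:
        return ir1, ir0  # swap the two instructions
    # Assumed policy: both need the same pipe, so issue only the first
    # instruction and leave the other pipe empty.
    if ir0 in PIPE_A:
        return ir0, None
    return None, ir0
```

For example, `steer("mem", "branch")` swaps the pair so the load goes down pipe B and the branch down pipe A, while `steer("mem", "mem")` issues only the first load under the assumed policy.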