1 00:00:03,058 --> 00:00:06,566 Welcome. So this is our second lecture of ELE 475, 2 00:00:06,566 --> 00:00:11,007 Computer Architecture. And in today's lecture, we are going to be 3 00:00:11,007 --> 00:00:15,774 learning about microcoded microprocessors, so these are microprocessors which allow 4 00:00:15,774 --> 00:00:20,901 you to reuse components and reuse datapath components in order to make a smaller 5 00:00:20,901 --> 00:00:24,207 processor. And we're going to start our review, our 6 00:00:24,207 --> 00:00:28,032 first review of three reviews. And like I said before in the previous 7 00:00:28,032 --> 00:00:32,281 lecture, if you are watching this lecture, and think, well, I've seen this all before 8 00:00:32,460 --> 00:00:35,910 stick through the first three or three and a half lectures. 9 00:00:35,910 --> 00:00:39,887 That's designed to be reviewed to get everyone on the same page for this class. 10 00:00:39,887 --> 00:00:42,890 So I'm David Wentzlaff. I'm a professor here at Princeton 11 00:00:42,890 --> 00:00:45,721 University in the electrical engineering department. 12 00:00:45,721 --> 00:00:50,529 And let's get started talking about our second lecture on computer architecture. 13 00:00:50,529 --> 00:00:54,623 So a little bit about an agenda for what today is going to be about. 14 00:00:54,623 --> 00:00:59,609 We're going to start off by talking about microcoded microarchitectures, and then 15 00:00:59,609 --> 00:01:04,202 we're going to transition to talk about pipelines, basic pipelines, and review of 16 00:01:04,202 --> 00:01:08,182 basic pipelines. And we're going to talk about the basics 17 00:01:08,182 --> 00:01:14,068 of pipelining, that we are going to talk about and, and talk about the frameworks 18 00:01:14,068 --> 00:01:19,017 which we are going to need to analyze other pipelines or analyze pipelines. 19 00:01:19,017 --> 00:01:23,011 Then we're going talk about structural hazards in pipelines. 20 00:01:23,011 --> 00:01:28,023 So these are problems where you might need to use a resource in two different 21 00:01:28,023 --> 00:01:33,005 locations in a pipeline at the same time. Then, we're going to talk about data 22 00:01:33,005 --> 00:01:35,073 hazards. Data hazards are where you have one 23 00:01:35,073 --> 00:01:40,022 instruction or one operation or transaction which is dependent on another 24 00:01:40,022 --> 00:01:44,072 operation or instruction in your pipeline, but they're in different stages. 25 00:01:44,072 --> 00:01:49,064 So you might need to somehow interlock or control or stall between the different 26 00:01:49,064 --> 00:01:52,068 stages. And then we're going to talk about control 27 00:01:52,068 --> 00:01:55,053 hazards. We'll probably hold off on the control 28 00:01:55,053 --> 00:02:00,033 hazards depending on how far we get into today's lecture and it might end up a 29 00:02:00,033 --> 00:02:02,744 little bit, in another little snippet on the end. 30 00:02:02,744 --> 00:02:07,027 But in control hazard we're going to talk about when you have an instruction which 31 00:02:07,027 --> 00:02:10,609 has to redirects the pipeline or change what the pipeline is doing or the 32 00:02:10,609 --> 00:02:15,062 instructions slowing down the pipeline are doing, that is another form of hazard. 33 00:02:16,068 --> 00:02:19,887 Okay, so let's get off start off by talking about microcoded 34 00:02:19,887 --> 00:02:23,692 microarchitectures. One of the, one of the big problems is 35 00:02:23,692 --> 00:02:28,442 you're going along, this is a, a problem that happened back when processors were 36 00:02:28,442 --> 00:02:33,526 first being designed, and you couldn't fit a lot of the processor in either a room, 37 00:02:33,526 --> 00:02:39,040 let alone, or nowadays on a, on a chip. But one of the questions that comes up is 38 00:02:39,040 --> 00:02:44,056 what happens when a processor is just too large to fit in a room, or fit in a chip. 39 00:02:44,056 --> 00:02:49,486 How, how do you go about solving this? And this is not a question of how do you 40 00:02:49,486 --> 00:02:54,444 solve putting multiple processors on a chip or a room, but rather if you really 41 00:02:54,444 --> 00:02:59,291 can't fit a whole processor in a room, so this would be if you're building your 42 00:02:59,291 --> 00:03:03,576 chips out of, or you're building your, excuse me, processors out of something 43 00:03:03,576 --> 00:03:07,078 like vacuum tubes or switches or, or electromechanical relays. 44 00:03:07,078 --> 00:03:12,051 You have to think pretty hard about what happens if it's, your processor's too 45 00:03:12,051 --> 00:03:17,056 large, how do you cut down the size? Well you can time multiplex the resources. 46 00:03:17,056 --> 00:03:22,077 So that's right, you can use a resource and instead of physically building a bunch 47 00:03:22,077 --> 00:03:26,095 of resources and then collecting them together instead you can build one 48 00:03:26,095 --> 00:03:30,210 resource, which is in someway general purpose and then you can use that 49 00:03:30,210 --> 00:03:34,391 resource, and then use it again, and use it again multiple times for one 50 00:03:34,391 --> 00:03:40,000 instruction. And this is called a microcoded processor. 51 00:03:40,000 --> 00:03:46,020 So let's look at a microcontrol unit, and how one of these things works. 52 00:03:46,020 --> 00:03:50,625 So, in this, in this design you'll actually have, here's the control lines 53 00:03:50,625 --> 00:03:54,914 which run to your processor. So these are bunch of wires they're going 54 00:03:54,914 --> 00:03:59,896 to come on to your processor and are going to assert different things on AL use, 55 00:03:59,896 --> 00:04:05,065 multiplexers, registers. And then you're going to have some way you 56 00:04:05,065 --> 00:04:08,659 need to try this. So in a microcoded control unit you're 57 00:04:08,659 --> 00:04:14,322 going to actually have a microcode address that gets decoded and lights up in a RAM 58 00:04:14,322 --> 00:04:18,398 fashion a bunch of wires. And some of those are going to turn on 59 00:04:18,398 --> 00:04:23,680 these control lines and they're also going to be used in this calculation of the next 60 00:04:23,680 --> 00:04:26,516 state. Now, if you look at this, you can see that 61 00:04:26,516 --> 00:04:30,620 this actually implements something like a small finite state machine. 62 00:04:30,620 --> 00:04:34,994 But the, the one interesting thing here is that you actually have op code from the 63 00:04:34,994 --> 00:04:38,444 program coming in here. And effectively what you can do is for 64 00:04:38,444 --> 00:04:42,955 each instruction you are trying to execute in a program, you can cycle through a 65 00:04:42,955 --> 00:04:47,352 little state machine for it, and you can have different state machines depending on 66 00:04:47,352 --> 00:04:51,084 the op code that comes in here. So the op code comes in here, you might 67 00:04:51,084 --> 00:04:54,650 have for instance, condition, flags or something like that, coming out your 68 00:04:54,650 --> 00:04:57,659 processor. So whether the instruction's a branch, 69 00:04:57,659 --> 00:05:01,039 and, whether the instruction is taking that branch or not. 70 00:05:01,039 --> 00:05:05,583 And, what you can do then is, take that along with a little flip-flop here which 71 00:05:05,583 --> 00:05:08,610 would keep it in some piece of state or an address. 72 00:05:08,610 --> 00:05:13,094 You can actually step through, step through a little state machine, which will 73 00:05:13,094 --> 00:05:16,294 do multiple things. And we'll look at this a little more 74 00:05:16,294 --> 00:05:20,655 example of how, one of these microcontrol, microcontrol units will hook up to a 75 00:05:20,655 --> 00:05:23,926 microcoded microprocessor, or a microcoded processor. 76 00:05:23,926 --> 00:05:30,024 The last thing I wanted to get across here is, this is typically a RAM and you can 77 00:05:30,024 --> 00:05:35,141 actually cycle through it multiple times, and the RAM is going to tell you where the 78 00:05:35,141 --> 00:05:39,094 next micro coded instruction is. So you'll beel, basically cycling around 79 00:05:39,094 --> 00:05:44,031 this loop multiple times, or maybe only one time, depending on the actual 80 00:05:44,031 --> 00:05:49,038 instruction you're trying to execute. Here is the microcoded control unit, or 81 00:05:49,038 --> 00:05:52,565 the microcontrol unit hooked up to a microcontrolled process, or microcoded 82 00:05:52,565 --> 00:05:55,342 processor. So, we have the datapath sitting in the 83 00:05:55,342 --> 00:05:58,820 middle here. And we have a bunch of wires coming out of 84 00:05:58,820 --> 00:06:03,359 our control unit here, which assert different things on the data path. 85 00:06:03,359 --> 00:06:08,746 For instance, are you doing subtract, are you doing add and it's going to swing 86 00:06:08,746 --> 00:06:14,461 different multiplexers inside of this data path to select what operation is actually 87 00:06:14,461 --> 00:06:18,629 occurring. And I wanted to contrast the 88 00:06:18,629 --> 00:06:25,007 microcontroller, or the microcontrol unit with the memory where you're actually 89 00:06:25,007 --> 00:06:29,848 going to store your user programs. So your user programs are gonna be stored 90 00:06:29,848 --> 00:06:34,677 in a RAM structure or a randomly accessible memory structure versus a read 91 00:06:34,677 --> 00:06:39,147 only memory structure. So a good example of this is, you'd have 92 00:06:39,147 --> 00:06:43,630 your actual ISA instructions here like x86 or MIPS sitting in your RAM. 93 00:06:43,630 --> 00:06:47,013 And up here, you're going to have microcode instructions. 94 00:06:47,589 --> 00:06:52,508 So if you go back to this diagram, typically the RAM entries are called 95 00:06:52,508 --> 00:06:56,656 microcode instructions. And sometimes people write little programs 96 00:06:56,656 --> 00:07:02,960 that say, how do you sort of control the different lines that come out of here. 97 00:07:02,960 --> 00:07:10,666 Okay, so let's put this all together and look at how we would build a Bus-based 98 00:07:10,666 --> 00:07:15,106 RISC processor. So we're going build something like a MIPS 99 00:07:15,106 --> 00:07:19,678 processor, which you may recall from your computer organization class. 100 00:07:19,678 --> 00:07:24,753 And we're going to use a datapath where we reuse da datapath elements over time. 101 00:07:24,753 --> 00:07:27,815 So we're going to time multiplex the resources. 102 00:07:27,815 --> 00:07:33,311 This is not a pipeline design and it's not even a single cycle RISC design, but 103 00:07:33,311 --> 00:07:39,006 instead, it is a micro coded RISC design. And we're talking about this because we 104 00:07:39,006 --> 00:07:43,012 are going to contrast this with pipelining in today's lecture. 105 00:07:43,062 --> 00:07:48,037 So let's, let's look at this diagram. We're going to see, that we have a 106 00:07:48,037 --> 00:07:53,032 register file here, which has 32 general purpose registers, cuz we're trying to 107 00:07:53,032 --> 00:07:56,075 implement MIPS, which has 32 general purpose registers. 108 00:07:56,075 --> 00:08:01,745 And we also store the program counter or the instruction pointer in this register 109 00:08:01,745 --> 00:08:05,021 file. So this register file actually has 33 110 00:08:05,021 --> 00:08:10,033 elements, or maybe you can store it in where there, the zero register is, ors 111 00:08:10,505 --> 00:08:16,933 zero register is in MIPS if you, if you try really hard cuz that's typically hard 112 00:08:16,933 --> 00:08:22,530 coded to zero. So let's walk through how a instruction is 113 00:08:22,530 --> 00:08:26,051 going to be executed on such a architecture. 114 00:08:26,051 --> 00:08:30,645 And a couple other things to note here this is main memory. 115 00:08:30,645 --> 00:08:34,976 So we're going to have to fetch our instructions from over here. 116 00:08:34,976 --> 00:08:38,855 Our ALU is here and this is our instruction register. 117 00:08:38,855 --> 00:08:44,488 And everything is connected together on a bus, where only one value could be driven 118 00:08:44,488 --> 00:08:48,489 at a time. And that value could be broadcast over to 119 00:08:48,489 --> 00:08:52,915 multiple locations. So let's start off by thinking about what 120 00:08:52,915 --> 00:08:55,520 do we need to do to execute an instruction? 121 00:08:55,520 --> 00:08:59,450 Well, to execute an instruction, we're going start off by fetching the program 122 00:08:59,450 --> 00:09:05,079 counter for the instruction. So the microcoded control unit, is going 123 00:09:05,079 --> 00:09:11,830 to assert the wires to basically say, do a read out of the register file here for the 124 00:09:11,830 --> 00:09:15,907 program counter. And we know that the prog, if the program 125 00:09:15,907 --> 00:09:22,075 counter's going to be selected, because the microcontrol control unit is going to 126 00:09:22,075 --> 00:09:28,073 set on the register select lines here, assert the entry which we'll choose, 32 127 00:09:28,073 --> 00:09:31,597 we'll say which'll select the program counter. 128 00:09:31,597 --> 00:09:36,663 So the program counter, we, and this is all in one cycle right now, the program 129 00:09:36,663 --> 00:09:41,797 counter is going to be asserted onto this bus and now we need to latch it somewhere 130 00:09:41,797 --> 00:09:44,925 and register. So it's going to be broadcast all the way 131 00:09:44,925 --> 00:09:50,036 in this bus but we actually want to take it and load it into this memory address 132 00:09:50,036 --> 00:09:54,370 register here. And else, that will finish our first cycle 133 00:09:54,370 --> 00:09:59,038 of a microcoded control unit processor or microcoded control unit processor. 134 00:09:59,038 --> 00:10:03,692 The next cycle, this is all within one instruction, we're going to fetch the 135 00:10:03,692 --> 00:10:08,286 instruction we need from the data memory or, and the instruction memory. 136 00:10:08,286 --> 00:10:13,473 So then it's going to get forced onto this bus, and we're going to take it and latch 137 00:10:13,473 --> 00:10:16,757 it here or register it into the instruction register. 138 00:10:16,757 --> 00:10:19,740 So, at this point we fetch the instruction. 139 00:10:19,740 --> 00:10:22,555 Okay? That's, that's done a fair amount so far. 140 00:10:22,555 --> 00:10:26,501 The next thing we need to do is go get the actual operands. 141 00:10:26,501 --> 00:10:34,114 So on the third cycle here, we are going to take the rd, we'll say, or actually 142 00:10:34,114 --> 00:10:41,513 we'll take the two sources, rs1 and rs2. Now, we're going to fetch first rs1 from 143 00:10:41,513 --> 00:10:47,015 our register file and latch that or register that into the A operand. 144 00:10:47,015 --> 00:10:51,028 Then we're going do the same thing for rs2 on the fourth cycle. 145 00:10:51,028 --> 00:10:53,098 And now we can actually do, let's say an add. 146 00:10:53,098 --> 00:10:58,004 So let's say we're assuming, we are doing an add instruction here. 147 00:10:58,004 --> 00:11:01,049 So now we can do the add. So we let A and B actually add. 148 00:11:01,049 --> 00:11:05,994 And we need to store the result somewhere. Well, conveniently, we know we want to 149 00:11:05,994 --> 00:11:11,604 store it into the destination register. So we don't have to store it into A or B, 150 00:11:11,604 --> 00:11:15,473 we can actually store it back into the register file here. 151 00:11:15,473 --> 00:11:20,966 So we will be asserting rd is the address, and we will be, the microcoded control 152 00:11:20,966 --> 00:11:24,648 unit will be asserting register right on the register file. 153 00:11:24,648 --> 00:11:29,582 Okay, so the, we've now actually done our actual operation, almost completely. 154 00:11:29,582 --> 00:11:33,844 We next need to figure out how to increment the program counter. 155 00:11:33,844 --> 00:11:36,961 Because we want to go fetch the next instruction. 156 00:11:36,961 --> 00:11:41,757 So as we said, we didn't store the program counter anyway, anywhere here. 157 00:11:41,757 --> 00:11:46,189 So we need to go re-fetch the program counter out of the registry file. 158 00:11:46,189 --> 00:11:49,787 And we're going to reuse or time multiplex the ALU here. 159 00:11:49,787 --> 00:11:55,611 So we're going fetch that value, put it into A, and we're going to increment it by 160 00:11:55,611 --> 00:11:58,538 four. So the next cycle we'll do, let's say an 161 00:11:58,538 --> 00:12:03,453 operation here, let's say the ALU has a special operation just for adding four. 162 00:12:03,453 --> 00:12:07,624 Alternatively you could try to load four into the B register here. 163 00:12:07,624 --> 00:12:12,487 And we're going to add four because our instruction is four bytes long. 164 00:12:12,487 --> 00:12:17,608 And we're going to store that value back into the program counter. 165 00:12:17,608 --> 00:12:22,863 So at this point, we've actually executed a complete one instruction. 166 00:12:22,863 --> 00:12:28,981 And it takes, I think we added up something like five, six, no, I think six 167 00:12:28,981 --> 00:12:35,145 or seven cycles at that point. Now, one of the things I wanted to point 168 00:12:35,145 --> 00:12:40,082 out before we most off this slide, is that, depending on the instruction you're 169 00:12:40,082 --> 00:12:44,015 executing, you can have variable amounts of time taken. 170 00:12:44,015 --> 00:12:47,085 So for instance, if you're trying to do a branch instruction. 171 00:12:47,085 --> 00:12:50,068 Branch instructions are a little bit different. 172 00:12:50,068 --> 00:12:54,059 We're not going to actually just add four to the program counter. 173 00:12:54,059 --> 00:12:59,072 We might need to fetch a different value into a compare, and then fetch either the 174 00:12:59,072 --> 00:13:04,072 program counter or add a different value to the program counter when we go to do a 175 00:13:04,072 --> 00:13:09,036 branch operation, likewise for, for jumps. And it can have different numbers of 176 00:13:09,036 --> 00:13:12,031 cycles. Another good example of different numbers 177 00:13:12,031 --> 00:13:16,071 of cycles is if we're going to be operating on a unary instruction, or 178 00:13:16,071 --> 00:13:21,433 instruction which only has one input. So a good example of this is a, oh, let's 179 00:13:21,433 --> 00:13:25,017 think, what is this example of this? This is something like a logical negation 180 00:13:25,017 --> 00:13:27,095 where you just flip all the bits, in a, in a, in a value. 181 00:13:27,095 --> 00:13:30,077 I don't think MIPS actually has that instruction. 182 00:13:30,077 --> 00:13:35,309 Anyway, what I'm trying to get across here is, you can actually have different number 183 00:13:35,309 --> 00:13:38,082 of cycles to execute instructions depending on the instruction. 184 00:13:38,082 --> 00:13:42,079 So example, of something like a load instruction, would also to a lot more 185 00:13:42,079 --> 00:13:46,931 cycles because you're going to have to cycle the memory unit here more times so 186 00:13:46,931 --> 00:13:51,233 you have to execute instruction where you actually go and fetch the data from the 187 00:13:51,233 --> 00:13:54,850 memory and then put it back in the register and maybe do some math with it 188 00:13:54,850 --> 00:13:58,012 and then store it back in, into the room, the general purpose registry file.