So now we're going to start talking about the fundamental things inside of an instruction set architecture, the things you need to build inside of an instruction set architecture. And we're going to start off by talking about machine models. So I have a question here: where do operands come from, and where do results go to? When I say operands, these are the data values you're going to operate on with a single instruction of the processor, and the results are the data that gets calculated; where does that data go? And this is actually an instruction set architecture concern, a big-A Architecture concern.

Fundamentally, you're going to have some form of arithmetic logic unit, or some sort of calculation unit, and you're going to have some type of memory storage. And in a processor, you're probably going to want to take stuff from the storage, move it to the ALU, compute on it, and then put it back into storage somehow. The processor here is what we wrap around the ALU, and the machine model asks: what is the organization of the registers (not the implementation, but the semantics)? How do you go about accessing memory? What sorts of instructions and operations are allowed?
And we're going to talk a little about where the operands come from, where the results go, and the different machine models that people have built. So these are different instruction set architectures, and even more fundamental than instruction set architectures, because instruction set architectures also cover how you encode instructions. This is a little more fundamental: how do you go about reasoning about moving data from memory to the ALU and back, and how do you store data close to the ALU?

So let's start off with a very simplistic example here, a very simplistic processor that people have actually built. Believe it or not, you don't have to have registers in your processor; or rather, you don't have to have named registers in your instruction set architecture. Instead, you can have a stack. A stack is just a storage element where you put things onto the stack and they come off in last-in, first-out order: the last one that was put on gets taken off first. In a very basic sense, we just take the top two things on the stack, fetch both of them, operate on them, and put the result on what becomes the top of the stack. Building up from there, we can think about something like an accumulator architecture.
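That stack discipline can be sketched in a few lines of Python (the language is just for illustration; the lecture is about hardware machines):

```python
# A binary operation on a stack machine pops its two operands and
# pushes its result, so the instruction itself names no operands.
def binop(stack, op):
    b = stack.pop()               # last thing pushed comes off first (LIFO)
    a = stack.pop()
    stack.append(op(a, b))

stack = [7, 5]                    # state after "push 7" then "push 5"
binop(stack, lambda a, b: a + b)  # "add" names nothing at all
print(stack)                      # [12]
```

The key point is that `binop` takes no operand names: both sources and the destination are implied by the stack itself.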
So typically in an accumulator architecture, you have one accumulator, so there's only one register, and one of the operands for every operation you do is always implicit: it has to come from the accumulator. The other one, let's say, can come from memory. So you have to name one of the operands in an architecture like that.

Building up from there, we can start thinking about register-memory operations, where you'll name, let's say, the source operand coming from memory, and you'll name an operand coming from your register file. And optionally, you may even name the destination. These are all options, and we'll call this a register-memory architecture. Finally, we have something like a register-register architecture. So, anyone want to wager a guess here: how many named operands do we have? Three. Okay, yeah. In this picture, we're going to take something from one register, something from another register, and we have to name where to go put the result. So what I wanted to point out here is the number of explicitly named operands. For some of these, it's a little more debatable than for others. As you see here, we have zero, and then something a little more complicated where we have one, two, or three.
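A minimal sketch of the accumulator idea, with made-up variable values: every ALU operation implicitly reads and writes the single accumulator register, so an instruction only ever names its one memory operand.

```python
# Hypothetical single-accumulator machine (illustrative, not a real ISA).
class AccumulatorMachine:
    def __init__(self, memory):
        self.memory = memory
        self.acc = 0              # the one architectural register

    def load(self, addr):         # acc <- mem[addr]
        self.acc = self.memory[addr]

    def add(self, addr):          # acc <- acc + mem[addr]; acc is implicit
        self.acc += self.memory[addr]

    def store(self, addr):        # mem[addr] <- acc
        self.memory[addr] = self.acc

m = AccumulatorMachine({"a": 3, "b": 4})
m.load("a")
m.add("b")       # only "b" is named; the other source is the accumulator
m.store("c")
print(m.memory["c"])   # 7
```

Notice that `add` takes exactly one name, which is the whole point: one operand per instruction is explicit, the rest is implied.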
Now, why do I say two or three here? Some of these architectures, or architectural models, don't necessarily name the result location; some of them name the result implicitly. In something like x86, for instance, the first source operand is always the destination, so an instruction only has to name two things. Something like MIPS, or most RISC architectures, will actually name all three. And you can think about that happening for both a memory-memory and a register-register architecture. A good example of a three-operand memory-memory architecture, which I don't have drawn here (I just have register-memory drawn here), but memory-memory, which basically says all your operands come from memory and all your destinations are in memory, is something like the VAX architecture, which was popular in the '70s. You could actually have all your operands come from memory, do some operation on them, and store the result back into memory.

So one interesting trade-off here is that once you start to have more operands explicitly named, you need more encoding space for that, and this is one of the important trade-offs. Okay. So let's go into a little bit of depth about stack-based architectures. One example of this is the Burroughs B5000.
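The encoding-space trade-off can be made concrete with some back-of-the-envelope arithmetic. The numbers below are illustrative assumptions, not taken from any particular ISA: 32 architectural registers and naive 32-bit absolute memory addresses.

```python
import math

# Rough encoding cost per instruction, counting only operand specifiers:
# each named register costs log2(num_registers) bits, while a named
# memory operand costs a full address field.
reg_bits = math.ceil(math.log2(32))     # 5 bits per register specifier
addr_bits = 32                          # naive absolute memory address

three_reg = 3 * reg_bits                # register-register, 3 named operands
one_reg_one_mem = reg_bits + addr_bits  # register-memory style
three_mem = 3 * addr_bits               # VAX-style memory-memory

print(three_reg, one_reg_one_mem, three_mem)   # 15 37 96
```

So a three-operand register-register instruction spends a modest 15 bits on operand names under these assumptions, while a fully general memory-memory instruction balloons, which is one reason memory-memory ISAs like the VAX had long, variable-length instructions.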
The Symbolics machines were stack-based too, and those are machines from the past. There are a few more modern examples of this. For instance, the Java Virtual Machine is actually a stack architecture. And then there's Intel's old floating-point system, x87. They've since deprecated it, but it's still a relatively modern stack-based instruction set architecture.

Let's take a stack-based instruction set architecture and look at how you can go about evaluating an expression. So here we have an expression: a plus b times c, all in parentheses, that whole quantity divided by the quantity a plus d times c, minus e. So it's some complex math operation that we need to do. Here, we actually break this down into a parse tree. You can see this is preserving the order of operations: we have a plus this sub-quantity here, b times c, and you take all of this and divide it by this subexpression here. And, a little bit of a throwback here, this is the order in which we're going to perform the operations. And if we do these operations on a stack machine model, we're going to get the right result. So what this means is: put a on the stack, put b on the stack, put c on the stack, then multiply b times c, and then add that result to a.
And you can see that we're going to build up this expression on a stack. So it's sort of a different way to think about problems, and we'll walk through an example here. So here we have an evaluation stack, and the top of the stack is going to be whatever is on top here in this picture. The first thing we're going to do is say push a, so a shows up on the stack. Then we're going to push b, we're going to push c, and then we're going to do a multiply. So we're going to take the two entries that are on the top of the stack, multiply them, and put the result into this entry on the stack. Then we're going to say add, and it's going to add the two top things on the stack and put the result here. And you can see that if you run something like this, you can actually do full computations on a very small stack. What's also nice here is you don't have to explicitly name any of the operands. So this machine model allows you to run real programs.

And one of the things I want to get across here is that the stack is part of the processor state, so it's part of the big-A Architecture, the instruction set architecture. You have a stack, and many times it's unbounded in the big-A architecture.
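The whole walkthrough can be sketched as a tiny interpreter. The program below is the lecture's push/operate sequence for (a + b*c) / (a + d*c - e); the values for a through e are made up for illustration.

```python
import operator

# Evaluate (a + b*c) / (a + d*c - e) on a stack machine: no instruction
# names an operand, everything works on the top of the stack.
vals = {"a": 2, "b": 3, "c": 4, "d": 1, "e": 2}
ops = {"*": operator.mul, "+": operator.add,
       "-": operator.sub, "/": operator.truediv}

program = ["a", "b", "c", "*", "+",   # numerator: a + b*c
           "a", "d", "c", "*", "+",   # a + d*c
           "e", "-",                  # ... minus e gives the denominator
           "/"]                       # numerator / denominator

stack = []
for insn in program:
    if insn in vals:                  # push: load a named value
        stack.append(vals[insn])
    else:                             # ALU op: pop two, push the result
        rhs, lhs = stack.pop(), stack.pop()
        stack.append(ops[insn](lhs, rhs))

print(stack)   # [3.5], i.e. (2 + 3*4) / (2 + 1*4 - 2) = 14/4
```

If you trace the loop, the stack never grows beyond four entries for this expression, which foreshadows the stack-depth discussion below.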
But in real life, it needs to be bounded somehow, because you can't have an infinitely long physical stack in your machine. So it's conceptually unbounded, but you probably want to have it overflow to main memory if you put too many things on the stack. And this is an important characteristic, because otherwise there are certain computations you can't do: if, for instance, the parse tree is too deep, your stack might grow too long. Okay.

So let's say we have a micro-architecture implementation of an instruction set architecture, a stack-based architecture, where the top two elements of the stack are kept in registers, the rest is in main memory, and it spills and fills. Well, each push has a memory reference: when you put something on the stack, it references memory. And each pop has a memory reference, because you need to put something back into main memory. But more than that, you might have extra memory references as the stack spills over into main memory, and you might have to pull things back in from main memory. So that's not very good, because you might end up with a push costing more than one memory reference.
So one optimization from a micro-architecture perspective: you could think about keeping N elements of your stack in registers very close to the processor, and only going to main memory when the register stack overflows or underflows. So instead of doing a memory reference basically every single time you do an operation, or every single time you do a push or a pop, you only do it when the stack depth gets too large and you can't fit everything in your register stack.

So let's walk through a brief example here. We have the same expression calculation that we were doing before. And you can see we can do push, push, push, multiply, add, push, push, push, multiply, add, push e, subtract, and do the divide at the end. But if we have a register stack size of two, what's going to happen here is, at the beginning, when we do the first push we're going to do a load from a, and for the next push a load from b. Now we do a push that loads c, but our stack already has two things on it. So if we try to push a third thing, we have to overflow the bottom of the stack somewhere. So we're actually going to do a stack save of a out to main memory. Here, we do this multiply, and then we actually do a stack fetch of a to get it back from main memory. So it's a lot of extra memory operations.
So you might want to think of a different micro-architecture. If you walk through this whole example, we're going to see that we have four stores and four fetches from main memory, which are all implicit, on top of the explicit ones that we're trying to do with the pushes and the pops. Hm, well, that's eight extra memory operations. So we think about how to build a micro-architecture which has fewer, but has the exact same instruction set architecture. Well, let's say we have a machine which has a register stack size of four. If the stack size is four, in the worst case here we only ever have to fit four things on our stack. So we never have to spill our stack out to main memory. Pushes and pops still have to access memory, but that's what they're trying to do: they're actually trying to access memory.

So, to sum up about stack-based machine models: they look pretty cool, but not all things are great at all times. One of the interesting things to see here is that we actually push a, and then we push a again. So we're doing redundant work. And we push c, and we push c again. c and a have the same values both times we push them.
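We can simulate this spill/fill behavior to check the counts. The spill policy below is an assumption on my part (spill the bottom register on overflow, eagerly refill after each ALU op), but it happens to reproduce the lecture's numbers: four extra stores and four extra fetches with a two-entry register stack, and none with four entries.

```python
import operator

# Sketch of a micro-architecture that keeps the top n_regs stack entries
# in registers and spills/fills the rest to main memory. Counts only the
# implicit (extra) memory traffic, not the explicit push/pop accesses.
def run(program, vals, n_regs):
    regs, mem_stack = [], []          # regs[-1] is the top of stack
    stores = fetches = 0
    ops = {"*": operator.mul, "+": operator.add,
           "-": operator.sub, "/": operator.truediv}
    for insn in program:
        if insn in vals:              # push
            if len(regs) == n_regs:   # register stack full: spill bottom
                mem_stack.append(regs.pop(0))
                stores += 1
            regs.append(vals[insn])
        else:                         # ALU op on the top two entries
            rhs, lhs = regs.pop(), regs.pop()
            regs.append(ops[insn](lhs, rhs))
            if mem_stack and len(regs) < n_regs:   # eagerly fill back up
                regs.insert(0, mem_stack.pop())
                fetches += 1
    return regs[-1], stores, fetches

vals = {"a": 2, "b": 3, "c": 4, "d": 1, "e": 2}
prog = ["a", "b", "c", "*", "+", "a", "d", "c", "*", "+", "e", "-", "/"]
print(run(prog, vals, 2))   # (3.5, 4, 4): eight extra memory operations
print(run(prog, vals, 4))   # (3.5, 0, 0): deep enough, nothing spills
```

Both runs compute the same value, which is the point: the instruction set architecture is identical, and only the micro-architectural memory traffic changes.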
So all of a sudden we're doing extra work, and maybe we could have done less work if we had an architecture or machine model, a big-A architecture, which allows you to store multiple things and name different operands. So what I want to get across here is: while a stack-based architecture is very simple and the instructions are very dense, it may not be good for performance, because you might end up having to reload values multiple times. Versus, if you had an instruction set architecture like MIPS, where you actually have 32 general-purpose registers and you can name any register in any instruction, you could have loaded a, b, c, d, and e into the register space once, and then done all your operations and never had to reload. And this is actually an instruction set issue, a fundamental machine model issue, and not a micro-architecture issue.

Okay. So, to summarize the different machine models: we have our stack, accumulator, register-memory, and register-register, sometimes called load-store, architectures. If we want to take some expression here, like c equals a plus b, we can look at the instructions that would have to be generated on each abstract architecture. So here, for the stack architecture, we're going to have push a, push b, add, pop c.
As we add more naming, we can potentially have fewer instructions. So for the accumulator: we load a, then add b. Note here that we don't have to name what we're adding to, because we're adding to the accumulator. And then store c. It starts to get a little more compact. If we have register-memory, we can load one of the values, add, and store; it's pretty simple. With a load-store architecture, we actually have to do a little more work: we have to load, load, add, store. That seems inefficient, but if you have to reuse any of these values, a or b, you don't have to go reload them. Versus, in these two architectures, you have to go reload the value.
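The four sequences above, side by side (the register names r1 through r3 are illustrative, in the style of a MIPS-like load-store machine):

```python
# Instruction sequences for c = a + b under each machine model.
sequences = {
    "stack":           ["push a", "push b", "add", "pop c"],
    "accumulator":     ["load a", "add b", "store c"],
    "register-memory": ["load r1, a", "add r3, r1, b", "store r3, c"],
    "load-store":      ["load r1, a", "load r2, b",
                        "add r3, r1, r2", "store r3, c"],
}
for model, insns in sequences.items():
    print(f"{model}: {len(insns)} instructions")
```

The load-store sequence is the longest here, but it leaves a and b sitting in r1 and r2, so any later use of them costs no further memory traffic, which is exactly the reuse argument made above.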