1 00:00:02,050 --> 00:00:05,048 Okay. So now, we're going to change topics and 2 00:00:05,048 --> 00:00:09,013 start talking about our first technical subject of this course. 3 00:00:09,013 --> 00:00:14,253 And, as an introduction to computer architecture, we're going to be talking 4 00:00:14,253 --> 00:00:17,905 about what is architecture versus microarchitecture. 5 00:00:17,905 --> 00:00:22,031 And, I want to just briefly say that, as you take this class, the first three 6 00:00:22,031 --> 00:00:25,585 lectures or so should be review. So, if you're sitting in the class and 7 00:00:25,585 --> 00:00:27,528 you're saying, oh, I've seen all this before. 8 00:00:27,528 --> 00:00:30,763 Don't get up. Wait 'till the fourth or fifth lecture, 9 00:00:30,763 --> 00:00:34,087 and then the content will become new. And this is because I want to teach 10 00:00:34,087 --> 00:00:38,020 everything from first principles and get everyone up to speed. 11 00:00:38,020 --> 00:00:41,052 But, it's that, the first three lectures are going to go very fast. 12 00:00:41,052 --> 00:00:45,049 So, if you're lost in the first three lectures, which should be review, then 13 00:00:45,049 --> 00:00:50,897 that's probably a bad in, indicator. So, we'll start off by talking about 14 00:00:50,897 --> 00:00:57,858 architecture versus micro-architecture. And I wanted to say briefly what I mean by 15 00:00:57,858 --> 00:01:01,972 architecture. And I, I have, in this slide here, a very 16 00:01:01,972 --> 00:01:06,622 large A for what I'll sometimes call, big A architecture. 17 00:01:06,622 --> 00:01:13,881 So, your, Patterson Hennessy calls this, instruction set architecture, and when I 18 00:01:13,881 --> 00:01:20,098 contrast this with micro architecture, or Patterson Hennessy calls organization. 
19 00:01:21,054 --> 00:01:26,007 So, big A architecture is an abstraction layer provided to software, or 20 00:01:26,007 --> 00:01:30,081 instruction set architecture is an abstraction layer provided to software 21 00:01:30,081 --> 00:01:37,051 which is designed to not change very much. And, it doesn't say, it, it says how a 22 00:01:37,051 --> 00:01:43,084 theoretical fundamental, sort of, machine executes programs. 23 00:01:44,023 --> 00:01:52,205 It does not say exactly the size of different structures, how fast those 24 00:01:52,205 --> 00:01:58,350 things would run, the exact implementation issues; that falls into organization. 25 00:01:58,350 --> 00:02:04,408 And, one of the things I wanted to emphasize is that computer architecture is 26 00:02:04,408 --> 00:02:08,422 all about tradeoffs. So, when I say it's all about tradeoffs, 27 00:02:08,422 --> 00:02:12,960 you can make different design decisions up here in the big A architecture or the 28 00:02:12,960 --> 00:02:17,840 instruction set architecture, and that'll influence the application or influence the 29 00:02:17,840 --> 00:02:22,299 microarchitecture, but also you can make different design decisions down here and 30 00:02:22,299 --> 00:02:26,278 make a lot of different tradeoffs on how to go about implementing a particular 31 00:02:26,278 --> 00:02:30,044 instruction set architecture. And, largely, when you go to look at 32 00:02:30,044 --> 00:02:34,055 computer architecture and computer architecture implementation, the design 33 00:02:34,055 --> 00:02:38,010 space is relatively flat. There's sort of an optimum point where 34 00:02:38,010 --> 00:02:42,032 you, you want to be, but the other points around it are many times not horribly, 35 00:02:42,032 --> 00:02:45,027 horribly bad. Though there are, you know, at the, at the 36 00:02:45,027 --> 00:02:47,093 extremes, probably horribly bad design decisions. 
37 00:02:47,093 --> 00:02:52,333 But, you know, a lot of different design points are, are equally good or, or close 38 00:02:52,333 --> 00:02:56,192 to optimal. And, the job of a computer architect is to 39 00:02:56,192 --> 00:03:01,554 make the very subtle design decisions around how do you move around this point 40 00:03:01,554 --> 00:03:06,563 to make it both easier to program, live on for many years, be low power, and this 41 00:03:06,563 --> 00:03:12,123 sort of other, a little bit of aesthetic characteristics mixed together with just 42 00:03:12,123 --> 00:03:15,657 making your computer processor go fast, we'll say. 43 00:03:15,657 --> 00:03:20,790 And these tradeoffs, I, I will re, will reiterate this over and over again in this 44 00:03:20,790 --> 00:03:25,045 class that, because there are multiple different metrics. 45 00:03:25,045 --> 00:03:29,507 So, for instance, speed, energy, cost, and they trade off against each other, many 46 00:03:29,507 --> 00:03:33,009 times, there is not necessarily an optimal point. 47 00:03:33,009 --> 00:03:38,024 It depends on, you know, are you more cost driven, or energy driven, or speed driven. 48 00:03:38,024 --> 00:03:43,052 And, within that point, there's sort of sometimes Pareto optimal curves where all 49 00:03:43,052 --> 00:03:48,074 of the points are, are equally good if you're trying to trade off these different 50 00:03:48,074 --> 00:03:50,639 things for different cost models. Okay. 51 00:03:50,639 --> 00:03:56,434 So, let's, let's talk about what is an instruction set architecture, and what is 52 00:03:56,434 --> 00:04:01,599 a microarchitecture. So, an instruction set architecture, or big 53 00:04:01,599 --> 00:04:08,091 A architecture is trying to provide the programmer some abstract machine model. 54 00:04:08,091 --> 00:04:13,964 And many times what it, what it really boils down to is it's all the programmer 55 00:04:13,964 --> 00:04:18,103 visible state. 
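The Pareto-optimal idea mentioned in this part of the lecture can be sketched in a few lines of Python. This is only an illustration: the design names and the numbers for delay, energy, and cost are made up, and "lower is better" on every metric is an assumption of the sketch. A design point is on the Pareto front if no other point is at least as good on every metric and strictly better on one.

```python
from typing import NamedTuple


class DesignPoint(NamedTuple):
    name: str
    delay: float   # hypothetical units, lower is better (inverse of speed)
    energy: float  # hypothetical units, lower is better
    cost: float    # hypothetical units, lower is better


def dominates(a: DesignPoint, b: DesignPoint) -> bool:
    """True if a is no worse than b on every metric and strictly better on one."""
    am = (a.delay, a.energy, a.cost)
    bm = (b.delay, b.energy, b.cost)
    no_worse = all(x <= y for x, y in zip(am, bm))
    strictly_better = any(x < y for x, y in zip(am, bm))
    return no_worse and strictly_better


def pareto_front(points: list) -> list:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up design points, purely for illustration.
designs = [
    DesignPoint("fast", 1.0, 9.0, 8.0),
    DesignPoint("balanced", 2.0, 4.0, 4.0),
    DesignPoint("frugal", 5.0, 1.0, 2.0),
    DesignPoint("wasteful", 5.0, 9.0, 9.0),
]

front_names = [p.name for p in pareto_front(designs)]
print(front_names)
```

In this toy data set only "wasteful" drops out, because "fast" is at least as good on all three metrics. Which of the remaining points is "best" depends on whether you are cost driven, energy driven, or speed driven.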
So, for instance, how, does the machine 56 00:04:18,103 --> 00:04:21,003 have memory? Does it have registers? 57 00:04:21,003 --> 00:04:24,868 So that's the, that's the programmer visible state. 58 00:04:24,868 --> 00:04:29,921 It also encompasses the fundamental operations that the computer can run, so 59 00:04:29,921 --> 00:04:35,063 these are called instructions. And, it defines the instructions and how 60 00:04:35,063 --> 00:04:37,348 they operate. So, for instance, add. 61 00:04:37,348 --> 00:04:42,782 Add might be a fundamental instruction or fundamental operation in your compu, 62 00:04:42,782 --> 00:04:49,063 instruction set architecture. And, it says, the exact semantics on how 63 00:04:49,063 --> 00:04:55,386 to take one word in a register and add it to another word in a register, and where 64 00:04:55,386 --> 00:05:00,068 it ends, ends up. Then, there's more complicated execution 65 00:05:00,068 --> 00:05:04,053 semantics. So, what do we mean by execution 66 00:05:04,053 --> 00:05:07,236 semantics? Well, if you just say adds take two 67 00:05:07,236 --> 00:05:11,814 numbers and add them together and put them in another register, that many times does 68 00:05:11,814 --> 00:05:15,033 not encompass all of the instruction set architecture. 69 00:05:15,033 --> 00:05:19,061 You'll have other things going on, for instance, IO interrupts, and you have to 70 00:05:19,061 --> 00:05:23,023 define in your instruction set architecture, or your big A computer 71 00:05:23,023 --> 00:05:26,792 architecture what are the exact semantics of an interrupt, or an instruction, or a 72 00:05:26,792 --> 00:05:31,036 piece of data coming in on an IO. How does that interact with the rest of 73 00:05:31,036 --> 00:05:34,025 the processor? So, many times instruction execution 74 00:05:34,025 --> 00:05:38,049 semantics is only half of it, and what we have to worry about is the, the rest of the 75 00:05:38,049 --> 00:05:44,032 machine execution semantics. 
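To make "programmer visible state" and "instruction semantics" concrete, here is a minimal sketch of a toy abstract machine in Python. Everything in it is hypothetical: the four registers, the 64 bytes of memory, the 32-bit word size, and the name `execute_add` are assumptions chosen for illustration, not any real instruction set. It deliberately says nothing about pipelines, caches, or timing, since those belong to the microarchitecture.

```python
from dataclasses import dataclass, field

WORD_MASK = 0xFFFFFFFF  # assumed 32-bit words for this toy ISA


@dataclass
class MachineState:
    """Programmer-visible state only: registers and memory."""
    regs: list = field(default_factory=lambda: [0] * 4)            # hypothetical: 4 registers
    mem: bytearray = field(default_factory=lambda: bytearray(64))  # hypothetical: 64 bytes


def execute_add(state: MachineState, rd: int, rs1: int, rs2: int) -> None:
    """Semantics of ADD rd, rs1, rs2: rd <- (rs1 + rs2) mod 2**32."""
    state.regs[rd] = (state.regs[rs1] + state.regs[rs2]) & WORD_MASK


state = MachineState()
state.regs[1] = 0xFFFFFFFF
state.regs[2] = 2
execute_add(state, 0, 1, 2)
print(state.regs[0])  # the sum wraps around to 1 under the 32-bit assumption
```

Note that even this tiny definition has to pin down a choice the prose glosses over: what happens on overflow. A real architecture would also have to define interrupts and I/O, which this sketch leaves out.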
Big A architecture has to define how the 76 00:05:44,032 --> 00:05:49,307 inputs and outputs work. And finally, it has to define the data 77 00:05:49,307 --> 00:05:54,387 types and the sizes of the fundamental, the, the fundamental data words that you 78 00:05:54,387 --> 00:05:58,248 operate on. So, for instance, if you operate on a byte 79 00:05:58,248 --> 00:06:01,902 at a time, four bytes at a time, two bytes at a time. 80 00:06:01,902 --> 00:06:05,763 How big is a byte if you actually have bytes? 81 00:06:05,763 --> 00:06:12,044 So, this just gets into sizes. And then, data types here might mean that 82 00:06:12,044 --> 00:06:17,850 you have other types of fundamental data. So, for instance, the most basic one is 83 00:06:17,850 --> 00:06:23,232 you have just some bits sitting on, on, in a, in a register in your processor. 84 00:06:23,232 --> 00:06:28,996 But, it could be much more complex so you can have, for instance, something like 85 00:06:28,996 --> 00:06:33,732 floating point numbers. Where it's not just a bunch of bits, it's 86 00:06:33,732 --> 00:06:38,195 bits formatted in a particular way, and has very specific meaning. 87 00:06:38,195 --> 00:06:43,971 That's a floating point number that can range over, let's say, most of the, the 88 00:06:43,971 --> 00:06:45,480 real numbers. Okay. 89 00:06:45,480 --> 00:06:51,000 So, in today's lecture, we're going to, step through all these different 90 00:06:51,081 --> 00:06:56,076 characteristics and requirements of building an instruction set architecture. 91 00:06:56,076 --> 00:07:02,002 I wanted to, I will talk about how it's different than microarchitecture or 92 00:07:02,002 --> 00:07:05,004 organization. So, let's take up some examples of 93 00:07:05,004 --> 00:07:10,025 microarchitecture and organization. 
So, what microarchitecture and 94 00:07:10,025 --> 00:07:15,500 organization is really thinking about here is the tradeoffs as you're going to 95 00:07:15,500 --> 00:07:19,093 implement a fixed instruction set architecture. 96 00:07:19,093 --> 00:07:26,047 So, for instance, something like Intel's x86 is an instruction set architecture. 97 00:07:26,047 --> 00:07:30,000 And there's many different microarchitecture implementations. 98 00:07:30,000 --> 00:07:34,034 There's the AMD versions of the chips, and then there's the Intel versions of the 99 00:07:34,034 --> 00:07:37,372 chips, and even inside of, let's say, the Intel versions of the chips. 100 00:07:37,372 --> 00:07:42,120 They have their high performance version for the laptop which looks one way, or, or 101 00:07:42,120 --> 00:07:46,375 high performance version for, let's say, a server or a high end laptop which looks 102 00:07:46,375 --> 00:07:48,518 one way. And then, there's another chip for 103 00:07:48,518 --> 00:07:51,079 tablets. Intel's trying to build chips for tablets these 104 00:07:51,079 --> 00:07:56,035 days and they have their Atom processors. And, internally, they look very different 105 00:07:56,035 --> 00:07:59,062 cuz they have very different speed, energy, cost, tradeoffs. 106 00:07:59,098 --> 00:08:05,864 But, they'll execute the same code, and they all implement the same instruction 107 00:08:05,864 --> 00:08:09,750 set architecture. So, let's look at some examples of things 108 00:08:09,750 --> 00:08:14,010 that you might trade off in a microarchitecture. 109 00:08:14,010 --> 00:08:18,224 So, you might have different pipeline depth, numbers of pipelines. 110 00:08:18,224 --> 00:08:24,459 So, you might have one processor pipeline, or you might have six, like something 111 00:08:24,459 --> 00:08:30,677 like the Core i7's today, cache sizes, how big the chip is, the silicon area, how, 112 00:08:30,677 --> 00:08:34,011 what's your peak power. Execution ordering. 
113 00:08:34,011 --> 00:08:38,059 Well, does the code run in order, or can you execute the code out of order? 114 00:08:38,059 --> 00:08:41,061 That's right. It is possible to take a sequential 115 00:08:41,061 --> 00:08:46,016 program, and actually execute later portions of the program before earlier 116 00:08:46,016 --> 00:08:50,009 portions of the program. That's kind of mind boggling, but it's a 117 00:08:50,009 --> 00:08:54,575 way to go about getting parallelism. And if you keep your ordering correct, 118 00:08:54,575 --> 00:08:58,657 things, things, work out. Bus widths, ALU widths, if you, if you 119 00:08:58,657 --> 00:09:03,523 have, let's say, 64-bit machine, you can actually go and implement that as a bunch 120 00:09:03,523 --> 00:09:08,095 of 1-bit adders, for instance, and people have done things like that in the micro 121 00:09:08,095 --> 00:09:11,185 architecture. And, this allows you to build more 122 00:09:11,185 --> 00:09:15,024 expensive or less expensive versions of the same processor. 123 00:09:16,077 --> 00:09:22,048 So, let's talk about the history of why we came up with these two differentiations 124 00:09:22,048 --> 00:09:25,048 between architecture and microarchitecture. 125 00:09:25,084 --> 00:09:31,346 And, it came about, because software sort of pushed it on us and it ended up 126 00:09:31,346 --> 00:09:40,113 being a nice abstraction layer. So, back in the early '50s, late '40s, you 127 00:09:40,113 --> 00:09:45,997 had software that people mostly programmed either in assembly language, or machine 128 00:09:45,997 --> 00:09:48,720 code language. So, you had to write ones and zeros, or 129 00:09:48,720 --> 00:09:52,508 you had to write assembly code. And, sometime in the, the mid '50s we 130 00:09:52,508 --> 00:09:57,055 started to see libraries show up. 
So, these are sort of, floating point 131 00:09:57,055 --> 00:10:01,588 operations were made easier, we had transcendentals as the sine, cosine 132 00:10:01,588 --> 00:10:05,166 libraries, you had some matrix and equation solvers. 133 00:10:05,166 --> 00:10:10,163 And, you started to see some libraries that people could call, but people were 134 00:10:10,163 --> 00:10:15,070 not necessarily writing code by themselves or writing large bodies of code in 135 00:10:15,070 --> 00:10:18,794 assembly programming because it's, it was pretty painful. 136 00:10:18,794 --> 00:10:24,321 And then, at some point, there was the invention of higher-level languages. 137 00:10:24,321 --> 00:10:30,233 So, a good example of this was Fortran that came out in 1956, and a lot of things 138 00:10:30,233 --> 00:10:34,434 came along with this. We had assemblers, loaders, linkers, 139 00:10:34,434 --> 00:10:40,281 compilers, bunch of other software to track how your software's being used even. 140 00:10:40,281 --> 00:10:47,294 And, because we started to see these higher-level languages, this started to 141 00:10:47,294 --> 00:10:54,065 give some portability to programming. It wasn't that you had to write your 142 00:10:54,065 --> 00:10:58,046 program and have it only mapped to one prog, one processor ever. 143 00:10:58,091 --> 00:11:03,191 And, back in the, the, the '50s, even '60s time frame here, machines required 144 00:11:03,191 --> 00:11:06,989 experienced operators who could write the programs. 
145 00:11:06,989 --> 00:11:12,485 And, you know, you, you got these machines and they had to be sold with a lot of 146 00:11:12,485 --> 00:11:17,946 software along with them so you had to, basically, run all the software that was 147 00:11:17,946 --> 00:11:23,065 given cuz it was, you had to be a, a master programmer or someone who worked 148 00:11:23,065 --> 00:11:28,230 for the company to even, that built the machines to even be able to program these 149 00:11:28,230 --> 00:11:33,398 machines back in, in the day. And, the idea of instruction set 150 00:11:33,398 --> 00:11:38,935 architectures, and these breaking the microarchitecture from the architecture 151 00:11:38,935 --> 00:11:46,035 didn't really exist back then. And, back in the early '60s, IBM had four 152 00:11:46,035 --> 00:11:51,154 different product lines. And, they're all incompatible. 153 00:11:51,154 --> 00:11:55,287 So, you couldn't run code that you ran on one on the other. 154 00:11:55,287 --> 00:12:00,574 So, to give you an example here, the, the IBM 701 was for scientific computing. 155 00:12:00,574 --> 00:12:03,996 The, the 1401 was mostly for business computation. 156 00:12:03,996 --> 00:12:09,860 I think they even had a second one that was sort of for business, but different 157 00:12:09,860 --> 00:12:14,459 types of business computation. And, people sort of, bought into a line. 158 00:12:14,459 --> 00:12:19,164 And then, as you, as the line matured and developed, they had to either rewrite 159 00:12:19,164 --> 00:12:21,801 their code, or they had to stick into one line. 160 00:12:21,801 --> 00:12:26,957 But, IBM had some, had some crazy insights here is that, they didn't want to have to, 161 00:12:26,957 --> 00:12:31,233 when they went to the next generation of processor, they wanted one to propagate 162 00:12:31,233 --> 00:12:34,613 these four lines. They wanted to try to unify the four 163 00:12:34,613 --> 00:12:37,493 lines. 
But, one of the problems was, these 164 00:12:37,493 --> 00:12:42,567 different lines had very different implementations and different cost 165 00:12:42,567 --> 00:12:45,390 points. So, the thing you were building for 166 00:12:45,390 --> 00:12:50,496 scientific computing wasn't necessarily the thing you want to build for business 167 00:12:50,496 --> 00:12:54,041 computing. And, the one that you built for business 168 00:12:54,041 --> 00:12:59,707 computing, let's say, didn't, you wanted to not have it have very good floating 169 00:12:59,707 --> 00:13:03,686 point performance. So, how do, how do they go about solving 170 00:13:03,686 --> 00:13:06,263 this? And their solution was they came up with 171 00:13:06,263 --> 00:13:11,337 something called the IBM 360. And, the IBM 360 is probably the first 172 00:13:11,337 --> 00:13:17,325 true instruction set architecture that was implemented to be an instruction set 173 00:13:17,325 --> 00:13:20,831 architecture. And, the idea here was they wanted to 174 00:13:20,831 --> 00:13:26,718 unify all these product lines into one platform, but then implement different 175 00:13:26,718 --> 00:13:31,072 versions that were specialized for the different market segments. 176 00:13:32,012 --> 00:13:37,096 So, they can build, they could unify a lot of their software system, unify a lot of 177 00:13:37,096 --> 00:13:41,064 what they built, but still build different versions. 178 00:13:41,064 --> 00:13:46,677 So, let's, let's take a look at the IBM 360 Instruction Set Architecture, and then 179 00:13:46,677 --> 00:13:52,048 talk about different microarchitectures that have been built of the IBM 360. 180 00:13:53,015 --> 00:13:58,052 So, the IBM 360 is a general purpose register machine, and we'll talk more 181 00:13:58,052 --> 00:14:04,019 about that later in this lecture. But, to give you an idea, this is what the 182 00:14:04,019 --> 00:14:07,027 programmer saw, or what the software system saw. 
183 00:14:07,027 --> 00:14:12,079 This isn't what was actually built in the hardware, because that would be a 184 00:14:12,079 --> 00:14:17,086 microarchitecture constraint. But, the processor state had sixteen 185 00:14:17,086 --> 00:14:22,267 general purpose 32-bit registers. It had four floating point registers. 186 00:14:22,859 --> 00:14:30,051 It had control, flags if you will, had a, a condition codes and control flags. 187 00:14:30,051 --> 00:14:34,200 And, it was a 24-bit address machine, and at the time that was huge. 188 00:14:34,200 --> 00:14:39,856 So, two to the 24 was a very large number. Nowadays, it's not so large and they've 189 00:14:39,856 --> 00:14:42,982 since expanded that on the IBM 360 successors. 190 00:14:42,982 --> 00:14:48,381 But, they thought it was good for many, many years, and it was good for many, many 191 00:14:48,381 --> 00:14:52,064 years. And they define a bunch of different data 192 00:14:52,064 --> 00:14:55,059 formats. So, there's 8-bit bytes, 16-bit half 193 00:14:55,059 --> 00:15:01,016 words, 32-bit words, 64-bit double words. And these were the fundamental data types 194 00:15:01,016 --> 00:15:06,045 that you can work on, and you can name these different fundamental data types. 195 00:15:06,045 --> 00:15:12,001 And, it was actually the IBM 360 that came up with this idea that bytes should be 196 00:15:12,001 --> 00:15:18,106 8-bits long, and that's lived on to today, cuz before that, we had lots of 197 00:15:18,106 --> 00:15:24,033 different choices. There was binary coded decimal systems 198 00:15:24,033 --> 00:15:29,095 where the, you actually would encode a number between zero and nine and then you 199 00:15:29,095 --> 00:15:34,048 have the, each digits and this is sometimes good for, sort of, spreadsheet 200 00:15:34,048 --> 00:15:39,045 calculations, or business calculations, or if you want to be very precise on your 201 00:15:39,045 --> 00:15:43,080 rounding to the penny. 
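The point about decimal encodings and rounding to the penny can be illustrated with a short Python sketch. Two caveats: `to_packed_bcd` is a hypothetical helper that shows only the basic one-digit-per-4-bits idea and is not the exact packed decimal format of the IBM 360, and Python's standard `decimal` module stands in here for decimal arithmetic in general.

```python
from decimal import Decimal


def to_packed_bcd(n: int) -> bytes:
    """Encode a non-negative integer with one decimal digit per 4-bit nibble.

    Simplified illustration only: no sign nibble, unlike real packed decimal formats.
    """
    digits = str(n)
    if len(digits) % 2:
        digits = "0" + digits  # pad to a whole number of bytes
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])
        for i in range(0, len(digits), 2)
    )


# Binary floating point cannot represent 0.10 or 0.20 exactly,
# so the sum is not exactly 0.30.
binary_sum = 0.10 + 0.20
binary_exact = binary_sum == 0.30

# Decimal arithmetic keeps each decimal digit, so cents add up exactly.
decimal_sum = Decimal("0.10") + Decimal("0.20")
decimal_exact = decimal_sum == Decimal("0.30")

print(binary_exact, decimal_exact)
print(to_packed_bcd(1965).hex())  # each hex digit is one decimal digit
```

The hex dump of the encoded value reads the same as the decimal number, which is the whole appeal of the encoding for business arithmetic.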
And sometimes, bit-based things don't 202 00:15:43,080 --> 00:15:47,077 actually round appropriately or the, do a, or you'll lose pennies off the end. 203 00:15:47,077 --> 00:15:51,631 And, so you have these binary coded decimal systems and, well, in IBM 360, they, they 204 00:15:51,631 --> 00:15:56,594 unified it all and said, well, no, we're going to throw out certain things and make 205 00:15:56,594 --> 00:15:59,867 choices. Now, they, of course, because it's the IBM 206 00:15:59,867 --> 00:16:04,187 360 and they did have business applications, they still supported binary 207 00:16:04,187 --> 00:16:09,998 coded decimal in a, a certain way. And, let's look at the microarchitecture 208 00:16:09,998 --> 00:16:14,700 implementations of this first instruction set architecture. 209 00:16:14,700 --> 00:16:20,105 So, at, and this is in the same time frame, the same generation here. 210 00:16:20,105 --> 00:16:25,587 There was the model 30 and the model 70 and these had very, very different 211 00:16:25,587 --> 00:16:30,647 performance characteristics. So, if we, we look at the machine, let's 212 00:16:30,647 --> 00:16:35,781 start off by looking at the storage. The, the low end model here had between 213 00:16:35,781 --> 00:16:42,001 eight and 64 kilobytes, and the high end model had between 256 and 512 kilobytes. 214 00:16:42,001 --> 00:16:47,007 So, very, very different sizes. And, this is what I'm trying to get across 215 00:16:47,007 --> 00:16:51,955 here is that microarchitecture can actually change quite a bit even though 216 00:16:51,955 --> 00:16:57,698 the architecture supports 64-bit additions, you can actually implement 217 00:16:57,698 --> 00:17:02,038 different size data paths. So, in the low end machine, they had an 218 00:17:02,038 --> 00:17:07,544 8-bit data path, and for ones that use 64-bit operation, it had to do eight, 219 00:17:07,545 --> 00:17:10,582 8-bit operations to make up a 64-bit operation. 
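The narrow data path idea described here can be sketched as follows. This is a software illustration in Python, not a description of how the actual low-end machine was built: the function names are hypothetical, and the sketch simply shows that eight 8-bit additions with a carry passed between them give the same architectural result as one 64-bit addition.

```python
MASK64 = (1 << 64) - 1


def add64_on_8bit_datapath(a: int, b: int):
    """Add two 64-bit values one byte at a time, as a narrow data path would.

    Returns (result, carry_out, steps), where steps counts the 8-bit additions.
    """
    result = 0
    carry = 0
    steps = 0
    for i in range(8):                      # eight byte-wide slices, low byte first
        a_byte = (a >> (8 * i)) & 0xFF
        b_byte = (b >> (8 * i)) & 0xFF
        s = a_byte + b_byte + carry         # one 8-bit add plus carry-in
        result |= (s & 0xFF) << (8 * i)     # keep the low 8 bits of this slice
        carry = s >> 8                      # carry-out feeds the next slice
        steps += 1
    return result, carry, steps


def add64_full(a: int, b: int):
    """Reference model: a single full-width 64-bit add."""
    total = a + b
    return total & MASK64, total >> 64


narrow = add64_on_8bit_datapath(0x00000000FFFFFFFF, 1)
wide = add64_full(0x00000000FFFFFFFF, 1)
print(hex(narrow[0]), narrow[2])
```

Both functions return the same sum and carry-out, so software cannot tell them apart; only the number of steps, and therefore the speed and hardware cost, differs.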
220 00:17:10,582 --> 00:17:15,801 And then, probably, actually even do more than that to handle all the carries 221 00:17:15,801 --> 00:17:20,746 correctly, versus the high-end implementation had a full adder there. 222 00:17:20,746 --> 00:17:26,993 You can actually do a 64-bit add by itself without having to do lots of 223 00:17:26,993 --> 00:17:32,635 micro-sequenced operations. And, oh, yes, with minor modifications, it 224 00:17:32,635 --> 00:17:36,645 lives on today. So, this was designed in the '60s, and 225 00:17:36,645 --> 00:17:40,867 even today we still have System 360 derivative machines. 226 00:17:40,867 --> 00:17:47,247 And the piece of code you ran, or you wrote back in 1965, will still run on 227 00:17:47,247 --> 00:17:52,377 these machines today, which is pretty, pretty amazing, natively. 228 00:17:52,377 --> 00:18:00,248 So, how does this survive on today? So, here's actually, the IBM 360 47 years 229 00:18:00,248 --> 00:18:08,702 later as in the Z11 microprocessor. So, the IBM 360 has since, it renamed to 230 00:18:08,702 --> 00:18:14,899 the IBM 370, and then it has been renamed to the IBM 370EX which was in the '80. 231 00:18:14,899 --> 00:18:18,544 There was never any IBM 380, strangely enough. 232 00:18:18,544 --> 00:18:23,571 And then, later on, they just changed the name to the Z series. 233 00:18:23,571 --> 00:18:29,256 So, have a, a cooler modeling, model numbers here so we had the IBM Z series 234 00:18:29,256 --> 00:18:35,351 processors, and this lives on today. So, going back to that 8-bit processor 235 00:18:35,351 --> 00:18:42,019 which had a one microsecond control store read, which is forever, we now have the 236 00:18:42,019 --> 00:18:47,753 Z11 which is running at 5.2 gigahertz. It has 1.4 billion transistors. 237 00:18:47,753 --> 00:18:55,028 They, they have updated the addressing so it's no longer 24-bit addressing, but it 238 00:18:55,028 --> 00:18:59,037 still supports the original 360 addressing. 
239 00:18:59,037 --> 00:19:08,027 It has four cores, out of order issue, out of order memory system, big caches on, on 240 00:19:08,027 --> 00:19:14,405 chip, 24 megabytes of your L3 cache. And, you can even put multiple of these 241 00:19:14,405 --> 00:19:20,771 together to build a multiprocessor system out of lots and lots of multicores. 242 00:19:20,771 --> 00:19:26,364 And, what I'm trying to get across here is that, if you go forward over time and you 243 00:19:26,364 --> 00:19:29,092 build your instruction set architecture correctly, it can live on. 244 00:19:29,092 --> 00:19:33,405 And you have many different microarchitecture implementations and 245 00:19:33,405 --> 00:19:41,245 still leverage the same software. And, a few, few more examples just to, to 246 00:19:41,245 --> 00:19:45,082 reinforce this a little bit more. Let's take a look at an example of 247 00:19:45,082 --> 00:19:49,079 something where you have the same architecture but different 248 00:19:49,079 --> 00:19:54,036 microarchitectures. So, here we have the AMD Phenom X4, and 249 00:19:54,036 --> 00:19:57,517 here we have the Atom, Intel Atom processor. 250 00:19:57,517 --> 00:20:03,482 The first Intel Atom processor. And, what you'll notice, actually, is that 251 00:20:03,482 --> 00:20:07,912 they have the exact same instruction set architecture. 252 00:20:07,912 --> 00:20:14,085 They both run x86 code. And, the design implementations, this is, 253 00:20:14,085 --> 00:20:18,557 just to point out here, these are the same time frames. 254 00:20:18,557 --> 00:20:23,063 So, these are modern, modern, roughly, modern day processors. 255 00:20:23,063 --> 00:20:29,083 This one has four cores, 125 watts. Here, we have, single core two watts. 256 00:20:29,083 --> 00:20:35,059 So, there's design tradeoffs. 
So, you're going to want to build 257 00:20:35,059 --> 00:20:42,083 different processors in the same design technology, we'll say, but with very 258 00:20:42,083 --> 00:20:46,223 different cost, power, performance tradeoffs. 259 00:20:46,223 --> 00:20:52,666 This one can decode three instructions. This one can decode two instructions so 260 00:20:52,666 --> 00:20:55,803 it's a different micro architecture difference. 261 00:20:55,803 --> 00:21:00,751 This one has a 64 kilobyte L1 I-cache. This one has a 32 kilobyte L1 I-cache. 262 00:21:00,751 --> 00:21:06,436 Very different cache sizes, even though they're implementing the same architecture, 263 00:21:06,436 --> 00:21:10,951 or big A architecture. Strangely enough, they have the same L2 264 00:21:10,951 --> 00:21:15,653 size, you know, things happen. This one's out of order versus in order, 265 00:21:15,653 --> 00:21:24,263 and clock speeds are very different. And, I want to contrast this with 266 00:21:24,263 --> 00:21:30,888 different architecture, or different big A architecture, and different micro 267 00:21:30,888 --> 00:21:34,809 architecture. So, if we think about some different 268 00:21:34,809 --> 00:21:40,351 examples of instruction set architectures, there's x86, there's PowerPC, there's IBM 269 00:21:40,351 --> 00:21:45,536 360, there's Alpha, there's ARM. You've probably heard all these different 270 00:21:45,536 --> 00:21:49,068 names, and these are different instruction set architectures. 271 00:21:49,068 --> 00:21:54,093 So, you can't run the same software on those two different instruction set 272 00:21:54,093 --> 00:21:58,020 architectures. So, here we have an example of two 273 00:21:58,020 --> 00:22:03,059 different instruction set architectures with two different microarchitectures. 274 00:22:03,059 --> 00:22:08,998 So, we have the Phenom X4 here, versus the IBM Power seven. 
275 00:22:09,001 --> 00:22:14,176 And, we already talked about the, the X4 here, but the Power seven has the Power 276 00:22:14,176 --> 00:22:18,863 instruction set, which is different than the x86 instruction set. 277 00:22:18,863 --> 00:22:24,621 So, you can't run one piece of code that's compiled for this over here, and vice 278 00:22:24,621 --> 00:22:28,866 versa. And, the microarchitectures are different. 279 00:22:28,866 --> 00:22:33,557 So, here, we have eight core, 200 watts, can decode six instructions per cycle. 280 00:22:33,557 --> 00:22:39,325 Wow, this is a, a pretty beefy processor. It's also out of order and has the same 281 00:22:39,325 --> 00:22:43,544 clock frequency. Something that I, that can also happen is 282 00:22:43,544 --> 00:22:48,757 you can end up with architectures where you have different instruction set 283 00:22:48,757 --> 00:22:52,821 architecture, or different big A architecture, but almost the same 284 00:22:52,821 --> 00:22:56,481 microarchitecture. And, this, this does, this does happen. 285 00:22:56,481 --> 00:23:01,779 So, you end up with, let's say, two processors that are both three wide issue, 286 00:23:01,779 --> 00:23:07,044 same cache sizes, but, let's say, one of them implements PowerPC and the other one 287 00:23:07,044 --> 00:23:10,572 implements x86. And things, things like that do happen. 288 00:23:10,572 --> 00:23:15,364 That's more of a coincidence, but I'm trying to get across the idea that many 289 00:23:15,364 --> 00:23:19,933 times the, that the microarchitectures can be the same and those are more tradeoff 290 00:23:19,933 --> 00:23:31,889 considerations versus the instruction set architecture which is more of a software 291 00:23:31,889 --> 00:23:39,075 programming design constraint.
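The three processors compared in this part of the lecture can be summarized in a small Python sketch. The core counts, wattages, decode widths, and in-order versus out-of-order labels are the ones quoted in the lecture; the `Processor` record and the `runs_same_binaries` helper are hypothetical names introduced only for illustration. The rule it encodes is the lecture's: whether two chips can run the same compiled code depends on the instruction set architecture field alone, not on any of the microarchitecture fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Processor:
    name: str
    isa: str              # big A architecture: what software sees
    cores: int            # everything below is microarchitecture
    watts: int
    decode_width: int     # instructions decoded per cycle
    out_of_order: bool


# Figures as quoted in the lecture.
phenom = Processor("AMD Phenom X4", "x86", 4, 125, 3, True)
atom = Processor("Intel Atom", "x86", 1, 2, 2, False)
power7 = Processor("IBM Power7", "Power", 8, 200, 6, True)


def runs_same_binaries(a: Processor, b: Processor) -> bool:
    """Two processors can run the same compiled code only if they share an ISA."""
    return a.isa == b.isa


print(runs_same_binaries(phenom, atom))    # same ISA, very different microarchitecture
print(runs_same_binaries(phenom, power7))  # different ISA
```

The Phenom and the Atom differ in every microarchitecture field yet share binaries, while the Phenom and the Power7 are both out-of-order multicore designs that cannot.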