Okay, so I want to briefly give a case study here of one of the more interesting modern-day VLIW architectures, probably the most famous, and possibly also the most infamous, VLIW processor out there. This is the Intel Itanium, also known as IA-64, or what's known as an EPIC processor: Explicitly Parallel Instruction Computing architecture. A lot of this work was actually done in collaboration between Intel and HP. HP uses these a lot in their big servers, their sort of big, not quite mainframes, but big, heavy, big-iron computers. And Intel was trying to use this to effectively kill all of the other workstation vendors. This was going to be their 64-bit solution to computing. So it's a modern, non-classical VLIW, and this was going to be Intel's chosen ISA. They were going to deprecate x86 and choose IA-64 as the 64-bit ISA. And as we now know, going a few years forward after the creation of all this stuff, that didn't really happen. Intel went and did this; it built a bunch of processors with this instruction set. You can still buy processors with this instruction set, but it never got as good of an acceptance as the competitor. The competitor at the time was called AMD64, which is a 64-bit extension to what people already had.
And that's what people ended up wanting: just a 64-bit extension to what we already had, versus, you know, something totally different. Okay, so a couple of features here. It's an object-code-compatible VLIW, so it's not quite a VLIW in the classical sense. Object code compatible means that different generations, different microarchitectures of this VLIW, can run the same instruction code, the same binaries, without needing to recompile. And how they did this, effectively, as I alluded to before, is they had the ability to have parallelism straddle across instruction bundles. And they had this notion of groups, which we'll talk about in a second. So, the first implementation of this, Merced, was the first Intel Itanium implementation. It was kind of like the 8086 of IA-64. And Merced, as you'll realize if you look at Intel code names, is named after a river. Intel likes to name their things after either rivers or places. I think this has something to do with the fact that you can't trademark a place name, so they get around that and make sure they don't have any trademark issues by choosing place names for all their code names. One of the big problems here: it was supposed to ship in 1997. First customer shipment was not until 2001. That's a four-year miss.
And superscalar was another thing that had sort of caught up with it by that time. It was supposed to be faster and better than everything else, and the first one was not very good. It had low clock rates and was not as high performance as it was supposed to be. And the x86 side of Intel's business line actually had almost the same performance as the first Itanium, and then very quickly surpassed it. So their high-end processor wasn't actually high end. A couple of other things here: McKinley was the second implementation, shipped pretty quickly after that. This was a much better implementation, but, you know, it's still hard to do. But they're still building these things. So, in 2011 at ISSCC, Intel introduced the Poulson processor. A big machine here: eight cores in 32 nanometer, lots and lots of cache. We'll look at that; 32 megabytes of shared L3 cache, a big processor, 544 square millimeters in 32 nanometer. At the time this came out, this was the biggest processor ever built, with the most transistors, over three billion transistors, or at least the biggest commercial one. Intel might have had a research prototype, I think, that might have had more transistors than this.
I think their multicore processor, what they call the SCC, their Single-chip Cloud Computer, might have had more, but I should know the transistor count. But from a commercial processor perspective, it's a huge chip. But they are selling into extremely expensive sockets; these sell at a premium and go into big mainframes. That was not what this was originally destined for. It was destined for both big mainframes and workstations. But now, standing here in 2012, this is not used in lots of other places except for sort of bigger hardware, mainframe sorts of things. So a few of the interesting things here: the cores are multi-threaded, and you can fetch six instructions per cycle and execute up to twelve instructions per cycle, per core, and there are eight cores. So this is a beast of a machine, a very high performance computer. Okay, so let's dive into some of the details here of Itanium. Itanium has a 128-bit instruction bundle, and inside of there you can fit three operations, and then there are some bits called template bits, which sort of say what is in the instruction bundle. So it's not actually a fixed-format bundle; the instruction boundaries can move around a little bit.
And they did that so you can sort of mix in something like an instruction with an immediate alongside an instruction which doesn't have an immediate, and get more space in the bundle for the immediate bits, or a branch offset or something like that. These template bits also describe how a particular bundle relates to the other bundles around it. So sometimes these are called begin and end bits, or start and stop bits. They say the number of instructions which can execute explicitly in parallel. And the machine doesn't necessarily have to execute these in parallel. So for instance, if you say twenty instructions, or twenty operations, can execute in parallel, but your machine is only two wide, because they built a two-wide implementation of Itanium or IA-64, you're just going to execute two wide for ten cycles, or something like that. But what's really cool here is the compiler is able, just like in all the other VLIWs, to express the parallelism to the machine explicitly. Some interesting things about the registers. Because this is a VLIW processor, and because you're going to have to do code scheduling like what we saw in the last class, that increases the general-purpose register pressure. You don't have a register renamer, so you can't go and use different names for things.
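The stop-bit idea above can be sketched in a few lines. This is a toy model, not real Itanium semantics: the compiler marks boundaries between groups of independent operations, and an implementation of any issue width just executes each group over as many cycles as its width requires, without re-analyzing dependences.

```python
def cycles_to_issue(groups, issue_width):
    """groups: list of group sizes, i.e. counts of independent ops
    between stop bits. Returns total cycles on a machine of the
    given issue width."""
    total = 0
    for ops_in_group in groups:
        # Ops within a group are declared independent, so they can
        # issue together, but only issue_width at a time.
        total += -(-ops_in_group // issue_width)  # ceiling division
    return total

# A group of 20 independent ops on a 2-wide machine takes 10 cycles,
# just as described above; a 6-wide machine needs only 4.
print(cycles_to_issue([20], 2))   # 10
print(cycles_to_issue([20], 6))   # 4
```

The point is that the same binary carries the same parallelism annotations to narrow and wide implementations alike, which is what makes the object-code compatibility work.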
And the hardware's not going to rename things for you. So instead, the compiler and the software are going to have to do the renaming. So they had 128 general-purpose registers and another 128 floating-point registers. And they also have these predicate registers. It's not quite full predication, but it's pretty close to full predication. So you can have bits that say whether later instructions are going to execute or not, and you have to compute that into a little register file. So they had a predicate register file that you have to bypass. That's sort of interesting to see. And then they had a really interesting feature here, which is called a rotating register file. Let's talk about what a rotating register file is. The problem this is trying to solve is in a code sequence like we saw in the last lecture. If you have a very-long-instruction-word scheduled piece of code, and you want to get good performance, you're going to have to unroll the loop, and then you're going to have to software pipeline the loop. But when you do this, it's going to increase your register pressure, or increase how many register names you need to use.
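The predicate-register idea can be sketched as a toy interpreter. This is illustrative only, not the real IA-64 encoding: a compare writes a pair of predicate bits, and each later operation carries a guard predicate; if the guard is false, the operation is squashed into a no-op.

```python
# Toy model of (almost) full predication, assuming a small predicate
# register file where p0 would be hardwired true on the real machine.
pred = [True] * 64
regs = {"r1": 5, "r2": 9, "r3": 0}

def cmp_lt(p_true, p_false, a, b):
    # Roughly: cmp.lt p1, p2 = r1, r2 sets p1 = (a < b), p2 = !(a < b)
    pred[p_true] = regs[a] < regs[b]
    pred[p_false] = not pred[p_true]

def add(guard, dst, a, b):
    # Roughly: (p) add dst = a, b only takes effect when the guard is set
    if pred[guard]:
        regs[dst] = regs[a] + regs[b]

cmp_lt(1, 2, "r1", "r2")    # r1 < r2, so p1 = True, p2 = False
add(1, "r3", "r1", "r2")    # executes: r3 = 5 + 9
add(2, "r3", "r2", "r2")    # squashed: guard p2 is false
print(regs["r3"])           # 14
```

In hardware the interesting part is exactly what the lecture mentions: those predicate bits are produced late and consumed immediately, so the predicate register file needs its own bypass network.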
And, as we saw, you're going to have to add extra special code in the prologue and the epilogue, which are different from the main loop body. So how do you solve this in one fell swoop? Well, you add a subset of your register space which will sort of statically rename itself every loop iteration. So each iteration slightly changes the naming of the registers. And what this looks like is, if you go to access, let's say, register R1, there's an architecturally visible register called the rotating register base, or RRB here, which has a value that gets added to the register number. And it's modular arithmetic, so it wraps around at the end, and that points to different locations in the physical register file. This is pretty cool. So what we're going to do is, every single time we come to a new loop iteration, we're going to change the RRB, and it's going to point to a different set of registers. And we can effectively software pipeline just by using this one feature. So here we have the same code sequence we had from last lecture, the previous code example. And if we recall, when we unrolled all of this, what we ended up with was a load, an add, a store. We'll talk about this in a second.
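The RRB lookup itself is just modular address arithmetic over the rotating portion of the register file. A minimal sketch, with an assumed rotating-region size chosen for illustration (the real machine's region size and the direction the RRB moves differ in detail):

```python
ROTATING_SIZE = 96   # assumed size of the rotating region, for illustration

def physical_reg(arch_reg, rrb):
    # The architectural register number is offset by the rotating
    # register base, modulo the rotating region, so the same name
    # lands on a different physical slot each iteration.
    return (arch_reg + rrb) % ROTATING_SIZE

# Accessing "R1" on three successive iterations, bumping RRB each time:
for rrb in (0, 1, 2):
    print(physical_reg(1, rrb))   # 1, then 2, then 3
```

So a single architectural name like R1 silently walks through the physical file as the loop runs, which is exactly the static self-renaming described above.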
This was kind of the key thing that we were trying to execute, and if we have to unroll this, we just had to unroll the code and then look at the dependencies. So let's look at the dependencies here. Well, the dependency we're going to have is, this load writes F1, the floating-point register F1 here. And we know that this actually gets read; let's say the latency of this is one, two, three cycles, and it doesn't get read until here. Likewise, this add here computes its result, and the add, let's say, is a floating-point add, so it has some long latency, and down here is when it's read by the store. So on something like Itanium, with a rotating register file, we don't actually generate all this code. Instead, we generate one instruction, which is going to take care of our prologue, our epilogue, and the main loop. And what we're going to do is encode the distance in register numbers between these two values here. So what this means is, if this writes F1, and one, two, three loop iterations in the future something wants to read that value, we encode that here with a register number that is that number off. And then likewise here. So this would be F1 to F4, because it's off by three. And here, this writes F5.
And we know this one is to be read one, two, three, four iterations later, so we encode it with a register number that's forward into the future. And now we're going to talk about this instruction here. What this is going to do is change the rotating register base number, the RRB, and it's going to bump it by one. So we can basically just keep branching to itself here, and each time we do, all the registers are going to change names. So by the time this is ready, or by the time the load is ready here, these other values will have sort of caught up with it, where the physical register that they're actually going to look at will now point to the correct location. So we can effectively encode into one instruction here all of this, including the prologue and the epilogue, using this rotating register file. Okay, so last slide of today. Why do I think Itanium, and I think we can pretty confidently say this, failed? I actually don't think it was a lot of the ideas. I think a lot of it had to do with the implementation. So, first off, if you tie the hands of the microarchitect, they're going to scream. So, IA-64 added a lot of architectural, big-A architecture, ISA-level features in order to get speculative parallelism.
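The cross-iteration renaming above can be checked with a few lines of arithmetic. This is an illustrative model, not real IA-64 semantics; I'm assuming the RRB moves by one in a fixed direction per iteration, which is one way to model the loop-branch bumping it. The producer writes F1 in some iteration; the consumer, three iterations later, names F4 (off by three); with the RRB having moved three steps in between, both names resolve to the same physical register.

```python
SIZE = 96   # assumed rotating-region size, for illustration

def phys(arch_reg, rrb):
    return (arch_reg + rrb) % SIZE

write_iter = 0
read_iter = 3
# Model the loop branch moving RRB one step per iteration.
rrb_at_write = -write_iter % SIZE
rrb_at_read = -read_iter % SIZE

# Producer names F1; the consumer three iterations later names F4.
same_slot = phys(1, rrb_at_write) == phys(4, rrb_at_read)
print(same_slot)   # True
```

That equality is the whole trick: the compiler encodes only the iteration distance as a register-number offset, and the rotation makes the names line up, so no separately generated prologue and epilogue code is needed.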
And a lot of this stuff was implemented and talked about, but never actually built into real processors. So people didn't go through the effort, until basically the first Itanium, to try to implement some of these things, and they didn't all mix well together. And they added a lot of state, and they added a lot of complexity to the processor. So we have the ALAT, full predication or almost full predication, and rotating register files, to name a few. There's a really complex bundling sequence; it's probably one of the hardest-to-decode instruction sets in the world. Very, very challenging, and this was a big challenge, and it ties the hands of the microarchitect, and the microarchitect couldn't make a decision. So a good example of this, a funny story here, is that after the DEC Alpha employees, Digital Equipment Corporation employees, left DEC, they were sort of subsumed into a part of Intel. That same team that used to build out-of-order Alpha processors went on to go build sort of the next generation of an Itanium processor. And they went to look at the Itanium processor and said, wow, this is really complicated. It struck them as much more complicated than Alpha.
And then they said, oh, well, we could probably do better if we just built it out-of-order superscalar: took apart all of the instructions, took apart all of the dependencies, poured that data into what was effectively an Alpha out-of-order superscalar core, and then executed it. And what was funny, if you look at this, is you can sit there and just bang your head, because you did all of this work and added all of this architectural state to allow the compiler to do all this work, and then they just wanted to undo it all. They would do this because they wanted performance, but then they wanted to undo all of that state and all of the hard work the compiler did, and just redo it all dynamically, because they thought they could get better performance. They probably could have; it probably was a good idea. But what was kind of funny there is you built an instruction set with one microarchitecture in mind, basically an in-order architecture, and then, all of a sudden, people are thinking about building out-of-order variants of it. And it sort of throws everything you had before away, or all these notions sort of went away. So it's just a funny story that, you know, people tried to build out-of-order versions.
They ultimately did not end up doing that. That same team decided it was basically too hard, mostly due to the predicate registers, and sort of how to bypass predicate registers in an out-of-order machine. And I think they ultimately ended up not doing that, or they definitely ended up not doing that. And that's just what's known now as, the Wachusett, or excuse me, not the Wachusett, it's known as the Tukwila processor from Intel. Now, there were a couple of other problems here. The first implementation had a very low clock rate, so your first one out the gate was just not very good, and this hurt. And it's hard to build these things. They're wide; it's the speed demons versus the brainiacs, this question of do you want to go wide, or do you want to go long and narrow. Long and narrow was doing okay at the time. There was big code-size bloat. And it fundamentally did not solve all the dynamic scheduling problems that an out-of-order superscalar could get at. So, for instance, changing your instruction schedule based on whether a load hit or missed in the cache, it couldn't do. There was big compiler complexity; you need profiling, and not everyone wanted to profile.
There's also just not that much static instruction-level parallelism in all programs, so the compiler couldn't necessarily find all the parallelism, or it wasn't there statically, and if you're going for a compiler-only approach, you need to be able to do that. And then, this is what really killed it: people did go build these more complex out-of-order superscalars. At the time, there was this big discussion: can we build more complex out-of-order superscalars? And people said, no, those are too hard to build. They take too much, they cost too much, we don't know how to solve all these problems. So instead, we'll try to build something simpler and push a lot of complexity into the compiler. Well, there was money behind this question. So people went and did build these complex out-of-order superscalars. And that's basically what we're still using today in our desktop processors; we have out-of-order superscalars today. And then finally, the last big one here: AMD64 happened. What is AMD64? Well, it's a 64-bit extension to x86; AMD originally did this. Intel, after sort of dragging their feet for a couple of years on this, finally decided, oh, we're going to use that, because people wanted this.
People wanted code compatibility with the move to 64 bits, both wider arithmetic operations and wider addressing, so larger amounts of memory. And 64 bits is a lot of memory. So AMD originally came up with this. It's now known as, I believe, EM64T or Intel 64, not to be confused with IA-64; that's what Intel now calls these 64-bit extensions to x86, and now Intel is building those processors too. So everyone has jumped on that, and Intel has kind of de-emphasized Itanium now, the Itanium instruction set, and instead we are basically sticking with IA-32, the 32-bit x86, with the 64-bit extensions, and that's what has taken over the workstation market. And what's kind of funny here is, this processor was really designed to kill or unify all the workstation vendors together under one processor that was going to beat them all. And it did achieve its goal to some extent, because as this processor was coming around, companies either went out of business or they jumped on the IA-64 bandwagon and decided they were going to take that on. But what replaced all the different little variants of processors that were in workstations? So SPARC, PA-RISC for HP, SGI's sort of MIPS processors, did I already say SPARC?
All these sort of different things, and Power by IBM. Power is still around, but a lot of the other ones died through attrition, or moved on, or were supposed to move on, to IA-64. But IA-64 did not end up winning this. Instead we replaced them with 64-bit x86 processors. So it sort of did its job; it killed the workstation processors, but ended up replacing them not with itself but with something else. Anyway, we're going to stop here for today, and we'll talk more next