So, course structure. There are going to be recommended readings. There are going to be in-video, or in-lecture, questions that will pop up. There are going to be several problem sets during the term, and these are going to be very useful for review and exam preparation. So I'll give you a hint right now: if you do the problem sets, if you actually master the problem sets, the exams are going to be relatively easy after that. We'll probably use peer evaluation for grading the problem sets, because a lot of them are more open-ended problems. And we are going to have a midterm and a final exam.

One other thing I wanted to point out is that collaboration in this class is encouraged, but I want everyone to do their own problem sets, midterm, and final exam. So you can discuss the overall generalities of the ideas and the concepts, but I don't want people discussing the actual exam questions in particular. For instance, if you have some caching question and you want to understand how caches really work, discuss the concepts and collaborate on that, but don't discuss and collaborate on the actual problem itself on the respective problem sets, midterms, and final exams.

Okay, so let's talk about the content of this course. We have a very high-level motivation, and now we're going to talk about what's inside of this course. I'm going to start off by contrasting it with what you should have already learned. In a computer organization class, something like ELE 375 at Princeton, you will have learned how to build a basic processor, something like we see here. This is actually the RISC-I processor from Berkeley. Depending on who you ask, either the RISC-I or the first MIPS chip was the first academic RISC; the IBM 801 probably used a lot of those ideas before then, but didn't call it RISC. You learned how to design something that had about 50,000 transistors. This entire design here is a two-stage pipelined processor. The things you should have learned are basic cache ideas, pipelining (how you pipeline a processor), a little bit about memory systems, and roughly how digital logic works.
Then in this class, to contrast, instead of learning how to build a very simplistic processor, we're going to learn how to build cutting-edge, modern-day microprocessors. That's right, we're going to learn how to build things like this, or at least design things like this. This is a Core i7 from Intel. I guess this is an original Core i7; we're now in the third generation of Core i7s, standing here in 2012, so this is pretty recent. And to give you an idea, in contrast to that previous picture, which was 50,000 transistors, this design is about 700 million transistors. So the complexity has gone up a lot. That other processor, or the processors that you learned about in your computer organization class, would be a tiny little box up here, and would have performance roughly equivalent to the size of that tiny box relative to this big processor. So instead of just building little tiny processors, or toy processors, we're going to learn how to build big, high-performance processors.

Before I go down this list, I want to talk briefly about the course content of ELE 475 and the two main techniques to make processors go fast. So how do we go about making processors go fast? Because people like their computing systems to run fast. Well, one is to exploit parallelism. We're going to figure out how to exploit lots of concurrent transistors, or concurrent parallelism in your program, and as you add more transistors or more parallelism, hopefully your computing system goes faster. There are different techniques for going after parallelism, and they're not all explicit parallelism; a lot of them are implicit parallelism. For instance, instruction-level parallelism is a completely implicit concept: the programmer doesn't have to do anything. And then the other main technique we can think about is just to do less work. If you look at, say, an assembly line of someone building cars, you can either pipeline your assembly system and try to get pipeline parallelism, or you can try to have multiple people building different cars at the same time. This all falls in the parallelism category.
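To make that implicit, instruction-level kind of parallelism concrete, here is a minimal sketch (not from the lecture; the function and variable names are invented for illustration). The three multiplies have no data dependences on one another, so a superscalar processor can issue them in the same cycle without any help from the programmer, while the final add has to wait for its inputs.

```c
/* Hypothetical example of implicit instruction-level parallelism:
   ordinary sequential C code in which the hardware can find overlap. */
int ilp_example(int a, int b, int c, int d, int e, int f) {
    int x = a * b;      /* independent of y and z                */
    int y = c * d;      /* independent of x and z                */
    int z = e * f;      /* independent of x and y                */
    return x + y + z;   /* must wait for x, y, and z to complete */
}
```

The source is plain sequential code; discovering and exploiting this overlap is entirely the job of the hardware (or the compiler, in the VLIW case discussed later).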
There's something else you can do if you want to make a car faster: you just take out steps, or you take out components. So you do less work. And one way to do less work is to have fancier software systems. We can have better compilers and runtime systems, and a lot of the time they can remove work. This is like the optimization passes in your compiler: if you turn on -O3, the optimization flag for GCC, it's going to try to remove instructions from your program which are either redundant or not doing any useful work. Another great example of this, which people don't really think about as doing less work, but actually is, is something like a cache in your microprocessor. A cache puts memory closer to the processor than main memory. This is equivalent to an assembly system, a production line of cars, where for every part you had to walk down the street three blocks, get the part, and bring it back. Well, that's pretty slow; it's a lot of work for each part that you need to go fetch. With a cache, you can actually put the data very close. The similar idea in car assembly is to put a bin, if you will, of all the parts you need to build the car right next to you and just grab out of that bin. You're going to do less work; you'll do less walking. It's a similar sort of thing with caches.
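As a hedged sketch of that "bin of parts" idea in code (not from the lecture; the matrix size and function names are assumptions), both functions below compute the same sum, but the row-major version walks memory sequentially so most accesses hit in the cache, while the column-major version strides through memory and keeps going back to main memory, doing far more work per element.

```c
#include <stddef.h>

#define N 1024   /* assumed matrix dimension, chosen arbitrarily */

/* Cache-friendly: consecutive addresses, like grabbing parts from a bin. */
long sum_row_major(int a[N][N]) {
    long s = 0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += a[i][j];
    return s;
}

/* Cache-hostile: each access jumps N ints ahead, like walking three
   blocks for every part, so many accesses miss and go to main memory. */
long sum_col_major(int a[N][N]) {
    long s = 0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += a[i][j];
    return s;
}
```

Same result, same number of additions; the only difference is how much walking to memory the processor has to do.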
So, these are the two primary techniques that we're going to apply. Now let's dive into the actual technical content of what we're going to learn in this Computer Architecture class, and we'll categorize it as either doing less work or parallelism. The first thing we're going to start off talking about in this class is instruction-level parallelism. We're going to look at superscalar processors, which can execute multiple instructions at the same time, and do it implicitly from sequential code. We're also going to study very long instruction word processors, or what are called VLIW processors. We're going to hint a little bit at pipeline parallelism and look at how to build longish pipelined processors. We'll talk about advanced memory and cache systems. Now, this has no "parallelism" in the title; what this is going to be is looking at doing less work. We're going to look at how you build memory systems that either bring the data closer or have higher bandwidth, and at a lot of the implementation issues in building these advanced memory systems. Then, as the term goes on, we're going to be talking about data-level parallelism. This is a more explicit level of parallelism; these are things like vector computers and graphics processing units, or general-purpose graphics processing units, GPGPUs. And at the end of the course, we're going to talk about explicit threaded parallelism. We'll be talking about multithreading, how you build multiprocessor systems (so multiple-chip multiprocessor systems, multicore and manycore systems), and how you interconnect all these different processors. Roughly, the first third of the course is going to talk about instruction-level parallelism. There's going to be a middle third, which is going to talk about caches and a little about data-level parallelism, and then the last third is going to talk about more threaded levels of parallelism. But that's a very coarse cut of this