So, course structure. There are going to be recommended readings. There are going to be in-video or in-lecture questions that pop up. There are going to be several problem sets during the term, and these are going to be very useful for exam preparation. I'll give you guys a hint right now: if you do the problem sets, if you actually master the problem sets, the exams are going to be relatively easy after that. We'll probably use peer evaluation for grading the problem sets, because a lot of them are more open-ended problems. And we're going to have a midterm and a final exam. One other thing I wanted to point out is that collaboration in this class is encouraged, but I want everyone to do their own problem sets, midterms, and final exams. So, you know, you can discuss the overall generalities of the ideas and the concepts going on, but I don't want people discussing the actual exam questions in particular. So, for instance, if you have some caching question and you want to understand how caches really work, you know, discuss the concepts and collaborate on that, but don't discuss and collaborate on the actual problem itself, on the respective problem sets, midterms, and final exams.

Okay, so let's talk about the content of this course. We've had a very high-level motivation, and now we're going to talk about what's inside of this course. And I'm going to start off by contrasting it with what you should have already learned. So, in a computer organization class, something like ELE 375 at Princeton, you're going to have learned how to build a basic processor, something like we see here. This is actually the RISC I processor from Berkeley. Depending on who you ask, either the RISC I or the first MIPS chip was sort of the first academic RISC; the IBM 801 probably used a lot of those ideas before then but didn't call it RISC. But, you know, you learned how to design stuff that had about 50,000 transistors. This entire design here is a two-stage pipelined processor. The things you should have learned are basic cache ideas, pipelining, so how you pipeline a processor, a little bit about memory systems, and you're supposed to know sort of how digital logic works.

And then, in this class, to contrast, instead of learning how to build a very simplistic processor, we're going to learn how to build cutting-edge, modern-day microprocessors. That's right, we're going to learn how to build things like this, or at least design things like this. So, this is a Core i7 from Intel. I guess this is an original Core i7; we're now in the third generation of Core i7s, standing here in 2012, so this is pretty recent. And to give you an idea, to contrast with that previous picture, which was 50,000 transistors, this design is about 700 million transistors. So, the complexity has gone up a lot. The processor you learned about in your computer organization class is this tiny little box up here, and it has performance that's sort of proportional to the size of that little tiny box relative to this big processor. So, instead of just building little tiny processors, or toy processors, we're going to learn how to build big, high-performance processors.
So, before I go down this list, I want to talk briefly about the course content of ELE 475 and the two main techniques to make processors go fast. So, how do we go about making processors go fast? Because people like their computing systems to run fast. Well, one is to exploit parallelism. So, we're going to figure out how to exploit lots of concurrent transistors, or concurrent parallelism in your program, and as you add more transistors or more parallelism, hopefully it will make your computing system go faster. And there are different techniques for going after parallelism, and they're not all explicit parallelism. A lot of them are implicit parallelism. So, for instance, instruction-level parallelism is a completely implicit concept: the programmer doesn't have to do anything.

And then, the other main technique we can think about is just to do less work. So, if you look at, let's say, an assembly line of someone building cars, well, you can either pipeline, and try to get pipeline parallelism in your assembly system, or you can try to have multiple people building different cars at the same time. That all falls in the parallelism category. There's something else you can do if you want to make a car faster: you just take out steps, or you take out components. So, you do less work. And one way to do less work is to have fancier software systems. So, we can have better compilers and runtime systems, and a lot of times they can remove work. This is like the optimization pass in your compiler: if you turn on -O3, the optimization flag for GCC, it's going to try to remove instructions from your program which are either redundant or not doing any useful work. There's a small sketch of this after this overview.

Another great example of this, which people don't really think about as doing less work, but actually is, is something like a cache in your microprocessor. A cache puts memory closer to the processor than main memory. Well, this is equivalent to, if you had a production line of cars, and let's say for every part you needed, you had to walk down the street three blocks, get the part, and bring it back. Well, that's pretty slow. It's doing a lot of work for each part that you need to go fetch. But with a cache, you can actually put the data very close. The similar idea in car assembly is that you put a bin, if you will, of all the parts you need to build the car right next to you, and then just grab out of that bin. You're going to do less work, you'll do less walking. Similar sorts of things happen with caches. So, these are the two primary techniques that we're going to apply.

So now, let's dive into the actual technical content of what we're going to learn in this Computer Architecture class. And we'll categorize the topics as either doing less work or parallelism. So, the first thing we're going to start off talking about in this class is instruction-level parallelism. We're going to look at superscalar processors, which can execute multiple instructions at the same time, and do it implicitly from sequential code. And we're also going to study very long instruction word processors, or what are called VLIW processors. We're going to hint a little bit at pipeline parallelism and look at how to build longer, more deeply pipelined processors.
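To make the "do less work" idea concrete, here is a minimal, hypothetical sketch; the function and its arguments are made up for illustration. The product `a * b` is loop-invariant, so an optimizing compiler should hoist it out of the loop, computing it once instead of once per iteration:

```c
/* A minimal, hypothetical sketch of the compiler "doing less work".
   scale_all() and its arguments are invented for illustration. */
#include <stddef.h>

void scale_all(double *out, const double *in, size_t n, double a, double b)
{
    for (size_t i = 0; i < n; i++) {
        /* (a * b) does not depend on i, so an optimizing compiler
           (e.g. gcc -O3) should hoist it out of the loop, computing
           it once instead of n times -- the same answer, less work. */
        out[i] = in[i] * (a * b);
    }
}
```

You can see the effect for yourself by comparing the assembly from `gcc -O0 -S` against `gcc -O3 -S`: the optimized version should perform the `a * b` multiply once, before the loop, rather than on every iteration.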
We'll talk about advanced memory and cache systems. So, there's no "parallelism" in the title here; what this is going to be is looking at doing less work. And we're going to look at how you build memory systems that either bring the data closer or have higher bandwidth, and at a lot of the implementation issues in building these advanced memory systems. Then, as the term goes on, we're going to be talking about data-level parallelism. So, this is a more explicit level of parallelism. These are things like vector computers and graphics processing units, or general-purpose graphics processing units, GPGPUs. And at the end of the course, we're going to talk about explicit threaded parallelism. We'll be talking about multithreading, how you build multiprocessor systems, so multiple-chip multiprocessor systems, multicore and manycore systems, and how you interconnect all these different processors. A small sketch contrasting data-level and thread-level parallelism follows below.

Roughly, the first third of the course is going to talk about instruction-level parallelism. There's going to be sort of a middle third, which is going to talk about caches and a little about data-level parallelism, and then the last third is going to talk about the more threaded levels of parallelism. But that's a very coarse cut of the course.
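To give a feel for those two explicit styles of parallelism, here is a minimal sketch using SAXPY (y = a*x + y). The names, array size, and thread count are illustrative assumptions, not anything from a particular machine:

```c
/* A minimal sketch contrasting data-level and thread-level parallelism
   on SAXPY. All names and sizes here are illustrative assumptions. */
#include <pthread.h>
#include <stdint.h>
#include <stddef.h>

#define N        1024
#define NTHREADS 4          /* N is evenly divisible by NTHREADS */

static float a = 2.0f, x[N], y[N];

/* Data-level parallelism: every iteration is independent, which is
   exactly the property vector units and GPUs exploit by operating
   on many elements at once. */
void saxpy(void)
{
    for (size_t i = 0; i < N; i++)
        y[i] = a * x[i] + y[i];
}

/* Thread-level parallelism: the same independent work, explicitly
   split into contiguous slices, one per thread. */
static void *saxpy_slice(void *arg)
{
    size_t t  = (uintptr_t)arg;
    size_t lo = t * (N / NTHREADS);
    size_t hi = lo + (N / NTHREADS);
    for (size_t i = lo; i < hi; i++)
        y[i] = a * x[i] + y[i];
    return NULL;
}

void saxpy_threaded(void)
{
    pthread_t tid[NTHREADS];
    for (uintptr_t t = 0; t < NTHREADS; t++)
        pthread_create(&tid[t], NULL, saxpy_slice, (void *)t);
    for (size_t t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);
}
```

The first loop is the kind of thing a vectorizing compiler (auto-vectorization is on by default at `gcc -O3`) or a GPU kernel exploits: one operation applied to many independent elements. The pthread version expresses the same independence as explicit thread-level parallelism, which is the style the multicore portion of the course covers.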