1 00:00:03,048 --> 00:00:08,040 Now, we actually start to talk about how to do things and how to build superscalar 2 00:00:08,040 --> 00:00:13,033 processors, things that exploit ILP, instruction-level parallelism, things that run at 3 00:00:13,033 --> 00:00:17,086 really high clock frequency or multiple cores and advanced techniques. 4 00:00:17,086 --> 00:00:21,826 So, before we start talking about superscalar, there's a piece of 5 00:00:21,826 --> 00:00:26,761 nomenclature we need to introduce that sort of goes hand in hand with the data 6 00:00:26,761 --> 00:00:31,393 hazard talk that we had before. But I didn't introduce it there, so I need 7 00:00:31,393 --> 00:00:35,593 to introduce it now before we go into actual superscalars. 8 00:00:35,593 --> 00:00:41,854 So, let's consider some example instructions here where you take register i and 9 00:00:41,854 --> 00:00:46,844 register j, you operate on them, and you put the result into register k. 10 00:00:47,465 --> 00:00:53,779 And we're going to look at different types of dependencies, and we are going to 11 00:00:53,779 --> 00:00:58,022 name them. So, the basic dependency here is a read 12 00:00:58,022 --> 00:01:02,038 after write hazard, or read after write dependency. 13 00:01:02,038 --> 00:01:07,953 So, in this example here, if time goes down, this operation here is going to 14 00:01:07,953 --> 00:01:12,058 store into register three. And then, this instruction here is going 15 00:01:12,058 --> 00:01:16,003 to read from r3. So, you need to, sort of, temporally make 16 00:01:16,003 --> 00:01:19,079 sure this happens before that because you need the value here. 17 00:01:19,079 --> 00:01:21,919 Okay. So, we talked about this in 18 00:01:21,919 --> 00:01:25,597 our data hazards discussion. This is the most classic data hazard here, 19 00:01:25,597 --> 00:01:30,950 a read after write hazard. Let's look at something 20 00:01:31,157 --> 00:01:35,897 a little bit less intuitive. 
Let's look at a hazard where this 21 00:01:35,897 --> 00:01:41,837 instruction here reads register one, and the next instruction writes register one. 22 00:01:41,837 --> 00:01:45,925 Okay, that should be no problem. That sounds great. 23 00:01:45,925 --> 00:01:50,136 Well, today, we're going to give an example of a pipeline where this is a 24 00:01:50,136 --> 00:01:53,029 problem, or could potentially be a problem. 25 00:01:53,065 --> 00:01:57,590 So, you know, if you do everything in order and your instructions are sort of 26 00:01:57,590 --> 00:02:01,690 slowly flowing down the pipe and you're only executing one instruction at a time, 27 00:02:01,690 --> 00:02:05,601 you're going to execute this, and then that, and you're going to read register 28 00:02:05,601 --> 00:02:08,147 one here. And then, like, a whole lot of time later 29 00:02:08,147 --> 00:02:11,629 you are going to write register one, so nothing bad happens. 30 00:02:11,795 --> 00:02:16,362 But if you start to execute instructions out of order, or if you start to execute 31 00:02:16,362 --> 00:02:20,357 multiple instructions at the same time, you're going to start to run into some 32 00:02:20,357 --> 00:02:22,563 problems. We're going to name this a write after 33 00:02:22,563 --> 00:02:26,607 read hazard. So, what that means is, we have a write 34 00:02:26,607 --> 00:02:32,037 that temporally, in the program order, is happening after a read of that same 35 00:02:32,037 --> 00:02:35,085 register. And this is usually called an 36 00:02:35,085 --> 00:02:39,053 antidependence. And we need to maintain these 37 00:02:39,241 --> 00:02:44,048 when we go to execute our programs out of order and sort of throw everything into a 38 00:02:44,048 --> 00:02:47,015 big bucket and try to pull out instructions. 39 00:02:48,632 --> 00:02:54,031 Okay, output dependencies. 
This is actually something that you could 40 00:02:54,031 --> 00:02:59,062 possibly even think of having happen on a simple, sort of, in-order processor core, 41 00:02:59,301 --> 00:03:04,001 if you write back to the register file from different stages. 42 00:03:04,001 --> 00:03:09,052 So, let's take an example. If you have a multiplier, like in your lab, which is going 43 00:03:09,052 --> 00:03:14,671 to write at the end of the pipe and have a very, very high latency, and then you have 44 00:03:14,671 --> 00:03:20,020 an instruction like an add which, let's say, tries to write to the register file 45 00:03:20,020 --> 00:03:24,202 early, you might actually write this instruction's result to register three 46 00:03:24,207 --> 00:03:28,644 before that instruction's result. If, let's say, this instruction here is a 47 00:03:28,644 --> 00:03:34,011 long-latency operation, so it could be like a multiply, and this is like an add. 48 00:03:34,011 --> 00:03:37,066 So, we're going to call that a write after write dependency. 49 00:03:37,066 --> 00:03:42,031 And we need to maintain the order here, that this gets written first, and then 50 00:03:42,031 --> 00:03:45,079 this gets written. Because if you flip those two results, 51 00:03:45,079 --> 00:03:50,069 or interchange those two results, the next thing that goes to read r3 can get the 52 00:03:50,069 --> 00:03:53,038 wrong value. So, that's pretty important. 53 00:03:53,038 --> 00:03:54,974 That's called an output dependence. Okay. 54 00:03:54,974 --> 00:04:01,409 So, last question: is there such a thing as a read after read dependence or a 55 00:04:01,409 --> 00:04:05,074 read after read hazard? So, superscalar processors. 56 00:04:05,368 --> 00:04:13,023 So far we've been limited to processors that can only get a clocks per instruction 57 00:04:13,023 --> 00:04:18,095 greater than or equal to one. 
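Going back to the three named dependencies above (read after write, write after read, write after write), here is a minimal sketch of how a pair of instructions could be classified. The `Instr` type, the `classify` function, and the register numbers other than r1 and r3 are assumptions made up for illustration; they are not from the lecture slides.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction model: one destination, some source registers."""
    dest: Optional[str]      # register written, or None if nothing is written
    srcs: Tuple[str, ...]    # registers read


def classify(first: Instr, second: Instr) -> set:
    """Name the dependencies of `second` on `first`.

    `first` comes earlier in program order than `second`.
    """
    deps = set()
    # Read after write: the later instruction reads what the earlier one wrote.
    if first.dest is not None and first.dest in second.srcs:
        deps.add("RAW")
    # Write after read (antidependence): the later instruction overwrites
    # a register the earlier one still has to read.
    if second.dest is not None and second.dest in first.srcs:
        deps.add("WAR")
    # Write after write (output dependence): both write the same register.
    if first.dest is not None and first.dest == second.dest:
        deps.add("WAW")
    return deps


# Register choices below are illustrative only.
raw = classify(Instr("r3", ("r1", "r2")), Instr("r5", ("r3", "r4")))
war = classify(Instr("r3", ("r1", "r2")), Instr("r1", ("r4", "r5")))
waw = classify(Instr("r3", ("r1", "r2")), Instr("r3", ("r6", "r7")))
# Two instructions that only read the same register: no rule above fires.
shared_read = classify(Instr("r3", ("r1", "r2")), Instr("r4", ("r1", "r5")))

print(raw, war, waw, shared_read)
```

In this sketch a pair that only shares a source register comes back with an empty set, since none of the three rules involves two reads.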
Superscalar processors will allow you to 58 00:04:18,095 --> 00:04:25,017 execute multiple instructions at the same time and will move us into a new class 59 00:04:25,017 --> 00:04:29,047 here, with clocks per instruction potentially below one. 60 00:04:29,260 --> 00:04:34,055 It's at least fundamentally possible. Now, there might be other things that 61 00:04:34,055 --> 00:04:39,076 cause our clocks per instruction to still be above one, but we can get higher 62 00:04:39,076 --> 00:04:43,066 performance by executing multiple instructions in parallel. 63 00:04:43,269 --> 00:04:48,523 I want to introduce nomenclature here that's the reciprocal of 64 00:04:48,523 --> 00:04:52,629 clocks per instruction, which is instructions per clock. 65 00:04:52,629 --> 00:04:58,025 We just flip them and rename it. Sometimes people say IPC, the reciprocal 66 00:04:58,025 --> 00:05:07,258 of CPI, instead of CPI: clocks per instruction equals one over instructions per clock. 67 00:05:07,258 --> 00:05:13,559 So, just be aware that sometimes we'll be using those terms 68 00:05:13,966 --> 00:05:17,351 interchangeably in this class. Okay. 69 00:05:17,351 --> 00:05:20,989 So, what types of superscalar processors can we talk about? 70 00:05:20,989 --> 00:05:26,366 There's lots of different types. There's in-order machines and out-of-order 71 00:05:26,366 --> 00:05:30,079 machines, roughly. And what an in-order machine means is that the 72 00:05:30,079 --> 00:05:36,036 machine, or the processor, is still trying to execute instructions in program 73 00:05:36,036 --> 00:05:39,021 order. Well, you don't have to do that. 74 00:05:39,021 --> 00:05:44,053 You could actually think about sort of taking apart the program and executing its instructions 75 00:05:44,053 --> 00:05:49,470 out of order, as long as you preserve the different 76 00:05:49,470 --> 00:05:53,391 dependencies, the data hazards. 
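The idea of executing out of program order while still preserving the data dependencies can be sketched as a simple legality check. This is a simplified model, assuming every instruction writes exactly one register and that read after write, write after read, and write after write are all treated as strict ordering constraints; `depends`, `order_is_legal`, and the three-instruction program are hypothetical names and data made up for illustration.

```python
from itertools import combinations


def depends(first, second):
    """True if `second` must stay after `first` in this simplified model.

    Each instruction is a (dest, srcs) pair; `first` is earlier in program order.
    """
    f_dest, f_srcs = first
    s_dest, s_srcs = second
    raw = f_dest in s_srcs      # read after write
    war = s_dest in f_srcs      # write after read
    waw = f_dest == s_dest      # write after write
    return raw or war or waw


def order_is_legal(program, order):
    """Check that `order` (a permutation of program indices) keeps every
    dependent pair in its original relative order."""
    position = {idx: slot for slot, idx in enumerate(order)}
    for i, j in combinations(range(len(program)), 2):   # always i < j
        if depends(program[i], program[j]) and position[i] > position[j]:
            return False
    return True


program = [
    ("r3", ("r1", "r2")),   # 0: r3 <- r1 op r2
    ("r5", ("r3", "r4")),   # 1: r5 <- r3 op r4, reads r3 written by 0
    ("r6", ("r7", "r8")),   # 2: independent of 0 and 1
]

print(order_is_legal(program, [2, 0, 1]))   # independent instruction hoisted
print(order_is_legal(program, [1, 0, 2]))   # reader moved ahead of its writer
```

The first ordering only moves the independent instruction, so it passes; the second moves the read of r3 ahead of the write to r3, so it is rejected.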
Take something like your Pentium 77 00:05:53,391 --> 00:05:57,782 processor. So, I'm going to pass around here a 78 00:05:58,014 --> 00:06:01,416 roughly Pentium [inaudible] class processor. 79 00:06:01,648 --> 00:06:06,053 It's actually an Intel Celeron, the Pentium [inaudible] version of the Intel 80 00:06:06,053 --> 00:06:09,849 Celeron, which is an out-of-order, three-wide superscalar. 81 00:06:09,849 --> 00:06:13,472 So, it can execute three instructions at a time. 82 00:06:13,718 --> 00:06:19,576 Another example is the original Pentium, the old Pentium. When the 83 00:06:19,576 --> 00:06:23,070 original Pentium came out, that was a two-wide machine. 84 00:06:23,070 --> 00:06:28,049 So, it could execute two instructions at one time, and it was in order. 85 00:06:28,049 --> 00:06:47,043 So, you can think about these different notions.
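To connect the two-wide and three-wide examples with the CPI and IPC terms from earlier, here is a small sketch. The function names are hypothetical, and the best-case figure assumes the machine fills every issue slot on every clock, which the lecture notes may not happen in practice.

```python
def ipc_from_cpi(cpi: float) -> float:
    """Instructions per clock is the reciprocal of clocks per instruction."""
    return 1.0 / cpi


def best_case_cpi(issue_width: int) -> float:
    """Lower bound on CPI, assuming every issue slot is filled every clock."""
    return 1.0 / issue_width


for name, width in [("two-wide", 2), ("three-wide", 3)]:
    cpi = best_case_cpi(width)
    print(f"{name}: best-case CPI = {cpi:.3f}, IPC = {ipc_from_cpi(cpi):.1f}")
```

A machine that executes one instruction at a time has a best case of CPI equal to one in this model, which matches the "greater than or equal to one" limit mentioned before superscalar was introduced.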