1 00:00:03,027 --> 00:00:07,090 So today we're going to continue our discussion of very long instruction word 2 00:00:07,090 --> 00:00:11,010 processors. And we're going to start talking about how 3 00:00:11,010 --> 00:00:13,057 to change a classical VLIW processor 4 00:00:13,057 --> 00:00:17,074 into a processor which can actually get a lot of the parallelism, the 5 00:00:17,074 --> 00:00:21,096 instruction-level parallelism, that you can get inside of out-of-order superscalars. 6 00:00:21,096 --> 00:00:26,071 And to do this, we're gonna have to add a lot of extra features to our traditional 7 00:00:26,071 --> 00:00:30,008 or classical VLIW. We're gonna slowly work through that, and 8 00:00:30,008 --> 00:00:33,741 we're gonna, basically, list out or enumerate all of the different 9 00:00:33,741 --> 00:00:38,878 types of instruction-level parallelism, and where that comes from in something like an 10 00:00:38,878 --> 00:00:42,565 out-of-order superscalar. And then we're gonna systematically add 11 00:00:42,565 --> 00:00:47,239 features into a very long instruction word processor, or VLIW processor, 12 00:00:47,239 --> 00:00:51,600 to get us to that point. But I'll give you a hint that not all the 13 00:00:51,600 --> 00:00:55,923 things an out-of-order superscalar can get are easy to get in VLIW processors, 14 00:00:55,923 --> 00:01:01,082 or even possible in the realm of things that people have built up to this point. 15 00:01:01,082 --> 00:01:07,655 So before we do that, I wanted to take a step back and review something from last 16 00:01:07,655 --> 00:01:14,181 class. I wanted to clarify something that I said 17 00:01:14,181 --> 00:01:20,144 about how the EQ model and the LEQ model, the equals model and the less-than-or-equals 18 00:01:20,144 --> 00:01:25,749 model, work from a scheduling perspective. So let's back up to the slides way, way, way 19 00:01:25,749 --> 00:01:27,248 back, at the beginning. 
20 00:01:27,248 --> 00:01:33,026 All I was gonna comment is that the equals model of VLIW scheduling 21 00:01:33,026 --> 00:01:38,652 and the less-than-or-equals scheduling model are just 22 00:01:38,652 --> 00:01:41,364 scheduling models. That's all they are. 23 00:01:41,364 --> 00:01:45,018 They're not actually something that's in the hardware. 24 00:01:45,018 --> 00:01:50,669 They may influence what the hardware has to do or what the hardware has to provide. 25 00:01:50,669 --> 00:01:55,354 So, for instance, say you have an equals model, and you have an instruction 26 00:01:55,354 --> 00:02:00,461 followed by another instruction in your very long instruction word processor, 27 00:02:00,461 --> 00:02:05,703 and let's say the second instruction reads the value of some register in sort of the 28 00:02:05,703 --> 00:02:08,640 shadow of the first, while the value is still being computed. 29 00:02:08,640 --> 00:02:11,845 You're gonna get the old value in the EQ model. 30 00:02:11,845 --> 00:02:20,317 So, as a quick code example here, we can take a look at a bundle of 31 00:02:20,317 --> 00:02:26,805 instructions. Let's say you have something like 32 00:02:26,805 --> 00:02:30,295 multiply R1, R3, R4. 33 00:02:30,295 --> 00:02:41,540 And in the same bundle, or in the same VLIW instruction, we have some other random 34 00:02:41,540 --> 00:02:47,718 thing here. We are going to use the curly braces here 35 00:02:47,718 --> 00:02:53,288 to denote that it's one instruction or one bundle. 36 00:02:53,288 --> 00:02:59,395 Then this multiply, we'll say, has a latency of four cycles. 37 00:02:59,395 --> 00:03:06,480 So you can't actually go read the result right away. And if we look at something like an EQ 38 00:03:06,480 --> 00:03:10,081 model, we're gonna end up with, let's say, 39 00:03:10,081 --> 00:03:14,325 some other instructions in here, but this one here is important: 40 00:03:14,325 --> 00:03:22,204 we have an add which reads R1. 
41 00:03:22,204 --> 00:03:29,056 Note that this multiply writes R1, but as we said, the multiply has a latency of four 42 00:03:29,056 --> 00:03:35,844 cycles, so this add, in the EQ model, is going to get the value of R1 from before the multiply. 43 00:03:35,844 --> 00:03:41,842 So it's not gonna pick up this result. And then, let's say we just have some 44 00:03:41,842 --> 00:03:46,734 other random stuff, maybe some non-dependent adds and subtracts. 45 00:03:46,734 --> 00:03:52,745 And I'm not going to write the registers here, because they don't read anything or 46 00:03:52,745 --> 00:03:56,761 write anything which is read or written in these two bundles. 47 00:03:56,761 --> 00:04:03,188 Maybe you have a NOP. And then finally, down here, we have 48 00:04:03,188 --> 00:04:13,409 something which does a load 49 00:04:13,409 --> 00:04:19,016 of R1 and gets the result of this multiply. 50 00:04:19,068 --> 00:04:24,035 So all I'm trying to get across here is, this is a scheduling model of what the 51 00:04:24,035 --> 00:04:28,097 compiler needs to do and where it needs to place code that we're talking about in 52 00:04:28,097 --> 00:04:32,065 these scheduling models. It's not a hardware problem. 53 00:04:32,065 --> 00:04:38,091 If we were to try and take this same piece of code and run it in an LEQ model, the 54 00:04:38,091 --> 00:04:44,638 main difference is, this add here, which reads R1, would not be 55 00:04:44,638 --> 00:04:50,072 allowed in the shadow here, in what we call the shadow of this multiply, 56 00:04:50,072 --> 00:04:54,087 or the delay of the multiply. Because if you were to, for instance, take an 57 00:04:54,087 --> 00:04:59,015 interrupt or something like that, and this add were to get moved later, moved 58 00:04:59,015 --> 00:05:03,821 later than this load, or moved four cycles later or three cycles later, 59 00:05:03,821 --> 00:05:09,011 you'll actually pick up the new value. 
You'll basically change the semantics of 60 00:05:09,011 --> 00:05:12,019 your program. So if you were to do this in something 61 00:05:12,019 --> 00:05:17,035 like an LEQ model, this add would have to be above the multiply, and you would have to 62 00:05:17,035 --> 00:05:21,037 replace it with a NOP. All I'm trying to get across here 63 00:05:21,037 --> 00:05:25,072 is that these are just scheduling models, scheduling models the 64 00:05:25,072 --> 00:05:29,022 compiler uses, and not hardware models. 65 00:05:29,022 --> 00:05:33,079 The hardware has to implement something which makes sense for the scheduling model, 66 00:05:33,079 --> 00:05:38,024 but when we talk about these different concepts, they really are just a software 67 00:05:38,024 --> 00:05:42,006 scheduling model. Okay, so now we're gonna go forward again 68 00:05:42,006 --> 00:05:46,050 here, and move on to new content for today.
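
The multiply-then-add example from the recap can be played out in a few lines of Python. This is only a rough sketch under stated assumptions, not how any real VLIW toolchain works: the opcode table, the extra registers r5, r6 and r7, the `mov` that stands in for the final load, and the starting register values are all made up for illustration. Only the four-cycle multiply into R1 and the add sitting in its shadow come from the lecture.

```python
import operator

# Illustrative opcodes: (function, latency in cycles). Only the 4-cycle
# multiply comes from the lecture; the other latencies are assumptions.
OPS = {
    "mul": (operator.mul, 4),
    "add": (operator.add, 1),
    "mov": (lambda a: a, 1),
}

def run_eq(schedule, regs):
    """Run bundles under EQ semantics: a result lands exactly `latency` cycles after issue."""
    regs = dict(regs)
    pending = []  # (ready_cycle, dest, value) for results still in flight
    for cycle, bundle in enumerate(schedule):
        # Results whose latency has elapsed become visible at the start of the cycle.
        for ready, dest, value in pending:
            if ready <= cycle:
                regs[dest] = value
        pending = [p for p in pending if p[0] > cycle]
        # Every op in the bundle reads the register file as it stands this cycle.
        for op, dest, srcs in bundle:
            fn, latency = OPS[op]
            pending.append((cycle + latency, dest, fn(*(regs[s] for s in srcs))))
    for _, dest, value in sorted(pending, key=lambda p: p[0]):
        regs[dest] = value  # drain whatever is still in flight
    return regs

def leq_violations(schedule):
    """List (cycle, op, reg) for ops that read a register inside a producer's shadow."""
    found = []
    for cycle, bundle in enumerate(schedule):
        for op, dest, _ in bundle:
            latency = OPS[op][1]
            for later in range(cycle + 1, min(cycle + latency, len(schedule))):
                for op2, _, srcs2 in schedule[later]:
                    if dest in srcs2:
                        found.append((later, op2, dest))
    return found

# One inner list per bundle; each op is (opcode, destination, sources).
eq_schedule = [
    [("mul", "r1", ("r3", "r4"))],  # cycle 0: product lands in r1 at cycle 4
    [("add", "r5", ("r1", "r6"))],  # cycle 1: in the shadow, reads the old r1
    [],                             # cycle 2: unrelated work, omitted
    [],                             # cycle 3: NOP
    [("mov", "r7", ("r1",))],       # cycle 4: stands in for the load, sees the product
]

leq_schedule = [
    [("add", "r5", ("r1", "r6"))],  # hoisted above the multiply
    [("mul", "r1", ("r3", "r4"))],
    [], [], [],                     # the shadow holds no readers of r1
    [("mov", "r7", ("r1",))],
]

start = {"r1": 10, "r3": 2, "r4": 3, "r6": 1}
eq_result = run_eq(eq_schedule, start)    # r5 == 11 (old r1 + r6), r7 == 6 (the product)
leq_result = run_eq(leq_schedule, start)  # same r5 and r7, with nothing in the shadow
```

Here `leq_violations(eq_schedule)` reports the add at cycle 1, while `leq_violations(leq_schedule)` comes back empty. Both checks look only at the order the compiler chose for the bundles, which fits the point that EQ and LEQ are scheduling models rather than hardware.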