Okay, so now we're going to go through different problems with VLIWs and different solutions to those problems. The top one on this list is the problem of hard-to-predict branches and how they can limit instruction-level parallelism. The solution: you just remove the branch, and we're going to call that predication. We're actually going to add instructions to the hardware. Here we're going to add two instructions, so this is limited predication, where we add two very simple instructions. If you look at these instructions, they're very similar to the question-mark-colon, or select, operator in C. What does that operator do? Say we have a = c ? d : e;. Well, it loads a: if c is true, it loads a with d, and if c is false, it loads a with e. You can think about doing the same thing with an if-then-else piece of code, which is pretty common: if a is less than b, then x gets a, else x gets b. That's our select operator.

So we add two special instructions here for our limited predication: move-if-zero and move-if-not-zero. What do these do? If this operand is equal to zero, then rd gets rs; otherwise nothing happens. That's all that instruction does. And the flipped one here checks whether the operand is not equal to zero instead.

Why is this cool? Well, it allows us to transform control flow into a data instruction. We've taken a branch out. If we look at this piece of code done with branches, the set-less-than computes our condition here, and then there's a branch on that condition: one way the branch falls through into this code, and otherwise it jumps over it. So we have a bunch of control flow here, two control-flow operations, the branch and the jump. When we add these new instructions, we can basically do that if-then-else in a single instruction. And basically every VLIW processor you're going to look at is going to have predication, or at least limited predication. This is not full predication, this is limited predication; we'll talk about full predication in a second.

Okay, so let's think about that for a second. We just took control flow and turned it into something which is never going to take a branch mispredict. That sounds pretty cool, because branch mispredicts were pretty bad. If we had a branch which was hard to predict, where we didn't know with high probability whether a was less than b or not, we can just stick this code sequence in and be done with it. And the reason this is really important for very long instruction word processors is that whenever you take a branch and mispredict, you basically have a bunch of dead instructions, and you can't schedule anything into that spot. An out-of-order superscalar can attempt to schedule things in there, it can try to schedule non-dependent operations, but our compiler has to come up with some code sequence and make it parallel at compile time.
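To make that concrete, here is a small C sketch of the same if-then-else written both ways. The helper cmov_nz and the particular values of a and b are made up for illustration; the helper is just an emulation of the conditional-move idea, not the actual MIPS movn or x86 cmov instruction.

    #include <stdio.h>

    /* Emulation of a "move if not zero" conditional move:
       if cond is nonzero the destination takes new_val, otherwise
       it keeps old_val. A real movn/cmov does this in one
       instruction, with no branch in the pipeline. */
    static int cmov_nz(int cond, int new_val, int old_val) {
        return cond ? new_val : old_val;  /* C's select operator, as in a = c ? d : e; */
    }

    int main(void) {
        int a = 3, b = 7;
        int x;

        /* Branchy version: the hardware may mispredict this branch. */
        if (a < b)
            x = a;
        else
            x = b;
        printf("branchy:     x = %d\n", x);

        /* Branch-free version: compute the condition, then select.
           This is the shape a compiler produces when it uses
           movz/movn on MIPS or cmov on x86. */
        int cond = (a < b);        /* set-less-than style result: 1 or 0 */
        x = cmov_nz(cond, a, b);   /* x = a if cond != 0, else x = b */
        printf("branch-free: x = %d\n", x);

        return 0;
    }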
Okay, so a few questions here. What happens if the if-then-else has many instructions? This was a very simple case; we just had one thing inside each side of the if-then-else. It's not the end of the world. What you can do, and typically what people do with partial predication, which is what this gives us, is they'll actually execute both sides of the if statement, interleave them in the VLIW somehow, and then choose the result at the end with a predicated move, a move-if-zero instruction. These are typically called conditional moves. If you look in something like x86, I think these are actually called cmoves; if you go look in MIPS, it's called movz. People started naming these things slightly differently. So it's not the end of the world, but when you go to do that, you're actually going to execute extra instructions that you may not have had to execute. That's a bummer, because if there's a lot of code on one side and a lot of code on the other side and you're executing both code sequences, you're basically doing twice as much work. And if it grows large, you're doing lots of extra work, and you may not have enough open slots to absorb that. At that point you have a choice: you can actually put a branch in. If it's unbalanced, that's also not the end of the world; it's probably actually a little bit easier, though you're probably going to have to execute twice as much code. At some point, though, if it's super unbalanced, say a thousand instructions on one side of the branch and two instructions on the other side, you may just want to put an actual branch there and not try to predicate it. Because if you would have taken the side which only has two instructions, all of a sudden you've bloated that path by an extra thousand instructions, possibly in the common case, and that's not very good. So that's partial predication.

Let's talk about full predication, which is kind of the extension of that. Instead of just adding a simple instruction which moves a data value depending on whether another value is zero or not, let's say every single instruction in our instruction sequence, except maybe branches or something like that, can be nullified based on a register. What does this look like? Well, here we have a little more complicated piece of code. We have four basic blocks: roughly an if, a then, an else, and then some code at the end. Let's see how this works with predication. First of all, you have to somehow set the predicate registers. Typically these architectures have extra registers, which we call predicate registers. The predicate registers get loaded with some values early, and then, let's say, this instruction and this instruction execute in parallel. There's different notation here: let's say there's a semicolon here and there are brackets around that. And in front of the instruction here, in parentheses, we have a predicate register, which says whether this instruction is supposed to execute or not. Now we can do even more complex things than our partial predication. You can lay out everything and not have to do any moves at the end. You don't do any bookkeeping, and effectively only the side of the branch that you need to execute does real work, because the instructions on the other side are nullified by their predicates.

Scott Mahlke, in ISCA '95, showed that if you do this and you have a fancy enough compiler (he was working at UIUC on the IMPACT compiler), you can remove, let's say, 50 percent of your branches. A lot of these branches are short little branches in your programs. With full predication, you can do some pretty fancy stuff. This showed up in the PlayDoh architecture by HP and the compiler for that, which was the IMPACT compiler project at UIUC.
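To contrast the two flavors, here is a minimal C sketch: the first half mimics partial predication (do both sides, then select with a conditional move), and the second half mimics full predication by guarding each statement with a predicate variable. The names p1 and p2 and the arithmetic in each arm are invented for illustration; real full predication nullifies the guarded instructions in hardware rather than using an if.

    #include <stdio.h>

    int main(void) {
        int a = 3, b = 7;
        int x = 0, y = 0;

        /* Partial predication: execute both arms unconditionally and
           pick the answer at the end with a conditional-move-style select.
           Both multiplies run, so there is extra work, but no branch. */
        int then_val = a * 2;           /* work from the "then" side */
        int else_val = b * 3;           /* work from the "else" side */
        int p = (a < b);                /* condition computed once */
        x = p ? then_val : else_val;    /* the movz/cmov at the end */

        /* Full predication, emulated: every operation carries a predicate.
           Hardware with full predication nullifies an instruction whose
           predicate is false, so only the needed side does real work and
           no final move is required. */
        int p1 = (a < b);               /* predicate register p1 */
        int p2 = !p1;                   /* complementary predicate p2 */
        if (p1) y = a * 2;              /* (p1) mul: then side */
        if (p2) y = b * 3;              /* (p2) mul: else side */

        printf("partial predication: x = %d\n", x);
        printf("full predication:    y = %d\n", y);
        return 0;
    }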
So you can see that you can get a lot of benefit from this. I'm going to stop here today, but I just want to briefly wrap up and say that we've started talking about how to deal with dynamic events and how to get a lot of the advantages of speculative execution that out-of-order superscalars have, but in a statically scheduled regime. We're going to talk more about how to do some of this code motion: how to move instructions across branches, and how to move memory operations across other memory operations. And then, in the next lecture, we're going to talk about how to deal with some dynamic events which are hard to handle in a statically scheduled environment. Okay, we'll stop here for today.