Okay, let's look at the issue logic here and a pipeline diagram. So here we have ops A, B, C, D, E, F: straight-line code, no branches, and we have things flowing down the pipe in our nice pipeline diagram. One of the cool things is that, now that we have a two-wide superscalar, we can actually violate a rule we had before, which said two things cannot be in the same pipe stage at the same time, temporally (time runs from left to right in this diagram). So here we can have two operations, or two instructions, in the fetch stage at once. And because we don't have a great name for these things, we're going to call the execution unit stages A0 and A1 for one pipe, and B0 and B1 for the other. In an ideal world, this is pretty sweet: for this code, at least, we actually have a clocks per instruction of one half. That's pretty awesome. And as I said, we can have two instructions in the same stage of the pipe.

Okay, let's look at a little bit more complex code sequence here: an add, some loads, another add, a load. Your issue logic, that swapping logic, actually has to move instructions around in this case. First we have this add and this load. That's easy: the add goes to the A unit, the load goes to the B unit. No problems there. But next we have a load that we fetched into instruction register zero. That means it wants to go down the A pipe, so we need to swap the two instructions, and that's how we draw it here: the add goes to the A pipe even though it sits in the opposite slot. There are still no stalls at this point, at least in this example. And then finally we actually get a structural hazard, and the structural hazard introduces a stall.
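The swapping the issue logic does can be sketched in a few lines. This is a toy model with conventions of my own, not the lecture's exact hardware: pipe A executes ALU ops, pipe B executes loads, and a fetched pair is swapped or partially stalled to respect that.

```python
def issue(i0, i1):
    """Slot a fetched pair (in program order) into a 2-wide machine.

    Toy model: i0/i1 are 'alu' or 'load'; pipe A executes ALU ops,
    pipe B executes loads. Returns (pipe assignment, stalled ops).
    """
    if i0 == 'load' and i1 == 'load':
        # structural hazard: only one memory pipe, so the
        # second load stalls in decode for a cycle
        return {'A': None, 'B': i0}, [i1]
    if i0 == 'load':
        # the load was fetched into slot 0 but belongs in pipe B,
        # so the issue logic swaps the pair
        return {'A': i1, 'B': i0}, []
    if i1 == 'load':
        return {'A': i0, 'B': i1}, []
    # two ALU ops: under this toy model the second one also stalls
    return {'A': i0, 'B': None}, [i1]
```

In this model the add/load pair issues straight through, the load/add pair gets swapped, and the load/load pair is exactly the structural-hazard stall in the diagram.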
So we fetch these two loads simultaneously, but we can only execute one load at a time. So we need to stall one of the loads in the decode stage and push it out a cycle. That gives us a different pipeline diagram than the no-stall, no-structural-hazard example.

Okay, let's look at a little bit more complex example here: a dual-issue data hazard. What happens when you have data hazards? This first case is without any bypassing. The first two instructions here don't have any data hazards. But then we have a write to register five followed by a read from register five, and that's a read-after-write hazard. Because we're not bypassing in this pipeline yet, we actually have to stall the second instruction waiting for the first one, even though we could potentially have executed them at the same time; there's a real data hazard there. So we need to introduce stall cycles into the second instruction. Does this make sense to everybody? We're going to push out that add.

If we have full bypassing, we still potentially need stalling, but now we don't have to wait for the value to get all the way to the end of the pipe before picking it up in the ALU. We can pull it back earlier, because we can bypass, let's say, the add result after A0. What you see here is the same instruction sequence, but now it's bypassed from A0 into the decode stage and we can start going again quicker. So bypassing is really helping us here, and it combines with the superscalarness, if you will.

What we mean by "order matters" is that here we've interchanged the last two instructions. We just flipped them, and we turned what was a read-after-write hazard into a write-after-read hazard. Because of that, the sequence actually pulls in by one cycle and we don't get the stall.
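The read-after-write versus write-after-read distinction can be checked mechanically. A minimal sketch, with illustrative register names:

```python
def pair_hazard(first, second):
    """Classify the hazard between two instructions in program order.

    Each instruction is (dest_reg, src_regs). RAW (the second reads
    what the first writes) forces a stall or a bypass; WAR is harmless
    on this in-order pipeline because the read happens early in the
    pipe and the write happens late.
    """
    dest0, srcs0 = first
    dest1, srcs1 = second
    if dest0 in srcs1:
        return 'RAW'
    if dest1 in srcs0:
        return 'WAR'
    return 'none'

# the lecture's flip: writing r5 then reading r5 is RAW; swapping
# the two instructions turns it into WAR and the stall disappears
```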
So just by changing the ordering of the instructions, we change the data dependencies, and that actually changes the execution length. Does that make sense to everybody, why we can interchange two instructions and the data dependencies completely change, so that we have to worry about very different data hazards?

Okay, so I want to briefly wrap up with fetch logic and alignment. Someone was alluding to this earlier, I think. Let's look at some code here that takes jumps. This column is the address, and this is the instruction. We have a jump here to address 100 hexadecimal. Then we execute one instruction, op E, and jump to 204 hexadecimal; then we execute one instruction and jump to 30C hexadecimal, and then we just execute some stuff. Here is our cache, and let's say the block size is four instructions long. We're going to look at how many cycles this takes to execute.

Let's say there are no alignment constraints in the first case. In cycle zero, we fetch these two instructions from the instruction cache and execute them; they're aligned nicely together, nothing weird going on, we just pull them out. The next two instructions, at 8 and C, are next to each other, so that's great. Then we jump to 100 and execute two instructions that sit next to each other at the beginning of their line, so no problem there either. Okay, now we start to get some weird stuff: now we jump into the middle of a cache line. In this example we jump to address 204. Our block size is, as I said, four instructions, and we're jumping to something other than the first instruction in that block.
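Which slot of a cache block an address lands in is just arithmetic. A sketch, assuming 4-byte instructions and the lecture's four-instruction blocks:

```python
INSN_BYTES = 4   # assumed fixed 4-byte instructions
BLOCK_INSNS = 4  # block size from the example: four instructions

def block_offset(pc):
    """Decompose a byte address into (block index, slot within block)."""
    word = pc // INSN_BYTES
    return word // BLOCK_INSNS, word % BLOCK_INSNS
```

Address 0x100 decomposes to slot 0 of its block (an easy aligned fetch), while 0x204 lands at slot 1: a jump into the middle of a line.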
With a fully fleshed-out fetch unit, let's say you can fetch with any alignment. Then life is easy: we just fetch and execute these two instructions at the same time; in cycle three we fetch both of them. That could get harder if we tried to put some realistic constraints on it.

Okay, now we jump to the end of a cache block and try to fetch these two instructions at the same time. One is on this cache line, and one is on that cache line. Do we need to fetch two things from our cache at the same time? Yes, we do, if we actually want to execute this instruction and that instruction together. Let's say, for right now, the fetch logic allows us to do that; somehow it's a dual-ported instruction cache, we'll say. And then, finally, the op at 314 executes last; it just falls through, no jumps or anything happening.

So some things that can be really hard to make work out right are fetching across cache lines, and possibly even fetching from an arbitrary position inside a cache line, depending on your fetch unit logic. And, like I said, we might need extra ports on the cache. Here is this code executing, and as you can see we don't get any introduced stalls; we just execute this, then this, then this, and we execute two instructions every single cycle.

Now let's look at what happens with alignment constraints. Here's our original example, and let's look at what we could possibly try to execute. We jump into the middle of a line and only use these two instructions from it. So let's say we can only fetch an aligned half of a block at a time, or something like that, in each cycle, because that's how wide our cache port is.
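Whether a two-instruction fetch straddles a block boundary, and so needs that second cache port, follows from the same arithmetic. A sketch under the same assumptions (4-byte instructions, four-instruction blocks):

```python
INSN_BYTES = 4   # assumed fixed 4-byte instructions
BLOCK_INSNS = 4  # four instructions per cache block, as in the example

def fetch_pair(pc):
    """Block indices touched by fetching two consecutive instructions."""
    blk0 = (pc // INSN_BYTES) // BLOCK_INSNS
    blk1 = ((pc + INSN_BYTES) // INSN_BYTES) // BLOCK_INSNS
    return blk0, blk1, blk0 != blk1  # third element: straddle flag
```

A pair starting at the last slot of a block, say 0x10C, touches two blocks, so you either dual-port the cache or stall; a pair starting at 0x204 stays within one block even though it is unaligned.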
So what you might have to do in some architectures, if you have alignment issues like that and you're not allowed to straddle, is fetch extra data that you're just never going to use. You're just throwing away that bandwidth. And the cycle count changes, too. Let's look at the same code sequence and see what happens when we go to execute it. Going back to this: we execute op A and op B and just go down the pipe. Life is good. We get to address 8 hexadecimal; well, we're going to swap that pair, because the jump needs to go down pipe A, but otherwise things are okay. Now we jump to the middle of a line. Hmm, that starts to get more interesting, because we're basically going to end up wasting cycles. This now takes seven cycles, where before it took only five, because we've effectively introduced dead cycles in which we fetched instructions we just didn't use. The three X's here show the instructions we fetched but didn't use. For instance, the instruction at address 200: we fetched it and we're not using it. And we fetched these two and we aren't using either of them. So a fetch unit that is not fully alignment-capable can cause some serious problems in our performance. Let's stop here for today, and we'll talk about the rest next time.
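The five-versus-seven-cycle difference can be reproduced with a toy fetch model. This sketch assumes the fetch unit delivers one aligned two-instruction pair per cycle and throws away any slot that isn't the very next instruction needed; the trace is my reconstruction of the example's addresses, not taken verbatim from the slides.

```python
PAIR_BYTES = 8  # an aligned pair of two 4-byte instructions

def count_fetch_cycles(trace):
    """Cycles to fetch `trace` (addresses in execution order) when each
    cycle delivers one aligned pair; a slot that isn't the next needed
    instruction is fetched but thrown away (a dead slot)."""
    cycles, i = 0, 0
    while i < len(trace):
        cycles += 1
        base = trace[i] - trace[i] % PAIR_BYTES
        if (i + 1 < len(trace) and trace[i] == base
                and trace[i + 1] == base + 4):
            i += 2          # both slots of the pair are useful
        else:
            i += 1          # the other slot is wasted bandwidth
    return cycles

# reconstructed trace: fall through 0..C, jump to 100, jump to 204,
# jump to 30C; an ideal fetch unit does it in len(trace)/2 = 5 cycles
trace = [0x000, 0x004, 0x008, 0x00C, 0x100, 0x104,
         0x204, 0x208, 0x30C, 0x310]
```

Under the aligned-pair constraint, `count_fetch_cycles(trace)` comes out to 7: the jumps into 0x204 and 0x30C each land in the odd slot of their pair and waste the even slot, matching the dead cycles in the diagram.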