Let's continue on and move to a different item here. We've talked about structural hazards; now we're going to talk about data hazards. Okay, so what is a data hazard? A data hazard occurs when one instruction depends on a data value generated by a previous instruction that is still in the pipeline. Saying "depends on a previous instruction" isn't precise enough; the dependence is on a data value produced by a previous instruction that is still in the pipeline. And like structural hazards, data hazards have a few different approaches, and we won't talk about all of them today, but let's at least introduce them. First, you can schedule around it. What does this mean? Say we have a processor pipeline generating values, and one instruction depends on another, but the first instruction takes a couple of cycles to generate its value. The value won't be ready, so we can't issue the subsequent instruction — we have a data hazard, a data dependence hazard. But you can schedule around it: for instance, you can introduce no-operation instructions into your instruction sequence and have the programmer avoid the hazard, if the programmer knows the microarchitecture of the machine. This actually showed up in some early processors. A famous example is the floating-point unit of the Intel i860, an old, sort of early RISC architecture made by Intel. In the i860 the floating-point unit was not interlocked, so if you execute a floating-point instruction and another instruction coming down the pipe uses its result, you might get the wrong value. It was the program's responsibility to make sure that didn't occur, and you would actually put no-ops in there.
The next approach, which we'll talk more about in today's lecture, is to stall. If you have a data dependency, you can stall later instructions that depend on earlier instructions. The important thing to note here is that you freeze the pipeline until the preceding instruction has generated the value, and the hardware does this freezing. We'll develop this more today, but you actually have to freeze everything before that instruction; you can't just freeze the dependent instruction, because the traffic behind it will catch up on the earlier traffic and pile into it. So if you want a pipeline that works like this, you stall everything earlier, and we'll look at the wiring you need to do that. Another solution is to bypass. An example of this: you add extra hardware to your data path, and that extra hardware sends a value onward as soon as it gets created, so you may not have to wait for it to reach the end of the pipeline. If the data value is produced early, you can just forward it to the instruction that needs it, but that adds extra hardware and complexity to your design. And finally, a solution we'll talk about later — not in this lecture — is to speculate. If you have a data hazard, you can assume it's not a problem: just use the stale value for a little while and assume the old value equals the new value, or do data speculation; there are other ways to do this. If you make a mistake, you catch it by the time the instruction reaches the end, and you basically have to re-execute the instruction with the correct value. So you can do speculation.
This is kind of a big guessing game, but it's used in out-of-order processors in multiple ways, and we'll talk about that a couple of lectures from now. Okay, so let's look at an example data hazard executing on our processor pipeline. We have two instructions here. The first is an add-immediate of register zero plus ten into register one. In this class we use the notation where the leftmost register is the destination and the right operands are the source operands. So we add ten plus R0 — in MIPS, R0 is hardwired to zero — and put the result in R1. The next instruction exhibits a read-after-write data dependence: another addi that uses exactly the value created by the instruction right before it. It takes R1, adds seventeen, and deposits the result in R4, register four. But we have a bit of a challenge here, because it uses the result of the instruction right before it. Hm. Okay. So what happens in this design? Say our instructions are marching down the pipe. The first add is here and the second add is back here — nothing bad has happened so far. Now the first add moves here and the second add is here — still nothing bad. But the question is what we fetch out of the register file for the second add. The result, R1, is available here, but it hasn't made it back into the register file yet, so the second add is actually going to fetch the old value. Hm. That's not good. If you just played this out without any stalling or interlocking, the second instruction would read the old value of R1, not the new value we want it to read. So we need to think about this a lot harder.
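The stale-read problem above can be sketched with a toy timing model (the model and its names are my own illustration, not the lecture's): writes commit to the register file at write-back, three stages after decode, so a decode-stage read the very next cycle still sees the old value.

```python
# Toy 5-stage timing model: an instruction decoded in cycle t
# writes its result to the register file at write-back, cycle t+3.
# (Stages: fetch, decode, execute, memory, write-back.)

def run_no_interlock(program):
    """program: list of (dest, src, imm) addi-style instructions.
    Each reads regs[src] at its decode cycle and writes dest three
    cycles later; there is NO interlock, so reads can be stale."""
    regs = {f"r{i}": 0 for i in range(32)}
    pending = {}               # cycle -> (dest, value) write-back events
    observed = []              # value each instruction actually read
    for cycle, (dest, src, imm) in enumerate(program):
        if cycle in pending:                        # commit write-back due now
            d, v = pending.pop(cycle)
            regs[d] = v
        val = regs[src]        # decode-stage register read (possibly stale!)
        observed.append(val)
        pending[cycle + 3] = (dest, val + imm)      # write-back 3 cycles later
    return observed

# addi r1, r0, 10  followed immediately by  addi r4, r1, 17
reads = run_no_interlock([("r1", "r0", 10), ("r4", "r1", 17)])
print(reads)   # [0, 0] -- the second instruction reads the STALE r1
```

If you space the two instructions far enough apart that the write-back lands first, the dependent read picks up 10 instead.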
Yes, R1 is stale — oops, we made a mistake. So how do we go about resolving these hazards? We want to somehow detect these data hazards, and then feed that information back: later stages provide dependence information to earlier stages. This is a later stage, this is an earlier stage, and we feed information back here. Depending on the information that is fed back, we either stall or kill instructions. The most basic example: stage four influences stage three, and stage three can make decisions based on that — maybe stall or kill instructions. Likewise stage three influences stage two, and stage two influences stage one. But this isn't really good enough. Say stage four tells stage three to do something; if stage three doesn't tell the earlier stages, instructions are going to pile up into stage three like cars in a traffic jam. So this typically means you need higher-level feedback, where stage four gives information to all the previous stages, stage three gives information to all the previous stages, and so on, so that every stage can make a decision based on it. And this direction of control is really important: if you have feedback going the other direction, you can end up with deadlock in your processor, because an early instruction can depend on a later instruction whose resource never gets free, while the later instruction depends on the earlier one. All of a sudden you have a big cycle where everyone depends on everyone, and the machine just stops.
So it's really important that stage i+1 feeds information strictly back to stages one through i. Okay, let's resolve some data hazards and look at how we'd actually do this on our simplified pipeline. We'll use the same example: two adds with a dependence through register R1. The first thing to work out is where we need to stall — or stop, or interlock — the pipeline. Looking at the two adds, it wasn't a problem until the second add went to read the register file here. If we didn't read the register file until later, this wouldn't have been a problem, because the value might have been up to date. But because we read it so early — because we pipeline, so the point where data gets computed and the point where it gets written back land in different stages — we have this challenge. So what we're going to do is stall here, and re-read the register file over and over until the first value of R1 gets written to it. Unfortunately, we're going to be waiting a while, because we don't write the register file here, or here; we write it here, through that wire. So we're going to stall multiple cycles. And when we're stalling in this stage, we want to think hard about what should be going down the pipe during that time. We've stalled the second add, instruction two, here, while instruction one keeps flowing down the pipe, so we don't want to stall the later stages. We only want to stall this stage, and we need to stall the previous stages so instructions don't pile up into us. Now, how do we go about this from a wiring perspective?
To keep executing and let the first set of instructions clear out of the pipeline, we're actually going to insert a multiplexer here on the instruction-register side of things. This multiplexer inserts no-op instructions, or no-operation instructions — we're inserting bubbles into the pipe right here — so that the first instruction, the first add, can clear out of the pipe, and we know what's executing in each stage. We don't want the second instruction to accidentally start executing while it's stalled here, because then it could change state as it goes down the pipeline, and it would also affect the dependence calculations here. So we insert no-ops. The stall condition, as I said, goes to the program counter and the instruction register — the flip-flops before the stall point — and it also goes to the select line on this multiplexer to choose to insert no-ops. Okay, that's the beginning of this, and I want to say that this is sometimes called interlocking, or interlocks; it's important nomenclature. You're interlocking the execution of an instruction on the instructions it depends on; people also just call it stalling. Okay, so let's draw a pipeline diagram of what's going on. We plot time on the x-axis and instructions on the vertical axis, and we'll also look at the resource graph. Our first instruction goes down the pipe: it takes register zero, adds ten to it, and goes fetch, decode, execute, memory, write-back. The second instruction starts down the pipe, then the third, fourth, fifth — and you'll note something here: we've stalled the pipeline, because we need to wait for the first instruction to write back to the register file before we can read that value from the register file.
So we strictly have to have the second instruction — the dependent instruction — in the decode stage, reading the register file, on the cycle after the write-back occurs. And we detect this stall condition the whole time, as denoted by the nice purple box. As I said, we need to stall earlier instructions, so this stalls not only instruction I2 but also instruction I3, because it's in an earlier stage of the pipe and needs to be stalled. Okay. We could also graph this the other direction, and the reason that's useful is you can see where no-operations get inserted. Here we've plotted time versus stage of the pipeline, or resource, and you can see what's in the different stages. At some point there's nothing in these stages; instead we've inserted no-ops. That comes from the fact that I3 sits in instruction fetch for three cycles and I2 sits in decode for three cycles, so the later stages of the pipe get no-ops inserted — that's what the multiplexer is doing. Now that we've talked about how stalling shows up in a pipeline diagram, let's move on and look at the logic inside. Here we have the data path for our five-stage pipe. In stalling, what we're really trying to do is detect the case where an earlier instruction writes a register that a later instruction is going to use. In this case we detect that an instruction in the decode stage is reading a value that an instruction in the execute, memory, or write-back stage writes. So an uncommitted instruction writes a register, a later instruction goes and reads it, and we stall at the decode stage.
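The three-cycle stall in the diagram follows directly from the stage positions; here's a small sketch of that arithmetic (the function and its parameters are my own framing, not the lecture's):

```python
def raw_stall_bubbles(distance, wb_offset=3):
    """Bubbles (inserted no-ops) needed when a dependent instruction sits
    `distance` instructions behind its producer, registers are read in
    decode, and the producer writes them back `wb_offset` stages after
    its own decode (D -> X -> M -> W gives 3 in the 5-stage pipe).
    The dependent decode must land one cycle after the producer's W."""
    return max(0, wb_offset + 1 - distance)

for d in (1, 2, 3, 4):
    print(d, raw_stall_bubbles(d))
# back-to-back dependents (distance 1) cost 3 bubbles, matching the
# diagram; four instructions apart, no stall is needed at all
```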
When we stall, we actually want to stall everything behind it, as we've already talked about. Okay, so let's start calculating the control signal; we'll call it Cstall. We'll draw a little blob up here and call it the stall calculation. What goes into this calculation? It's a somewhat complex calculation, but the first thing we want to check is the destination operand of some earlier instruction — the register identifier, not the data value. This would be RD in a typical MIPS instruction; we'll call it WS in this calculation. And we compare it against the two source-operand register identifiers. Because there are 32 registers in MIPS, these are all five-bit values, all three of them, and we wire them all into our stall control unit. Okay. If we get a match, the most basic thing we do is stall everything earlier and insert no-op instructions later down the pipe. So the stall signal controls a lot of things: it basically controls the front end of the pipe, and it disallows the instruction here from moving forward in the pipe if anything in the later stages has the same destination operand — though so far we're comparing against just this one location. That is, we compare whether the write-back stage has the same destination register identifier as either of the two source operands of the instruction in the decode stage. Okay, so should we just always stall if one of the RS fields — the source fields here or here — matches some RD?
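This first-cut match can be sketched in a few lines (the argument names are mine: `ws` is the write-back destination identifier, `rs` and `rt` the decode-stage source identifiers):

```python
def basic_stall(ws, rs, rt):
    """First cut: stall whenever the write-back stage's destination
    register identifier (ws, a 5-bit value in MIPS) matches either
    source register identifier of the instruction in decode."""
    return ws == rs or ws == rt

# write-back is producing r1; decode reads r1 and r2 -> must stall
print(basic_stall(ws=1, rs=1, rt=2))   # True
print(basic_stall(ws=5, rs=1, rt=2))   # False
```

As the lecture goes on to show, this over-stalls: it ignores whether the later instruction actually writes and whether the decode instruction actually reads.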
Should we just always stall in that case? Well, hm — not every instruction writes a register. What if we have, for instance, a store instruction, which does not write a register? If we have a store instruction that isn't writing a register, we probably shouldn't be doing this compare, because we get better performance if we don't stall under those conditions. So we introduce a signal called write enable, WE, and WE gets wired into our stall calculation. Likewise, not every instruction reads both input operands. A good example is an immediate instruction, which reads only one of the source register operands; the other value comes from the immediate bits in the instruction encoding. So this introduces a read-enable calculation, because not every instruction reads a register. Okay, let's develop this a little more. We have something that calculates the destination — I'll talk about what this blob is in a second — and then we need to add the write-enable bit for this location in the pipe, and these read-enable signals here. These get calculated from the instruction registers at the respective locations in the pipe: some decode bits are produced there, or maybe it all gets decoded in the decode stage and we just pipeline those bits forward. We do this calculation, and now we can say: if the instruction in the decode stage matches what is being written back, the write-back value, then we stall. Okay, that's close to our full solution. Let's talk about this circle here — what's going on in it? Well, you might notice that something like a jump-and-link or a jump-and-link-register has an implicit destination.
The destination is not actually encoded in the RD field of the instruction. So instead we add a multiplexer here, which selects between the bits out of the instruction register and a hard-coded value of 31, denoting register 31. By doing that, we can handle jump-and-links and jump-and-link-registers. Okay, so to finish this out, we want to compare not just against the write-back stage of the pipe, but against all three of the subsequent stages: here, here, and here. So we add extra calculation logic that computes the write enable and the register identifier from the instruction register in each stage — or this might just be generated early in the pipe, depending on your pipeline design; that's the more traditional pipeline control. All of this, right now, ignores jumps and branches; if you introduce jumps and branches, things get a little more complicated, and we'll talk about that in the control-hazard section of today's lecture. So now let's talk about different instructions: where their source operands are, what their destination operands are, and whether every instruction in something like MIPS has all sources and all destinations. This just sums up what I said before: not every instruction reads, and not every instruction writes. As you can see here, ALU instructions read two operands and write one operand, but a store reads two operands and writes no operand. Jumps-and-links only write and don't read. So it's a mix, even in something like MIPS. One other thing to point out: where this is encoded in the instruction moves around a little. MIPS tried to make this relatively uniform, but there are examples here where the destination field changes a little between immediates and non-immediates, and that's just because they didn't have the encoding space to leave everything in a fixed location.
So: whether something writes a destination or not. We have two things: the destination — the source of the destination register identifier — and the write enable, whether it is being written or not. As you can see, there's a little case statement here depending on the instruction type, and this applies to the instruction in the later stages of the pipeline that is executing. If it's an ALU instruction, the destination comes out of RD; if it's an ALU-immediate instruction or a load, it comes out of RT; if it's a jump-and-link, it's R31 — and that says what you need to compare against. Then, whether you need to treat it as writing is a little more complex: an ALU, ALU-immediate, or load instruction writes the register file, except in the case where the destination register identifier is zero, because in MIPS the zero register is a throwaway register — you don't need to interlock against it. You wouldn't be incorrect if you did interlock against it; you'd just have slower performance. Jump-and-link and jump-and-link-register always write, and everything else doesn't write the register file. So that's the first part of our calculation. Now we need to calculate whether we actually read the value, and there will be two of these: one for the first operand, one for the second operand. Okay, let's build this up for the different instructions — basically transforming the table into the logic equations we're going to use. ALU, ALU-immediate, loads, stores, and branches all read; jump-register and jump-and-link-register also read, at least the first source operand. So RE1 gets set to true, or one, for any of these opcodes. But for jump and jump-and-link, which don't read a first operand, the comparison has to not fire against this value, otherwise you'd be stalling too often.
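The case statement above can be written down directly; here's a sketch (the opcode-group tags are my own encoding of the lecture's table):

```python
def dest_and_we(op, rd, rt):
    """Destination register identifier (ws) and write enable (we) for
    an instruction in a later pipeline stage, following the lecture's
    case statement. Returns (None, False) for non-writing instructions."""
    if op == "ALU":                    # register-register ALU: dest in rd
        ws = rd
    elif op in ("ALUi", "LW"):         # ALU-immediate and loads: dest in rt
        ws = rt
    elif op in ("JAL", "JALR"):        # jump-and-link: implicit r31
        ws = 31
    else:                              # stores, branches, plain jumps
        return None, False
    if ws == 0:
        return ws, False               # r0 is hardwired zero: no interlock
    return ws, True

print(dest_and_we("ALU", 5, 9))    # (5, True)
print(dest_and_we("LW", 5, 9))     # (9, True)
print(dest_and_we("SW", 5, 9))     # (None, False)
```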
For the second operand, only true ALU instructions — not immediate instructions — and stores read that second operand; everything else doesn't. Okay, so now let's put together the actual stall signal; this is the stall signal for the decode and fetch stages of the pipe. We end up with stall being: a comparison between the source register identifier in the decode stage and the write register identifier in the execute stage, ANDed with whether that instruction is actually writing; OR the same calculation against the memory stage; OR the same calculation against the write-back stage. Then we take this whole expression and AND it with whether we actually have a read enable for the first source operand, because if we don't read the first source operand, there's no reason to stall for it. And we do a similar thing for the second source operand: we use the RE2 we derived here, and AND it with an expression asking whether RT, the second source register identifier in the instruction, is the same as the destination register identifiers of the instructions in the later stages of the pipe. Okay, so is this everything? Hm, it looks pretty complicated — though it's not so bad so far. If we make the pipe longer, we end up with more terms inside these two equations. Well, no, that's not quite the full story. What are we missing? Why else would we have to stall the pipeline? Unfortunately, this only accounts for instructions whose destination value is available right at the end of the execute stage — these two comparisons encapsulate exactly that. Something like a load doesn't fit that pattern, because the load value is not ready until all the way down here.
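Putting the pieces together, the decode-stage stall equation can be sketched as follows (each later stage contributes a `(ws, we)` pair as in the derivation above; the helper names are mine):

```python
def stall(rs, rt, re1, re2, later_stages):
    """later_stages: (ws, we) pairs for the instructions currently in
    execute, memory, and write-back. Stall the decode/fetch stages when
    a source register we actually read (re1/re2) matches a destination
    that an uncommitted later instruction will actually write (we)."""
    def match(src):
        return any(we and ws == src for ws, we in later_stages)
    return (re1 and match(rs)) or (re2 and match(rt))

# addi r4, r1, 17 in decode while addi r1, r0, 10 sits in execute:
print(stall(rs=1, rt=0, re1=True, re2=False,
            later_stages=[(1, True), (None, False), (None, False)]))  # True
```

Note how the read enables do real work: a store in a later stage shows up as `(None, False)` and never triggers a stall, and an immediate instruction in decode (`re2=False`) never stalls on its unused second operand.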
So we might need to insert some extra stalls for that. Also, loads and stores are more complicated because you can have a data dependence through the data memory itself. In this example we have a little snippet of code: a store that takes register two and writes it to some place in memory, and then a load that reads from some place in memory and puts the result into register four. Okay, so the question comes up: is there any possible data hazard here? Yes — what if R1 plus seven equals R3 plus five? Then we have a case where the load needs to pick up the data value of the previous store, if and only if this address equals that one. Hm. Okay, so that's not so bad. Let's look at these data hazards a little more and figure out how to derive the equation to check for them. Just to recap, our example is: we store R2 into one location, and we read from possibly the same location — we don't know. So what if R1 plus seven equals R3 plus five? We'd be writing and reading the same address back to back in time. Well, in this pipeline the hazard is actually avoided, because our memory system is so fast: everything goes down the pipeline in order, the store goes right to the memory, and on the next cycle we can read out of that memory and pick up the new, changed value.
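The aliasing condition is just a comparison of the two effective addresses; a sketch (the field names are my own, and `regs` maps register number to value):

```python
def mem_raw_hazard(store_base, store_off, load_base, load_off, regs):
    """A load after a store must see the store's value exactly when the
    effective addresses collide -- regs[r1] + 7 == regs[r3] + 5 in the
    lecture's example."""
    return regs[store_base] + store_off == regs[load_base] + load_off

regs = {1: 100, 3: 102}
print(mem_raw_hazard(1, 7, 3, 5, regs))   # 100+7 == 102+5 -> True
```

Unlike the register hazards, this check depends on data values, not just instruction fields, which is why it can't be resolved by decode-time comparisons alone.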
But I want to introduce this because in more realistic memory systems it requires much more careful handling: if you have a memory system in which the store takes multiple cycles to happen, or the store happens, say, at the end of the pipe into the memory, then you're not necessarily going to get that value, and you might need to bypass it or do something more intelligent. Okay, so we've talked about stalling the pipeline. Now let's look at improving performance some more. One thing you may not have noticed about that stall, but which did happen, is that whenever an instruction in the later portion of the pipeline had a dependent instruction in decode, the decode stage simply stopped: at no point did we actually forward the data values early. Now we want to talk about forwarding and bypassing — adding extra data paths that allow a value to be sent from a later stage to an earlier stage, faster than waiting for the write-back of the pipeline to occur. So here we have the data path we had before. What I'm trying to get at is: you have an instruction here, and if there is any instruction that writes a register identifier this instruction wants to read as a source operand, it's going to stall. But, a little bit of insight: if you have an add, you can actually try to read this value early. Our data path just isn't good enough to do that right now. Okay, so let's add in a bypass. We add a path that takes the result of the ALU, turns it around, and puts a multiplexer here. Using a signal similar to our stall signal, we can detect whether two operands match, and if they do, we take the result value out of the ALU early and run it through this multiplexer.
Okay, so an important question: does this help our earlier example from a performance perspective? The stalling logic we put in was good enough to make sure there wasn't an error, but not good enough for performance, because you had to wait for the value to reach the register file before going ahead. Here we have the same example from earlier in class: something that writes register one and something that reads register one, two ALU add instructions back to back. Does this help? Well, yes. You can clearly see that the result here comes back around, and we effectively don't have to stall the second instruction, because it can pick up that data value right then and there: the value gets calculated in this stage and loops around real fast. So no stall at all in this case. Okay, quick quiz question — two other cases. We have a memory operation, a load, followed by an add; and we have a jump-and-link followed by an add. Does this bypass, right here, help in these two cases? We said it helped in the first case. Well, when does the load result get calculated? The load result isn't calculated until the output of the data memory, right here — that's after this bypass, so it's too late. We still need to stall the pipeline for a load with a dependent instruction. So we stall there. Okay, now a trickier one: jump-and-link 500, then something that reads R31. A little background on the MIPS instruction set: jump-and-link implicitly writes register 31 — it's the link register. So that means we have a data dependence. What gets put into R31 by the jump-and-link?
It's the program counter, or the program counter plus four — that's how it's architected in MIPS; you could probably build it either way depending on how you do jump-register. So at first look this seems like it should solve the problem: we should be able to bypass the result of the jump-and-link to right where it needs to go. Mm, it's a little unsatisfying, though, because if you look at the rest of the pipe — if you have a jump in, say, the execute stage — is the consumer of that instruction going to be here or not? This one's kind of a trick question. So, does it help? Well, you can bypass out and around, but the thing behind it in the pipe is probably not going to be the appropriate instruction. If I were to answer the question, I'd probably say no: at least in this pipe drawing, you're not going to be executing the subsequent instruction — even if it's the instruction at 500 — yet. You'll probably have to wait for that jump to resolve somewhere further down the pipe, and then go pick it up. So bypasses don't always help, especially in something that isn't a fully bypassed pipeline. Okay. Oh, before I move off this slide: this is called bypassing, and sometimes it's also called forwarding values; we'll use those terms interchangeably in this class. Okay, so now we get into more detail and start looking at how to derive the bypass signal. We'll build it the same way we derived the stall signal, taking terms out of the stall calculation we had before. If you recall, we have the pipeline diagram here with the stall signal: we stalled stages, and we ended up having to stall in the case of an ALU op followed by a dependent ALU op.
Each stall or kill introduces a bubble into the pipeline, and this gives us a clocks-per-instruction greater than one. With the new data path, which bypasses out of the ALU into input operand A, we can see that we can remove all these stalls and just do the bypassing, so it actually shrinks the time taken to execute this code. This new data path is really a great thing: the bypass takes us from greater than one clock per instruction to one clock per instruction. We're forwarding out of the execution unit into the decode stage, where it gets consumed by the execution unit, at time three, for instruction two. Okay, so let's derive the bypass signal, starting from our original stall signal — this is just the stall signal we had before. The first thing: look at this case right here, where we compared the execute-stage write destination to the decode-stage first source operand. We don't need that term anymore; we added a bypass, and a forwarding signal, to handle that case, so we just put a line through it. The next question: in this diagram we added a multiplexer to choose between reading from the register file and taking the data that came out of the arithmetic logic unit. What is the control on that multiplexer? Well, it's the exact same case we just crossed out: when that case is true, we want to do the bypass. So we take those terms and put them here, and that's the control on the multiplexer, ASrc. Is this correct? Hm. Is this the full story? Unfortunately, no — but it's close, really close. It looks like it should work, but only ALU and ALU-immediate instructions can benefit from this. If you have something like a load, you need to wait for the data value to show up.
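The term crossed out of the stall equation becomes exactly the multiplexer control; a sketch (the `asrc` name stands in for the lecture's ASrc select line):

```python
def asrc_bypass(rs, re1, ex_ws, ex_we):
    """Select the ALU-output bypass for operand A when the instruction
    in execute will write (ex_we) the register (ex_ws) that the decode
    instruction actually reads (re1, rs). This is the same term that was
    removed from the stall equation once the forwarding path exists."""
    return re1 and ex_we and ex_ws == rs

print(asrc_bypass(rs=1, re1=True, ex_ws=1, ex_we=True))   # True: forward
print(asrc_bypass(rs=1, re1=True, ex_ws=2, ex_we=True))   # False: read regfile
```

As the lecture notes next, this is still incomplete: `ex_we` here has to mean "writes a value that is already available at the ALU output", which excludes loads.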
So this A-source signal needs to have some component saying, make sure it's not a load. And up here, we actually reintroduce that term, checking to see whether it's a load. So what we're going to do is split the write-enable into two components: the write-enable that you bypass on, and the write-enable that you stall on. We reintroduce these two components with two slightly different write-enables, dependent on the decode of the instruction in the execute stage of the pipeline. Okay, so let's do that. We still have this term in the stall signal, and we still have this term in the bypassing signal, but we now have two different write-enables: one for the bypass calculation and one for the stall calculation. These two signals are calculated based on the decode of the instruction in the execute stage. We bypass only when it's an arithmetic-logic-unit op or an immediate arithmetic instruction, and the destination is not register zero. And we stall if it's a load; jump-and-link and jump-and-link-register also fall into that case. That's when we have to do the stall, because of how jump-and-link and jump-and-link-register write their results: at least in this data path, we only have this one multiplexer here, writing register 31 at the end. You can build data paths which have different multiplexers for that, and you might be able to remove that clause from this. Okay, so what we notice here is that loads, jump-and-links, and jump-and-link-registers are going to stall when we have a match on the registers, while something like an ALU instruction falls under the bypass write-enable, does not stall, and uses the bypass logic instead. Okay, so let's take a look at what this looks like for a fully bypassed data path. In our fully bypassed data path, we're going to take all the destination values out of here and out of here.
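The split into two write-enables might look like this in Python; the opcode classification is a simplified stand-in for the real decode logic, and all names are my own.

```python
# Two flavors of write-enable, decoded from the instruction in execute.
# Opcode sets are simplified, hypothetical stand-ins for real decode.
BYPASSABLE = {"alu", "alu_imm"}          # result ready at end of execute
MUST_STALL = {"load", "jal", "jalr"}     # result not ready until later

def we_bypass(op, rd):
    # Bypass only ALU / ALU-immediate results, and never for r0.
    return op in BYPASSABLE and rd != 0

def we_stall(op, rd):
    # Loads (and JAL/JALR, in this data path) force a stall on a match.
    return op in MUST_STALL and rd != 0

print(we_bypass("alu", 3), we_stall("alu", 3))    # True False
print(we_bypass("load", 3), we_stall("load", 3))  # False True
```

An ALU op with a register match triggers the bypass path; a load with the same match triggers the stall path, exactly the split described above.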
We're going to run those back, and we're going to add two big multiplexers here, because in our first case we only multiplexed the first source operand, the A source operand, but we actually want to multiplex the inputs for both A and B, the two source operands. We're also going to add the PC here for the jump-and-link; that handles some of the more complex pieces, because otherwise we'd have to put multiplexers here for multiplexing the PC into R31 or something like that. So we've effectively been able to bypass everything here. The question is: is there still a need for the stall signal? This is more than what we had before, more than just the A source. We can now bypass not only out of here to there, but also out of after the memory operations. So maybe this changes our stall signal so that we don't need to stall on loads anymore. That would be great; we'd have better performance. Well, unfortunately, no. We still need this. You still need to check whether the opcode is a load in this stage of the pipe, even with a fully bypassed data path. We've resolved a bunch of the data hazards, but the instructions dependent on loads still need to wait, because you don't know the result of the load until you come out of here. So you can't issue a subsequent instruction into the ALU stage early; you need to stall. But this is basically our full stall calculation at this point. Because we added all those bypasses, we've removed a lot of the other complexity from our stall signal. And in this case you'll see that loads have a latency of two cycles. Okay. So, as I said, the last technique you can look at is speculation, where you try to guess things: guess data values, guess things like that, or try to execute code out of order. We're going to talk about that later in the course; that's not really in today's lecture.
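Even fully bypassed, the load-use check is the one register-compare stall that survives; here is a small Python model of that remaining condition, with an invented encoding purely for illustration.

```python
# In a fully bypassed pipeline, the only register-compare stall left is
# the load-use case: the loaded value doesn't exist until after the
# memory stage, so a dependent instruction in decode must wait a cycle.
# This is what gives loads an effective latency of two cycles.
def load_use_stall(op_EX, rd_EX, rs_ID, rt_ID):
    return op_EX == "load" and rd_EX != 0 and rd_EX in (rs_ID, rt_ID)

# lw r2, 0(r1) ; add r3, r2, r4 -> one bubble, then bypass from memory.
print(load_use_stall("load", 2, 2, 4))  # True: stall one cycle
print(load_use_stall("alu",  2, 2, 4))  # False: ALU result bypasses
```

The same dependent `add` that stalls behind a load sails through behind an ALU op, which is why adding the bypasses simplified the stall signal down to essentially this one clause.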
It's not really review material, but we will discuss it to some extent. So now we're going to move on to talking about control hazards, and because we're running a little low on time, we'll look at that a bit more in the next lecture.