So in this processor we're going to have basically three pipelines: a long multiply pipeline, a memory pipe that, say, takes two cycles, and then a short ALU pipe on the top. We have a scoreboard like we had before, and it's going to track where data is available in the pipe, with the architectural register file sitting at the end. This is roughly similar to the in-order issue, in-order write-back, in-order commit processor, but there are some interesting differences. If you compare this picture to the previous one, we've dropped all those extra pipeline stages, which is pretty cool. We don't have to bypass out of them anymore; we can just shove the data into the architectural register file. And if we preserve read-after-write, write-after-write, and write-after-read dependencies, things should be okay. Let's see if we can actually do that. So let's first take a look at the scoreboard for this in-order fetch, in-order issue, out-of-order execute and write-back, out-of-order commit processor. The scoreboard looks very similar to the fully in-order machine's, and we can use it to track structural hazards on the write-back port. This is really important: if you have, let's say, a multiply and then an add after it, it's possible that under pipelining the multiply and the add have to use the write-back port at the same time, so you get a structural hazard. We're going to show a pipeline diagram of that happening in the next slide. We still don't actually need a more complex scoreboard. Briefly, someone asked last lecture whether the scoreboard needs to track which functional unit each value is going down. We don't need that for this relatively simple pipe, because for a write-after-write dependence we are basically just going to stall. If you want to break that restriction, then you need to start tracking more complex things in the scoreboard, and you may even want something like register renaming, which we'll talk about next class; that's going to allow you to dynamically break a write-after-write dependence in the processor. An important point here: because these pipe stages have different lengths now, the entries can start in different places in the scoreboard. If we go to execute a long instruction, its entry goes into the first slot of the scoreboard and marches down every cycle, tracking the information that goes down the pipe. But if we're trying to execute, say, an add instruction, it doesn't have to wait four cycles to get into the architectural register file, so we can insert a one closer to the end, and it just marches down the balance of the pipe from that location. And because we are not going to allow write-after-write hazards on a particular register, you're never going to have a case where there are multiple inversely ordered bits in this table for the same register. If you allowed that, you would need a more advanced scoreboard, and I'll show a picture of that a little later in today's lecture. Okay, so let's go through an example of how to use the scoreboard and walk through an in-order fetch, in-order issue, out-of-order execute and write-back, out-of-order commit processor.
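To make that scoreboard check concrete, here is a minimal software sketch of the idea: one slot per "cycles until write-back," with the issue stage checking both the write-back port and write-after-write hazards before letting an instruction in. The class, the latencies, and the shift-register representation are illustrative assumptions, not the exact hardware on the slide.

```python
# Minimal sketch of a write-back scoreboard (assumed latencies, not the slide's).
LATENCY = {"MUL": 4, "LD": 2, "ADD": 1}   # assumed cycles from issue to write-back

class Scoreboard:
    def __init__(self, depth=4):
        # pending[k] holds the destination register (or None) of the instruction
        # that will use the write-back port k cycles from now.
        self.pending = [None] * (depth + 1)

    def can_issue(self, op, dest):
        k = LATENCY[op]
        if self.pending[k] is not None:      # structural hazard on the write port
            return False
        if dest in self.pending:             # WAW hazard: an older write still in flight
            return False
        return True

    def issue(self, op, dest):
        self.pending[LATENCY[op]] = dest

    def tick(self):
        # Entries "march down" one slot every cycle; slot 0 is the write-back itself.
        self.pending = self.pending[1:] + [None]

sb = Scoreboard()
sb.issue("MUL", "R5")                 # the MUL will write R5 four cycles from now
for _ in range(3):
    sb.tick()                         # three cycles later...
print(sb.can_issue("ADD", "R12"))     # False: the ADD would collide on the write port
```

The usage at the bottom mirrors the multiply-then-add case: the add is refused because the multiply already owns the write-back port in the cycle the add would need it.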
So here is the same code sequence we had in the previous example; we're going to be using it throughout the class. Let's take a look at this pipeline diagram and notice a few things. First, let's look at some read-after-write hazards and what the pipeline has to do. MUL R5, R1, R4 reads R1, and R1 is created by the multiply in instruction zero. So instruction two actually has to wait to get that value through the bypass, and you can see here that it's basically just stalled. What's happening is that with in-order issue, you can't go try to issue subsequent instructions underneath that stall. Later in today's class we're going to talk about pipelines that have out-of-order issue, so that while this instruction is waiting around you can think about issuing the next instruction that's not dependent on the previous one. By doing that we get more performance, because we can reorder instructions and try to keep our functional units, our ALUs, as busy as possible. But for right now, we have a bypass coming out of Y3 of this multiply down into the register-file access stage of the next multiply. So that's one thing that's going on. Let's take a look at another read-after-write dependence: register eleven gets written here and read down there. What do we do special for this? Well, as the producing instruction marches down the pipe, the write happens here. Let's see where the read for instruction four tries to happen: it tries to happen there, and at that point the data is already in the architectural register file, so we don't have to worry about any funny bypassing or anything like that. You can work through the rest of what's going on here yourself, but there are a few other things I wanted to point out in this picture. First, because you have different pipe lengths, you can see that this add here writes the architectural register file before a previous instruction, in program order, writes the register file. This has some large consequences when you start to think about what happens if, let's say, the multiply that preceded the add took some sort of fault or exception, because now you've changed the architectural register file before that other instruction finished, and the other instruction didn't actually finish. So what does that mean? One other thing I wanted to point out in this picture, which is a really interesting case, is right here. This add instruction is dependent on R12. R12 gets created here, and at the end of this stage it's ready to bypass. So if we look down we'd say, well, this instruction doesn't even try to read from the bypass until here, so that value is ready. But for some weird reason this instruction stalls. Can anyone see what's going on with that instruction? All of its inputs are ready. It's ready to go, but there's an issue. Okay. What's happening here is that if you were to remove this stall in the issue stage, this instruction would get pulled forward, and now you'd have this write and that write happening at the same time. You'd have a structural hazard on the write port of the register file there, so you have to stall.
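As a rough illustration of those two read-after-write cases, here is a toy timing check that decides whether a consumer stalls, catches the value on the bypass, or simply reads it out of the architectural register file. The cycle numbers are assumptions chosen to mimic the diagram, not the slide's exact timing.

```python
# Toy check: where does a consumer get its operand? (assumed cycle numbers)
def operand_source(value_ready_cycle, writeback_cycle, consumer_read_cycle):
    if consumer_read_cycle >= writeback_cycle:
        return "architectural register file"   # producer has already written back
    if consumer_read_cycle >= value_ready_cycle:
        return "bypass network"                # value exists in the pipe, not yet written
    return "stall"                             # value not produced yet, consumer waits

# Assumed timing: a MUL producing R1 has its result ready out of Y3 at cycle 4
# and writes the register file at cycle 5.
print(operand_source(4, 5, 3))   # 'stall' -> the dependent MUL waits, as in the diagram
print(operand_source(4, 5, 4))   # 'bypass network' -> picked up out of Y3
print(operand_source(4, 5, 7))   # 'architectural register file' -> the R11 case
```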
Okay, so let's take a look at how that shows up in the scoreboard. What we have here is cycles across the top: there are nineteen of them, though the first one is zero and I don't draw it. If we look at, say, this cycle here, instruction one, this add, is in the I stage, the issue stage of the pipe, so it's looking for its operands. What's going to happen is that this add needs to check that it's not going to conflict on the write port with the previous MUL. In this case it doesn't. But when this instruction moves here, this is what I was saying: it doesn't put 1s in all four locations; instead the entry marches down. It starts at the slot two cycles before it writes the register file, because its pipe is shorter. So it checks that location and asks: for the register I'm trying to write, is anything else currently scheduled to write in two cycles? Our scoreboard can answer that question. If there were a one in this box, we would know that there was a MUL, or some other long-latency instruction, that we would conflict with. So this instruction moves here, the entry clocks down every cycle and moves through our scoreboard, and it's going to conflict at that point, so we know we're going to have a write hazard on the register file. We can see those things happening in our scoreboard. We can also use the scoreboard to detect that real case from before, the last add, and we're going to see that show up over here, so let's try to find it. Okay, so we have instruction six. It wants to move forward in the pipe, but in this cycle instruction six should be in the issue stage and doesn't move out of it; it's sitting in the issue stage. It looks at this location, which is basically two cycles until the end of the pipe, and sees that there's a one there; that's what the box is trying to indicate. So it looks there and says, oh, there's a one there, that means I can't issue and I need to stall, and we see the stall show up. These other boxes are here to represent the checks made by the other adds and MULs against these locations; actually, this is just the other add: it checks here, sees a conflict, and has to check again the next cycle, and that's why there are four little boxes stacked vertically on that chart. One other thing about this representation: you can see that R1 is being written and has a long lifetime in the pipe, while this other register has a shorter lifetime in our scoreboard, because it's written by an add instruction. Okay, do we have any questions about that before we move on to a more complex pipe? Yes, this assumes a fixed latency for every instruction, that's correct, or at least per pipe or functional unit in the pipe. You can definitely have functional units with variable latency, and there are two good examples of that.
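Here is a standalone sketch of that per-cycle issue check, assuming a four-cycle multiply and a one-cycle add that arrives three cycles later: the stalled add re-checks the scoreboard every cycle until the multiply's entry has marched past the slot the add would need. The latencies and the head start are assumptions for illustration.

```python
# Standalone sketch of the stall-and-recheck behavior at issue (assumed latencies).
MUL_LATENCY, ADD_LATENCY = 4, 1

# One slot per "cycles until write-back"; True means the write port is claimed.
writeback_slots = [False] * (MUL_LATENCY + 1)
writeback_slots[MUL_LATENCY - 3] = True   # the MUL issued 3 cycles ago: 1 cycle to write-back

cycle = 0
while writeback_slots[ADD_LATENCY]:
    print(f"cycle {cycle}: ADD stalls in issue (write port claimed {ADD_LATENCY} cycle out)")
    # Scoreboard entries march one slot toward write-back each cycle.
    writeback_slots = writeback_slots[1:] + [False]
    cycle += 1

writeback_slots[ADD_LATENCY] = True
print(f"cycle {cycle}: ADD issues and claims the write port {ADD_LATENCY} cycle out")
```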
One is something like a divider unit. Sometimes people build divider units that keep dividing until they're done; it's a way to shorten the length of a divide, so it has a variable latency. Another good example is a load that misses in your cache in an out-of-order processor, where you have to wait for the load to come back. There are good ways to handle that in a scoreboard. Sometimes scoreboards will just have an extra, special bit on the side for each destination register which says: this register is just out to lunch; it's in some long, variable-length pipeline, I don't know what's happening with it, don't try to bypass it, don't try to do anything special with it, just wait for it to come back and for that bit to clear. Processors I've built will typically just have an extra bit for these variable-latency cases, maybe one for the divider and one for the load-miss case, so that if that exceptional case happens, or the load misses, or you go execute a divide, which has variable latency because a divide can take anywhere from about two cycles up to about twenty cycles in some pipelines, and everything in between, you just mark a bit in the scoreboard saying this register is not ready. Then, if someone tries to read that register, it knows to stall. It's a lower-performance way to deal with it, but that's a tough case to handle; a scoreboard can help there, and it's just extra information in the scoreboard. Okay, so like I said, we have this out-of-order commit processor: it's doing out-of-order write-back and out-of-order commit. Out-of-order write-back may be okay: we maintain our write-after-write dependences, so we're not going to end up with incorrect state in the architectural register file because of that. But something bad can happen when we go and try to take an exception. So let's say we have the same instruction sequence we've been looking at up until this point. We're wandering down the pipe, and this instruction here takes some sort of fault, and it's figured out at the end of the pipe, at our commit stage, so the multiply goes all the way down to the end. Multiplies don't take a whole lot of interesting faults, but let's say it takes some sort of exception. What is going to happen? Well, that instruction is dead, and all the instructions after it are dead, because it took a fault. Unfortunately, we already wrote the register file: done, done, done, done, done. So now we end up in our trap handler, or our exception handler, or interrupt handler, and all of a sudden register eleven is just wrong; it has the wrong architectural value. So this is one of the reasons why people try not to build out-of-order commit: it gets tricky to have out-of-order commit with precise exceptions. Now, there are some ways to do it. One way is to limit the types of instructions. So if you have an in-order issue, out-of-order write-back, out-of-order commit machine, what you could think about doing is this: you know this instruction doesn't write until this point here, so what if we resolve all of our previous exceptions before then?
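Here is a minimal sketch of that "extra bit on the side" idea: one busy bit per destination register for variable-latency operations, with consumers simply stalling until the bit clears. The register numbering and the interface here are assumptions for illustration, not a particular processor's design.

```python
# Minimal sketch of per-register busy bits for variable-latency operations.
class BusyBits:
    def __init__(self, num_regs=32):
        self.busy = [False] * num_regs

    def start_variable_op(self, dest):
        # A divide or a cache-missing load issues: mark the destination "out to lunch".
        self.busy[dest] = True

    def complete(self, dest):
        # The variable-latency unit finally writes back: clear the bit.
        self.busy[dest] = False

    def reader_must_stall(self, *srcs):
        # Any consumer of a busy register just waits -- no bypass is attempted.
        return any(self.busy[r] for r in srcs)

bb = BusyBits()
bb.start_variable_op(7)            # e.g. a load into R7 that missed in the cache
print(bb.reader_must_stall(7, 3))  # True: the consumer stalls in issue
bb.complete(7)
print(bb.reader_must_stall(7, 3))  # False: the value is back, issue proceeds
```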
If we move our commit point earlier in the processor, we can actually make this work and have precise exceptions. So if our commit point is, say, in the memory-one stage, or the first stage of the multiplier, or something like that, then at that point we haven't written any state here that's wrong; the write-back hasn't happened yet, so we can still kill everything downstream. Unfortunately, that means you can't have an exception happening here, here, or anywhere else later in your pipe. So that's a problem. You can limit the types of exceptions, push your commit point early, and still have an out-of-order commit processor with precise exceptions, but even that is tricky. So, this is a great question: why can we not have two commit points? Some processors do have two commit points, and some processors have what's called a sliding commit point, where you try to commit things early and then, for certain types of instructions, you move the commit point later. But typically you want one big point in the pipe where you say: past this point, all of the state has been committed, and those instructions cannot be rolled back or undone. There are examples where people use a sliding commit point; I've actually built a processor which has a moving commit point. But it gets tricky, because what it basically means is that certain types of instructions cannot execute after certain other types of instructions, because if they do, they'll violate that sliding commit point. Like this example here: if the fault can be taken here, there's no way to solve this problem. But you could have something with a sliding commit point where, if you have a MUL followed by, let's say, an add, you can slide the commit point out. So there are processor ideas where you try to have a sliding commit point, but otherwise you have to check, and that gets quite a bit more complicated; I don't really want to get into that today, so let's leave it for an advanced-topics discussion. In this class we're going to say we want one commit point, and we want it to stay in one place in the pipe: the canonical location such that, past that point, all of the data that was in flight has executed and is committed, and we need to know that one location.
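As a rough sketch of that single-commit-point rule, assuming a commit point placed early in the pipe: a fault raised at or before that stage can be made precise by flushing, while a fault discovered later cannot, because shorter instructions may already have written the register file. The stage numbers here are illustrative assumptions, not the slide's pipeline.

```python
# Sketch of the early-commit-point argument (assumed stage numbering).
COMMIT_STAGE = 2   # assumed: around the memory-1 / first multiplier stage

def take_fault(stage):
    if stage <= COMMIT_STAGE:
        # No instruction at or before the commit point has written architectural
        # state yet, so squashing everything in flight gives a precise exception.
        return "precise: flush pipeline, jump to handler"
    # Past the commit point, a younger, shorter instruction (the ADD writing R11
    # in the example) may already have written the register file.
    return "imprecise: architectural state already modified"

print(take_fault(1))   # fault caught before the commit point
print(take_fault(4))   # fault discovered late, like the MUL in the example
```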