Let's move on to memory disambiguation. This is the analog of read-after-write hazards, but through the memory system. We have a basic instruction sequence here: a store, followed by a load. When can we execute the load? That's a tough one. If the load is not dependent on the store, we can just execute out of order; in an out-of-order machine we could even execute the load before the store. But if the load is dependent on the store, say register R2 is equal to register R4, then we're going to have a problem. So how do we go about solving this? Up until this point, in our pipelines, we've said loads and stores execute in order, because that sidesteps these problems. So let's say we execute all loads and stores in order, but still allow other instructions to move around them. Memory instructions are in order, everything else is out of order relative to them, and that guarantees we won't have any problems. But we're leaving performance on the table. We could go faster: we could execute loads and stores out of order whenever R2 and R4 are different. We're just going to need some extra structure to do that. Okay, so let's look at a conservative out-of-order load example. In the conservative scheme, we split the store into two sub-parts: an address computation and the actual data write. Because we do the address computation early, we can know that a subsequent load is not going to alias, that is, we know R4 is not going to be equal to R2, and in that case we can execute the load before the store. Unfortunately, we still need to check every load against all previous uncommitted stores, so we need some structure to do that.
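As a rough sketch of that conservative rule (hypothetical Python, not from the lecture's slides): a load may issue out of order only once every older uncommitted store has computed its address and none of those addresses match the load's.

```python
# Sketch of the conservative disambiguation rule (illustrative): a load
# may issue early only when every older uncommitted store has a known,
# non-aliasing address.

def load_may_issue(load_addr, older_stores):
    """older_stores: oldest-first list of dicts; 'addr' is None while
    the store's address computation hasn't finished yet."""
    for st in older_stores:
        if st["addr"] is None:
            return False   # unknown store address: must wait
        if st["addr"] == load_addr:
            return False   # aliases an older store: must wait
    return True

# Example: one older store whose address is already computed.
stores = [{"addr": 0x1000}]
print(load_may_issue(0x2000, stores))  # True: no alias, load can go early
print(load_may_issue(0x1000, stores))  # False: the R2 == R4 case
print(load_may_issue(0x2000, [{"addr": None}]))  # False: address unknown
```

Note the third case: even a load that would not alias must stall whenever any older store address is still unknown, which is exactly the performance problem the next refinement attacks.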
The problem with the conservative scheme, and it's a bummer, is that we're not able to execute a load if we don't yet know the addresses of all the stores before it, because we just don't know whether that load is safe to execute. Can we do one better? Let's start speculating: let's guess that the addresses are not equal and execute the load before the store. We hold off committing the load and store instructions, and we commit in order. If something bad happens, that is, if the registers do equal each other, we have to replay those instructions. And what's annoying is we also have to kill all the subsequent instructions: anything after that load and store that could have picked up a value from the load, any instructions dependent on the load, we have to go kill all that. Depending on how precise we want to be, we can either kill everything after it, or selectively kill just the things that are dependent on it, though that dependence is really hard to track. So we have a pretty high penalty for inaccurate address speculation: if we put these two instructions into the pipe out of order and they turn out to be dependent on each other, we have to roll back a lot of state. One heuristic for this, used in the Alpha 21264, is called memory dependence prediction. You still guess that the loads and stores do not alias. But later, if you find that the two registers do equal each other, you squash the instructions, and then you do something special to that load: you mark it to wait for the store, you sort of put a barrier in, so you don't take the same rollback over and over again.
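A toy model of that speculate-then-check policy (hypothetical Python, not the actual hardware logic): the load executes early, and at commit time the addresses are compared; on a mismatch the load and everything younger are squashed and replayed.

```python
# Toy model of address speculation (illustrative): the load was executed
# ahead of the store; at commit we compare addresses and, on an alias,
# squash the load plus all younger instructions and replay them.

def commit_check(store_addr, load_addr, younger_insts):
    if store_addr == load_addr:
        # Misspeculation: the load read stale data.
        return {"squash": True, "replay": ["load"] + younger_insts}
    return {"squash": False, "replay": []}

print(commit_check(0x40, 0x80, ["add", "mul"]))
# no alias: speculation was correct, nothing to replay
print(commit_check(0x40, 0x40, ["add", "mul"]))
# alias: squash and replay the load plus the younger add and mul
```

The second case is the expensive one: the replay list grows with everything in flight behind the load, which is why repeated misspeculation on the same load is worth preventing.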
This is a conservative response to seeing that you're rolling back a lot; there's a lot of thrown-away work happening. Instead, you can say: I think this load is dependent on that store, and you can even remember that across executions. If it's always the case that this load depends on that store, you could keep that information alongside your instruction cache or something like that. Then, when you go to execute that same load at a different time, it waits for all earlier stores to complete; it barriers. This is a prediction, just a heuristic, but it's one way to keep that load from replaying every time. And if you have multiple dependent loads on one store, you can keep those loads from replaying multiple times. The big advantage is that when you re-execute that load sometime in the future, you're not going to flush the entire pipe. It's kind of like branch prediction, if you will: I've trained the predictor to say that this load is usually dependent on a previous store, so just wait for the other stores to clear out of the pipe. It's a cute little trick. Okay, we're almost out of time. All right, speculative loads and stores: we're going to introduce a store buffer to hold the speculative state. Let me briefly flash up what a store buffer looks like, and then we'll think about it for a second. Here you have your cache, your L1 data cache, we'll say. We send every address to the L1 cache and also to this other structure, the speculative store buffer. Inside the speculative store buffer there are bits saying whether each entry is valid, and loads check against it.
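A minimal sketch of that trained predictor, in the spirit of the 21264-style heuristic just described (hypothetical Python; one "wait" bit per load PC is an assumption here, standing in for whatever the real hardware stores):

```python
# Sketch of a memory dependence predictor (illustrative): once a load
# is caught aliasing an older store, set a bit for that load's PC so
# future instances wait behind all older stores instead of speculating
# and being squashed again.

wait_bit = {}  # load PC -> True if this load should wait for stores

def should_speculate(load_pc):
    # Default prediction: loads do not alias, go ahead of stores.
    return not wait_bit.get(load_pc, False)

def on_misspeculation(load_pc):
    # Train the predictor: next time, barrier behind older stores.
    wait_bit[load_pc] = True

pc = 0x400
print(should_speculate(pc))  # True: first encounter, guess no alias
on_misspeculation(pc)        # caught aliasing; squash once and train
print(should_speculate(pc))  # False: now it waits, no repeated rollback
```

This mirrors the branch-prediction analogy from the lecture: one bad outcome trains the predictor so the pipe-flush penalty is paid once rather than on every execution of that load.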
If the data hasn't actually gone into the cache yet, the load can pick up the new value here. The S bit tells us whether the store is still in flight: it's possible we've allocated an entry here, but the store isn't at the end of the pipe yet, or we're still computing the data. We need to know whether we'd get a hit when a load comes in out of order, and we need to check for that; that's really why we have two bits here. Then, sometime in the future, we have to move the data into the data cache, because this structure will get full: if we fire enough stores at it, it fills up, so it's a good idea to drain it into the data cache at some point. We can do that at our convenience, though. And if the store aborts, this is important, say the store was itself speculative and on the wrong side of a branch or something like that, we just remove it from this table. One interesting question here: if the data is in both this buffer and in the cache, which one takes precedence? The store buffer has the newer value, so we have to look there. Finally, it's possible to get multiple stores in here with the same tag, the same address; this happens when you're storing to the same address over and over again. When you go to read, you need to read the youngest matching value out of there, because that's the most up-to-date one. So the lookup through here is a little complicated: you actually need to do it respecting program order. Okay, we'll stop here for today.
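The lookup rules just described, where the buffer takes precedence over the cache and the youngest matching store wins, can be sketched like this (hypothetical Python; entries kept in a program-ordered list, which stands in for the tagged hardware structure):

```python
# Sketch of a speculative store buffer (illustrative).  Entries are kept
# oldest-first in program order; each has an address, data, a valid bit
# V, and a speculative/in-flight bit S.

class StoreBuffer:
    def __init__(self):
        self.entries = []  # oldest first, program order

    def store(self, addr, data, in_flight=False):
        self.entries.append({"addr": addr, "data": data,
                             "V": True, "S": in_flight})

    def load(self, addr, cache):
        # Search youngest-first: a later store to the same address
        # takes precedence, and the buffer beats the data cache.
        for e in reversed(self.entries):
            if e["V"] and e["addr"] == addr:
                return e["data"]
        return cache.get(addr)  # no match: fall back to the data cache

    def abort_youngest(self):
        # Store was on the wrong path (e.g. mispredicted branch):
        # just drop it from the table.
        self.entries.pop()

cache = {0x10: "old", 0x20: "cached"}
sb = StoreBuffer()
sb.store(0x10, "new1")
sb.store(0x10, "new2")       # two stores to the same address
print(sb.load(0x10, cache))  # "new2": youngest store wins, beats cache
print(sb.load(0x20, cache))  # "cached": buffer miss falls back to cache
```

The youngest-first scan is the "lookup in program order" point from the lecture: with two buffered stores to 0x10, a load must see "new2", not "new1" and not the stale "old" value sitting in the cache.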