So that is our first optimization technique. If there are no questions, we will move on to our second optimization technique. Your book lists ten optimization techniques. We are only going to cover, I think, seven of them between today and the next lecture, because I think some of them are not actually that important, but we are also going to cover some other ones which I think are important.

Okay, so the next optimization we are going to look at is how to deal with hits, or excuse me, how to deal with read misses in the cache when there is data already there that needs to get kicked out of the cache. So here we have our CPU, our L1 data cache, and our next level of cache, which we will say is an L2 cache or maybe main memory. There is something in this cache at a particular line, and it is dirty in the cache. So the dirty bit is set; it has state we cannot throw out. We do a read, it aliases to that same location, and we need to evict that line. It will create a victim. In a naive implementation, we would actually have to sit there and wait for all that data to go out to main memory before we go do the read, get the data, and fill it in. Okay, that is not very good. We basically have to wait for this evicted dirty line; we will talk about that in a second. So the processor could just be stalled, waiting on writes.

One thing you can do is have the read misses go beyond the writes, sort of pass the writes going out to the unified L2. But one of the problems here is you do not have that many ports onto this unified L2. We will say you only have one port out here. So if you have the load pass the write and you do this little dance, then when the read data, or load data, comes back, you need to have a place to put it. So at that point you might still need to wait for the victim to go out to main memory or out to the next level of cache. So you cannot really get around this. You can say, oh, I will try to get my load out early and just sort of worry about it later. Yeah, but that does not work if that load hits in your next level of cache: you need to access the next level of cache, and you are waiting for this victim data to go out, because you need someplace to put the incoming line.

So the solution to this is we put a little buffer between our L1 and our L2, or our L1 and our main memory, and this will hold writes, or victims, that go out from the L1 to the L2. And now we have someplace to put the data. So if we wanted to do this fast, we would do the load, we would miss our L1 cache, we would send that request out to the next level, and then, essentially instantaneously, we would start evicting the line into this buffer. And the reason we cannot evict it straight to the next level is because the load is actually using the L2 cache right now, we will say. But we have someplace to put the victim data, and when the load comes back, we can put it into the L1 data array.

So this brings up a whole bunch of problems. The biggest one being that, at some point, you actually have to get that data transferred from the write buffer into the L2 cache. You could do that when you have extra time, so you could just have some circuits here that check for an idle L2 port. But then comes the question: if you need to do this a second time, what happens? Let us say a second load has to create a victim and this buffer is full. What do you do? Well, you could just stall and wait. That is a pretty good option.
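To make that concrete, here is a minimal sketch of a single-entry write (victim) buffer in C++. All of the class and function names here are made up for illustration, and the model is heavily simplified; it only shows the policy described above: park the dirty victim in the buffer so the read miss can go straight out to the L2, drain the buffer when the L2 port is idle, and stall if a second victim shows up while the buffer is still occupied.

```cpp
// Sketch of a single-entry write (victim) buffer between an L1 data cache
// and a unified L2. Names, sizes, and policies are illustrative only.
#include <cstdint>
#include <optional>
#include <vector>

struct VictimLine {
    uint64_t             address;  // line-aligned address of the dirty victim
    std::vector<uint8_t> data;     // the dirty line waiting to be written back
};

class L1WritePath {
public:
    // Called on a read miss whose replacement victim is dirty.
    // Returns false if the pipeline has to stall (buffer already occupied).
    bool handleReadMissWithDirtyVictim(uint64_t missAddress,
                                       uint64_t victimAddress,
                                       std::vector<uint8_t> victimData) {
        if (buffer_.has_value()) {
            // A second victim while the first is still waiting: the simple
            // policy from the lecture is just to stall and wait.
            return false;
        }
        // Park the victim so the single L2 port is free for the read miss.
        buffer_ = VictimLine{victimAddress, std::move(victimData)};
        issueReadToL2(missAddress);
        return true;
    }

    // Called whenever the L2 port is idle: drain the buffered victim.
    void drainIfIdle(bool l2PortIdle) {
        if (l2PortIdle && buffer_.has_value()) {
            writeLineToL2(buffer_->address, buffer_->data);
            buffer_.reset();
        }
    }

private:
    std::optional<VictimLine> buffer_;  // a single entry, as in the lecture

    void issueReadToL2(uint64_t address) { /* send miss request to L2 */ }
    void writeLineToL2(uint64_t address,
                       const std::vector<uint8_t>& data) { /* writeback */ }
};
```

A real controller would track this with valid and dirty bits and often with more than one buffer entry, but the stall-when-full policy is exactly the simple scheme being described here.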
So the problem, and what you are kind of saying here, is that the probability that you have two victims generated in a very short period of time is low, and this is actually a scheme that people do use. They do not do anything special: the first victim that gets generated goes into the write buffer, and the second victim that gets generated just stalls the pipe. That is okay. You get higher performance if you can have the subsequent read basically go beyond the write buffer here and start actually doing something in the memory system. And if you want to do this, then just like in the previous example, you are going to have to check this write buffer to see if the data is there. And that introduces complexity, because now your data can be in the L1, in the write buffer, or further out. So there are just more places to check.

Okay, so that is the first half of write buffers. The second half of why we want to put in a write buffer is if we have a write-through cache. So far we have been talking about write-back caches, which introduce victims. But if you recall, with a write-through cache, every single store that happens goes into the L1 data cache and also gets written into the next level of cache, because it is writing through. So let us say you have a write-through from the L1 to the L2. One of the challenges with this is you might not have enough bandwidth into the L2 cache to take in every single store that occurs. So the solution is that you can put a write buffer here which will buffer off some of this extra store bandwidth, and we will introduce the notion of a coalescing write buffer. This is an extra addition to a write buffer that will actually merge multiple stores to the same line.

So let us say you have a store to address five and a store to address six with a write-through cache. You do not want to have to write two full cache lines out to the L2. Instead, what people do is have coalescing write buffers. So there is one write buffer here that might have multiple entries, each of which holds a whole cache line. That first store will push its data out into here. The second store will try to push its data out, but you will notice that it is for the same line address the buffer already holds, so you actually merge the two stores into one location. And what this does is decrease the bandwidth that you need at the L2, because it is very common in codes to write sequential addresses. It is common, let us say if you are adding two arrays, that for the destination array you will just be writing address after address after address. And you do not want to have to go fire up the L2 for every single store you do in that array operation if you have a write-through cache. So you can put a coalescing write buffer here to save bandwidth into your L2.
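Here is a similarly simplified sketch of a coalescing write buffer, again with made-up names and sizes. It shows the two behaviors just described: stores to the same cache line merge into one entry instead of each costing an L2 write, and a read miss has to check the buffer, because the newest copy of a line may still be sitting there.

```cpp
// Sketch of a coalescing write buffer for a write-through L1.
// Entry count, line size, and all names are illustrative only.
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr size_t kLineBytes  = 64;
constexpr size_t kNumEntries = 4;

struct CoalescingEntry {
    bool     valid = false;
    uint64_t lineAddress = 0;                    // address of the whole line
    std::array<uint8_t, kLineBytes> data{};
    std::array<bool,    kLineBytes> byteValid{}; // which bytes stores have written
};

class CoalescingWriteBuffer {
public:
    // A store that writes through the L1 also lands here; stores to the
    // same line merge into one entry instead of each taking an L2 write.
    bool store(uint64_t address, const uint8_t* bytes, size_t size) {
        uint64_t line   = address & ~static_cast<uint64_t>(kLineBytes - 1);
        size_t   offset = address &  (kLineBytes - 1);
        for (auto& e : entries_) {
            if (e.valid && e.lineAddress == line) {   // coalesce into existing entry
                writeInto(e, offset, bytes, size);
                return true;
            }
        }
        for (auto& e : entries_) {
            if (!e.valid) {                           // allocate a fresh entry
                e.valid = true;
                e.lineAddress = line;
                writeInto(e, offset, bytes, size);
                return true;
            }
        }
        return false;  // buffer full: the simple policy is to stall
    }

    // A read miss must check the buffer before going to L2, because the
    // newest copy of the data may still be sitting here.
    bool holdsLine(uint64_t address) const {
        uint64_t line = address & ~static_cast<uint64_t>(kLineBytes - 1);
        for (const auto& e : entries_) {
            if (e.valid && e.lineAddress == line) return true;
        }
        return false;
    }

private:
    std::array<CoalescingEntry, kNumEntries> entries_;

    static void writeInto(CoalescingEntry& e, size_t offset,
                          const uint8_t* bytes, size_t size) {
        std::memcpy(e.data.data() + offset, bytes, size);
        for (size_t i = 0; i < size; ++i) e.byteValid[offset + i] = true;
    }
};
```

The per-byte valid bits matter because two stores to the same line usually touch different bytes; when the entry eventually drains, the merged data goes out to the L2 as a single transfer instead of one transfer per store.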
So that is our second technique: having this write buffer. Okay, so what does the write buffer do? Does it decrease our miss rate? The cache is the same size and the associativity is the same, so it is probably not going to change our miss rate. Miss penalty? Okay, raise your hand: who thinks this affects the miss penalty? Some people are raising their hands. I think we should probably all be raising our hands, because that was really what we were trying to do with this whole write buffer: reduce the miss penalty. And the reason this reduces the miss penalty is that a read miss does not need to wait for the write of the dirty, or victim, data to occur; instead, that can just happen in the background. This also does not affect our hit time, and it may actually help our bandwidth. The reason it does not affect our hit time is that our L1 cache still works the same way it worked before: you can still do loads and stores against it, and if you hit, everything is fine. So it only affects miss-related behavior. Bandwidth, like I said: if you have a write-through cache, this might effectively give you more bandwidth if you have a coalescing write buffer.
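To put a rough number on the miss penalty point, here is a small worked example using the standard average memory access time formula. The hit time, miss rate, and latencies below are made up purely for illustration, and it assumes every miss has to evict a dirty line, which of course will not be true in real code.

```latex
\[
  \text{AMAT} = \text{Hit time} + \text{Miss rate} \times \text{Miss penalty}
\]
% Assume a 1-cycle hit time, a 2% miss rate, and 40 cycles each for the
% victim writeback and for the fill from L2 (all illustrative numbers).
\[
  \text{No write buffer: } 1 + 0.02 \times (40 + 40) = 2.6 \text{ cycles}
\]
\[
  \text{With write buffer: } 1 + 0.02 \times 40 = 1.8 \text{ cycles}
\]
```

The hit time term is untouched, which matches the claim above: the buffer only changes what happens on a miss, by pushing the writeback off the critical path.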