So let's go on to our next cache optimization technique. So we talked about multilevel caches; almost all processors today have multilevel caches, and this is good. It's going to reduce our miss penalty, at least for the first-level cache, because a miss can go to a closer cache instead of having to go all the way out to main memory. What's our new technique here? Well, the next technique I wanted to introduce is what we call a victim cache. And why do we call this thing a victim cache? Well, a victim is a cache line which has been evicted from your cache. So this victim, this evicted cache block, typically has to be written back out to main memory if it's dirty. If it's clean, you can just invalidate it and not have to write it back. The insight of the victim cache is that we're going to keep around the most recent victims, the most recent cache lines that we've cast out of our cache, in a special little side structure that we're going to call a victim cache. And this side structure, the victim cache, is going to be very, very small. Now why is this useful? Well, one thing that is pretty common is that you have a piece of code which, for instance, keeps hitting the same set in the cache. So let's say we have a two-way set associative cache, so there are only two ways per index, and we go look at those two ways. And let's say we have a loop where we read values from two arrays and write the result to a third array. So let's take a look at a piece of code that says something like the following: for i equals 0, i less than some big number, i plus plus. And let's say we have three integer arrays which are all large: an integer array a of one million elements, an integer array b, which is also one million, and an integer array c, which is also one million. And when I say one million here, I really mean two to the twentieth, which is 1024 by 1024, so it's a power of two. Okay, inside the for loop we have a simple statement. The loop is going to say c[i] = a[i] + b[i]. And that's our loop. Now, we have a two-way set associative cache. Well, that sounds like a problem here. If you go look at this, what we're actually going to do is a load of a[i], and this is going to have some index in the cache. And if these three arrays a, b, and c are all naturally aligned, they're going to basically map to the same place. The line with a[i] is going to conflict with the line with b[i], which is going to conflict with the line with c[i]. So c[i], a[i], and b[i] all want to fight for those two ways in our cache, at the same index, at the same time, every loop iteration. Now why is this bad? Well, we're iterating here over integers, and our cache line is bigger than an integer. So we're going to pull in the cache line that has a[i] on it, then pull in the cache line that has b[i] on it, and then we're going to evict one of those two to make room, because we need to do a write to c[i], which maps to the same index in our cache. So we're going to fight over that: we have three things trying to fit into two locations.
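Here's a minimal C sketch of the loop we just walked through. The array names and the two-to-the-twentieth size come straight from the example above; the cache geometry in the comments (a two-way cache with 64-byte lines) is just an assumed illustrative configuration.

```c
#include <stdint.h>

#define N (1 << 20) /* "one million" elements: 2^20 = 1024 * 1024, a power of two */

/* If these are naturally aligned (e.g., laid out back to back), the addresses
 * of a[i], b[i], and c[i] differ only in high-order bits, so all three map to
 * the same set index in a small cache. */
int a[N], b[N], c[N];

void vector_add(void)
{
    for (int i = 0; i < N; i++) {
        /* Every iteration touches a[i], b[i], and c[i]: three cache lines
         * competing for the two ways of one set in a two-way cache. */
        c[i] = a[i] + b[i];
    }
}
```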
And to make matters worse, the next time we go round this loop we could have actually just hit on those same cache lines for a, b, and c. An integer is, let's say, four bytes, and a cache line is, let's say, 64 bytes long, so we could have gotten spatial locality and temporal locality here. But instead, we got nothing. We just lost, because we have three things trying to fit into two. And even if you use something like least recently used replacement, it's actually going to do the wrong thing here. Least recently used is going to continually kick out the least recently used line, but the next time around the loop, you want exactly that least recently used line. So you could have actually done better here if you didn't use least recently used, but instead, let's say, kept a and b resident and always took the miss on c, or pinned one of these three into the cache. And that's for a two-way set associative cache. Okay, so how do we go about solving this? Let's go back and take a look at the slide here, and we're going to see that what we can do is have a victim cache which is hooked up to our level one data cache. This victim cache is effectively going to extend our associativity on a very limited subset of the lines, or rather of the indexes, in our cache. So if we take a look at this, we have a fully associative cache here for recently evicted lines, maybe four entries. In fact, the original paper showed that even, let's say, one cache line's worth of victim cache helped performance. But this relatively small number of entries in the victim cache can extend the associativity of any index in the cache. So it's a fully associative cache added into our cache miss path. And how does this go about working? Well, now when you access the level one cache, if you miss in your level one cache, you pull the line in from your level two cache, but you first go and check the victim cache. Now, how do things get into the victim cache in the first place? Well, when you go and evict something out of your level one cache, instead of just taking that data and throwing it away, you transfer it over into the victim cache. So it's a little cache of the most recently evicted lines from our level one data cache, or the most recently invalidated lines from our level one data cache. So it gives us some extra associativity. Now, you can check this either in parallel or in serial with your level one cache. And there are a couple of design questions that come up here. On a level one miss, let's say the access misses in the victim cache too. What happens then? You're going to bring in a new line from the level two cache, you're going to take what was in the level one cache, and you're going to move it into the victim cache, so you can check it on future accesses that miss in the level one cache. But the question is, what happens when something falls out of the victim cache? Well, just like a normal cache, if it's dirty data, you probably want to go write that back to the L2 cache. Alternatively, you could make the victim cache effectively write-through. And in fact, victim cache designs typically try to make the victim cache write-through, because then you don't have to worry about where something has to go when it gets removed from the victim cache. You don't actually have to go and do the write-back to the farther levels of the cache hierarchy.
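To make that miss path concrete, here is a minimal C sketch of the victim cache bookkeeping just described. The four-entry size matches the slide's example, but the structure names, the FIFO replacement, and the write_line_to_l2 helper are all assumptions for illustration, not a description of any real design.

```c
#include <stdbool.h>
#include <stdint.h>

#define VC_ENTRIES 4 /* small and fully associative, as in the slide's example */

typedef struct {
    bool     valid;
    uint64_t tag; /* full line address: fully associative, so no index bits */
    /* ...line data would live here in a real design... */
} vc_entry;

static vc_entry vc[VC_ENTRIES];
static int vc_next = 0; /* simple FIFO replacement pointer (an assumption) */

/* On an L1 miss, probe the victim cache before (or in parallel with) the L2. */
bool victim_cache_lookup(uint64_t line_addr)
{
    for (int i = 0; i < VC_ENTRIES; i++) {
        if (vc[i].valid && vc[i].tag == line_addr) {
            /* Hit: a real design would swap this line with the conflicting
             * L1 line, so the hot data moves back into the level one cache. */
            return true;
        }
    }
    return false; /* missed in both L1 and victim cache: go to the L2 */
}

/* On an L1 eviction, keep the victim around instead of just dropping it. */
void victim_cache_insert(uint64_t line_addr, bool dirty)
{
    if (dirty) {
        /* The write-through policy from the lecture: push dirty data to the
         * L2 now, so victim cache entries stay clean and can be silently
         * discarded later. write_line_to_l2() is a hypothetical helper. */
        /* write_line_to_l2(line_addr); */
    }
    vc[vc_next].valid = true;
    vc[vc_next].tag   = line_addr;
    vc_next = (vc_next + 1) % VC_ENTRIES;
}
```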
Instead, you can just basically throw it away, but that's a design decision: you can decide whether you want the victim cache to always be clean or to allow dirty lines. Thankfully, if you're doing the evict anyway, you usually have enough extra bandwidth to the L2 to just do that write into the L2 right then and leave the victim cache completely clean. So to recap, the basic idea here is that you add a couple of extra ways, if you will, a little bit of extra associativity, for a very limited set of the indexes in your cache by adding this victim cache. The downside is that you have to go check this victim cache. If you do this serially, it might add to your latency out to the L2; or you could check it in parallel, which would potentially increase the power, or the energy used, but not hurt your performance. Okay, let's pull up the scoreboard and see how adding in a victim cache helps. The first thing we're going to see is that the miss penalty of our level one cache is going to decrease, because we've effectively added an extra structure that's closer, let's say, than the level two cache. So the miss penalty is going to be lower. But the other advantage of a victim cache is that the miss rate of the aggregate, the level one cache plus the victim cache together, is now going to be lower. So it's going to be better from a miss rate perspective as well.
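To put rough numbers on both scoreboard entries (all of these values are made up purely for illustration), use the usual average memory access time formula: AMAT = hit time + miss rate × miss penalty. Suppose the L1 hits in 1 cycle, misses 5% of the time, and an L2 access costs 20 cycles, and suppose the victim cache is probed in parallel with the L1 and returns data in 2 cycles. Without the victim cache, AMAT = 1 + 0.05 × 20 = 2.0 cycles. If the victim cache catches, say, one fifth of those misses, then 1% of accesses pay 2 cycles and only 4% still go to the L2: AMAT = 1 + 0.01 × 2 + 0.04 × 20 = 1.82 cycles. Both effects show up: the average cost of an L1 miss goes down, and the aggregate miss rate past the L1-plus-victim-cache combination drops from 5% to 4%.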