So let's go on to our next cache optimization technique. So we talked about multilevel caches; almost all processors today have multilevel caches, and this is good. It's going to reduce our miss penalty, at least for the first-level cache, because a miss can go to a closer cache instead of having to go all the way out to main memory. What's our new technique here? Well, the next technique I wanted to introduce is what we call a victim cache. And why do we call this thing a victim cache? Well, a victim is a cache line which has been evicted from your cache. So this victim, this evicted cache block, typically has to be written back out to main memory if it's dirty. If it's clean, you can just invalidate it and not have to write it back. The insight of the victim cache is that we're going to keep around the most recent victims, the most recent cache lines that we've cast out of our cache, in a special little side structure that we're going to call a victim cache. And this side structure, the victim cache, is going to be very, very small. Now why is this useful? Well, one thing that is pretty common is that you have a piece of code which, for instance, keeps hitting the same set in the cache. So let's say we have a two-way set associative cache, so there are only two ways per index, and we go look at those two ways. And let's say we have a loop where we read values from two arrays and write the result to a third array. So let's take a look at a piece of code that says something like the following: for i equals 0, i less than some big number, i plus plus. And let's say we have three integer arrays which are all large: an integer array a of one million elements, an integer array b, which is also one million, and an integer array c, which is also one million. And when I say one million here, I really mean two to the twentieth, which is 1024 by 1024, so it's a power of two. Okay, inside the for loop we have a simple statement. The loop is going to say c[i] = a[i] + b[i]. And that's our loop. Now, we have a two-way set associative cache. Well, that sounds like a problem here. If you go look at this, what we're actually going to do is a load of a[i], and this is going to have some index in the cache. And if these three arrays a, b, and c are all naturally aligned, they're going to basically map to the same place. The line with a[i] is going to conflict with the line with b[i], which is going to conflict with the line with c[i]. So c[i], a[i], and b[i] all want to fight for those two ways in our cache, at the same index, at the same time, every loop iteration. Now why is this bad? Well, we're iterating here over integers, and our cache line is bigger than an integer. So we're going to pull in the cache line that has a[i] on it, then pull in the cache line that has b[i] on it, and then we're going to evict one of those two to make room, because we need to do a write to c[i], which maps to the same index in our cache. So we're going to fight over that: we have three things trying to fit into two locations.
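Here's a minimal C sketch of the loop we just walked through. The array names and the two-to-the-twentieth size come straight from the example above; the cache geometry in the comments (a two-way cache with 64-byte lines) is just an assumed illustrative configuration.

```c
#include <stdint.h>

#define N (1 << 20) /* "one million" elements: 2^20 = 1024 * 1024, a power of two */

/* If these are naturally aligned (e.g., laid out back to back), the addresses
 * of a[i], b[i], and c[i] differ only in high-order bits, so all three map to
 * the same set index in a small cache. */
int a[N], b[N], c[N];

void vector_add(void)
{
    for (int i = 0; i < N; i++) {
        /* Every iteration touches a[i], b[i], and c[i]: three cache lines
         * competing for the two ways of one set in a two-way cache. */
        c[i] = a[i] + b[i];
    }
}
```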
And to make matters worse, the next time we go round this loop we could have actually just hit on those same cache lines for a, b, and c. An integer is, let's say, four bytes, and a cache line is, let's say, 64 bytes long, so we could have gotten spatial locality and temporal locality here. But instead, we got nothing. We just lost, because we have three things trying to fit into two. And even if you use something like least recently used replacement, it's actually going to do the wrong thing here. Least recently used is going to continually kick out the least recently used line, but the next time around the loop, you want exactly that least recently used line. So you could have actually done better here if you didn't use least recently used, but instead, let's say, kept a and b resident and always took the miss on c, or pinned one of these three into the cache. And that's for a two-way set associative cache. Okay, so how do we go about solving this? Let's go back and take a look at the slide here, and we're going to see that what we can do is have a victim cache which is hooked up to our level one data cache. This victim cache is effectively going to extend our associativity on a very limited subset of the lines, or rather of the indexes, in our cache. So if we take a look at this, we have a fully associative cache here for recently evicted lines, maybe four entries. In fact, the original paper showed that even, let's say, one cache line's worth of victim cache helped performance. But this relatively small number of entries in the victim cache can extend the associativity of any index in the cache. So it's a fully associative cache added into our cache miss path. And how does this go about working? Well, now when you access the level one cache, if you miss in your level one cache, you pull the line in from your level two cache, but you first go and check the victim cache. Now, how do things get into the victim cache in the first place? Well, when you go and evict something out of your level one cache, instead of just taking that data and throwing it away, you transfer it over into the victim cache. So it's a little cache of the most recently evicted lines from our level one data cache, or the most recently invalidated lines from our level one data cache. So it gives us some extra associativity. Now, you can check this either in parallel or in serial with your level one cache. And there are a couple of design questions that come up here. On a level one miss, let's say the access misses in the victim cache too. What happens then? You're going to bring in a new line from the level two cache, you're going to take what was in the level one cache, and you're going to move it into the victim cache, so you can check it on future accesses that miss in the level one cache. But the question is, what happens when something falls out of the victim cache? Well, just like a normal cache, if it's dirty data, you probably want to go write that back to the L2 cache. Alternatively, you could make the victim cache effectively write-through. And in fact, victim cache designs typically try to make the victim cache write-through, because then you don't have to worry about where something has to go when it gets removed from the victim cache. You don't actually have to go and do the write-back to the farther levels of the cache hierarchy.
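To make that miss path concrete, here is a minimal C sketch of the victim cache bookkeeping just described. The four-entry size matches the slide's example, but the structure names, the FIFO replacement, and the write_line_to_l2 helper are all assumptions for illustration, not a description of any real design.

```c
#include <stdbool.h>
#include <stdint.h>

#define VC_ENTRIES 4 /* small and fully associative, as in the slide's example */

typedef struct {
    bool     valid;
    uint64_t tag; /* full line address: fully associative, so no index bits */
    /* ...line data would live here in a real design... */
} vc_entry;

static vc_entry vc[VC_ENTRIES];
static int vc_next = 0; /* simple FIFO replacement pointer (an assumption) */

/* On an L1 miss, probe the victim cache before (or in parallel with) the L2. */
bool victim_cache_lookup(uint64_t line_addr)
{
    for (int i = 0; i < VC_ENTRIES; i++) {
        if (vc[i].valid && vc[i].tag == line_addr) {
            /* Hit: a real design would swap this line with the conflicting
             * L1 line, so the hot data moves back into the level one cache. */
            return true;
        }
    }
    return false; /* missed in both L1 and victim cache: go to the L2 */
}

/* On an L1 eviction, keep the victim around instead of just dropping it. */
void victim_cache_insert(uint64_t line_addr, bool dirty)
{
    if (dirty) {
        /* The write-through policy from the lecture: push dirty data to the
         * L2 now, so victim cache entries stay clean and can be silently
         * discarded later. write_line_to_l2() is a hypothetical helper. */
        /* write_line_to_l2(line_addr); */
    }
    vc[vc_next].valid = true;
    vc[vc_next].tag   = line_addr;
    vc_next = (vc_next + 1) % VC_ENTRIES;
}
```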
Instead, you can just basically throw it away, but that's a design decision: you can decide whether you want the victim cache to always be clean or to allow dirty lines. Thankfully, if you're doing the evict anyway, you usually have enough extra bandwidth to the L2 to just do that write into the L2 right then and leave the victim cache completely clean. So to recap, the basic idea here is that you add a couple of extra ways, if you will, a little bit of extra associativity, for a very limited set of the indexes in your cache by adding this victim cache. The downside is that you have to go check this victim cache. If you do this serially, it might add to your latency out to the L2; or you could check it in parallel, which would potentially increase the power, or the energy used, but not hurt your performance. Okay, let's pull up the scoreboard and see how adding in a victim cache helps. The first thing we're going to see is that the miss penalty of our level one cache is going to decrease, because we've effectively added an extra structure that's closer, let's say, than the level two cache. So the miss penalty is going to be lower. But the other advantage of a victim cache is that the miss rate of the aggregate, the level one cache plus the victim cache together, is now going to be lower. So it's going to be better from a miss rate perspective as well.
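To put rough numbers on both scoreboard entries (all of these values are made up purely for illustration), use the usual average memory access time formula: AMAT = hit time + miss rate × miss penalty. Suppose the L1 hits in 1 cycle, misses 5% of the time, and an L2 access costs 20 cycles, and suppose the victim cache is probed in parallel with the L1 and returns data in 2 cycles. Without the victim cache, AMAT = 1 + 0.05 × 20 = 2.0 cycles. If the victim cache catches, say, one fifth of those misses, then 1% of accesses pay 2 cycles and only 4% still go to the L2: AMAT = 1 + 0.01 × 2 + 0.04 × 20 = 1.82 cycles. Both effects show up: the average cost of an L1 miss goes down, and the aggregate miss rate past the L1-plus-victim-cache combination drops from 5% to 4%.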