Okay, so one technique to do this is to take the TLB and, instead of having the TLB in front of our cache, put the TLB in parallel with, or after, the cache. What this means is that the addresses going into our cache are virtual addresses, and that has some pretty big implications. Lots of processors do this these days, where they actually put the TLB in parallel with the cache. This picture is a little confusing, because it looks like the TLB is after the cache. To some extent it is and it isn't, depending on how you squint at it. If the cache is completely virtually indexed and virtually tagged, it would look something like this, because you only fire up the TLB when you take a cache miss and have to go out to the farther layers of memory. If instead you have a virtually indexed but physically tagged cache, what that means is that the address indexing into the cache array is a virtual address, but you do the TLB access in parallel, out comes a physical address, and you do the tag comparison on physical addresses. That makes a lot of things in life a lot easier, and we'll look at it in a second.

One of the major challenges with virtually indexed caches that I want to point out is that you start to get aliasing problems. What do I mean by this? Before, when you went to put something in the cache, it could only be in basically one place in a direct-mapped cache. In an n-way set-associative cache it could be in n different places, so in a two-way set-associative cache it could be in either of the two ways, but at least you knew where to look for it. All of a sudden, if bits above the minimum page size feed into where something sits in the cache, the same data can actually be in multiple places in the cache.

So, a brief example. We have a 32-bit address, we have our cache offset, and we have, let's say, a page size of four kilobytes, so that's twelve bits of page offset. And let's say our cache has more than four kilobytes in it; say we have a direct-mapped cache of eight kilobytes. Uh-oh. Now our index into the cache has one bit above the page boundary, and the OS could elect to make that bit of the physical address a zero or a one. So what that means is that when we go to index into our cache, the same physical piece of memory might end up in two different locations; depending on how the operating system lays out memory, you might look in the wrong spot, or you might need to check both places. The thing to keep in mind is that if our cache is bigger than our minimum page size, the index bits above the page offset are not guaranteed to match between the virtual and the physical address. We'll walk through an example of that in a second.
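Just to make that concrete, here's a tiny sketch, not from the slides, that computes whether a given cache configuration can alias. The 8 KB direct-mapped cache and 4 KB pages are the numbers from the example above; everything else is hypothetical.

```c
#include <stdio.h>

/* A minimal sketch showing when a virtually indexed cache can alias.
 * Parameters are the lecture's hypothetical ones: 4 KB pages and an
 * 8 KB direct-mapped cache.
 */
int main(void) {
    unsigned page_size  = 4096;   /* 12 bits of page offset                  */
    unsigned cache_size = 8192;
    unsigned ways       = 1;      /* direct-mapped                           */

    unsigned set_bytes  = cache_size / ways;      /* bytes covered by offset+index bits */
    unsigned alias_ways = set_bytes / page_size;  /* distinct cache positions one
                                                     physical line can occupy          */
    if (alias_ways > 1)
        printf("aliasing: a physical line may live in %u different sets\n", alias_ways);
    else
        printf("no aliasing: index bits fit inside the page offset\n");
    return 0;
}
```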
Virtually addressed caches also have some other challenges, because two applications can have the same virtual addresses. Let's say you're multiplexing between application one and application two. All of a sudden both applications are putting data into your cache, and you might have one application hitting on the data of another application. If two applications both go to access address five, and they have different values stored at their own address five, then in our virtually indexed cache you might start getting something weird: if you don't protect against this, one process ends up reading another process's data out of the cache. So you need to protect against this, and there are a couple of different approaches. One approach is just to flush the cache on every context swap: every time you change processes, flush the whole cache. That sounds really expensive, but believe it or not, it's done with non-trivial frequency; it actually is done in some real systems out there. A little bit nicer way to do this is to have address space identifiers. You tag the cache with an address space ID as part of the tag information, so it's not just the virtual address that matters but also which process ID, or which address space ID, the line belongs to. But that increases your tag bits, so you have to be a little careful about it.

So that's mostly about having a virtually indexed, virtually tagged cache. And if we look at how this fits into the pipeline, life actually gets a lot better from a hardware perspective. This is summing up what we saw before: you really only have to translate on a cache miss. Your main processor pipeline looks the same as what we've been drawing up to this point, but on a cache miss you have to go through either your instruction translation lookaside buffer or your data translation lookaside buffer.

To show a little more pictorially what's happening with virtually addressed caches, let's take a bit of a gander at this example. We have two virtual addresses, virtual address one and virtual address two, and the operating system elects to map them to the same physical page in memory. This is something virtual memory systems do all the time: sometimes you want to memory-map something between two of your applications so they can share data, which is a pretty common thing to have happen. So we want to share some physical memory. Unfortunately, if you go look at our virtually addressed cache here, the data actually ends up in two different locations; this is going back to that first example. We have one copy here and another copy there, and it depends on where the page sits in each virtual address space, which has no relationship to where it sits in the physical address space, so the bits don't match: the relevant bit of one virtual address and the relevant bit of the other do not match. And that causes a world of problems, because all of a sudden even the same application can write to, say, address ten and write to address 4096 plus ten, where both are supposed to map to the same physical location. It's supposed to be the same address, but you could write five through one name and then, reading through the other name for that same location, get back a thousand or some random stale value. So there are a couple of techniques to deal with this, and I don't want to go into too much detail.
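As a quick illustration of that, before we get to the fixes: here's a small sketch, using the lecture's hypothetical numbers, showing that the two virtual names for the same physical byte index different sets of a virtually indexed cache.

```c
#include <stdio.h>

/* A small sketch of the synonym example above: two virtual names for the
 * same physical byte (offset 10 in a page mapped at two different virtual
 * pages) land in different sets of an 8 KB direct-mapped, virtually
 * indexed cache.  The 32-byte line size is an assumption for illustration.
 */
int main(void) {
    unsigned line_size  = 32;
    unsigned cache_size = 8192;          /* direct-mapped                    */
    unsigned num_sets   = cache_size / line_size;

    unsigned va1 = 10;                   /* "address ten"                    */
    unsigned va2 = 4096 + 10;            /* same physical byte, other alias  */

    unsigned set1 = (va1 / line_size) % num_sets;
    unsigned set2 = (va2 / line_size) % num_sets;

    printf("alias 1 indexes set %u, alias 2 indexes set %u\n", set1, set2);
    /* They differ, so a write through one name is invisible to a read
     * through the other until something forces them back together. */
    return 0;
}
```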
Your book goes into some more detail, but just to give you a little insight into how to go about solving this: there are some systems out there which actually require that a page, wherever it appears in a virtual address space, reside in the same cache location as any other virtual alias of that same physical page. Now, you sit there and scratch your head and say, well, isn't this effectively decreasing the associativity of our cache, or moving things around in our cache? A little bit is the answer, but these are the trade-offs. The OS can, to some extent, manage this layout, and that's what I was getting at: early SPARCs actually used a scheme like this, where the OS ensures that the virtual addresses accessing the same physical address will not conflict in the direct-mapped cache in a bad way. So you're guaranteed it will always go to the same location.

So that's the starting point if you have virtually indexed, virtually tagged. But you can have other mixes of these things, and not all of them make sense. You can have physically indexed, physically tagged; that's what we were talking about at the beginning of last lecture, the simple case. Virtually indexed, virtually tagged has lots of challenges, we'll say. Virtually indexed, physically tagged is actually a really good trade-off: you do the translation in parallel with the cache access, and then you do the tag check on the physical address. You don't actually need ASIDs, address space identifiers, in this case, because you're guaranteed to have the correct physical match. You might still be accessing the wrong location in the cache, so we don't get around the aliasing problem of the data being in two locations; we'll talk about that in a second, and you still need to handle that, sort of, in the SPARC way. But at least you don't have to have address space identifiers, and at least you don't have to flush your cache on every process swap, because the check that decides hit or miss is exact: you're doing a physically tagged cache.

And then you can have something that's both virtually and physically indexed, and physically tagged. This is a cute little trick that a lot of architectures play when they just want to ignore all these problems: they want something that behaves like a physically indexed, physically tagged cache, but they still want a cache that's bigger than their minimum page size. Say you have a 4 KB page size and you want an 8 KB cache. With a direct-mapped cache, that one index bit above the page size would have to come out of translation, so it's part of your index but you're not going to be able to control it. But what you can do is take the 8 KB cache and make it two-way set associative. It's still eight kilobytes, but that reduces the number of index bits, and now the index fits entirely within the page offset. So it's a cute little trick: the index is both virtual and physical, because the virtual and physical addresses below the page boundary are the same, and the index into the cache doesn't change when you go through address translation. You'll see this where people actually add associativity to their L1 caches just to avoid having to do address translation before indexing.
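Here's a small sketch of that sizing rule, again with the hypothetical 8 KB cache and 4 KB pages: each way can cover at most one page's worth of bytes if you want the index bits to stay inside the page offset.

```c
#include <stdio.h>

/* A minimal sketch of the VIPT sizing rule: keep all of the index and
 * line-offset bits inside the page offset so the "virtual" index is
 * identical to the physical one.  Numbers are the lecture's hypothetical
 * 8 KB cache with 4 KB pages.
 */
int main(void) {
    unsigned cache_size = 8192;
    unsigned page_size  = 4096;

    /* Each way can cover at most one page's worth of bytes, so: */
    unsigned min_ways = (cache_size + page_size - 1) / page_size;

    printf("an %u-byte cache with %u-byte pages needs at least %u way(s)\n",
           cache_size, page_size, min_ways);   /* prints 2 for this example */
    return 0;
}
```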
With that arrangement, you do the address translation in parallel, the cache is physically tagged, and you do the tag check on the physical address. There's this other combination down here that I have crossed out; I don't think I've ever seen one of these built. You could build a physically indexed, virtually tagged cache, but it kind of doesn't make sense. I'm not sure why you'd want to do that, because usually the hard part is generating the address that indexes into the cache. So I've never seen one of those, but it's always possible to build something like that. The thing to take away is that if the number of index bits going into your cache is more than the number of bits in your minimum page offset, the same data can show up in multiple places in the cache, and you have to either deal with that or at least understand what's going on there. And it's usually dealt with by the operating system.

One final note. We've mostly been talking about multi-level page tables, where you index with the upper bits of the virtual address and you get a tree of page-table pages out of it. That's only one approach; people have built stranger things out there and still implemented paging. It's only one structure to hold all the page mappings. One thing you could think about is that if you have lots and lots of page tables and they all look similar, you could try to share different portions of the page table, and many times the operating system does that. But another way to look at it is to have a table which takes a physical page and maps it backwards to a virtual address. These are usually called inverted page tables. On first appearance this sounds weird, because it's the map in the direction you don't want, you would think. To make up for that, what the architectures that have had inverted page tables usually do is have a fast hashing function and a fairly small hash table that does lookups in the direction you actually need, virtual to physical. If the entry isn't where the hash says, they walk a chain: it's basically one of those linked-list, chained hash tables. You check one location in the table, and if the mapping's not there, there's a link to another location, and then another. It actually ends up working out not too badly, but it's not very common today. I just want to put out there that you can have different arrangements of page tables; the canonical page table we talked about last time is not the only way to go about doing it.
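To sketch what that chained-hash lookup looks like: the entry layout, table size, and hash function below are made up for illustration (real designs also keep a separate hash anchor table), but the walk-the-chain idea is the one described above.

```c
#include <stdint.h>

#define IPT_ENTRIES 1024   /* hypothetical size: roughly one entry per physical frame */
#define NO_FRAME    (-1)

/* A minimal sketch of an inverted page table lookup: hash the virtual page
 * number, then follow "next" links until an entry's VPN and ASID match.
 */
struct ipt_entry {
    uint32_t vpn;     /* virtual page number mapped to this physical frame */
    uint8_t  asid;    /* which address space owns the mapping              */
    int32_t  next;    /* index of the next entry in the chain, or NO_FRAME */
    int      valid;
};

static struct ipt_entry ipt[IPT_ENTRIES];

static int32_t ipt_lookup(uint32_t vpn, uint8_t asid)
{
    int32_t i = (int32_t)((vpn * 2654435761u) % IPT_ENTRIES);  /* simple hash */
    while (i != NO_FRAME) {
        if (ipt[i].valid && ipt[i].vpn == vpn && ipt[i].asid == asid)
            return i;            /* the entry's index is the physical frame */
        i = ipt[i].next;         /* follow the chain                        */
    }
    return NO_FRAME;             /* miss: fall back to the slow path / page fault */
}
```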
Okay, so, any questions on virtually addressed caches before we move on? Yep. That's a good question. So, we said page relocation is when the operating system takes a page and decides to move it someplace else in physical memory, and you're asking whether the data in the cache for that location is still going to be valid. Yes, you're saying the cache entry gets stale. So the problem is this; let me draw it. Here we have a linear page table. We look up virtual address 5000 hexadecimal, and that maps to some location in our physical memory; to make life easy, say it maps to address 8000 hexadecimal. Now the OS comes along and swaps this page out to disk, and sometime in the future decides to pull it back in, down here at a000 hexadecimal, and it updates the page table to point there.

Now, what the question was getting at is: in our cache, we had some data corresponding to that old physical location, and all of a sudden we move everything around. Is that a problem? For a physically tagged cache, it's actually okay. When we do the index, the physical address that comes out of translation is not going to match the tag that's in there for that data, so we're just going to get a miss on that location, evict it, and go pull in that exact same piece of data from the new location. That's actually not so bad.

Now, there are other, more subtle challenges with these virtually indexed, virtually tagged sorts of caches, which will many times require you, when you do a remapping, to actually invalidate all the data for the page you take out, because you might otherwise get a hit even though it's pointing to the wrong location. So let's take the other case: you're in a virtually indexed, virtually tagged cache, and we do this exact same remapping. Well, there's now different physical memory underlying virtual address 5000, and we want to make sure we don't take a hit on old data that's still in our cache. So typically the scheme to handle this is to invalidate all of that memory out of your cache; actually it's typically a flush-and-invalidate operation. Depending on what architecture you're on, some architectures have instructions that flush the entire cache: x86 has something called write-back and invalidate, which writes back all the dirty data and invalidates the entire cache. Other architectures, something more like MIPS, do it on a line-by-line basis with what's basically an operating-system-only instruction: you present an index into the cache, and given that index it flushes that line cleanly. And then there's something sort of in the middle: if you want the user to be able to do this sort of flushing, you need to think a lot harder about it, because you want some way for the user to present a virtual address and have that name something about the cache. There are ways to do that, but the corner cases get pretty tricky. So, does that answer your question? You get a miss when you go to access it someplace else, and that's actually okay.
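As a rough sketch of that flush-and-invalidate step, here's what the OS-side loop could look like. The line-flush primitive is a made-up name, stubbed out so the sketch compiles; x86's WBINVD and the MIPS per-line cache operation are the real mechanisms mentioned above.

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096
#define LINE_SIZE 32

/* Hypothetical primitive, named for illustration only: write back and
 * invalidate one cache line.  Stubbed out here so the sketch runs. */
static void wb_invalidate_line(uintptr_t addr) {
    printf("write back + invalidate line at %#lx\n", (unsigned long)addr);
}

/* Before remapping a virtual page in a virtually indexed, virtually tagged
 * cache, walk the old virtual page line by line and flush it, so no stale
 * data can hit after the remap. */
static void flush_page_from_cache(uintptr_t old_vaddr) {
    uintptr_t base = old_vaddr & ~(uintptr_t)(PAGE_SIZE - 1);
    for (uintptr_t a = base; a < base + PAGE_SIZE; a += LINE_SIZE)
        wb_invalidate_line(a);
}

int main(void) {
    flush_page_from_cache(0x5000);   /* the lecture's virtual address 5000 hex */
    return 0;
}
```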
Now, a trickier thing: let's say the OS decides to point at this page some other way, and, say, DMA or another processor goes and writes that piece of memory. Now your cache is stale. But that's really a more involved question about how, if you have multiple processors, you keep memory coherent between them, which we're going to be talking about in two lectures, on cache coherence between different processors. If it's the same processor accessing it the same way, the cache is going to pick up that change. If that same address is accessed some other way, the operating system is going to have to be very careful. And this is why these virtually indexed, physically tagged caches usually require some way for the operating system to keep those bits above the page offset from differing. If the bits match, you know you'll find the old line and kick it out. And because the cache is physically tagged, even if you have, let's say, a four-way set-associative cache, you're going to get the hit on the physical address bits after translation. So it's not the case that way zero and way one can both end up holding data for the same physical address; that just can't happen in a physically tagged cache, because the physical tag information would be the same and the duplicate would be detected.