Okay, so now we have to look at the hardware we have to build in order to do this. Just to recap: we have a virtual address. The offset just goes straight through. The virtual page number goes through some sort of address translation through our page table, we come up with a physical page number, and that gives us our physical address. Just like with the base and bounds registers, where we had a bounds check, we want to take the virtual page number and stick it into something which does a protection check. So what do we want to check in our protection? Well, we want to check whether it's a read or a write, because you might want to have some pages that are read-only, or some pages that are write-only. You may want to have some pages that only the kernel can access. This is what I was saying about how you can map all of the kernel into your address space: it's at the same location in every process, but maybe you don't want the user to be able to access it unless you're in kernel mode. So a lot of the time there's a bit that says, is this a kernel access, or an OS access, or a user access.

One of the challenges here is that you really want this address translation to go fast, because you don't want to take every single load and store and turn something that was a one-cycle operation into, say, a ten-cycle operation, or however many levels of page table you have. Worse, you could miss in the cache on your page table access and have to go out to main memory. So we came up with a nice little structure to solve this problem, and this nice little structure is the translation lookaside buffer, or TLB. What this is, is a cache of page table translations. You shove into it a virtual address, er, excuse me, a virtual page number, and out comes a physical page number. So it's a direct mapping from virtual page number to physical page number, but it holds only a small set of these instead of mapping all of memory. It holds the most recently used translations, and you might run some sort of replacement algorithm on it, so that when you have a hit in your TLB, everything happens in a single cycle: you stick your virtual page number in and a physical page number comes out. If you have a miss, then you have to fall back to some slower path, which has to, let's say, go walk the page table, or the multi-level page table, for instance.

Let's look at what's in this structure. You have a valid bit. You typically have a bit that says whether the page is readable, and a bit that says whether the page is writable. This allows you to have read-only memory, if, for instance, you only have the read bit turned on and the write bit turned off. It also allows you to have write-only memory. Now, you might say, write-only memory, is that possible? Well, yes, actually, this is a thing: if you have, let's say, two processes that are communicating and you want one process to be able to write to the other process, but not have the other process communicate back the other way, a one-directional channel, you can do that with write-only memory. Sometimes these bits are merged; not all architectures have write-only memory, but for completeness you want to have it. And you have a D bit here, which is the dirty bit. Yes, this is like the song by The Black Eyed Peas, "Dirty Bit." So it's a dirty bit here.
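Just to make those fields concrete, here is a minimal software sketch in C of a TLB entry and a permission-checked lookup. The names (tlb_entry_t, tlb_lookup) and the layout are purely illustrative assumptions, not any particular machine's format; a real TLB does this match in hardware in a single cycle rather than with a loop.

```c
#include <stdint.h>
#include <stdbool.h>

#define TLB_ENTRIES 64          /* illustrative; real TLBs are roughly 16-128 entries */

/* One cached page-table translation plus its protection bits. */
typedef struct {
    bool     valid;             /* entry holds a real translation                 */
    bool     readable;          /* page may be read                               */
    bool     writable;          /* page may be written                            */
    bool     dirty;             /* page has been written since it was brought in  */
    uint64_t vpn;               /* tag: virtual page number                       */
    uint64_t ppn;               /* payload: physical page number                  */
} tlb_entry_t;

typedef enum { TLB_HIT, TLB_MISS, TLB_FAULT } tlb_result_t;

static tlb_entry_t tlb[TLB_ENTRIES];

/* Translate a virtual page number.  On a hit with sufficient permission the
 * physical page number comes out; a miss would fall back to walking the
 * (possibly multi-level) page table, which is not shown here. */
tlb_result_t tlb_lookup(uint64_t vpn, bool is_write, uint64_t *ppn_out)
{
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (!tlb[i].valid || tlb[i].vpn != vpn)
            continue;
        if (is_write ? !tlb[i].writable : !tlb[i].readable)
            return TLB_FAULT;          /* protection check failed            */
        if (is_write)
            tlb[i].dirty = true;       /* remember the page has been written */
        *ppn_out = tlb[i].ppn;
        return TLB_HIT;
    }
    return TLB_MISS;                   /* go walk the page table             */
}
```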
The dirty bit lets you know whether the page has been written or not. So let's say you have a writable page and you want to know if that page has actually been written to. You can use this bit, similar to the dirty bit in caches, to know whether the data has to be written back to a higher level of the hierarchy. If you think of what we have mapped into memory as a cache of stuff that could possibly be on disk, we have to know whether to write it back out to disk, and we can use the dirty bit to do that; it's usually in the TLB structure.

Now, there are different ways to build these caches. The most common way is to have them be fully associative. For fully associative, you have to do a tag check, because there's a tag in here which gets matched against the virtual page number, and since it's associative, the entry could be in any location in the TLB. And then, finally, there's the data payload, the physical page number, which comes out. So this is our basic translation lookaside buffer, and it's basically just a cache of the page table. When you take a miss in here, the opposite of a hit, we're going to pull in a page table entry by walking the page table, possibly doing that multi-level resolution where we have to do multiple loads, and take that data and put it in here. And the next time we go to access that same page, it hits in here and doesn't take any more cycles.

So these are usually fully associative, and they're usually relatively small, sixteen to maybe 128 entries. Sometimes you'll see multi-level versions of these, where you have the fully associative one at the lowest level and then some sort of level-two TLB that backs it, which is a lot bigger and possibly not fully associative.

There are a couple of different designs here for how to go about replacement. Having something like true LRU can be hard if this gets large, and it's possible that LRU is not even a good approach. Unlike caches, this sees a very different access pattern: there's not necessarily any spatial locality going on here. You probably have temporal locality, and that might say that LRU would be good, but it's not like there's spatial locality where, because you access one page, there's any reason you'd go access the page after it or the page before it. So lots of the techniques from caches, prefetching, et cetera, don't really work here. But from a replacement perspective, people have tried lots of different things. Something that actually works surprisingly well is just random: you randomly choose an entry, kick something out, and replace it. Random is actually hard to do sometimes, believe it or not. So a lot of times what people will use is what's called a clock algorithm, where you basically have a free-running counter which says where in your TLB to go evict, and every cycle you somehow change this counter, maybe increment it by one, and there's a little hardware counter there that will actually do this. That's not quite random, because it's correlated somehow, and that can actually be a good thing, because it can make it so that you can test your processor. Having something that's truly random, where you're trying to pick up random noise from the atmosphere or use a true random number generator, can be very hard to test.
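As a rough sketch of this clock-style replacement (illustrative names, and assuming the counter advances on each executed instruction, as described next):

```c
#include <stdint.h>
#include <stdbool.h>

#define TLB_ENTRIES 64

/* Minimal entry for the replacement sketch: just enough state to refill. */
typedef struct { bool valid; uint64_t vpn, ppn; } entry_t;

static entry_t  tlb[TLB_ENTRIES];
static unsigned clock_ptr = 0;      /* free-running victim pointer */

/* Advance the pointer by one on a regular event (e.g., each executed
 * instruction), wrapping at the TLB size.  Not truly random, but
 * reproducible, which makes the design easy to test. */
void clock_tick(void)
{
    clock_ptr = (clock_ptr + 1) % TLB_ENTRIES;
}

/* On a TLB miss, evict whatever slot the clock pointer currently names
 * and install the freshly walked translation there. */
void tlb_refill(uint64_t vpn, uint64_t ppn)
{
    tlb[clock_ptr] = (entry_t){ .valid = true, .vpn = vpn, .ppn = ppn };
}
```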
So instead, a lot of times what people will do is use a clock algorithm, such that on every instruction that executes, you increment the TLB replacement location by one, and it rolls around when it gets to the full size of the TLB. What's nice about this is that if you reset that register to, let's say, five, and then you execute 100 instructions, you know what it's going to be 100 instructions in the future. So you have some predictability, but it looks like random, because it doesn't increment, let's say, with every memory reference; instead it increments with every execution of an instruction. First-in first-out has some notion of locality and works okay, and true LRU, some people actually do here.

So one of the good questions that comes up is: if we're trying to access a big array, can we map enough memory with our TLB to access that big array without having to take a TLB miss? We'll call this the reach of the TLB, all the memory the TLB can map. Let's say we have a relatively decent-sized TLB here: a 64-entry TLB with four-kilobyte pages. How much can we go access? Well, we can go access 256 kilobytes. Now, that's not small, but it's not huge. So it does argue for having more TLB entries, and it also argues for possibly having larger page sizes. This is a traditional sort of trade-off, like block size in caches: having a larger page size gives you larger reach and means you need fewer TLB entries, but it can increase your internal fragmentation, and you might have to pull more data off the disk if you go to start up a new process that has these large pages, and you might not end up using all of it. So something to think about is that the reach of the TLB is important if you don't want to get the OS involved.

Now, a couple of extensions to the TLB. Something we haven't talked about up to this point is what happens when you have multiple processes running at the same time, or you time-multiplex between them. What do you do with the TLB? We talked about changing the page table base pointer when you change to a new process. Well, the TLB is a cache of the page table, so when you go to change the page table base pointer, you're going to have to invalidate your entire TLB. And if you're doing this, let's say, 100 times a second, like you do in modern-day Linux, or many thousands of times a second on some faster systems, you're going to be thrashing your TLB pretty quickly. So that could be quite painful. The solution to this is that you add extra bits into your TLB which say which process a particular entry in your TLB belongs to, and we're going to call this an address space identifier. It's a little bit more general than a process identifier. Sometimes some operating systems will actually use some bits of the process identifier as the address space identifier; some don't. You can leave this as something which will uniquely identify the address space. So now what we can do is commingle different processes' TLB entries, and this address space identifier takes part in the associative lookup in the TLB. So our match operation checks the tag, and it checks to make sure our address space identifier is equal to a special register on the side, which holds the address space ID for the currently running process.
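To make the reach arithmetic concrete, here is a tiny worked example that just mirrors the numbers above (reach = number of entries times page size):

```c
#include <stdio.h>

/* TLB reach: the amount of memory the TLB can map before missing. */
int main(void)
{
    unsigned long entries = 64;
    unsigned long page_4k = 4UL * 1024;            /* 4 KB pages */
    unsigned long page_1m = 1024UL * 1024;         /* 1 MB pages */

    printf("reach with 4 KB pages: %lu KB\n",
           entries * page_4k / 1024);              /* 256 KB */
    printf("reach with 1 MB pages: %lu MB\n",
           entries * page_1m / (1024 * 1024));     /* 64 MB  */
    return 0;
}
```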
So we're actually going to check this AND that. Now, sometimes you do want to ignore the address space identifier. A good example of this is operating system pages. If you have the operating system mapped into every single process out there, you don't necessarily want to be polluting your TLB with separate copies of the page table entries for all the different operating system pages. So a lot of times, if you add an address space identifier, you also add a global bit that goes into the match. The global bit basically says: ignore the address space identifier, and it turns it back into the case we had before. This is traditionally only used for operating system pages.

And then finally, something I wanted to mention as a sort of neat extension here is to have variable-size pages. So instead of our basic TLB which has, let's say, one page size, four kilobytes or one megabyte or something like that, you have some bits in the page table entry which, actually, this is part of the match, I guess, you do have to look this up, because it changes how many bits of the tag to look at, and it says the output is a different page size. So we can have one-megabyte pages and four-kilobyte pages at the same time in our page table. And you can use large pages for large pieces of memory that you know are all going to be there, and small pages for something where you're worried about internal fragmentation. So you can think about having the page size in the entry, and most architectures do have variable page sizes. A lot of architectures have something like an address space identifier these days. And some architectures will actually have an address space identifier that's hidden from the operating system, or hidden from the programmer, and the hardware tries to figure it out itself; so it's not architecturally visible, but the microarchitecture effectively has an address space identifier.
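Extending the earlier lookup sketch, the hit condition with an address space identifier and a global bit might look roughly like this; the names (cur_asid, entry_hits) are just illustrations of the idea, not a real architecture's interface.

```c
#include <stdint.h>
#include <stdbool.h>

/* Extended entry: the ASID and a global bit join the tag match. */
typedef struct {
    bool     valid;
    bool     global;    /* ignore the ASID check (e.g., OS pages)  */
    uint16_t asid;      /* address space identifier                */
    uint64_t vpn;       /* tag: virtual page number                */
    uint64_t ppn;       /* payload: physical page number           */
} tlb_entry_asid_t;

/* Special register holding the ASID of the currently running process. */
static uint16_t cur_asid;

/* An entry hits when the tag matches and either the entry is marked
 * global or its ASID matches the current process's ASID.  This is what
 * lets different processes' entries live in the TLB at the same time. */
static bool entry_hits(const tlb_entry_asid_t *e, uint64_t vpn)
{
    return e->valid &&
           e->vpn == vpn &&
           (e->global || e->asid == cur_asid);
}
```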