So if you hit in the TLB, life is good: you get the address you need. If you miss, there are two major approaches to figuring out where to go look for the page tables. The approach that something like x86 uses is a hardware page table walker. There's a little state machine in the main processor which says, oh, a page table miss happened. It stalls the whole processor, and then it walks that multi-level page table for you in physical memory. It indexes into one level, then another, and then another of the multi-level page table, and finally comes up with an address. It refills the TLB, so this is like the miss case of your cache. Another common approach, which is what MIPS and Alpha do, is to actually use software to do that walk. Effectively, this is similar to having the miss case of your cache handled in software, or here, in the operating system. The reason this can work out and not be too horrible is that TLB misses are relatively infrequent. Because TLB misses are relatively infrequent, you can afford to handle them in software. Something I did want to say is that, because the walk is done in software, you can have different layouts for your page tables. You don't have to have one very rigid page table layout. If you do the walk in hardware, the layout of your page table has to be known to both the hardware and the software, and they have to agree on it, and that can cause challenges. One other thing I wanted to say: PowerPC is actually a little bit interesting here. We haven't put it under hardware, but to some extent it can straddle these two approaches. Some implementations of PowerPC do the walk completely in hardware, and some have software assist for the harder cases. You can also think of the software schemes as sometimes having a little bit of hardware to assist.
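To make the hit/miss split concrete, here is a toy sketch (not real OS or hardware code) of a TLB lookup whose miss case is handled by a software refill routine. All of the names and the table layout are invented for illustration; in particular, the "page table" is just a hash map, which is exactly the kind of non-rigid layout a software walker permits and a fixed hardware walker does not.

```python
PAGE_SIZE = 4096

tlb = {}  # virtual page number (VPN) -> physical page number (PPN)

# A software miss handler can use any page table layout it likes.
# Here: a simple hash map from VPN to PPN (purely illustrative).
software_page_table = {0x10: 0x2A, 0x11: 0x2B}

def software_tlb_miss_handler(vpn):
    """Runs in the OS on a TLB miss; free to use any table layout."""
    ppn = software_page_table.get(vpn)
    if ppn is None:
        raise KeyError("page fault: no mapping for VPN %#x" % vpn)
    tlb[vpn] = ppn                    # refill the TLB
    return ppn

def translate(vaddr):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    ppn = tlb.get(vpn)
    if ppn is None:                   # TLB miss: fall to the handler
        ppn = software_tlb_miss_handler(vpn)
    return ppn * PAGE_SIZE + offset   # TLB hit path

print(hex(translate(0x10123)))  # first access misses, refills the TLB
```

The first `translate` call misses and refills the TLB; a second access to the same page would hit and never enter the handler, which is why the infrequency of misses makes this scheme tolerable.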
So, for instance, in MIPS there's a special hardware register that gets loaded with the address of the first-level page table entry, if you're doing a multi-level page table, and the software can elect to use that register or not. So it's a little bit of a hardware boost, but the actual cycling through the page table isn't done in hardware; it's done in software. I just wanted to briefly show an example of a real page table structure. This is the page table structure for 32-bit SPARC. They actually have three levels of page tables. Something to think about: the memory management unit will do the walk in hardware. On a TLB miss, it will index into one level, then index into the next, and then the next, based on these different indexes, to come up with the physical page number. Okay, now we have to put it in the pipeline. So here's our five-stage pipe. We want to be able to access our caches, and accessing our caches with virtual addresses could be problematic; we'll talk about that in a second. But it would be nice to be able to access them with physical addresses, so we want to go through our TLB to access these caches. Well, all of a sudden we've added something to our miss path, or excuse me, to our hit path, actually, of our cache. We put something in series with it. This can slow down our cache access. Ugh, that's not great. I wanted to show something else being drawn here: here's the hardware page table walker. If you have a miss, it stalls the whole processor, fires up, and uses the page table base register to walk around in main memory and figure everything out. Usually the page tables are held in untranslated physical memory, because if you have to go through virtual memory to touch your page tables in order to manage your page tables, it's a real big headache. Keeping them in physical memory makes life a little bit easier.
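Here is a toy model of that three-level walk. The 32-bit SPARC Reference MMU splits the 20-bit virtual page number into 8 + 6 + 6 index bits over 4 KB pages, and that is the split assumed below; the nested dictionaries standing in for in-memory tables, and the specific mapping, are purely illustrative.

```python
def split_vpn(vaddr):
    """Break a 32-bit virtual address into three indexes plus a page offset."""
    offset = vaddr & 0xFFF          # low 12 bits: byte within the 4 KB page
    vpn = vaddr >> 12
    i3 = vpn & 0x3F                 # low 6 bits of VPN: third-level index
    i2 = (vpn >> 6) & 0x3F          # next 6 bits: second-level index
    i1 = (vpn >> 12) & 0xFF         # top 8 bits: first-level index
    return i1, i2, i3, offset

def walk(root, vaddr):
    """What the walker does on a TLB miss: index, index, index."""
    i1, i2, i3, offset = split_vpn(vaddr)
    level2 = root[i1]               # first-level table entry
    level3 = level2[i2]             # second-level table entry
    ppn = level3[i3]                # third-level entry holds the PPN
    return (ppn << 12) | offset

# Build just enough table to map one page:
# VA 0x00403000 has VPN 0x403, which splits into i1=0, i2=16, i3=3.
root = {0: {16: {3: 0x77}}}
print(hex(walk(root, 0x00403010)))  # maps to PPN 0x77, offset 0x10
```

Each level that the walk touches is a memory access, which is why a miss that falls all the way through a multi-level table is expensive compared to a TLB hit.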
Because the data caches and the instruction caches all see physical addresses, we don't have to worry about any of this virtual memory stuff in the memories themselves. That's not so bad, except I don't like how it slows down my processor. Okay, so let's look at a big, putting-it-all-together perspective. You have a virtual address that comes in. It goes to your TLB. If you get a hit, you check to make sure you can read it or write it, that you have access to it. If you do, you just access the cache and life is great. If you don't, you trap into the operating system for a protection violation, or some sort of segmentation fault. Now in your TLB lookup, let's say you take a miss: you end up in the page table walker. This might be a hardware page table walker or a software page table walker. And this is where we start to see virtual memory showing up. I said we were going to back memory with disk; well, this is where it shows up. In the page table, the data could be in secondary storage; it could be on disk. We can figure that out right at this point, and the operating system will be notified if it's not in the page table. So if the page table entry is invalid, you end up in the page fault handler. If it is valid and you have a hardware page table walker, the walker will update the TLB and you go re-execute the exact same instruction that you just faulted on. Actually, with a hardware page table walker you're not even going to take a fault on this path; it'll just stall the pipe for a bit. If you have a software page table walker, you'll have software doing this little part here. Now let's say the page exists but is on disk; we somehow moved it onto disk. Well, the page fault handler will actually load the page into memory, reading it off disk, patch the page table such that it now points at the correct place in memory, and then re-execute the same instruction.
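The whole decision tree just described can be sketched as one function. The PTE states, frame numbers, and handler behavior below are all invented for illustration; real hardware and kernels split these steps across the MMU, the walker, and the fault handler rather than one routine.

```python
# PTE states for the sketch: in memory, paged out to disk, or no mapping.
PRESENT, ON_DISK, INVALID = "present", "on_disk", "invalid"

tlb = {}                                  # vpn -> (ppn, writable)
page_table = {
    1: (PRESENT, 0x40, True),             # vpn 1 is in memory at ppn 0x40
    2: (ON_DISK, "swap-slot-7", True),    # vpn 2 was paged out to disk
}

def access(vpn, write=False):
    # 1. TLB hit: check protection, then access the cache.
    if vpn in tlb:
        ppn, writable = tlb[vpn]
        if write and not writable:
            return "trap: protection violation"
        return "cache access at ppn %#x" % ppn

    # 2. TLB miss: the walker (hardware or software) consults the page table.
    state, where, writable = page_table.get(vpn, (INVALID, None, False))
    if state == PRESENT:
        tlb[vpn] = (where, writable)      # refill the TLB...
        return access(vpn, write)         # ...and re-execute the access
    if state == ON_DISK:
        # 3. Page fault: read the page in, patch the PTE, re-execute.
        ppn = 0x99                        # pretend we found a free frame
        page_table[vpn] = (PRESENT, ppn, writable)
        return access(vpn, write)
    return "trap: segmentation fault"     # 4. No mapping at all.
```

For example, `access(2)` takes the full slow path: TLB miss, walk, page fault, disk read, PTE patch, and then the retried access hits.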
So you return from the interrupt at the instruction that took the original page fault. If we put this all together, we end up with our modern virtual memory system, which allows data either to be in memory or to be on disk. One of the cool ideas that people came up with is some fancier ways to manage this, and we end up with what's called demand paging. In demand paging, we don't load anything when we start a program. If you go run a program on Linux, believe it or not, all it does is map the first page of your executable and start you executing. It doesn't pull in the entire executable. If your executable is five megabytes, it pulls in four kilobytes' worth of it on x86 and starts you executing. Now what happens is, when you execute off the end of that four kilobytes, you end up in the page fault handler. The OS has a table, which it might store inside of the page table or in a side structure, and it knows, oh, the rest of the executable is still on disk. So it goes and grabs that bit off disk, puts it in memory, maps the page, and restarts the load, the missed instruction, we'll say. This works not only for portions of executables that are on disk, but also for things like your heap. If you go and try to access off the end of your heap, and the OS knows that you actually allocated that space, it can just create new space for you and go find some page in memory for that space. So, believe it or not, on a modern-day x86 processor running Linux, when you call malloc and allocate a bunch of space, that space does not get created for you. Nothing happens. This is why malloc runs really, really fast. The page does not get created until you first touch it. It's just smoke and mirrors in the meantime. And this is called demand paging.
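A toy model of that lazy allocation, with invented names: `lazy_malloc` only records a promise, and the stand-in "page fault handler" materializes a page on first touch. Real kernels track these promises in virtual-memory-area structures, not a bare set.

```python
PAGE_SIZE = 4096

promised = set()      # page numbers malloc has promised but not backed
resident = {}         # page number -> bytearray standing in for a frame

def lazy_malloc(start_page, npages):
    """Fast path: just remember the promise; allocate nothing."""
    for p in range(start_page, start_page + npages):
        promised.add(p)

def touch(page):
    """First touch triggers the 'page fault handler', which backs the page."""
    if page not in resident:
        if page not in promised:
            raise MemoryError("segfault: page %d was never allocated" % page)
        resident[page] = bytearray(PAGE_SIZE)   # now it really exists
    return resident[page]

lazy_malloc(100, 256)         # "allocate" 1 MB: nothing is backed yet
touch(100)[0] = 42            # first touch creates just this one page
print(len(resident))          # -> 1: only the touched page is resident
```

The promise set grows for free, which is exactly why `malloc` can return almost instantly even for huge requests.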
So that first time, when you call malloc, it just does nothing. The OS keeps track that, oh, it should have given you some memory, but it didn't. Then, when you actually go to touch the piece of memory it was supposed to give you, at that point it goes and creates the memory for you. The reason it does this is that it decreases memory pressure. You can actually run more programs if the programs are not all touching their memory. Let's say you have a program which mallocs a lot of space but does not go access it. Well, the OS didn't have to allocate memory for it. This causes some challenges on the flip side, though. All of a sudden, let's say you have a hundred programs running and they each malloc, I don't know, a gigabyte of space. The OS would just say yes, yes, yes, yes to everybody. Then everyone goes and tries to access their gigabyte of space, but your machine only has, let's say, four gigabytes of memory, while all of a sudden it has promised a hundred gigabytes of memory. Well, when your programs go to actually access that hundred gigabytes of memory, which the machine doesn't actually have, at that point one of them is basically going to crash, and you'll get an out-of-memory error. So the flip side of this demand paging scheme is that your OS has to be very careful not to run out of memory. But sometimes it does. So let's stop here for today, and next time we're going to finish up this lecture and move on to some other topics. I think we're running a little bit late today. Next time we're going to talk about how to integrate translation lookaside buffers with caches, and how to integrate them into the pipeline. There are a lot of complexities in doing that. It's very possible that you don't want to access the TLB before you go to access the cache.
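The overcommit arithmetic is worth writing down. This back-of-the-envelope model uses the numbers from the example above (a hundred 1 GiB promises against 4 GiB of physical memory) and counts how many programs can fault in their full allocation before the machine runs dry; everything here is illustrative, not how any real kernel picks an out-of-memory victim.

```python
GIB = 2**30
physical = 4 * GIB            # what the machine really has
promised = 100 * 1 * GIB      # 100 programs each malloc 1 GiB

print(promised // GIB)        # -> 100: the OS said yes to 100 GiB

# Everything is fine until programs actually touch their pages.
touched = 0
oom = False
for prog in range(100):
    touched += 1 * GIB        # each program faults in its full gigabyte
    if touched > physical:
        oom = True            # the OS has nothing left to hand out
        break

print(prog, oom)              # -> 4 True: the fifth program hits OOM
```

So the promise costs nothing, and as long as most programs never touch most of what they asked for, overcommit is a win; the crash only comes when the touched total crosses what physically exists.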
We're going to talk about how to do that, and the main reason you might not want to access the TLB first is that it increases your time through the hit path. Okay, let's stop here for today.