Okay, so we'll get started; it looks like we have a quorum today. We're going to continue wrapping up what we were talking about last time, which was address translation and virtual memory. We have about two or three slides left over, and mostly what we were trying to talk about was how address translation and virtual memory influence the design of caches. Let's go back and briefly recap what we covered last time, just to put it back in everyone's memory.

So, we're talking about memory management, different ways to do memory management, and some of the orthogonal concepts and functions that memory management is trying to provide. Mainly, we're trying to translate addresses so that we can move data around and remap it. We talked about protection, which allows us to run multiple different operating systems or multiple different applications at the same time on one chip. And we also started to talk about virtual memory, which is where we left off last time: we talked about paging and briefly touched on demand paging. So we want to continue that today and then look at some design implications of virtual memory: the ability to have lots of different address spaces intermixed at the same time with different mappings, and the ability to appear to have a very large amount of memory even though your system may have a very small amount of physical memory. So, a very large amount of virtual memory, as the name implies.

If you go look at a modern-day virtual memory system, say you're running Linux on your desktop these days: your desktop has a few gigabytes of RAM, probably one, two, three, four, maybe even six gigabytes if you have a really good desktop. And one of the things you may want to do is run applications that use more RAM than that. Now, we have this big storage device plugged into our computer which has lots and lots of storage, namely our disk. The disk is not the only place you can swap memory out to, though. Some people have built virtual memory systems that swap memory across the network, for instance, to another computer. This was actually found to be fast in the original Network of Workstations days. That was a project at Berkeley where they had lots of old Sun computers on a network, and they were actually able to swap memory across the network to a neighboring computer and use the neighboring computer's RAM, effectively making the RAM of the machine they were using bigger, or effectively swapping out to it.

Now, one of the key things that makes all of this work is the notion of demand paging. Demand paging is built around the idea that the operating system only maps a page when it absolutely has to. The operating system can also decide to kick things out and put that memory, let's say, on disk. And if the memory is clean, it doesn't even have to save it. What I mean by clean is that the processor did not write any data to that memory, so the exact same blocks already exist on disk. When the OS goes to kick the page out, it doesn't have to save anything; it can just mark it as saying, oh, it's on the disk already over there, so don't worry about this.
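To make that clean-versus-dirty eviction decision concrete, here is a minimal C sketch of the logic a pager might run. Everything here is hypothetical and just for illustration (the struct page fields, write_back_to_disk, and so on); a real kernel's data structures are far more involved.

```c
#include <stdbool.h>

/* Hypothetical per-page bookkeeping kept by the OS. */
struct page {
    bool dirty;          /* has the processor written to this page?      */
    bool backed_on_disk; /* is an identical copy already on disk (swap)? */
    long disk_block;     /* where that copy lives, if it exists          */
};

/* Hypothetical helper: copy the page's contents out to swap.
 * Stubbed here; a real pager would issue disk I/O. */
static void write_back_to_disk(struct page *p) { (void)p; }

/* Evict one page to free up a physical frame. */
static void evict(struct page *p)
{
    if (p->dirty || !p->backed_on_disk) {
        /* The only up-to-date copy is in RAM, so save it before
         * reusing the frame. */
        write_back_to_disk(p);
        p->backed_on_disk = true;
        p->dirty = false;
    }
    /* Clean and backed on disk: nothing to save. Just remember the
     * data lives at p->disk_block, unmap the page, and reuse the frame. */
}
```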
If you go take an operating systems class, you're actually going to implement a demand paging scheme; I know they do that here in Princeton's operating systems class. So you can try out different management algorithms for pulling data in and out, because there are different algorithms you can use to do this demand paging.

But let's go back and look at how address translation works at the hardware level, because that's what we're focusing on in this course. You take an address and you run it through a translation lookaside buffer. If you have a hit, this is the everything's-going-well case: the address is in your translation lookaside buffer, your cache of the page table. You also probably want to check some protection bits, to see whether you can do a read or a write, and whether the process reading or writing the data is in the correct address space. For instance, you don't want an application trying to read or write the address space of the operating system, or something like that. If the access is permitted, the translation comes out of the TLB, everything's good, and you just go access the data. If not, you end up here in this protection fault case. That protection fault is handled in the operating system, and the operating system will probably try to kill the process, or you get some sort of segmentation fault.

Alternatively, if you're up here in the TLB lookup and you take a miss in the TLB, there are lots of interesting things that start to happen at this point. You might have a hardware page walker, which is a little finite state machine that will walk through the page table, walk all the way to the end of the page table, find the mapping, and install it into the TLB, all done in hardware; x86 works this way, for example. If you're on an architecture like MIPS, Alpha, or Tilera, that's all done in software instead: the page walk and the TLB update are done by a little piece of software that goes and walks the table, doing a bunch of memory references along the way. And then there are some hybrid approaches; for instance, in MIPS there's some special hardware that helps the software walk the page table faster.

If the memory you're trying to get at is not in the page table already, then you follow the operating system line here: you trap into the operating system, and the OS has to go look through all its structures and ask, is that data on the disk somewhere? Are you accessing some piece of memory that doesn't actually exist? If it doesn't exist, you're going to end up over in the segmentation fault or bus error world, because you're basically trying to do a memory reference to memory which isn't there. But if, let's say, it's on disk, in swap or in the backing page cache on the disk, the OS fills everything in, fills in the TLB, and just returns. Life is all good and you continue on.

So, now that we've decided we want something like virtual memory, and we've decided we want translation lookaside buffers, let's look at how this influences the design of our hardware pipelines. Here we have address translation shoehorned into a five-stage pipeline.
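The decision tree we just walked through (TLB hit, protection check, TLB miss, page walk, page fault, demand page-in) can be summarized in a toy C simulation. To be clear, this is a sketch of the flow on the slide, not any real MMU: a one-level page table, a tiny TLB with direct-mapped replacement, made-up structures, and the "bring it in from disk" step reduced to flipping a bit.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy parameters: 4 KB pages, a tiny address space, a tiny TLB. */
#define PAGE_BITS 12
#define NPAGES    16
#define TLB_SIZE  4

typedef struct { bool valid; uint32_t vpn, ppn; bool writable; } tlb_entry;
typedef struct { bool present, writable, on_disk; uint32_t ppn; } pte;

static tlb_entry tlb[TLB_SIZE];
static pte page_table[NPAGES];

/* Translate a virtual address, mirroring the slide's decision tree. */
static uint32_t translate(uint32_t va, bool is_write)
{
    uint32_t vpn = va >> PAGE_BITS;
    uint32_t off = va & ((1u << PAGE_BITS) - 1);

    /* 1. TLB lookup, with a protection check on a hit. */
    for (int i = 0; i < TLB_SIZE; i++)
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            if (is_write && !tlb[i].writable) {
                fprintf(stderr, "protection fault: killing process\n");
                exit(1);
            }
            return (tlb[i].ppn << PAGE_BITS) | off;  /* hit: all good */
        }

    /* 2. TLB miss: walk the page table (one level in this toy). */
    pte *p = &page_table[vpn];
    if (!p->present) {
        if (p->on_disk) {
            /* 3. Page fault: the OS brings the page in (demand paging).
             * Here that's just flipping a bit and picking a frame. */
            p->ppn = vpn;
            p->present = true;
        } else {
            fprintf(stderr, "segmentation fault / bus error\n");
            exit(1);
        }
    }

    /* 4. Install the mapping in the TLB and retry; it hits this time. */
    tlb[vpn % TLB_SIZE] = (tlb_entry){ true, vpn, p->ppn, p->writable };
    return translate(va, is_write);
}

int main(void)
{
    /* Page 3 starts out swapped to disk; the first touch demand-pages it. */
    page_table[3] = (pte){ .present = false, .writable = true, .on_disk = true };
    printf("pa = 0x%x\n", (unsigned)translate((3u << PAGE_BITS) | 0x44, false));
    return 0;
}
```

The recursion at the bottom mirrors what real hardware does: once the walker (or the OS) has installed the mapping in the TLB, the access is simply retried and hits.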
As some of you may notice, here and here we just added to the delay of these stages. We shoehorned the translation in: we didn't add an extra stage, we just put it in there, in series. That's the naive approach to doing this, and it has some serious latency considerations. If your instruction memory is on the critical path of your processor and all of a sudden you put something else in series with it, your processor gets slower. And if your data cache is on your critical path and you add something in there, that's also going to slow down your processor. So we want to look at techniques where we can move those two structures off the critical path.

Alternatively, we can think about pipelining the TLB and the cache: one stage for the TLB lookup and one stage for the cache. That gets a little hard over here on the data side of the world, because it's going to increase the access time to your data memory; when you do a load, adding another pipe stage pushes that load out an extra cycle. And on the instruction side, this could hurt your branches. If you put a pipeline stage in, something like here, then your PC-plus-four loop gets a little harder. Not from a timing perspective, it probably doesn't get harder there, but if you take a branch in there, you're going to add an extra cycle to your branch mispredict latency. It also gets a little weird because you can no longer access instruction memory in, effectively, one cycle.

Having said that, it's usually easier to take the instruction side off the critical path, because you don't change the high-order bits of your instruction address that often. You only cross page boundaries when you take a jump or when you happen to fall off the end of the page, and both of those are relatively rare cases, as the little check below shows. So people have found it pretty easy to take that off the critical translation path, and we're mostly going to focus on the data side of the world here.
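To see why the instruction side is the easy one, consider sequential fetch with, say, 4 KB pages and 4-byte instructions (assumptions for this sketch; page and instruction sizes vary): PC+4 only changes the virtual page number when you fall off the very end of a page, which is one fetch in about a thousand. A tiny C check makes that concrete:

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_BITS 12   /* assume 4 KB pages */

/* Does fetching the next sequential instruction leave the current page? */
static int crosses_page(uint32_t pc)
{
    return (pc >> PAGE_BITS) != ((pc + 4) >> PAGE_BITS);
}

int main(void)
{
    /* With 4 KB pages and 4-byte instructions, only 1 sequential fetch
     * in 1024 changes the page number; the rest can reuse the previous
     * cycle's translation. Jumps are the other, also rare, case. */
    printf("%d\n", crosses_page(0x1FF8));  /* 0: still in the same page */
    printf("%d\n", crosses_page(0x1FFC));  /* 1: falls off the page end */
    return 0;
}
```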