So if you hit in the TLB, life is good: you get the address you need. If you miss, there are two major approaches to figuring out where to go look for the page tables.

The approach that something like x86 uses is a hardware page table walker. There's a little state machine in the main processor which says, oh, a page table miss happened. It stalls the whole processor, and then it walks that multi-level page table in physical memory for you. It indexes into the table, then another index, and another index of the multi-level page table, and finally comes up with an address. It refills the TLB, and this is like the miss case of your cache.

Another common approach, which is what MIPS and Alpha do, is to actually use software to do that walk. Effectively, this is similar to having the miss case of your cache be handled in software; here it's done in the operating system. The reason this can work out and not be too horrible is that TLB misses are relatively infrequent. Because TLB misses are relatively infrequent, you can afford to handle them in software.

Something I did want to say is that, because the walk is being done in software, you can also have different layouts for your page tables; you don't have to have one very rigid page table layout. If you do it in hardware, the layout of your page table has to be known to both the hardware and the software, and they have to agree on it, and that can cause challenges.

One other thing I wanted to say: PowerPC is actually a little bit interesting here. We haven't put it under hardware, but to some extent it can straddle these two approaches. Some implementations of PowerPC do the walk completely in hardware, and some actually have software assist for the harder cases. You can also think of the software schemes as sometimes having a little bit of hardware to assist. For instance, in MIPS there's a special hardware register that gets loaded with the address of the first-level page table entry if you're doing a multi-level page table, and the software can elect to use that register or not. So it's a little bit of a hardware boost, but the actual walking of the page table isn't done in hardware; it's done in software.
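To make the software-walk idea concrete, here is a minimal C sketch of what a software TLB refill handler might do for a two-level page table. The layout (4 KB pages, 10-bit indexes), the PTE format, and the tlb_write() and raise_page_fault() primitives are all assumptions made for illustration, not any real architecture's interface.

```c
#include <stdint.h>

/* Minimal sketch of a software TLB refill handler walking a
 * hypothetical two-level page table held in physical memory. */

#define PAGE_SHIFT  12
#define PAGE_MASK   0xFFFu
#define LEVEL_BITS  10
#define LEVEL_MASK  ((1u << LEVEL_BITS) - 1)
#define PTE_VALID   0x1u

typedef uint32_t pte_t;

extern pte_t *l1_table;                                /* first-level table (physical) */
extern void   tlb_write(uint32_t vpn, uint32_t ppn);   /* assumed TLB refill primitive */
extern void   raise_page_fault(uint32_t vaddr);

void software_tlb_refill(uint32_t vaddr)
{
    uint32_t vpn  = vaddr >> PAGE_SHIFT;
    uint32_t idx1 = (vpn >> LEVEL_BITS) & LEVEL_MASK;  /* index into level 1 */
    uint32_t idx2 = vpn & LEVEL_MASK;                  /* index into level 2 */

    pte_t l1 = l1_table[idx1];
    if (!(l1 & PTE_VALID)) {
        raise_page_fault(vaddr);        /* no second-level table: hand off to the OS */
        return;
    }

    /* The level-1 entry holds the physical address of a level-2 table. */
    pte_t *l2_table = (pte_t *)(uintptr_t)(l1 & ~PAGE_MASK);
    pte_t l2 = l2_table[idx2];
    if (!(l2 & PTE_VALID)) {
        raise_page_fault(vaddr);
        return;
    }

    /* Refill the TLB and let the faulting instruction re-execute. */
    tlb_write(vpn, l2 >> PAGE_SHIFT);
}
```

The same indexing structure applies to a hardware walker; the only difference is whether a state machine or a trap handler performs these loads.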
I just wanted to briefly show an example here of real page table structures. This is the page table structure for 32-bit SPARC. They actually have three levels of page tables. Something to think about: the memory management unit will do the walk in hardware, so it will index into the table, then index into it again, and index into it again, based on these different indexes, to come up with the physical page number if you have a TLB miss.

Okay, now we have to put this in the pipeline. So here's our five-stage pipe, and we want to be able to access our caches. Accessing our caches with virtual addresses could be problematic; we'll talk about that in a second. It'd be nice to be able to access them with physical addresses, so we want to go through our TLB to access these caches. Well, all of a sudden we've added something to our miss path, or excuse me, to our hit path actually, of our cache. We've put something in series with it. This can slow down our cache access. Ugh, that's not great.

I wanted to show something else being drawn here: here's the hardware page table walker. If you have a miss, it stalls the whole processor, fires up, and actually uses the page table base register to walk around in main memory and figure everything out. Usually the page tables are held in untranslated physical memory, because if you have to go through virtual memory to touch the page tables that manage your virtual memory, it's a real big headache.

But this makes life a little bit easier, because the data caches and the instruction caches, the memories, all use physical addresses; we don't have to worry about any of this virtual memory stuff there. That's not so bad, except I don't like how this slows down my processor.

Okay, so let's look at a big, putting-it-all-together perspective. You have a virtual address that comes in. It goes to your TLB. If you get a hit, you check to make sure you can read it or write it, that you have access to it. If you do, you just access the cache and life is great. If you don't, you trap into the operating system for a protection violation, or some sort of segmentation fault. On your TLB lookup here, let's say you take a miss: you end up at the page table walker. Now, this might be a hardware page table walker or a software page table walker.
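Here is a small C sketch of that hit, protection check, and miss ordering, purely to show the sequence of decisions on an access; the types and every helper function (tlb_lookup, walk_page_tables, and so on) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative translation flow: TLB lookup, permission check, then
 * a cache access with the resulting physical address. */

typedef struct {
    uint32_t ppn;        /* physical page number */
    bool     valid;
    bool     writable;
} tlb_entry_t;

extern bool        tlb_lookup(uint32_t vpn, tlb_entry_t *out);
extern tlb_entry_t walk_page_tables(uint32_t vpn);   /* hardware or software walker */
extern void        tlb_refill(uint32_t vpn, tlb_entry_t e);
extern void        trap_protection_fault(uint32_t vaddr);
extern void        trap_page_fault(uint32_t vaddr);
extern uint32_t    cache_access(uint32_t paddr, bool is_write);

uint32_t translate_and_access(uint32_t vaddr, bool is_write)
{
    uint32_t vpn = vaddr >> 12, offset = vaddr & 0xFFF;
    tlb_entry_t e;

    if (!tlb_lookup(vpn, &e)) {          /* TLB miss: invoke the walker */
        e = walk_page_tables(vpn);
        if (!e.valid)
            trap_page_fault(vaddr);      /* nothing mapped: let the OS sort it out */
        tlb_refill(vpn, e);
    }

    if (is_write && !e.writable)         /* protection check on the hit path */
        trap_protection_fault(vaddr);

    /* Physical address = physical page number concatenated with the page offset. */
    return cache_access((e.ppn << 12) | offset, is_write);
}
```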
And this is where we start to see virtual memory showing up. I said we were going to get back to memory on disk; well, this is where it shows up. In the page table, the data could be in secondary storage; it could be on disk. We can figure that out right at this point, and the operating system will be notified if it's not in the page table. So if the page table entry is invalid, you end up in the page fault handler. If it is valid and you have a hardware page table walker, the hardware page table walker will update the TLB and you go re-execute the exact same instruction that you just faulted on. Actually, with a hardware page table walker you're not even going to take a fault on this path; it'll just stall the pipe for a bit. If you have a software page table walker, you'll actually have software doing this little part here.

Now let's say the page exists but is on disk; we somehow moved it onto disk. Well, the page fault handler will actually load the page into memory, reading it off the disk, patch the page table so that it now points at the correct place in memory, and then re-execute the same instruction. So it returns from the interrupt at the instruction that took the original page fault.

If we put this all together, we end up with our modern virtual memory system, and the modern virtual memory system allows data either to be in memory or to be on disk. One of the cool ideas people came up with is some fancier ways to manage this, and we end up with what's called demand paging. In demand paging, we don't load anything when we start a program. If you go run a program on Linux, believe it or not, all it does is map the first page of your executable and just start you executing. It doesn't pull in the entire executable. If your executable is five megabytes, it pulls in four kilobytes' worth of that on x86 and starts you going, just starts executing. Now what happens is, when you execute off the end of that four kilobytes, you end up in this page fault handler. The OS has a table, which it might store inside the page table or in a side structure, and it knows, oh, the rest of the executable is still on disk.
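Here is a minimal C sketch of the page fault handler for the "mapping exists, but the page lives on disk" case just described. The PTE format, the backing-store lookup, and all of the helper functions are assumptions made for this sketch, not any particular OS's code.

```c
#include <stdint.h>
#include <stdbool.h>

/* Sketch of a page-fault handler: fetch the page from the backing
 * store, patch the page table, and restart the faulting instruction. */

#define PTE_PRESENT  0x1u          /* page is resident in physical memory */

typedef uint64_t pte_t;

extern pte_t   *pte_for(uint64_t vaddr);             /* locate the PTE for this address */
extern bool     backing_store_lookup(uint64_t vaddr, uint64_t *disk_block);
extern uint64_t alloc_physical_frame(void);          /* may evict another page */
extern void     read_block_into_frame(uint64_t disk_block, uint64_t frame);
extern void     segfault(uint64_t vaddr);
extern void     return_from_interrupt(void);         /* re-executes the faulting instruction */

void page_fault_handler(uint64_t faulting_vaddr)
{
    pte_t *pte = pte_for(faulting_vaddr);
    uint64_t disk_block;

    /* No mapping and nothing on disk: a genuine segmentation fault. */
    if (!backing_store_lookup(faulting_vaddr, &disk_block)) {
        segfault(faulting_vaddr);
        return;
    }

    /* The page exists but is on disk: bring it in and patch the page table. */
    uint64_t frame = alloc_physical_frame();
    read_block_into_frame(disk_block, frame);
    *pte = (frame << 12) | PTE_PRESENT;

    /* Restart the instruction that took the original fault; this time it hits. */
    return_from_interrupt();
}
```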
So it goes and grabs that bit off disk, puts it in memory, maps the page, and restarts the load, the miss, we'll say. This works not only for portions of the executable that are on disk, but also for things like your heap. If you go try to access off the end of your heap, and the OS knows that you actually allocated that space, it can just create new space for you and go find some page in memory for that space. So, believe it or not, on a modern-day x86 processor running Linux, when you call malloc and allocate a bunch of space, that space does not get created for you. Nothing happens. This is why malloc happens really, really fast. The page does not get created until you first touch that page. It's just smoke and mirrors in the meantime, and this is called demand paging. During that first call to malloc, it just does nothing; the OS keeps track that, oh, it should have given you some memory, but it didn't. Then, when you go to actually touch the piece of memory that it was supposed to give you, at that point it goes and creates the memory for you. The reason it does this is that it decreases memory pressure. You can actually run more programs if the programs are not touching all of their memory. So let's say you have a program which mallocs a lot of space but does not go and access it: the OS never had to allocate memory for it.

This causes some challenges on the flip side, though. All of a sudden, let's say you have a hundred programs running, and they each malloc, I don't know, a gigabyte of space. The OS will just say yes, yes, yes, yes to everybody. Then everyone goes and tries to access that gigabyte of space, but your machine only has, let's say, four gigabytes of memory, while all of a sudden it has promised a hundred gigabytes. Well, when your programs go to actually access all that memory, which the machine doesn't actually have, at that point one of them is going to crash, basically, and you'll get an out-of-memory error. So the flip side of this demand paging scheme is that your OS has to be very careful not to run out of memory. But sometimes it does.
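You can see this lazy allocation from user space. The following small C program, assuming a Linux system, mallocs a large region (which returns almost instantly) and then touches every page; physical memory is only committed as the loop faults each page in, and you can watch the resident set grow in a tool like top while it runs.

```c
#include <stdio.h>
#include <stdlib.h>

/* Demand paging seen from user space: malloc() reserves address space
 * almost instantly, and physical pages are only allocated when each
 * page is first touched. */

#define PAGE_SIZE (4096UL)
#define REGION    (1UL << 30)      /* ask for 1 GB of address space */

int main(void)
{
    /* Fast: typically no physical memory is committed here yet. */
    unsigned char *buf = malloc(REGION);
    if (!buf) {
        perror("malloc");
        return 1;
    }
    puts("malloc returned; most pages are not yet backed by RAM");

    /* Slower: the first write to each page takes a page fault, and the
     * OS allocates a physical frame for it at that moment. */
    for (size_t off = 0; off < REGION; off += PAGE_SIZE)
        buf[off] = 1;

    puts("all pages touched; the region is now resident");
    free(buf);
    return 0;
}
```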
So let's stop here for today, and next time we're going to finish up this lecture and move on to some other topics. I think we're running a little bit late today. But next time we're going to talk about how to integrate translation lookaside buffers with caches, and how to integrate them into the pipeline. There are a lot of complexities in doing that. It's very possible that you don't want to actually access the TLB before you go to access the cache, and we're going to talk about how to do that; the main reason you don't want to do that is that it increases your time through this path. Okay, let's stop here for today.