Okay. So, we'll get started; it looks like we have a quorum today. We're going to continue wrapping up what we were talking about last time, which was address translation and virtual memory. We have about two or three slides left over, and what we were mostly trying to talk about was how address translation and virtual memory influence the design of caches. Let's go back and briefly recap what we covered last time, just to put it back in everyone's memory. So, we're talking about memory management and different ways to do memory management. We had talked about some of the orthogonal concepts and functions that memory management is trying to provide. Mainly, we're trying to translate addresses so that we can move data around and remap it. We talked about protection, which allows us to run multiple different operating systems or multiple different applications at the same time on one chip. And we also started to talk about virtual memory, but this is where we left off last time: we talked about paging and briefly touched on demand paging.
So, we want to continue that today and then look at some design implications of virtual memory: the ability to have lots of different address spaces intermixed at the same time with different mappings, and how that interacts with having very large amounts of memory, or the appearance of very large amounts of memory, even though your system may have only a small amount of physical memory. So, it's a very large amount of virtual memory, as the name implies.

So, let's look at a modern-day virtual memory system. Say you're going to run Linux on your desktop these days. Your desktop has a few gigabytes of RAM, probably one, two, three, four, maybe even six if you have a really good desktop. And one of the things you may want to do is run applications that use more RAM than that. Now, we have this big storage device plugged into our computer which has lots and lots of storage, namely our disk. The disk is not the only place you can try to swap memory out to, though. You can use other things; some people have built virtual memory systems that swap memory across the network, for instance, to another computer. This was actually found to be fast in the original network-of-workstations days.
This was a project at Berkeley, where they had lots of old Sun computers on a network, and they were actually able to swap memory across the network to a neighboring computer and use the neighboring computer's RAM, effectively making the RAM of the machine they were using bigger, or effectively swapping out to it. Now, one of the key things that makes all of this work is this notion of demand paging. Demand paging is basically the idea that the operating system only maps a page when it absolutely has to. And the operating system can also decide to kick things out and put that memory, let's say, on disk. If the memory is clean (what I mean by clean is that it's not dirty relative to the process: the processor did not write any data to that memory, and the exact same blocks are on disk), then when the OS goes to kick it out, it doesn't have to save that data. It can just mark it as saying: it's on the disk already over there, so don't worry about this. If you go take an operating systems class, you're actually going to implement a demand paging scheme. I know they do that in Princeton's operating systems class.
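The clean-versus-dirty distinction just described can be condensed into a tiny eviction routine. This is only a sketch to make the idea concrete; the `Page` structure and the write-back callback are made-up names for illustration, not any real operating system's code.

```python
# Minimal sketch of demand-paging eviction (hypothetical names, not a real OS API).
# A "clean" page still matches its copy on disk, so evicting it is free;
# a "dirty" page must be written back before its physical frame can be reused.

class Page:
    def __init__(self, vpn, dirty=False, on_disk=True):
        self.vpn = vpn          # virtual page number
        self.dirty = dirty      # has the process written to this page?
        self.on_disk = on_disk  # is there a valid copy in the swap area?

def evict(page, write_to_disk):
    """Free a page's physical frame; return True if a disk write was needed."""
    if page.dirty or not page.on_disk:
        write_to_disk(page)     # must save the contents before dropping the frame
        page.on_disk = True
        page.dirty = False
        return True
    # Clean page: the identical blocks are already on disk, so just drop it.
    return False

writes = []
clean = Page(vpn=7, dirty=False, on_disk=True)
dirty = Page(vpn=9, dirty=True, on_disk=True)
print(evict(clean, writes.append))  # False: no write-back needed
print(evict(dirty, writes.append))  # True: contents saved first
```

The point of the bookkeeping is exactly what the lecture says: the OS pays the disk write only when the in-memory copy has diverged from the copy on disk.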
So, you can try out different algorithms for how to pull data in and out, because there are different management algorithms you can use to do this demand paging. But let's go back and look at how address translation works at the hardware level, because that's what we're focusing on in this course. You take an address and you run it through a translation lookaside buffer (TLB). If you have a hit, this is the everything's-going-well case: the address is in your TLB, which is a cache of the page table. You also probably want to check some protection bits, to see whether you can do a read or a write, or whether the process that is reading and writing the data is in the correct address space. For instance, you don't want an application trying to read or write the address space of the operating system, or something like that. If it's permitted, the translation comes out of the TLB, everything's good, and you just go access the data. If not, you end up in the protection fault case. Now, this protection fault case is going to be handled in the operating system, and the operating system will probably kill the process or raise some sort of segmentation fault.
Alternatively, if you're up here in the TLB lookup and you take a miss in the TLB, there are lots of interesting things that start to happen at this point. You may have a hardware page walker, which means a little finite state machine that will walk through the page table, all the way to the end, find the mapping, and install it into the TLB. This is all done in hardware. If you are on an architecture like MIPS, or Alpha, or Tilera, that's all done in software. So this page walker, and updating the TLB, is actually done in software: you'll have some software walking the table and doing a bunch of memory references in that little piece of software. And then there are some hybrid approaches. For instance, in MIPS there's some special hardware that helps the software walk the page table faster. If the memory you're trying to get at is not in the page table at all, then you follow the operating system line here: you trap into the operating system. The operating system then has to look through all its structures and see: is that data on the disk somewhere? Or are you accessing some piece of memory that doesn't actually exist?
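What a page walker does, whether implemented as a hardware state machine or as a software refill handler, can be sketched with a simple two-level table. The field widths and dictionary-based tables below are illustrative assumptions, not any particular ISA's page-table format.

```python
# Sketch of a page walk over a hypothetical two-level page table.
# The walk itself is the point: index with the high bits, follow the
# pointer to the second-level table, index with the next bits.

PAGE_BITS = 12          # 4 KiB pages
LEVEL_BITS = 10         # 10 index bits per level (assumed for this sketch)

def walk(root, vaddr):
    """Return the physical address for vaddr, or None on a page fault."""
    l1 = (vaddr >> (PAGE_BITS + LEVEL_BITS)) & ((1 << LEVEL_BITS) - 1)
    l2 = (vaddr >> PAGE_BITS) & ((1 << LEVEL_BITS) - 1)
    second = root.get(l1)            # first memory reference of the walk
    if second is None:
        return None                  # no second-level table: fault to the OS
    frame = second.get(l2)           # second memory reference
    if frame is None:
        return None                  # page not mapped: fault to the OS
    return (frame << PAGE_BITS) | (vaddr & ((1 << PAGE_BITS) - 1))

# Map the virtual page with l1 index 1, l2 index 0 to physical frame 0x80.
table = {1: {0: 0x80}}
print(hex(walk(table, 0x0040_0123)))  # 0x80123
print(walk(table, 0xdead_f000))       # None -> the handler would refill or fault
```

Note that even this toy walk costs extra memory references per miss, which is exactly why a software refill on MIPS-style machines is slower than a dedicated hardware walker, and why the hybrid helpers mentioned above exist.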
If that's the case, then you're going to end up in this segmentation fault or bus error world as well, because you're basically trying to do a memory reference to some piece of memory which isn't there. But if, let's say, it's on disk, in swap or in the backing page cache on the disk, you're basically going to have the OS fill everything in, fill in the TLB, and just return. And life is all good, and you continue on.

So, now that we've decided that we want something like virtual memory, and we've decided that we want translation lookaside buffers, let's look at how this influences the design of our hardware pipelines. Here we have address translation shoehorned into a 5-stage pipeline. As some of you may notice, here and here, we just added to the delay of these stages. We just sort of shoehorned it in. We didn't add an extra stage; we just put it in there, and it's serial. That's the naive approach to doing this. Hmm, well, that has some serious latency considerations. If your instruction memory is on the critical path of your processor and, all of a sudden, you put something else in series with it, your processor gets slower.
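The whole decision diagram we just walked through — TLB hit with a permission check, TLB miss with a refill, and the two fault cases — can be condensed into one small sketch. The structures here are hypothetical stand-ins for the hardware, chosen only to show the control flow.

```python
# Condensed sketch of the translation flow: TLB hit plus permission check,
# TLB miss refilled from the page table, protection fault, and page fault.
# All structures are illustrative stand-ins, not a real MMU.

def translate(vaddr, is_write, tlb, page_table):
    vpn, offset = vaddr >> 12, vaddr & 0xFFF
    entry = tlb.get(vpn)
    if entry is None:                       # TLB miss: walk the page table
        entry = page_table.get(vpn)
        if entry is None:
            return ("page_fault", vpn)      # OS: fetch from disk, or kill
        tlb[vpn] = entry                    # refill the TLB, then proceed
    frame, writable = entry
    if is_write and not writable:
        return ("protection_fault", vpn)    # OS: likely a segmentation fault
    return ("ok", (frame << 12) | offset)

tlb = {}
page_table = {0x10: (0x2A, False)}          # one mapped, read-only page
print(translate(0x10_444, False, tlb, page_table))  # ('ok', 0x2A444)
print(translate(0x10_444, True, tlb, page_table))   # ('protection_fault', 16)
print(translate(0x99_000, False, tlb, page_table))  # ('page_fault', 153)
```

The hardware-versus-software question from the previous slide is just about who executes the refill branch in the middle; the surrounding flow is the same either way.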
Or if your data cache is on the critical path of your processor and you add something in there, that's also going to slow down your processor. So, we want to look at techniques where we can move those two structures off the critical path. Alternatively, we can start to think about pipelining the TLB and the cache, so you have one stage for the TLB lookup and one stage for the cache. It gets a little hard over here on the data side of the world, because that's going to increase the access time to your data memory. So, when you do a load, this is going to push that load out an extra cycle if you put another pipe stage in there. And on the instruction side, this could also hurt your branches. If you put a pipeline stage somewhere like here, then your PC-plus-four loop gets a little bit harder. From a timing perspective it probably doesn't get harder, but from a branch perspective, if you take a branch in there, you're going to add an extra cycle to your branch mispredict latency. And it also gets a little bit weird here, in that you can no longer access instruction memory effectively in one cycle.
Having said that, it's usually easier to take the instruction side off the critical path, because you don't really change the high-order bits of your instruction address that often. You only cross page boundaries when you do a jump, or if you happen to fall through the end of the page, and both of those are relatively rare cases. So, people have found it pretty easy to take that off the critical translation path. So, we're also going to focus on the data side of the world here.
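One way to see why the instruction side is the easy case: fetch can simply remember the last page's translation and reuse it until a page boundary is crossed. The sketch below is a toy model of that idea under the assumptions stated in the comments, not any particular machine's fetch logic.

```python
# Toy model of why instruction-side translation stays off the critical path:
# sequential fetch stays on one page, so a single remembered translation
# covers it until a jump or a fall-through crosses a page boundary.

PAGE = 1 << 12      # assume 4 KiB pages

class FetchTranslator:
    def __init__(self, translate):
        self.translate = translate      # the slow, full TLB/page-table path
        self.last_vpn = None
        self.last_frame = None
        self.slow_lookups = 0

    def fetch_paddr(self, pc):
        vpn = pc // PAGE
        if vpn != self.last_vpn:        # only on a page crossing (rare)
            self.slow_lookups += 1
            self.last_vpn = vpn
            self.last_frame = self.translate(vpn)
        return self.last_frame * PAGE + pc % PAGE

ft = FetchTranslator(lambda vpn: vpn + 100)   # dummy page-to-frame mapping
for pc in range(0x1000, 0x3000, 4):           # 2048 sequential fetches
    ft.fetch_paddr(pc)
print(ft.slow_lookups)  # 2 -- one slow lookup per page, not per instruction
```

The high-order bits change only twice across those two thousand fetches, which matches the lecture's point: the full translation almost never sits on the fetch critical path.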