So, now that we've talked about these different memory technologies, let's move on to an introduction to caches: why we have caches, and where they really come from, from a first-principles perspective.

To start motivating caches, let's draw the most basic memory system we can draw. In our abstract computer model, we have a processor, and it connects to somewhere that stores our data and our instructions: our main memory. The time it takes to actually access something like main memory can often be much, much larger than our cycle time; in other words, the latency can be much longer. For that DRAM I passed around, to get the first byte out of a modern DRAM, you're looking at something on the order of a thousand instructions' worth of time in a modern-day computer. It has gotten a little bit better, as we'll see on the next graph, for a funny reason.

Bandwidth we're going to define as the number of accesses per unit time. As I said, for the close-in memory arrays you can usually access them faster and get more bandwidth out of them, almost by definition. For things far away, you tend to have less bandwidth, again almost by definition, because the data has to go through pins or buses that travel a long distance.

And then we're going to define the bandwidth-delay product as the amount of data that can be in flight at the same time. As things get farther away and you have more bandwidth, this bandwidth-delay product gets bigger, because you're multiplying the bandwidth you have by the time the data spends in the pipe. You basically have a big pipeline that can have thousands of values in flight.
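To put a rough number on that (the figures here are my own illustrative assumptions, not values from the lecture): with, say, 10 GB/s of memory bandwidth and 100 ns of access latency,

$$
\text{bandwidth-delay product} = 10\ \tfrac{\text{GB}}{\text{s}} \times 100\ \text{ns} = 10^{10}\ \tfrac{\text{B}}{\text{s}} \times 10^{-7}\ \text{s} = 1000\ \text{bytes},
$$

so roughly a thousand bytes' worth of requests have to be in flight at once just to keep a pipe like that full.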
So, this is what I was alluding to, about the gap opening up between DRAM bandwidth and processor performance. Memory technology has been slowly getting faster, and processor technology got a lot faster. The one saving grace is that it's not so bad anymore for single-scalar, single-processor performance, because that has flattened out at this point, so [laugh] DRAM is, relatively speaking, catching up. Unfortunately, the processor curve still keeps going up exponentially if you look at multicore: with multiple cores on the same chip, the aggregate processor performance keeps growing relative to memory performance, and things get worse.

Here we have a four-issue, two-gigahertz superscalar accessing a 100-nanosecond DRAM: it can execute 800 instructions during the time it takes to make that one memory reference (4 instructions per cycle × 2 cycles per nanosecond × 100 nanoseconds = 800 instructions). We have processors faster than two gigahertz now, and we also have slower memories, so depending on what memory system you put the processor in, it can be roughly a thousand instructions.

Okay. So, what's the important first-order effect when we're looking at memory systems? Well, size does matter. As you build larger and larger memory systems, they simply take longer to access. There's more capacity, you have to drive signals physically farther, and you can't fundamentally fight the speed of light: it takes a certain amount of time to go a certain distance, and if that distance is farther, it takes longer. You also have more fan-out in bigger memory systems. So physical size is important when we're thinking about memory systems.

Okay. So, let's now introduce a cache, or the idea of why a cache is good, by looking at the notion of a memory hierarchy. Unlike the earlier picture, where we had a processor connected directly to all of the memory in the system, we're now going to put in some intermediate levels. We have a processor, and we're going to put small, fast memories next to it, and we're hopefully going to hit in those small, fast memories instead of accessing the big, slow DRAM.

Let's look at the capacities here. The registers, your general-purpose, program-visible registers in the processor, have very small capacity. SRAM has more capacity, and DRAM has even more capacity; the drawing is scaled to accentuate that you can fit a lot more data in the same space as you go down the hierarchy. And the latency follows that same hierarchy.
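To make that concrete, here is a minimal sketch, under my own assumptions, of a pointer-chasing microbenchmark in C (this is illustrative code I'm adding, not something from the lecture). Each load depends on the previous one, so the average time per access roughly tracks the latency of whichever level of the hierarchy the working set fits in; as the footprint grows past each level, the time per access jumps.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Chase a single-cycle random permutation of n_elems indices so that every
 * load's address depends on the previous load (no prefetching, no overlap),
 * then report the average nanoseconds per access. */
static double chase_ns(size_t n_elems, long iters)
{
    size_t *next = malloc(n_elems * sizeof *next);
    for (size_t i = 0; i < n_elems; i++)
        next[i] = i;
    /* Sattolo's algorithm: produces one big cycle through all elements. */
    for (size_t i = n_elems - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;        /* j < i */
        size_t t = next[i];
        next[i] = next[j];
        next[j] = t;
    }

    struct timespec t0, t1;
    size_t idx = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long k = 0; k < iters; k++)
        idx = next[idx];                      /* dependent load chain */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    volatile size_t sink = idx;               /* keep the loop from being optimized away */
    (void)sink;
    free(next);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    return ns / (double)iters;
}

int main(void)
{
    /* Sweep working sets from 16 KB (fits in a small cache) to 64 MB (spills to DRAM). */
    for (size_t kb = 16; kb <= 64 * 1024; kb *= 4) {
        size_t n = kb * 1024 / sizeof(size_t);
        printf("%8zu KB : %6.1f ns per access\n", kb, chase_ns(n, 20000000L));
    }
    return 0;
}
```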
Another important thing to think about when you're building memory hierarchies is that on-chip memory bandwidth is typically a lot higher than off-chip memory bandwidth. On-chip, you can just run wires; but the second you have to go off-chip, you have to go out through the pins of the chip, and you're going to be limited. What this really boils down to is the pins, or balls, or solder columns. Modern chips don't actually have pins on them anymore; instead they have a ball grid array, which is an array of tiny solder balls. The pitch of those balls is still much larger than an on-chip wire: an on-chip wire is much, much smaller than a solder ball. And a really modern part, something like an Intel Core i5, actually has what's called a land grid array, where there's an array of little metal pads on the bottom of the chip, and the pins are actually located in the socket, on what are basically pogo pins: little pieces of metal on springs. You put the chip in, the pogo pin presses against the pad, and that's the contact. On older chips, the pins were on the chip itself and plugged into a big socket, and ball grid array sits in between, historically.

But anyway, all of these connection technologies for getting off the chip are much larger than on-chip wires, so you can fit fewer of them in the same area. And that's really the insight here: on-chip bandwidth is a lot larger than off-chip bandwidth. Because of this, you're going to want to put a small, fast memory close by, on chip, where we can store data.

Okay. So, what happens in this memory hierarchy? When we go to access data, if the data is in our fast memory, we hit in that small, low-latency memory and come back quickly. If the data is not in the fast memory, we have to go all the way out to DRAM and come back. And memory hierarchies work only if you actually find useful data in that fast memory. If you build a small, fast memory close by but never find any information in it that you need, why did you build the structure? You're just doing extra work and burning extra power.
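One standard way to quantify that trade-off is the average memory access time; the formula and the numbers below are a common rule of thumb I'm adding for illustration, not something stated in the lecture:

$$
\text{AMAT} = t_{\text{hit}} + (\text{miss rate}) \times (\text{miss penalty}).
$$

For example, with a 2-cycle hit time and a 200-cycle miss penalty, a 10% miss rate gives 2 + 0.1 × 200 = 22 cycles per access on average, while a 90% miss rate gives 2 + 0.9 × 200 = 182 cycles, barely better than having no fast memory at all.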
And this is one of the key insights of why we have caches, of what a cache is: when we build a memory hierarchy like this and introduce something like a cache, finding the data nearby is what makes it faster to access, so we need to find the data there with high probability. Otherwise, we should just not build it and go out to main memory every time; you'd just be wasting everyone's time, and area, and cost. So a memory hierarchy only works if your fast memory is actually useful.

Okay. So, let's look at different memory access patterns and try to come up with a couple of different techniques we can exploit to increase the probability that the data we're going to need is in that small, fast memory.

We have an interesting graph here, and I want to explain what it is. We'll look at a toy example first, and then we'll look at a memory reference trace from a real operating system with a real program running under it. So, let's start with some examples. We have time on the horizontal axis and address on the vertical axis. Say your machine has 4 GB of RAM: there is effectively a row on this plot for every single address in the machine, and as the program executes and accesses different pieces of memory, we put a dot on the graph.

Let's start by looking at something like instruction fetch. Typically, programs execute in order: they execute the first instruction, the next instruction, the third instruction, with the addresses monotonically increasing, unless you hit a branch. So let's take a look at some instruction fetches. We start off executing some code, and then we execute a loop. When we're in the loop, we're executing the same piece of code over and over again. If you look at that from an address perspective, we start off outside the loop at some address, then some different address, and then we execute these three addresses, and then those same three addresses again, and again, and again. We keep fetching those three addresses over and over again from our memory system. And at some point, we exit the loop and go execute something else.
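As a concrete, purely illustrative sketch of that fetch pattern (this code is not from the lecture), consider a small loop in C; the handful of instructions that make up the loop body sit at a few consecutive addresses, and the fetch unit requests those same addresses on every iteration:

```c
/* Illustrative sketch of the instruction-fetch pattern described above. */
int sum_array(const int *a, int n)
{
    int sum = 0;                  /* fetched once, before the loop            */
    for (int i = 0; i < n; i++) { /* loop test: same fetch address each trip  */
        sum += a[i];              /* body: the same few instruction addresses */
    }                             /* fetched over and over, until we exit     */
    return sum;                   /* fetched once, after the loop             */
}
```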
Okay. There's a clear pattern there, and in a second we'll think about whether we can exploit it.

Next, let's talk about accessing the stack. Computer programs typically have stacks. This is where you spill and fill registers, and typically also where you pass arguments. Most modern compilers spill and fill around function calls and returns, and argument passing happens around function calls too: you put the arguments that the called function needs onto the stack and then go execute it. Alternatively, you can pass the arguments in registers, but on something like x86, let's say, you're going to put arguments onto the stack, and you also put local variables onto the stack, because it gives you a convenient place to put them.

So, let's look at what happens when you do a subroutine call. Before the call, the stack pointer is at some address. When you do the call, you push your registers onto the stack: you put something here, something here, something here, and the stack pointer logically grows. Then, once you're inside the function you called, you want to access the arguments, and those are usually very close together, right next to each other on the stack, and you may have to access the same argument over and over again. If you don't have enough registers, it's like the stack-based architecture we looked at in the first lecture, where we had to push the same value onto the stack multiple times. It's pretty common to reuse the same local variable or the same argument, just because you don't have enough registers, and so you load that data from your stack over and over again. So it's pretty common for stack accesses to land very close together, probably within the same cache line, or at least very close by. And then, when you're done, you return from the subroutine and pop all of that data off the stack. Time is increasing along the horizontal axis, so the pops walk back over the same addresses in reverse order.
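Here is a minimal, purely illustrative sketch of that in C (not code from the lecture; the function names and constants are made up). The point is that the callee's arguments and locals end up packed together in one small stack frame, those few addresses get reused while the function runs, and the whole frame is popped again on return:

```c
/* Illustrative sketch of stack accesses around a subroutine call. */
static long polynomial(long x, long a, long b, long c)
{
    /* x, a, b, c and tmp live in (or spill to) one small stack frame, so
     * their addresses are all within a few words of each other, and x is
     * reused several times from the same slot. */
    long tmp = a * x * x;
    tmp += b * x;
    tmp += c;
    return tmp;
}

long caller(long x)
{
    /* Each call pushes a frame and each return pops it, so the second call
     * reuses the very same stack addresses as the first. */
    return polynomial(x, 3, 2, 1) + polynomial(x + 1, 3, 2, 1);
}
```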
Okay. Let's look at some data accesses. The one I want to focus on here is vector accesses. What's a vector access? Well, this is something like an array: a lot of the time you'll read the first word, or the first byte, then the next one, and the next one, doing some operation across all of it. Say we're doing an element-wise addition: we have one array and we're adding it to another array, and they're one-dimensional. You're reading different addresses each time, but they follow a pattern and they're close by. Now, if you're accessing, say, every fourth element of the array, this distance gets larger: the address distance grows and your access pattern gets a little farther apart.

And then scalar values: these are singleton values. A good example is, instead of adding two arrays together, we take one array and add some variable x, which holds, let's say, five, to every single element of the array. What your processor would do is first load the element from the array, then load the scalar value x from memory, do the add, and do the store. Then you load the next element of the vector, load x again, do the add, store it back, and continue on like that.
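Written out as a small, purely illustrative C loop (not from the lecture), the pattern looks like this; notice that a[i] moves to a neighboring address on each iteration, while x (or its memory slot, if the compiler doesn't keep it in a register) is the very same location read over and over:

```c
/* Illustrative sketch of the vector-plus-scalar loop just described. */
void add_scalar(int *a, int n, int x)
{
    for (int i = 0; i < n; i++) {
        /* a[i]: a new address each iteration, but right next to the
         * previous one.  x: the exact same value, reused every time. */
        a[i] = a[i] + x;
    }
}
```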
Okay. So, can we try to exploit these different types of locality? This slide is really important for grasping what's going on in caches; it's probably one of the most important slides in today's lecture: understanding the different patterns we're seeing, and how we exploit them. There are two types of patterns I want to talk about, and your standard cache system is going to exploit both of them, but let's first look at them individually.

The first one is temporal locality. Temporal locality is the notion that if you access a piece of data, you're likely to access that same piece of data again in the not-too-distant future, so it might be useful to keep that data nearby. Stated again: if a location is referenced, it is likely to be referenced again within a short period of time. Let's try to identify that in the diagrams so far. Well, I see it in a few places. In the loop's instruction fetches, we access some address, and then we access that same address again pretty soon, and again, and again: we're accessing something with high temporal locality. Likewise, the scalar value: we access the exact same address again and again, and the time between one access and the next is short.

Okay. Now let's look for other interesting access patterns. Interestingly, on the stack, when we're pushing values, it's very common to access a value at, say, address 1000, and then go access 1000 + 4, 1000 + 8, 1000 + 12, 1000 + 16, touching all of the nearby values. What are we going to call that? Is it a common pattern? Well, yes, it's pretty common: we see it here on the stack, and we see it in instruction fetches, where after accessing one instruction it's very common to access the instructions packed right after it, because you usually just fall through in your code. So we're going to define this and call it spatial locality. Spatial locality says that if a location is referenced, it is likely that locations near it will be referenced in the near future.

These are both just heuristics; there's nothing magical about them. They're just characteristics of how programs happen to behave. But we're going to build systems that leverage these two types of locality to improve the performance of our processor, so that those small RAMs we put in our memory hierarchy actually hold useful data.
I also wanted to point out that you can have both spatial and temporal locality mixed together. If we look at something like a program with lots of loops, we have spatial locality, because we're executing instructions that are near each other, but we also have temporal locality, because we're reusing that same code again and again within a short period of time. Even when it's not the exact same piece of data, it's nearby, so we see both spatial and temporal locality exhibited there.

Okay. So, let's finally look at a real memory reference pattern, a real memory reference trace. It has the same axes as before: time on the horizontal, with one dot per access, and memory address on the vertical. I don't know exactly what the units on the vertical axis are; they're probably in the thousands or millions. This is from a study that some people at IBM did many years ago, and we can see the different patterns show up in it.

The first thing I want to circle here is spatial locality. Along this line, we access the first piece of data at some address and then go on to access all of the data after it in sequence, as if we were reading an array: we just read all the values in the array. That's spatial locality: because we accessed element one of the array, there's a high probability we'll access other data values nearby in that array at a nearby time.

Now let's look at temporal locality. Horizontal lines here are temporal locality: you can see the program accessing this value, or at least values very close by (it's hard to tell at this resolution), over and over again. And finally, we can see both temporal and spatial locality in these little blocks, where we access similar sorts of addresses, which is spatial locality, and do it all within a short period of time, so we're also seeing temporal locality of reference. And caches can exploit both of these.
Caches exploit temporal locality by keeping data that gets used and reused many times nearby. When you see accesses that are close together in time, the cache, which sits between the processor and the main memory system, effectively memorizes the value you accessed and keeps it close by. So if you go to access it again soon, and your program exhibits temporal locality, that data value will be right there and you won't have to go out to the big, slow DRAM.

And caches can exploit spatial locality. The way they do it is that when you access a particular piece of data, the cache pulls in the data around that piece of data as well. We'll talk more next time about exactly how that works, but we're going to segment memory into what we call cache blocks, or cache lines, and those cache blocks or cache lines are typically aligned. When you access a piece of data, it might be at the beginning of a line, at the end, or in the middle, but you pull in all the data around it. The heuristic here is that if you access some piece of data, say only the first byte in a cache line, you're likely to access other bytes in that block, multiple pieces of data around it. So we're able to exploit spatial locality here.
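As a small, purely illustrative sketch of that idea (the 64-byte line size below is my own assumption; actual line sizes vary by machine), here is how an address splits into an aligned cache-line base and an offset within the line; a miss brings in the entire line, so later accesses to the neighboring bytes hit:

```c
/* Illustrative sketch: splitting an address into its cache-line base and
 * offset, assuming 64-byte aligned cache lines. */
#include <stdint.h>
#include <stdio.h>

#define LINE_SIZE 64u   /* assumed line size in bytes */

int main(void)
{
    uint64_t addr      = 0x1234;                            /* some byte address */
    uint64_t line_base = addr & ~(uint64_t)(LINE_SIZE - 1);
    uint64_t offset    = addr &  (uint64_t)(LINE_SIZE - 1);

    /* A miss on addr fills bytes [line_base, line_base + LINE_SIZE - 1], so
     * subsequent accesses to any of those neighbors hit in the cache. */
    printf("address 0x%llx -> line base 0x%llx, offset %llu\n",
           (unsigned long long)addr,
           (unsigned long long)line_base,
           (unsigned long long)offset);
    return 0;
}
```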