We move on to our final topic of ELE 475, which is directory-based cache coherence. Let's start with a little bit of a warm-up. Remember the three Cs of caches: compulsory misses, capacity misses, and conflict misses. We're going to add a new miss type here, the coherence miss. A coherence miss is some other cache, or some other entity, reaching down into our cache and invalidating something there. So this is strictly different from compulsory, capacity, and conflict. If anything it looks closest to a compulsory miss, because it's effectively a first miss, except that some other entity bumped the data out of your cache. But it's not really any of those three. In a snooping protocol on a symmetric shared-memory multiprocessor, traffic coming in from other cores will actually bump things out of your cache, and you need to worry about this.

Now we're going to take these coherence misses and put them into two categories. We talked about this briefly a few lectures ago. The first category is true sharing misses. A true sharing miss is one that would still occur even if the cache block size, or cache line size, were reduced to a single word or byte, the minimum size you can have on your machine. A true sharing miss means you're actually sharing data: if one processor writes some data, and another processor wants to read that data and needs to pull it into its cache, that's a true sharing miss. You need to do that communication.

To contrast that, we have false sharing misses. A false sharing miss is one that occurs at the larger block size but would not occur if you reduced the block size down to, say, one byte or one word and ran the same program. The block size was too big, and you had two pieces of data in the same cache line that are effectively causing sharing, even though no true sharing of the data is going on. It just so happens they're packed into the same cache line. Now, it's a little more than that: we're also going to say that false sharing can happen when data gets moved around or invalidated even though that exact miss was not caused by data actually being communicated; the data may well be shared later in the program. So it's a little broader than what we said last time. It still happens because two pieces of data are packed into the same line, but what I'm trying to get at here is that data can bounce around between different caches such that the same load and store sequence would cause no miss with a very small cache line size, but does cause a miss with a large cache line size. Before we work through an example, let's look at what this effect looks like in real code.
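Here is a minimal sketch, not from the lecture, of false sharing in C with POSIX threads. It assumes 64-byte cache lines and a machine with at least two cores; the struct names, the iteration count, and the padding size are all invented for illustration. Each thread increments only its own counter, so no word is ever truly shared; the only difference between the two runs is whether the counters sit in the same cache line:

```c
/* Hypothetical false-sharing demo; compile with: cc -O2 -pthread demo.c */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define ITERS 100000000UL

/* Two counters packed into the same (assumed 64-byte) cache line:
 * every increment by one thread invalidates the line in the other core. */
static struct { volatile uint64_t a, b; } packed;

/* The same two counters pushed onto separate lines by 56 pad bytes. */
static struct { volatile uint64_t a; char pad[56]; volatile uint64_t b; } padded;

static void *packed_a(void *x) { (void)x; for (uint64_t i = 0; i < ITERS; i++) packed.a++; return NULL; }
static void *packed_b(void *x) { (void)x; for (uint64_t i = 0; i < ITERS; i++) packed.b++; return NULL; }
static void *padded_a(void *x) { (void)x; for (uint64_t i = 0; i < ITERS; i++) padded.a++; return NULL; }
static void *padded_b(void *x) { (void)x; for (uint64_t i = 0; i < ITERS; i++) padded.b++; return NULL; }

/* Run two threads side by side and time them. */
static double run_pair(void *(*f)(void *), void *(*g)(void *))
{
    struct timespec t0, t1;
    pthread_t x, y;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    pthread_create(&x, NULL, f, NULL);
    pthread_create(&y, NULL, g, NULL);
    pthread_join(x, NULL);
    pthread_join(y, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
    printf("same line:      %.2f s\n", run_pair(packed_a, packed_b));
    printf("separate lines: %.2f s\n", run_pair(padded_a, padded_b));
    return 0;
}
```

On a typical multicore machine the packed version runs several times slower even though the threads never touch the same word, which is exactly the false-sharing definition above: with one-word cache lines, neither version would take any coherence misses. (C11's alignas is a tidier way to get the padding, but explicit pad bytes keep the sketch self-contained.)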
Now let's look at an example and try to categorize these different misses. The initial conditions are that X1 and X2, two words of data, are packed into the same cache block, or the same cache line, and P1 and P2 have both read the data, so it's readable in both caches at the beginning of time.

First, P1 does a write to X1. What do we have to do? Well, we're going to have to invalidate X1 in P2's cache. This is a true sharing miss, because the data was in both caches and P2 had actually read it; we really do need to pull it out of P2 and invalidate it, because this is real shared data.

Next, P2 executes a read of X2. What you may notice here is that X2 was in P2's cache at the beginning of time, but it got bumped out, even though P1 never wrote X2: the write to X1 pulled the block exclusive into P1's cache and invalidated it in P2's. So we say this is a false sharing miss, because X1 was irrelevant to P2 for this miss.

Now we see another write to X1. Well, P2 didn't actually touch X1, so likewise this is completely false sharing. Next P2 writes X2; there was no communication of that data going on here either, so this is also a false sharing miss.

Finally we have something that is real. P1 reads X2. P2 wrote X2 there, and P1 reads it here, so we are actually communicating data. This is true sharing, and that's okay. It's the false sharing patterns that we want to try to minimize. In fact, the bookkeeping we just did by hand is simple enough to write down mechanically, as sketched below.
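Here is a hedged sketch of that classification rule, assuming a simplified two-processor, MSI-style invalidation protocol with a single block holding two words. The rule it encodes: a coherence miss (or an invalidating upgrade) is true sharing only if the specific word involved was actually used by the other processor, and false sharing otherwise. All the names here (`Cache`, `classify_access`, and so on) are invented for the sketch; this is one way to encode the lecture's rule, not a standard tool:

```c
#include <stdio.h>
#include <string.h>
#include <stdbool.h>

#define NPROCS 2
#define NWORDS 2  /* one block holding two words: x1 = word 0, x2 = word 1 */

typedef struct {
    bool valid;                      /* does this cache hold the block? */
    bool touched[NWORDS];            /* words accessed since this copy was (re)acquired */
    bool written_while_away[NWORDS]; /* words the other proc wrote while we lacked the block */
} Cache;

static Cache cache[NPROCS];

/* Classify one access; returns a label, or NULL for an ordinary hit. */
static const char *classify_access(int p, bool is_write, int w)
{
    int other = 1 - p;
    const char *label = NULL;

    if (!cache[p].valid) {
        /* Coherence miss: true sharing only if the word we touch was
         * actually written by the other processor while we were away. */
        label = cache[p].written_while_away[w] ? "true sharing miss"
                                               : "false sharing miss";
        cache[p].valid = true;
        memset(cache[p].touched, 0, sizeof cache[p].touched);
        memset(cache[p].written_while_away, 0, sizeof cache[p].written_while_away);
    } else if (is_write && cache[other].valid) {
        /* Write hit on a shared block sends an invalidation (upgrade):
         * true sharing only if the other copy actually used this word. */
        label = cache[other].touched[w] ? "true sharing miss"
                                        : "false sharing miss";
    }

    cache[p].touched[w] = true;

    if (is_write) {
        if (cache[other].valid) {   /* invalidate the other copy */
            cache[other].valid = false;
            memset(cache[other].written_while_away, 0,
                   sizeof cache[other].written_while_away);
        }
        cache[other].written_while_away[w] = true;
    }
    return label;
}

int main(void)
{
    /* Initial conditions from the lecture: P1 and P2 have both read
     * the data, so both caches hold the block in the shared state. */
    for (int p = 0; p < NPROCS; p++) {
        cache[p].valid = true;
        cache[p].touched[0] = cache[p].touched[1] = true;
    }

    struct { int p; bool wr; int w; } trace[] = {
        {0, true,  0},   /* P1 writes x1 */
        {1, false, 1},   /* P2 reads  x2 */
        {0, true,  0},   /* P1 writes x1 */
        {1, true,  1},   /* P2 writes x2 */
        {0, false, 1},   /* P1 reads  x2 */
    };

    for (size_t i = 0; i < sizeof trace / sizeof trace[0]; i++) {
        const char *label = classify_access(trace[i].p, trace[i].wr, trace[i].w);
        printf("P%d %s x%d -> %s\n", trace[i].p + 1,
               trace[i].wr ? "writes" : "reads ", trace[i].w + 1,
               label ? label : "hit");
    }
    return 0;
}
```

Running this reproduces the walkthrough exactly: true, false, false, false, true.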
That was just a warm-up to motivate us toward directory-based coherence a little bit. So let's motivate it a little more and look at something like an online transaction processing (OLTP) workload. This is a multiprocessor database workload using threads, and we're going to run the same workload on a four-processor system with four different cache sizes; this data is taken from a paper cited in your book. What you'll notice is that as you increase the cache size, the true sharing and false sharing misses don't really change. You still need to communicate data, and you're still going to get false sharing: making the cache bigger didn't change the block size, so you get the same false sharing patterns. But as you increase the cache size, the instruction misses, the capacity misses, the conflict misses, and the compulsory misses, which we call cold misses, all go down, because the cache is getting bigger. So the number of memory cycles spent on misses to non-shared cache lines is going down, but the rest of the time is not changing.

Hmm. Well, okay, this is interesting, so a second question comes up: what happens if we increase the number of cores in our system? This was a relatively small system, so for a fixed cache size, let's plot memory cycles per instruction against the number of processors, from one to eight, on the same workload, and look what happens. Something else is invariant here. The memory cycles per instruction due to instruction misses, conflict misses, capacity misses, and compulsory misses don't change; they stay the same, because those are basically uniprocessor effects. But as you add more cores, you get both more true sharing and more false sharing. Hmm, well, this is a little scary, because our performance is basically going down as we add more cores. And this is only up to eight; what happens if we're way out at, say, 100 cores? What do we think is going to happen? Well, our performance is probably going to be dominated by the true sharing, the false sharing, and these coherence misses. So we're going to start thinking about that, figure out how to solve it, and think about the scalability of coherence systems.