Okay, so now that we've gone through the beginning exercises of what a directory-based distributed shared memory machine looks like, let's talk about how to actually figure out where the directory is. So you have an address. In these systems you usually want to do this on physical addresses; you're not going to want to do this on virtual addresses. This is because you're sharing data between lots of different nodes. At this point you're sort of out on the system interconnect; your address is no longer virtual. You've gone through the translation lookaside buffer and the MMU, and you've figured out what the physical address is. So, you need to figure out which directory to go to, which in a distributed memory machine is sometimes called the home node. And there are a lot of different ways to do this, but one of the more common ones is to just use some bits out of the address. So you take the number of directories in the system, take the log base two of that, and then you take that number of bits of the address to be the home node number. So when you take a cache miss, and the data's not in your cache, and you need to go do the load of that data, we'll say, you send a message, and the destination of that message will actually be the home node. And hopefully your interconnect knows how to route the message to that directory. Now, taking the high-order bits has some benefits. Let's take a look at that. As we discussed already, in a non-uniform memory access architecture, the OS can control the placement. It can do this because, based on these high-order bits, you can actually determine which node in the system, or which directory in the system, you're going to. So you can basically allocate memory, allocate your stack, allocate your instruction space, based on the physical address, and the OS commands that,
because the OS has absolute authority over where physical addresses get doled out. The downside is that a directory, or a home node, can become a hot spot. So let's say all of a sudden, all of the processors in your system try to access one page of memory. There's a hot page which has, say, all the locks in the system, and you're in some threaded program and you have to access those locks a lot. Well, if you look at that, those addresses are all going to differ only down here, in the low-order bits; they might differ from sort of here down, whatever your page size is, we'll say. So even if you're not having false sharing or anything like that, you typically try to pack related data onto a page, or into a structure, or something like that, and it's pretty hard to interleave that based on the very high-order bits of your address. And especially considering a program has effectively no control over the high-order bits of a physical address; those are managed by the OS. So if you do this, one node can become a hot spot, because these addresses all alias to the same directory. So all the messaging traffic goes to one node, and this almost starts to turn back into a bus: we have one directory, and all traffic has to go there. It's a little better, because we don't necessarily need to invalidate all other locations, but the bandwidth into the directory starts to become critical. Hm, well, that's a tough one. The flip side is you can instead have the low-order bits determine where your directory is, or which home node you're using. So you still have the offset within a cache line, but then the bits of the physical address just above that offset determine what home node you're going to. Well, this ends up being very well load balanced, because you'd choose different home nodes effectively at random, depending on which cache line it is.
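To make the two mappings concrete, here's a minimal sketch. The parameters are all hypothetical, not from any particular machine: 64-byte cache lines, 40-bit physical addresses, and a power-of-two number of directories.

```python
# Sketch of the two home-node mappings discussed above.
# All parameters are hypothetical: 64-byte cache lines,
# 40-bit physical addresses, power-of-two directory count.

LINE_BITS = 6          # log2(64): offset bits within a cache line
ADDR_BITS = 40         # physical address width in this sketch

def home_node_high(paddr: int, num_dirs: int) -> int:
    """High-order bits pick the home node: the OS controls placement
    (it hands out physical pages), but a hot page maps to one node."""
    dir_bits = num_dirs.bit_length() - 1   # log2(num_dirs)
    return (paddr >> (ADDR_BITS - dir_bits)) & (num_dirs - 1)

def home_node_low(paddr: int, num_dirs: int) -> int:
    """Low-order bits, just above the line offset, pick the home node:
    consecutive cache lines interleave across all the directories."""
    return (paddr >> LINE_BITS) & (num_dirs - 1)

# Two adjacent cache lines within the same page:
a, b = 0x12_3456_7000, 0x12_3456_7040
assert home_node_high(a, 8) == home_node_high(b, 8)  # same node: hot-spot risk
assert home_node_low(a, 8) != home_node_low(b, 8)    # interleaved
```

The assertions at the bottom show the trade-off: two adjacent lines in the same page land on the same node under the high-order scheme, but interleave across directories under the low-order one.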
So, you know, the same cache line will always go to the same home node, the same directory. But different cache lines will commonly go to different ones, because it's pretty hard to cram all the data you want into one cache line; there's just not that much data in one cache line. You'll spread across the different controllers, and you'll effectively have a good distribution. The flip side, though, is the OS loses placement ability here. So it's a tricky trade-off to think about. Some people have even built systems where it's configurable. This gets a little more advanced, and I touched on this in the last slide of today's lecture, but you could think about having systems where, depending on the actual address, and depending on what comes out of your page table, you make different choices of how to do the mapping. But everyone has to agree on the mapping, which gets a little bit tricky, because the directory has to agree on the mapping, and all of the caches in the system have to agree on the mapping. Okay, so let's take a look at what is inside of a directory. So we added this new hardware structure, and whenever we add a new hardware structure, I like to look at all the bits inside of it. So this hardware structure has an entry per cache line in the particular memory connected to that directory. So if you were to look across the entire system, there will actually be an extra piece of data for every single cache line in the system. Sorry, not every single cache line, every single memory line in the system. So if you have ten terabytes of memory in the system, the naive approach is going to have a directory entry for every single cache-block-sized chunk of memory in the system. And these are held in big tables; typically they're held in SRAM.
You might try to put them in DRAM. And what do we have here? Well, the directory needs to know what state the cache line is in, and we're going to look at three different states in our basic protocol here: shared, uncached, and exclusive. So everything starts out as uncached; it's out in main memory. When it gets pulled into a cache read-only, the directory is going to note that it's now shared. If it gets pulled into a cache read/write, the directory is going to note that as exclusive. Now, if it's in shared or exclusive, we need to know what node, well, if it's exclusive, uniquely what node has it, so we can go message it when we need to go invalidate it. And if it's shared, we need to know the list of all possible places it could be, that we're going to have to send messages to. And this is better than having to broadcast, or send messages to all the nodes in the system. So we're going to have what's called a sharer list here, which, in a naive full-map directory, is going to have one bit per core in the system, or per cache in the system. And it's just going to have a one or a zero in it. So if it's a one, that means that core has a shared, or read-only, copy of the data. And when some other cache goes to get the line writable in its cache, it's going to have to invalidate the copy in every core's cache whose bit is a one. Now, if you're exclusive, you're not going to have multiple bits set here, because this basically means that core has a writable copy, and if we want to keep the data coherent, we don't want multiple writable copies in the system. So as you can see here, there's only one one denoted here. And if it's uncached, we don't need to track anything there; those are just don't-cares. There's one other state here that I have, and it's pending. And this usually actually turns into a couple of sub-states; there are different ways to track this. At the directory, these transactions take multiple steps.
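As a sketch of what those bits might look like, here's a naive full-map directory entry: a state field plus one presence bit per cache. The names and layout here are illustrative, not from any real machine.

```python
from enum import Enum

class DirState(Enum):
    UNCACHED = "U"   # no cache holds the line; memory is up to date
    SHARED = "S"     # one or more read-only copies exist
    EXCLUSIVE = "E"  # exactly one cache holds a writable copy

class DirEntry:
    """Naive full-map directory entry: state + one sharer bit per cache."""
    def __init__(self, num_caches: int):
        self.state = DirState.UNCACHED
        self.sharers = 0             # bit i set => cache i has a copy
        self.num_caches = num_caches

    def add_sharer(self, cache_id: int):
        """A cache pulled in a read-only copy."""
        self.state = DirState.SHARED
        self.sharers |= (1 << cache_id)

    def set_exclusive(self, cache_id: int):
        """A cache got a writable copy: only its one bit may be set."""
        self.state = DirState.EXCLUSIVE
        self.sharers = (1 << cache_id)

    def sharer_list(self):
        """Which caches need an invalidate message."""
        return [i for i in range(self.num_caches) if self.sharers >> i & 1]
```

Notice the cost: with N caches in the system, the sharer vector alone is N bits per memory line, which is exactly why the naive full-map approach gets expensive at scale.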
You're going to send some data and start transitioning. Let's say you want to get data writable. Well, the directory's going to have to invalidate all the other copies. It can't do this instantaneously, but we want to provide the appearance of atomicity, or that the operations are atomic in some way. So typically, you'll actually have some sub-states stored here, which say something like: this cache line is currently transitioning from, I don't know, U to E; don't allow some other transaction to happen to it right now. Just kind of block that. So one way to do that is to store it actually in the directory, as a state bit. Another way is to have some fully associative side structure, which just has all of the cache lines currently in flux. And the directory's smart enough to know that if some other request comes in for that line while it's in flux, it should just NACK that request, or negative-acknowledge that request, and tell the other cache to retry. So you can do it either way, but it gets pretty complicated. We're not going to talk about all the details of that, but we'll talk about the high-level transitions, assuming that they are somehow atomic. So here we're going to look at how MSI fits together with this. But you could actually think about doing this with MESI or some other protocol. MSI is a little bit simpler, so we're going to look at that. Also, the benefit of something like a MESI protocol is lessened in a directory, because if you pull something in, in the exclusive state, which is unmodified at the beginning, and someone else wants to get a read-only copy, you're basically going to have to send a message to that core. And that was inexpensive on a bus, because the core could just see the transaction going across; it would just snoop it and demote from E to shared, or something like that, E to S.
But now it actually turns into actual work. The directory's going to have to generate messages, and you're going to have to wait for responses coming back from the cache which had it in exclusive. So full MESI is a little bit less common when you go to these distributed shared memory protocols. Okay, so this is a slide we had before. This is MSI on a bus. Well, things change a little bit when we go to MSI for directory coherence. And before we go through this, I wanted to point out that there are actually two different state machines going on here. There's one state machine that is happening in the cache controllers, so actually in the cache of a respective processor. And then there's a different state machine which is happening in the directory. And you'll see that they have different letters here: S, U, and E versus M, S, and I. And we label these differently on purpose, just to not get totally confused. And these state machines interact by sending messages to each other, and as messages flow between the directory and the cache, both will be going through different state transitions in these two tables. Okay, so let's jump into this. These are the same modified, shared, and invalid states that we had in our bus-based snoopy MSI protocol. We didn't change anything here. And the rules are the same. If you have it modified, you can do a write to it and not have to send any messages. If you have it shared, you can read the data and not have to contact anybody. If you have it invalid, and you want to do anything with it, you need to contact somebody, or rather, you need to contact the directory. Before, we would have to send the transaction on the bus. Likewise, the transition from S to M, or M to S, where you used to have to communicate, is the same. So think about this as the same state machine running as before, where previously we would send transactions across the bus.
Now, we're going to take those transactions and turn them into messages that we send to the directory, and messages that we receive from the directory that we have to respond to. So before, we were snooping traffic across the bus, which caused us to make state transitions. So here, another processor had intent to write, and we saw that across the bus, so we had to transition ourselves to the invalid state. Now, we're actually going to get a message from the directory controller. So let's walk through this. But it's almost exactly the same as what we saw before. So this is the cache state for a particular line for processor P1. We'll start with the entry points. We start off invalid, and let's say we want to get a readable copy of this line. So we're going to take a read miss. So what we're going to do is, processor one is actually going to send a read miss message to the directory controller. And during that time, it does not have a readable copy; it cannot go and access the data. It's effectively still in the I state. Sometimes people will actually have sort of a pending state here; depending on how you go to implement this, you might have a side structure, something like a miss status handling register, where you'll track that, or you can track it in the cache state itself. So you take a read miss, you send the read miss message, and you're waiting for a response. This response is going to have the data that you need, and it's going to be a synchronization point saying, okay, you're safe to transition to S. Okay, that seems pretty simple. It's a similar sort of thing here for a write miss: if we're in the invalid state and we do a write, we're going to send a write miss request to the directory controller. It's going to do something, and we may have to be waiting for a while here, because it may have to go invalidate all of the other copies of the line in the system.
And then it gets a response, and once it gets a response, we have the data, and we can transition to the modified state. So as we said, these arcs are pretty easy: we can read by P1 in the S state and nothing changes, or we can read or write from the M state by P1 without communicating with anybody. But now we have a few different messages coming in here. If we're in the shared state, we have to be responsive to an invalidation message, which is a little bit different than a bus snoop. Before, we saw another processor trying to write, and that's what transitioned us to I; but now the directory controller sends us a message which says, invalidate this line, and that will transition us to I here. Note, there will probably be a reply. We will probably have to send a reply, because the directory controller wants to know when all of the cache lines in the system have been invalidated, and that may take a variable amount of time, and it's sending messages, so it wants to wait for a reply to come back. So we're going to have to send a reply. So this arc here is similar, except we need to write back data, because we had modified data; we had writable data. We get an invalidate message from the directory controller, so we need to write back the data, and then reply afterwards. Similar sort of idea here. Okay, so that leaves two arcs left here in the middle. We're in shared, and we want to do a write to that cache line. So our cache has it in the shared state, and we want to do a write to it. Before we can actually do the write, we have to send a message to the directory saying, I'm taking a write miss here; I want to get this data writable. And we have to wait for a reply before we transition here, because we have to wait for the directory controller to communicate with all the other caches, so that they don't have read-only copies and we can have a writable copy. So it's going to invalidate all the other readable copies in the meantime.
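Putting the cache-side arcs we've covered so far into code, here's a rough sketch. The message names and the `send` callback are placeholders I made up, and real hardware would also track the pending, in-flight states mentioned earlier rather than transitioning instantly.

```python
# Sketch of the cache-side MSI controller reacting to CPU requests and
# directory messages, covering the arcs discussed so far. Message names
# are illustrative, not from any real protocol specification.

I, S, M = "I", "S", "M"

class CacheLineCtrl:
    def __init__(self, send):
        self.state = I
        self.send = send          # send(msg): deliver a message to the directory

    def cpu_read(self):
        if self.state == I:
            self.send("READ_MISS")       # wait for the data reply, then -> S
            self.state = S
        # in S or M we can read locally; no messages needed

    def cpu_write(self):
        if self.state in (I, S):
            self.send("WRITE_MISS")      # directory invalidates other copies
            self.state = M               # (after the reply comes back)
        # in M we can write locally; no messages needed

    def recv_invalidate(self):
        if self.state == M:
            self.send("DATA_WRITE_BACK") # we hold the only dirty copy
        self.send("INVALIDATE_ACK")      # directory is counting the replies
        self.state = I
```

Note that both invalidation arcs end with a reply message, since, as just discussed, the directory has to collect acknowledgements before it can let the writer proceed.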
And then finally, we have an edge coming this way, which is from modified down to shared. And this is a little bit different. Well, it's the same idea here: another processor is trying to do a read. So we have the line in the modified state when another processor tries to do a read, so we receive a read miss message. We don't need to invalidate the data, but we need to write back the data, because we have the most up-to-date copy, because we had it modified. So we're going to write back the data, and that's going to be the response, and then we're going to transition to the shared state. We can keep a read copy of it, because the other core is only getting a readable copy of it also. Okay. Any questions about that so far? Okay, so two interesting arcs that we're going to add in here are this one and this one, which we didn't have in our base MSI protocol. And, you know, you may not need these. But what these correspond to is: if our cache has the data in it, and then, because of, let's say, a conflict miss or a capacity miss, it gets bumped out, it might be a good idea to go update the directory, and tell the directory that in the future, if some other cache wants to go get that data, it doesn't need to contact you again. So, if it's in the modified state, we write back the data, because we have dirty data, and then we notify the directory saying, we don't have a copy of this anymore; you can transition to having it uncached. Likewise here, if we have a read-only copy, we may or may not want to do this. If there's, you know, extra bandwidth on the interconnect, we might want to send a message when we do an invalidation here. And this is not an invalidation because of an invalidation message; this is an invalidation because the line just gets bumped out of the cache. We may want to notify the directory saying, please remove us from the sharer list.
And if the sharer list becomes empty, the directory might change the cache line from shared to being uncached completely. But I do want to point out that these are not strictly necessary. The reason they're not strictly necessary is, if we build the cache controller such that, if you're in the invalid state for a particular cache line and you get some message coming in that would have been, let's say, this message, or that message, or some other arc, we can just reply back saying, yeah, we don't have it anymore; we're invalid; we don't really care about that transition. So if you were here, the only message that's really going to come to you is an invalidation message that would just take you to this state anyway. So we can just ignore the message, or just reply the same as we would to a normal invalidation message. Okay, so the directory state transitions look a little different here. We have uncached, shared, and exclusive. As we said, shared means there can be multiple read-only copies in the system. Exclusive means there's only one cache in the system with that data. What's interesting here is, if you were to actually have a MESI protocol running, that would not change the protocol running in the directory, because exclusive here is effectively the same state with respect to how the directory sees the line; you wouldn't have to do anything different. Okay, so let's walk through a few transitions here of the state of a cache line in the directory, and this is not in the cache. Let's start off uncached, and let's say we get a message which is a read miss from processor P. Well, we should transition to S now. We should give it a readable copy, so we reply with the actual data, and we should put P on the sharer list, so that we know that if someone else needs to go invalidate that line, we need to go contact P.
Now that we're in the shared state, let's say there are other read misses from other processors, other P's, here. Well, we're going to give them the data, and we're going to add them to the sharer list; so we take the sharers and add to them, and the sharer list is just going to grow. Okay, let's start here and go the other way, where we're in uncached, and all of a sudden we get a write miss from processor P. Well, we give it the data, and the sharer list, or the owner, is going to get P uniquely on it. We're going to give it the data in the exclusive state, and because we were uncached before, we don't need to contact anybody else. Let's look at this arc here before we go to these. So this is a little bit different, quite a bit different, than what we had in the earlier slides, because it's doing something different. In this state here, we know, let's say, processor P0 has the data exclusively. But all of a sudden, a different processor, let's say processor P2, goes to write the data. Well, we already have the data in the exclusive state, and we're going to stay in this exclusive state, because some other cache is going to get it exclusive; it's just a different cache. So what has to happen here is, we need to go invalidate the data out of P0. P0 is going to write back the data, and it's going to transition to the invalid state. We then need to provide the data to the new processor, P2 we'll say, and make P2 the unique entry on the sharer list. So we can make that transition within this state. And then, finally, let's look at the edges between these two points. Oh, actually, let's go this way first: if you have data that gets written back. So this is that arc, which I said is similar to the arc here, which is optional. Let's say you have data that gets written back here. Actually, this arc may not be optional; let's think about that for a second. This arc may not be optional. No, it's still optional, because you can just NACK the message effectively, and tell the requester the data is in main memory.
Okay, so let's say here you see a data write-back happening. So a message gets sent to you which is the equivalent of this arc here. The data was writable, was exclusive in some cache, and it's no longer writable. It's probably a good idea to go contact the directory, write back the data, and clear the sharer list. The sharer list is then empty, so the directory knows that no one has a copy of it at that point. Okay, a few final arcs here. Okay, we're in the shared state, so we have multiple read-only copies, and one cache comes along and says, "Oh, I need to send a write miss message; I need to get this writable." Well, now we actually have to go through a pretty long process. We're going to walk through the entire sharer list and send messages to all the sharers in the sharer list saying, invalidate this copy and tell me when you're done. We're going to collect all the responses at the directory. And once all the responses have come back, we know no one else has a readable copy. We can give the data value to the requester, and add it to the sharer, or owner, list. Okay, the last arc here is from E to S, this orange arc, and that happens if we have a particular line writable in one cache, and another cache wants to go read it now. It will send a read miss; the other cache is going to downgrade from E to S, excuse me, from M to S, in its local cache. But the directory is going to transition from E to S here, and we have to go get the most up-to-date data from the node. So we're going to send a fetch request to the node that had it exclusive before, and once we get the most up-to-date data, we can forward that to the new reader, and we add that processor to the sharer list. Okay, so, questions about that one so far? These do start to get a little complicated, because you have multiple state machines interacting. Okay, so we're going to speed up a little bit here. I include this chart from your book just to give you an example.
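To recap the directory-side machine we just walked through, here's a minimal sketch for a single line, again with made-up message names, and with all the pending-state and NACK machinery left out for brevity.

```python
# Minimal sketch of the directory controller's U/S/E transitions for one
# line, as walked through above. Names are illustrative; the pending-state
# bookkeeping and NACK handling are deliberately omitted.

U, S, E = "U", "S", "E"

class Directory:
    def __init__(self, send):
        self.state, self.sharers = U, set()
        self.send = send               # send(node, msg) over the interconnect

    def read_miss(self, node):
        if self.state == E:
            owner = next(iter(self.sharers))
            self.send(owner, "FETCH")  # owner writes back, demotes M -> S
            self.sharers = {owner}
        self.sharers.add(node)         # grow the sharer list
        self.send(node, "DATA_REPLY")
        self.state = S

    def write_miss(self, node):
        if self.state == S:
            for s in self.sharers:     # invalidate every read-only copy
                self.send(s, "INVALIDATE")
        elif self.state == E:
            self.send(next(iter(self.sharers)), "FETCH_INVALIDATE")
        self.sharers = {node}          # requester becomes the unique owner
        self.send(node, "DATA_REPLY")
        self.state = E

    def data_write_back(self, node):
        self.sharers.discard(node)     # owner evicted its dirty copy
        if not self.sharers:
            self.state = U             # no one holds it; memory is current
```

In a real directory, `write_miss` from S could not reply with the data until all the invalidation acknowledgements had come back; that waiting is exactly the pending sub-state discussed earlier.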
We went through, very quickly here, all the different messages. And this chart here sums up all the different message types, and who they can go from and who they can go to. And this is in your textbook. Sometimes messages need to communicate addresses; sometimes they need to communicate data; sometimes they need to communicate which node the message is coming from, to add it to the sharer list. But I'm not going to go through this in great detail. One thing I did want to say is, these message types here do not include [INAUDIBLE]. So, when you go to request something, there are replies that come back, and those replies are not drawn in this diagram. We see data value reply, but that's just the actual data; there's not, like, a response coming back from the sharer acking the invalidation, or something like that. Another type of message that is pretty common, and that is not drawn here, is a negative acknowledgement. So it's pretty common, if you have a cache line that is being transitioned, that's in a pending state at the directory, and you get a request coming in, you might need to tell that cache to retry later: I can't handle this request right now.
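That NACK-and-retry idea can be sketched as follows: if the line is in flux at the directory, bounce the request back and let the cache retry later. This is purely illustrative; the names are mine.

```python
# Sketch of the negative-acknowledge (NACK) idea: if a line is mid-
# transition at the directory, refuse new requests for it and let the
# requester retry later. Names here are illustrative placeholders.

class PendingDirectory:
    def __init__(self):
        self.pending = set()      # line addresses currently in flux

    def handle(self, line, request):
        if line in self.pending:
            return "NACK"         # tell the requesting cache: retry later
        self.pending.add(line)    # begin a multi-step transaction
        # ... send invalidates / fetches, collect the replies ...
        self.pending.discard(line)
        return "ACK"
```

The requesting cache, on seeing the NACK, would simply resend its read miss or write miss message after a delay, which is what keeps the multi-step directory transactions looking atomic from the outside.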