Okay. So, let's introduce a model and some semantics to describe the ordering of memory operations between different processors. One of the most common ones you're going to see is called Sequential Consistency; it's by no means the only one out there. This is a very strong ordering. Compared to any of the processors you run on your computers today, this is extremely strong. None of your computers actually implement this strong an ordering. But it's a good basis from which to reason about parallel programs. So, we're going to study sequential consistency some. Up here, we have one, two, three, four, five processors and one big memory, still a very basic model. We've not introduced any caches or anything crazy like that yet, because that makes all of these things a little bit harder, whether it's caches or out-of-order execution. What sequential consistency is, is the idea that you take all of the instructions of all of the programs executing on all of the processors, and you guarantee that the execution sequence seen by all of the processors is some valid in-order interleaving of the instructions of the respective processors.

So, to give you an example: you have processor one and processor two. Processor one executes instructions one, two, three, four, and processor two executes instructions five, six, seven, eight. You're guaranteed that some interleaving of these is what happened under sequential consistency. So, let's say one and two happen, then five happens, then three, then six, then seven, then eight, then four. That is a valid sequentially consistent order. Another one that's easier to reason about is one, two, three, four, five, six, seven, eight. That's valid under sequential consistency. Likewise, five, six, seven, one, two, three, four, eight is a valid order. What is not valid is something like five, one, three, two, six, four, seven, eight, because we've reordered one of the individual processors' sequences: two should come before three. So, this gives us some guarantees. The basic idea is that it's an arbitrary, order-preserving interleaving. Each processor's accesses are preserved in order, but they can be interleaved between the different threads arbitrarily. Okay. I want that to sink in for a second. Anyone have questions about what the model is, or why this is a good model?

Okay. I'm going to ask a few questions here. Why is this a model that almost no one ever implements? The reason no one actually tries to do this is that, as with all the things we talked about when we built out-of-order processors, you want to reorder your loads and stores for performance reasons. And second, once you try to introduce something like a cache into your processor, this gets very hard, because communication becomes a real bottleneck. You might have one processor that did, for instance, a store into its cache, and not everyone has seen that value update. So, unless you actually want to have one monolithic memory that you play all loads and stores against, in some interleaved yet program-order-preserving sequence, where everyone sees that exact ordering, you're not going to want to implement this. That's very hard to do.
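Before the example, here's a minimal sketch, in C++, of the check just described: a global order is sequentially consistent only if each processor's instructions appear in it as an in-order subsequence. The instruction numbers and the candidate orders are the ones from the board; the helper name preserves_program_order is just my own label for illustration.

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// A global order respects a thread's program order if that thread's
// instructions appear in the global order as an in-order subsequence.
static bool preserves_program_order(const std::vector<int>& global,
                                    const std::vector<int>& thread) {
    auto it = global.begin();
    for (int instr : thread) {
        it = std::find(it, global.end(), instr);
        if (it == global.end()) return false;  // instruction missing or out of order
        ++it;
    }
    return true;
}

int main() {
    std::vector<int> p1 = {1, 2, 3, 4};  // processor one's program order
    std::vector<int> p2 = {5, 6, 7, 8};  // processor two's program order

    std::vector<std::vector<int>> candidates = {
        {1, 2, 5, 3, 6, 7, 8, 4},  // valid: both program orders preserved
        {1, 2, 3, 4, 5, 6, 7, 8},  // valid
        {5, 6, 7, 1, 2, 3, 4, 8},  // valid
        {5, 1, 3, 2, 6, 4, 7, 8},  // invalid: processor one's 3 appears before 2
    };

    for (const auto& g : candidates) {
        bool ok = preserves_program_order(g, p1) && preserves_program_order(g, p2);
        for (int i : g) std::printf("%d ", i);
        std::printf("-> %s\n", ok ? "sequentially consistent interleaving"
                                  : "NOT a valid interleaving");
    }
}
```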
Okay. So, here's a fun example of sequential consistency. Let's look at two threads, T1 and T2, and two variables — actually, four variables here: X, Y, X prime, and Y prime. X prime and Y prime are the outputs, so we do stores to those. At the beginning of time, X and Y are initialized to zero and ten, respectively, as shown. Okay. Let's talk about what the valid sequentially consistent outcomes are here for X prime and Y prime. Now, this is not actually easy to determine right off the get-go. Does anyone know which of these four? I think the answer is on your sheet, but don't go look at it. It's not actually easy to tell which one is sequentially consistent and which one is not. So, let's walk through a few interleavings here and see what happens.

Okay. So, we're going to have thread one, thread two, X, Y, and X prime, Y prime as columns. Let's say we execute T1 and then T2 in time — we'll draw time going this way. So, we do a store word of one, we do a store word of eleven, and then we do a load word of Y, a store word to Y prime, a load word of X, and a store word to X prime. We do this store here; at the beginning of time, as I said, X has zero and Y has ten. We do a store here, and it's storing one to X, so we update the value of X there. Now we store eleven to Y, so eleven gets written there at that point in time. We do this load; nothing in memory changes. We do a store here to Y prime, and we look at what we got when we did that load of Y: we loaded eleven, so we store eleven to Y prime at this point. Then we do a load of X, which has one in it now, and we do a store to X prime, so it's going to have one. So, to sum up, you get eleven and one. Okay. That's this one; that's a valid output. And what's interesting to see here is that even with this strict memory model, we can end up with different possible outcomes. There's not only one possible outcome; there are actually multiple valid outcomes here.

Okay. So, let's look at another one. Let's use the same columns at the top here, and let's say we split thread one, so that its first store happens here and its second store happens down there. That's still sequentially consistent, because the two stores stay ordered inside their own thread, and it's a global ordering that everyone sees. Okay. So, let's go do that one. We do the store word of one. Then, in thread two, we execute load word of R1 from Y, store word of R1 to Y prime — let's get to the store here — well, actually, we'll finish writing it out first: load word of R2 from X, store word of R2 to X prime. And then finally, we finish off here — I wrote the store word of one to X, but oops, that's wrong; it should be the store word of eleven to Y. Okay. The store word of one happens here first. And remember, we started out with zero and ten in X and Y. So, the store word of one happens, and it updates X to be one. We do this load, and Y is still at ten, so we store ten into Y prime. We do a load word of X here, and we're basically going to store that back into X prime. So, what was X? We look back at this column: the most up-to-date value was one. So we come over here and say that's one. And then finally, we do a store word of eleven into Y, so Y gets eleven. But what's the output here? Well, the output is one and ten. And that last store didn't really do anything — I mean, it updated the value of Y, but it didn't affect Y prime or X prime in any way, shape, or form. Okay. So, one and ten ends up being a valid sequentially consistent output.
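Here's a small sketch that enumerates every sequentially consistent interleaving of those six memory operations and collects the possible (X prime, Y prime) pairs. It's my own C++ rendering of the board, not anything from the slides; the State struct and the step lambdas just mirror the loads and stores we walked through.

```cpp
#include <cstdio>
#include <set>
#include <utility>
#include <vector>

// Shared memory plus T2's registers, with the initial values from the board.
struct State { int X = 0, Y = 10, Xp = 0, Yp = 0, r1 = 0, r2 = 0; };

// Each thread is a list of steps; a step mutates the state.
using Step = void (*)(State&);

// Recursively try every order-preserving interleaving of the two threads.
static void enumerate(State s, size_t i1, size_t i2,
                      const std::vector<Step>& t1, const std::vector<Step>& t2,
                      std::set<std::pair<int, int>>& outcomes) {
    if (i1 == t1.size() && i2 == t2.size()) {
        outcomes.insert({s.Xp, s.Yp});
        return;
    }
    if (i1 < t1.size()) { State c = s; t1[i1](c); enumerate(c, i1 + 1, i2, t1, t2, outcomes); }
    if (i2 < t2.size()) { State c = s; t2[i2](c); enumerate(c, i1, i2 + 1, t1, t2, outcomes); }
}

int main() {
    std::vector<Step> t1 = {
        [](State& s) { s.X = 1; },      // sw X, 1
        [](State& s) { s.Y = 11; },     // sw Y, 11
    };
    std::vector<Step> t2 = {
        [](State& s) { s.r1 = s.Y; },   // lw r1, Y
        [](State& s) { s.Yp = s.r1; },  // sw Y', r1
        [](State& s) { s.r2 = s.X; },   // lw r2, X
        [](State& s) { s.Xp = s.r2; },  // sw X', r2
    };

    std::set<std::pair<int, int>> outcomes;
    enumerate(State{}, 0, 0, t1, t2, outcomes);
    for (auto [xp, yp] : outcomes)
        std::printf("X' = %d, Y' = %d\n", xp, yp);
}
```

It prints (0, 10), (1, 10), and (1, 11) — and never (0, 11), which is exactly the outcome we're about to construct by breaking the ordering.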
Let's try a non-sequentially consistent execution order and see what happens. Let's see — which is a good one to execute? There are a couple of different ways to get this. Let's execute this instruction here first, the store to Y, and reorder it with the store to X, and we'll execute thread two in between those two operations. So, we do the store word of eleven to Y — let's say T1 is an out-of-order processor here, for instance. Then we execute all of thread two, and we'll execute all of thread two in order. And then finally, we come back to thread one and do the store of one to X. Okay. So, X and Y start out with zero and ten. The store of eleven to Y updates Y, so this becomes eleven. We load Y now, so we load that eleven into R1, and then we store eleven to Y prime. Then we load X here. Well, X still has its initial value, so we get zero into R2, and then we store that zero into X prime. And then finally, here, we do the store of one to X. So, what's our output? Well, our output is zero and eleven. But this is a non-sequentially consistent output. Now, there's no way we can work through all the different possible combinations here, but the only way you can possibly get a zero for X prime and an eleven for Y prime is to execute a non-sequentially consistent execution sequence. And this one was non-sequentially consistent because we flipped the ordering of these two instructions; that's what made it not sequentially consistent, so we put a big X over it. We could work through all the other possible combinations here. Interestingly enough, there's actually another interleaving that gets you this same output, but it's also a non-sequentially consistent ordering of the instructions — you can't get it from a sequentially consistent one.

Okay. So, what this is really trying to say is that in the processors we've looked at up to this point, we've had some arcs. An example of an arc here is that if you do a load into R1 and then use the value of R1 to do the store — sorry, these operands should be flipped for MIPS compatibility; I flipped them there and missed it here — then for this store to happen, it needs to wait for the value of R1. That's just a simple read-after-write dependence. Likewise, here you have another read-after-write dependence. And even without sequential consistency, you can have other dependences: for instance, if you do a load and a store and they're to the same address, there needs to be an order between those, which we've already talked about in this class. What we've not talked about are the additional sequential consistency requirements. If you want sequential consistency, what it looks like is that every memory operation is dependent on the prior memory operations in its own thread. So, we're going to draw that as a red arc here. You can sort of see it here: this is dependent on this, and this is dependent on that, and that on that, but through transitivity we don't draw all the arcs. So, something to think about.
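To tie this back to the execution we just crossed out: here's a hedged sketch of the same two threads written with C++ atomics. This isn't from the lecture; the relaxed memory orderings are my stand-in for what raw, reordering hardware is allowed to do, and the variable names just mirror the board.

```cpp
#include <atomic>
#include <cstdio>
#include <thread>

// With memory_order_relaxed the compiler or hardware may reorder T1's two
// stores, so a run can print X' = 0, Y' = 11 -- the non-sequentially-
// consistent outcome marked with the big X on the board.
std::atomic<int> X{0}, Y{10};
int Xp = 0, Yp = 0;

int main() {
    std::thread t1([] {
        X.store(1, std::memory_order_relaxed);   // sw X, 1
        Y.store(11, std::memory_order_relaxed);  // sw Y, 11 (may be reordered above the store to X)
    });
    std::thread t2([] {
        int r1 = Y.load(std::memory_order_relaxed);  // lw r1, Y
        Yp = r1;                                     // sw Y', r1
        int r2 = X.load(std::memory_order_relaxed);  // lw r2, X
        Xp = r2;                                     // sw X', r2
    });
    t1.join();
    t2.join();
    std::printf("X' = %d, Y' = %d\n", Xp, Yp);
}
```

Run in a loop on a weakly ordered machine, this can eventually print X' = 0, Y' = 11; switching every operation to memory_order_seq_cst (the default) makes that outcome impossible again, which is the sequential consistency guarantee in C++ terms.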
Okay. So, there's a question that comes up here — I think we already asked this question — but does a system of caches and out-of-order processors provide a sequentially consistent view of memory? The answer is probably not; it's pretty hard to do. You could potentially try to build a processor which is out of order and still does this, but we talked about breaking and reordering all of these instructions in an out-of-order processor. So, we're going to try to think about what the correct model is if you cannot go for full sequential consistency.

You're asking me, isn't it up to the Java compiler to guarantee this ordering? Okay. So, what we're actually talking about here is the hardware breaking the ordering and reordering these instructions in ways the compiler can't possibly control. The programmer is at fault there; the programmer should not be assuming that. Basically, programmers try to assume something like sequential consistency, and what we're saying is that none of the hardware out there implements sequential consistency, so the programmer can't assume it — it's a bad assumption from the programmer's perspective. The reason the programmer assumes it is that it's kind of the rational thing to do, the reasonable thing to assume. But it's really hard to implement something fast under those constraints. So instead, what people do is come up with something weaker that still gives the programmer some semblance of programmability. And that's what we're going to be talking about for the rest of today's lecture and next lecture: what are those semblances of rationality that we can give the programmer? But this is one program and this is another program — or this is one thread and this is another thread in the same program — and the compiler cannot guarantee this ordering, because the out-of-order processor will just move things around, disregarding the compiler. We already talked about out-of-order processors moving loads past stores and stores past loads in the hardware. So, what we're saying here is that if you want to actually implement full sequential consistency, all those optimizations that the out-of-order hardware wants to do are invalid, and we don't really want to do that. Okay. So, that's sequential consistency.
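To make that last point concrete — hardware moving a load past an earlier store — here is the standard litmus test for it, again as a hedged C++ sketch rather than anything from the lecture's slides. The relaxed orderings stand in for what raw out-of-order hardware is allowed to do.

```cpp
#include <atomic>
#include <cstdio>
#include <thread>

// Under sequential consistency, r1 == 0 && r2 == 0 is impossible: whichever
// store comes first in the global order must be visible to the other
// thread's later load. With relaxed atomics, the load in each thread may be
// hoisted above its store, and real machines do produce (0, 0).
std::atomic<int> A{0}, B{0};
int r1 = -1, r2 = -1;

int main() {
    std::thread t1([] {
        A.store(1, std::memory_order_relaxed);
        r1 = B.load(std::memory_order_relaxed);  // may be reordered above the store to A
    });
    std::thread t2([] {
        B.store(1, std::memory_order_relaxed);
        r2 = A.load(std::memory_order_relaxed);
    });
    t1.join();
    t2.join();
    std::printf("r1 = %d, r2 = %d\n", r1, r2);
}
```

The outcomes (1, 1), (1, 0), and (0, 1) all correspond to some sequentially consistent interleaving; (0, 0) does not, yet real hardware allows it — which is exactly the kind of behavior the weaker models discussed next have to give the programmer tools to control.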