Okay. So that brings up the question of register renaming. What is register renaming? Hopefully some of you skimmed the Tomasulo algorithm paper that I assigned, because we're going to be discussing that and the motivation for that work. So, what is limiting our performance in these out-of-order pipelines we've discussed so far? A couple of things: write-after-write and write-after-read dependencies. Let's talk about these. In a write-after-write dependence, you write to one register and then you write to that register again, and in the pipelines we've talked about so far, you basically stall the pipe while you're waiting for the first write to commit, because we're not able to handle multiple writes in the pipeline at the same time. But these are not fundamental dependencies. So computer architects put on our thinking caps and came up with ways to break them. The same goes for a write after a read: if you write to, let's say, register four after a read from register four, there should be nothing wrong with that, but if you try to execute the instructions out of order, then you need to think about it. A read after write, on the other hand, is a true dependence, because you actually need the value to execute the subsequent instruction. So we're going to call write-after-write and write-after-read dependencies name dependencies, and we're going to call a read after write a true dependence that we can't break. Okay, so let's look at some example code and see what can go wrong if you just ignore all the name dependencies. Like I said, they're not true dependencies, so maybe we just don't need them. So we have a code sequence here: a mul, another mul, then two add immediates. Let's identify some important things in it.
First, let's identify the true read-after-write dependencies. I've put some circles and arrows here. The first mul writes register one, and the second mul reads the result of that. And this add reads register four from the previous instruction and then writes register four, so that's a true dependence. We can't break those. We may talk at the end of the term about some ways to break even those, but they get pretty crazy. So let's look at the write-after-write dependence. The first write-after-write dependence is here: we write register four, and then we write register four again. But in an out-of-order processor, if we try to break all these dependencies, we can see that we actually write to register four here, and then write register four here, out of order. Whoa, what just happened? Well, we said we broke the dependency; we're not going to stall the front of the pipe on this. So if you execute this on one of the out-of-order issue pipes we've looked at so far, let's say the in-order fetch, out-of-order issue, out-of-order execute and write back, in-order commit pipe, we can see that we write to register four here before we wrote to register four here. That's going to cause some major problems: we just wrote the wrong value. Oops. The other one here is a write-after-read dependence. Here we have a read of register four, and here we have a write of register four. And because this add got pulled so early in the execution order, we actually wrote before this instruction had a chance to read the value. The reason this instruction got delayed is that it was also waiting on a true dependency here; it's dependent on two things. So all of a sudden we wrote register four with the value from this add, then we went and read it, and we read the wrong value.
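The classification we just walked through by hand can be sketched in a few lines of Python. The instruction sequence and register numbers below are illustrative, not the exact ones on the slide, and the scan is deliberately naive: it flags a dependence against every earlier instruction, not just the most recent writer.

```python
# Classify data dependencies between instruction pairs.
# Each instruction is (dest_register, [source_registers]).
def find_dependencies(instrs):
    deps = []
    for i, (dst_i, srcs_i) in enumerate(instrs):
        for j in range(i + 1, len(instrs)):
            dst_j, srcs_j = instrs[j]
            if dst_i in srcs_j:
                deps.append((i, j, "RAW"))  # true dependence: j reads i's result
            if dst_i == dst_j:
                deps.append((i, j, "WAW"))  # name dependence: both write the same reg
            if dst_j in srcs_i:
                deps.append((i, j, "WAR"))  # name dependence: j overwrites what i reads
    return deps

# A sequence in the spirit of the slide: mul, mul, then two add immediates.
program = [
    ("r1", ["r2", "r3"]),  # mul  r1, r2, r3
    ("r4", ["r1", "r4"]),  # mul  r4, r1, r4
    ("r4", ["r4"]),        # addi r4, r4, 4   -- WAW, WAR, and RAW with the mul above
    ("r8", ["r4"]),        # addi r8, r4, 8
]
for dep in find_dependencies(program):
    print(dep)
```

Note that only the RAW edges constrain correctness in a renamed machine; the WAW and WAR edges are the name dependencies we're about to break.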
So, we can't just go and break write-after-write and write-after-read dependencies that easily; we need to think a little harder about this. One last interesting thing happens here, and this is kind of fun. We do commit in order in this pipe, but look what happens to register four. We wrote register four here, then we write register four again, and then we commit from physical register four to architectural register four. So we've also committed the wrong state to the architectural register file. We're having lots of problems here; it's not just the basic things. So what's the solution? Well, as a solution, we can start thinking about adding more registers. At the top here is the same example from the previous slide, so nothing new there, but I want to compare it to our conservatively stalling pipeline from a performance perspective first. Here we have our in-order fetch, out-of-order issue, out-of-order write back, in-order commit pipe. This is our most advanced one from last time, but it conservatively stalls on write-after-write and write-after-read dependencies, and that's drawn here with these arrows. So we can't even issue this instruction until we know that, let's say, these two instructions here, which both touch register four, commit; then we can issue it. Now, this might be a little over-conservative. It might be possible to pull this back one or two cycles, maybe to the point where this instruction does its write back. But one of the challenges is that you can't easily track that inside your reorder buffer unless you have something there that scans for this exact case, and it's not going to save you that much performance either.
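The conservative stall condition described above can be written as a small predicate. This is a sketch under simplifying assumptions (the real pipeline tracks this state in the reorder buffer, and stalls until commit rather than re-checking per cycle); the function name and instruction encoding are hypothetical.

```python
# Conservative issue check: stall on any dependence, true or name,
# against an instruction that has issued but not yet committed.
def must_stall(dest, srcs, in_flight):
    """in_flight: list of (dest, srcs) for issued-but-uncommitted instructions."""
    for d, s in in_flight:
        if d in srcs:   # RAW: an operand isn't ready yet
            return True
        if dest == d:   # WAW: would write the same register out of order
            return True
        if dest in s:   # WAR: would overwrite a value a pending read still needs
            return True
    return False

# addi r4, r4, 4 must wait behind an in-flight mul that writes r4:
print(must_stall("r4", ["r4"], in_flight=[("r4", ["r1", "r5"])]))  # True
# With nothing in flight, it can issue:
print(must_stall("r4", ["r4"], in_flight=[]))                      # False
```

Renaming will let us drop the WAW and WAR clauses from this predicate, which is exactly the performance we recover in the next slide.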
What I'm trying to get across, though, is that the performance of this instruction sequence is actually worse: it takes longer than the incorrect, but what we'll call ideal, case on top. So let's make one little change to the instruction sequence, highlighted in red here, and see what happens to the execution. We took this add, which wrote to register four, and changed it: we used another register, and now we write to register eight. And lo and behold, that breaks all the write-after-write dependencies and all the write-after-read dependencies, and all of a sudden we get our idealized performance; this is the exact same schedule as that. But it requires another register. Hm. Well, let's just add an infinite number of registers. So, what's the con of adding an infinite number of registers? Anyone have any ideas? If we keep using more registers like this, we might use up all of our architectural registers. Can we just add more architectural registers to our instruction set? Well, it takes up encoding space. We could have a larger name space for our registers, but if we have 32 registers, that takes five bits; if we have 128 registers, that takes seven bits; and if we have an infinite number, that would take an infinite number of bits. What we're going to talk about in today's lecture is how to do this in hardware, so that you have more registers in your physical register space but not more registers in your architectural register space. And I should point out that this is not only a register problem; it can also happen with memory. If you name your memory inappropriately, if you have a very small amount of memory and you try to reuse it very aggressively, you can get naming problems there too. But today we're mostly going to focus on register renaming.
And so I'll define register renaming as changing the naming of the registers in hardware to eliminate these write-after-write and write-after-read name dependencies. Okay, we're going to be talking about two major schemes. They're mirrors of each other and have slightly different hardware requirements; mostly, when you think about them, they're logical duals of each other, two different ways of thinking about the same problem. In the first scheme, we add pointers in our instruction queue and reorder buffer, so those data structures can hold different register names rather than just architectural register names. The other option, which anyone who actually read the Tomasulo algorithm paper will have seen, is to store the actual data value in those data structures, in the reorder buffer and in the instruction queue. They look very similar if you think about it, and, to give you the end of the novel at the beginning, they're going to have the same performance; they're doing the same thing, just with slightly different mechanics. We're going to start by looking at the first one, with pointers in the instruction queue and reorder buffer, mainly because we already have pointers in the design we looked at last time: the in-order fetch, out-of-order issue, out-of-order execute and write back, in-order commit design.
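The core of either scheme is the rename step itself. Here's a minimal sketch, with an assumed structure (a map from architectural to physical registers plus a free list, and instructions as (dest, [sources]) tuples); the class and field names are mine, not the lecture's.

```python
# Minimal register-renaming sketch: architectural names map to physical
# registers, and every new write gets a fresh physical register.
class Renamer:
    def __init__(self, arch_regs, num_phys):
        # Initial mapping: each architectural register gets one physical register.
        self.map = {r: f"p{i}" for i, r in enumerate(arch_regs)}
        # Remaining physical registers form the free list.
        self.free = [f"p{i}" for i in range(len(arch_regs), num_phys)]

    def rename(self, dest, srcs):
        # Sources read the *current* mapping, preserving true RAW dependencies.
        new_srcs = [self.map[s] for s in srcs]
        # The destination gets a fresh physical register; this is what
        # breaks WAW and WAR name dependencies.
        new_dest = self.free.pop(0)
        self.map[dest] = new_dest
        return new_dest, new_srcs

r = Renamer(["r1", "r4", "r5"], num_phys=8)
program = [
    ("r4", ["r1", "r4"]),  # mul  r4, r1, r4
    ("r4", ["r4"]),        # addi r4, r4, 4  -- WAW and WAR with the mul above
]
for dest, srcs in program:
    print(r.rename(dest, srcs))
```

After renaming, the two writes to r4 land in different physical registers, so they can complete in any order, while the addi's source still points at the mul's physical destination, so the true dependence survives. A real machine also has to reclaim physical registers at commit, which this sketch omits.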