Okay, so today we're going to start our third installment of ELE 475, Computer Architecture. This is going to be more review, and we're going to finish up talking about hazards. Today we're going to be talking about control hazards, and then a little bit later we're going to start talking about caches and why we have caches. So let's start off by looking at control hazards. Just to recap, there are three different types of hazards we've talked about in this class so far: structural hazards, data hazards, and now we're going to talk about control hazards. Okay, so what information do we need to calculate the next program counter? Hm, well, is it the same thing for every instruction? When we go to execute an arithmetic-logic instruction, an add instruction, do we need the same information to calculate the next program counter? You might say, well, isn't there some piece of magical hardware which just calculates the next program counter? Well, yes, but we need to talk about what that magical piece of hardware is. As you might have guessed, it's actually different for branches and jumps than it is for more traditional instructions, arithmetic-logical instructions and everything else. So let's start off by looking at jumps. If we look at a jump, you need to look at the op code to make sure that it's actually a jump. You also need to look at the offset within the instruction, and you need to look at the current program counter. You take that all together, and the decode pipe stage of your processor says okay, it's a jump, and then you can take the program counter, add it to the offset, and you probably need to do that either in the ALU, or you need a special adder to do it. In the pipelines we've drawn so far, our 5-stage pipe, we have a special adder just for that.
And you do that offset calculation, and then you want to redirect your machine so the next instruction you go to execute is at the target of the jump. Now, this gets a little more complicated when you start looking at jump register. With jump register, you don't know where you're going until you decode the instruction and fetch the value from the register file. You don't need to do any conditional calculation yet, but you will for conditional branches below. We just have to look at the op code to know that it's a jump register. We don't need to look at any offset, because we are jumping directly to a register value in something like MIPS. In other instruction sets, you might need to look at other sorts of information. You might have, for instance, a register-indirect jump-register type of instruction, or even a memory-indirect jump sort of instruction. Conditional branches, now things start getting a little more complicated. We need to look at the op code, we need to look at the current program counter, and we need to go look at the register which is going to give us the condition. So we're branching based on whether some value is, let's say, greater than or less than zero. Hm, okay. We need to go look at that, but we don't know it until quite a bit farther down the pipe, and we also need to take the offset and add it to the program counter when we do a PC-relative conditional branch. That's how it's defined in MIPS. In other instruction sets you can have different types of conditional branches, either an absolute addressing scheme or something register-indirect, or some other thing like that. But for MIPS, we're just going to take the program counter, add it to our offset, and branch there if the condition on the register is what we were looking for.
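To summarize what each instruction class needs, here is a minimal sketch of that next-PC selection logic in Python. The opcode names and the PC-plus-offset jump target follow the lecture's simplified description and are illustrative assumptions, not exact MIPS encodings:

```python
# Sketch of next-PC selection for a MIPS-like ISA, following the lecture's
# simplified model. Opcode names ("J", "JR", "BRANCH") and the PC-plus-offset
# jump target are assumptions for illustration, not real MIPS encodings.

def next_pc(pc, opcode, offset=0, reg_value=0, condition_met=False):
    """Return the address of the next instruction to fetch."""
    if opcode == "J":                # jump: program counter plus offset
        return pc + offset
    if opcode == "JR":               # jump register: target is a register value
        return reg_value
    if opcode == "BRANCH":           # conditional branch: PC-relative if taken
        return pc + offset if condition_met else pc + 4
    return pc + 4                    # everything else just falls through
```

For example, a jump at PC 100 with offset 204 redirects fetch to 304, while an add at 100 simply falls through to 104. Note how each case needs information from a different pipeline stage: the op code from decode, a register value from register fetch, and the condition from the ALU.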
If not, you want to just fall through and go to PC+4, the next instruction, if your instructions are four bytes long. Hm. Okay, everything else. Believe it or not, we do need to actually think about this case. It's not some magical piece of hardware; we're going to discuss this magical piece of hardware today. You need to take the op code and the PC, and you need to add some constant to the PC to compute the fall-through to the next instruction. So while we're looking at this, we might have to look at the program counter, but we also have to look at information which comes at different stages in the pipeline. The op code doesn't get decoded until something like the decode stage. Registers don't get fetched, let's say, until the instruction register fetch, or decode, stage. And for the condition, you may even need to do some comparison against zero or against another register, so you need to do some math, or run it through your ALU in your execute stage. Something like jump register is similar there: you're not going to know the destination until maybe the execute stage, or possibly way at the end of the decode stage. So let's take a look at a basic control hazard, and the basic control hazard is: we want to execute instructions and we want to fall through to the next instruction. That sounds pretty basic, and you would say, why is there any control hazard there? We're not changing the control flow. So let's draw the pipeline diagram, assuming that we have no branch delay slots in our architecture. We'll talk more about branch delay slots in a second. Let's draw a pipeline diagram here. We're going to plot time, and then we're going to step through a basic instruction sequence here, and this basic instruction sequence is actually going to start here. We're going to have instruction one and instruction two.
Instruction one is taking some register and adding it to something else. We talked about there being data dependencies or data hazards; in this case, there is no data dependence and no data hazard here. You'll see that this is writing register r1 and this is reading from register r2. So there are no data hazards here; we just want to look at the control hazard. The first instruction just goes down the pipe: fetch, decode, execute, memory, write-back. Our five-stage MIPS pipe. The second instruction starts going down the pipe, and it goes into the fetch stage. But the problem here is we actually need to stall the fetch stage, because we don't know that the second instruction is the second instruction yet. We don't know, for instance, that this first instruction is not a branch or a jump, so we don't know the address of the next instruction. That's kind of odd. Now why do we not know this? Going back to this example here, one thing that's common through all these different cases is they all need to decode the op code. Well, where do we do the decoding of the op code? We don't do that until the decode stage of the pipe. So we don't do that until here, and we're not able to use that information until the end of the cycle, which would be sort of here. And we would need that information to determine what's going on here. So if you had a branch, for instance, here, the decode information is not able to get around and change the program counter, and change what is being indexed into the instruction memory, on this cycle. So what we're going to have to do is insert a decode bubble here for this control hazard. Now, if you play this forward for more instructions, what you're going to realize is this is not very efficient.
Every instruction that goes down the pipe is going to hit a control hazard, every instruction is going to hit this decode bubble, and every instruction now takes two cycles. So your clocks per instruction (CPI) for this is not going to be very good. Let's analyze that now. We can draw this on the other pipeline diagram axis and see what's happening here. Let's take the execute stage: we're executing instruction I1, then a no-op, instruction I2, a no-op, I3, a no-op. If you compute this all out, you end up with a CPI of two. So your machine is running at strictly half the performance you want it to run at. Well, that's not very good. So let's start to talk about some techniques to mitigate the effect of control hazards. We're actually going to have a whole lecture later in the course about branch prediction, which is one of the main techniques to mitigate control hazards. But let's move forward here and take a look at one of these techniques, and this technique is speculation. So what's the solution to this? The most basic solution is we speculate that the current instruction is not going to be a branch, so the next address is going to be the PC plus four. What does this look like in a pipe? Well, there's this nice adder here. We're going to take the PC, and if nothing else is happening later on in the pipe, we're just going to be selecting PC+4 on this control path here. So we're just going to be walking down here, executing 96, 100, 104, and we're not actually going to even look at the instructions until, let's say, something more interesting happens. So we can just speculate that the next address is PC plus four. That's great, but that adds some wrinkles. What happens when we have, like, a jump here?
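The CPI arithmetic above can be checked with a quick back-of-the-envelope calculation. This sketch is a simplification that ignores pipeline fill cycles; it assumes one bubble per instruction without speculation, and, anticipating the fix discussed next, a one-cycle kill penalty per taken jump once we speculate PC+4:

```python
# Back-of-the-envelope CPI, per the lecture's argument. Simplified model:
# pipeline fill/drain cycles are ignored, and each hazard costs one cycle.

def cpi_without_speculation(n_instructions):
    # every instruction waits one extra cycle for its op code to decode
    cycles = 2 * n_instructions
    return cycles / n_instructions

def cpi_with_pc_plus_4(n_instructions, n_taken_jumps):
    # with PC+4 speculation, only taken jumps/branches pay a kill penalty
    cycles = n_instructions + n_taken_jumps
    return cycles / n_instructions
```

So without speculation every program runs at CPI 2, while with PC+4 speculation a program with no taken jumps gets back to CPI 1.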
Hm. So with this jump, if we speculated PC plus four, we went and fetched instruction three here, which is at address 104, but the jump says we're supposed to go to 304, so this instruction is not even supposed to execute. We need some mechanism to kill live instructions in the pipe. So how do we go about doing this? Let's look at a brief example here. We need some way to kill an instruction, and what we're going to do is add a multiplexer here, which will multiplex in a no-op. If we have a jump that gets to the decode stage of the pipe, we're going to wire back in and say: that instruction we just fetched, this one here, is not actually supposed to go down the pipe. We should kill it. So we're going to swing this mux, and right at the end of the cycle we're going to say, no, that's actually a no-op we're inserting into the pipe, and we're going to redirect this multiplexer here to the actual jump location. This is what I was talking about before with the extra adder here. Here's our extra adder, which is computing our destination. Sometimes people try to put these two things together, but we're going to take part of the instruction, take the current PC, add them, and that's going to compute our new destination for the jump. Yeah. Sorry, so here's the control on this mux: we just have to look to see if it's a jump, or jump-and-link, and then we insert a no-op. Otherwise, we actually take the thing coming out of instruction memory. So let's look at this as things flowing down the pipe. We have instruction one, the add, at the beginning of the execute stage, instruction two, the jump, now in the decode stage, and we've just fetched the instruction at 104 out of the PC.
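The mux control just described can be sketched as a small function. In real hardware this is combinational logic, not software, and the opcode names here are assumptions; the jump target is assumed to come precomputed from the extra adder:

```python
# Sketch of the decode-stage kill/redirect mux from the lecture. In hardware
# this is combinational logic; opcode names "J"/"JAL" are illustrative.

NOP = "nop"

def decode_stage_redirect(decode_opcode, fetched_instr, fetch_pc, jump_target):
    """Return (instruction allowed into decode next cycle, next fetch PC).

    jump_target is assumed to be precomputed by the dedicated adder
    (part of the instruction plus the jump's PC).
    """
    if decode_opcode in ("J", "JAL"):        # jump or jump-and-link in decode
        # swing the mux: kill the speculatively fetched instruction and
        # redirect the front of the pipe to the jump destination
        return NOP, jump_target
    # otherwise take the instruction coming out of instruction memory
    # and keep speculating PC+4
    return fetched_instr, fetch_pc + 4
```

With a jump in decode and the instruction at 104 just fetched, the fetched instruction becomes a no-op and fetch redirects to 304; with an ordinary add in decode, the fetched instruction proceeds and fetch continues to 108.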
As we go forward one cycle, we're going to take what we fetched out of the instruction memory at 104, kill it, and put a no-op in its place. The jump is now entering the execute stage, the add is entering the memory stage, and we've redirected the front of the pipe, so we're now fetching the destination of the jump, the instruction at 304. So an important question pops up here on the screen: what happens if we have a stall and a jump in the decode stage at the same time? Are there interactions here that we should be worrying about? Hm, that's a tricky one. Well, the first question is: what are reasons that a jump would actually stall in the decode stage? There are not a whole lot. [laugh] In a basic pipe, a jump would probably not stall in the decode stage. In more complex pipes, there are sometimes stall signals that say there's some big structural conflict later in the pipe, so stall the whole rest of the pipe. So it is possible for things to stall. One important thing is that in a very simple pipe like this, if there actually is some reason this jump is stalling, and you have a jump in that stage, what happens? Do we kill the instruction, do we let it go forward? Both are actually possible to do. More complex pipes might even think about allowing the jump to happen, and squishing out any no-ops that get inserted later on in the pipe. Let's do that from a pipeline diagram perspective, because that might shed a little light on this. Instead of drawing this instruction as continuing down the pipe, we're just going to put no-ops here and dashes there. So the first instruction goes down the pipe. The jump goes down the pipe; it doesn't stall on anything, because we have the PC plus four speculation.
There's no stall here; this add gets fetched, but it never makes it into the next stage of the pipe because it gets killed. Then we have the next add, the target of the jump, showing up, and we go and execute that. And if we look at the resource utilization, plotting it the other direction, you'll see the no-op moving forward in pipe stages over time.
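To tie this together, here is a tiny pipeline-diagram simulator, assuming the PC+4 speculation and the decode-stage kill just described. It records which instruction occupies each of the five stages every cycle, with '-' for a bubble; the instruction names are made up for the example:

```python
# Tiny five-stage pipeline-diagram simulator for the lecture's example.
# Assumes: PC+4 speculation, jumps detected in decode, and the instruction
# fetched behind a jump killed at the end of that cycle.

def pipeline_diagram(program, n_cycles):
    """program: list of (name, jump_target_index or None).

    Returns one row per cycle, listing the instruction name in each of
    the stages fetch, decode, execute, memory, write-back.
    """
    pipe = [None] * 5                   # instruction index per stage F,D,X,M,W
    pc, rows = 0, []
    for _ in range(n_cycles):
        # fetch the next instruction and shift the pipe one stage
        pipe = [pc if pc < len(program) else None] + pipe[:-1]
        rows.append(["-" if i is None else program[i][0] for i in pipe])
        decoding = pipe[1]
        if decoding is not None and program[decoding][1] is not None:
            pipe[0] = None              # end of cycle: kill the fetched slot
            pc = program[decoding][1]   # redirect fetch to the jump target
        else:
            pc += 1                     # keep speculating PC+4

    return rows

# The lecture's sequence: an add, a jump to 304, the doomed add at 104,
# and the jump target at 304 (index 3 here).
prog = [("add1", None), ("j304", 3), ("add104", None), ("add304", None)]
```

Running five cycles of `prog` shows exactly the diagram from the lecture: `add104` gets fetched behind the jump, turns into a bubble in decode the next cycle, and that bubble then marches down the pipe behind the jump while `add304` enters fetch.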