Let's continue on and move to a different item here. We've talked about structural hazards; now we're going to talk about data hazards. Okay, so what is a data hazard? A data hazard occurs when one instruction depends on a data value generated by a previous instruction that is still in the pipeline. Saying "depends on a previous instruction" isn't precise enough; the dependence is on a data value produced by a previous instruction that is still in the pipeline. And like structural hazards, data hazards have a few different approaches, and we won't talk about all of them today, but let's at least introduce them. First, you can schedule around it. What does this mean? Say we have a processor pipeline generating values, and one instruction depends on another, but the first instruction takes a couple of cycles to generate its value. The value won't be ready, so we can't issue the subsequent instruction — we have a data hazard, a data dependence hazard. But you can schedule around it: for instance, you can introduce no-operation instructions into your instruction sequence and have the programmer avoid the hazard, if the programmer knows the microarchitecture of the machine. This actually showed up in some early processors. A famous example is the floating-point unit of the Intel i860, an old, sort of early RISC architecture made by Intel. In the i860 the floating-point unit was not interlocked, so if you execute a floating-point instruction and another instruction coming down the pipe uses its result, you might get the wrong value. It was the program's responsibility to make sure that didn't occur, and you would actually put no-ops in there.
The next approach, which we'll talk more about in today's lecture, is to stall. If you have a data dependency, you can stall later instructions that depend on earlier instructions. The important thing to note here is that you freeze the pipeline until the preceding instruction has generated the value, and the hardware does this freezing. We'll develop this more today, but you actually have to freeze everything before that instruction; you can't just freeze the dependent instruction, because the traffic behind it will catch up on the earlier traffic and pile into it. So if you want a pipeline that works like this, you stall everything earlier, and we'll look at the wiring you need to do that. Another solution is to bypass. An example of this: you add extra hardware to your data path, and that extra hardware sends a value onward as soon as it gets created, so you may not have to wait for it to reach the end of the pipeline. If the data value is produced early, you can just forward it to the instruction that needs it, but that adds extra hardware and complexity to your design. And finally, a solution we'll talk about later — not in this lecture — is to speculate. If you have a data hazard, you can assume it's not a problem: just use the stale value for a little while and assume the old value equals the new value, or do data speculation; there are other ways to do this. If you make a mistake, you catch it by the time the instruction reaches the end, and you basically have to re-execute the instruction with the correct value. So you can do speculation.
This is kind of a big guessing game, but it's used in out-of-order processors in multiple ways, and we'll talk about that a couple of lectures from now. Okay, so let's look at an example data hazard executing on our processor pipeline. We have two instructions here. The first is an add-immediate of register zero plus ten into register one. In this class we use the notation where the leftmost register is the destination and the right operands are the source operands. So we add ten plus R0 — in MIPS, R0 is hardwired to zero — and put the result in R1. The next instruction exhibits a read-after-write data dependence: another addi that uses exactly the value created by the instruction right before it. It takes R1, adds seventeen, and deposits the result in R4, register four. But we have a bit of a challenge here, because it uses the result of the instruction right before it. Hm. Okay. So what happens in this design? Say our instructions are marching down the pipe. The first add is here and the second add is back here — nothing bad has happened so far. Now the first add moves here and the second add is here — still nothing bad. But the question is what we fetch out of the register file for the second add. The result, R1, is available here, but it hasn't made it back into the register file yet, so the second add is actually going to fetch the old value. Hm. That's not good. If you just played this out without any stalling or interlocking, the second instruction would read the old value of R1, not the new value we want it to read. So we need to think about this a lot harder.
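The stale-read problem above can be sketched with a toy timing model (the model and its names are my own illustration, not the lecture's): writes commit to the register file at write-back, three stages after decode, so a decode-stage read the very next cycle still sees the old value.

```python
# Toy 5-stage timing model: an instruction decoded in cycle t
# writes its result to the register file at write-back, cycle t+3.
# (Stages: fetch, decode, execute, memory, write-back.)

def run_no_interlock(program):
    """program: list of (dest, src, imm) addi-style instructions.
    Each reads regs[src] at its decode cycle and writes dest three
    cycles later; there is NO interlock, so reads can be stale."""
    regs = {f"r{i}": 0 for i in range(32)}
    pending = {}               # cycle -> (dest, value) write-back events
    observed = []              # value each instruction actually read
    for cycle, (dest, src, imm) in enumerate(program):
        if cycle in pending:                        # commit write-back due now
            d, v = pending.pop(cycle)
            regs[d] = v
        val = regs[src]        # decode-stage register read (possibly stale!)
        observed.append(val)
        pending[cycle + 3] = (dest, val + imm)      # write-back 3 cycles later
    return observed

# addi r1, r0, 10  followed immediately by  addi r4, r1, 17
reads = run_no_interlock([("r1", "r0", 10), ("r4", "r1", 17)])
print(reads)   # [0, 0] -- the second instruction reads the STALE r1
```

If you space the two instructions far enough apart that the write-back lands first, the dependent read picks up 10 instead.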
Yes, R1 is stale — oops, we made a mistake. So how do we go about resolving these hazards? We want to somehow detect these data hazards, and then feed that information back: later stages provide dependence information to earlier stages. This is a later stage, this is an earlier stage, and we feed information back here. Depending on the information that is fed back, we either stall or kill instructions. The most basic example: stage four influences stage three, and stage three can make decisions based on that — maybe stall or kill instructions. Likewise stage three influences stage two, and stage two influences stage one. But this isn't really good enough. Say stage four tells stage three to do something; if stage three doesn't tell the earlier stages, instructions are going to pile up into stage three like cars in a traffic jam. So this typically means you need higher-level feedback, where stage four gives information to all the previous stages, stage three gives information to all the previous stages, and so on, so that every stage can make a decision based on it. And this direction of control is really important: if you have feedback going the other direction, you can end up with deadlock in your processor, because an early instruction can depend on a later instruction whose resource never gets free, while the later instruction depends on the earlier one. All of a sudden you have a big cycle where everyone depends on everyone, and the machine just stops.
So it's really important that stage i+1 feeds information strictly back to stages one through i. Okay, let's resolve some data hazards and look at how we'd actually do this on our simplified pipeline. We'll use the same example: two adds with a dependence through register R1. The first thing to work out is where we need to stall — or stop, or interlock — the pipeline. Looking at the two adds, it wasn't a problem until the second add went to read the register file here. If we didn't read the register file until later, this wouldn't have been a problem, because the value might have been up to date. But because we read it so early — because we pipeline, so the point where data gets computed and the point where it gets written back land in different stages — we have this challenge. So what we're going to do is stall here, and re-read the register file over and over until the first value of R1 gets written to it. Unfortunately, we're going to be waiting a while, because we don't write the register file here, or here; we write it here, through that wire. So we're going to stall multiple cycles. And when we're stalling in this stage, we want to think hard about what should be going down the pipe during that time. We've stalled the second add, instruction two, here, while instruction one keeps flowing down the pipe, so we don't want to stall the later stages. We only want to stall this stage, and we need to stall the previous stages so instructions don't pile up into us. Now, how do we go about this from a wiring perspective?
To keep executing and let the first set of instructions clear out of the pipeline, we're actually going to insert a multiplexer here on the instruction-register side of things. This multiplexer inserts no-op instructions, or no-operation instructions — we're inserting bubbles into the pipe right here — so that the first instruction, the first add, can clear out of the pipe, and we know what's executing in each stage. We don't want the second instruction to accidentally start executing while it's stalled here, because then it could change state as it goes down the pipeline, and it would also affect the dependence calculations here. So we insert no-ops. The stall condition, as I said, goes to the program counter and the instruction register — the flip-flops before the stall point — and it also goes to the select line on this multiplexer to choose to insert no-ops. Okay, that's the beginning of this, and I want to say that this is sometimes called interlocking, or interlocks; it's important nomenclature. You're interlocking the execution of an instruction on the instructions it depends on; people also just call it stalling. Okay, so let's draw a pipeline diagram of what's going on. We plot time on the x-axis and instructions on the vertical axis, and we'll also look at the resource graph. Our first instruction goes down the pipe: it takes register zero, adds ten to it, and goes fetch, decode, execute, memory, write-back. The second instruction starts down the pipe, then the third, fourth, fifth — and you'll note something here: we've stalled the pipeline, because we need to wait for the first instruction to write back to the register file before we can read that value from the register file.
So we strictly have to have the second instruction — the dependent instruction — in the decode stage, reading the register file, on the cycle after the write-back occurs. And we detect this stall condition the whole time, as denoted by the nice purple box. As I said, we need to stall earlier instructions, so this stalls not only instruction I2 but also instruction I3, because it's in an earlier stage of the pipe and needs to be stalled. Okay. We could also graph this the other direction, and the reason that's useful is you can see where no-operations get inserted. Here we've plotted time versus stage of the pipeline, or resource, and you can see what's in the different stages. At some point there's nothing in these stages; instead we've inserted no-ops. That comes from the fact that I3 sits in instruction fetch for three cycles and I2 sits in decode for three cycles, so the later stages of the pipe get no-ops inserted — that's what the multiplexer is doing. Now that we've talked about how stalling shows up in a pipeline diagram, let's move on and look at the logic inside. Here we have the data path for our five-stage pipe. In stalling, what we're really trying to do is detect the case where an earlier instruction writes a register that a later instruction is going to use. In this case we detect that an instruction in the decode stage is reading a value that an instruction in the execute, memory, or write-back stage writes. So an uncommitted instruction writes a register, a later instruction goes and reads it, and we stall at the decode stage.
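The three-cycle stall in the diagram follows directly from the stage positions; here's a small sketch of that arithmetic (the function and its parameters are my own framing, not the lecture's):

```python
def raw_stall_bubbles(distance, wb_offset=3):
    """Bubbles (inserted no-ops) needed when a dependent instruction sits
    `distance` instructions behind its producer, registers are read in
    decode, and the producer writes them back `wb_offset` stages after
    its own decode (D -> X -> M -> W gives 3 in the 5-stage pipe).
    The dependent decode must land one cycle after the producer's W."""
    return max(0, wb_offset + 1 - distance)

for d in (1, 2, 3, 4):
    print(d, raw_stall_bubbles(d))
# back-to-back dependents (distance 1) cost 3 bubbles, matching the
# diagram; four instructions apart, no stall is needed at all
```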
When we stall, we actually want to stall everything behind it, as we've already talked about. Okay, so let's start calculating the control signal; we'll call it Cstall. We'll draw a little blob up here and call it the stall calculation. What goes into this calculation? It's a somewhat complex calculation, but the first thing we want to check is the destination operand of some earlier instruction — the register identifier, not the data value. This would be RD in a typical MIPS instruction; we'll call it WS in this calculation. And we compare it against the two source-operand register identifiers. Because there are 32 registers in MIPS, these are all five-bit values, all three of them, and we wire them all into our stall control unit. Okay. If we get a match, the most basic thing we do is stall everything earlier and insert no-op instructions later down the pipe. So the stall signal controls a lot of things: it basically controls the front end of the pipe, and it disallows the instruction here from moving forward in the pipe if anything in the later stages has the same destination operand — though so far we're comparing against just this one location. That is, we compare whether the write-back stage has the same destination register identifier as either of the two source operands of the instruction in the decode stage. Okay, so should we just always stall if one of the RS fields — the source fields here or here — matches some RD?
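This first-cut match can be sketched in a few lines (the argument names are mine: `ws` is the write-back destination identifier, `rs` and `rt` the decode-stage source identifiers):

```python
def basic_stall(ws, rs, rt):
    """First cut: stall whenever the write-back stage's destination
    register identifier (ws, a 5-bit value in MIPS) matches either
    source register identifier of the instruction in decode."""
    return ws == rs or ws == rt

# write-back is producing r1; decode reads r1 and r2 -> must stall
print(basic_stall(ws=1, rs=1, rt=2))   # True
print(basic_stall(ws=5, rs=1, rt=2))   # False
```

As the lecture goes on to show, this over-stalls: it ignores whether the later instruction actually writes and whether the decode instruction actually reads.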
Should we just always stall in that case? Well, hm — not every instruction writes a register. What if we have, for instance, a store instruction, which does not write a register? If we have a store instruction that isn't writing a register, we probably shouldn't be doing this compare, because we get better performance if we don't stall under those conditions. So we introduce a signal called write enable, WE, and WE gets wired into our stall calculation. Likewise, not every instruction reads both input operands. A good example is an immediate instruction, which reads only one of the source register operands; the other value comes from the immediate bits in the instruction encoding. So this introduces a read-enable calculation, because not every instruction reads a register. Okay, let's develop this a little more. We have something that calculates the destination — I'll talk about what this blob is in a second — and then we need to add the write-enable bit for this location in the pipe, and these read-enable signals here. These get calculated from the instruction registers at the respective locations in the pipe: some decode bits are produced there, or maybe it all gets decoded in the decode stage and we just pipeline those bits forward. We do this calculation, and now we can say: if the instruction in the decode stage matches what is being written back, the write-back value, then we stall. Okay, that's close to our full solution. Let's talk about this circle here — what's going on in it? Well, you might notice that something like a jump-and-link or a jump-and-link-register has an implicit destination.
The destination is not actually encoded in the RD field of the instruction. So instead we add a multiplexer here, which selects between the bits out of the instruction register and a hard-coded value of 31, denoting register 31. By doing that, we can handle jump-and-links and jump-and-link-registers. Okay, so to finish this out, we want to compare not just against the write-back stage of the pipe, but against all three of the subsequent stages: here, here, and here. So we add extra calculation logic that computes the write enable and the register identifier from the instruction register in each stage — or this might just be generated early in the pipe, depending on your pipeline design; that's the more traditional pipeline control. All of this, right now, ignores jumps and branches; if you introduce jumps and branches, things get a little more complicated, and we'll talk about that in the control-hazard section of today's lecture. So now let's talk about different instructions: where their source operands are, what their destination operands are, and whether every instruction in something like MIPS has all sources and all destinations. This just sums up what I said before: not every instruction reads, and not every instruction writes. As you can see here, ALU instructions read two operands and write one operand, but a store reads two operands and writes no operand. Jumps-and-links only write and don't read. So it's a mix, even in something like MIPS. One other thing to point out: where this is encoded in the instruction moves around a little. MIPS tried to make this relatively uniform, but there are examples here where the destination field changes a little between immediates and non-immediates, and that's just because they didn't have the encoding space to leave everything in a fixed location.
So: whether something writes a destination or not. We have two things: the destination — the source of the destination register identifier — and the write enable, whether it is being written or not. As you can see, there's a little case statement here depending on the instruction type, and this applies to the instruction in the later stages of the pipeline that is executing. If it's an ALU instruction, the destination comes out of RD; if it's an ALU-immediate instruction or a load, it comes out of RT; if it's a jump-and-link, it's R31 — and that says what you need to compare against. Then, whether you need to treat it as writing is a little more complex: an ALU, ALU-immediate, or load instruction writes the register file, except in the case where the destination register identifier is zero, because in MIPS the zero register is a throwaway register — you don't need to interlock against it. You wouldn't be incorrect if you did interlock against it; you'd just have slower performance. Jump-and-link and jump-and-link-register always write, and everything else doesn't write the register file. So that's the first part of our calculation. Now we need to calculate whether we actually read the value, and there will be two of these: one for the first operand, one for the second operand. Okay, let's build this up for the different instructions — basically transforming the table into the logic equations we're going to use. ALU, ALU-immediate, loads, stores, and branches all read; jump-register and jump-and-link-register also read, at least the first source operand. So RE1 gets set to true, or one, for any of these opcodes. But for jump and jump-and-link, which don't read a first operand, the comparison has to not fire against this value, otherwise you'd be stalling too often.
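The case statement above can be written down directly; here's a sketch (the opcode-group tags are my own encoding of the lecture's table):

```python
def dest_and_we(op, rd, rt):
    """Destination register identifier (ws) and write enable (we) for
    an instruction in a later pipeline stage, following the lecture's
    case statement. Returns (None, False) for non-writing instructions."""
    if op == "ALU":                    # register-register ALU: dest in rd
        ws = rd
    elif op in ("ALUi", "LW"):         # ALU-immediate and loads: dest in rt
        ws = rt
    elif op in ("JAL", "JALR"):        # jump-and-link: implicit r31
        ws = 31
    else:                              # stores, branches, plain jumps
        return None, False
    if ws == 0:
        return ws, False               # r0 is hardwired zero: no interlock
    return ws, True

print(dest_and_we("ALU", 5, 9))    # (5, True)
print(dest_and_we("LW", 5, 9))     # (9, True)
print(dest_and_we("SW", 5, 9))     # (None, False)
```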
For the second operand, only true ALU instructions — not immediate instructions — and stores read that second operand; everything else doesn't. Okay, so now let's put together the actual stall signal; this is the stall signal for the decode and fetch stages of the pipe. We end up with stall being: a comparison between the source register identifier in the decode stage and the write register identifier in the execute stage, ANDed with whether that instruction is actually writing; OR the same calculation against the memory stage; OR the same calculation against the write-back stage. Then we take this whole expression and AND it with whether we actually have a read enable for the first source operand, because if we don't read the first source operand, there's no reason to stall for it. And we do a similar thing for the second source operand: we use the RE2 we derived here, and AND it with an expression asking whether RT, the second source register identifier in the instruction, is the same as the destination register identifiers of the instructions in the later stages of the pipe. Okay, so is this everything? Hm, it looks pretty complicated — though it's not so bad so far. If we make the pipe longer, we end up with more terms inside these two equations. Well, no, that's not quite the full story. What are we missing? Why else would we have to stall the pipeline? Unfortunately, this only accounts for instructions whose destination value is available right at the end of the execute stage — these two comparisons encapsulate exactly that. Something like a load doesn't fit that pattern, because the load value is not ready until all the way down here.
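Putting the pieces together, the decode-stage stall equation can be sketched as follows (each later stage contributes a `(ws, we)` pair as in the derivation above; the helper names are mine):

```python
def stall(rs, rt, re1, re2, later_stages):
    """later_stages: (ws, we) pairs for the instructions currently in
    execute, memory, and write-back. Stall the decode/fetch stages when
    a source register we actually read (re1/re2) matches a destination
    that an uncommitted later instruction will actually write (we)."""
    def match(src):
        return any(we and ws == src for ws, we in later_stages)
    return (re1 and match(rs)) or (re2 and match(rt))

# addi r4, r1, 17 in decode while addi r1, r0, 10 sits in execute:
print(stall(rs=1, rt=0, re1=True, re2=False,
            later_stages=[(1, True), (None, False), (None, False)]))  # True
```

Note how the read enables do real work: a store in a later stage shows up as `(None, False)` and never triggers a stall, and an immediate instruction in decode (`re2=False`) never stalls on its unused second operand.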
So we might need to insert some extra stalls for that. Also, loads and stores are more complicated because you can have a data dependence through the data memory itself. In this example we have a little snippet of code: a store that takes register two and writes it to some place in memory, and then a load that reads from some place in memory and puts the result into register four. Okay, so the question comes up: is there any possible data hazard here? Yes — what if R1 plus seven equals R3 plus five? Then we have a case where the load needs to pick up the data value of the previous store, if and only if this address equals that one. Hm. Okay, so that's not so bad. Let's look at these data hazards a little more and figure out how to derive the equation to check for them. Just to recap, our example is: we store R2 into one location, and we read from possibly the same location — we don't know. So what if R1 plus seven equals R3 plus five? We'd be writing and reading the same address back to back in time. Well, in this pipeline the hazard is actually avoided, because our memory system is so fast: everything goes down the pipeline in order, the store goes right to the memory, and on the next cycle we can read out of that memory and pick up the new, changed value.
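The aliasing condition is just a comparison of the two effective addresses; a sketch (the field names are my own, and `regs` maps register number to value):

```python
def mem_raw_hazard(store_base, store_off, load_base, load_off, regs):
    """A load after a store must see the store's value exactly when the
    effective addresses collide -- regs[r1] + 7 == regs[r3] + 5 in the
    lecture's example."""
    return regs[store_base] + store_off == regs[load_base] + load_off

regs = {1: 100, 3: 102}
print(mem_raw_hazard(1, 7, 3, 5, regs))   # 100+7 == 102+5 -> True
```

Unlike the register hazards, this check depends on data values, not just instruction fields, which is why it can't be resolved by decode-time comparisons alone.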
But I want to introduce this because in more realistic memory systems it requires much more careful handling: if you have a memory system in which the store takes multiple cycles to happen, or the store happens, say, at the end of the pipe into the memory, then you're not necessarily going to get that value, and you might need to bypass it or do something more intelligent. Okay, so we've talked about stalling the pipeline. Now let's look at improving performance some more. One thing you may not have noticed about that stall, but which did happen, is that whenever an instruction in the later portion of the pipeline had a dependent instruction in decode, the decode stage simply stopped: at no point did we actually forward the data values early. Now we want to talk about forwarding and bypassing — adding extra data paths that allow a value to be sent from a later stage to an earlier stage, faster than waiting for the write-back of the pipeline to occur. So here we have the data path we had before. What I'm trying to get at is: you have an instruction here, and if there is any instruction that writes a register identifier this instruction wants to read as a source operand, it's going to stall. But, a little bit of insight: if you have an add, you can actually try to read this value early. Our data path just isn't good enough to do that right now. Okay, so let's add in a bypass. We add a path that takes the result of the ALU, turns it around, and puts a multiplexer here. Using a signal similar to our stall signal, we can detect whether two operands match, and if they do, we take the result value out of the ALU early and run it through this multiplexer.
Okay, so an important question: does this help our earlier example from a performance perspective? The stalling logic we put in was good enough to make sure there wasn't an error, but not good enough for performance, because you had to wait for the value to reach the register file before going ahead. Here we have the same example from earlier in class: something that writes register one and something that reads register one, two ALU add instructions back to back. Does this help? Well, yes. You can clearly see that the result here comes back around, and we effectively don't have to stall the second instruction, because it can pick up that data value right then and there: the value gets calculated in this stage and loops around real fast. So no stall at all in this case. Okay, quick quiz question — two other cases. We have a memory operation, a load, followed by an add; and we have a jump-and-link followed by an add. Does this bypass, right here, help in these two cases? We said it helped in the first case. Well, when does the load result get calculated? The load result isn't calculated until the output of the data memory, right here — that's after this bypass, so it's too late. We still need to stall the pipeline for a load with a dependent instruction. So we stall there. Okay, now a trickier one: jump-and-link 500, then something that reads R31. A little background on the MIPS instruction set: jump-and-link implicitly writes register 31 — it's the link register. So that means we have a data dependence. What gets put into R31 by the jump-and-link?
It's the program counter, or the program counter plus four — that's how it's architected in MIPS; you could probably build it either way depending on how you do jump-register. So at first look this seems like it should solve the problem: we should be able to bypass the result of the jump-and-link to right where it needs to go. Mm, it's a little unsatisfying, though, because if you look at the rest of the pipe — if you have a jump in, say, the execute stage — is the consumer of that instruction going to be here or not? This one's kind of a trick question. So, does it help? Well, you can bypass out and around, but the thing behind it in the pipe is probably not going to be the appropriate instruction. If I were to answer the question, I'd probably say no: at least in this pipe drawing, you're not going to be executing the subsequent instruction — even if it's the instruction at 500 — yet. You'll probably have to wait for that jump to resolve somewhere further down the pipe, and then go pick it up. So bypasses don't always help, especially in something that isn't a fully bypassed pipeline. Okay. Oh, before I move off this slide: this is called bypassing, and sometimes it's also called forwarding values; we'll use those terms interchangeably in this class. Okay, so now we get into more detail and start looking at how to derive the bypass signal. We'll build it the same way we derived the stall signal, taking terms out of the stall calculation we had before. If you recall, we have the pipeline diagram here with the stall signal: we stalled stages, and we ended up having to stall in the case of an ALU op followed by a dependent ALU op.
Each stall or kill introduces a bubble into the pipeline, and this gives us a clocks-per-instruction greater than one. With the new data path, which bypasses out of the ALU into input operand A, we can see that we can remove all these stalls and just do the bypassing, so it actually shrinks the time taken to execute this code. This new data path is really a great thing: the bypass takes us from greater than one clock per instruction to one clock per instruction. We're forwarding out of the execution unit into the decode stage, where it gets consumed by the execution unit, at time three, for instruction two. Okay, so let's derive the bypass signal, starting from our original stall signal — this is just the stall signal we had before. The first thing: look at this case right here, where we compared the execute-stage write destination to the decode-stage first source operand. We don't need that term anymore; we added a bypass, and a forwarding signal, to handle that case, so we just put a line through it. The next question: in this diagram we added a multiplexer to choose between reading from the register file and taking the data that came out of the arithmetic logic unit. What is the control on that multiplexer? Well, it's the exact same case we just crossed out: when that case is true, we want to do the bypass. So we take those terms and put them here, and that's the control on the multiplexer, ASrc. Is this correct? Hm. Is this the full story? Unfortunately, no — but it's close, really close. It looks like it should work, but only ALU and ALU-immediate instructions can benefit from this. If you have something like a load, you need to wait for the data value to show up.
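The term crossed out of the stall equation becomes exactly the multiplexer control; a sketch (the `asrc` name stands in for the lecture's ASrc select line):

```python
def asrc_bypass(rs, re1, ex_ws, ex_we):
    """Select the ALU-output bypass for operand A when the instruction
    in execute will write (ex_we) the register (ex_ws) that the decode
    instruction actually reads (re1, rs). This is the same term that was
    removed from the stall equation once the forwarding path exists."""
    return re1 and ex_we and ex_ws == rs

print(asrc_bypass(rs=1, re1=True, ex_ws=1, ex_we=True))   # True: forward
print(asrc_bypass(rs=1, re1=True, ex_ws=2, ex_we=True))   # False: read regfile
```

As the lecture notes next, this is still incomplete: `ex_we` here has to mean "writes a value that is already available at the ALU output", which excludes loads.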
So this A-source signal needs to have some component saying, make sure it's not a load. And up here, we actually reintroduce that term, checking to see whether it's a load. So what we're going to do is split the write-enable into two components: the write-enable that you bypass on, and the write-enable that you stall on. We reintroduce these two components with two slightly different write-enables, dependent on the decode of the instruction in the execute stage of the pipeline. Okay, so let's do that. We still have this term in the stall signal, and we still have this term in the bypassing signal, but we now have two different write-enables: one for the bypass calculation and one for the stall calculation. These two signals are calculated based on the decode of the instruction in the execute stage. We bypass only when it's an arithmetic-logic-unit op or an immediate arithmetic instruction, and the destination is not register zero. And we stall if it's a load; jump-and-link and jump-and-link-register also fall into that case. That's when we have to do the stall, because of how jump-and-link and jump-and-link-register write their results: at least in this data path, we only have this one multiplexer here, writing register 31 at the end. You can build data paths which have different multiplexers for that, and you might be able to remove that clause from this. Okay, so what we notice here is that loads, jump-and-links, and jump-and-link-registers are going to stall when we have a match on the registers, while something like an ALU instruction falls under the bypass write-enable, does not stall, and uses the bypass logic instead. Okay, so let's take a look at what this looks like for a fully bypassed data path. In our fully bypassed data path, we're going to take all the destination values out of here and out of here.
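The split into two write-enables might look like this in Python; the opcode classification is a simplified stand-in for the real decode logic, and all names are my own.

```python
# Two flavors of write-enable, decoded from the instruction in execute.
# Opcode sets are simplified, hypothetical stand-ins for real decode.
BYPASSABLE = {"alu", "alu_imm"}          # result ready at end of execute
MUST_STALL = {"load", "jal", "jalr"}     # result not ready until later

def we_bypass(op, rd):
    # Bypass only ALU / ALU-immediate results, and never for r0.
    return op in BYPASSABLE and rd != 0

def we_stall(op, rd):
    # Loads (and JAL/JALR, in this data path) force a stall on a match.
    return op in MUST_STALL and rd != 0

print(we_bypass("alu", 3), we_stall("alu", 3))    # True False
print(we_bypass("load", 3), we_stall("load", 3))  # False True
```

An ALU op with a register match triggers the bypass path; a load with the same match triggers the stall path, exactly the split described above.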
We're going to run those back, and we're going to add two big multiplexers here, because in our first case we only multiplexed the first source operand, the A source operand, but we actually want to multiplex the inputs for both A and B, the two source operands. We're also going to add the PC here for the jump-and-link; that handles some of the more complex pieces, because otherwise we'd have to put multiplexers here for multiplexing the PC into R31 or something like that. So we've effectively been able to bypass everything here. The question is: is there still a need for the stall signal? This is more than what we had before, more than just the A source. We can now bypass not only out of here to there, but also out of after the memory operations. So maybe this changes our stall signal so that we don't need to stall on loads anymore. That would be great; we'd have better performance. Well, unfortunately, no. We still need this. You still need to check whether the opcode is a load in this stage of the pipe, even with a fully bypassed data path. We've resolved a bunch of the data hazards, but the instructions dependent on loads still need to wait, because you don't know the result of the load until you come out of here. So you can't issue a subsequent instruction into the ALU stage early; you need to stall. But this is basically our full stall calculation at this point. Because we added all those bypasses, we've removed a lot of the other complexity from our stall signal. And in this case you'll see that loads have a latency of two cycles. Okay. So, as I said, the last technique you can look at is speculation, where you try to guess things: guess data values, guess things like that, or try to execute code out of order. We're going to talk about that later in the course; that's not really in today's lecture.
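Even fully bypassed, the load-use check is the one register-compare stall that survives; here is a small Python model of that remaining condition, with an invented encoding purely for illustration.

```python
# In a fully bypassed pipeline, the only register-compare stall left is
# the load-use case: the loaded value doesn't exist until after the
# memory stage, so a dependent instruction in decode must wait a cycle.
# This is what gives loads an effective latency of two cycles.
def load_use_stall(op_EX, rd_EX, rs_ID, rt_ID):
    return op_EX == "load" and rd_EX != 0 and rd_EX in (rs_ID, rt_ID)

# lw r2, 0(r1) ; add r3, r2, r4 -> one bubble, then bypass from memory.
print(load_use_stall("load", 2, 2, 4))  # True: stall one cycle
print(load_use_stall("alu",  2, 2, 4))  # False: ALU result bypasses
```

The same dependent `add` that stalls behind a load sails through behind an ALU op, which is why adding the bypasses simplified the stall signal down to essentially this one clause.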
It's not really review material, but we will discuss it to some extent. So now we're going to move on to talking about control hazards, and because we're running a little low on time, we'll look at that a bit more in the next lecture.