1
00:00:03,037 --> 00:00:07,051
Let's continue on and move on to a
different item here.

2
00:00:07,051 --> 00:00:13,036
We've talked about structural hazards, now
we're gonna move on and talk about data

3
00:00:13,036 --> 00:00:17,089
hazards.
Okay, so what it a, what is a data hazard?

4
00:00:17,089 --> 00:00:24,040
So a data hazard occurs when you have one
instruction that depends on the data of a

5
00:00:24,040 --> 00:00:27,097
previous instruction, or a previous data
value.

6
00:00:27,097 --> 00:00:34,382
Sorry, previous data value depending on
previous instruction is not precise

7
00:00:34,382 --> 00:00:37,729
enough.
We want to depend on a previous data value

8
00:00:37,729 --> 00:00:44,206
or data value that is generated by a
previous instruction that is still in the

9
00:00:44,206 --> 00:00:49,963
pipeline.
And like stall like, structural hazards,

10
00:00:49,963 --> 00:00:54,378
data hazards also have a couple different
approaches which we will not talk about

11
00:00:54,378 --> 00:00:58,034
all of them today.
But let's, let's start talking let's

12
00:00:58,034 --> 00:01:02,020
introduce them at least.
First thing is you can schedule around it.

13
00:01:02,084 --> 00:01:07,049
Okay, so what does this mean.
Let's say we have a processor pipeline,

14
00:01:07,049 --> 00:01:13,019
and the processor pipeline is generating
values and we have one instruction which

15
00:01:13,019 --> 00:01:18,637
dependent on another instruction but the
first instruction let's say takes a couple

16
00:01:18,637 --> 00:01:23,069
cycles to generate that value.
The value won't be ready.

17
00:01:23,069 --> 00:01:26,034
So, can't go and issue subsequent
instruction.

18
00:01:26,034 --> 00:01:30,069
So, we have data hazard here, data
dependency hazard, while you can schedule

19
00:01:30,069 --> 00:01:33,057
around it.
So, you could, for instance introduce, no

20
00:01:33,057 --> 00:01:37,064
operation instruction into your
instruction sequence, and have the

21
00:01:37,064 --> 00:01:42,023
programmer avoid the hazard, if the
programmer knows the micro architecture of

22
00:01:42,023 --> 00:01:45,059
the machine.
And this is actually shown up in some

23
00:01:45,059 --> 00:01:49,067
earlier processors.
Another famous example of this is the

24
00:01:49,067 --> 00:01:54,066
floating point unit for the Intel I860U,
which is a old, sort of, early risk

25
00:01:54,066 --> 00:01:59,020
architecture made by Intel.
In the I860 the floating point unit was

26
00:01:59,020 --> 00:02:04,013
not interlocked so if you execute a
floating point instruction and you have

27
00:02:04,013 --> 00:02:09,031
another instruction, which is coming down
the pipe which uses the result of that,

28
00:02:09,031 --> 00:02:13,043
you might get the wrong value.
So the program was the program's

29
00:02:13,043 --> 00:02:15,833
responsibility to make sure that didn't
occur.

30
00:02:15,833 --> 00:02:22,231
And you actually, put in no ops there.
Next approach, which we'll be talking more

31
00:02:22,231 --> 00:02:27,590
about in today's lecture, is to stall.
So if you have a data dependency, you can

32
00:02:27,590 --> 00:02:33,032
actually stall earlier, excuse me, stall
later instructions dependent on earlier

33
00:02:33,032 --> 00:02:36,635
instructions.
And some of the, the important thing here

34
00:02:36,635 --> 00:02:43,565
to note, is you're going to freeze the
pipeline until the, preceding instruction

35
00:02:43,565 --> 00:02:47,029
has, generated the value, but hardware
does this freezing.

36
00:02:47,029 --> 00:02:51,068
The hardware, does the freezing, and,
we're gonna, develop this more today, but,

37
00:02:51,068 --> 00:02:55,038
you actually have to freeze, everything
before that instruction.

38
00:02:55,038 --> 00:02:58,019
You just can't, freeze the dependent
instruction.

39
00:02:58,019 --> 00:03:02,053
And, that's because the, the sort of,
traffic behind it will catch up on the

40
00:03:02,053 --> 00:03:05,011
earlier traffic and sort of pile up into
it.

41
00:03:05,011 --> 00:03:09,056
So, if you want to, make a pipeline, which
works out like this, you're going to

42
00:03:09,056 --> 00:03:12,020
actually want to, stall everything
earlier.

43
00:03:12,020 --> 00:03:15,071
And we'll look at the wiring that you need
to do to do that.

44
00:03:16,034 --> 00:03:21,008
Another solution is you can bypass, so
what this, an example of this is, you can

45
00:03:21,008 --> 00:03:26,000
add extra hardware to your data path, and
the extra hardware is going to send the

46
00:03:26,000 --> 00:03:30,098
value as soon as it gets created, so you
may not have to wait for it to get to the

47
00:03:30,098 --> 00:03:34,069
end of the pipeline.
So if the data value gets made early, you

48
00:03:34,069 --> 00:03:39,025
can just forward that to an instruction
which needs it, but that adds extra

49
00:03:39,025 --> 00:03:45,003
hardware and complexity to your design.
And finally we're gonna talk about not in

50
00:03:45,003 --> 00:03:49,083
this lecture, this is the one thing, the
solution we'll talk about later is, you

51
00:03:49,083 --> 00:03:53,009
can speculate.
So, if you have a data hazard, you could

52
00:03:53,009 --> 00:03:57,096
assume it's not a problem or you could
assume that, you know, everything's gonna

53
00:03:57,096 --> 00:04:00,079
be okay.
I'll just use the encrypt value for a

54
00:04:00,079 --> 00:04:05,096
little bit of time and we'll assume that
the value, the old value is equal to the

55
00:04:05,096 --> 00:04:08,941
new value or you do data speculations,
other ways to do this.

56
00:04:08,941 --> 00:04:13,045
And if you make a mistake you catch it by
the time you get to the end.

57
00:04:13,045 --> 00:04:17,013
And you basically have to re-execute the
instruction with the correct value then.

58
00:04:17,013 --> 00:04:20,013
So you can do speculation.
And this is kind of like a big guessing

59
00:04:20,013 --> 00:04:22,009
game here.
But this is used in out of order

60
00:04:22,009 --> 00:04:25,078
processors in multiple different ways.
And we'll be talking about that in a

61
00:04:25,078 --> 00:04:30,070
couple lectures from now.
Okay so let's look at the an example data

62
00:04:30,070 --> 00:04:34,010
hazard executing on our processing
pipeline.

63
00:04:34,010 --> 00:04:39,020
So we have two instructions here.
We have an add with an immediate.

64
00:04:39,020 --> 00:04:42,095
Of register zero plus ten in your register
one.

65
00:04:42,095 --> 00:04:49,071
So we're gonna be using in this class the
notation where the left most operand or

66
00:04:49,071 --> 00:04:56,032
the left most register is the destination.
And the right operands are the input and

67
00:04:56,032 --> 00:05:01,064
the source operands.
So we're gonna add ten plus zero R0 in, in

68
00:05:01,064 --> 00:05:05,099
MIPS R0 is hardwired to zero, and we're
gonna add that into R1.

69
00:05:05,099 --> 00:05:11,061
And then this next instruction here is
going to exhibit a read after write data

70
00:05:11,061 --> 00:05:15,033
dependence.
And this read after write data dependence,

71
00:05:15,033 --> 00:05:21,002
we have an addi, and it's just going to
use exactly this value which gets created,

72
00:05:21,002 --> 00:05:26,001
the instruction right before.
So this is going to take R1, add seventeen

73
00:05:26,001 --> 00:05:29,094
to it, and put it in, deposit it into R4
or register four.

74
00:05:29,094 --> 00:05:34,013
But we have a little bit of a challenge
here because it uses the result of the

75
00:05:34,013 --> 00:05:36,063
instruction right before it.
Hm.

76
00:05:36,063 --> 00:05:40,037
Okay.
So, what, what happens in this design?

77
00:05:40,037 --> 00:05:44,037
Let's say our instructions are merging
down.

78
00:05:44,072 --> 00:05:51,006
And, we have the first add here and the
second adds back here.

79
00:05:51,006 --> 00:05:55,609
Okay, nothing bad happened so far.
Now the first add goes here and the second

80
00:05:55,609 --> 00:05:58,530
add is here, okay nothing bad has happened
so far.

81
00:05:58,530 --> 00:06:05,123
But the question is what do we fetch out
of the register file for the second add.

82
00:06:05,123 --> 00:06:12,122
Because the result of running a one is
available here but it hasn't made it back

83
00:06:12,122 --> 00:06:17,480
into the register file yet.
So it's actually gonna fetch the old

84
00:06:17,480 --> 00:06:18,300
value.
Hm.

85
00:06:18,300 --> 00:06:24,077
That's, that's not good.
So if you were just to play this without

86
00:06:24,077 --> 00:06:29,090
any stalling or interlocking, you'll
actually have this instruction read the

87
00:06:29,090 --> 00:06:34,083
old value of R, or excuse me this
instruction, the second instruction read

88
00:06:34,083 --> 00:06:39,028
the old value of R1, and not the new value
that we want it to read.

89
00:06:39,028 --> 00:06:42,025
So, we need to think about this a lot
harder.

90
00:06:43,093 --> 00:06:47,080
Yeah, R1 is stale.
Oops, we made a mistake.

91
00:06:48,043 --> 00:06:52,028
So how do we go about resolving these
hazards.

92
00:06:52,028 --> 00:06:59,064
Well we want to somehow detect them, these
data hazards and then we want to feed this

93
00:06:59,064 --> 00:07:06,237
information back and, later stages provide
dependence information to earlier stages.

94
00:07:06,237 --> 00:07:09,573
So this is a later stage.
This is an earlier stage.

95
00:07:09,573 --> 00:07:12,502
And we're going to feed information back
here.

96
00:07:12,502 --> 00:07:18,773
And dependent on that information that is
fed back, we're either going to stall or

97
00:07:18,773 --> 00:07:22,613
kill instructions.
So the most basic example here is we're

98
00:07:22,613 --> 00:07:27,415
gonna have stage four influence stage
three, and can say stage three can make

99
00:07:27,415 --> 00:07:31,653
some decisions based on that and can maybe
stall or kill instructions.

100
00:07:31,653 --> 00:07:36,433
And likewise stage three will influence
stage two and stage two will influence

101
00:07:36,433 --> 00:07:39,867
stage one.
But this is not really good enough.

102
00:07:39,867 --> 00:07:43,124
Because let's say stage four tells stage
three to do something.

103
00:07:43,124 --> 00:07:47,558
If stage three doesn't tell the earlier
stages, we are gonna pile up, like cars

104
00:07:47,558 --> 00:07:50,399
are gonna pile up here all into stage
three.

105
00:07:50,399 --> 00:07:55,148
So this typically means that you need some
higher level, sort of, feedback here,

106
00:07:55,148 --> 00:07:59,485
where you have Stage four giving
information to all the previous stages,

107
00:07:59,485 --> 00:08:04,133
Stage three giving information to all the
previous stages, and vice versa.

108
00:08:04,133 --> 00:08:09,053
So that you'll actually be able to provide
information to all the previous stages,

109
00:08:09,053 --> 00:08:12,544
and then everyone can make a decision
based on this.

110
00:08:12,544 --> 00:08:19,190
And it's really important here is, control
your pipeline like this.

111
00:08:19,190 --> 00:08:26,180
Basically requires that a pipeline this is
really important because if you don't and

112
00:08:26,180 --> 00:08:30,303
if you have sort of feedback going the
other direction.

113
00:08:30,303 --> 00:08:36,058
You might end up with some sort of
deadlock in your processor because if you

114
00:08:36,058 --> 00:08:42,114
have a early instruction which dependent
on a later instruction then you might have

115
00:08:42,114 --> 00:08:45,338
that resource might never get free cause
the.

116
00:08:45,338 --> 00:08:50,418
Later instruction might be dependent on
the earlier instruction, vice versa.

117
00:08:50,418 --> 00:08:55,644
And, all of a sudden, you have a sort of
big cycle, and everyone is dependent on

118
00:08:55,644 --> 00:09:00,692
everyone, and the machine just stops.
So it's really important that when you're

119
00:09:00,692 --> 00:09:06,585
doing this that stage I plus one only
feeds strictly information back to stages

120
00:09:06,585 --> 00:09:12,017
I to, or one to I.
Okay, so let's resolve some data hazards

121
00:09:12,017 --> 00:09:17,099
and look at how we'd actually do this on
our simplified pipeline here.

122
00:09:17,099 --> 00:09:22,345
We're going to use the same example case
here, we have two adds and we have a

123
00:09:22,345 --> 00:09:29,602
dependence between R through register one.
So, the first thing to realize that we're

124
00:09:29,602 --> 00:09:35,626
going to have to do is we're gonna have,
where, where do we need to stall the pipe

125
00:09:35,626 --> 00:09:39,053
line?
Or where do we need to stop the pipe line?

126
00:09:39,053 --> 00:09:43,512
Or interlock the pipe line?
And we're gonna look at these two adds, it

127
00:09:43,512 --> 00:09:48,953
wasn't a problem until the second add went
to go and read the register file here.

128
00:09:48,953 --> 00:09:54,725
If the first if we, if we didn't read the
register file until later, this would not

129
00:09:54,725 --> 00:09:58,676
have been a problem.
Because the value might have been up to

130
00:09:58,676 --> 00:10:01,476
date.
But because we read it so early here, we

131
00:10:01,476 --> 00:10:06,463
pipeline sort of when data gets computed
and when it gets written back.

132
00:10:06,463 --> 00:10:11,259
And read into different stages.
We have this, this, this challenge.

133
00:10:11,259 --> 00:10:16,440
So, what we're going to do is we're going
to stall here.

134
00:10:16,440 --> 00:10:22,130
And reread the register file over and over
again until the first value.

135
00:10:22,130 --> 00:10:25,427
Of our one gets written to the register
file.

136
00:10:25,427 --> 00:10:28,680
Unfortunately, we're gonna be waiting a
while here.

137
00:10:28,680 --> 00:10:31,285
Because we don't write the register file
here.

138
00:10:31,285 --> 00:10:36,285
We don't write the register file here.
We write the register file here.

139
00:10:36,285 --> 00:10:41,899
Through that wire.
So we like to we're gonna stall multiple

140
00:10:41,899 --> 00:10:46,007
cycles here.
So when we're doing the stalling in the

141
00:10:46,007 --> 00:10:51,808
stage we're gonna wanna think really hard
on what should be going down the pipe

142
00:10:51,808 --> 00:10:55,342
during that time.
Well, we've stalled instruction, the

143
00:10:55,342 --> 00:11:01,032
second ad here, instruction two here, here
and obstruction one keeps flowing down the

144
00:11:01,032 --> 00:11:04,463
pipe so we don't want to stall these later
stages.

145
00:11:04,463 --> 00:11:09,622
We only want to stall this stage and we
need to stall the previous stages so,

146
00:11:09,622 --> 00:11:16,808
instruction don't pile up into us.
Now how do we go about, from a wiring

147
00:11:16,808 --> 00:11:21,541
perspective.
Executing instructions, and sort of having

148
00:11:21,541 --> 00:11:25,095
the first set of instructions clear out of
the pipeline.

149
00:11:25,095 --> 00:11:30,412
Well, we're actually going to insert a
multiplexer here on the instruction

150
00:11:30,412 --> 00:11:34,346
register side of things.
And this multiplexer is going to insert no

151
00:11:34,346 --> 00:11:37,107
op instructions, or no operation
instructions.

152
00:11:37,107 --> 00:11:41,849
We're actually going to be inserting
bubbles into the pipe right here.

153
00:11:41,849 --> 00:11:46,320
So that the first instruction, the first
add, can clear out of the pipe.

154
00:11:46,320 --> 00:11:49,331
And we know what's, what's, executing
here.

155
00:11:49,331 --> 00:11:54,503
So, we don't want to accidentally have the
second instruction start executing, even

156
00:11:54,503 --> 00:11:58,995
though it's stalled here.
Cuz then it's gonna possible change state

157
00:11:58,995 --> 00:12:03,799
as it goes down the pipeline.
And it'll also affect sort of dependency

158
00:12:03,799 --> 00:12:08,819
calculations here.
So we wanna, we wanna insert no ops.

159
00:12:08,819 --> 00:12:16,271
And the stall condition, as I said, goes
to the program counter, the instruction

160
00:12:16,271 --> 00:12:20,065
register.
So these flip flops before the stall point

161
00:12:20,065 --> 00:12:25,911
and it's also going to go into the select
line on this multiplexer to choose to

162
00:12:25,911 --> 00:12:30,026
insert no loss.
Okay, so that's the beginning of this and

163
00:12:30,026 --> 00:12:36,008
I want to just say that this is sometimes
called interlocking or interlocks.

164
00:12:36,008 --> 00:12:41,007
Its an important nomenclature here.
You're interlocking the execution of the

165
00:12:41,007 --> 00:12:44,936
instruction here and depending on the
other ones then you're actually checking

166
00:12:44,936 --> 00:12:47,752
that sometimes people call stalling it or
locking.

167
00:12:47,752 --> 00:12:50,327
Okay.
So let's draw a pipeline diagram of what's

168
00:12:50,327 --> 00:12:53,041
going on here.
So we're going to plot time, on the X

169
00:12:53,041 --> 00:12:55,390
axis, and instructions on the vertical
axis.

170
00:12:55,390 --> 00:12:59,138
And we'll look at the other the resource
graph also.

171
00:12:59,138 --> 00:13:04,615
Okay, so we have our first instruction
going down the pipe here.

172
00:13:04,615 --> 00:13:09,814
It takes register zero, adds ten to it,
and it's gonna go fetch, decode, execute,

173
00:13:09,814 --> 00:13:14,445
memory, write back.
Second instruction starts going down the

174
00:13:14,445 --> 00:13:19,524
pipe here.
Third, fourth, fifth well you'll note

175
00:13:19,524 --> 00:13:24,425
something here we've stalled and the
pipeline.

176
00:13:24,425 --> 00:13:31,611
Because we need to wait.
For the first instruction to write back to

177
00:13:31,611 --> 00:13:36,244
the register file, before we can go read
the result, or read that value from the

178
00:13:36,244 --> 00:13:38,779
register file.
So we strictly have to have this

179
00:13:38,779 --> 00:13:43,417
instruction here, the second instruction,
the dependent instruction, in the decode

180
00:13:43,417 --> 00:13:48,991
stage, and read the register file here on
the cycle after the write-back occurs.

181
00:13:48,991 --> 00:13:53,441
And we detect this stall condition this
whole time.

182
00:13:53,441 --> 00:13:59,510
As, as denoted here with the, the nice
purple box.

183
00:13:59,510 --> 00:14:02,253
And.
As I said, we need to stall earlier

184
00:14:02,253 --> 00:14:05,563
instructions.
So this doesn't stall not only instruction

185
00:14:05,563 --> 00:14:10,412
I2 but installs instruction I3 cuz it's in
a earlier stage in the pipe and it needs

186
00:14:10,412 --> 00:14:13,430
to be, it needs to be stalled.
Okay.

187
00:14:13,430 --> 00:14:18,064
So we could also graph this the other
direction and, and the reason it is useful

188
00:14:18,064 --> 00:14:22,312
to graph the other direction is you can
where no operations get inserted or no ops

189
00:14:22,312 --> 00:14:27,944
get inserted.
So here we, we've plotted the other

190
00:14:27,944 --> 00:14:32,758
direction of time versus stage of the
pipeline or resource.

191
00:14:32,758 --> 00:14:39,072
And you can see the we can see what's in
the different stages, in the different

192
00:14:39,072 --> 00:14:43,956
stages, and at some point here, there's
nothing in these stages.

193
00:14:43,956 --> 00:14:50,201
Instead we've actually inserted no ops,
and it's, comes from the fact that we

194
00:14:50,201 --> 00:14:55,638
basically have I3 sitting in the
instruction fetch for three cycles and i2

195
00:14:55,638 --> 00:15:00,838
sitting in the decode for three cycles,
and the later stages of the pipe get no

196
00:15:00,838 --> 00:15:04,649
ops inserted and that is what that
multiplexor is doing.

197
00:15:04,649 --> 00:15:09,494
So now that we've talked about how
stalling happens in a pipeline diagram,

198
00:15:09,494 --> 00:15:13,677
lets move on and look at, what the logic
looks like inside of this.

199
00:15:13,677 --> 00:15:19,557
So here we have our data path for our
stall our data path for our five stage

200
00:15:19,557 --> 00:15:22,648
pipe.
And in stalling, what we are really trying

201
00:15:22,648 --> 00:15:28,783
to do here is, detects the case when a
earlier instruction writes are register

202
00:15:28,783 --> 00:15:36,042
which a later instruction is going to use.
So in this case we are going to detect the

203
00:15:36,042 --> 00:15:42,075
FA instruction in the decode stage is
reading a value which instruction in the

204
00:15:42,075 --> 00:15:45,077
execute.
Memory stage or write back stage writes.

205
00:15:45,077 --> 00:15:48,083
So an uncommitted instruction writes to a
register.

206
00:15:48,083 --> 00:15:53,021
And a previous instruction here.
Or, excuse me, a later instruction goes

207
00:15:53,021 --> 00:15:56,009
and reads that, and we stall at the decode
stage.

208
00:15:56,009 --> 00:16:00,017
When we stall, we're gonna actually wanna
stall everything behind it.

209
00:16:00,017 --> 00:16:03,074
As we'd sort of already talked about here.
Okay.

210
00:16:03,074 --> 00:16:06,039
So let's start calculating the signal
here.

211
00:16:06,039 --> 00:16:09,077
We'll call it.
It's the control signal, so we'll call it

212
00:16:09,077 --> 00:16:11,025
c.
We'll call it c stall.

213
00:16:11,025 --> 00:16:15,031
So we'll draw a little, blob up here.
We'll call this the, the stall

214
00:16:15,031 --> 00:16:18,027
calculation.
And what goes into this calculation?

215
00:16:18,027 --> 00:16:23,025
Well, it's sort of a complex calculation.
But the first thing we're gonna wanna to

216
00:16:23,025 --> 00:16:27,181
check is, is we're gonna want to check,
the, destination for the operand.

217
00:16:27,181 --> 00:16:33,114
So we're gonna check the destination for
in some instruction that was an earlier

218
00:16:33,114 --> 00:16:36,564
instruction.
And this is the register identifier, not

219
00:16:36,564 --> 00:16:39,988
the data value, but instead the register
identifier.

220
00:16:39,988 --> 00:16:44,001
So this would be RD in a typical MIPS
instruction.

221
00:16:44,001 --> 00:16:48,842
And we're gonna compare that, and we'll
call it WS in this calculation.

222
00:16:48,842 --> 00:16:53,302
And we're gonna compare it to the two
source inputs here or the, the source

223
00:16:53,302 --> 00:16:57,634
operand register identifiers.
And these are both because of 32 registers

224
00:16:57,634 --> 00:17:02,606
in MIPS, these are both five, or these are
all five bit values, all three of these

225
00:17:02,606 --> 00:17:05,520
values.
And we're gonna wire that all into our

226
00:17:05,520 --> 00:17:08,249
stall control unit here.
Okay.

227
00:17:08,249 --> 00:17:13,039
And if we get a match, the most basic
thing we're gonna wanna do is we're gonna

228
00:17:13,039 --> 00:17:18,309
say stall everything earlier.
And, insert no op instructions later down

229
00:17:18,309 --> 00:17:21,606
the pipe.
So the stall's going to control a lot of

230
00:17:21,606 --> 00:17:24,943
things.
It's basically going to control the front

231
00:17:24,943 --> 00:17:28,804
end of the pipe.
And it's gonna disallow in the instruction

232
00:17:28,804 --> 00:17:34,712
here for moving far forward in the pipe,
if anything in these later stages has a

233
00:17:34,712 --> 00:17:40,426
same-destination operand, or in this case
so far, we're comparing against just this

234
00:17:40,426 --> 00:17:43,779
location.
So, we're comparing against if the write

235
00:17:43,779 --> 00:17:48,923
back stage has the same register
identifier destination as either of the

236
00:17:48,923 --> 00:17:54,303
two source operands for the instruction at
the, at the decode stage.

237
00:17:54,303 --> 00:18:01,561
Okay, so, what should we do if, should we
just stall always if the RS fields, one of

238
00:18:01,561 --> 00:18:05,541
these source fields, here or here, matches
RD here?

239
00:18:05,541 --> 00:18:11,030
Some RD.
Should we just always stall in that case?

240
00:18:11,030 --> 00:18:16,148
Well, hm.
Not every instruction is going to write a

241
00:18:16,148 --> 00:18:19,781
register.
So, what if we have an instruction for

242
00:18:19,781 --> 00:18:24,439
instance something like a store
instruction which does not write a

243
00:18:24,439 --> 00:18:27,774
register.
So if this, we have a storage instruction

244
00:18:27,774 --> 00:18:32,133
which is not ready to register, we
probably shouldn't be doing this compare

245
00:18:32,133 --> 00:18:36,624
operation because we can have better
performance if we don't stall under those

246
00:18:36,624 --> 00:18:41,400
conditions, and we're gonna introduce a
signal here called write enabler, WE.

247
00:18:41,400 --> 00:18:45,027
And WE's gonna get wired into our stall
calculation.

248
00:18:45,027 --> 00:18:49,312
Likewise, not every instruction reads,
both input operands.

249
00:18:49,312 --> 00:18:52,936
And a good example of this is an immediate
instruction.

250
00:18:52,936 --> 00:18:57,519
An immediate instruction only reads one of
the source, operands.

251
00:18:57,519 --> 00:19:00,636
Or the source register identified
operands.

252
00:19:00,636 --> 00:19:04,941
And the other value comes from the
immediate bits, which are in the

253
00:19:04,941 --> 00:19:09,008
instruction coding.
So this is going to do, introduce a read

254
00:19:09,008 --> 00:19:14,821
enable calculation here, because not every
instruction reads, reads a register.

255
00:19:14,821 --> 00:19:20,256
Okay, so lets calculate this a little bit
more here, so we are going to have

256
00:19:20,256 --> 00:19:25,878
something which calculates the destination
and I will talk about what this blob is

257
00:19:25,878 --> 00:19:30,454
here in a second.
But then we need to add effectively, we

258
00:19:30,454 --> 00:19:35,778
write enable bits or we write the enable
for this location in the pipe.

259
00:19:35,778 --> 00:19:38,274
And need to these read enable signals
here.

260
00:19:38,274 --> 00:19:42,600
And these get calculated from the
instruction registers in the respective

261
00:19:42,600 --> 00:19:46,059
locations in the pipe.
So, some decode bits there happening, or

262
00:19:46,059 --> 00:19:50,708
maybe it all gets decoded here in the
decode stage, and then we just pipeline

263
00:19:50,708 --> 00:19:54,960
those bits forward.
And we do this calculation here, and now

264
00:19:54,960 --> 00:20:01,134
we can say well if the instruction in the
decode stage matches what is being written

265
00:20:01,134 --> 00:20:05,438
back, the write back value, then we stall.
Okay.

266
00:20:05,438 --> 00:20:12,041
That's, that's close to our full solution.
Let's talk about this circle here.

267
00:20:12,041 --> 00:20:17,947
What's going on in this circle?
Well, what you might notice is, something

268
00:20:17,947 --> 00:20:24,257
like a jump and link or a jump and link
register have an implicit destination

269
00:20:24,257 --> 00:20:28,094
target.
The destination is not actually encoded in

270
00:20:28,094 --> 00:20:35,005
RD field of the instruction.
So, instead we need to add a multiplexor

271
00:20:35,005 --> 00:20:41,083
here, which multiplexes from the bits out
the instruction register and just a hard

272
00:20:41,083 --> 00:20:45,087
coded value of 31, which is gonna denote
register 31.

273
00:20:45,087 --> 00:20:52,000
And by doing that, we can handle jump in
links and jump in link registers.

274
00:20:52,000 --> 00:20:58,154
Okay, so to, to finish this out, we want
to compare not against just in the right

275
00:20:58,154 --> 00:21:03,782
back stage of the pipe.
But we wanna compare in all three of these

276
00:21:03,782 --> 00:21:10,302
subsequent stages here, here, and here.
So we add extra calculation logic here,

277
00:21:10,302 --> 00:21:16,051
which computes the, right enable.
And, the register identifier, based on the

278
00:21:16,051 --> 00:21:19,703
instruction register.
And, this might just be, generate early in

279
00:21:19,703 --> 00:21:22,488
the pipe4 depending on your pipeline
design.

280
00:21:22,488 --> 00:21:25,595
That's sort of more traditional pipe line
control.

281
00:21:25,595 --> 00:21:29,068
In, all this right now, is ignoring a
jumps and branches.

282
00:21:29,068 --> 00:21:33,663
If you introduce jumps and branches,
things get, a little bit more complicated.

283
00:21:33,663 --> 00:21:38,541
We'll talk about that in the control
section, control hazard section of today's

284
00:21:38,541 --> 00:21:43,894
lecture.
So now lets go on and talk about different

285
00:21:43,894 --> 00:21:49,286
instructions.
And where they have, where they, yet

286
00:21:49,286 --> 00:21:53,020
there's source operands.
And what is their destination operens?

287
00:21:53,020 --> 00:21:57,075
And if every instruction in something like
Mips, has all sources and all deaths.

288
00:21:57,093 --> 00:22:02,040
So this is just summing up what I said
before, is that not every instruction

289
00:22:02,040 --> 00:22:06,099
reads, and not every instruction writes.
So, as we can see here, ALU instructions.

290
00:22:06,099 --> 00:22:12,069
Read two operands and running operands,
but to store reads two operands, writes no

291
00:22:12,069 --> 00:22:16,042
operand.
Jumps and links only write and don't read.

292
00:22:16,042 --> 00:22:20,008
So, it's a mix and match, even in
something like MIPS.

293
00:22:20,008 --> 00:22:25,093
And one other thing I wanted to point out
is, where this is actually encoded in the

294
00:22:25,093 --> 00:22:31,006
instruction moves around a little bit.
Mips tried to make this relatively

295
00:22:31,006 --> 00:22:36,028
uniform, but there's some examples here
where you see the destination changes a

296
00:22:36,028 --> 00:22:41,021
little bit between immediates versus
non-immediates, and this is just cuz they

297
00:22:41,021 --> 00:22:45,059
didn't the encoding space there to leave
everything in a fixed location.

298
00:22:45,077 --> 00:22:49,049
Something writes a destination or not.
And, we have two things.

299
00:22:49,049 --> 00:22:52,083
We have a source, or the source operator
and identifier.

300
00:22:52,083 --> 00:22:58,023
And then we have, the right enable,
whether it is, being ridden or not.

301
00:22:58,023 --> 00:23:03,053
And, as you can see, there's a little,
like, case statement here dependent on the

302
00:23:03,053 --> 00:23:07,040
instruction type.
And this is, the instruction which is in

303
00:23:07,040 --> 00:23:11,008
the, later stages of the pipeline that is
executing.

304
00:23:11,008 --> 00:23:16,051
So we have if the ALU instruction comes
out of our D, if it's a ALU, immediate

305
00:23:16,051 --> 00:23:22,009
instruction or a load, it comes out of RT,
if it's a jump, jump link, it's R31, and

306
00:23:22,009 --> 00:23:25,001
this says what you need to compare
against.

307
00:23:25,031 --> 00:23:32,062
And then, whether you need to write or not
is a little bit complex here if you have a

308
00:23:32,062 --> 00:23:37,056
LU, an LUI, or a write.
It writes it, except the case if the right

309
00:23:37,056 --> 00:23:42,056
source, or the right, right numbers or,
the right register identifier is zero.

310
00:23:42,056 --> 00:23:47,743
Cuz in MIPS, the zero register is a throw
away register, you don't need to interlock

311
00:23:47,743 --> 00:23:51,043
against it.
You wouldn't be incorrect if you did

312
00:23:51,043 --> 00:23:55,043
interlock against it, you would just have
slower performance.

313
00:23:55,043 --> 00:24:00,070
And then jump and link, jump and link
register, always write and then everything

314
00:24:00,070 --> 00:24:05,083
else doesn't write the register file.
So, we've, this is a, so the first part of

315
00:24:05,083 --> 00:24:09,048
our calculation.
And now we need to, do a calculation, of

316
00:24:09,048 --> 00:24:13,361
whether we actually read the value.
And there's gonna be two of these, one for

317
00:24:13,361 --> 00:24:16,764
the first operand, one for the second
operand.

318
00:24:16,764 --> 00:24:21,322
Okay let's build up for the different
instructions here.

319
00:24:21,322 --> 00:24:27,286
So what's basically transforming this
table into some logic equations that we're

320
00:24:27,286 --> 00:24:33,962
gonna use.
So, ALU, ALUI, loads, stores, branches,

321
00:24:33,962 --> 00:24:39,800
they all read.
Jump it register jump and link, register

322
00:24:39,800 --> 00:24:43,903
all read, at least the first source
operand.

323
00:24:43,903 --> 00:24:50,481
So, RE1 gets set to true or one or on if
it's any of these op codes.

324
00:24:50,481 --> 00:24:57,287
But for jump and jump link, which don't
read a first operand it doesn't, that

325
00:24:57,287 --> 00:25:04,536
comparison has to not compare against this
value otherwise you'd be stalling too

326
00:25:04,536 --> 00:25:09,377
often.
For, the second operand, only, true ALU

327
00:25:09,377 --> 00:25:16,243
instructions, so not immediate
instructions, and stores read that second

328
00:25:16,243 --> 00:25:20,450
operand.
And everything else does it.

329
00:25:20,450 --> 00:25:26,528
Okay, so now let's put together the actual
stall signal, and this is the stall signal

330
00:25:26,528 --> 00:25:31,111
for sort of the decode stage and the fetch
stage of the pipe.

331
00:25:31,111 --> 00:25:41,052
And we're gonna end up with stall equaling
a comparison between the source register

332
00:25:41,052 --> 00:25:47,032
identifier in the decode stage compared
with the right.

333
00:25:47,032 --> 00:25:52,686
Register identifier in the execute stage
And.

334
00:25:52,686 --> 00:26:08,038
That it's actually writing or we need to
check with the memory stage.

335
00:26:08,038 --> 00:26:11,077
So the same calculation with the memory
stage.

336
00:26:12,079 --> 00:26:15,082
Same calculation here for the write back
stage.

337
00:26:15,082 --> 00:26:21,009
And then we also want to make sure, so we
take this whole expression, and we end it

338
00:26:21,009 --> 00:26:25,079
with whether we actually have a read
enable for the first source operand.

339
00:26:25,079 --> 00:26:30,074
Cuz if we don't read the first source
operand, there's no reason to stall for

340
00:26:30,074 --> 00:26:34,066
it.
And we do a similar sort of thing here,

341
00:26:34,066 --> 00:26:39,061
for the second source operand.
We use, RE2 that we derived here, and we

342
00:26:39,061 --> 00:26:42,085
ended together with an expression which
says.

343
00:26:42,085 --> 00:26:48,095
Does the RT, the registry identifier RT in
the instruction, is it the same thing as,

344
00:26:48,095 --> 00:26:53,084
the different destination, register
identifiers in the subsequent

345
00:26:53,084 --> 00:26:57,079
instructions?
In the different stages of the pipe, later

346
00:26:57,079 --> 00:27:03,048
stages of the pipe.
Okay, so is this everything?

347
00:27:04,048 --> 00:27:12,556
Hm, it looks pretty complicated.
And this is well, this isn't so bad so

348
00:27:12,556 --> 00:27:17,034
far.
If we, if we make the pipe longer, we're

349
00:27:17,034 --> 00:27:22,639
gonna end up with more terms inside of
these two equations.

350
00:27:22,639 --> 00:27:31,023
Well, no, that's not quite the full story.
So what are, what are we missing here?

351
00:27:31,023 --> 00:27:34,080
Why else would we have to stall a
pipeline?

352
00:27:34,080 --> 00:27:41,070
Well, unfortunately, this only takes into
account instructions where the destination

353
00:27:41,070 --> 00:27:45,094
is available right at the end of the
execute stage.

354
00:27:45,094 --> 00:27:48,085
Here and here.
This encapsulates it.

355
00:27:48,085 --> 00:27:59,531
These two comparisons encapsulate that.
Well, something like a load doesn't

356
00:27:59,531 --> 00:28:05,482
necessarily encapsulate that because the
load value is not ready until all the way

357
00:28:05,482 --> 00:28:11,046
down here.
So we might need to insert some extra

358
00:28:11,078 --> 00:28:16,084
stalls for that.
Also, loads and stores are more

359
00:28:16,084 --> 00:28:25,086
complicated because you might have a data
dependence through the data memory itself.

360
00:28:25,086 --> 00:28:29,092
This example here.
We have a little snippet of code here.

361
00:28:29,092 --> 00:28:35,086
We have a load which is going to take or
excuse me a store here which is going to

362
00:28:36,008 --> 00:28:40,006
take register two and write it to some
place in memory.

363
00:28:40,006 --> 00:28:46,000
And then we have a load here that we're
going to read out of some place in memory

364
00:28:46,000 --> 00:28:52,085
and put into register four.
Okay so the question comes up.

365
00:28:52,085 --> 00:29:05,110
Is there any possible data hazard here?
Yes, cuz what if R1 plus seven is equal to

366
00:29:05,110 --> 00:29:10,224
R3 + five.
So we're going to be having a case where

367
00:29:10,224 --> 00:29:19,978
you actually have two different values
here where one is, needs to pick up the

368
00:29:19,978 --> 00:29:26,083
data value of the previous store.
So the load needs to pick up the data

369
00:29:26,083 --> 00:29:32,020
value of the previous store if, and only
if, this is equal to this.

370
00:29:32,070 --> 00:29:33,028
Hm.
Okay.

371
00:29:33,028 --> 00:29:38,056
So that's, that's not so bad.
So let's look at these, data hazards a

372
00:29:38,056 --> 00:29:44,044
little bit more and figure out how we can
derive the equation to check for them.

373
00:29:45,079 --> 00:29:54,034
So, just a recap here, our example is we
have registers R2 are storing it into a

374
00:29:54,034 --> 00:30:00,093
location here, and we're reading from
possibly the same location.

375
00:30:00,093 --> 00:30:06,938
We don't know.
So what if our, our one plus seven is

376
00:30:06,938 --> 00:30:13,073
equal to R3 plus five?
Well, first of all we're writing and

377
00:30:13,073 --> 00:30:18,945
reading to the same address in time.
Right next to each other.

378
00:30:18,945 --> 00:30:26,550
Well, our hazard is actually avoided
because our memory system is so fast.

379
00:30:26,550 --> 00:30:31,987
Because everything goes down the, the
pipeline, an order will actually go to,

380
00:30:31,987 --> 00:30:35,096
right to the memory.
And the next cycle we'll be able to read

381
00:30:35,096 --> 00:30:37,623
out of that memory.
And we'll pick up the new value.

382
00:30:37,623 --> 00:30:43,686
Pick up the, the changed value.
But I want to introduce this because in

383
00:30:43,686 --> 00:30:47,867
more realistic memory systems, this
requires much more careful handling,

384
00:30:47,867 --> 00:30:52,015
because if you have a memory system which
takes multiple cycles for the store to

385
00:30:52,015 --> 00:30:56,500
happen, or the store happens let's say, at
the end of the pipe into the, the memory

386
00:30:56,500 --> 00:31:01,593
then you're not gonna necessarily get that
value, and you might need to bypass that

387
00:31:01,593 --> 00:31:05,477
or you might need to do something more
intelligent.

388
00:31:05,477 --> 00:31:12,042
Okay, so let's, we talked about stalling
the pipeline.

389
00:31:15,050 --> 00:31:19,049
Now let's look at, if we want to improve
the performance some more.

390
00:31:19,049 --> 00:31:24,038
So one of the things you may not have
noticed in that stall, but did happen, is

391
00:31:24,038 --> 00:31:28,079
that if there were any instructions in the
later portion of the pipeline.

392
00:31:28,079 --> 00:31:33,025
Which in earlier, its instruction decode
stage [inaudible], it stops.

393
00:31:33,025 --> 00:31:36,098
So, no place do we actually afford the
data values early.

394
00:31:36,098 --> 00:31:42,031
And now we want to talk about forwarding
and bypassing, of how to add extra data

395
00:31:42,031 --> 00:31:47,063
paths, to allow a value to be sent from a
later stage to an earlier stage, faster

396
00:31:47,063 --> 00:31:51,090
than having to wait for the right back of
the pipeline to occur.

397
00:31:56,046 --> 00:32:05,022
So here we have the, bypassing, or here we
have our data path that we had from

398
00:32:05,022 --> 00:32:09,048
before.
And, what I'm trying to get at is, you

399
00:32:09,048 --> 00:32:14,279
have the problem that you have a value
here or you have, you have an instruction

400
00:32:14,279 --> 00:32:17,033
here.
And if there was any instruction which

401
00:32:17,033 --> 00:32:22,045
write the source, writes through a operand
register or identifier that this

402
00:32:22,045 --> 00:32:26,006
instruction is gonna wanna read, it's
going to stall.

403
00:32:26,006 --> 00:32:31,006
But, as you might notice, a little bit of
insight here, is, if you have an add, you

404
00:32:31,006 --> 00:32:36,045
can actually try to read this value early.
But our data path is not good enough to do

405
00:32:36,045 --> 00:32:42,003
that right now.
Okay, so let's add in a bypass here.

406
00:32:42,068 --> 00:32:48,031
So we're gonna add in this bypass which
takes the result of the ALU, turns it

407
00:32:48,031 --> 00:32:54,067
around, and puts a multiplexer here, and
we can now detect whether using sort of a

408
00:32:54,067 --> 00:33:00,037
similar sort of signal as our stall
signal, we can detect whether two operands

409
00:33:00,037 --> 00:33:06,001
match, and if they do, we get the result
value out of the ALU early, and run it

410
00:33:06,001 --> 00:33:12,495
through this multiplexer.
Okay, so an important question is does

411
00:33:12,495 --> 00:33:18,046
this help our example we had before from
performance perspective.

412
00:33:18,046 --> 00:33:25,000
So our stalling logic that we put in was
good enough to make sure there wasn't

413
00:33:25,000 --> 00:33:28,068
error.
But its not good enough to actually have

414
00:33:28,068 --> 00:33:33,056
good performance because you have to wait
for the value to get to the register file

415
00:33:33,056 --> 00:33:37,028
before you go ahead.
So, here we have the same example that we

416
00:33:37,028 --> 00:33:41,069
had earlier in class where you have,
something which writes register one and

417
00:33:41,069 --> 00:33:46,040
something that reads register one, and to
ALU add instruction, in this instruction

418
00:33:46,040 --> 00:33:50,434
back to back.
So does, does this get help?

419
00:33:50,434 --> 00:33:54,362
Well, yes.
So this you can see, clearly see that this

420
00:33:54,362 --> 00:34:00,896
instruction is, the, the result here is
gonna come back around and we only, we, we

421
00:34:00,896 --> 00:34:05,345
effectively don't have to stall the second
instruction.

422
00:34:05,345 --> 00:34:09,486
Because it can pick up that data value
right then and there.

423
00:34:09,486 --> 00:34:14,441
The data value gets calculated in this
stage, it can sort of loop around real

424
00:34:14,441 --> 00:34:17,976
fast here.
And we don't have to stall at all.

425
00:34:17,976 --> 00:34:24,532
In this case.
Okay so, quick quiz question here.

426
00:34:24,532 --> 00:34:32,391
Two other cases.
We have a memory operation, we just erased

427
00:34:32,391 --> 00:34:37,988
memory one, and then we have a jump in
link, and then an add.

428
00:34:37,988 --> 00:34:42,618
Does this bypass, right here, help, with
these two cases?

429
00:34:42,618 --> 00:34:47,681
Well.
We said it helped here.

430
00:34:47,681 --> 00:34:53,008
This case here.
Well, when is the memory, load?

431
00:34:53,008 --> 00:34:57,095
This is a load.
When does the load result get calculated?

432
00:34:57,095 --> 00:35:02,309
The load result doesn't get calculated
until the output here of the data memory

433
00:35:02,309 --> 00:35:05,924
or right here.
So all the son.

434
00:35:05,924 --> 00:35:10,544
That's after this bypass.
So it's too late.

435
00:35:10,544 --> 00:35:16,882
So we still need to stall the pipeline
here for load with a dependent

436
00:35:16,882 --> 00:35:23,458
instruction, dependent on the load.
So we're gonna stall there.

437
00:35:23,458 --> 00:35:26,109
Okay.
Now, a trickier one.

438
00:35:26,109 --> 00:35:31,299
Jump and link 500.
And then something which reads R31.

439
00:35:31,299 --> 00:35:37,290
So, little bit of, background here on the
MIPS instruction set.

440
00:35:37,290 --> 00:35:41,960
Jump and link implicitly writes to
register 31.

441
00:35:41,960 --> 00:35:45,458
It's the, the link register.
Okay.

442
00:35:45,458 --> 00:35:48,916
So, so that means we have a data
dependence.

443
00:35:48,916 --> 00:35:53,468
Where, where does the jump in link, what,
what gets put into R31?

444
00:35:53,468 --> 00:35:58,372
So it's the program counter or the program
counter plus four is what it is

445
00:35:58,372 --> 00:36:02,084
architected as in MIPS.
You could probably build it either way

446
00:36:02,084 --> 00:36:06,778
depending on how you do jump register.
So on first, first look this looks like it

447
00:36:06,778 --> 00:36:12,046
should actually like solve a lot of
problems we should be able to by pass our

448
00:36:12,046 --> 00:36:15,886
result of our jump in link to right where
it needs to go.

449
00:36:15,886 --> 00:36:18,843
Mm.
It's a little unsatisfying though,

450
00:36:18,843 --> 00:36:24,438
because, if you look at the rest of the
pipe here, if you have a jump.

451
00:36:24,438 --> 00:36:30,585
Let's say in the execute stage.
How, how are, you know, is, is the

452
00:36:30,585 --> 00:36:34,978
consumer of that instruction gonna be here
or not?

453
00:36:34,978 --> 00:36:40,891
So this one's kind of a trick question.
So, does it help?

454
00:36:40,891 --> 00:36:45,657
Well, you can bypass out and around, but
the thing after it in the pipe is probably

455
00:36:45,657 --> 00:36:50,691
not going to be appropriate instruction.
So, if I were to answer this question,

456
00:36:50,691 --> 00:36:53,021
does it help?
I'd probably say no.

457
00:36:53,021 --> 00:36:58,019
Because at least in the pipe drawing here,
you're not gonna be executing the

458
00:36:58,019 --> 00:37:03,055
subsequent instruction, or executing the,
even if this is, the instruction at 500,

459
00:37:03,055 --> 00:37:06,032
you're not going be actually executing
that.

460
00:37:06,032 --> 00:37:11,036
You'll probably have to, sort of, wait for
that, jump to resolve somewhere further

461
00:37:11,036 --> 00:37:16,046
down the pipe, and, then go pick it up.
So the bypass, bypasses don't always help.

462
00:37:16,046 --> 00:37:20,093
And, especially in something where it's
not a, a fully bypassed pipeline.

463
00:37:20,093 --> 00:37:24,011
Okay.
Oh, before I move off this slide, I wanted

464
00:37:24,011 --> 00:37:29,057
to, say that this is called bypassing.
Sometimes this is also called, forwarding

465
00:37:29,057 --> 00:37:32,044
values.
And we're gonna be using those terms

466
00:37:32,044 --> 00:37:38,010
interchangeably in this class.
Okay so now, now we get more, more details

467
00:37:38,010 --> 00:37:40,060
here.
We start to look at how to derive the

468
00:37:40,060 --> 00:37:43,085
bypass signal.
We're gonna do, we're gonna build this the

469
00:37:43,085 --> 00:37:48,056
same way we derived the stall signal and
we're gonna take terms out of the stall

470
00:37:48,056 --> 00:37:53,028
calculation we had before.
So, if you will recall, we have the

471
00:37:53,028 --> 00:37:56,010
pipeline diagram here with the stall
signal.

472
00:37:56,010 --> 00:38:01,023
We have stalled sa, stages, and we ended
up having to stall in this case where we

473
00:38:01,023 --> 00:38:04,063
have a ALU op followed by an ALU op that's
dependent.

474
00:38:05,091 --> 00:38:11,066
So each, each stall or kill introduces a
bubble into the pipeline and this is gonna

475
00:38:11,066 --> 00:38:18,038
give us a clocks per instruction over one.
With that new data path, which bypasses

476
00:38:18,038 --> 00:38:24,019
out of the ALU into input operand A, we
can see that we actually can remove all

477
00:38:24,019 --> 00:38:30,015
these stalls and just do the bypassing.
So, it actually shrinks the time taking to

478
00:38:30,015 --> 00:38:34,071
execute this code.
And this new data path is really a great

479
00:38:34,071 --> 00:38:38,083
thing here.
This bypass has taken us from greater than

480
00:38:38,083 --> 00:38:42,088
one clock per instruction to one clock per
instruction.

481
00:38:44,051 --> 00:38:54,003
And we're actually forwarding out of the
execution unit for one into the decode

482
00:38:54,003 --> 00:39:00,018
unit here and gets consumed here.
And the execution unit in time three for

483
00:39:00,018 --> 00:39:08,005
the, the instruction two.
Okay so let's drive the bypass signal.

484
00:39:09,021 --> 00:39:12,056
And we'll start off by looking at our
original stall signal.

485
00:39:12,056 --> 00:39:15,017
So this is just the stall signal we had
before.

486
00:39:15,063 --> 00:39:22,036
And, first thing we're gonna do.
Is if you look at this, this case right

487
00:39:22,036 --> 00:39:34,029
here where we compared, the execute right
destination to the, decode first source

488
00:39:34,029 --> 00:39:37,822
operand.
We don't need that anymore.

489
00:39:37,822 --> 00:39:41,464
We added a bypass.
And we added a forwarding signal to handle

490
00:39:41,464 --> 00:39:45,545
that case.
So just sort of put a line through that.

491
00:39:45,545 --> 00:39:52,097
Okay, the next question that comes up is
we had in this diagram here, we added this

492
00:39:52,097 --> 00:39:57,061
multiplexer here, to choose between
reading from the register file and reading

493
00:39:57,061 --> 00:40:00,076
the data which came out of the arithmetic
logic unit.

494
00:40:02,094 --> 00:40:07,027
And we need to ask, what is the control on
that multiplexer?

495
00:40:07,027 --> 00:40:11,031
Well, it's the exact same case that we
just crossed out.

496
00:40:11,031 --> 00:40:14,069
When that case is true, we wanna do the
bypass.

497
00:40:14,069 --> 00:40:18,030
So we just take that, those terms, and put
it here.

498
00:40:18,030 --> 00:40:23,030
And that's actually the control in the
multiplexer, a source here.

499
00:40:25,076 --> 00:40:28,042
Is this correct?
Hm.

500
00:40:28,042 --> 00:40:38,046
Is this the full story?
Well, unfortunately no.

501
00:40:39,050 --> 00:40:44,086
It's close, really close.
It looks like it should work, but

502
00:40:44,086 --> 00:40:53,047
unfortunately.
Only ALU and arithmetic logic immediate

503
00:40:53,047 --> 00:41:01,073
instructions can benefit from this.
If you have something like a load, you

504
00:41:01,073 --> 00:41:05,071
need to wait for the data value to show
up.

505
00:41:06,003 --> 00:41:11,075
So, this, a source here needs to have some
component saying, make sure it's a load,

506
00:41:11,075 --> 00:41:17,082
or make sure it's not a load, if you will.
And up here, we actually reintroduce that

507
00:41:17,082 --> 00:41:21,082
term, checking to see whether it's
something with a load.

508
00:41:22,086 --> 00:41:28,028
So what we're gonna do is we're gonna
split the write-enable into two

509
00:41:28,028 --> 00:41:31,083
components.
The write enable that you're bypassing and

510
00:41:31,083 --> 00:41:36,039
the write enable that you're stalling.
And we're gonna re-introduce these sort of

511
00:41:36,039 --> 00:41:40,083
two components here with two slightly
different write enables, dependent on the

512
00:41:40,083 --> 00:41:47,344
decode of the instructions in the execute
stage of the pipeline, Okay so let's do

513
00:41:47,344 --> 00:41:52,027
that.
And we replaced, we still have this term

514
00:41:52,027 --> 00:41:56,055
back in the stall signal.
We still have this term in the, the

515
00:41:56,055 --> 00:42:00,091
bypassing signal, but we now have two
different write enables.

516
00:42:00,091 --> 00:42:05,013
One for bypass calculation and one for a
stall calculation.

517
00:42:05,013 --> 00:42:11,019
And the bypass calculation these two
different signals are, calculated based on

518
00:42:11,019 --> 00:42:14,084
the decode of the instruction in the
execute stage.

519
00:42:14,084 --> 00:42:20,755
So we bypass only when it's a, a,
arithmetic logic unit, or an immediate

520
00:42:20,755 --> 00:42:27,935
arithmetic instruction and let's say the
destination is not zero.

521
00:42:27,935 --> 00:42:38,033
And we stall if it's a load or a jump in
link or jump in link register also falls

522
00:42:38,033 --> 00:42:43,061
into that case.
And that's when we have to do the stall.

523
00:42:43,061 --> 00:42:52,043
Because the ju, jump in link, and jump in
link register And at least.

524
00:42:52,043 --> 00:42:58,007
This data path we only have this
multiplexer here, register 31 at the end.

525
00:42:58,007 --> 00:43:02,074
You can build data paths which have
different multiplexer's for that and you

526
00:43:02,074 --> 00:43:05,053
might be able to remove that clause from
this.

527
00:43:07,061 --> 00:43:16,082
Okay, so what I notice here is our loads,
and jumping link registers, and jumping

528
00:43:16,082 --> 00:43:21,133
links are gonna stall when we have a match
on the registers.

529
00:43:21,133 --> 00:43:26,536
And something like ALU instructions
generally, write and able to bypass is

530
00:43:26,536 --> 00:43:31,030
going to not stall and use the bypass of
the following logic.

531
00:43:31,030 --> 00:43:36,524
Okay, so let's take a look at what this
looks like for a fully by-pass data path.

532
00:43:36,524 --> 00:43:41,802
So our fully by-pass data path we're going
to add all the destinations locations out

533
00:43:41,802 --> 00:43:45,614
of here, out of here.
We're going to run that back and we're

534
00:43:45,614 --> 00:43:50,329
going to add to big multiplexors here.
Because in our first case we only

535
00:43:50,329 --> 00:43:54,488
multiplexed for the first source operand,
or the A source operand.

536
00:43:54,488 --> 00:43:59,419
But we want actually want to multiplex the
inputs for a and b the two source

537
00:43:59,419 --> 00:44:03,515
operands.
And we're also gonna add this PC here for

538
00:44:03,515 --> 00:44:10,230
the jump and link that handles some of the
more complex pieces here because we,

539
00:44:10,230 --> 00:44:17,263
otherwise we have to put multiplexers here
for sort of R31 of multiplexing in the PC

540
00:44:17,263 --> 00:44:22,197
or something like that.
So we've effectively be able to bypass

541
00:44:22,197 --> 00:44:26,137
everything here.
So the question here is, is there still a

542
00:44:26,137 --> 00:44:31,279
need for the stall signal.
So this is more than what we had before.

543
00:44:31,279 --> 00:44:36,096
This is more than just a source.
We now can bypass out of not only here to

544
00:44:36,096 --> 00:44:40,229
there but we can bypass out of after the
memory operations.

545
00:44:40,229 --> 00:44:45,487
So maybe this changes our stall signal so
that we don't need to stall on loads

546
00:44:45,487 --> 00:44:47,044
anymore.
That'd be great.

547
00:44:47,044 --> 00:44:51,727
They'd have better performance.
Well, unfortunately, no.

548
00:44:51,727 --> 00:44:56,294
We still need this.
You still need to check.

549
00:44:56,294 --> 00:45:03,365
If the opcode is, is a, is a load, in this
stage of the pipe, even with a fully

550
00:45:03,365 --> 00:45:10,091
bypassed data path.
So we've, we've resolved a bunch of the

551
00:45:10,091 --> 00:45:16,198
data hazards but the loads, still need to,
wait, or the instructions dependent on

552
00:45:16,198 --> 00:45:21,879
loads still need to wait, because you
don't know, the results of the value,

553
00:45:21,879 --> 00:45:26,337
until, you come out of here.
So you can't issue a subsequent

554
00:45:26,337 --> 00:45:30,349
instruction, into the ALU stage early.
You need to stall.

555
00:45:30,349 --> 00:45:34,643
But, this is basically our full stall
calculation at this point.

556
00:45:34,643 --> 00:45:39,910
Because we add all those bypasses, we've
removed a lot of the other complexity from

557
00:45:39,910 --> 00:45:44,897
our stall signal.
And in this case you'll see that loads

558
00:45:44,897 --> 00:45:50,249
have a latency of two cycles.
Okay.

559
00:45:50,249 --> 00:45:55,047
So, as I said, the last technique you can
look at is speculation, where you try to

560
00:45:55,047 --> 00:45:56,940
guess things.
Guess data values.

561
00:45:56,940 --> 00:46:00,275
Guess things like that, or try to execute
code out of order.

562
00:46:00,275 --> 00:46:03,371
We're gonna talk about that later in the
course.

563
00:46:03,371 --> 00:46:05,761
That's, that's not really in today's
lecture.

564
00:46:05,761 --> 00:46:11,551
It's not really review material, but we
will, we will discuss that to some, some

565
00:46:11,551 --> 00:46:14,504
extent.
So now we're gonna move on to talking

566
00:46:14,504 --> 00:46:19,535
about control hazards, and because we're
running a little low on time we'll look at

567
00:46:19,535 --> 00:46:22,014
that a little bit more in the next
lecture.