1
00:00:01,052 --> 00:00:05,067
Why is branch prediction important?
So talk a little bit about motivation.

2
00:00:05,067 --> 00:00:09,801
Then we're going to move on and start
talking about branch prediction and the

3
00:00:09,801 --> 00:00:13,059
two things we need to predict when we're
predicting branches.

4
00:00:13,059 --> 00:00:18,053
And the top thing that jumps to mind is,
is the branch take-in or not; the outcome

5
00:00:18,053 --> 00:00:21,063
of the branch.
But, that's only half of the, half the

6
00:00:21,063 --> 00:00:24,037
story.
And today, we are also going to talk about

7
00:00:24,037 --> 00:00:27,091
figuring out where you actually go when
you take a branch.

8
00:00:27,091 --> 00:00:32,066
When I say branch, we're going to loosely
put all forms of flow control into this.

9
00:00:32,066 --> 00:00:35,052
So, it's not just a branch; it's a branch,
a jump.

10
00:00:35,256 --> 00:00:40,039
You might even think about trying to put
something like a interrupt, cuz that

11
00:00:40,039 --> 00:00:45,045
changes your control flow of your program.
But, most people try not to predict their

12
00:00:45,045 --> 00:00:47,039
interrupts.
It's hypothetically possible.

13
00:00:47,092 --> 00:00:52,078
So let's start by talking about why, why
branch prediction and what is the big

14
00:00:52,078 --> 00:00:59,050
motivation for branch prediction?
So as I said longer pipelines and more

15
00:00:59,050 --> 00:01:05,027
complex pipelines require us to have
relatively good accuracy trying to figure

16
00:01:05,027 --> 00:01:09,014
when we take a branch or when we don't
take a branch.

17
00:01:09,014 --> 00:01:14,099
So here we have our in order issue or it's
gonna be in order fetch, out of order

18
00:01:14,099 --> 00:01:18,079
issue, out of order execute, in order
commit pipeline.

19
00:01:18,079 --> 00:01:23,081
And a couple of things we should note here
is, you know, we added this extra stage

20
00:01:23,081 --> 00:01:28,034
out here, we added this issue stage.
But we also added this issue queue or

21
00:01:28,034 --> 00:01:33,061
instruction buffer here, or issue window,
depending what book you read in the front

22
00:01:33,061 --> 00:01:36,015
here.
Well, instructions pile up into this.

23
00:01:37,045 --> 00:01:41,429
And if you don't actually figure out if
the branch is taken or not, let's say

24
00:01:41,429 --> 00:01:45,966
until somewhere here, the execute stage.
Then, you're going to have more,

25
00:01:45,966 --> 00:01:50,636
basically, instructions you need to kill
when you take a branch mis-predict.

26
00:01:50,636 --> 00:01:55,693
So, when you start to go to these out of
order processors, when you sort of have

27
00:01:55,693 --> 00:02:00,020
this seemingly short pipeline, seemingly
easy pipeline.

28
00:02:00,048 --> 00:02:04,025
More instructions can get sort of queued
up into some of these structures,

29
00:02:04,025 --> 00:02:07,074
especially if you have a queue.
So this effectively lengthens the front of

30
00:02:07,074 --> 00:02:11,042
your pipeline and makes it such that if
you mispredict or you fetch the wrong

31
00:02:11,042 --> 00:02:14,647
instructions relatively often, you're just
going to be out in the weeds and you're

32
00:02:14,647 --> 00:02:18,689
going to be killing lots of instructions
and having done extra work that you didn't

33
00:02:18,689 --> 00:02:26,759
really want to do.
So also, if you wait all the way until the

34
00:02:26,759 --> 00:02:30,654
end of the pipe in sort of these out of
order processors to resolve your branch,

35
00:02:30,654 --> 00:02:34,224
that's also going to make life even worse
here cuz that makes your mispredict

36
00:02:34,224 --> 00:02:36,935
penalty even longer.
Most people don't actually do that.

37
00:02:36,935 --> 00:02:40,968
I mean you might say oh, I don't want to
actually kill the instructions until I

38
00:02:40,968 --> 00:02:45,346
know the branch commits and that was sort
of a simplistic example we had when we are

39
00:02:45,346 --> 00:02:48,528
talking about these out of order
processors we wait all the way till the

40
00:02:48,528 --> 00:02:50,692
end of the pipe and then sort of cleaned
out things.

41
00:02:50,856 --> 00:02:54,923
You can wait for it to go to the end of
the pipe to actually fully clean out

42
00:02:54,923 --> 00:02:58,950
things but you don't want to redirect in
front of the pipe or redirect the fetch or

43
00:02:58,950 --> 00:03:02,538
the PC in the front of the pipe as quickly
as possible cuz you just don't want to be

44
00:03:02,538 --> 00:03:06,111
fetching off into the weeds cuz you are
then just wasting cycles.

45
00:03:06,111 --> 00:03:11,987
Here we have going back to our super
pipelining lecture that we had before.

46
00:03:11,987 --> 00:03:17,423
And we look at the, for some real
processors the, the Pentium three, and the

47
00:03:17,423 --> 00:03:21,758
Pentium four, what their branch
mis-predict penalty is.

48
00:03:21,758 --> 00:03:26,081
And you know, in this Pentium four here,
you have twenty, twenty odd cycles here of

49
00:03:26,081 --> 00:03:30,070
branch mispredicts penalty.
That can be pretty painful if you take

50
00:03:30,070 --> 00:03:35,030
branch mispredicts quite often cuz you're
going to be taking branches, and the

51
00:03:35,030 --> 00:03:39,066
branch penalty is going to be quite, quite
high if you don't have the correct

52
00:03:39,066 --> 00:03:43,071
subsequent instructions after you.
Now, you know, we talked about, some

53
00:03:43,071 --> 00:03:46,059
techniques.
You could just stall and wait, so you

54
00:03:46,059 --> 00:03:50,098
don't actually predict the branch.
But then, if you have to wait for every

55
00:03:50,098 --> 00:03:55,097
branch to get to, let's say, the twentieth
stage of the pipe before you go and fetch

56
00:03:55,097 --> 00:03:58,092
the subsequent instruction, that's pretty
painful.

57
00:03:58,092 --> 00:04:03,084
So we talked about speculating the next PC
or the PC plus four, we'll say in a NIP

58
00:04:03,084 --> 00:04:08,047
style architecture or our architecture
where the, each instruction is 32 bits

59
00:04:08,047 --> 00:04:10,777
long.
But that doesn't really help you when

60
00:04:10,777 --> 00:04:16,103
you're trying to predict or when, doesn't
really help you if you high probability

61
00:04:16,103 --> 00:04:20,727
you think the branch is going to be taken
or you think control flow is going to be

62
00:04:20,727 --> 00:04:23,668
taken.
So you need to start thinking about how to

63
00:04:23,668 --> 00:04:27,523
actually deal with that in a pipe line.
And up to this point we've only talked

64
00:04:27,523 --> 00:04:31,548
about speculating the fall through case.
We talked briefly about speculating the

65
00:04:31,548 --> 00:04:34,795
non fall through case, but we didn't say
how you could possibly do that.

66
00:04:34,795 --> 00:04:40,137
And today we're going to talk about what
the hardware is to do that.

67
00:04:40,137 --> 00:04:46,602
Also making, making life worse is if you
start to go wide.

68
00:04:46,602 --> 00:04:50,998
This hurts also.
So, if we start to go wide here let's say

69
00:04:50,998 --> 00:04:56,188
we have a dual issue processor but if you
go wide here, when you go to kill

70
00:04:56,188 --> 00:05:01,792
instructions you are killing twice as many
instructions in flight in the pipe if you

71
00:05:01,792 --> 00:05:05,816
take a branch the wrong direction or
mispredict the branch.

72
00:05:05,816 --> 00:05:11,662
So showing that from our pipeline diagram
perspective here, this is just recapping

73
00:05:11,662 --> 00:05:16,583
the incidence in a previous lecture.
But, here we have a fetch for this branch.

74
00:05:16,583 --> 00:05:19,754
And, we're fetching two instructions per
cycle here.

75
00:05:19,754 --> 00:05:26,614
So even if we're relatively short
pipeline, you end up with one, two, three,

76
00:05:26,614 --> 00:05:31,026
four, five, six, seven dead instructions
on a mispredict.

77
00:05:31,026 --> 00:05:36,067
So, what this really comes down to here is
you have the pipeline width or, or

78
00:05:36,067 --> 00:05:42,015
approximately how much depth you end up
getting killed, is the pipeline width

79
00:05:42,015 --> 00:05:47,320
multiplied by the branch penalty.
So, its width times length before you can

80
00:05:47,320 --> 00:05:51,097
resolve the branch.
In, if you can shorten the time it takes

81
00:05:51,097 --> 00:05:56,075
you to resolve the branch, that's good.
Or if you can make the processor narrower,

82
00:05:56,075 --> 00:05:59,056
that may be good.
It's good from, you know, fewer

83
00:05:59,056 --> 00:06:03,003
instructions being killed.
But, we like to execute multiple

84
00:06:03,003 --> 00:06:06,056
instructions at a time cuz that improves
our performance.

85
00:06:11,638 --> 00:06:16,046
So this is really the motivation for
thinking about trying to put something

86
00:06:16,046 --> 00:06:22,005
useful in this time here and to also
trying to reduce the probability that we

87
00:06:22,005 --> 00:06:25,056
start fetching in correcting instructions
at all.