Why is branch prediction important? Let's talk a little bit about motivation. Then we're going to move on and start talking about branch prediction and the two things we need to predict when we're predicting branches. The first thing that jumps to mind is: is the branch taken or not, the outcome of the branch. But that's only half the story. Today we're also going to talk about figuring out where you actually go when you take a branch, the target. When I say branch, we're going to loosely put all forms of control flow into this. So it's not just a conditional branch; it's branches, jumps. You might even think about trying to predict something like an interrupt, because that changes the control flow of your program. But most people try not to predict their interrupts, even though it's hypothetically possible.

So let's start by talking about why branch prediction, what the big motivation is. As I said, longer and more complex pipelines require us to have relatively good accuracy in figuring out when we take a branch and when we don't. Here we have our in-order fetch, out-of-order issue, out-of-order execute, in-order commit pipeline. A couple of things to note: we added this extra issue stage, and we also added this issue queue here in the front, or instruction buffer, or issue window, depending on which book you read. Instructions pile up in this structure. And if you don't figure out whether the branch is taken until, let's say, the execute stage, then you're going to have more instructions to kill when you take a branch mispredict. So when you go to these out-of-order processors, even with a seemingly short, seemingly easy pipeline, more instructions can get queued up in some of these structures, especially if you have a queue.
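To make the two predictions concrete, here is a toy sketch, not from the lecture, of a front end that predicts both the direction (via 2-bit saturating counters) and the target (via a branch target buffer). All names and table sizes are my own illustrative choices, not any real design:

```python
# Toy sketch of the two predictions a front end makes for each branch:
# (1) direction (taken / not taken) via 2-bit saturating counters,
# (2) target address via a branch target buffer (BTB).
# Table size and names are illustrative, not from any real processor.

NUM_ENTRIES = 1024                    # entries in the counter table (assumed)

counters = [1] * NUM_ENTRIES          # 2-bit counters: 0,1 = not taken; 2,3 = taken
btb = {}                              # maps branch PC -> last seen target

def index(pc):
    return (pc >> 2) % NUM_ENTRIES    # low PC bits index the table (word-aligned)

def predict(pc):
    """Return (predicted_taken, next_pc) for the fetch stage."""
    taken = counters[index(pc)] >= 2
    if taken and pc in btb:
        return True, btb[pc]          # predicted taken: fetch from the BTB target
    return False, pc + 4              # otherwise speculate the fall-through, PC + 4

def update(pc, taken, target):
    """Train the predictor once the branch actually resolves."""
    i = index(pc)
    if taken:
        counters[i] = min(3, counters[i] + 1)
        btb[pc] = target
    else:
        counters[i] = max(0, counters[i] - 1)

# A loop branch at PC 0x100 that keeps being taken trains toward "taken":
for _ in range(3):
    update(0x100, taken=True, target=0x80)
print(predict(0x100))   # -> (True, 128): predicted taken, fetch from 0x80
```

The point of the sketch is just the split: the counters answer "taken or not?", and the BTB answers "if taken, where?". We'll see real versions of both structures shortly.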
So this effectively lengthens the front of your pipeline, and it means that if you mispredict, if you fetch the wrong instructions relatively often, you're just going to be out in the weeds: you'll be killing lots of instructions and doing extra work you didn't really want to do. Also, if you wait all the way until the end of the pipe in these out-of-order processors to resolve your branch, that makes life even worse, because it makes your mispredict penalty even longer. Most people don't actually do that. You might say, "I don't want to kill the instructions until I know the branch commits," and that was the simplistic example we had when we were talking about these out-of-order processors: we waited all the way until the end of the pipe and then cleaned things out. You can wait until the end of the pipe to fully clean things out, but you want to redirect the fetch, the PC at the front of the pipe, as quickly as possible, because you don't want to be fetching off into the weeds, just wasting cycles.

Going back to our superpipelining lecture from before, we can look at the branch mispredict penalty for some real processors, the Pentium III and the Pentium 4. In the Pentium 4, you have twenty-odd cycles of branch mispredict penalty. That can be pretty painful if you mispredict often, because branches are frequent, and the penalty is quite high if you don't have the correct subsequent instructions behind the branch. Now, we talked about some techniques. You could just stall and wait, so you don't actually predict the branch.
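To get a feel for how much a twenty-odd-cycle penalty costs, here is a back-of-the-envelope calculation. The branch frequency and mispredict rate below are assumed purely for illustration; only the penalty is the Pentium 4-class figure from above:

```python
# Back-of-the-envelope cost of branch mispredicts.
# branch_freq and mispredict_rate are assumed for illustration.

branch_freq = 0.20        # fraction of instructions that are branches (assumed)
mispredict_rate = 0.05    # predictor misses 5% of branches (assumed)
penalty = 20              # Pentium 4-class mispredict penalty, in cycles

# Extra cycles per instruction due to mispredicts:
cpi_penalty = branch_freq * mispredict_rate * penalty
print(round(cpi_penalty, 3))   # -> 0.2
```

On a machine with a base CPI of 1, that assumed 0.2 cycles per instruction is a 20% slowdown from mispredicts alone, which is why even a 95%-accurate predictor can still hurt on a deep pipe.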
But then, if you have to wait for every branch to get to, let's say, the twentieth stage of the pipe before you go and fetch the subsequent instruction, that's pretty painful. So we talked about speculating the next PC, PC plus four in a MIPS-style architecture, or any architecture where each instruction is 32 bits long. But that doesn't really help you when you think, with high probability, that the branch is going to be taken, that control flow is going to change. So you need to start thinking about how to actually deal with that in a pipeline. Up to this point we've only talked about speculating the fall-through case. We talked briefly about speculating the non-fall-through case, but we didn't say how you could possibly do that. Today we're going to talk about the hardware to do that.

Also making life worse is going wide. If we have, let's say, a dual-issue processor, then when you go to kill instructions, you're killing twice as many instructions in flight in the pipe if you mispredict the branch. Showing that from our pipeline-diagram perspective, this is just recapping an example from a previous lecture: here we have a fetch for this branch, and we're fetching two instructions per cycle. So even with a relatively short pipeline, you end up with one, two, three, four, five, six, seven dead instructions on a mispredict. What this really comes down to is that the number of instructions killed is approximately the pipeline width multiplied by the branch penalty: width times the number of cycles before you can resolve the branch. If you can shorten the time it takes to resolve the branch, that's good. Or if you can make the processor narrower, that may be good.
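The width-times-depth rule of thumb can be written out directly. This is a rough model with numbers chosen for illustration; the lecture's diagram counts seven rather than eight, plausibly because the branch itself occupies one slot of its own fetch group, but that off-by-one detail is my assumption:

```python
# Rough model of wasted work on a mispredict:
# dead instructions ~= pipeline width x cycles until the branch resolves.
# Purely illustrative numbers.

def dead_instructions(width, cycles_to_resolve):
    """Upper bound on wrong-path instructions fetched before redirect."""
    return width * cycles_to_resolve

print(dead_instructions(2, 4))    # 2-wide, resolves 4 cycles after fetch -> 8
print(dead_instructions(3, 20))   # 3-wide, Pentium 4-class depth -> 60
```

Both levers in the formula show up in real designs: resolving branches earlier in the pipe shrinks `cycles_to_resolve`, while narrowing the machine shrinks `width`, but as the next point says, we usually don't want to give up width.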
It's good in the sense that fewer instructions get killed. But we like to execute multiple instructions at a time, because that improves our performance. So this is really the motivation for thinking about putting something useful into this time, and also for trying to reduce the probability that we start fetching incorrect instructions at all.