1 00:00:01,052 --> 00:00:05,067 Why is branch prediction important? So talk a little bit about motivation. 2 00:00:05,067 --> 00:00:09,801 Then we're going to move on and start talking about branch prediction and the 3 00:00:09,801 --> 00:00:13,059 two things we need to predict when we're predicting branches. 4 00:00:13,059 --> 00:00:18,053 And the top thing that jumps to mind is, is the branch take-in or not; the outcome 5 00:00:18,053 --> 00:00:21,063 of the branch. But, that's only half of the, half the 6 00:00:21,063 --> 00:00:24,037 story. And today, we are also going to talk about 7 00:00:24,037 --> 00:00:27,091 figuring out where you actually go when you take a branch. 8 00:00:27,091 --> 00:00:32,066 When I say branch, we're going to loosely put all forms of flow control into this. 9 00:00:32,066 --> 00:00:35,052 So, it's not just a branch; it's a branch, a jump. 10 00:00:35,256 --> 00:00:40,039 You might even think about trying to put something like a interrupt, cuz that 11 00:00:40,039 --> 00:00:45,045 changes your control flow of your program. But, most people try not to predict their 12 00:00:45,045 --> 00:00:47,039 interrupts. It's hypothetically possible. 13 00:00:47,092 --> 00:00:52,078 So let's start by talking about why, why branch prediction and what is the big 14 00:00:52,078 --> 00:00:59,050 motivation for branch prediction? So as I said longer pipelines and more 15 00:00:59,050 --> 00:01:05,027 complex pipelines require us to have relatively good accuracy trying to figure 16 00:01:05,027 --> 00:01:09,014 when we take a branch or when we don't take a branch. 17 00:01:09,014 --> 00:01:14,099 So here we have our in order issue or it's gonna be in order fetch, out of order 18 00:01:14,099 --> 00:01:18,079 issue, out of order execute, in order commit pipeline. 19 00:01:18,079 --> 00:01:23,081 And a couple of things we should note here is, you know, we added this extra stage 20 00:01:23,081 --> 00:01:28,034 out here, we added this issue stage. But we also added this issue queue or 21 00:01:28,034 --> 00:01:33,061 instruction buffer here, or issue window, depending what book you read in the front 22 00:01:33,061 --> 00:01:36,015 here. Well, instructions pile up into this. 23 00:01:37,045 --> 00:01:41,429 And if you don't actually figure out if the branch is taken or not, let's say 24 00:01:41,429 --> 00:01:45,966 until somewhere here, the execute stage. Then, you're going to have more, 25 00:01:45,966 --> 00:01:50,636 basically, instructions you need to kill when you take a branch mis-predict. 26 00:01:50,636 --> 00:01:55,693 So, when you start to go to these out of order processors, when you sort of have 27 00:01:55,693 --> 00:02:00,020 this seemingly short pipeline, seemingly easy pipeline. 28 00:02:00,048 --> 00:02:04,025 More instructions can get sort of queued up into some of these structures, 29 00:02:04,025 --> 00:02:07,074 especially if you have a queue. So this effectively lengthens the front of 30 00:02:07,074 --> 00:02:11,042 your pipeline and makes it such that if you mispredict or you fetch the wrong 31 00:02:11,042 --> 00:02:14,647 instructions relatively often, you're just going to be out in the weeds and you're 32 00:02:14,647 --> 00:02:18,689 going to be killing lots of instructions and having done extra work that you didn't 33 00:02:18,689 --> 00:02:26,759 really want to do. So also, if you wait all the way until the 34 00:02:26,759 --> 00:02:30,654 end of the pipe in sort of these out of order processors to resolve your branch, 35 00:02:30,654 --> 00:02:34,224 that's also going to make life even worse here cuz that makes your mispredict 36 00:02:34,224 --> 00:02:36,935 penalty even longer. Most people don't actually do that. 37 00:02:36,935 --> 00:02:40,968 I mean you might say oh, I don't want to actually kill the instructions until I 38 00:02:40,968 --> 00:02:45,346 know the branch commits and that was sort of a simplistic example we had when we are 39 00:02:45,346 --> 00:02:48,528 talking about these out of order processors we wait all the way till the 40 00:02:48,528 --> 00:02:50,692 end of the pipe and then sort of cleaned out things. 41 00:02:50,856 --> 00:02:54,923 You can wait for it to go to the end of the pipe to actually fully clean out 42 00:02:54,923 --> 00:02:58,950 things but you don't want to redirect in front of the pipe or redirect the fetch or 43 00:02:58,950 --> 00:03:02,538 the PC in the front of the pipe as quickly as possible cuz you just don't want to be 44 00:03:02,538 --> 00:03:06,111 fetching off into the weeds cuz you are then just wasting cycles. 45 00:03:06,111 --> 00:03:11,987 Here we have going back to our super pipelining lecture that we had before. 46 00:03:11,987 --> 00:03:17,423 And we look at the, for some real processors the, the Pentium three, and the 47 00:03:17,423 --> 00:03:21,758 Pentium four, what their branch mis-predict penalty is. 48 00:03:21,758 --> 00:03:26,081 And you know, in this Pentium four here, you have twenty, twenty odd cycles here of 49 00:03:26,081 --> 00:03:30,070 branch mispredicts penalty. That can be pretty painful if you take 50 00:03:30,070 --> 00:03:35,030 branch mispredicts quite often cuz you're going to be taking branches, and the 51 00:03:35,030 --> 00:03:39,066 branch penalty is going to be quite, quite high if you don't have the correct 52 00:03:39,066 --> 00:03:43,071 subsequent instructions after you. Now, you know, we talked about, some 53 00:03:43,071 --> 00:03:46,059 techniques. You could just stall and wait, so you 54 00:03:46,059 --> 00:03:50,098 don't actually predict the branch. But then, if you have to wait for every 55 00:03:50,098 --> 00:03:55,097 branch to get to, let's say, the twentieth stage of the pipe before you go and fetch 56 00:03:55,097 --> 00:03:58,092 the subsequent instruction, that's pretty painful. 57 00:03:58,092 --> 00:04:03,084 So we talked about speculating the next PC or the PC plus four, we'll say in a NIP 58 00:04:03,084 --> 00:04:08,047 style architecture or our architecture where the, each instruction is 32 bits 59 00:04:08,047 --> 00:04:10,777 long. But that doesn't really help you when 60 00:04:10,777 --> 00:04:16,103 you're trying to predict or when, doesn't really help you if you high probability 61 00:04:16,103 --> 00:04:20,727 you think the branch is going to be taken or you think control flow is going to be 62 00:04:20,727 --> 00:04:23,668 taken. So you need to start thinking about how to 63 00:04:23,668 --> 00:04:27,523 actually deal with that in a pipe line. And up to this point we've only talked 64 00:04:27,523 --> 00:04:31,548 about speculating the fall through case. We talked briefly about speculating the 65 00:04:31,548 --> 00:04:34,795 non fall through case, but we didn't say how you could possibly do that. 66 00:04:34,795 --> 00:04:40,137 And today we're going to talk about what the hardware is to do that. 67 00:04:40,137 --> 00:04:46,602 Also making, making life worse is if you start to go wide. 68 00:04:46,602 --> 00:04:50,998 This hurts also. So, if we start to go wide here let's say 69 00:04:50,998 --> 00:04:56,188 we have a dual issue processor but if you go wide here, when you go to kill 70 00:04:56,188 --> 00:05:01,792 instructions you are killing twice as many instructions in flight in the pipe if you 71 00:05:01,792 --> 00:05:05,816 take a branch the wrong direction or mispredict the branch. 72 00:05:05,816 --> 00:05:11,662 So showing that from our pipeline diagram perspective here, this is just recapping 73 00:05:11,662 --> 00:05:16,583 the incidence in a previous lecture. But, here we have a fetch for this branch. 74 00:05:16,583 --> 00:05:19,754 And, we're fetching two instructions per cycle here. 75 00:05:19,754 --> 00:05:26,614 So even if we're relatively short pipeline, you end up with one, two, three, 76 00:05:26,614 --> 00:05:31,026 four, five, six, seven dead instructions on a mispredict. 77 00:05:31,026 --> 00:05:36,067 So, what this really comes down to here is you have the pipeline width or, or 78 00:05:36,067 --> 00:05:42,015 approximately how much depth you end up getting killed, is the pipeline width 79 00:05:42,015 --> 00:05:47,320 multiplied by the branch penalty. So, its width times length before you can 80 00:05:47,320 --> 00:05:51,097 resolve the branch. In, if you can shorten the time it takes 81 00:05:51,097 --> 00:05:56,075 you to resolve the branch, that's good. Or if you can make the processor narrower, 82 00:05:56,075 --> 00:05:59,056 that may be good. It's good from, you know, fewer 83 00:05:59,056 --> 00:06:03,003 instructions being killed. But, we like to execute multiple 84 00:06:03,003 --> 00:06:06,056 instructions at a time cuz that improves our performance. 85 00:06:11,638 --> 00:06:16,046 So this is really the motivation for thinking about trying to put something 86 00:06:16,046 --> 00:06:22,005 useful in this time here and to also trying to reduce the probability that we 87 00:06:22,005 --> 00:06:25,056 start fetching in correcting instructions at all.