1 00:00:03,048 --> 00:00:06,017 Okay. Let's begin today. 2 00:00:06,017 --> 00:00:14,005 So, we are going to start where we left off last time, which was talking about 3 00:00:14,005 --> 00:00:18,003 in-order superscalars, so as you may recall, just a brief review. 4 00:00:18,003 --> 00:00:22,019 We were talking about these somewhat more complex pipelines. 5 00:00:22,019 --> 00:00:25,093 We were able to execute multiple instructions at the same time. 6 00:00:25,093 --> 00:00:30,079 Sometimes instructions had to sort of be steered, if you will, between the A and 7 00:00:30,079 --> 00:00:34,053 the B pipe, coming from the two different instruction registers. 8 00:00:34,053 --> 00:00:38,040 We have to double our decode. We need to check to make sure that there are 9 00:00:38,040 --> 00:00:42,053 no dependencies between the instructions we're trying to issue at the same time, 10 00:00:42,053 --> 00:00:46,046 and our decode logic will do that. So there's going to be some sort of cross- 11 00:00:46,046 --> 00:00:49,086 communication inside of there. And in our previous example we were 12 00:00:49,086 --> 00:00:53,074 looking at an asymmetric pipeline. So what I mean by that is, you know, loads 13 00:00:53,074 --> 00:00:58,010 and stores went down this pipe. And branches and ALU ops went down 14 00:00:58,010 --> 00:01:01,046 this pipe, and ALU ops also went down this pipe. 15 00:01:01,046 --> 00:01:07,044 So where we left off last time was talking about alignment and how we 16 00:01:07,044 --> 00:01:12,032 deal with fetching multiple instructions from an instruction memory or instruction 17 00:01:12,032 --> 00:01:16,097 cache, but at the same time, how do we avoid making this structure have many, 18 00:01:16,097 --> 00:01:21,009 many ports, for instance, or does it have to have many, many ports? 
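As a rough illustration of the decode-time check and steering recalled in the review above, here is a minimal Python sketch. It is not the lecture's logic: the `Instr` record, the `PIPE_OPS` table, and the split of work between the pipes (pipe A takes ALU ops and branches, pipe B takes ALU ops and loads/stores) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed (hypothetical) asymmetric pipes: which instruction kinds each accepts.
PIPE_OPS = {"A": {"alu", "branch"}, "B": {"alu", "mem"}}


@dataclass(frozen=True)
class Instr:
    kind: str                      # "alu", "branch", or "mem"
    dest: Optional[str] = None     # destination register, if any
    srcs: Tuple[str, ...] = ()     # source registers


def independent(first: Instr, second: Instr) -> bool:
    """True if `second` neither reads nor overwrites the result of `first`."""
    if first.dest is None:
        return True
    raw = first.dest in second.srcs    # read-after-write dependency
    waw = first.dest == second.dest    # write-after-write dependency
    return not (raw or waw)


def steer(first: Instr, second: Instr) -> Optional[Tuple[str, str]]:
    """Return the pipes for (first, second), or None if they cannot dual-issue."""
    if not independent(first, second):
        return None
    for p1, p2 in (("A", "B"), ("B", "A")):
        if first.kind in PIPE_OPS[p1] and second.kind in PIPE_OPS[p2]:
            return (p1, p2)
    return None  # structural conflict, e.g. two memory operations


add = Instr("alu", "r1", ("r2", "r3"))
load = Instr("mem", "r4", ("r5",))
uses_r1 = Instr("alu", "r6", ("r1",))
```

With these assumptions, `steer(add, load)` pairs the two instructions, while `steer(add, uses_r1)` returns `None` because the second instruction reads `r1`, which the first one writes.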
19 00:01:21,009 --> 00:01:26,087 So, we looked at a basic piece of code here, which has a whole lot of interesting 20 00:01:26,087 --> 00:01:30,036 alignment issues. So, at the beginning here we were 21 00:01:30,036 --> 00:01:36,023 basically fetching the instructions which were nicely aligned into the cache 22 00:01:36,023 --> 00:01:38,065 blocks, and then, yeah, this one was okay. 23 00:01:38,065 --> 00:01:41,077 We jumped to the beginning of the block. That was fine. 24 00:01:41,077 --> 00:01:46,040 Here, we jump to the middle of a block. And depending on how our cache was 25 00:01:46,057 --> 00:01:49,058 implemented, we might need to do, sort of, two fetches. 26 00:01:49,058 --> 00:01:53,068 Let's say you could only read out your cache half a block at a time. 27 00:01:53,068 --> 00:01:57,073 Then you might have to do a fetch to get this and a fetch to get that. 28 00:01:57,073 --> 00:02:02,041 And then, here, you know, things sort of crossed cache lines, across complete 29 00:02:02,041 --> 00:02:04,095 cache blocks, which is quite a bit harder. 30 00:02:04,095 --> 00:02:10,074 So let's take a look at this. Suppose we have some alignment constraints. 31 00:02:10,074 --> 00:02:16,030 Let's say the alignment constraint we have here is that you can only fetch 32 00:02:16,030 --> 00:02:20,089 either from the first half or the second half of a block at a time. 33 00:02:20,089 --> 00:02:26,064 And if you're trying to execute something which straddles a cache line, you're going to 34 00:02:26,064 --> 00:02:32,064 have to fetch even more data. So as you can see, if you recall from this 35 00:02:32,064 --> 00:02:38,043 figure, with this, this, and that instruction or piece of data in the RAM, when we go 36 00:02:38,043 --> 00:02:44,014 forward, we're basically going to be fetching more data with that alignment 37 00:02:44,014 --> 00:02:48,021 constraint than we would have been fetching otherwise. 
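To make the half-block constraint described above concrete, here is a small Python sketch. The sizes are assumptions, not taken from the lecture: a 16-byte cache block read 8 bytes at a time, and fixed 4-byte instructions. The function name `aligned_fetch` is hypothetical.

```python
HALF_BLOCK_BYTES = 8   # assumed: 16-byte cache block, read half a block at a time
INSN_BYTES = 4         # assumed: fixed 4-byte instructions


def aligned_fetch(pc):
    """Return (address, wanted) pairs for the half block that contains pc.

    The cache can only return a whole, aligned half block, so any slot
    whose address is below pc is fetched but not wanted.
    """
    base = pc - (pc % HALF_BLOCK_BYTES)
    group = []
    for addr in range(base, base + HALF_BLOCK_BYTES, INSN_BYTES):
        group.append((addr, addr >= pc))
    return group
```

Under these assumptions, a jump to the start of a half block, such as `aligned_fetch(0x100)`, uses both slots, while a jump into the middle, such as `aligned_fetch(0x204)`, also brings in `0x200`, which is extra data that was not asked for.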
38 00:02:48,021 --> 00:02:53,088 This gets even harder. You know, it's okay to sort of over-fetch. 39 00:02:53,088 --> 00:02:58,053 It's another thing if you actually have to sort of straddle a cache line here. 40 00:02:58,053 --> 00:03:03,029 Because the question comes up: can you fetch those two at the same time 41 00:03:03,029 --> 00:03:07,027 from the cache, or not? And we'll look through an example here. 42 00:03:07,045 --> 00:03:12,004 We're going to look with this alignment constraint and see that, no, if you can't 43 00:03:12,004 --> 00:03:16,062 actually fetch that, you're going to be introducing what are effectively 44 00:03:16,062 --> 00:03:20,090 dead cycles going down the pipe, which hurts your clocks per instruction. 45 00:03:20,090 --> 00:03:26,130 So here is the same instruction sequence that we had here, and these stalls, or not 46 00:03:26,130 --> 00:03:32,017 stalls, I mean dead instructions, that go down the pipe here are killed. 47 00:03:32,017 --> 00:03:36,093 The instructions that go down the pipe are actually just these three X's. 48 00:03:36,093 --> 00:03:42,053 So we, effectively, over-fetched, and then, you know, when we go to over-fetch, let's 49 00:03:42,053 --> 00:03:48,000 say, here, we fetched 208, but we also had to fetch 20C, and that's an instruction that 50 00:03:48,000 --> 00:03:51,052 shouldn't go down the pipe and shouldn't actually do anything. 51 00:03:51,052 --> 00:03:56,025 So you can see that when you have alignment constraints, 52 00:03:56,025 --> 00:04:00,081 you basically can just introduce extra stalls, or extra dead instructions, going 53 00:04:00,081 --> 00:04:03,067 down the pipe, and you're not actually using them. 54 00:04:03,085 --> 00:04:05,078 And that's not necessarily great.
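The cost of those dead slots can be estimated with a small fetch-only model. This is a sketch under stated assumptions, not the lecture's example: the `TRACE` of executed instruction addresses is made up for illustration, the machine fetches one aligned 8-byte half block (two 4-byte instructions) per cycle, and every other source of stalls is ignored. The name `simulate_fetch` is hypothetical.

```python
def simulate_fetch(trace, half_block_bytes=8, insn_bytes=4):
    """Count fetch cycles and dead slots for a list of executed PCs.

    Each cycle fetches the aligned half block containing the next PC.
    Only instructions that are sequential and inside that half block
    are used; the remaining slots go down the pipe dead.
    """
    if not trace:
        raise ValueError("trace must contain at least one instruction")
    width = half_block_bytes // insn_bytes
    cycles = 0
    dead = 0
    i = 0
    while i < len(trace):
        base = trace[i] - (trace[i] % half_block_bytes)
        used = 1
        while (i + used < len(trace)
               and trace[i + used] == trace[i + used - 1] + insn_bytes
               and trace[i + used] < base + half_block_bytes):
            used += 1
        cycles += 1
        dead += width - used
        i += used
    return cycles, dead, cycles / len(trace)


# Hypothetical trace: jumps after 0x008, 0x104 and 0x208 land at the start
# of a block, in the middle of a block, and at the end of a block.
TRACE = [0x000, 0x004, 0x008, 0x100, 0x104, 0x204, 0x208, 0x30C, 0x310, 0x314]
cycles, dead, cpi = simulate_fetch(TRACE)
```

For this made-up trace the model needs 7 fetch cycles for 10 instructions and sends 4 dead slots down the pipe, so the fetch-limited clocks per instruction is 0.7 rather than the 0.5 a two-wide machine could reach with perfectly aligned code.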