1
00:00:02,050 --> 00:00:05,048
Okay.
So now, we're going to change topics and

2
00:00:05,048 --> 00:00:09,013
start talking about our first technical
subject of this course.

3
00:00:09,013 --> 00:00:14,253
And, as an introduction to computer
architecture, we're going to be talking

4
00:00:14,253 --> 00:00:17,905
about what is architecture versus
microarchitecture.

5
00:00:17,905 --> 00:00:22,031
And, I want to just briefly say that, as
you take this class, the first three

6
00:00:22,031 --> 00:00:25,585
lectures or so should be review.
So, if you're sitting in the class and

7
00:00:25,585 --> 00:00:27,528
you're saying, oh, I've seen all this
before.

8
00:00:27,528 --> 00:00:30,763
Don't get up.
Wait 'till the fourth or fifth lecture,

9
00:00:30,763 --> 00:00:34,087
and then the content will become new.
And this is because I want to teach

10
00:00:34,087 --> 00:00:38,020
everything from first principles and get
everyone up to speed.

11
00:00:38,020 --> 00:00:41,052
But, it's that, the first three lectures
are going to go very fast.

12
00:00:41,052 --> 00:00:45,049
So, if you're lost in the first three
lectures, which should be review, then

13
00:00:45,049 --> 00:00:50,897
that's probably a bad indicator.
So, we'll start off by talking about

14
00:00:50,897 --> 00:00:57,858
architecture versus micro-architecture.
And I wanted to say briefly what I mean by

15
00:00:57,858 --> 00:01:01,972
architecture.
And I, I have, in this slide here, a very

16
00:01:01,972 --> 00:01:06,622
large A for what I'll sometimes call, big
A architecture.

17
00:01:06,622 --> 00:01:13,881
So, Patterson and Hennessy call this the
instruction set architecture, and I

18
00:01:13,881 --> 00:01:20,098
contrast this with microarchitecture, or what
Patterson and Hennessy call organization.

19
00:01:21,054 --> 00:01:26,007
So, big A architecture is an abstraction
layer provided to software, or

20
00:01:26,007 --> 00:01:30,081
instruction set architecture, is an
abstraction layer provided to software

21
00:01:30,081 --> 00:01:37,051
which is designed to not change very much.
And, it says how a

22
00:01:37,051 --> 00:01:43,084
theoretical fundamental, sort of, machine
executes programs.

23
00:01:44,023 --> 00:01:52,205
It does not say exactly the size of
different structures, how fast those

24
00:01:52,205 --> 00:01:58,350
things would run, or the exact implementation
issues; that falls into organization.

25
00:01:58,350 --> 00:02:04,408
And, one of the things I wanted to
emphasize is that computer architecture is

26
00:02:04,408 --> 00:02:08,422
all about trade-offs.
So, when I say it's all about tradeoffs,

27
00:02:08,422 --> 00:02:12,960
you can make different design decisions up
here in the big A architecture or the

28
00:02:12,960 --> 00:02:17,840
instruction set architecture, and that'll
influence the application or influence the

29
00:02:17,840 --> 00:02:22,299
microarchitecture, but also you can make
different design decisions down here and

30
00:02:22,299 --> 00:02:26,278
make a lot of different tradeoffs on how
to go about implementing a particular

31
00:02:26,278 --> 00:02:30,044
instruction set architecture.
And, largely, when you go to look at

32
00:02:30,044 --> 00:02:34,055
computer architecture and computer
architecture implementation, the design

33
00:02:34,055 --> 00:02:38,010
space is relatively flat.
There's sort of an optimum point where

34
00:02:38,010 --> 00:02:42,032
you, you want to be, but the other points
around it are many times not horribly,

35
00:02:42,032 --> 00:02:45,027
horribly bad.
Though there are, you know, at the, at the

36
00:02:45,027 --> 00:02:47,093
extremes, probably horribly bad design
decisions.

37
00:02:47,093 --> 00:02:52,333
But, you know, a lot of different design
points are, are equally good or, or close

38
00:02:52,333 --> 00:02:56,192
to optimal.
And, the job of a computer architect is to

39
00:02:56,192 --> 00:03:01,554
make the very subtle design decisions
around how do you move around this point

40
00:03:01,554 --> 00:03:06,563
to make it easier to program, have it live
on for many years, keep it low power, and this

41
00:03:06,563 --> 00:03:12,123
sort of other, a little bit of aesthetic
characteristics mixed together with just

42
00:03:12,123 --> 00:03:15,657
making your computer processor go fast,
we'll say.

43
00:03:15,657 --> 00:03:20,790
And these tradeoffs, I will
reiterate this over and over again in this

44
00:03:20,790 --> 00:03:25,045
class, because there are multiple
different metrics.

45
00:03:25,045 --> 00:03:29,507
So, for instance, speed, energy, cost, and
they tradeoff against each other, many

46
00:03:29,507 --> 00:03:33,009
times, there is not necessarily an optimal
point.

47
00:03:33,009 --> 00:03:38,024
It depends on, you know, are you more cost
driven, or energy driven, or speed driven.

48
00:03:38,024 --> 00:03:43,052
And, within that, there are sort of
sometimes Pareto-optimal curves where all

49
00:03:43,052 --> 00:03:48,074
of the points are, are equally good if
you're trying to trade off these different

50
00:03:48,074 --> 00:03:50,639
things for different cost models.
Okay.
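To make the Pareto idea concrete, here is a minimal sketch in Python. The design points and their numbers are made up for illustration and are not from the lecture; lower is better on every metric.
```python
from typing import NamedTuple
class Design(NamedTuple):
    name: str
    delay: float   # time per task, lower is better
    energy: float  # lower is better
    cost: float    # lower is better
def dominates(a: Design, b: Design) -> bool:
    """True if a is no worse than b on every metric and strictly better on at least one."""
    a_metrics, b_metrics = a[1:], b[1:]
    no_worse = all(x <= y for x, y in zip(a_metrics, b_metrics))
    better = any(x < y for x, y in zip(a_metrics, b_metrics))
    return no_worse and better
def pareto_front(designs):
    """Keep only the designs that no other design dominates."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]
# Hypothetical design points (assumed values, for illustration only).
designs = [
    Design("fast", 1.0, 9.0, 8.0),
    Design("balanced", 2.0, 4.0, 4.0),
    Design("frugal", 5.0, 1.0, 1.0),
    Design("wasteful", 5.0, 9.0, 8.0),
]
front_names = [d.name for d in pareto_front(designs)]
print(front_names)
```
Only the "wasteful" point drops out, because "fast" is at least as good on every metric. Which of the three remaining points you pick depends on whether you are speed, energy, or cost driven.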

51
00:03:50,639 --> 00:03:56,434
So, let's talk about what is an
instruction set architecture, and what is

52
00:03:56,434 --> 00:04:01,599
a microarchitecture.
So, an instruction set architecture, or big

53
00:04:01,599 --> 00:04:08,091
A architecture is trying to provide the
programmer some abstract machine model.

54
00:04:08,091 --> 00:04:13,964
And many times what it really
boils down to is all the programmer

55
00:04:13,964 --> 00:04:18,103
visible state.
So, for instance, does the machine

56
00:04:18,103 --> 00:04:21,003
have memory?
Does it have registers?

57
00:04:21,003 --> 00:04:24,868
So that's the, that's the programmer
visible state.

58
00:04:24,868 --> 00:04:29,921
It also encompasses the fundamental
operations that the computer can run, so

59
00:04:29,921 --> 00:04:35,063
these are called instructions.
And, it defines the instructions and how

60
00:04:35,063 --> 00:04:37,348
they operate.
So, for instance, add.

61
00:04:37,348 --> 00:04:42,782
Add might be a fundamental instruction or
fundamental operation in your

62
00:04:42,782 --> 00:04:49,063
instruction set architecture.
And, it specifies the exact semantics of how

63
00:04:49,063 --> 00:04:55,386
to take one word in a register and add it
to another word in a register, and where

64
00:04:55,386 --> 00:05:00,068
it ends up.
Then, there's more complicated execution

65
00:05:00,068 --> 00:05:04,053
semantics.
So, what do we mean by execution

66
00:05:04,053 --> 00:05:07,236
semantics?
Well, if you just say adds take two

67
00:05:07,236 --> 00:05:11,814
numbers and add them together and put them
in another register, that many times does

68
00:05:11,814 --> 00:05:15,033
not encompass all of the instruction set
architecture.

69
00:05:15,033 --> 00:05:19,061
You'll have other things going on, for
instance, IO interrupts, and you have to

70
00:05:19,061 --> 00:05:23,023
define in your instruction set
architecture, or your big A computer

71
00:05:23,023 --> 00:05:26,792
architecture what are the exact semantics
of an interrupt, an instruction, or a

72
00:05:26,792 --> 00:05:31,036
piece of data coming in on an IO.
How does that interact with the rest of

73
00:05:31,036 --> 00:05:34,025
the processor?
So, many times instruction execution

74
00:05:34,025 --> 00:05:38,049
semantics is only half of it, and what we have
to worry about is the rest of the

75
00:05:38,049 --> 00:05:44,032
machine execution semantics.
Big A architecture has to define how the

76
00:05:44,032 --> 00:05:49,307
inputs and outputs work.
And finally, it has to define the data

77
00:05:49,307 --> 00:05:54,387
types and the sizes of the
fundamental data words that you

78
00:05:54,387 --> 00:05:58,248
operate on.
So, for instance, if you operate on a byte

79
00:05:58,248 --> 00:06:01,902
at a time, four bytes at a time, two bytes
at a time.

80
00:06:01,902 --> 00:06:05,763
How big is a byte if you actually have
bytes?

81
00:06:05,763 --> 00:06:12,044
So, this just gets into sizes.
And then, data types here might mean that

82
00:06:12,044 --> 00:06:17,850
you have other types of fundamental data.
So, for instance, the most basic one is

83
00:06:17,850 --> 00:06:23,232
you have just some bits sitting in
a register in your processor.

84
00:06:23,232 --> 00:06:28,996
But, it could be much more complex so you
can have, for instance, something like

85
00:06:28,996 --> 00:06:33,732
floating point numbers.
Where it's not just a bunch of bits, it's

86
00:06:33,732 --> 00:06:38,195
bits formatted in a particular way, and
has very specific meaning.

87
00:06:38,195 --> 00:06:43,971
That's a floating point number that can
range over, let's say, most of the

88
00:06:43,971 --> 00:06:45,480
real numbers.
Okay.

89
00:06:45,480 --> 00:06:51,000
So, in today's lecture, we're going to
step through all these different

90
00:06:51,081 --> 00:06:56,076
characteristics and requirements of
building an instruction set architecture.

91
00:06:56,076 --> 00:07:02,002
I will talk about how it's
different than microarchitecture or

92
00:07:02,002 --> 00:07:05,004
organization.
So, let's take up some examples of

93
00:07:05,004 --> 00:07:10,025
microarchitecture and organization.
So, what microarchitecture and

94
00:07:10,025 --> 00:07:15,500
organization is really thinking about here
is the tradeoffs as you're going to

95
00:07:15,500 --> 00:07:19,093
implement a fixed instruction set
architecture.

96
00:07:19,093 --> 00:07:26,047
So, for instance, something like Intel's
x86 is an instruction set architecture.

97
00:07:26,047 --> 00:07:30,000
And there are many different
microarchitecture implementations.

98
00:07:30,000 --> 00:07:34,034
There's the AMD versions of the chips, and
then there's the Intel versions of the

99
00:07:34,034 --> 00:07:37,372
chips, and even inside of, let's say, the
Intel versions of the chips.

100
00:07:37,372 --> 00:07:42,120
They have their high performance version
for the laptop which looks one way, or, or

101
00:07:42,120 --> 00:07:46,375
high performance version for, let's say, a
server or a high end laptop which looks

102
00:07:46,375 --> 00:07:48,518
one way.
And then, there's another chip for

103
00:07:48,518 --> 00:07:51,079
tablets.
Intel's trying to make chips for tablets these

104
00:07:51,079 --> 00:07:56,035
days and they have their Atom processors.
And, internally, they look very different

105
00:07:56,035 --> 00:07:59,062
because they have very different speed,
energy, and cost tradeoffs.

106
00:07:59,098 --> 00:08:05,864
But, they'll execute the same code, and
they all implement the same instruction

107
00:08:05,864 --> 00:08:09,750
set architecture.
So, let's look at some examples of things

108
00:08:09,750 --> 00:08:14,010
that you might trade off in a
microarchitecture.

109
00:08:14,010 --> 00:08:18,224
So, you might have different pipeline
depths or numbers of pipelines.

110
00:08:18,224 --> 00:08:24,459
So, you might have one processor pipeline,
or you might have six, like something

111
00:08:24,459 --> 00:08:30,677
like the Core i7s today. Cache sizes, how
big the chip is, the silicon area,

112
00:08:30,677 --> 00:08:34,011
what's your peak power.
Execution ordering.

113
00:08:34,011 --> 00:08:38,059
Well, does the code run in order, or can
you execute the code out of order?

114
00:08:38,059 --> 00:08:41,061
That's right.
It is possible to take a sequential

115
00:08:41,061 --> 00:08:46,016
program, and actually execute later
portions of the program before earlier

116
00:08:46,016 --> 00:08:50,009
portions of the program.
That's kind of mind boggling, but it's a

117
00:08:50,009 --> 00:08:54,575
way to go about getting parallelism.
And if you keep your ordering correct,

118
00:08:54,575 --> 00:08:58,657
things work out.
Bus widths, ALU widths: if you

119
00:08:58,657 --> 00:09:03,523
have, let's say, a 64-bit machine, you can
actually go and implement that as a bunch

120
00:09:03,523 --> 00:09:08,095
of 1-bit adders, for instance, and people
have done things like that in the micro

121
00:09:08,095 --> 00:09:11,185
architecture.
And, this allows you to build more

122
00:09:11,185 --> 00:09:15,024
expensive or less expensive versions of
the same processor.

123
00:09:16,077 --> 00:09:22,048
So, let's talk about the history of why we
came up with these two differentiations

124
00:09:22,048 --> 00:09:25,048
between architecture and
microarchitecture.

125
00:09:25,084 --> 00:09:31,346
And, it came about because software
sort of pushed it on us, and it ended up

126
00:09:31,346 --> 00:09:40,113
being a nice abstraction layer.
So, back in the early '50s, late '40s, you

127
00:09:40,113 --> 00:09:45,997
had software that people mostly programmed
either in assembly language, or machine

128
00:09:45,997 --> 00:09:48,720
code language.
So, you had to write ones and zeros, or

129
00:09:48,720 --> 00:09:52,508
you had to write assembly code.
And, sometime in the, the mid '50s we

130
00:09:52,508 --> 00:09:57,055
started to see libraries show up.
So, these are sort of, floating point

131
00:09:57,055 --> 00:10:01,588
operations were made easier, we had
transcendentals such as sine and cosine

132
00:10:01,588 --> 00:10:05,166
libraries, you had some matrix and
equation solvers.

133
00:10:05,166 --> 00:10:10,163
And, you started to see some libraries
that people could call, but people were

134
00:10:10,163 --> 00:10:15,070
not necessarily writing code by themselves
or writing large bodies of code in

135
00:10:15,070 --> 00:10:18,794
assembly programming because it was
pretty painful.

136
00:10:18,794 --> 00:10:24,321
And then, at some point, there was the
invention of higher-level languages.

137
00:10:24,321 --> 00:10:30,233
So, a good example of this was Fortran
that came out in 1956, and a lot of things

138
00:10:30,233 --> 00:10:34,434
came along with this.
We had assemblers, loaders, linkers,

139
00:10:34,434 --> 00:10:40,281
compilers, bunch of other software to
track how your software's being used even.

140
00:10:40,281 --> 00:10:47,294
And, because we started to see these
higher-level languages, this started to

141
00:10:47,294 --> 00:10:54,065
give some portability to programming.
It wasn't that you had to write your

142
00:10:54,065 --> 00:10:58,046
program and have it only mapped to one
one processor ever.

143
00:10:58,091 --> 00:11:03,191
And, back in the '50s, even '60s
time frame here, machines required

144
00:11:03,191 --> 00:11:06,989
experienced operators who could write the
programs.

145
00:11:06,989 --> 00:11:12,485
And, you know, you, you got these machines
and they had to be sold with a lot of

146
00:11:12,485 --> 00:11:17,946
software along with them so you had to,
basically, run all the software that was

147
00:11:17,946 --> 00:11:23,065
given, because you had to be a
master programmer or someone who worked

148
00:11:23,065 --> 00:11:28,230
for the company that built the
machines to even be able to program these

149
00:11:28,230 --> 00:11:33,398
machines back in, in the day.
And, the idea of instruction set

150
00:11:33,398 --> 00:11:38,935
architectures, and this breaking of the
microarchitecture from the architecture

151
00:11:38,935 --> 00:11:46,035
didn't really exist back then.
And, back in the early '60s, IBM had four

152
00:11:46,035 --> 00:11:51,154
different product lines.
And, they were all incompatible.

153
00:11:51,154 --> 00:11:55,287
So, you couldn't run code that you ran on
one on the other.

154
00:11:55,287 --> 00:12:00,574
So, to give you an example here, the, the
IBM 701 was for scientific computing.

155
00:12:00,574 --> 00:12:03,996
The, the 1401 was mostly for business
computation.

156
00:12:03,996 --> 00:12:09,860
I think they even had a second one that
was sort of for business, but different

157
00:12:09,860 --> 00:12:14,459
types of business computation.
And, people sort of, bought into a line.

158
00:12:14,459 --> 00:12:19,164
And then, as you, as the line matured and
developed, they had to either rewrite

159
00:12:19,164 --> 00:12:21,801
their code, or they had to stick into one
line.

160
00:12:21,801 --> 00:12:26,957
But, IBM had a crazy insight
here, which is that they didn't want to have to,

161
00:12:26,957 --> 00:12:31,233
when they went to the next generation of
processor, they didn't want to propagate

162
00:12:31,233 --> 00:12:34,613
these four lines.
They wanted to try to unify the four

163
00:12:34,613 --> 00:12:37,493
lines.
But, one of the problems was, these

164
00:12:37,493 --> 00:12:42,567
different lines had very different
implementations and different cost

165
00:12:42,567 --> 00:12:45,390
points.
So, the thing you were building for

166
00:12:45,390 --> 00:12:50,496
scientific computing wasn't necessarily
the thing you want to build for business

167
00:12:50,496 --> 00:12:54,041
computing.
And, the one that you built for business

168
00:12:54,041 --> 00:12:59,707
computing, let's say, didn't
need to have very good floating

169
00:12:59,707 --> 00:13:03,686
point performance.
So, how did they go about solving

170
00:13:03,686 --> 00:13:06,263
this?
And their solution was they came up with

171
00:13:06,263 --> 00:13:11,337
something called the IBM 360.
And, the IBM 360 is probably the first

172
00:13:11,337 --> 00:13:17,325
true instruction set architecture that was
designed to be an instruction set

173
00:13:17,325 --> 00:13:20,831
architecture.
And, the idea here was they wanted to

174
00:13:20,831 --> 00:13:26,718
unify all these product lines into one
platform, but then implement different

175
00:13:26,718 --> 00:13:31,072
versions that were specialized for the
different market segments.

176
00:13:32,012 --> 00:13:37,096
So, they could unify a lot
of their software system, unify a lot of

177
00:13:37,096 --> 00:13:41,064
what they built, but still build different
versions.

178
00:13:41,064 --> 00:13:46,677
So, let's, let's take a look at the IBM
360 Instruction Set Architecture, and then

179
00:13:46,677 --> 00:13:52,048
talk about different microarchitectures
that have been built for the IBM 360.

180
00:13:53,015 --> 00:13:58,052
So, the IBM 360 is a general purpose
register machine, and we'll talk more

181
00:13:58,052 --> 00:14:04,019
about that later in this lecture.
But, to give you an idea, this is what the

182
00:14:04,019 --> 00:14:07,027
programmer saw, or what the software
system saw.

183
00:14:07,027 --> 00:14:12,079
This isn't what was actually built in the
hardware, because that would be a

184
00:14:12,079 --> 00:14:17,086
microarchitecture constraint.
But, the processor state had sixteen

185
00:14:17,086 --> 00:14:22,267
general purpose 32-bit registers.
It had four floating point registers.

186
00:14:22,859 --> 00:14:30,051
It had control flags, if you will; it had
condition codes and control flags.

187
00:14:30,051 --> 00:14:34,200
And, it was a 24-bit address machine, and
at the time that was huge.
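As a rough sketch of that programmer-visible state, here it is modeled in Python. The class and field names are hypothetical, the floating point registers are stood in for by Python floats, and the condition codes are collapsed to a single integer; this is a teaching model, not IBM's definition.
```python
from dataclasses import dataclass, field
WORD_MASK = 0xFFFFFFFF             # general purpose registers are 32 bits wide
ADDRESS_BITS = 24                  # 24-bit addressing
ADDRESS_SPACE = 1 << ADDRESS_BITS  # 16,777,216 addressable bytes
@dataclass
class ProgrammerVisibleState:
    gpr: list = field(default_factory=lambda: [0] * 16)   # sixteen general purpose registers
    fpr: list = field(default_factory=lambda: [0.0] * 4)  # four floating point registers (simplified)
    condition_code: int = 0                                # stand-in for condition codes and flags
    def write_gpr(self, index: int, value: int) -> None:
        """Store a value, truncated to the 32-bit register width."""
        self.gpr[index] = value & WORD_MASK
def effective_address(value: int) -> int:
    """Keep only the low 24 bits, as a 24-bit address machine would."""
    return value & (ADDRESS_SPACE - 1)
state = ProgrammerVisibleState()
state.write_gpr(3, 0x1_2345_6789)  # the 33rd bit does not fit in the register
print(hex(state.gpr[3]), ADDRESS_SPACE)
```
Nothing here says how many bits the hardware moves at once or how fast it runs; that is left to the microarchitecture.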

188
00:14:34,200 --> 00:14:39,856
So, two to the 24 was a very large number.
Nowadays, it's not so large and they've

189
00:14:39,856 --> 00:14:42,982
since expanded that on the IBM 360
successors.

190
00:14:42,982 --> 00:14:48,381
But, they thought it was good for many,
many years, and it was good for many, many

191
00:14:48,381 --> 00:14:52,064
years.
And they defined a bunch of different data

192
00:14:52,064 --> 00:14:55,059
formats.
So, there's 8-bit bytes, 16-bit half

193
00:14:55,059 --> 00:15:01,016
words, 32-bit words, 64-bit double words.
And these were the fundamental data types

194
00:15:01,016 --> 00:15:06,045
that you can work on, and you can name
these different fundamental data types.

195
00:15:06,045 --> 00:15:12,001
And, it was actually the IBM 360 that came
up with this idea that bytes should be

196
00:15:12,001 --> 00:15:18,106
8 bits long, and that's lived on
to today, because before that, we had lots of

197
00:15:18,106 --> 00:15:24,033
different choices.
There were binary-coded decimal systems

198
00:15:24,033 --> 00:15:29,095
where you actually would encode a
number between zero and nine, and then you

199
00:15:29,095 --> 00:15:34,048
have each digit, and this is
sometimes good for, sort of, spreadsheet

200
00:15:34,048 --> 00:15:39,045
calculations, or business calculations, or
if you want to be very precise on your

201
00:15:39,045 --> 00:15:43,080
rounding to the penny.
And sometimes, bit-based things don't

202
00:15:43,080 --> 00:15:47,077
actually round appropriately,
or you'll lose pennies off the end.
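Here is a small Python sketch of the penny problem and of packing decimal digits. The function name to_packed_bcd is hypothetical, it assumes a non-negative integer, and the two-digits-per-byte layout is a generic illustration rather than the exact IBM 360 decimal format.
```python
from decimal import Decimal
# Three ten-cent charges added in binary floating point versus decimal arithmetic.
binary_total = sum([0.10] * 3)
decimal_total = sum([Decimal("0.10")] * 3, Decimal("0"))
print(binary_total)   # 0.30000000000000004
print(decimal_total)  # 0.30
def to_packed_bcd(n: int) -> bytes:
    """Encode a non-negative integer with one decimal digit per 4-bit nibble."""
    digits = str(n)
    if len(digits) % 2:
        digits = "0" + digits  # pad to a whole number of bytes
    return bytes((int(digits[i]) << 4) | int(digits[i + 1])
                 for i in range(0, len(digits), 2))
print(to_packed_bcd(1995).hex())  # 1995
```
Because each nibble holds exactly one decimal digit, the hex dump of the encoded value reads the same as the decimal number.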

203
00:15:47,077 --> 00:15:51,631
And, so you had these binary-coded decimal
systems and, well, in the IBM 360, they

204
00:15:51,631 --> 00:15:56,594
unified it all and said, well, no, we're
going to throw out certain things and make

205
00:15:56,594 --> 00:15:59,867
choices.
Now, they, of course, because it's the IBM

206
00:15:59,867 --> 00:16:04,187
360 and they did have business
applications, they still supported

207
00:16:04,187 --> 00:16:09,998
binary-coded decimal in a certain way.
And, let's look at the microarchitecture

208
00:16:09,998 --> 00:16:14,700
implementations of this first instruction
set architecture.

209
00:16:14,700 --> 00:16:20,105
So, and this is in the same time
frame, the same generation here.

210
00:16:20,105 --> 00:16:25,587
There was the model 30 and the model 70,
and these had very, very different

211
00:16:25,587 --> 00:16:30,647
performance characteristics.
So, if we look at the machines, let's

212
00:16:30,647 --> 00:16:35,781
start off by looking at the storage.
The low end model here had between

213
00:16:35,781 --> 00:16:42,001
eight and 64 kilobytes, and the high end
model had between 256 and 512 kilobytes.

214
00:16:42,001 --> 00:16:47,007
So, very, very different sizes.
And, this is what I'm trying to get across

215
00:16:47,007 --> 00:16:51,955
here is that microarchitecture can
actually change quite a bit even though

216
00:16:51,955 --> 00:16:57,698
the architecture supports 64-bit
additions, you can actually implement

217
00:16:57,698 --> 00:17:02,038
different size data paths.
So, in the low end machine, they had an

218
00:17:02,038 --> 00:17:07,544
8-bit data path, and for instructions that use a
64-bit operation, it had to do eight

219
00:17:07,545 --> 00:17:10,582
8-bit operations to make up a 64-bit
operation.
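As a minimal sketch of that idea in Python, here is a 64-bit add built from eight passes through an 8-bit adder, with the carry passed from one pass to the next. The function name is hypothetical and this only models the arithmetic, not the actual model 30 control sequence.
```python
MASK64 = (1 << 64) - 1
def add64_with_8bit_datapath(a: int, b: int) -> int:
    """Add two 64-bit values one byte at a time, lowest byte first."""
    result = 0
    carry = 0
    for step in range(8):                   # eight passes through the 8-bit adder
        shift = 8 * step
        a_byte = (a >> shift) & 0xFF
        b_byte = (b >> shift) & 0xFF
        total = a_byte + b_byte + carry     # at most 0xFF + 0xFF + 1, so the carry is 0 or 1
        result |= (total & 0xFF) << shift   # keep the low 8 bits of this slice
        carry = total >> 8                  # hand the carry to the next slice
    return result                           # the final carry out is dropped, as in a 64-bit wrap
print(hex(add64_with_8bit_datapath(0xFF, 0x01)))  # 0x100
```
Software sees the same 64-bit result either way; only the number of steps taken inside the machine changes.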

220
00:17:10,582 --> 00:17:15,801
And then, probably, actually even do more
than that to handle all the carries

221
00:17:15,801 --> 00:17:20,746
correctly, whereas the high-end
implementation had a full-width adder there.

222
00:17:20,746 --> 00:17:26,993
It could actually do a 64-bit add by itself
without having to do lots of

223
00:17:26,993 --> 00:17:32,635
micro-sequenced operations.
And, oh, yes, with minor modifications, it

224
00:17:32,635 --> 00:17:36,645
lives on today.
So, this was designed in the '60s, and

225
00:17:36,645 --> 00:17:40,867
even today we still have System 360
derivative machines.

226
00:17:40,867 --> 00:17:47,247
And the piece of code you ran, or you
wrote back in 1965, will still run on

227
00:17:47,247 --> 00:17:52,377
these machines today, which is pretty,
pretty amazing, natively.

228
00:17:52,377 --> 00:18:00,248
So, how does this survive on today?
So, here's actually, the IBM 360 47 years

229
00:18:00,248 --> 00:18:08,702
later, as the Z11 microprocessor.
So, the IBM 360 has since been renamed to

230
00:18:08,702 --> 00:18:14,899
the IBM 370, and then it has been renamed
to the IBM 370EX, which was in the '80s.

231
00:18:14,899 --> 00:18:18,544
There was never any IBM 380, strangely
enough.

232
00:18:18,544 --> 00:18:23,571
And then, later on, they just changed the
name to the Z series.

233
00:18:23,571 --> 00:18:29,256
So, they have cooler model
numbers here, so we have the IBM Z series

234
00:18:29,256 --> 00:18:35,351
processors, and this lives on today.
So, going back to that 8-bit processor

235
00:18:35,351 --> 00:18:42,019
which had a one microsecond control store
read, which is forever, we now have the

236
00:18:42,019 --> 00:18:47,753
Z11 which is running at 5.2 gigahertz.
It has 1.4 billion transistors.

237
00:18:47,753 --> 00:18:55,028
They have updated the addressing so
it's no longer 24-bit addressing, but it

238
00:18:55,028 --> 00:18:59,037
still supports the original 360
addressing.

239
00:18:59,037 --> 00:19:08,027
It has four cores, out of order issue, an out
of order memory system, big caches on

240
00:19:08,027 --> 00:19:14,405
chip, 24 megabytes of L3 cache.
And, you can even put multiple of these

241
00:19:14,405 --> 00:19:20,771
together to build a multiprocessor system
out of lots and lots of multicores.

242
00:19:20,771 --> 00:19:26,364
And, what I'm trying to get across here is
that, if you go forward over time and you

243
00:19:26,364 --> 00:19:29,092
build your instruction set architecture
correctly, it can live on.

244
00:19:29,092 --> 00:19:33,405
And you can have many different
microarchitecture implementations and

245
00:19:33,405 --> 00:19:41,245
still leverage the same software.
And, a few more examples just to

246
00:19:41,245 --> 00:19:45,082
reinforce this a little bit more.
Let's take a look at an example of

247
00:19:45,082 --> 00:19:49,079
something where you have the same
architecture but different

248
00:19:49,079 --> 00:19:54,036
microarchitectures.
So, here we have the AMD Phenom X4, and

249
00:19:54,036 --> 00:19:57,517
here we have the Intel Atom
processor.

250
00:19:57,517 --> 00:20:03,482
The first Intel Atom processor.
And, what you'll notice, actually, is that

251
00:20:03,482 --> 00:20:07,912
they have the exact same instruction set
architecture.

252
00:20:07,912 --> 00:20:14,085
They both run x86 code.
And, the design implementations, this is,

253
00:20:14,085 --> 00:20:18,557
just to point out here, these are the same
time frames.

254
00:20:18,557 --> 00:20:23,063
So, these are, roughly,
modern day processors.

255
00:20:23,063 --> 00:20:29,083
This one has four cores, 125 watts.
Here, we have a single core, two watts.

256
00:20:29,083 --> 00:20:35,059
So, there's design tradeoffs.
So, you're going to want to build

257
00:20:35,059 --> 00:20:42,083
different processors in the same design
technology, we'll say, but with very

258
00:20:42,083 --> 00:20:46,223
different cost, power, performance
tradeoffs.

259
00:20:46,223 --> 00:20:52,666
This one can decode three instructions.
This one can decode two instructions so

260
00:20:52,666 --> 00:20:55,803
it's a microarchitecture
difference.

261
00:20:55,803 --> 00:21:00,751
This one has a 64 kilobyte L1 instruction
cache, versus a 32 kilobyte L1 instruction cache.

262
00:21:00,751 --> 00:21:06,436
Very different cache sizes, even though
they're implementing the same architecture,

263
00:21:06,436 --> 00:21:10,951
or big A architecture.
Strangely enough, they have the same L2

264
00:21:10,951 --> 00:21:15,653
size, you know, things happen.
This one's out of order versus in order,

265
00:21:15,653 --> 00:21:24,263
and clock speeds are very different.
And, I want to contrast this with

266
00:21:24,263 --> 00:21:30,888
different architecture, or different big A
architecture, and different micro

267
00:21:30,888 --> 00:21:34,809
architecture.
So, if we think about some different

268
00:21:34,809 --> 00:21:40,351
examples of instruction set architectures,
there's x86, there's PowerPC, there's IBM

269
00:21:40,351 --> 00:21:45,536
360, there's Alpha, there's ARM.
You've probably heard all these different

270
00:21:45,536 --> 00:21:49,068
names, and these are different instruction
set architectures.

271
00:21:49,068 --> 00:21:54,093
So, you can't run the same software on
those two different instruction set

272
00:21:54,093 --> 00:21:58,020
architectures.
So, here we have an example of two

273
00:21:58,020 --> 00:22:03,059
different instruction set architectures
with two different microarchitectures.

274
00:22:03,059 --> 00:22:08,998
So, we have the Phenom X4 here, versus the
IBM POWER7.

275
00:22:09,001 --> 00:22:14,176
And, we already talked about the X4
here, but the POWER7 has the Power

276
00:22:14,176 --> 00:22:18,863
instruction set, which is different than
the x86 instruction set.

277
00:22:18,863 --> 00:22:24,621
So, you can't run one piece of code that's
compiled for this over here, and vice

278
00:22:24,621 --> 00:22:28,866
versa.
And, the microarchitectures are different.

279
00:22:28,866 --> 00:22:33,557
So, here, we have eight core, 200 watts,
can decode six instructions per cycle.

280
00:22:33,557 --> 00:22:39,325
Wow, this is a, a pretty beefy processor.
It's also out of order and has the same

281
00:22:39,325 --> 00:22:43,544
clock frequency.
Something that can also happen is

282
00:22:43,544 --> 00:22:48,757
you can end up with architectures where
you have different instruction set

283
00:22:48,757 --> 00:22:52,821
architecture, or different big A
architecture, but almost the same

284
00:22:52,821 --> 00:22:56,481
microarchitecture.
And, this, this does, this does happen.

285
00:22:56,481 --> 00:23:01,779
So, you end up with, let's say, two
processors that are both three wide issue,

286
00:23:01,779 --> 00:23:07,044
same cache sizes, but, let's say, one of
them implements PowerPC and the other one

287
00:23:07,044 --> 00:23:10,572
implements x86.
And things like that do happen.

288
00:23:10,572 --> 00:23:15,364
That's more of a coincidence, but I'm
trying to get across the idea that many

289
00:23:15,364 --> 00:23:19,933
times the microarchitectures can
be the same, and those are more tradeoff

290
00:23:19,933 --> 00:23:31,889
considerations versus the instruction set
architecture which is more of a software

291
00:23:31,889 --> 00:23:39,075
programming design constraint.