So now we're going to start talking about the fundamental things inside of an instruction set architecture, the things you need to build inside of an instruction set architecture. And we're going to start off by talking about machine models. So I have a question here: where do operands come from, and where do results go to? When I say operands, these are the data values you're going to operate on with a single instruction of the processor, and the results are the data that gets calculated; where does that data go? And this is actually an instruction set architecture concern, a big-A Architecture concern.

Fundamentally, you're going to have some form of arithmetic logic unit, or some sort of calculation unit, and you're going to have some type of memory storage. And in a processor, you're probably going to want to take stuff from the storage, move it to the ALU, compute on it, and then put it back into storage somehow. The processor here is what we wrap around the ALU, and the machine model asks: what is the organization of the registers (not the implementation, but the semantics)? How do you go about accessing memory? What sorts of instructions and operations are allowed?
And we're going to talk a little about where the operands come from, where the results go, and the different machine models that people have built. So these are different instruction set architectures, and even more fundamental than instruction set architectures, because instruction set architectures also cover how you encode instructions. This is a little more fundamental: how do you go about reasoning about moving data from memory to the ALU and back, and how do you store data close to the ALU?

So let's start off with a very simplistic example here, a very simplistic processor that people have actually built. Believe it or not, you don't have to have registers in your processor; or rather, you don't have to have named registers in your instruction set architecture. Instead, you can have a stack. A stack is just a storage element where you put things onto the stack and they come off in last-in, first-out order: the last one that was put on gets taken off first. In a very basic sense, we just take the top two things on the stack, fetch both of them, operate on them, and put the result on what becomes the top of the stack. Building up from there, we can think about something like an accumulator architecture.
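That stack discipline can be sketched in a few lines of Python (the language is just for illustration; the lecture is about hardware machines):

```python
# A binary operation on a stack machine pops its two operands and
# pushes its result, so the instruction itself names no operands.
def binop(stack, op):
    b = stack.pop()               # last thing pushed comes off first (LIFO)
    a = stack.pop()
    stack.append(op(a, b))

stack = [7, 5]                    # state after "push 7" then "push 5"
binop(stack, lambda a, b: a + b)  # "add" names nothing at all
print(stack)                      # [12]
```

The key point is that `binop` takes no operand names: both sources and the destination are implied by the stack itself.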
So typically in an accumulator architecture, you have one accumulator, so there's only one register, and one of the operands for every operation you do is always implicit: it has to come from the accumulator. The other one, let's say, can come from memory. So you have to name one of the operands in an architecture like that.

Building up from there, we can start thinking about register-memory operations, where you'll name, let's say, the source operand coming from memory, and you'll name an operand coming from your register file. And optionally, you may even name the destination. These are all options, and we'll call this a register-memory architecture. Finally, we have something like a register-register architecture. So, anyone want to wager a guess here: how many named operands do we have? Three. Okay, yeah. In this picture, we're going to take something from one register, something from another register, and we have to name where to go put the result. So what I wanted to point out here is the number of explicitly named operands. For some of these, it's a little more debatable than for others. As you see here, we have zero, and then something a little more complicated where we have one, two, or three.
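A minimal sketch of the accumulator idea, with made-up variable values: every ALU operation implicitly reads and writes the single accumulator register, so an instruction only ever names its one memory operand.

```python
# Hypothetical single-accumulator machine (illustrative, not a real ISA).
class AccumulatorMachine:
    def __init__(self, memory):
        self.memory = memory
        self.acc = 0              # the one architectural register

    def load(self, addr):         # acc <- mem[addr]
        self.acc = self.memory[addr]

    def add(self, addr):          # acc <- acc + mem[addr]; acc is implicit
        self.acc += self.memory[addr]

    def store(self, addr):        # mem[addr] <- acc
        self.memory[addr] = self.acc

m = AccumulatorMachine({"a": 3, "b": 4})
m.load("a")
m.add("b")       # only "b" is named; the other source is the accumulator
m.store("c")
print(m.memory["c"])   # 7
```

Notice that `add` takes exactly one name, which is the whole point: one operand per instruction is explicit, the rest is implied.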
Now, why do I say two or three here? Some of these architectures, or architectural models, don't necessarily name the result location; some of them name the result implicitly. In something like x86, for instance, the first source operand is always the destination, so an instruction only has to name two things. Something like MIPS, or most RISC architectures, will actually name all three. And you can think about that happening for both a memory-memory and a register-register architecture. A good example of a three-operand memory-memory architecture, which I don't have drawn here (I just have register-memory drawn here), but memory-memory, which basically says all your operands come from memory and all your destinations are in memory, is something like the VAX architecture, which was popular in the '70s. You could actually have all your operands come from memory, do some operation on them, and store the result back into memory.

So one interesting trade-off here is that once you start to have more operands explicitly named, you need more encoding space for that, and this is one of the important trade-offs. Okay. So let's go into a little bit of depth about stack-based architectures. One example of this is the Burroughs B5000.
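The encoding-space trade-off can be made concrete with some back-of-the-envelope arithmetic. The numbers below are illustrative assumptions, not taken from any particular ISA: 32 architectural registers and naive 32-bit absolute memory addresses.

```python
import math

# Rough encoding cost per instruction, counting only operand specifiers:
# each named register costs log2(num_registers) bits, while a named
# memory operand costs a full address field.
reg_bits = math.ceil(math.log2(32))     # 5 bits per register specifier
addr_bits = 32                          # naive absolute memory address

three_reg = 3 * reg_bits                # register-register, 3 named operands
one_reg_one_mem = reg_bits + addr_bits  # register-memory style
three_mem = 3 * addr_bits               # VAX-style memory-memory

print(three_reg, one_reg_one_mem, three_mem)   # 15 37 96
```

So a three-operand register-register instruction spends a modest 15 bits on operand names under these assumptions, while a fully general memory-memory instruction balloons, which is one reason memory-memory ISAs like the VAX had long, variable-length instructions.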
The Symbolics machines were stack-based too, and those are machines from the past. There are a few more modern examples of this. For instance, the Java Virtual Machine is actually a stack architecture. And then there's Intel's old floating-point system, x87. They've since deprecated it, but it's still a relatively modern stack-based instruction set architecture.

Let's take a stack-based instruction set architecture and look at how you can go about evaluating an expression. So here we have an expression: a plus b times c, all in parentheses, that whole quantity divided by the quantity a plus d times c, minus e. So it's some complex math operation that we need to do. Here, we actually break this down into a parse tree. You can see this is preserving the order of operations: we have a plus this sub-quantity here, b times c, and you take all of this and divide it by this subexpression here. And, a little bit of a throwback here, this is the order in which we're going to perform the operations. And if we do these operations on a stack machine model, we're going to get the right result. So what this means is: put a on the stack, put b on the stack, put c on the stack, then multiply b times c, and then add that result to a.
And you can see that we're going to build up this expression on a stack. So it's sort of a different way to think about problems, and we'll walk through an example here. So here we have an evaluation stack, and the top of the stack is going to be whatever is on top here in this picture. The first thing we're going to do is say push a, so a shows up on the stack. Then we're going to push b, we're going to push c, and then we're going to do a multiply. So we're going to take the two entries that are on the top of the stack, multiply them, and put the result into this entry on the stack. Then we're going to say add, and it's going to add the two top things on the stack and put the result here. And you can see that if you run something like this, you can actually do full computations on a very small stack. What's also nice here is you don't have to explicitly name any of the operands. So this machine model allows you to run real programs.

And one of the things I want to get across here is that the stack is part of the processor state, so it's part of the big-A Architecture, the instruction set architecture. You have a stack, and many times it's unbounded in the big-A architecture.
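The whole walkthrough can be sketched as a tiny interpreter. The program below is the lecture's push/operate sequence for (a + b*c) / (a + d*c - e); the values for a through e are made up for illustration.

```python
import operator

# Evaluate (a + b*c) / (a + d*c - e) on a stack machine: no instruction
# names an operand, everything works on the top of the stack.
vals = {"a": 2, "b": 3, "c": 4, "d": 1, "e": 2}
ops = {"*": operator.mul, "+": operator.add,
       "-": operator.sub, "/": operator.truediv}

program = ["a", "b", "c", "*", "+",   # numerator: a + b*c
           "a", "d", "c", "*", "+",   # a + d*c
           "e", "-",                  # ... minus e gives the denominator
           "/"]                       # numerator / denominator

stack = []
for insn in program:
    if insn in vals:                  # push: load a named value
        stack.append(vals[insn])
    else:                             # ALU op: pop two, push the result
        rhs, lhs = stack.pop(), stack.pop()
        stack.append(ops[insn](lhs, rhs))

print(stack)   # [3.5], i.e. (2 + 3*4) / (2 + 1*4 - 2) = 14/4
```

If you trace the loop, the stack never grows beyond four entries for this expression, which foreshadows the stack-depth discussion below.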
But in real life, it needs to be bounded somehow, because you can't have an infinitely long physical stack in your machine. So it's conceptually unbounded, but you probably want to have it overflow to main memory if you put too many things on the stack. And this is an important characteristic, because otherwise there are certain computations you can't do: if, for instance, the parse tree is too deep, your stack might grow too long. Okay.

So let's say we have a micro-architecture implementation of an instruction set architecture, a stack-based architecture, where the top two elements of the stack are kept in registers, the rest is in main memory, and it spills and fills. Well, each push has a memory reference: when you put something on the stack, it references memory. And each pop has a memory reference, because you need to put something back into main memory. But more than that, you might have extra memory references as the stack spills over into main memory, and you might have to pull things back in from main memory. So that's not very good, because you might end up with a push costing more than one memory reference.
So one optimization from a micro-architecture perspective: you could think about keeping N elements of your stack in registers very close to the processor, and only going to main memory when the register stack overflows or underflows. So instead of doing a memory reference basically every single time you do an operation, or every single time you do a push or a pop, you only do it when the stack depth gets too large and you can't fit everything in your register stack.

So let's walk through a brief example here. We have the same expression calculation that we were doing before. And you can see we can do push, push, push, multiply, add, push, push, push, multiply, add, push e, subtract, and do the divide at the end. But if we have a register stack size of two, what's going to happen here is, at the beginning, when we do the first push we're going to do a load from a, and for the next push a load from b. Now we do a push that loads c, but our stack already has two things on it. So if we try to push a third thing, we have to overflow the bottom of the stack somewhere. So we're actually going to do a stack save of a out to main memory. Here, we do this multiply, and then we actually do a stack fetch of a to get it back from main memory. So it's a lot of extra memory operations.
So you might want to think of a different micro-architecture. If you walk through this whole example, we're going to see that we have four stores and four fetches from main memory, which are all implicit, on top of the explicit ones that we're trying to do with the pushes and the pops. Hm, well, that's eight extra memory operations. So we think about how to build a micro-architecture which has fewer, but has the exact same instruction set architecture. Well, let's say we have a machine which has a register stack size of four. If the stack size is four, in the worst case here we only ever have to fit four things on our stack. So we never have to spill our stack out to main memory. Pushes and pops still have to access memory, but that's what they're trying to do: they're actually trying to access memory.

So, to sum up about stack-based machine models: they look pretty cool, but not all things are great at all times. One of the interesting things to see here is that we actually push a, and then we push a again. So we're doing redundant work. And we push c, and we push c again. c and a have the same values both times we push them.
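We can simulate this spill/fill behavior to check the counts. The spill policy below is an assumption on my part (spill the bottom register on overflow, eagerly refill after each ALU op), but it happens to reproduce the lecture's numbers: four extra stores and four extra fetches with a two-entry register stack, and none with four entries.

```python
import operator

# Sketch of a micro-architecture that keeps the top n_regs stack entries
# in registers and spills/fills the rest to main memory. Counts only the
# implicit (extra) memory traffic, not the explicit push/pop accesses.
def run(program, vals, n_regs):
    regs, mem_stack = [], []          # regs[-1] is the top of stack
    stores = fetches = 0
    ops = {"*": operator.mul, "+": operator.add,
           "-": operator.sub, "/": operator.truediv}
    for insn in program:
        if insn in vals:              # push
            if len(regs) == n_regs:   # register stack full: spill bottom
                mem_stack.append(regs.pop(0))
                stores += 1
            regs.append(vals[insn])
        else:                         # ALU op on the top two entries
            rhs, lhs = regs.pop(), regs.pop()
            regs.append(ops[insn](lhs, rhs))
            if mem_stack and len(regs) < n_regs:   # eagerly fill back up
                regs.insert(0, mem_stack.pop())
                fetches += 1
    return regs[-1], stores, fetches

vals = {"a": 2, "b": 3, "c": 4, "d": 1, "e": 2}
prog = ["a", "b", "c", "*", "+", "a", "d", "c", "*", "+", "e", "-", "/"]
print(run(prog, vals, 2))   # (3.5, 4, 4): eight extra memory operations
print(run(prog, vals, 4))   # (3.5, 0, 0): deep enough, nothing spills
```

Both runs compute the same value, which is the point: the instruction set architecture is identical, and only the micro-architectural memory traffic changes.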
So all of a sudden we're doing extra work, and maybe we could have done less work if we had an architecture or machine model, a big-A architecture, which allows you to store multiple things and name different operands. So what I want to get across here is: while a stack-based architecture is very simple and the instructions are very dense, it may not be good for performance, because you might end up having to reload values multiple times. Versus, if you had an instruction set architecture like MIPS, where you actually have 32 general-purpose registers and you can name any register in any instruction, you could have loaded a, b, c, d, and e into the register space once, and then done all your operations and never had to reload. And this is actually an instruction set issue, a fundamental machine model issue, and not a micro-architecture issue.

Okay. So, to summarize the different machine models: we have our stack, accumulator, register-memory, and register-register, sometimes called load-store, architectures. If we want to take some expression here, like c equals a plus b, we can look at the instructions that would have to be generated on each abstract architecture. So here, for the stack architecture, we're going to have push a, push b, add, pop c.
As we add more naming, we can potentially have fewer instructions. So for the accumulator: we load a, then add b. Note here that we don't have to name what we're adding to, because we're adding to the accumulator. And then store c. It starts to get a little more compact. If we have register-memory, we can load one of the values, add, and store; it's pretty simple. With a load-store architecture, we actually have to do a little more work: we have to load, load, add, store. That seems inefficient, but if you have to reuse any of these values, a or b, you don't have to go reload them. Versus, in these two architectures, you have to go reload the value.
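The four sequences above, side by side (the register names r1 through r3 are illustrative, in the style of a MIPS-like load-store machine):

```python
# Instruction sequences for c = a + b under each machine model.
sequences = {
    "stack":           ["push a", "push b", "add", "pop c"],
    "accumulator":     ["load a", "add b", "store c"],
    "register-memory": ["load r1, a", "add r3, r1, b", "store r3, c"],
    "load-store":      ["load r1, a", "load r2, b",
                        "add r3, r1, r2", "store r3, c"],
}
for model, insns in sequences.items():
    print(f"{model}: {len(insns)} instructions")
```

The load-store sequence is the longest here, but it leaves a and b sitting in r1 and r2, so any later use of them costs no further memory traffic, which is exactly the reuse argument made above.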