1
00:00:04,420 --> 00:00:07,080
In general game planning, a player may
choose to make

2
00:00:07,080 --> 00:00:09,520
assumptions about the actions of the other
players or not.

3
00:00:11,010 --> 00:00:15,010
For example, a player might assume that
the other players are behaving rationally.

4
00:00:16,230 --> 00:00:18,940
By eliminating irrational actions on the
part of other players, a

5
00:00:18,940 --> 00:00:21,409
player can decrease the number of
possibilities it needs to consider.

6
00:00:23,320 --> 00:00:26,890
Unfortunately, in general game playing, at
least as currently constituted, no

7
00:00:26,890 --> 00:00:30,600
player knows the identity of the other
players or their characteristics.

8
00:00:30,600 --> 00:00:33,390
The other players might be irrational, or
they

9
00:00:33,390 --> 00:00:35,160
might behave the same as the player
itself.

10
00:00:36,350 --> 00:00:39,640
Players might even be down, not
functioning correctly.

11
00:00:39,640 --> 00:00:41,430
Since there's no information about the
other players,

12
00:00:41,430 --> 00:00:44,705
many general game players take a
pessimistic approach.

13
00:00:44,705 --> 00:00:47,510
They assume that the other players will
perform the worst possible actions.

14
00:00:47,510 --> 00:00:52,720
This pessimistic approach is the basis for
a game playing technique called minimax.

15
00:00:55,840 --> 00:00:59,420
The basic idea of minimax is to select moves
that are guaranteed

16
00:00:59,420 --> 00:01:03,399
to produce the highest possible return, no
matter what the opponents do.

17
00:01:04,950 --> 00:01:07,320
The player tries to maximize its own value and
assumes that the

18
00:01:07,320 --> 00:01:11,530
opponents are trying to minimize its
value, hence the name minimax.

19
00:01:11,530 --> 00:01:12,030
In

20
00:01:18,220 --> 00:01:20,570
the case of a one step game, minimax
chooses

21
00:01:20,570 --> 00:01:22,820
an action such that the value of the
resulting state

22
00:01:22,820 --> 00:01:24,610
for any opponent action is greater than or equal
to

23
00:01:24,610 --> 00:01:27,010
the value of the resulting state for any
other action.

24
00:01:27,010 --> 00:01:29,030
In the case of a multi-step game, minimax
goes

25
00:01:29,030 --> 00:01:31,700
to the end of the game, and backs up
values.

26
00:01:31,700 --> 00:01:33,930
Generally, we can think about minimax as a
search of a

27
00:01:33,930 --> 00:01:39,880
bipartite tree, consisting of alternating
max nodes, shown here as gray squares.

28
00:01:39,880 --> 00:01:42,940
And min nodes, shown here as beige circles.

29
00:01:44,210 --> 00:01:46,600
The max nodes represent the choices of the
player, while

30
00:01:46,600 --> 00:01:48,970
the min nodes represent the choices of the
other players.

31
00:01:50,440 --> 00:01:52,980
Now, in the case of games with more than
two players, there

32
00:01:52,980 --> 00:01:56,180
can be multiple layers of min nodes
between each layer of max nodes.

33
00:01:56,180 --> 00:01:57,230
One layer for each opponent.

34
00:01:59,290 --> 00:02:01,825
Now, also in looking at this tree, note
that although

35
00:02:01,825 --> 00:02:04,340
we've separated the choices of the player
and its opponents,

36
00:02:04,340 --> 00:02:06,630
this does not mean that the play
alternates between

37
00:02:06,630 --> 00:02:10,110
the opponents or that the opponents know
the player's action.

38
00:02:10,110 --> 00:02:13,460
The player and opponents make their choices
simultaneously,

39
00:02:13,460 --> 00:02:16,110
without

40
00:02:16,110 --> 00:02:17,280
knowledge of each other's choices.

41
00:02:19,900 --> 00:02:24,270
Okay, the value of a max node for a player
is the utility, the

42
00:02:24,270 --> 00:02:25,930
goal, the value, the reward at the

43
00:02:25,930 --> 00:02:28,090
corresponding state, if that state is
terminal.

44
00:02:29,940 --> 00:02:32,020
Otherwise, it's the maximum of the values
of

45
00:02:32,020 --> 00:02:35,200
the min nodes that result from its legal
actions.

46
00:02:35,200 --> 00:02:36,626
The value of a min node is the

47
00:02:36,626 --> 00:02:39,420
minimum that results for any legal
opponent action.

48
00:02:42,600 --> 00:02:46,000
Let's see how this works.
The following game tree illustrates it.

49
00:02:46,000 --> 00:02:48,620
The nodes at the bottom of the tree are
terminal states

50
00:02:48,620 --> 00:02:51,430
and the values are the player's goal
values for those states.

51
00:02:52,610 --> 00:02:54,000
The values shown in the other nodes are

52
00:02:54,000 --> 00:02:56,450
computed according to the rules we just
went over.

53
00:02:56,450 --> 00:03:01,620
For example, the value of the min node at
the lower left is one because that's

54
00:03:01,620 --> 00:03:05,559
the minimum of the two values of the max
nodes below it, namely one and two.

55
00:03:07,060 --> 00:03:07,760
The value of the min

56
00:03:07,760 --> 00:03:11,800
node next to that min node is three
because that's the minimum of the

57
00:03:11,800 --> 00:03:15,370
values of the two max nodes
below it, namely three and four.

58
00:03:17,420 --> 00:03:20,260
The value of the max node above these two
min nodes is

59
00:03:20,260 --> 00:03:23,479
three because that's the maximum of the
values of the two min nodes.

60
00:03:24,610 --> 00:03:25,120
And so forth.

61
00:03:29,350 --> 00:03:31,070
Here's an implementation of a minimax
player.

62
00:03:31,070 --> 00:03:34,930
It's identical to the implementation of
compulsive deliberation for

63
00:03:34,930 --> 00:03:37,770
single player games, except that it has a
different bestmove procedure.

64
00:03:41,270 --> 00:03:44,940
The main difference between the bestmove
subroutine for single player games, and

65
00:03:44,940 --> 00:03:48,280
the bestmove for multiple player games, is
the way the scores are computed.

66
00:03:48,280 --> 00:03:49,880
Rather than comparing subsequent states,
it

67
00:03:49,880 --> 00:03:52,180
compares the min nodes, as described
previously.

68
00:03:55,510 --> 00:03:58,170
The minscore subroutine for minimax takes
an action and

69
00:03:58,170 --> 00:04:01,370
a state as arguments, and produces the
minimum value for

70
00:04:01,370 --> 00:04:03,530
the given role associated with the given
player for

71
00:04:03,530 --> 00:04:05,749
any of the opponents' legal actions in the
given state.

72
00:04:07,190 --> 00:04:12,610
The maxscore subroutine, which is called
by minscore, takes a state as argument,

73
00:04:12,610 --> 00:04:16,190
and conducts a recursive exploration of
the game tree below that state.

74
00:04:16,190 --> 00:04:19,710
If the state is terminal, the output is
just the role's reward for that state.

75
00:04:19,710 --> 00:04:20,530
Otherwise, the output

76
00:04:20,530 --> 00:04:22,400
is the maximum of the utilities of the min
nodes

77
00:04:22,400 --> 00:04:28,820
corresponding to the player's legal
actions in the given state.
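
The three subroutines described above can be sketched roughly as follows. This is a minimal illustration, not the implementation shown in the lecture: it assumes a hypothetical game tree stored as nested dictionaries (player action, then opponent action, then child state), with integers standing for terminal states and the player's reward there. The role argument is left out for brevity.

```python
# Hypothetical tree encoding (an assumption for this sketch):
#   terminal state  -> int, the player's reward
#   other states    -> {player_action: {opponent_action: child_state}}

def maxscore(state):
    """Value of a max node: the reward if terminal, else the best min-node value."""
    if isinstance(state, int):
        return state
    return max(minscore(action, state) for action in state)

def minscore(action, state):
    """Value of the min node reached by `action`: the worst outcome over opponent replies."""
    return min(maxscore(child) for child in state[action].values())

def bestmove(state):
    """Pick the player action whose min node has the highest value."""
    return max(state, key=lambda action: minscore(action, state))

# The small subtree from the worked example: min(1, 2) = 1, min(3, 4) = 3, max = 3.
tree = {"a": {"x": 1, "y": 2}, "b": {"x": 3, "y": 4}}
value = maxscore(tree)
move = bestmove(tree)
```

Under these assumptions, `value` is 3 and `move` is `"b"`, which matches the values backed up by hand in the example tree.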

78
00:04:28,820 --> 00:04:31,330
Now, one disadvantage to the minimax
procedure is that

79
00:04:31,330 --> 00:04:34,140
it examines the entire game tree in all
cases.

80
00:04:34,140 --> 00:04:36,760
While this is sometimes necessary, there
are cases where it's possible

81
00:04:36,760 --> 00:04:40,090
to get the same result without examining
the entire game tree.

82
00:04:40,090 --> 00:04:43,110
For example, if in processing a state the

83
00:04:43,110 --> 00:04:45,530
maxscore subroutine finds an action that
produces

84
00:04:45,530 --> 00:04:46,740
100 points,

85
00:04:46,740 --> 00:04:49,480
it doesn't need to look at any additional
actions, as it cannot do better.

86
00:04:50,780 --> 00:04:53,140
Similarly, if the minscore subroutine finds
an action

87
00:04:53,140 --> 00:04:56,420
that produces zero points for the player, it
it

88
00:04:56,420 --> 00:04:57,890
does not need to look at any additional

89
00:04:57,890 --> 00:04:59,820
actions, as it cannot get the score any
lower.

90
00:05:01,560 --> 00:05:03,480
Bounded minimax is just the minimax
procedure

91
00:05:03,480 --> 00:05:05,210
we just saw with two minor changes.

92
00:05:05,210 --> 00:05:07,575
Rather than processing all actions on
every node,

93
00:05:07,575 --> 00:05:10,550
maxscore and minscore first check for
these bounds.

94
00:05:10,550 --> 00:05:12,220
And if they occur in any node, they

95
00:05:12,220 --> 00:05:14,880
terminate their search, and return the
corresponding values.

96
00:05:16,730 --> 00:05:18,300
So here's an example.

97
00:05:18,300 --> 00:05:23,100
The nodes in this tree with numbers are those
examined by bounded minimax.

98
00:05:24,170 --> 00:05:27,520
The ones that have numbers on them are
examined by bounded minimax.

99
00:05:27,520 --> 00:05:30,100
The other nodes are not examined at all,
and they don't need to be examined.

100
00:05:30,100 --> 00:05:34,370
In this case, notice that more than half of
the tree is pruned from consideration.

101
00:05:40,330 --> 00:05:43,690
Note that 100 and zero are not the only
values that can be used here.

102
00:05:43,690 --> 00:05:46,980
For example, if a player is in a so-called
satisficing game where it just needs

103
00:05:46,980 --> 00:05:51,120
to get a certain minimum score, then it
can use that threshold rather than 100.

104
00:05:51,120 --> 00:05:55,170
For example, if the player simply wants to
win and it's a

105
00:05:55,170 --> 00:05:58,500
fixed-sum game, then it can use 51 as the
threshold,

106
00:05:58,500 --> 00:06:00,550
knowing that if it gets this amount, it has
won the game.
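
As a rough sketch of bounded minimax, the version below adds the two early exits described in the lecture. It assumes the same hypothetical nested-dictionary tree encoding (player action, then opponent action, then child state, with integers as terminal rewards). The parameter names `lo` and `hi` and the `examined` list are inventions of this sketch: `hi` plays the role of 100 (or a satisficing threshold such as 51), `lo` the role of zero, and `examined` only records which terminal states were actually visited.

```python
examined = []  # terminal rewards actually looked at; for illustration only

def maxscore(state, lo=0, hi=100):
    """Max-node value; stops as soon as some action reaches the upper bound `hi`."""
    if isinstance(state, int):
        examined.append(state)
        return state
    best = None
    for action in state:
        score = minscore(action, state, lo, hi)
        if best is None or score > best:
            best = score
        if best >= hi:  # cannot do better (or threshold met): skip remaining actions
            break
    return best

def minscore(action, state, lo=0, hi=100):
    """Min-node value; stops as soon as some opponent reply reaches the lower bound `lo`."""
    worst = None
    for child in state[action].values():
        score = maxscore(child, lo, hi)
        if worst is None or score < worst:
            worst = score
        if worst <= lo:  # cannot get any lower: skip remaining replies
            break
    return worst

# Action "a" guarantees 100, so action "b" is never explored.
tree = {"a": {"x": 100, "y": 100}, "b": {"x": 3, "y": 4}}
value = maxscore(tree)
```

Here `value` is 100 and `examined` holds only the two rewards under action `"a"`. Calling `maxscore(some_tree, hi=51)` would apply the satisficing threshold for a fixed-sum game in the same way.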