One way of dealing with this conservative nature of depth-limited search is to improve upon the arbitrary value of zero returned for non-terminal states. In heuristic search, this is accomplished by applying a heuristic evaluation function to non-terminal states. Such functions are based on features of states, and so they can be computed without examining the entire game tree.

The implementation of fixed-depth heuristic search is easy. At the top here, we have the implementation of simple depth-limited search that we saw earlier, and at the bottom, we have the implementation of heuristic fixed-depth search. The only difference is that we've replaced the default value, zero, with the result of calling the evaluation subroutine, evalfun, on the state being considered whenever the depth limit is exceeded. The tough part of the implementation is figuring out how to evaluate non-terminal states.

Fortunately, examples of heuristic functions abound. In chess, for example, we often use piece count to compare states, with the idea that, in the absence of immediate threats, having more material is generally better than having less. Similarly, we sometimes use board control, with the idea that controlling the center of the board is more valuable than controlling the edges or the corners.

The downside of using heuristic functions is that they're not guaranteed to be successful. They may work in many cases, but they can occasionally fail. That happens, for example, in chess when a player is checkmated despite having more material and better board control. Still, games often admit heuristics that are useful in the sense that they work more often than not.

While for specific games such as chess, programmers are able to build in these evaluation functions, this is unfortunately not possible for general game playing, since the rules of the game are not known in advance; rather, the game player itself must analyze the game in order to find useful evaluation functions. In an upcoming lesson, we'll discuss how to find such heuristics automatically.
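To make the difference concrete, here is a minimal Python sketch of fixed-depth heuristic search. The game interface (terminal, goal, legal, next, opponent) and the depth limit are assumptions for illustration, not the lecture's actual code.

```python
LIMIT = 4  # hypothetical depth limit

def evalfun(state):
    # Heuristic evaluation of a non-terminal state; replace with
    # mobility, focus, goal proximity, or a combination of these.
    return 0

def maxscore(game, state, role, depth):
    if game.terminal(state):
        return game.goal(state, role)
    if depth >= LIMIT:
        # The only change from plain depth-limited search: return
        # evalfun(state) here instead of the arbitrary value 0.
        return evalfun(state)
    return max(minscore(game, state, role, action, depth)
               for action in game.legal(state, role))

def minscore(game, state, role, action, depth):
    other = game.opponent(role)
    return min(maxscore(game, game.next(state, {role: action, other: move}),
                        role, depth + 1)
               for move in game.legal(state, other))
```

The one-line change is the evalfun call at the depth limit; everything else is the same minimax-style recursion as plain depth-limited search.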
That said, there are some heuristics for game playing that arguably have merit across all games, and in this section we're going to examine some of them. Along the way, we'll show how to build a game player that utilizes these heuristics.

Mobility is one such heuristic. Mobility is a measure of the number of things a player can do. This could be the number of actions available in a given state, or the number of actions available n steps away from the given state. Or it could be the number of states reachable within n steps from the given state. Note that this can be different from the number of actions, since different actions can sometimes lead to the same state.

A simple implementation of the mobility heuristic is shown here. The method simply computes the number of actions that are legal in the given state and returns as its value the percentage of all possible actions (that is, all actions possible in any state) represented by this set of legal actions.

Focus is another heuristic. It's the opposite of mobility: a measure of the narrowness of the search space. Sometimes it's good to focus, to cut down on the number of possibilities to be considered. Usually, it's better to restrict an opponent's moves than one's own, so that one keeps one's own options open. A simple implementation of the focus heuristic is shown here. It's the dual of mobility: again, we divide the number of legal actions in a state by the total number of actions available in any state, but rather than returning that percentage as the result, we subtract it from 100.

Goal proximity is another heuristic, a little bit different from the previous two. It's a measure of how similar a given state is to a desirable goal terminal state. There are various ways this can be computed. One common method is to count how many propositions are true in the current state and are also true in a terminal state with sufficient utility. The difficulty in implementing this method, unfortunately, is obtaining the set of desirable terminal states with which the current state can be compared. That's not so easy.
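Here is a rough Python sketch of these three heuristics, assuming that game.legal(state, role) returns the actions legal in a state, game.actions(role) returns every action the role can perform in any state, and states expose a set of propositions; all of these names are illustrative, not part of the lecture.

```python
def mobility(game, state, role):
    # Percentage of all possible actions that are legal right now.
    return 100 * len(game.legal(state, role)) / len(game.actions(role))

def focus(game, state, role):
    # Dual of mobility: rewards narrow, focused positions.
    return 100 - mobility(game, state, role)

def goal_proximity(state, desirable):
    # Percentage of a desirable terminal state's propositions that are
    # already true in the current state; obtaining such a terminal
    # state is the hard part, as noted above.
    shared = state.propositions & desirable.propositions
    return 100 * len(shared) / len(desirable.propositions)
```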
Another alternative is to use the goal value of a state as a measure of progress toward the goal, with the idea that even though a state is not terminal, the higher its goal value, the closer one is to the actual goal. Of course, this is not always true. However, in many games the goal values are indeed monotonic, meaning that the values do increase with proximity to the goal. If you're trying to get out of a cave and you see some light down one corridor and not down the other, you might want to take the one that has light. Moreover, it's sometimes possible to compute this by simple examination of the game description, using methods which we'll describe in later lessons.

Now, none of these heuristics is guaranteed to work in all games, but all have strengths in some games. To deal with this fact, some designers of GGP players have opted to use a weighted combination of heuristics in place of a single heuristic. The equation shown here is a typical formula: value(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s). Each fi is a heuristic function, such as mobility, focus, or goal proximity, and wi is the corresponding weight. Of course, there's no way of knowing in advance what the weights should be, but sometimes playing a few instances of the game, for example during the start clock, can suggest weights for the various heuristics.

As mentioned earlier, depth-limited search is not guaranteed to succeed in all cases. Failing is never good, but it's particularly embarrassing in situations where just a little more search would have revealed significant changes in the player's circumstances, for better or worse. In the research literature, this is often called the horizon problem. As an example, consider a situation in chess where the players are exchanging pieces, with white capturing black's pieces and vice versa. Now, imagine cutting off the search at an arbitrary depth, say at two captures each. At this point, white might believe it has an advantage, since it has more material. However, if the very next move by black is a capture of the white queen, that evaluation could be misleading.
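Before turning to the solution, here is a small sketch of the weighted combination described above, reusing the heuristic sketches from earlier; the weights in the usage example are purely hypothetical.

```python
def combined_eval(game, state, role, weighted_heuristics):
    # weighted_heuristics is a list of (w_i, f_i) pairs, where each f_i
    # maps (game, state, role) to a value in [0, 100].
    return sum(w * f(game, state, role) for w, f in weighted_heuristics)

# Hypothetical usage, once weights have been suggested by trial play:
# value = combined_eval(game, state, role, [(0.7, mobility), (0.3, focus)])
```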
A common solution to the horizon problem is to forgo the fixed depth limit in favor of one that is itself dependent on the current state of affairs, searching deeper in some areas of the tree and less deep in others.

Here's an implementation of what's called variable-depth heuristic search. This version of maxscore differs from fixed-depth heuristic search in that there is a subroutine called expfun that is called to determine whether the current state and/or depth meets an appropriate condition. If so, the tree expansion continues; otherwise, the player terminates the expansion and simply returns the result of applying its evaluation function to the non-terminal state.

The challenge in variable-depth search is finding an appropriate definition for expfun. One common technique is to focus on differentials of heuristic functions. For example, a significant change in mobility or goal proximity might indicate that further search is warranted, whereas actions that do not lead to dramatic changes might be less important. In chess, a good example of this is to look for what's called quiescence: a state where there are no immediate captures.
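A rough Python sketch of variable-depth search follows, carrying over the interface assumptions from the earlier sketches; the quiescence test is a placeholder for whatever expansion condition a player actually uses.

```python
def quiescent(game, state):
    # Placeholder: in chess this might mean "no captures available";
    # a real player would substitute its own test here.
    return True

def expfun(game, state, depth):
    # Expand while shallow, and keep expanding noisy (non-quiescent)
    # states even beyond the nominal limit.
    return depth < LIMIT or not quiescent(game, state)

def maxscore_vd(game, state, role, depth):
    if game.terminal(state):
        return game.goal(state, role)
    if not expfun(game, state, depth):
        # Expansion condition not met: fall back on the evaluation
        # function rather than searching deeper.
        return evalfun(state)
    return max(minscore_vd(game, state, role, action, depth)
               for action in game.legal(state, role))

def minscore_vd(game, state, role, action, depth):
    other = game.opponent(role)
    return min(maxscore_vd(game, game.next(state, {role: action, other: m}),
                           role, depth + 1)
               for m in game.legal(state, other))
```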