1 00:00:05,012 --> 00:00:11,718 Having a formal description of the game is one thing; being able to use that description to play the game effectively is something else entirely. 2 00:00:12,666 --> 00:00:18,521 Players must be able to compute the initial state of the game, must be able to compute which moves are legal in every state, 3 00:00:18,851 --> 00:00:22,883 must be able to determine the state resulting from a particular combination of moves, 4 00:00:23,353 --> 00:00:26,029 must be able to compute the value of each state for each player, 5 00:00:26,498 --> 00:00:29,861 and must be able to determine whether any given state is terminal. 6 00:00:31,509 --> 00:00:36,660 Since game descriptions are written in symbolic logic, it's obviously necessary for a game player to do some amount of automated reasoning. 7 00:00:37,108 --> 00:00:42,930 Now there are two extremes here: one possibility is for the game player to process the game description interpretively throughout a game; 8 00:00:43,842 --> 00:00:49,908 a second possibility is for a player to use the description to devise a specialized program and then use that program to play the game. 9 00:00:50,238 --> 00:00:52,757 So effectively automatic programming. 10 00:00:53,380 --> 00:01:01,289 Since this is just an introduction, we will discuss the first possibility only and leave it to you to think about the second possibility and various hybrid approaches. 11 00:01:04,275 --> 00:01:07,588 To start with, the player can use the game description to determine the initial state. 12 00:01:07,935 --> 00:01:11,563 In the case of tic-tac-toe, we have a board with 9 empty cells. 13 00:01:13,509 --> 00:01:19,747 Given a state, like the one we just saw, the player can use the game description to compute the legal moves for each of the players. 14 00:01:20,229 --> 00:01:28,752 In this case, the white player can mark any of the 9 cells and the black player must do nothing; in other words, it executes the "noop" action.
15 00:01:31,952 --> 00:01:39,532 Given a state and the players' actions, a player can compute the next state using the update rules in the game description. 16 00:01:40,033 --> 00:01:46,520 In the case shown here, the white player plays the "mark(1,3)" action in the initial state 17 00:01:47,803 --> 00:01:53,130 and the black player does "noop"; then the result is the state in which there is an "x" in the upper right hand corner. 18 00:01:56,258 --> 00:02:03,813 One way for a player to decide on a course of action in a match is to use these computations repeatedly to expand the game tree. 19 00:02:04,339 --> 00:02:11,724 Starting in a known state, it computes the legal actions for itself and its opponents in the manner just discussed; 20 00:02:12,035 --> 00:02:18,061 for each combination of actions of the players, it simulates the actions to obtain the next state and thereby expands the tree. 21 00:02:18,729 --> 00:02:22,381 Here we see the tic-tac-toe game tree expanded one level. 22 00:02:25,182 --> 00:02:34,907 Repeating this, a player can expand the tree to two levels, three levels, and so forth, until it encounters a terminal state on every branch. 23 00:02:35,609 --> 00:02:38,632 Such as the one shown here in the middle of the bottom row. 24 00:02:39,589 --> 00:02:44,356 By examining the various branches, it can choose the one that produces the best payoff. 25 00:02:45,357 --> 00:02:54,704 Now of course this choice depends on the moves of the other players, and it must consider all possible opponents' moves or make some assumptions about the things that the other players will or will not do. 26 00:02:55,221 --> 00:03:00,761 In principle this procedure allows a player to identify the best possible strategy to play any game. 27 00:03:02,535 --> 00:03:12,224 Unfortunately, even in cases where there is a clear-cut solution, the tree may be so large as to make it practically impossible for any player to expand the game tree.
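The expand-and-back-up procedure just described can be sketched in a few lines of Python. The game interface below (legal_moves, next_state, is_terminal, payoff) is a hypothetical stand-in for what a player would compute from a game description; for brevity it hard-codes a tiny two-player Nim rather than tic-tac-toe, and it assumes alternating moves rather than the simultaneous moves of GDL.

```python
# A minimal sketch of exhaustive game-tree search: expand the tree to
# terminal states, then back payoffs up, maximizing on our own turns
# and assuming the opponent picks the worst case for us.
# The toy game is Nim: a move removes 1 or 2 stones from the pile, and
# whoever takes the last stone scores 100.

def legal_moves(state):
    return [m for m in (1, 2) if m <= state["pile"]]

def next_state(state, move):
    return {"pile": state["pile"] - move, "turn": 1 - state["turn"]}

def is_terminal(state):
    return state["pile"] == 0

def payoff(state, role):
    # The player who just moved (1 - turn) took the last stone.
    return 100 if (1 - state["turn"]) == role else 0

def best_value(state, role):
    """Back up the value of state for the given role."""
    if is_terminal(state):
        return payoff(state, role)
    values = [best_value(next_state(state, m), role)
              for m in legal_moves(state)]
    return max(values) if state["turn"] == role else min(values)
```

With a pile of 2 stones the player to move can win outright, so the backed-up value is 100; with a pile of 3, every move hands the win to the opponent, so the value is 0.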
28 00:03:12,957 --> 00:03:16,973 In tic-tac-toe there are just 5000 states, which is a reasonably manageable number. 29 00:03:17,362 --> 00:03:24,305 But there are more than 10^30 states in chess, and using this approach the player would run out of time and memory long before finishing. 30 00:03:26,593 --> 00:03:29,383 The alternative is to do incremental search. 31 00:03:30,011 --> 00:03:37,362 On each move, expanding the tree as much as possible, and then making a choice based on the apparent value of non-terminal states. 32 00:03:38,535 --> 00:03:45,519 Now in traditional game playing, where the rules are known in advance, the programmer can invent a game-specific evaluation function to help in this regard. 33 00:03:46,020 --> 00:03:55,565 For example, in chess we know that states with higher piece counts and greater board control are better than ones with less material or less control. 34 00:03:56,226 --> 00:04:03,755 Unfortunately it's not possible for a GGP programmer to invent such game-specific rules in advance, since the game's rules are not known until the game begins. 35 00:04:04,793 --> 00:04:08,179 The program must evaluate the states for itself. 36 00:04:09,689 --> 00:04:14,179 The good news is that there are some evaluation techniques that always work. 37 00:04:14,370 --> 00:04:22,357 For example, there's no harm in preferring new states to states that have previously been seen, provided of course that there's a way to get back to the original states. 38 00:04:23,945 --> 00:04:32,038 Also, if a player is able to determine that some observable condition corresponds to distance from the goal, then it's a good idea to minimize that quantity. 39 00:04:32,658 --> 00:04:39,865 Suppose for example the player were in a cave trying to get out; if it saw a brighter light in one tunnel than another, it might go for the brighter light. 40 00:04:41,554 --> 00:04:47,700 Finally, there are some states that can be determined to be bad even if other states are not known to be good.
41 00:04:49,223 --> 00:04:56,006 For a silly example, stepping off the roof of a tall building is probably not a great way to get to the store, at least not in the real world. 42 00:04:58,372 --> 00:05:03,552 Another possibility is to use non-guaranteed evaluation functions, sometimes called heuristics. 43 00:05:04,587 --> 00:05:07,595 A number of such heuristics have been proposed over the years. 44 00:05:08,716 --> 00:05:19,682 Goal proximity is one of those: proponents of this heuristic argue that, all other things being equal, it's a good idea to prefer states that are closer to goal states than states that are farther away. 45 00:05:20,668 --> 00:05:27,871 Distance here is usually judged by similarity between states, that is, the number of facts in common in the descriptions of the two states. 46 00:05:28,922 --> 00:05:37,795 Mobility is another general heuristic: proponents argue that, all other things being equal, it's better to move to a state that affords the player greater mobility. 47 00:05:39,151 --> 00:05:43,618 That is, a state that gives it more possible actions; better than being boxed into a corner. 48 00:05:45,225 --> 00:05:51,581 Symmetrically, proponents of mobility argue that it is good to minimize the mobility of one's opponents. 49 00:05:54,004 --> 00:06:03,555 Now all these heuristics have been shown to be effective in some games. Unfortunately, they are only heuristics: they sometimes fail, and sometimes with comical consequences. 50 00:06:04,529 --> 00:06:08,803 The final match of GGP 2006 is an example. 51 00:06:09,108 --> 00:06:15,422 The game was cylinder checkers, that is, checkers played on a cylinder: the game wraps around vertically. 52 00:06:16,425 --> 00:06:25,687 Recall that in checkers a player is permitted to move one of his ordinary pieces, pieces that are not kings that is, one square forward in each turn. 53 00:06:26,304 --> 00:06:30,264 Here red is moving from the top to the bottom and black is moving from the bottom to the top.
54 00:06:31,226 --> 00:06:36,307 If a piece is blocked by an opponent's piece, you can jump that piece if there is an empty square on the other side. 55 00:06:37,846 --> 00:06:41,337 Moreover, the player must make such a jump if one is available. 56 00:06:41,968 --> 00:06:48,674 The objective of the game is to take all or as many of the opponent's pieces as possible while preserving one's own. 57 00:06:49,761 --> 00:06:55,729 Here is a snapshot of the game. It's red's turn to play: what should he do, and what do you think he did? 58 00:06:59,095 --> 00:07:07,160 Okay, here's a hint: the player in this case was Clune's player, and it had decided for some reason or other that limiting the opponent's mobility was a good heuristic. 59 00:07:07,798 --> 00:07:12,917 If it were to move the rearmost piece, black would have multiple possible moves. 60 00:07:13,785 --> 00:07:17,515 However, if it were to move the piece in front, then black would be forced to capture it. 61 00:07:18,216 --> 00:07:20,369 In other words, it would have at most one move. 62 00:07:21,085 --> 00:07:26,705 Clearly moving the forward piece minimizes the opponent's mobility, so that's what Clune's player did. 63 00:07:27,297 --> 00:07:36,387 Actually the whole match played out this way, with red giving black captures at every opportunity. It was sad to watch, but frankly a little comical at the same time. 64 00:07:36,903 --> 00:07:43,811 The moral here is that while non-guaranteed heuristics are sometimes useful, they're not always useful. 65 00:07:46,795 --> 00:07:50,835 An alternative to evaluation functions like these is Monte-Carlo search. 66 00:07:51,383 --> 00:08:05,873 The basic idea is simple: the player expands the tree for a few levels; then, rather than using a local heuristic to evaluate a state, it makes some probes from that state to the end of the game by selecting random moves for all players, which can be done very rapidly.
67 00:08:06,719 --> 00:08:14,657 It sums up the total reward for all such probes and divides by the number of probes to obtain an estimated utility for that state. 68 00:08:15,393 --> 00:08:21,368 It can then use these expected utilities in comparing states and selecting actions. 69 00:08:23,001 --> 00:08:31,599 Monte-Carlo and its variants have proven highly successful in general game playing; virtually every general game playing program today uses some variant of Monte-Carlo search. 70 00:08:33,999 --> 00:08:39,103 Okay, this discussion of game tree search and heuristics reveals just how difficult the GGP problem is. 71 00:08:39,830 --> 00:08:45,864 Monte-Carlo works amazingly well, but even it breaks down badly in certain cases. 72 00:08:46,324 --> 00:08:52,256 Fortunately, there's another, complementary approach to general game playing that has tremendous power, and it's called metagaming. 73 00:08:53,949 --> 00:09:01,265 Metagaming is problem-solving in the world of games; it involves reasoning about games and, by extension, game players and game playing. 74 00:09:03,002 --> 00:09:16,698 As stated, this is an extremely general definition, and includes both game design and game analysis; it includes reasoning about games in general as well as reasoning about specific games and specific matches of specific games. 75 00:09:18,293 --> 00:09:27,760 Significantly, it includes what programmers do in devising programs to play specific games, as well as what programmers do in devising general game playing programs. 76 00:09:29,342 --> 00:09:37,010 Metagaming is usually done offline, during the brief period after a player receives the game rules and before game play begins. 77 00:09:37,667 --> 00:09:42,122 Or sometimes it's done in parallel with ordinary game tree search. 78 00:09:43,831 --> 00:09:47,671 In general game playing we're primarily interested in those types of metagaming that can be automated.
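The random-probe averaging described a moment ago can be sketched directly. As before, the game interface (legal_moves, next_state, is_terminal, payoff) is a hypothetical stand-in, hard-coding the same toy two-player Nim rather than a GDL description.

```python
import random

# A minimal sketch of Monte-Carlo evaluation: from a given state, play
# random legal moves to the end of the game many times, sum the rewards,
# and divide by the number of probes to estimate the state's utility.

def legal_moves(state):
    return [m for m in (1, 2) if m <= state["pile"]]

def next_state(state, move):
    return {"pile": state["pile"] - move, "turn": 1 - state["turn"]}

def is_terminal(state):
    return state["pile"] == 0

def payoff(state, role):
    # The player who just moved (1 - turn) took the last stone.
    return 100 if (1 - state["turn"]) == role else 0

def monte_carlo_value(state, role, probes=1000, seed=0):
    rng = random.Random(seed)  # fixed seed just for reproducibility
    total = 0
    for _ in range(probes):
        s = state
        while not is_terminal(s):
            s = next_state(s, rng.choice(legal_moves(s)))  # random probe
        total += payoff(s, role)
    return total / probes  # estimated utility of the state
```

The player would compute such an estimate for each state at the frontier of its partial tree and back those values up in place of a heuristic evaluation.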
79 00:09:48,141 --> 00:09:54,123 This raises the question of the distinction between ordinary game playing and metagaming. 80 00:09:54,926 --> 00:09:58,842 Can we distinguish the two? Well, it's not that easy, but there are some differences. 81 00:09:59,230 --> 00:10:10,856 To begin with, ordinary game tree search can be viewed as a degenerate form of metagaming, one in which the metagamer must find the best action for a specific role in a specific game starting in a specific state. 82 00:10:11,578 --> 00:10:18,999 By contrast, metagaming sometimes involves information and goals that are different from the specifics used in game tree search. 83 00:10:20,269 --> 00:10:26,610 To begin with, metagaming can take into account information other than the game description. For example, it might take into account past experience. 84 00:10:28,280 --> 00:10:46,759 In round robin tournaments, where what matters is total return, the sum of the values over multiple matches, a player might select a different strategy than in an elimination ladder, where beating one's opponent is more important than the score one gets, so long as it's greater than the opponent's score. 85 00:10:49,020 --> 00:10:57,685 Metagaming is also sometimes done with less information than is used in match play: for example, without information about the role, initial state, goals, termination, and so forth. 86 00:10:58,371 --> 00:11:05,163 As a result, metagaming can be more general, deriving conclusions that apply across different matches and different players. 87 00:11:07,153 --> 00:11:22,807 Also, the goal in metagaming is broader than that of game tree search: it's not so much concerned with selecting the actions of a specific player in a specific game, but rather with devising a game tree search program, or optimizing an existing program to search the game tree, without actually searching that tree itself.
88 00:11:24,767 --> 00:11:39,263 Okay, well, whether or not this concept of automated metagaming can be distinguished from game tree search, there's no doubt that the concept is used to good effect in many general game playing programs, and we'll have a chance to look at some metagaming techniques in the course. 89 00:11:42,675 --> 00:11:46,351 Before we leave the concept though, I want to give one example of metagaming. 90 00:11:46,757 --> 00:11:51,942 The example here is called game decomposition, or sometimes factoring. 91 00:11:52,717 --> 00:11:57,670 Consider the example of Hodgepodge. Hodgepodge is actually 2 games glued together. 92 00:11:58,021 --> 00:12:03,031 Here we show chess and Othello, but it could be any two games. 93 00:12:03,515 --> 00:12:10,408 One move in a game of Hodgepodge corresponds to one move in each of the two constituent games. 94 00:12:10,754 --> 00:12:14,864 Winning requires winning at least one of the two games while not losing the other. 95 00:12:15,639 --> 00:12:22,230 What makes Hodgepodge interesting is that it's factorable, that is, it can be divided into two independent games. 96 00:12:22,723 --> 00:12:25,202 Realizing this can have a dramatic effect. 97 00:12:25,417 --> 00:12:32,631 To see this, consider the size of the game tree for Hodgepodge, supposing that one game tree has branching factor "a" and the other has branching factor "b". 98 00:12:33,798 --> 00:12:38,258 Then the branching factor of the joint game is "a * b". 99 00:12:39,335 --> 00:12:44,838 At any point in the game, a player has "a * b" possible moves. 100 00:12:45,868 --> 00:12:50,288 The size of the fringe of the game tree at level n is "(a * b)^n". 101 00:12:51,989 --> 00:12:56,685 However, the two games are independent: moving in one subgame does not affect the state of the other subgame. 102 00:12:57,066 --> 00:13:03,259 So the players really should be searching two smaller game trees: one with branching factor "a" and the other with branching factor "b".
103 00:13:04,007 --> 00:13:09,824 In this way, at depth n, there would be only "a^n + b^n" possible states. 104 00:13:10,577 --> 00:13:14,050 This is a huge decrease in the size of the search space. 105 00:13:16,399 --> 00:13:28,592 So factoring is just one example of game reformulation and metagaming; there are many others. For example, it's possible to find symmetries in games to cut down on the search space. 106 00:13:28,959 --> 00:13:32,880 Also, in some games there are bottlenecks that allow for a different type of factoring. 107 00:13:33,423 --> 00:13:40,508 Consider for example a game made up of one or more subgames in which it's necessary to win the first game before moving on to the second game, and so forth. 108 00:13:41,284 --> 00:13:49,407 In such a case there's no need to search to a terminal state of the overall game; it's sufficient to just limit the search to the termination of the current subgame. 109 00:13:49,407 --> 00:13:54,199 And only after that's done, search to the termination condition of the next subgame, and so forth. 110 00:13:55,164 --> 00:14:02,404 Now these examples are extreme cases, but there are simple, everyday examples of finding structure of this sort that can help in curtailing search. 111 00:14:04,272 --> 00:14:13,245 Whatever sort of metagaming is done, the trick is to analyze and reformulate a game without expanding the entire game tree. 112 00:14:13,423 --> 00:14:26,224 The interesting thing about general game playing is this: sometimes the cost of analysis is proportional to the size of the description rather than the size of the game tree, as in the examples we just saw. 113 00:14:26,504 --> 00:14:31,980 And in such cases players can expend a little time and gain a lot in search savings.
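The arithmetic behind the factoring argument is easy to check directly. The branching factors below are made-up illustrative numbers, not figures for chess or Othello.

```python
# Compare the fringe sizes of the glued game tree and the factored trees.
# a and b are hypothetical branching factors for the two subgames; n is depth.
a, b, n = 5, 10, 4

joint = (a * b) ** n        # searching Hodgepodge as one game: (a*b)^n leaves
factored = a ** n + b ** n  # searching the two subgames separately: a^n + b^n

print(joint, factored)      # prints 6250000 10625
```

Even at this modest depth, the joint search visits several hundred times as many fringe states as the factored search, and the gap widens exponentially with n.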