1 00:00:05,012 --> 00:00:11,718 Having a formal description of the game is one thing; being able to use that description to play the game effectively is something else entirely. 2 00:00:12,666 --> 00:00:18,521 Players must be able to compute the initial state of the game, must be able to compute which moves are legal in every state, 3 00:00:18,851 --> 00:00:22,883 must be able to determine the state resulting from a particular combination of moves, 4 00:00:23,353 --> 00:00:26,029 must be able to compute the value of each state for each player, 5 00:00:26,498 --> 00:00:29,861 and must be able to determine whether any given state is terminal. 6 00:00:31,509 --> 00:00:36,660 Since game descriptions are written in symbolic logic, it's obviously necessary for a game player to do some amount of automated reasoning. 7 00:00:37,108 --> 00:00:42,930 Now there are two extremes here: one possibility is for the game player to process the game description interpretively throughout a game; 8 00:00:43,842 --> 00:00:49,908 a second possibility is for a player to use the description to devise a specialized program and then use that program to play the game. 9 00:00:50,238 --> 00:00:52,757 So effectively automatic programming. 10 00:00:53,380 --> 00:01:01,289 Since this is just an introduction, we will discuss the first possibility only and leave it to you to think about the second possibility and various hybrid approaches. 11 00:01:04,275 --> 00:01:07,588 To start with, the player can use the game description to determine the initial state. 12 00:01:07,935 --> 00:01:11,563 In the case of tic-tac-toe, we have a board with 9 empty cells. 13 00:01:13,509 --> 00:01:19,747 Given a state, like the one we just saw, the player can use the game description to compute the legal moves for each of the players. 14 00:01:20,229 --> 00:01:28,752 In this case, the white player can mark any of the 9 cells and the black player must do nothing; in other words, it executes the "noop" action.
15 00:01:31,952 --> 00:01:39,532 Given a state and the players' actions, a player can compute the next state using the update rules in the game description. 16 00:01:40,033 --> 00:01:46,520 In the case shown here, the white player plays the "mark(1,3)" action in the initial state 17 00:01:47,803 --> 00:01:53,130 and the black player does "noop"; then the result is the state in which there is an "x" in the upper right hand corner. 18 00:01:56,258 --> 00:02:03,813 One way for a player to decide on a course of action in a match is to use these computations repeatedly to expand the game tree. 19 00:02:04,339 --> 00:02:11,724 Starting in a known state, it computes the legal actions for itself and its opponents in the manner just discussed; 20 00:02:12,035 --> 00:02:18,061 for each combination of actions of the players, it simulates the actions to obtain the next state and thereby expands the tree. 21 00:02:18,729 --> 00:02:22,381 Here we see the tic-tac-toe game tree expanded one level. 22 00:02:25,182 --> 00:02:34,907 Repeating this, a player can expand the tree to two levels, three levels, and so forth, until it encounters a terminal state on every branch. 23 00:02:35,609 --> 00:02:38,632 Such as the one shown here in the middle of the bottom row. 24 00:02:39,589 --> 00:02:44,356 By examining the various branches, it can choose the one that produces the best payoff. 25 00:02:45,357 --> 00:02:54,704 Now of course this choice depends on the moves of the other players, and it must consider all possible opponents' moves or make some assumptions about the things that the other players will or will not do. 26 00:02:55,221 --> 00:03:00,761 In principle this procedure allows a player to identify the best possible strategy to play any game. 27 00:03:02,535 --> 00:03:12,224 Unfortunately, even in cases where there is a clear-cut solution, the tree may be so large as to make it practically impossible for any player to expand the game tree.
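The expand-and-back-up procedure just described can be sketched in a few lines of Python. The game interface below (legal_moves, next_state, is_terminal, payoff) is a hypothetical stand-in for what a player would compute from a game description; for brevity it hard-codes a tiny two-player Nim rather than tic-tac-toe, and it assumes alternating moves rather than the simultaneous moves of GDL.

```python
# A minimal sketch of exhaustive game-tree search: expand the tree to
# terminal states, then back payoffs up, maximizing on our own turns
# and assuming the opponent picks the worst case for us.
# The toy game is Nim: a move removes 1 or 2 stones from the pile, and
# whoever takes the last stone scores 100.

def legal_moves(state):
    return [m for m in (1, 2) if m <= state["pile"]]

def next_state(state, move):
    return {"pile": state["pile"] - move, "turn": 1 - state["turn"]}

def is_terminal(state):
    return state["pile"] == 0

def payoff(state, role):
    # The player who just moved (1 - turn) took the last stone.
    return 100 if (1 - state["turn"]) == role else 0

def best_value(state, role):
    """Back up the value of state for the given role."""
    if is_terminal(state):
        return payoff(state, role)
    values = [best_value(next_state(state, m), role)
              for m in legal_moves(state)]
    return max(values) if state["turn"] == role else min(values)
```

With a pile of 2 stones the player to move can win outright, so the backed-up value is 100; with a pile of 3, every move hands the win to the opponent, so the value is 0.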
28 00:03:12,957 --> 00:03:16,973 In tic-tac-toe there are just 5000 states, which is a reasonably manageable number. 29 00:03:17,362 --> 00:03:24,305 But there are more than 10^30 states in chess, and using this approach the player would run out of time and memory long before finishing. 30 00:03:26,593 --> 00:03:29,383 The alternative is to do incremental search. 31 00:03:30,011 --> 00:03:37,362 On each move, expanding the tree as much as possible, and then making a choice based on the apparent value of non-terminal states. 32 00:03:38,535 --> 00:03:45,519 Now in traditional game playing, where the rules are known in advance, the programmer can invent a game-specific evaluation function to help in this regard. 33 00:03:46,020 --> 00:03:55,565 For example, in chess we know that states with higher piece counts and greater board control are better than ones with less material or less control. 34 00:03:56,226 --> 00:04:03,755 Unfortunately it's not possible for a GGP programmer to invent such game-specific rules in advance, since the game's rules are not known until the game begins. 35 00:04:04,793 --> 00:04:08,179 The program must evaluate the states for itself. 36 00:04:09,689 --> 00:04:14,179 The good news is that there are some evaluation techniques that always work. 37 00:04:14,370 --> 00:04:22,357 For example, there's no harm in preferring new states to states that have previously been seen, provided of course that there's a way to get back to the original states. 38 00:04:23,945 --> 00:04:32,038 Also, if a player is able to determine that some observable condition corresponds to distance from the goal, then it's a good idea to minimize that quantity. 39 00:04:32,658 --> 00:04:39,865 Suppose for example the player were in a cave trying to get out; if it saw a brighter light in one tunnel than another, it might go for the brighter light. 40 00:04:41,554 --> 00:04:47,700 Finally, there are some states that can be determined to be bad even if other states are not known to be good.
41 00:04:49,223 --> 00:04:56,006 For a silly example, stepping off the roof of a tall building is probably not a great way to get to the store, at least not in the real world. 42 00:04:58,372 --> 00:05:03,552 Another possibility is to use non-guaranteed evaluation functions, sometimes called heuristics. 43 00:05:04,587 --> 00:05:07,595 A number of such heuristics have been proposed over the years. 44 00:05:08,716 --> 00:05:19,682 Goal proximity is one of those: proponents of this heuristic argue that, all other things being equal, it's a good idea to prefer states that are closer to goal states than states that are farther away. 45 00:05:20,668 --> 00:05:27,871 Distance here is usually judged by similarity between states, that is, the number of facts in common in the descriptions of the two states. 46 00:05:28,922 --> 00:05:37,795 Mobility is another general heuristic: proponents argue that, all other things being equal, it's better to move to a state that affords the player greater mobility. 47 00:05:39,151 --> 00:05:43,618 That is, a state that gives it more possible actions; better than being boxed into a corner. 48 00:05:45,225 --> 00:05:51,581 Symmetrically, proponents of mobility argue that it is good to minimize the mobility of one's opponents. 49 00:05:54,004 --> 00:06:03,555 Now all these heuristics have been shown to be effective in some games. Unfortunately, they are only heuristics: they sometimes fail, and sometimes with comical consequences. 50 00:06:04,529 --> 00:06:08,803 The final match of GGP 2006 is an example. 51 00:06:09,108 --> 00:06:15,422 The game was cylinder checkers, that is, checkers played on a cylinder: the game wraps around vertically. 52 00:06:16,425 --> 00:06:25,687 Recall that in checkers a player is permitted to move one of his ordinary pieces, pieces that are not kings that is, one square forward in each turn. 53 00:06:26,304 --> 00:06:30,264 Here red is moving from the top to the bottom and black is moving from the bottom to the top.
54 00:06:31,226 --> 00:06:36,307 If a piece is blocked by an opponent's piece, you can jump that piece if there is an empty square on the other side. 55 00:06:37,846 --> 00:06:41,337 Moreover, the player must make such a jump if one is available. 56 00:06:41,968 --> 00:06:48,674 The objective of the game is to take all or as many of the opponent's pieces as possible while preserving one's own. 57 00:06:49,761 --> 00:06:55,729 Here is a snapshot of the game. It's red's turn to play: what should he do, and what do you think he did? 58 00:06:59,095 --> 00:07:07,160 Okay, here's a hint: the player in this case was Clune's player, and it had decided for some reason or other that limiting the opponent's mobility was a good heuristic. 59 00:07:07,798 --> 00:07:12,917 If it were to move the rearmost piece, black would have multiple possible moves. 60 00:07:13,785 --> 00:07:17,515 However, if it were to move the piece in front, then black would be forced to capture it. 61 00:07:18,216 --> 00:07:20,369 In other words, it would have at most one move. 62 00:07:21,085 --> 00:07:26,705 Clearly moving the forward piece minimizes the opponent's mobility, so that's what Clune's player did. 63 00:07:27,297 --> 00:07:36,387 Actually the whole match played out this way, with red giving black captures at every opportunity. It was sad to watch, but frankly a little comical at the same time. 64 00:07:36,903 --> 00:07:43,811 The moral here is that while non-guaranteed heuristics are sometimes useful, they're not always useful. 65 00:07:46,795 --> 00:07:50,835 An alternative to evaluation functions like these is Monte-Carlo search. 66 00:07:51,383 --> 00:08:05,873 The basic idea is simple: the player expands the tree for a few levels; then, rather than using a local heuristic to evaluate a state, it makes some probes from that state to the end of the game by selecting random moves for all players, which can be done very rapidly.
67 00:08:06,719 --> 00:08:14,657 It sums up the total reward for all such probes and divides by the number of probes to obtain an estimated utility for that state. 68 00:08:15,393 --> 00:08:21,368 It can then use these expected utilities in comparing states and selecting actions. 69 00:08:23,001 --> 00:08:31,599 Monte-Carlo and its variants have proven highly successful in general game playing; virtually every general game playing program today uses some variant of Monte-Carlo search. 70 00:08:33,999 --> 00:08:39,103 Okay, this discussion of game tree search and heuristics reveals just how difficult the GGP problem is. 71 00:08:39,830 --> 00:08:45,864 Monte-Carlo works amazingly well, but even it breaks down badly in certain cases. 72 00:08:46,324 --> 00:08:52,256 Fortunately, there's another, complementary approach to general game playing that has tremendous power, and it's called metagaming. 73 00:08:53,949 --> 00:09:01,265 Metagaming is problem-solving in the world of games; it involves reasoning about games and, by extension, game players and game playing. 74 00:09:03,002 --> 00:09:16,698 As stated, this is an extremely general definition, and includes both game design and game analysis; it includes reasoning about games in general as well as reasoning about specific games and specific matches of specific games. 75 00:09:18,293 --> 00:09:27,760 Significantly, it includes what programmers do in devising programs to play specific games, as well as what programmers do in devising general game playing programs. 76 00:09:29,342 --> 00:09:37,010 Metagaming is usually done offline, during the brief period after a player receives the game rules and before game play begins. 77 00:09:37,667 --> 00:09:42,122 Or sometimes it's done in parallel with ordinary game tree search. 78 00:09:43,831 --> 00:09:47,671 In general game playing we're primarily interested in those types of metagaming that can be automated.
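The random-probe averaging described a moment ago can be sketched directly. As before, the game interface (legal_moves, next_state, is_terminal, payoff) is a hypothetical stand-in, hard-coding the same toy two-player Nim rather than a GDL description.

```python
import random

# A minimal sketch of Monte-Carlo evaluation: from a given state, play
# random legal moves to the end of the game many times, sum the rewards,
# and divide by the number of probes to estimate the state's utility.

def legal_moves(state):
    return [m for m in (1, 2) if m <= state["pile"]]

def next_state(state, move):
    return {"pile": state["pile"] - move, "turn": 1 - state["turn"]}

def is_terminal(state):
    return state["pile"] == 0

def payoff(state, role):
    # The player who just moved (1 - turn) took the last stone.
    return 100 if (1 - state["turn"]) == role else 0

def monte_carlo_value(state, role, probes=1000, seed=0):
    rng = random.Random(seed)  # fixed seed just for reproducibility
    total = 0
    for _ in range(probes):
        s = state
        while not is_terminal(s):
            s = next_state(s, rng.choice(legal_moves(s)))  # random probe
        total += payoff(s, role)
    return total / probes  # estimated utility of the state
```

The player would compute such an estimate for each state at the frontier of its partial tree and back those values up in place of a heuristic evaluation.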
79 00:09:48,141 --> 00:09:54,123 This raises the question of the distinction between ordinary game playing and metagaming. 80 00:09:54,926 --> 00:09:58,842 Can we distinguish the two? Well, it's not that easy, but there are some differences. 81 00:09:59,230 --> 00:10:10,856 To begin with, ordinary game tree search can be viewed as a degenerate form of metagaming, one in which the metagamer must find the best action for a specific role in a specific game starting in a specific state. 82 00:10:11,578 --> 00:10:18,999 By contrast, metagaming sometimes involves information and goals that are different from the specifics used in game tree search. 83 00:10:20,269 --> 00:10:26,610 To begin with, metagaming can take into account information other than the game description. For example, it might take into account past experience. 84 00:10:28,280 --> 00:10:46,759 In round robin tournaments, where what matters is total return, the sum of the values over multiple matches, a player might select a different strategy than in an elimination ladder, where beating one's opponent is more important than the score one gets, so long as it's greater than the opponent's score. 85 00:10:49,020 --> 00:10:57,685 Metagaming is also sometimes done with less information than is used in match play: for example, without information about the role, initial state, goals, termination, and so forth. 86 00:10:58,371 --> 00:11:05,163 As a result, metagaming can be more general, deriving conclusions that apply across different matches and different players. 87 00:11:07,153 --> 00:11:22,807 Also, the goal in metagaming is broader than that of game tree search: it's not so much concerned with selecting the actions of a specific player in a specific game, but rather with devising a game tree search program, or optimizing an existing program to search the game tree, without actually searching that tree itself.
88 00:11:24,767 --> 00:11:39,263 Okay, well, whether or not this concept of automated metagaming can be distinguished from game tree search, there's no doubt that the concept is used to good effect in many general game playing programs, and we'll have a chance to look at some metagaming techniques in the course. 89 00:11:42,675 --> 00:11:46,351 Before we leave the concept though, I want to give one example of metagaming. 90 00:11:46,757 --> 00:11:51,942 The example here is called game decomposition, or sometimes factoring. 91 00:11:52,717 --> 00:11:57,670 Consider the example of Hodgepodge. Hodgepodge is actually 2 games glued together. 92 00:11:58,021 --> 00:12:03,031 Here we show chess and Othello, but it could be any two games. 93 00:12:03,515 --> 00:12:10,408 One move in a game of Hodgepodge corresponds to one move in each of the two constituent games. 94 00:12:10,754 --> 00:12:14,864 Winning requires winning at least one of the two games while not losing the other. 95 00:12:15,639 --> 00:12:22,230 What makes Hodgepodge interesting is that it's factorable, that is, it can be divided into two independent games. 96 00:12:22,723 --> 00:12:25,202 Realizing this can have a dramatic effect. 97 00:12:25,417 --> 00:12:32,631 To see this, consider the size of the game tree for Hodgepodge, supposing that one game tree has branching factor "a" and the other has branching factor "b". 98 00:12:33,798 --> 00:12:38,258 Then the branching factor of the joint game is "a * b". 99 00:12:39,335 --> 00:12:44,838 At any point in the game, a player has "a * b" possible moves. 100 00:12:45,868 --> 00:12:50,288 The size of the fringe of the game tree at level n is "(a * b)^n". 101 00:12:51,989 --> 00:12:56,685 However, the two games are independent: moving in one subgame does not affect the state of the other subgame. 102 00:12:57,066 --> 00:13:03,259 So the players really should be searching two smaller game trees: one with branching factor "a" and the other with branching factor "b".
103 00:13:04,007 --> 00:13:09,824 In this way, at depth n, there would be only "a^n + b^n" possible states. 104 00:13:10,577 --> 00:13:14,050 This is a huge decrease in the size of the search space. 105 00:13:16,399 --> 00:13:28,592 So factoring is just one example of game reformulation and metagaming; there are many others. For example, it's possible to find symmetries in games to cut down on the search space. 106 00:13:28,959 --> 00:13:32,880 Also, in some games there are bottlenecks that allow for a different type of factoring. 107 00:13:33,423 --> 00:13:40,508 Consider for example a game made up of one or more subgames in which it's necessary to win the first game before moving on to the second game, and so forth. 108 00:13:41,284 --> 00:13:49,407 In such a case there's no need to search to a terminal state of the overall game; it's sufficient to just limit the search to the termination of the current subgame. 109 00:13:49,407 --> 00:13:54,199 And only after that's done, search to the termination condition of the next subgame, and so forth. 110 00:13:55,164 --> 00:14:02,404 Now these examples are extreme cases, but there are simple, everyday examples of finding structure of this sort that can help in curtailing search. 111 00:14:04,272 --> 00:14:13,245 Whatever sort of metagaming is done, the trick is to analyze and reformulate a game without expanding the entire game tree. 112 00:14:13,423 --> 00:14:26,224 The interesting thing about general game playing is this: sometimes the cost of analysis is proportional to the size of the description rather than the size of the game tree, as in the examples we just saw. 113 00:14:26,504 --> 00:14:31,980 And in such cases players can expend a little time and gain a lot in search savings.
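The arithmetic behind the factoring argument is easy to check directly. The branching factors below are made-up illustrative numbers, not figures for chess or Othello.

```python
# Compare the fringe sizes of the glued game tree and the factored trees.
# a and b are hypothetical branching factors for the two subgames; n is depth.
a, b, n = 5, 10, 4

joint = (a * b) ** n        # searching Hodgepodge as one game: (a*b)^n leaves
factored = a ** n + b ** n  # searching the two subgames separately: a^n + b^n

print(joint, factored)      # prints 6250000 10625
```

Even at this modest depth, the joint search visits several hundred times as many fringe states as the factored search, and the gap widens exponentially with n.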