In our last lesson, we saw various approaches to incomplete search of game trees. In each approach, the evaluation of states is based on local properties of those states, that is, properties that do not depend on the game tree as a whole. In many games, there's no correlation between these local properties and the likelihood of successfully completing the game.

So in this lesson, we're going to look at some alternative methods based on statistical analysis of game trees. First, we examine a simple approach based on what's called Monte Carlo game simulation. Then we look at a more sophisticated variation, called Monte Carlo tree search, or sometimes UCT.

The basic idea of Monte Carlo search is simple. As with depth-limited search, we explore the game tree to some fixed depth. In order to estimate the value of a non-terminal state at this depth, we make some probes from that state to the end of the game by selecting random moves for the players. We sum up the total rewards for all such probes and divide by the number of probes to obtain an estimated utility for that state. We then use these estimated utilities in comparing states and selecting actions.

As I was just saying, the expansion phase is the same as in depth-limited search: the tree is explored to some fixed depth, as before. Then we enter a probe phase, where we start from each of the fringe states reached in this expansion process, making random probes from each one to a terminal state.

The values produced by these probes are then added up and divided by the number of probes for each state to obtain an expected utility. For example, in the case on the left, we made four probes and got one 100; the sum total of the four probes is 100, and 100 divided by 4 is 25. In the second case, two 100s and two zeros give a total of 200; 200 divided by 4 is 50. These utilities are then compared to determine the relative utilities of the fringe states produced at the end of the expansion phase. That's much better than making the conservative assumption of zero utility for non-terminal states.

A simple implementation of maxscore for Monte Carlo search is shown here.
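[The slide's code is not included in the captions. What follows is a minimal Python sketch of the scheme as described in the lecture, not the course's actual code: the game-interface functions roles, is_terminal, reward, legal_moves, and next_state are hypothetical stand-ins, maxscore is simplified to the single-player case, and a toy game is included so the sketch runs.]

```python
import random

COUNT = 4  # number of probes per fringe state; the global "count" parameter

def maxscore(role, state, game, depth, max_depth):
    # Ordinary fixed-depth search, except that fringe states are
    # evaluated with montecarlo rather than a local heuristic.
    # Simplified to the single-player case; a multiplayer version
    # would alternate with a minscore over the opponents' moves.
    if is_terminal(state, game):
        return reward(role, state, game)
    if depth >= max_depth:
        return montecarlo(role, state, game)
    return max(maxscore(role, next_state([move], state, game), game,
                        depth + 1, max_depth)
               for move in legal_moves(role, state, game))

def montecarlo(role, state, game):
    # Average utility over COUNT random probes ("depth charges").
    total = sum(depthcharge(role, state, game) for _ in range(COUNT))
    return total / COUNT

def depthcharge(role, state, game):
    # If the state is terminal, return its value. Otherwise form a
    # joint move from one random legal action per player, simulate
    # it, and recurse until a terminal state is reached.
    if is_terminal(state, game):
        return reward(role, state, game)
    joint = [random.choice(legal_moves(r, state, game)) for r in roles(game)]
    return depthcharge(role, next_state(joint, state, game), game)

# Toy single-player game so the sketch actually runs: walk from 0
# toward 5 in steps of 1 or 2; landing exactly on 5 scores 100.
def roles(game):             return ["walker"]
def is_terminal(s, game):    return s >= 5
def reward(r, s, game):      return 100 if s == 5 else 0
def legal_moves(r, s, game): return [1, 2]
def next_state(joint, s, game): return s + joint[0]

print(maxscore("walker", 0, None, 0, 2))  # expand 2 plies, then probe
```

One caveat with the recursive form: a long random game could exceed Python's recursion limit in depthcharge, so in practice it is often written as a loop; the recursion above simply mirrors the description in the lesson.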
It's exactly the same as ordinary fixed-depth heuristic search, except that the player uses the montecarlo routine to evaluate states.

One definition of montecarlo is shown here. It takes a state as argument and returns the average utility obtained from a set of n probes, here called depth charges, where n is the value of some global parameter, count.

The depthcharge subroutine, shown at the bottom, first checks whether a state is terminal. If so, it returns that state's value; otherwise, it forms a joint move by taking random legal actions for all of the players. It then simulates this joint move and calls itself recursively until it gets to a terminal state, and returns the result.

One downside of the Monte Carlo method is that it can be optimistic: it assumes all players are playing randomly, when in fact it's possible that they know exactly what they are doing. It doesn't help if most of the probes from a position in chess lead to success, if one leads to a state in which one's player is checkmated and the other player sees how to do that. This issue is addressed to some extent in the UCT method that we'll describe shortly.

Another drawback of Monte Carlo is that it doesn't really take into account the structure of a game. For example, it may not recognize symmetries or independences that could substantially decrease the cost of search. For that matter, it doesn't even recognize boards, pieces, piece count, or any other features that might form the basis of game-specific heuristics.

Still, even with these drawbacks, the Monte Carlo method is quite powerful. It's fast, it consumes very little space, and it's surprisingly effective. Prior to its use, general game players were at best interesting novelties. But once players started using Monte Carlo, the improvement in game play was dramatic. Suddenly, automated general game players began to perform at a very high level. Using a variation of this technique, CadiaPlayer won the International General Game Playing Competition three times. And almost every general game playing program today includes some version of Monte Carlo search.