1 00:00:04,140 --> 00:00:06,720 Compulsive deliberation is wasteful, in 2 00:00:06,720 --> 00:00:09,050 that its computations are repeated unnecessarily. 3 00:00:10,380 --> 00:00:12,730 Once a player is able to find a path to a terminal state 4 00:00:12,730 --> 00:00:16,739 with maximum reward, he should not have to repeat that computation on every step. 5 00:00:18,700 --> 00:00:21,800 Sequential planning, which we're going to be seeing in this lesson, 6 00:00:21,800 --> 00:00:23,780 is the antithesis of compulsive deliberation, 7 00:00:23,780 --> 00:00:25,420 in that no work is repeated. 8 00:00:25,420 --> 00:00:27,270 Once a sequential planner finds a good path, 9 00:00:27,270 --> 00:00:29,350 it simply saves the sequence of actions along 10 00:00:29,350 --> 00:00:31,560 that path, and then executes those actions 11 00:00:31,560 --> 00:00:34,579 step by step until the game is done, without any further deliberation. 12 00:00:36,370 --> 00:00:38,260 So, a sequential planner is one that produces 13 00:00:38,260 --> 00:00:40,270 an optimal sequential plan, usually during the 14 00:00:40,270 --> 00:00:43,870 start clock, and then executes the steps of that plan during game play. 15 00:00:45,370 --> 00:00:49,650 Sequential planning has multiple benefits relative to compulsive deliberation. 16 00:00:49,650 --> 00:00:54,670 First of all, it's not as wasteful, since it searches the game tree just once. 17 00:00:54,670 --> 00:00:57,000 If the start clock is sufficiently long, the planning 18 00:00:57,000 --> 00:00:59,700 can be done entirely during the start clock. 19 00:00:59,700 --> 00:01:02,840 And after that, the execution time is very low, since all the player needs to do 20 00:01:02,840 --> 00:01:04,410 on a step is to look up the 21 00:01:04,410 --> 00:01:06,800 action for that step, without doing any search whatsoever.
22 00:01:08,420 --> 00:01:10,570 Note that although the planning is usually done during the startup 23 00:01:10,570 --> 00:01:13,690 period of the game, it can also be done during regular game play. 24 00:01:13,690 --> 00:01:18,040 And it's also possible to mix sequential planning with other techniques. 25 00:01:18,040 --> 00:01:19,930 For example, in the case of large games, a player 26 00:01:19,930 --> 00:01:24,490 might search randomly during the initial part of the game 27 00:01:24,490 --> 00:01:26,330 and then switch to sequential planning once the game 28 00:01:26,330 --> 00:01:29,072 tree becomes small enough to produce a sequential plan. 29 00:01:29,072 --> 00:01:31,670 Of course, in the last case, the player's ability 30 00:01:31,670 --> 00:01:37,060 to succeed depends on the strategy used before sequential planning begins. 31 00:01:37,060 --> 00:01:40,410 Okay, let's start our look at sequential planning with a couple of definitions. 32 00:01:40,410 --> 00:01:45,010 A sequential plan for a single-player game is a sequence of actions that 33 00:01:45,010 --> 00:01:47,580 leads from the initial state of the game to a terminal state 34 00:01:48,680 --> 00:01:51,440 such that, 1, every action in the sequence is 35 00:01:51,440 --> 00:01:53,629 legal in the state in which the action is performed, 36 00:01:54,790 --> 00:01:59,490 and 2, none of the intermediate states produced during the execution is terminal. 37 00:02:00,710 --> 00:02:02,880 A sequential plan is optimal if and only if 38 00:02:02,880 --> 00:02:06,030 no other sequential plan produces a greater final reward. 39 00:02:07,940 --> 00:02:10,290 Here are some examples of sequential plans for 40 00:02:10,290 --> 00:02:10,790 the eight puzzle. 41 00:02:12,340 --> 00:02:16,640 Now, the first plan prescribes a move to the right, followed by a 42 00:02:16,640 --> 00:02:21,190 move down, followed by another move to the right, and another move down.
43 00:02:21,190 --> 00:02:24,290 And this clearly leads to a state in which all the tiles 44 00:02:24,290 --> 00:02:27,900 are in their goal positions, and the empty cell is in the lower right. 45 00:02:27,900 --> 00:02:30,532 And so the value for this state is 100. 46 00:02:30,532 --> 00:02:34,629 However, this is not the only plan that works. 47 00:02:35,630 --> 00:02:41,210 The player could also move right, move down, move left, move right, 48 00:02:41,210 --> 00:02:46,750 move right, move down, and arrive at the same state, after a couple more steps. 49 00:02:48,030 --> 00:02:53,460 Or, it could move right, down, left, right, left, 50 00:02:53,460 --> 00:02:56,950 right, right, and down to get there as well. 51 00:02:59,490 --> 00:03:04,050 Now, agreed, these latter two plans are longer than they need to be, 52 00:03:05,400 --> 00:03:06,650 but they're both optimal in that they 53 00:03:06,650 --> 00:03:08,860 produce a terminal state with the maximal value. 54 00:03:10,870 --> 00:03:13,270 By contrast, the sequential plan 55 00:03:15,290 --> 00:03:22,040 right, left, right, left, right, left, right, left is not optimal. 56 00:03:22,040 --> 00:03:25,570 It leads to a terminal state, since any plan with eight steps ends in a terminal state. 57 00:03:25,570 --> 00:03:28,438 However, the resulting reward is only 40 points. 58 00:03:28,438 --> 00:03:30,840 And there are other plans that produce higher values. 59 00:03:34,170 --> 00:03:38,030 The implementation of a sequential planner is, again, similar to that for 60 00:03:38,030 --> 00:03:42,310 a previous method that we've seen, namely compulsive deliberation. 61 00:03:44,140 --> 00:03:46,615 We set up a couple of additional global variables in this case: 62 00:03:46,615 --> 00:03:50,419 one to hold the plan, and the other to keep track of the current step. 63 00:03:54,020 --> 00:03:56,750 During the start clock, the player uses the 64 00:03:56,750 --> 00:04:00,100 best plan subroutine to produce a sequential plan.
65 00:04:00,100 --> 00:04:06,480 It then has to reverse the plan, as we'll see, since best plan builds the plan backward. 66 00:04:08,110 --> 00:04:09,990 The plan is then stored in the plan variable, 67 00:04:09,990 --> 00:04:13,460 and the step counter is initialized to zero. 68 00:04:13,460 --> 00:04:15,480 Finally, the play handler on each step simply 69 00:04:15,480 --> 00:04:18,000 reads off the action corresponding to that step, 70 00:04:19,060 --> 00:04:22,960 updates the step counter, and returns the action for that step. 71 00:04:26,610 --> 00:04:30,340 Okay, so the workhorse of this is clearly the best plan subroutine. 72 00:04:30,340 --> 00:04:33,830 Not surprisingly, it's analogous to max score. 73 00:04:33,830 --> 00:04:38,180 It takes a state as argument but, instead of returning a simple score, returns 74 00:04:38,180 --> 00:04:42,600 a pair consisting of a score and a plan to achieve that score. 75 00:04:43,905 --> 00:04:44,870 Alright, let's see how this works. 76 00:04:46,300 --> 00:04:48,950 As in max score, the first step is to check whether 77 00:04:48,950 --> 00:04:51,720 the state is terminal. If so, the procedure simply computes 78 00:04:51,720 --> 00:04:53,770 the player's reward for that state, and 79 00:04:53,770 --> 00:04:56,430 returns that score paired with an empty plan, 80 00:04:56,430 --> 00:04:58,130 that is, an empty list of actions. 81 00:05:02,850 --> 00:05:05,790 Otherwise, it computes all legal actions for the specified state. 82 00:05:07,660 --> 00:05:11,250 It computes the next state corresponding to the first of these actions 83 00:05:11,250 --> 00:05:14,670 and computes the best score and best plan for that successor state. 84 00:05:14,670 --> 00:05:15,170 And 85 00:05:17,650 --> 00:05:19,630 it then searches the remaining possible actions to 86 00:05:19,630 --> 00:05:20,780 see if it can find a better one.
87 00:05:22,790 --> 00:05:24,645 For each, it computes the next state, gets 88 00:05:24,645 --> 00:05:26,670 the best score and best plan for that state, 89 00:05:26,670 --> 00:05:30,340 and compares the score to the best score it's seen so far. 90 00:05:30,340 --> 00:05:35,200 If the score is better, then it saves that score and the corresponding plan, 91 00:05:35,200 --> 00:05:38,770 now with the action that got it there appended to the end. 92 00:05:43,770 --> 00:05:46,290 After all the actions are examined, best plan returns 93 00:05:46,290 --> 00:05:48,540 a pair of the best score and the best plan.
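The procedure just described can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the code shown in the lecture: the toy game (start at the value 1, apply "add1" or "double" for exactly three steps, score 100 for ending on 6) and the names `legal_actions`, `next_state`, `is_terminal`, `reward`, and `SequentialPlanner` are all invented here, and `bestplan` is simply the spoken "best plan" written as one identifier. The plan and step counter are held on a small object rather than in global variables.

```python
from typing import List, Tuple

State = Tuple[int, int]  # (current value, steps taken) in the toy game

# --- Hypothetical toy game, invented only for this sketch ---
INITIAL: State = (1, 0)

def legal_actions(state: State) -> List[str]:
    return ["add1", "double"]

def next_state(state: State, action: str) -> State:
    value, steps = state
    return (value + 1 if action == "add1" else value * 2, steps + 1)

def is_terminal(state: State) -> bool:
    return state[1] == 3  # the toy game always ends after three steps

def reward(state: State) -> int:
    return 100 if state[0] == 6 else 0

# --- The planner itself ---
def bestplan(state: State) -> Tuple[int, List[str]]:
    """Return (best score, plan) for a state; the plan is in REVERSE order."""
    if is_terminal(state):
        return reward(state), []  # a score paired with an empty plan
    actions = legal_actions(state)
    # Start from the first legal action...
    best_score, best = bestplan(next_state(state, actions[0]))
    best.append(actions[0])
    # ...then see whether any remaining action does strictly better.
    for action in actions[1:]:
        score, plan = bestplan(next_state(state, action))
        if score > best_score:
            plan.append(action)  # action appended to the end (plan is backward)
            best_score, best = score, plan
    return best_score, best

class SequentialPlanner:
    """Holds the plan and the current step (globals in the lecture's version)."""

    def start(self, initial: State) -> None:
        _, plan = bestplan(initial)
        plan.reverse()  # bestplan builds the plan backward
        self.plan = plan
        self.step = 0

    def play(self) -> str:
        action = self.plan[self.step]  # just a lookup, no search
        self.step += 1
        return action
```

After `player = SequentialPlanner()` and `player.start(INITIAL)`, each call to `player.play()` hands back the next stored action without any further search. Because the comparison is strict, ties are resolved in favor of the first plan found.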