Compulsive deliberation is wasteful, in that its computations are repeated unnecessarily. Once a player is able to find a path to a terminal state with maximum reward, it should not have to repeat that computation on every step. Sequential planning, which we are going to see in this lesson, is the antithesis of compulsive deliberation: no work is repeated. Once a sequential planner finds a good path, it simply saves the sequence of actions along that path and then executes those actions step by step until the game is done, without any further deliberation. So a sequential planner is one that produces an optimal sequential plan, usually during the start clock, and then executes the steps of that plan during game play.

Sequential planning has multiple benefits relative to compulsive deliberation. First of all, it is not as wasteful, since it searches the game tree just once. If the start clock is sufficiently long, the planning can be done entirely during the start clock. After that, the execution time is very low, since all the player needs to do on a step is look up the action for that step, without doing any search whatsoever. Note that although the planning is usually done during the start-up period of the game, it can also be done during regular game play. It is also possible to mix sequential planning with other techniques. For example, in the case of large games, a player might search randomly during the initial part of the game and then switch to sequential planning once the game tree becomes small enough to produce a sequential plan. Of course, in this last case, the player's success depends on the strategy used before sequential planning begins.

Okay, let's start our look at sequential planning with a couple of definitions. A sequential plan, for a single-player game, is a sequence of actions that leads from the initial state of the game to a terminal state, such that (1) every action in the sequence is legal in the state in which it is performed, and (2) none of the intermediate states produced during the execution is terminal. A sequential plan is optimal if and only if no other sequential plan produces a greater final reward.

Here are some examples of sequential plans for the Eight Puzzle. The first plan prescribes a move to the right, followed by a move down, followed by another move to the right, and another move down. This clearly leads to a state in which all the tiles are in their goal positions and the empty cell is in the lower right, and so the value for this state is 100. However, this is not the only plan that works. The player could also move right, down, left, right, right, down and arrive at the same state after a couple more steps. Or it could move right, down, left, right, left, right, right, down to get there as well. Now, agreed, these latter two plans are longer than they need to be, but they are both optimal in that they produce a terminal state with the maximal value. By contrast, the sequential plan right, left, right, left, right, left, right, left is not optimal. It leads to a terminal state, since any plan with eight steps ends in a terminal state. However, the resulting reward is only 40 points, and there are other plans that produce higher values.
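To make the definition concrete, here is a minimal Python sketch of a plan checker. It assumes hypothetical game-interface helpers legal_actions(state), next_state(state, action), is_terminal(state), and reward(state); these names are illustrative, not part of the lesson itself.

```python
def is_sequential_plan(state, plan):
    """Check that a sequence of actions is a sequential plan:
    every action is legal in the state where it is performed,
    no intermediate state is terminal, and the final state is terminal."""
    for action in plan:
        if is_terminal(state) or action not in legal_actions(state):
            return False
        state = next_state(state, action)
    return is_terminal(state)

def plan_value(state, plan):
    """Return the reward of the state reached by executing the plan."""
    for action in plan:
        state = next_state(state, action)
    return reward(state)
```

Under this definition, the short right-down-right-down plan and the two longer detours all reach the goal state worth 100, so all three are optimal, while the right-left shuffle reaches a terminal state worth only 40.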
The implementation of a sequential planner is, again, similar to that of a previous method we have seen, namely compulsive deliberation. We set up a couple of additional global variables in this case: one to hold the plan, and the other to keep track of the current step. During the start clock, the player uses the bestplan subroutine to produce a sequential plan. It then has to reverse the plan since, as we will see, bestplan builds the plan backward. The plan is then stored in the plan variable, and the step counter is initialized to zero. Finally, the play handler on each step simply reads off the action corresponding to that step, updates the step counter, and returns the action for that step.

Okay, so the workhorse here is clearly the bestplan subroutine. Not surprisingly, it is analogous to maxscore. It takes a state as argument, but instead of returning a simple score, it returns a pair consisting of a score and a plan to achieve that score.

Alright, let's see how this works. As in maxscore, the first step is to check whether the state is terminal. If so, the procedure simply computes the player's reward for that state and returns that score paired with an empty plan, that is, an empty list of actions. Otherwise, it computes all legal actions for the specified state. It computes the next state corresponding to the first of these actions and computes the best score and best plan for that successor state. It then searches the remaining possible actions to see if it can find a better one. For each, it computes the next state, gets the best score and best plan for that state, and compares the score to the best score it has seen so far. If the score is better, it saves that score and the corresponding plan, with the action that got it there appended to the end. After all the actions are examined, bestplan returns a pair of the best score and the best plan.
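To pull the pieces together, here is a minimal Python sketch of the player just described. It reuses the same hypothetical helpers as above (legal_actions, next_state, is_terminal, reward) plus an assumed initial_state value supplied by the game framework; it is an illustration of the idea, not the course's actual implementation.

```python
plan = []   # global: the forward-order sequential plan
step = 0    # global: index of the next action to play

def bestplan(state):
    """Return a (score, plan) pair, where the plan is built backward:
    the first action to perform ends up at the end of the list."""
    if is_terminal(state):
        return reward(state), []          # score for this state, empty plan
    actions = legal_actions(state)
    # Best score and plan for the first action's successor state.
    best_score, subplan = bestplan(next_state(state, actions[0]))
    best_plan = subplan + [actions[0]]
    # Search the remaining actions for a better result.
    for action in actions[1:]:
        score, subplan = bestplan(next_state(state, action))
        if score > best_score:
            best_score = score
            best_plan = subplan + [action]  # append the action that got us there
    return best_score, best_plan

def start():
    """Start-clock handler: plan once, reverse into forward order,
    and reset the step counter."""
    global plan, step
    _, backward = bestplan(initial_state)   # initial_state assumed given
    plan = list(reversed(backward))
    step = 0

def play():
    """Play handler: look up the action for the current step,
    advance the counter, and return the action. No search at all."""
    global step
    action = plan[step]
    step += 1
    return action
```

Note that the plan is reversed just once, in the start handler, so the play handler does nothing but a lookup, which is exactly the efficiency benefit over compulsive deliberation described above.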