Monte Carlo Tree Search is a more sophisticated variation of Monte Carlo Search that tackles some of the weaknesses of the simpler method. Both methods build up a game tree incrementally, and both rely on random simulation of games, but they differ in the way the tree is expanded. MCS, Monte Carlo Search, uniformly expands the partial game tree during its expansion phase and then simulates games starting at states on the fringe of the expanded tree. MCTS, Monte Carlo Tree Search, uses a more sophisticated approach, in which the processes of expansion and simulation are interleaved.

MCTS processes the game tree in cycles of four steps each: selection, expansion, simulation, and backpropagation. After each cycle is complete, it repeats the steps so long as there is time remaining, at which point it selects an action based on the statistics it has accumulated up to that point.

In the selection step, the player traverses the tree produced thus far to select an unexpanded node of the tree, making choices based on visit counts and utilities stored on the nodes of the tree. We'll see how that happens a little bit later. During expansion, the successors of the state chosen during the selection phase are added to the tree. The player then simulates the game starting at the node chosen during the selection phase; in so doing, it chooses actions at random until a terminal state is encountered, as with MCS. Finally, the value of the terminal state is propagated back along the path to the root node, and the visit counts and utilities are updated accordingly.

Here is the MCTS selection procedure. If the initial state has not been seen before, that is, it has zero visits, then it is selected. Otherwise, the procedure searches the successors of the node. If any of them have not been seen, then one of the unseen nodes is selected. If all of the successors have been seen before, then the procedure uses the selectfn subroutine, which we'll talk about shortly, to compute values for those nodes and chooses the one that maximizes this value. A sketch of this procedure appears below.

One of the most common ways of implementing selectfn is what's called UCT, which is short for Upper Confidence bounds applied to Trees. A typical UCT formula is vi + sqrt(log(np) / ni), where vi is the average reward seen so far for that state, np is the total number of times the state's parent was picked, and ni is the number of times this particular state was picked. Of course, there are other ways that one can evaluate states. The form here is based on a combination of exploitation and exploration. Exploitation here means the use of results on previously explored states, which is the first term, vi. Exploration means expansion of as-yet-unexplored states, a measure of which is the second term. A simple implementation of the formula is sketched below.

Expansion in MCTS is basically the same as that for MCS. An implementation for a single player is sketched below. On large games with large time bounds, it is possible that the space consumed in this process could exceed the memory available to a player. In such cases, it is common to use a variation of the selection procedure in which no additional states are added to the tree and just probes are used.

Simulation for MCTS is essentially the same as simulation for MCS, so the exact same procedure can be used in both methods.
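Here is a minimal Python sketch of the selection procedure as just described. It is not the code from the slides; the Node fields (visits, utility, children, parent) and the guard for terminal nodes are assumptions based on the description, and selectfn is defined in the next sketch.

```python
class Node:
    """A node in the MCTS game tree (assumed structure)."""
    def __init__(self, state, parent=None, action=None):
        self.state = state
        self.parent = parent
        self.action = action      # action that led from the parent to this state
        self.children = []        # successors added during expansion
        self.visits = 0           # number of times this node has been selected
        self.utility = 0.0        # total reward accumulated from simulations

def select(node):
    """Descend the tree and return an unexpanded node."""
    if node.visits == 0:                      # never seen before: select it
        return node
    for child in node.children:               # prefer any unseen successor
        if child.visits == 0:
            return child
    if not node.children:                      # terminal state: nothing to descend into
        return node
    best = max(node.children, key=selectfn)    # all seen: maximize selectfn
    return select(best)
```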
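A direct rendering of the UCT formula above as a selectfn, assuming the Node structure from the previous sketch:

```python
import math

def selectfn(node):
    """UCT value: average reward (exploitation) plus an exploration bonus."""
    exploitation = node.utility / node.visits                            # vi
    exploration = math.sqrt(math.log(node.parent.visits) / node.visits)  # sqrt(log(np)/ni)
    return exploitation + exploration
```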
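The following single-player sketches of expansion and simulation assume a hypothetical game interface with legal_actions, next_state, is_terminal, and reward; these names are illustrative, not an actual API.

```python
import random

def expand(game, node):
    """Add the successors of the selected node's state to the tree."""
    for action in game.legal_actions(node.state):
        child_state = game.next_state(node.state, action)
        node.children.append(Node(child_state, parent=node, action=action))

def simulate(game, state):
    """Play random moves from state until a terminal state and return its value."""
    while not game.is_terminal(state):
        action = random.choice(game.legal_actions(state))
        state = game.next_state(state, action)
    return game.reward(state)
```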
MCTS, however, has a different procedure for recording the results, called backpropagation. At the selected node, the method records a visit count and a utility. The visit count in this case is one, since it is a newly processed state; the utility is the result of the simulation. The procedure then propagates to the ancestors of this node. In the case of a single-player game, the procedure simply adds one to the visit count of each ancestor and augments its total utility by the utility obtained on the latest simulation. In the case of a multi-player game, the propagated value is the minimum of the values for all opponent actions.
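A sketch of the single-player case, using the Node structure assumed earlier:

```python
def backpropagate(node, score):
    """Record the simulation result at the selected node and all its ancestors."""
    while node is not None:
        node.visits += 1         # one more visit along the path back to the root
        node.utility += score    # accumulate the reward from this simulation
        node = node.parent
    # in a multi-player game, the propagated value would instead be the minimum
    # of the values over all opponent actions, as described above
```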
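Putting the four steps together, here is one way the overall cycle could look; the time-budget handling and the choice of the most-visited child as the final action are assumptions, since the method only requires that the action be chosen from the accumulated statistics.

```python
import time

def mcts_decide(game, state, time_budget):
    """Repeat selection, expansion, simulation, and backpropagation until the
    time budget runs out, then pick an action from the accumulated statistics."""
    root = Node(state)
    deadline = time.time() + time_budget
    while time.time() < deadline:
        node = select(root)                    # selection
        if not game.is_terminal(node.state):
            expand(game, node)                 # expansion
        score = simulate(game, node.state)     # simulation
        backpropagate(node, score)             # backpropagation
    best = max(root.children, key=lambda c: c.visits)
    return best.action
```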