The Minimax algorithm works well when a value function can accurately evaluate the state of the game at any point. In some games this is not feasible, and the only reliable indication of a position's quality is whether or not it eventually leads to a win. For such games, we can use Monte Carlo Tree Search (MCTS). This strategy estimates the average utility of each candidate position over a number of simulations started from it. Each simulation (also called a playout or rollout) continues until an end state is reached. When every end state is either a win or a loss, the average utility is simply the win percentage. If uniformly random move selection does not work well (as is often the case), a playout policy is used to choose the moves instead, biasing the simulation towards moves that look better according to some heuristic.
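
As a rough illustration, the sketch below estimates this average utility with uniformly random playouts. The `game` object and its `legal_moves`, `apply`, `is_terminal`, and `winner` hooks are assumed placeholders for whatever game is being played, not part of any particular library.

```python
import random

def random_playout(game, state):
    """Play uniformly random moves from `state` until an end state is reached;
    return the winning player (the simplest possible playout policy)."""
    while not game.is_terminal(state):
        state = game.apply(state, random.choice(game.legal_moves(state)))
    return game.winner(state)

def average_utility(game, state, player, n=100):
    """Win percentage for `player` over n random playouts from `state`."""
    wins = sum(random_playout(game, state) == player for _ in range(n))
    return wins / n
```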

To decide which parts of the search tree to explore next, the selection policy must balance two factors: exploration of paths which have had few playouts, and exploitation of states which have performed well in previous playouts. Each iteration of MCTS then consists of four steps (a minimal code sketch of one iteration follows the list):

  1. Selection - Starting from the root, apply the selection policy repeatedly to choose a node within the search tree from which to expand.
  2. Expansion - Expand the search tree by one node from this position. Label it 0/0.
  3. Simulation - Perform a playout from this node until completion, without adding intermediary nodes to the search tree.
  4. Back-propagation - For every node on the path from the root to the newly expanded node, labelled a/b, increment b. For every such node representing the player who won the playout, also increment a.
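
A minimal sketch of one iteration is below, reusing the assumed `game` hooks from the earlier sketch plus hypothetical `apply` and `to_move` methods. The selection policy shown, UCB1 applied to trees (often called UCT), is one common choice rather than the only one.

```python
import math
import random

class Node:
    """A search-tree node labelled wins/visits (the a/b of the list above)."""
    def __init__(self, state, player, game, parent=None, move=None):
        self.state = state
        self.player = player          # player who made the move leading here
        self.parent = parent
        self.move = move
        self.children = []
        self.untried = [] if game.is_terminal(state) else list(game.legal_moves(state))
        self.wins = 0                 # a: playouts through this node won by `player`
        self.visits = 0               # b: playouts through this node

def uct(node, c=1.4):
    """Selection policy: exploitation (win rate) plus an exploration bonus."""
    if node.visits == 0:
        return float("inf")
    return (node.wins / node.visits
            + c * math.sqrt(math.log(node.parent.visits) / node.visits))

def mcts_iteration(root, game):
    # 1. Selection: descend while the node is fully expanded and non-terminal.
    node = root
    while not node.untried and node.children:
        node = max(node.children, key=uct)

    # 2. Expansion: add one child to the search tree, labelled 0/0.
    if node.untried:
        move = node.untried.pop(random.randrange(len(node.untried)))
        node.children.append(Node(game.apply(node.state, move),
                                  player=game.to_move(node.state),
                                  game=game, parent=node, move=move))
        node = node.children[-1]

    # 3. Simulation: random playout to an end state; intermediate positions
    #    are not added to the search tree.
    state = node.state
    while not game.is_terminal(state):
        state = game.apply(state, random.choice(game.legal_moves(state)))
    winner = game.winner(state)

    # 4. Back-propagation: walk back to the root, incrementing b everywhere
    #    and a wherever the node's player won the playout.
    while node is not None:
        node.visits += 1
        if winner == node.player:
            node.wins += 1
        node = node.parent
```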

After the completion of a set number of iterations, or once the allotted runtime has been exceeded, the move with the highest number of playouts is returned.
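
Continuing the sketch above, the outer loop might look like the following; the iteration budget is an illustrative parameter, and the root's `player` is left as `None` since no move leads to it.

```python
def mcts(root_state, game, iterations=1000):
    root = Node(root_state, player=None, game=game)
    for _ in range(iterations):
        mcts_iteration(root, game)
    # Return the move leading to the child with the highest number of playouts.
    return max(root.children, key=lambda child: child.visits).move
```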

Beyond games for which a good evaluation function cannot be found, MCTS also generally outperforms Minimax in games with a large branching factor, where the search depth Minimax can reach in a reasonable time is severely limited.

This method of repeatedly using previous results as a starting point for deeper exploration is a form of Reinforcement Learning.