Minimizing Simple and Cumulative Regret in Monte-Carlo Tree Search

Tom Pepels, Tristan Cazenave, Mark H. M. Winands, Marc Lanctot

Research output: Chapter in Book/Report/Conference proceeding › Chapter › Academic

Abstract

Regret minimization is important in both the multi-armed bandit problem and Monte-Carlo Tree Search (MCTS). Recently, simple regret, i.e., the regret of not recommending the best action, has been proposed as an alternative to cumulative regret in MCTS, i.e., regret accumulated over time. Each type of regret is appropriate in different contexts. Although the majority of MCTS research applies the UCT selection policy for minimizing cumulative regret in the tree, this paper introduces a new MCTS variant, Hybrid MCTS (H-MCTS), which minimizes both types of regret in different parts of the tree. H-MCTS uses SHOT, a recursive version of Sequential Halving, to minimize simple regret near the root, and UCT to minimize cumulative regret when descending further down the tree. We discuss the motivation for this new search technique, and show the performance of H-MCTS in six distinct two-player games: Amazons, AtariGo, Ataxx, Breakthrough, NoGo, and Pentalath.
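To illustrate the simple-regret side of the abstract, the following is a minimal sketch of plain (non-recursive) Sequential Halving, the bandit allocation that SHOT builds on: all surviving arms are sampled equally, the worse half is discarded, and the process repeats until one arm remains. The function name, the per-round budget split, and the Bernoulli reward model are illustrative assumptions, not the paper's exact formulation.

```python
import math
import random

def sequential_halving(arms, budget, pull):
    """Recommend a single arm using Sequential Halving.

    arms   -- iterable of arm identifiers
    budget -- total number of pulls allowed (illustrative split per round)
    pull   -- callable returning a stochastic reward for an arm
    """
    survivors = list(arms)
    totals = {a: 0.0 for a in survivors}
    counts = {a: 0 for a in survivors}
    rounds = max(1, math.ceil(math.log2(len(survivors))))
    for _ in range(rounds):
        # Spread an equal share of the budget over the surviving arms.
        pulls = max(1, budget // (len(survivors) * rounds))
        for a in survivors:
            for _ in range(pulls):
                totals[a] += pull(a)
                counts[a] += 1
        # Keep the better half, ranked by empirical mean reward.
        survivors.sort(key=lambda a: totals[a] / counts[a], reverse=True)
        survivors = survivors[:max(1, (len(survivors) + 1) // 2)]
    return survivors[0]
```

Because the budget is spent aggressively on separating the best arm from the rest rather than on bounding per-pull losses, this allocation targets simple regret (quality of the final recommendation), whereas UCT's optimism-based selection targets cumulative regret; H-MCTS applies the former near the root and the latter deeper in the tree.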
Original language: English
Title of host publication: Computer Games
Subtitle of host publication: Third Workshop on Computer Games, CGW 2014, Held in Conjunction with the 21st European Conference on Artificial Intelligence, ECAI 2014, Prague, Czech Republic, August 18, 2014, Revised Selected Papers
Publisher: Springer
Pages: 1-15
Number of pages: 15
ISBN (Print): 978-3-319-14923-3
DOIs
Publication status: Published - 2014

Publication series

Series: Communications in Computer and Information Science
Volume: 504
