Anytime Sequential Halving in Monte-Carlo Tree Search

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceeding › Academic › peer-review

Abstract

Monte-Carlo Tree Search (MCTS) typically uses multi-armed bandit (MAB) strategies designed to minimize cumulative regret, such as UCB1, as its selection strategy. However, in the root node of the search tree, it is more sensible to minimize simple regret. Previous work has proposed using Sequential Halving as the selection strategy in the root node because, in theory, it performs better with respect to simple regret. However, Sequential Halving requires a budget of iterations to be predetermined, which is often impractical. This paper proposes an anytime version of the algorithm, which can be halted at any arbitrary time and still return a satisfactory result, while being designed such that it approximates the behavior of Sequential Halving. Empirical results on synthetic MAB problems and ten different board games demonstrate that the algorithm's performance is competitive with Sequential Halving and UCB1 (and their analogues in MCTS).
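For context, the fixed-budget baseline that the paper builds on can be sketched as follows. This is a minimal illustration of standard Sequential Halving (not the paper's anytime variant): the total pull budget is split evenly across rounds, all surviving arms are sampled equally within a round, and the empirically worse half is discarded until one arm remains. The function names and signature here are illustrative, not taken from the paper.

```python
import math

def sequential_halving(arms, budget, pull):
    """Standard Sequential Halving: sample all surviving arms
    equally each round, then discard the worse-scoring half,
    until a single arm remains.

    arms   -- list of arm identifiers
    budget -- total number of pulls, fixed in advance
    pull   -- pull(arm) -> stochastic reward
    """
    survivors = list(arms)
    rounds = max(1, math.ceil(math.log2(len(arms))))
    totals = {a: 0.0 for a in arms}
    counts = {a: 0 for a in arms}
    for _ in range(rounds):
        # Divide the budget evenly over rounds and surviving arms.
        pulls_per_arm = max(1, budget // (len(survivors) * rounds))
        for a in survivors:
            for _ in range(pulls_per_arm):
                totals[a] += pull(a)
                counts[a] += 1
        # Keep the better half of the arms by empirical mean.
        survivors.sort(key=lambda a: totals[a] / counts[a], reverse=True)
        survivors = survivors[: max(1, len(survivors) // 2)]
    return survivors[0]
```

Because `budget` must be known up front to size `pulls_per_arm`, halting this procedure early can leave it mid-round with the wrong arms eliminated; the anytime variant proposed in the paper is designed to avoid exactly this dependence.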

Original language: English
Title of host publication: Computers and Games - 12th International Conference, CG 2024, Revised Selected Papers
Editors: Michael Hartisch, Chu-Hsuan Hsueh, Jonathan Schaeffer
Publisher: Springer, Cham
Pages: 91-102
Number of pages: 12
ISBN (Print): 9783031865848
DOIs
Publication status: Published - 2025

Publication series

Series: Lecture Notes in Computer Science
Volume: 15550
ISSN: 0302-9743
