Analysis of the Impact of Randomization of Search-Control Parameters in Monte-Carlo Tree Search

Chiara F. Sironi*, Mark H. M. Winands

*Corresponding author for this work

Research output: Contribution to journalArticleAcademicpeer-review

22 Downloads (Pure)

Abstract

Monte-Carlo Tree Search (MCTS) has been applied successfully in many domains, including games. However, its performance is not uniform on all domains, and it also depends on how parameters that control the search are set. Parameter values that are optimal for a task might be sub-optimal for another. In a domain that tackles many games with di_erent characteristics, like general game playing (GGP), selecting appropriate parameter settings is not a trivial task. Games are unknown to the player, thus, _nding optimal parameters for a given game in advance is not feasible. Previous work has looked into tuning parameter values online, while the game is being played, showing some promising results. This tuning approach looks for optimal parameter values, balancing exploitation of values that performed well so far in the search and exploration of less sampled values. Continuously changing parameter values while performing the search, combined also with exploration of multiple values, introduces some randomization in the process. In addition, previous research indicates that adding randomization to certain components of MCTS might increase the diversi_cation of the search and improve the performance. Therefore, this article investigates the e_ect of randomly selecting values for MCTS search-control parameters online among prede_ned sets of reasonable values. For the GGP domain, this article evaluates four di_erent online parameter randomization strategies by comparing them with other methods to set parameter values: online parameter tuning, o_ine parameter tuning and sub-optimal parameter choices. Results on a set of 14 heterogeneous abstract games show that randomizing parameter values before each simulation has a positive e_ect on the search in some of the tested games, with respect to using _xed o_ine-tuned parameters. Moreover, results show a clear distinction between games for which online parameter tuning works best and games for which online randomization works best. In addition, the overall performance of online parameter randomization is closer to the one of online parameter turning than the one of sub-optimal parameter values, showing that online randomization is a reasonable parameter selection strategy. When analyzing the structure of the search trees generated by agents that use the di_erent parameters selection strategies, it is clear that randomization causes MCTS to become more explorative, which is helpful for alignment games that present many winning paths in their trees. Online parameter tuning, instead, seems more suitable for games that present narrow winning paths and many losing paths.

Original languageEnglish
Pages (from-to)717-757
Number of pages41
JournalJournal of Artificial Intelligence Research
Volume72
DOIs
Publication statusPublished - 8 Nov 2021

Cite this