Monte Carlo Tree Search (MCTS) has become a widely popular sample-based search algorithm for two-player games with perfect information. When actions are chosen simultaneously, players may need to mix between their strategies. In this paper, we discuss the adaptation of MCTS to simultaneous move games. We introduce a new algorithm, Online Outcome Sampling (OOS), that approaches a Nash equilibrium strategy over time. We compare both head-to-head performance and exploitability of several MCTS variants in Goofspiel. We show that regret matching and OOS perform best and that all variants produce less exploitable strategies than UCT.
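The regret matching procedure mentioned in the abstract can be illustrated on a toy simultaneous-move game. The following is a minimal sketch, not the paper's implementation: it runs regret matching for both players in matching pennies (a stand-in for Goofspiel's simultaneous choices), where each player mixes in proportion to positive cumulative regret and the average strategy approaches the uniform Nash equilibrium. All function names and the payoff matrix are illustrative assumptions.

```python
import random

# Regret matching on matching pennies, a toy zero-sum simultaneous-move
# game; payoffs, names, and iteration count are illustrative only.
NUM_ACTIONS = 2

def payoff(a, b):
    # Row player wins +1 when the two choices match, loses 1 otherwise.
    return 1.0 if a == b else -1.0

def strategy(regrets):
    # Mix proportionally to positive cumulative regret; uniform if none.
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    if total > 0:
        return [p / total for p in pos]
    return [1.0 / len(regrets)] * len(regrets)

def run(iterations=20000, seed=1):
    random.seed(seed)
    regrets = [[0.0] * NUM_ACTIONS for _ in range(2)]
    strat_sum = [[0.0] * NUM_ACTIONS for _ in range(2)]
    for _ in range(iterations):
        probs = [strategy(regrets[p]) for p in range(2)]
        acts = [random.choices(range(NUM_ACTIONS), weights=probs[p])[0]
                for p in range(2)]
        for p in range(2):
            # Accumulate the mixed strategy for the time average.
            for i in range(NUM_ACTIONS):
                strat_sum[p][i] += probs[p][i]
            # Regret of each action against the opponent's sampled action
            # (zero-sum: the column player receives the negated payoff).
            opp, sign = acts[1 - p], (1.0 if p == 0 else -1.0)
            utils = [sign * payoff(i, opp) for i in range(NUM_ACTIONS)]
            for i in range(NUM_ACTIONS):
                regrets[p][i] += utils[i] - utils[acts[p]]
    # Average strategies approach the Nash equilibrium (0.5, 0.5).
    return [[s / iterations for s in strat_sum[p]] for p in range(2)]
```

In zero-sum games the time-averaged strategies of regret-matching players converge to a Nash equilibrium, which is why the abstract compares variants by exploitability as well as head-to-head performance.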
Title of host publication: Computer Games
Subtitle of host publication: Workshop on Computer Games, CGW 2013, Held in Conjunction with the 23rd International Conference on Artificial Intelligence, IJCAI 2013, Beijing, China, August 3, 2013, Revised Selected Papers
Number of pages: 16
Publication status: Published - 2014
Series: Communications in Computer and Information Science