TY - GEN
T1 - Quality-based Rewards for Monte-Carlo Tree Search Simulations
AU - Pepels, Tom
AU - Tak, Mandy J. W.
AU - Lanctot, Marc
AU - Winands, Mark H. M.
PY - 2014
Y1 - 2014
N2 - Monte-Carlo Tree Search is a best-first search technique based on simulations to sample the state space of a decision-making problem. In games, positions are evaluated based on estimates obtained from rewards of numerous randomized play-outs. Generally, rewards from play-outs are discrete values representing the outcome of the game (loss, draw, or win), e.g., r ∈ {-1, 0, 1}, which are backpropagated from expanded leaf nodes to the root node. However, a play-out may provide additional information. In this paper, we introduce new measures for assessing the a posteriori quality of a simulation. We show that altering the rewards of play-outs based on their assessed quality improves results in six distinct two-player games and in the General Game Playing agent CADIAPLAYER. We propose two specific enhancements, the Relative Bonus and Qualitative Bonus. Both are used as control variates, a variance reduction method for statistical simulation. Relative Bonus is based on the number of moves made during a simulation and Qualitative Bonus relies on a domain-dependent assessment of the game's terminal state. We show that the proposed enhancements, both separate and combined, lead to significant performance increases in the domains discussed.
AB - Monte-Carlo Tree Search is a best-first search technique based on simulations to sample the state space of a decision-making problem. In games, positions are evaluated based on estimates obtained from rewards of numerous randomized play-outs. Generally, rewards from play-outs are discrete values representing the outcome of the game (loss, draw, or win), e.g., r ∈ {-1, 0, 1}, which are backpropagated from expanded leaf nodes to the root node. However, a play-out may provide additional information. In this paper, we introduce new measures for assessing the a posteriori quality of a simulation. We show that altering the rewards of play-outs based on their assessed quality improves results in six distinct two-player games and in the General Game Playing agent CADIAPLAYER. We propose two specific enhancements, the Relative Bonus and Qualitative Bonus. Both are used as control variates, a variance reduction method for statistical simulation. Relative Bonus is based on the number of moves made during a simulation and Qualitative Bonus relies on a domain-dependent assessment of the game's terminal state. We show that the proposed enhancements, both separate and combined, lead to significant performance increases in the domains discussed.
U2 - 10.3233/978-1-61499-419-0-705
DO - 10.3233/978-1-61499-419-0-705
M3 - Conference article in proceeding
SN - 978-1-61499-418-3
T3 - Frontiers in Artificial Intelligence and Applications
SP - 705
EP - 710
BT - Proceedings of the Twenty-first European Conference on Artificial Intelligence
PB - IOS Press
CY - Amsterdam, The Netherlands
ER -