Quality-based Rewards for Monte-Carlo Tree Search Simulations

Tom Pepels; Mandy J. W. Tak; Marc Lanctot; Mark H. M. Winands

doi:10.3233/978-1-61499-419-0-705

Tom Pepels^*, Mandy J. W. Tak, Marc Lanctot, Mark H. M. Winands

^*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceeding › Academic › peer-review

Abstract

Monte-Carlo Tree Search is a best-first search technique based on simulations to sample the state space of a decision-making problem. In games, positions are evaluated based on estimates obtained from rewards of numerous randomized play-outs. Generally, rewards from play-outs are discrete values representing the outcome of the game (loss, draw, or win), e.g., r ∈ {-1, 0, 1}, which are backpropagated from expanded leaf nodes to the root node. However, a play-out may provide additional information. In this paper, we introduce new measures for assessing the a posteriori quality of a simulation. We show that altering the rewards of play-outs based on their assessed quality improves results in six distinct two-player games and in the General Game Playing agent CADIAPLAYER. We propose two specific enhancements, the Relative Bonus and Qualitative Bonus. Both are used as control variates, a variance reduction method for statistical simulation. Relative Bonus is based on the number of moves made during a simulation and Qualitative Bonus relies on a domain-dependent assessment of the game's terminal state. We show that the proposed enhancements, both separate and combined, lead to significant performance increases in the domains discussed.
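
As a rough illustration of the idea in the abstract, the sketch below adjusts a discrete play-out reward r ∈ {-1, 0, 1} by a bonus derived from the play-out length. The abstract does not state the formula, so everything specific here is an assumption made for illustration: the names `RunningStats` and `relative_bonus_reward`, the standardization against a running mean and standard deviation, the tanh squashing, and the constants `a` and `k`. It is not presented as the authors' implementation.

```python
import math


class RunningStats:
    """Running mean and sample standard deviation (Welford's method).

    Hypothetical helper: tracks the lengths of play-outs seen so far.
    """

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def relative_bonus_reward(r, length, stats, a=0.25, k=1.0):
    """Return r adjusted by a length-based bonus (illustrative assumption).

    r      : play-out outcome in {-1, 0, 1}
    length : number of moves made during the play-out
    stats  : RunningStats over previously observed play-out lengths
    a, k   : assumed constants (bonus weight and squashing slope)
    """
    sigma = stats.std()
    if r == 0 or sigma == 0.0:
        return float(r)  # draws and the no-data case are left unchanged
    # Positive when the play-out was shorter than average.
    lam = (stats.mean - length) / sigma
    bonus = math.tanh(k * lam)  # squashed into (-1, 1)
    sign = 1 if r > 0 else -1
    # Short wins score above 1, short losses below -1.
    return r + sign * a * bonus


if __name__ == "__main__":
    stats = RunningStats()
    for observed_length in (10, 20, 30):
        stats.update(observed_length)
    print(relative_bonus_reward(1, 10, stats))   # short win
    print(relative_bonus_reward(1, 30, stats))   # long win
    print(relative_bonus_reward(-1, 10, stats))  # short loss
```

Under these assumptions the adjusted reward stays within [-(1 + a), 1 + a], so the win/loss ordering of outcomes is preserved while shorter play-outs are weighted more strongly than longer ones.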

Original language	English
Title of host publication	Proceedings of the Twenty-first European Conference on Artificial Intelligence
Place of Publication	Amsterdam, The Netherlands
Publisher	IOS Press
Pages	705-710
Number of pages	6
ISBN (Print)	978-1-61499-418-3
DOIs	https://doi.org/10.3233/978-1-61499-419-0-705
Publication status	Published - 2014

Publication series

Series	Frontiers in Artificial Intelligence and Applications
Volume	263

Access to Document

10.3233/978-1-61499-419-0-705 (Licence: CC BY-NC)

https://doi.org/10.3233/978-1-61499-419-0-705

Cite this

@inproceedings{81ec12483bb946f5ae2a2a57e30b59b5,
  title = "Quality-based Rewards for Monte-Carlo Tree Search Simulations",
  abstract = "Monte-Carlo Tree Search is a best-first search technique based on simulations to sample the state space of a decision-making problem. In games, positions are evaluated based on estimates obtained from rewards of numerous randomized play-outs. Generally, rewards from play-outs are discrete values representing the outcome of the game (loss, draw, or win), e.g., $r \in \{-1, 0, 1\}$, which are backpropagated from expanded leaf nodes to the root node. However, a play-out may provide additional information. In this paper, we introduce new measures for assessing the a posteriori quality of a simulation. We show that altering the rewards of play-outs based on their assessed quality improves results in six distinct two-player games and in the General Game Playing agent CADIAPLAYER. We propose two specific enhancements, the Relative Bonus and Qualitative Bonus. Both are used as control variates, a variance reduction method for statistical simulation. Relative Bonus is based on the number of moves made during a simulation and Qualitative Bonus relies on a domain-dependent assessment of the game's terminal state. We show that the proposed enhancements, both separate and combined, lead to significant performance increases in the domains discussed.",
  author = "Tom Pepels and Tak, {Mandy J. W.} and Marc Lanctot and Winands, {Mark H. M.}",
  year = "2014",
  doi = "10.3233/978-1-61499-419-0-705",
  language = "English",
  isbn = "978-1-61499-418-3",
  series = "Frontiers in Artificial Intelligence and Applications",
  volume = "263",
  publisher = "IOS Press",
  pages = "705--710",
  booktitle = "Proceedings of the Twenty-first European Conference on Artificial Intelligence",
  address = "Amsterdam, The Netherlands",
}

Pepels, T, Tak, MJW, Lanctot, M & Winands, MHM 2014, Quality-based Rewards for Monte-Carlo Tree Search Simulations. in Proceedings of the Twenty-first European Conference on Artificial Intelligence. IOS Press, Amsterdam, The Netherlands, Frontiers in Artificial Intelligence and Applications, vol. 263, pp. 705-710. https://doi.org/10.3233/978-1-61499-419-0-705

Quality-based Rewards for Monte-Carlo Tree Search Simulations. / Pepels, Tom; Tak, Mandy J. W.; Lanctot, Marc et al.
Proceedings of the Twenty-first European Conference on Artificial Intelligence. Amsterdam, The Netherlands: IOS Press, 2014. p. 705-710 (Frontiers in Artificial Intelligence and Applications, Vol. 263).

TY - GEN

T1 - Quality-based Rewards for Monte-Carlo Tree Search Simulations

AU - Pepels, Tom

AU - Tak, Mandy J. W.

AU - Lanctot, Marc

AU - Winands, Mark H. M.

PY - 2014

Y1 - 2014

N2 - Monte-Carlo Tree Search is a best-first search technique based on simulations to sample the state space of a decision-making problem. In games, positions are evaluated based on estimates obtained from rewards of numerous randomized play-outs. Generally, rewards from play-outs are discrete values representing the outcome of the game (loss, draw, or win), e.g., r ∈ {-1, 0, 1}, which are backpropagated from expanded leaf nodes to the root node. However, a play-out may provide additional information. In this paper, we introduce new measures for assessing the a posteriori quality of a simulation. We show that altering the rewards of play-outs based on their assessed quality improves results in six distinct two-player games and in the General Game Playing agent CADIAPLAYER. We propose two specific enhancements, the Relative Bonus and Qualitative Bonus. Both are used as control variates, a variance reduction method for statistical simulation. Relative Bonus is based on the number of moves made during a simulation and Qualitative Bonus relies on a domain-dependent assessment of the game's terminal state. We show that the proposed enhancements, both separate and combined, lead to significant performance increases in the domains discussed.

U2 - 10.3233/978-1-61499-419-0-705

DO - 10.3233/978-1-61499-419-0-705

M3 - Conference article in proceeding

SN - 978-1-61499-418-3

T3 - Frontiers in Artificial Intelligence and Applications

SP - 705

EP - 710

BT - Proceedings of the Twenty-first European Conference on Artificial Intelligence

PB - IOS Press

CY - Amsterdam, The Netherlands

ER -