Enhancing Playout Policy Adaptation for General Game Playing

Chiara F. Sironi; Tristan Cazenave; Mark H. M. Winands

doi:10.1007/978-3-030-89453-5_9

Research output: Chapter in Book/Report/Conference proceeding › Chapter › Academic

Abstract

Playout Policy Adaptation (PPA) is a state-of-the-art strategy that has been proposed to control the playouts in Monte-Carlo Tree Search (MCTS). PPA has been successfully applied to many two-player, sequential-move games. This paper further evaluates this strategy in General Game Playing (GGP) by first reformulating it for simultaneous-move games. Next, it presents five enhancements for the strategy, four of which have previously been applied successfully to a related MCTS playout strategy, the Move-Average Sampling Technique (MAST). Experiments on a heterogeneous set of games show three enhancements to have a positive effect on PPA: (i) updating the policy for all players proportionally to their payoffs instead of updating only the policy of the winner, (ii) collecting statistics for n-grams of moves instead of single moves only, and (iii) discounting the backpropagated payoffs depending on the depth of the playout. Results also show enhanced PPA variants to be competitive with MAST for small search budgets, and better for larger search budgets. In contrast, the use of an \(\epsilon\)-greedy selection of moves and of after-move decay of statistics seems to have a detrimental effect on PPA.

Keywords: Monte-Carlo tree search; playout policy adaptation; general game playing
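To make the strategy described in the abstract more concrete, here is a minimal Python sketch of the basic PPA idea as it is commonly described: moves are sampled during a playout with a softmax over learned move weights, and after the playout the weights of the chosen moves are raised while all legal moves are lowered in proportion to their selection probability. This is an illustration under stated assumptions, not the authors' implementation. The names `softmax_probs`, `sample_move`, `adapt`, the learning rate `alpha`, and the `payoff` scaling factor (a hypothetical stand-in for the payoff-proportional update of enhancement (i), assumed to lie in [0, 1]) are not taken from the paper.

```python
import math
import random


def softmax_probs(weights, legal_moves):
    """Selection probability of each legal move (softmax over weights).

    Moves without a stored weight are treated as having weight 0.0.
    """
    exps = [math.exp(weights.get(m, 0.0)) for m in legal_moves]
    total = sum(exps)
    return [e / total for e in exps]


def sample_move(weights, legal_moves, rng=random):
    """Sample one legal move according to the softmax probabilities."""
    probs = softmax_probs(weights, legal_moves)
    return rng.choices(legal_moves, weights=probs, k=1)[0]


def adapt(weights, playout, payoff=1.0, alpha=1.0):
    """Return updated weights after one playout of a single player.

    playout: list of (legal_moves, chosen_move) pairs for that player.
    payoff:  hypothetical scaling in [0, 1]; 1.0 mimics a winner-only update.
    alpha:   learning rate (assumed value).
    Probabilities are computed from the old weights; changes go to a copy.
    """
    updated = dict(weights)
    for legal_moves, chosen in playout:
        probs = softmax_probs(weights, legal_moves)
        # Reinforce the move that was actually played.
        updated[chosen] = updated.get(chosen, 0.0) + alpha * payoff
        # Lower every legal move in proportion to its selection probability.
        for move, p in zip(legal_moves, probs):
            updated[move] = updated.get(move, 0.0) - alpha * payoff * p
    return updated
```

With empty initial weights and a single step in which move `"a"` was chosen out of `["a", "b"]`, `adapt` raises the weight of `"a"` and lowers that of `"b"` by the same amount, so `"a"` becomes more likely in later playouts.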

Original language	English
Title of host publication	Monte Carlo Search
Subtitle of host publication	MCS 2020
Publisher	Springer, Cham
Pages	116-139
ISBN (Electronic)	978-3-030-89453-5
ISBN (Print)	978-3-030-89452-8
DOIs	https://doi.org/10.1007/978-3-030-89453-5_9
Publication status	Published - 16 Oct 2021

Publication series

Series	Communications in Computer and Information Science
Volume	1379
ISSN	1865-0929

Access to Document

10.1007/978-3-030-89453-5_9 (Licence: Unspecified)

Full text: Final published version, 1 MB (Licence: Taverne)

Cite this

@inbook{418900db0ecc4bf59e9b746d3f0d4e50,

title = "Enhancing Playout Policy Adaptation for General Game Playing",

abstract = "Playout Policy Adaptation (PPA) is a state-of-the-art strategy that has been proposed to control the playouts in Monte-Carlo Tree Search (MCTS). PPA has been successfully applied to many two-player, sequential-move games. This paper further evaluates this strategy in General Game Playing (GGP) by first reformulating it for simultaneous-move games. Next, it presents five enhancements for the strategy, four of which have previously been applied successfully to a related MCTS playout strategy, the Move-Average Sampling Technique (MAST). Experiments on a heterogeneous set of games show three enhancements to have a positive effect on PPA: (i) updating the policy for all players proportionally to their payoffs instead of updating only the policy of the winner, (ii) collecting statistics for n-grams of moves instead of single moves only, and (iii) discounting the backpropagated payoffs depending on the depth of the playout. Results also show enhanced PPA variants to be competitive with MAST for small search budgets, and better for larger search budgets. In contrast, the use of an \(\epsilon\)-greedy selection of moves and of after-move decay of statistics seems to have a detrimental effect on PPA. Keywords: Monte-Carlo tree search; playout policy adaptation; general game playing.",

author = "Sironi, {Chiara F.} and Cazenave, Tristan and Winands, {Mark H. M.}",

note = "Funding Information: Acknowledgments. This work was supported in part by the French government under management of Agence Nationale de la Recherche as part of the “Investissements d{\textquoteright}avenir” program, reference ANR19-P3IA-0001 (PRAIRIE 3IA Institute). Publisher Copyright: {\textcopyright} 2021, Springer Nature Switzerland AG.",

year = "2021",

month = oct,

day = "16",

doi = "10.1007/978-3-030-89453-5_9",

language = "English",

isbn = "978-3-030-89452-8",

series = "Communications in Computer and Information Science",

volume = "1379",

publisher = "Springer, Cham",

pages = "116--139",

booktitle = "Monte Carlo Search",

address = "Switzerland",

}

TY - CHAP

T1 - Enhancing Playout Policy Adaptation for General Game Playing

AU - Sironi, Chiara F.

AU - Cazenave, Tristan

AU - Winands, Mark H. M.

N1 - Funding Information: Acknowledgments. This work was supported in part by the French government under management of Agence Nationale de la Recherche as part of the “Investissements d’avenir” program, reference ANR19-P3IA-0001 (PRAIRIE 3IA Institute). Publisher Copyright: © 2021, Springer Nature Switzerland AG.

PY - 2021/10/16

Y1 - 2021/10/16

N2 - Playout Policy Adaptation (PPA) is a state-of-the-art strategy that has been proposed to control the playouts in Monte-Carlo Tree Search (MCTS). PPA has been successfully applied to many two-player, sequential-move games. This paper further evaluates this strategy in General Game Playing (GGP) by first reformulating it for simultaneous-move games. Next, it presents five enhancements for the strategy, four of which have previously been applied successfully to a related MCTS playout strategy, the Move-Average Sampling Technique (MAST). Experiments on a heterogeneous set of games show three enhancements to have a positive effect on PPA: (i) updating the policy for all players proportionally to their payoffs instead of updating only the policy of the winner, (ii) collecting statistics for n-grams of moves instead of single moves only, and (iii) discounting the backpropagated payoffs depending on the depth of the playout. Results also show enhanced PPA variants to be competitive with MAST for small search budgets, and better for larger search budgets. In contrast, the use of an \(\epsilon\)-greedy selection of moves and of after-move decay of statistics seems to have a detrimental effect on PPA. Keywords: Monte-Carlo tree search; playout policy adaptation; general game playing.

AB - Playout Policy Adaptation (PPA) is a state-of-the-art strategy that has been proposed to control the playouts in Monte-Carlo Tree Search (MCTS). PPA has been successfully applied to many two-player, sequential-move games. This paper further evaluates this strategy in General Game Playing (GGP) by first reformulating it for simultaneous-move games. Next, it presents five enhancements for the strategy, four of which have previously been applied successfully to a related MCTS playout strategy, the Move-Average Sampling Technique (MAST). Experiments on a heterogeneous set of games show three enhancements to have a positive effect on PPA: (i) updating the policy for all players proportionally to their payoffs instead of updating only the policy of the winner, (ii) collecting statistics for n-grams of moves instead of single moves only, and (iii) discounting the backpropagated payoffs depending on the depth of the playout. Results also show enhanced PPA variants to be competitive with MAST for small search budgets, and better for larger search budgets. In contrast, the use of an \(\epsilon\)-greedy selection of moves and of after-move decay of statistics seems to have a detrimental effect on PPA. Keywords: Monte-Carlo tree search; playout policy adaptation; general game playing.

U2 - 10.1007/978-3-030-89453-5_9

DO - 10.1007/978-3-030-89453-5_9

M3 - Chapter

SN - 978-3-030-89452-8

T3 - Communications in Computer and Information Science

SP - 116

EP - 139

BT - Monte Carlo Search

PB - Springer, Cham

ER -