N-Grams and the Last-Good-Reply Policy Applied in General Game Playing

Mandy J. W. Tak; Mark H. M. Winands; Yngvi Björnsson

doi:10.1109/TCIAIG.2012.2200252

Mandy J. W. Tak*, Mark H. M. Winands, Yngvi Björnsson

* Corresponding author for this work

Networks and Strategic Optimization

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

The aim of general game playing (GGP) is to create programs capable of playing a wide range of different games at an expert level, given only the rules of the game. The most successful GGP programs currently employ simulation-based Monte Carlo tree search (MCTS). The performance of MCTS depends heavily on the simulation strategy used. In this paper, we introduce improved simulation strategies for GGP that we implement and test in the GGP agent CADIAPLAYER, which won the International GGP competition in both 2007 and 2008. There are two aspects to the improvements: first, we show that a simple epsilon-greedy exploration strategy works better in the simulation play-outs than the softmax-based Gibbs measure currently used in CADIAPLAYER and, second, we introduce a general framework based on N-grams for learning promising move sequences. Collectively, these enhancements result in a much improved performance of CADIAPLAYER. For example, in our test suite consisting of five different two-player turn-based games, they led to an impressive average win rate of approximately 70%. The enhancements are also shown to be effective in multiplayer and simultaneous-move games. We additionally perform experiments with the last-good-reply policy (LGRP). The LGRP combined with N-grams is also tested. The LGRP has already been shown to be successful in Go programs and we demonstrate that it also has promise in GGP.
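The abstract contrasts two ways of picking a move inside a simulation play-out. The sketch below is only an illustration, not CADIAPLAYER's code. It assumes each move keeps an average play-out reward Q(a) on a 0-100 scale. The Gibbs measure samples move a with probability exp(Q(a)/tau) / sum_b exp(Q(b)/tau). Epsilon-greedy plays a uniformly random move with probability epsilon and otherwise plays the highest-scoring move. The abstract gives no parameter values, so tau = 10, epsilon = 0.2 and the optimistic default score of 100 for never-seen moves are assumptions. All function names are hypothetical.

```python
import math
import random

# Assumption: rewards lie on a 0-100 scale and unseen moves are scored optimistically.
DEFAULT_Q = 100.0


def gibbs_select(moves, q, tau=10.0, rng=random):
    """Softmax (Gibbs) sampling: P(a) is proportional to exp(Q(a) / tau)."""
    scores = [q.get(m, DEFAULT_Q) / tau for m in moves]
    top = max(scores)
    # Subtracting the maximum leaves the probabilities unchanged but avoids overflow.
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(moves, weights=weights, k=1)[0]


def epsilon_greedy_select(moves, q, epsilon=0.2, rng=random):
    """With probability epsilon play a uniformly random move, else a best-scoring one."""
    if rng.random() < epsilon:
        return rng.choice(moves)
    best = max(q.get(m, DEFAULT_Q) for m in moves)
    # Break ties between equally scored moves uniformly at random.
    return rng.choice([m for m in moves if q.get(m, DEFAULT_Q) == best])


def update_averages(q, n, moves, reward):
    """Fold one play-out result into running per-move averages.

    `moves` are the moves one player made in the play-out, `reward` is that
    player's final result; each distinct move is counted once per play-out.
    """
    for m in set(moves):
        n[m] = n.get(m, 0) + 1
        old = q.get(m, 0.0)
        q[m] = old + (reward - old) / n[m]
```

As a rough feel for the difference: with Q(a) = 80 and Q(b) = 20, the Gibbs measure at tau = 10 picks a with probability 1 / (1 + e^-6), about 0.9975. Epsilon-greedy with epsilon = 0.2 picks a with probability 0.8 + 0.2/2 = 0.9, so it keeps a fixed share of random exploration regardless of how far apart the averages are.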

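For the N-gram framework and the last-good-reply policy, the following is one plausible bookkeeping scheme, sketched under stated assumptions. It is not the paper's implementation, and every name in it is hypothetical. The N-gram table keeps running reward averages for move sequences of length 1 to 3. A candidate move is scored by averaging the N-gram averages that end in it. Longer N-grams are trusted only after min_visits = 7 samples, and that threshold is an assumed value. The LGRP table maps the previous move to a reply that was followed by a win. It uses the "forgetting" variant known from Go programs, which deletes a stored reply once it loses. Whether the paper's GGP version forgets is not stated in the abstract. The combination used here is also a hypothetical choice: play the stored reply if it is legal, otherwise fall back to epsilon-greedy on N-gram scores.

```python
import random

DEFAULT_SCORE = 100.0  # assumed optimistic score for never-seen moves


class NGramTable:
    """Running reward averages for move sequences of length 1..max_n."""

    def __init__(self, max_n=3, min_visits=7):
        self.max_n = max_n
        self.min_visits = min_visits  # hypothetical trust threshold for n > 1
        self.total = {}
        self.count = {}

    def update(self, moves, rewards):
        """moves: one play-out in order; rewards[i]: final result for the player of moves[i]."""
        for i in range(len(moves)):
            for n in range(1, min(self.max_n, i + 1) + 1):
                key = tuple(moves[i - n + 1 : i + 1])
                self.total[key] = self.total.get(key, 0.0) + rewards[i]
                self.count[key] = self.count.get(key, 0) + 1

    def score(self, history, move):
        """Mean of the trusted N-gram averages that end in `move` after `history`."""
        values = []
        for n in range(1, min(self.max_n, len(history) + 1) + 1):
            key = tuple(history[len(history) - n + 1 :]) + (move,)
            c = self.count.get(key, 0)
            if c == 0 or (n > 1 and c < self.min_visits):
                continue
            values.append(self.total[key] / c)
        return sum(values) / len(values) if values else DEFAULT_SCORE


class LastGoodReply:
    """Maps the previous move to a reply that was followed by a win."""

    def __init__(self):
        self.reply = {}

    def update(self, moves, won):
        """won[i] is True if the player who made moves[i] won the play-out."""
        for i in range(1, len(moves)):
            prev, move = moves[i - 1], moves[i]
            if won[i]:
                self.reply[prev] = move
            elif self.reply.get(prev) == move:
                del self.reply[prev]  # "forgetting": drop a reply that just lost

    def suggest(self, prev_move, legal):
        """Return the stored reply if it is currently legal, else None."""
        r = self.reply.get(prev_move)
        return r if r in legal else None


def choose(legal, history, ngrams, lgr, epsilon=0.2, rng=random):
    """Hypothetical combination: stored reply first, else epsilon-greedy on N-gram scores."""
    if history:
        r = lgr.suggest(history[-1], legal)
        if r is not None:
            return r
    if rng.random() < epsilon:
        return rng.choice(legal)
    scores = {m: ngrams.score(history, m) for m in legal}
    best = max(scores.values())
    return rng.choice([m for m, s in scores.items() if s == best])
```

In GGP, different roles can share move names (a "noop", for instance). It is therefore safer to store each move together with the role that made it, for example `("white", "noop")`, so the tables do not merge unrelated sequences.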
Original language	English
Pages (from-to)	73-83
Number of pages	11
Journal	IEEE Transactions on Computational Intelligence and AI in Games
Volume	4
Issue number	2
DOIs	https://doi.org/10.1109/TCIAIG.2012.2200252
Publication status	Published - Jun 2012

Keywords

General game playing (GGP)
last-good-reply policy (LGRP)
Monte Carlo tree search (MCTS)
N-grams

Cite this

@article{f4aa087b48b24a669da8659ba82bf5ac,
  title = "N-Grams and the Last-Good-Reply Policy Applied in General Game Playing",
  abstract = "The aim of general game playing (GGP) is to create programs capable of playing a wide range of different games at an expert level, given only the rules of the game. The most successful GGP programs currently employ simulation-based Monte Carlo tree search (MCTS). The performance of MCTS depends heavily on the simulation strategy used. In this paper, we introduce improved simulation strategies for GGP that we implement and test in the GGP agent CADIAPLAYER, which won the International GGP competition in both 2007 and 2008. There are two aspects to the improvements: first, we show that a simple epsilon-greedy exploration strategy works better in the simulation play-outs than the softmax-based Gibbs measure currently used in CADIAPLAYER and, second, we introduce a general framework based on N-grams for learning promising move sequences. Collectively, these enhancements result in a much improved performance of CADIAPLAYER. For example, in our test suite consisting of five different two-player turn-based games, they led to an impressive average win rate of approximately 70%. The enhancements are also shown to be effective in multiplayer and simultaneous-move games. We additionally perform experiments with the last-good-reply policy (LGRP). The LGRP combined with N-grams is also tested. The LGRP has already been shown to be successful in Go programs and we demonstrate that it also has promise in GGP.",
  keywords = "General game playing (GGP), last-good-reply policy (LGRP), Monte Carlo tree search (MCTS), N-grams",
  author = "Tak, {Mandy J. W.} and Winands, {Mark H. M.} and Yngvi Bj{\"o}rnsson",
  year = "2012",
  month = jun,
  doi = "10.1109/TCIAIG.2012.2200252",
  language = "English",
  volume = "4",
  pages = "73--83",
  journal = "IEEE Transactions on Computational Intelligence and AI in Games",
  issn = "1943-068X",
  publisher = "IEEE",
  number = "2",
}

TY  - JOUR
T1  - N-Grams and the Last-Good-Reply Policy Applied in General Game Playing
AU  - Tak, Mandy J. W.
AU  - Winands, Mark H. M.
AU  - Björnsson, Yngvi
PY  - 2012/6
Y1  - 2012/6
N2  - The aim of general game playing (GGP) is to create programs capable of playing a wide range of different games at an expert level, given only the rules of the game. The most successful GGP programs currently employ simulation-based Monte Carlo tree search (MCTS). The performance of MCTS depends heavily on the simulation strategy used. In this paper, we introduce improved simulation strategies for GGP that we implement and test in the GGP agent CADIAPLAYER, which won the International GGP competition in both 2007 and 2008. There are two aspects to the improvements: first, we show that a simple epsilon-greedy exploration strategy works better in the simulation play-outs than the softmax-based Gibbs measure currently used in CADIAPLAYER and, second, we introduce a general framework based on N-grams for learning promising move sequences. Collectively, these enhancements result in a much improved performance of CADIAPLAYER. For example, in our test suite consisting of five different two-player turn-based games, they led to an impressive average win rate of approximately 70%. The enhancements are also shown to be effective in multiplayer and simultaneous-move games. We additionally perform experiments with the last-good-reply policy (LGRP). The LGRP combined with N-grams is also tested. The LGRP has already been shown to be successful in Go programs and we demonstrate that it also has promise in GGP.
AB  - The aim of general game playing (GGP) is to create programs capable of playing a wide range of different games at an expert level, given only the rules of the game. The most successful GGP programs currently employ simulation-based Monte Carlo tree search (MCTS). The performance of MCTS depends heavily on the simulation strategy used. In this paper, we introduce improved simulation strategies for GGP that we implement and test in the GGP agent CADIAPLAYER, which won the International GGP competition in both 2007 and 2008. There are two aspects to the improvements: first, we show that a simple epsilon-greedy exploration strategy works better in the simulation play-outs than the softmax-based Gibbs measure currently used in CADIAPLAYER and, second, we introduce a general framework based on N-grams for learning promising move sequences. Collectively, these enhancements result in a much improved performance of CADIAPLAYER. For example, in our test suite consisting of five different two-player turn-based games, they led to an impressive average win rate of approximately 70%. The enhancements are also shown to be effective in multiplayer and simultaneous-move games. We additionally perform experiments with the last-good-reply policy (LGRP). The LGRP combined with N-grams is also tested. The LGRP has already been shown to be successful in Go programs and we demonstrate that it also has promise in GGP.
KW  - General game playing (GGP)
KW  - last-good-reply policy (LGRP)
KW  - Monte Carlo tree search (MCTS)
KW  - N-grams
U2  - 10.1109/TCIAIG.2012.2200252
DO  - 10.1109/TCIAIG.2012.2200252
M3  - Article
SN  - 1943-068X
VL  - 4
SP  - 73
EP  - 83
JO  - IEEE Transactions on Computational Intelligence and AI in Games
JF  - IEEE Transactions on Computational Intelligence and AI in Games
IS  - 2
ER  - 