Biasing MCTS with Features for General Games

Dennis Soemers; Eric Piette; Cameron Browne

doi:10.1109/CEC.2019.8790141

Dennis Soemers^*, Eric Piette, Cameron Browne

^*Corresponding author for this work

Advanced Computing Sciences

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceeding › Academic › peer-review

Abstract

This paper proposes using a linear function approximator, rather than a deep neural network (DNN), to bias a Monte Carlo tree search (MCTS) player for general games. This is unlikely to match the potential raw playing strength of DNNs, but has advantages in terms of generality, interpretability and resources (time and hardware) required for training. Features describing local patterns are used as inputs. The features are formulated in such a way that they are easily interpretable and applicable to a wide range of general games, and might encode simple local strategies. We gradually create new features during the same self-play training process used to learn feature weights. We evaluate the playing strength of an MCTS player biased by learnt features against a standard upper confidence bounds for trees (UCT) player in multiple different board games, and demonstrate significantly improved playing strength in the majority of them after a small number of self-play training games.
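The idea summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes binary local-pattern features, a softmax over per-move linear scores, and a PUCT-style selection rule in which the resulting probabilities act as priors. The names `softmax_policy`, `biased_selection_scores`, the weight values, and the exploration constant `c` are all hypothetical.

```python
import math


def softmax_policy(weights, active_features_per_move):
    """Turn linear feature scores into move probabilities.

    weights: one float per feature (hypothetical learnt weights).
    active_features_per_move: for each legal move, the indices of the
    binary features that are active for that move.
    """
    logits = [sum(weights[i] for i in feats) for feats in active_features_per_move]
    max_logit = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - max_logit) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def biased_selection_scores(priors, mean_values, visit_counts, c=1.0):
    """PUCT-style scores (an assumed rule, chosen only for illustration).

    Each score is the move's mean value plus an exploration bonus that is
    scaled by the move's prior probability.
    """
    parent_visits = sum(visit_counts)
    return [
        q + c * p * math.sqrt(parent_visits) / (1 + n)
        for p, q, n in zip(priors, mean_values, visit_counts)
    ]


# Toy example: three features, three legal moves.
weights = [0.0, 2.0, -1.0]          # hypothetical learnt weights
moves = [[0], [1], [1, 2]]          # active feature indices per move
priors = softmax_policy(weights, moves)

# All moves have equal value estimates and visit counts, so the prior decides.
scores = biased_selection_scores(priors, [0.5, 0.5, 0.5], [1, 1, 1])
best = max(range(len(scores)), key=scores.__getitem__)
```

In this toy setting the move whose active features carry the largest total weight receives the largest prior, so the search visits it first; as visit counts grow, the bonus shrinks and the value estimates take over.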

Original language	English
Title of host publication	IEEE Congress on Evolutionary Computation
Subtitle of host publication	(CEC'19)
Pages	450-457
Number of pages	8
DOIs	https://doi.org/10.1109/CEC.2019.8790141
Publication status	Published - 11 Jun 2019

Keywords

GO
features
games
learning
search

Cite this

@inproceedings{2f7e20afdd5b436a9b40b2eff181e1e0,
  title = "Biasing MCTS with Features for General Games",
  abstract = "This paper proposes using a linear function approximator, rather than a deep neural network (DNN), to bias a Monte Carlo tree search (MCTS) player for general games. This is unlikely to match the potential raw playing strength of DNNs, but has advantages in terms of generality, interpretability and resources (time and hardware) required for training. Features describing local patterns are used as inputs. The features are formulated in such a way that they are easily interpretable and applicable to a wide range of general games, and might encode simple local strategies. We gradually create new features during the same self-play training process used to learn feature weights. We evaluate the playing strength of an MCTS player biased by learnt features against a standard upper confidence bounds for trees (UCT) player in multiple different board games, and demonstrate significantly improved playing strength in the majority of them after a small number of self-play training games.",
  keywords = "GO, features, games, learning, search",
  author = "Dennis Soemers and Eric Piette and Cameron Browne",
  note = "Funding Information: This research is part of the European Research Council-funded Digital Ludeme Project (ERC Consolidator Grant #771292) run by Cameron Browne at Maastricht University{\textquoteright}s Department of Data Science and Knowledge Engineering. Publisher Copyright: {\textcopyright} 2019 IEEE.",
  year = "2019",
  month = jun,
  day = "11",
  doi = "10.1109/CEC.2019.8790141",
  language = "English",
  pages = "450--457",
  booktitle = "IEEE Congress on Evolutionary Computation",
}

TY - GEN
T1 - Biasing MCTS with Features for General Games
AU - Soemers, Dennis
AU - Piette, Eric
AU - Browne, Cameron
N1 - Funding Information: This research is part of the European Research Council-funded Digital Ludeme Project (ERC Consolidator Grant #771292) run by Cameron Browne at Maastricht University’s Department of Data Science and Knowledge Engineering. Publisher Copyright: © 2019 IEEE.
PY - 2019/6/11
Y1 - 2019/6/11
N2 - This paper proposes using a linear function approximator, rather than a deep neural network (DNN), to bias a Monte Carlo tree search (MCTS) player for general games. This is unlikely to match the potential raw playing strength of DNNs, but has advantages in terms of generality, interpretability and resources (time and hardware) required for training. Features describing local patterns are used as inputs. The features are formulated in such a way that they are easily interpretable and applicable to a wide range of general games, and might encode simple local strategies. We gradually create new features during the same self-play training process used to learn feature weights. We evaluate the playing strength of an MCTS player biased by learnt features against a standard upper confidence bounds for trees (UCT) player in multiple different board games, and demonstrate significantly improved playing strength in the majority of them after a small number of self-play training games.
AB - This paper proposes using a linear function approximator, rather than a deep neural network (DNN), to bias a Monte Carlo tree search (MCTS) player for general games. This is unlikely to match the potential raw playing strength of DNNs, but has advantages in terms of generality, interpretability and resources (time and hardware) required for training. Features describing local patterns are used as inputs. The features are formulated in such a way that they are easily interpretable and applicable to a wide range of general games, and might encode simple local strategies. We gradually create new features during the same self-play training process used to learn feature weights. We evaluate the playing strength of an MCTS player biased by learnt features against a standard upper confidence bounds for trees (UCT) player in multiple different board games, and demonstrate significantly improved playing strength in the majority of them after a small number of self-play training games.
KW - GO
KW - features
KW - games
KW - learning
KW - search
U2 - 10.1109/CEC.2019.8790141
DO - 10.1109/CEC.2019.8790141
M3 - Conference article in proceeding
SP - 450
EP - 457
BT - IEEE Congress on Evolutionary Computation
ER -