Improved reinforcement learning with curriculum

Joseph West; Frederic Maire; Cameron Browne; Simon Denman

doi:10.1016/j.eswa.2020.113515

Joseph West^*, Frederic Maire, Cameron Browne, Simon Denman

^*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Humans tend to learn complex abstract concepts faster if examples are presented in a structured manner. For instance, when learning how to play a board game, usually one of the first concepts learned is how the game ends, i.e. the actions that lead to a terminal state (win, lose or draw). The advantage of learning endgames first is that once the actions leading to a terminal state are understood, it becomes possible to incrementally learn the consequences of actions that are further away from a terminal state - we call this an end-game-first curriculum. The state-of-the-art machine learning player for general board games, AlphaZero by Google DeepMind, does not employ a structured training curriculum. Whilst DeepMind's approach is effective, their method for generating experiences by self-play is resource intensive, costing literally millions of dollars in computational resources. We have developed a new method called the end-game-first training curriculum, which, when applied to the self-play/experience-generation loop, reduces the required computational resources to achieve the same level of learning. Our approach improves performance by not generating experiences which are expected to be of low training value. The end-game-first curriculum enables significant savings in processing resources and is potentially applicable to other problems that can be framed in terms of a game. (c) 2020 Elsevier Ltd. All rights reserved.

Original language	English
Article number	113515
Number of pages	15
Journal	Expert Systems with Applications
Volume	158
DOIs	https://doi.org/10.1016/j.eswa.2020.113515
Publication status	Published - 15 Nov 2020

Keywords

Curriculum learning
Reinforcement learning
Monte Carlo tree search
General game playing
NEURAL-NETWORKS
GAME
GO

Cite this

@article{f6d9136042c6443690d3aaee92373aa4,

title = "Improved reinforcement learning with curriculum",

abstract = "Humans tend to learn complex abstract concepts faster if examples are presented in a structured manner. For instance, when learning how to play a board game, usually one of the first concepts learned is how the game ends, i.e. the actions that lead to a terminal state (win, lose or draw). The advantage of learning endgames first is that once the actions leading to a terminal state are understood, it becomes possible to incrementally learn the consequences of actions that are further away from a terminal state - we call this an end-game-first curriculum. The state-of-the-art machine learning player for general board games, AlphaZero by Google DeepMind, does not employ a structured training curriculum. Whilst DeepMind's approach is effective, their method for generating experiences by self-play is resource intensive, costing literally millions of dollars in computational resources. We have developed a new method called the end-game-first training curriculum, which, when applied to the self-play/experience-generation loop, reduces the required computational resources to achieve the same level of learning. Our approach improves performance by not generating experiences which are expected to be of low training value. The end-game-first curriculum enables significant savings in processing resources and is potentially applicable to other problems that can be framed in terms of a game. (c) 2020 Elsevier Ltd. All rights reserved.",

keywords = "Curriculum learning, Reinforcement learning, Monte Carlo tree search, General game playing, NEURAL-NETWORKS, GAME, GO",

author = "Joseph West and Frederic Maire and Cameron Browne and Simon Denman",

year = "2020",

month = nov,

day = "15",

doi = "10.1016/j.eswa.2020.113515",

language = "English",

volume = "158",

journal = "Expert Systems with Applications",

issn = "0957-4174",

publisher = "Elsevier Science",

}

TY - JOUR

T1 - Improved reinforcement learning with curriculum

AU - West, Joseph

AU - Maire, Frederic

AU - Browne, Cameron

AU - Denman, Simon

PY - 2020/11/15

Y1 - 2020/11/15

N2 - Humans tend to learn complex abstract concepts faster if examples are presented in a structured manner. For instance, when learning how to play a board game, usually one of the first concepts learned is how the game ends, i.e. the actions that lead to a terminal state (win, lose or draw). The advantage of learning endgames first is that once the actions leading to a terminal state are understood, it becomes possible to incrementally learn the consequences of actions that are further away from a terminal state - we call this an end-game-first curriculum. The state-of-the-art machine learning player for general board games, AlphaZero by Google DeepMind, does not employ a structured training curriculum. Whilst DeepMind's approach is effective, their method for generating experiences by self-play is resource intensive, costing literally millions of dollars in computational resources. We have developed a new method called the end-game-first training curriculum, which, when applied to the self-play/experience-generation loop, reduces the required computational resources to achieve the same level of learning. Our approach improves performance by not generating experiences which are expected to be of low training value. The end-game-first curriculum enables significant savings in processing resources and is potentially applicable to other problems that can be framed in terms of a game. (c) 2020 Elsevier Ltd. All rights reserved.

AB - Humans tend to learn complex abstract concepts faster if examples are presented in a structured manner. For instance, when learning how to play a board game, usually one of the first concepts learned is how the game ends, i.e. the actions that lead to a terminal state (win, lose or draw). The advantage of learning endgames first is that once the actions leading to a terminal state are understood, it becomes possible to incrementally learn the consequences of actions that are further away from a terminal state - we call this an end-game-first curriculum. The state-of-the-art machine learning player for general board games, AlphaZero by Google DeepMind, does not employ a structured training curriculum. Whilst DeepMind's approach is effective, their method for generating experiences by self-play is resource intensive, costing literally millions of dollars in computational resources. We have developed a new method called the end-game-first training curriculum, which, when applied to the self-play/experience-generation loop, reduces the required computational resources to achieve the same level of learning. Our approach improves performance by not generating experiences which are expected to be of low training value. The end-game-first curriculum enables significant savings in processing resources and is potentially applicable to other problems that can be framed in terms of a game. (c) 2020 Elsevier Ltd. All rights reserved.

KW - Curriculum learning

KW - Reinforcement learning

KW - Monte Carlo tree search

KW - General game playing

KW - NEURAL-NETWORKS

KW - GAME

KW - GO

U2 - 10.1016/j.eswa.2020.113515

DO - 10.1016/j.eswa.2020.113515

M3 - Article

SN - 0957-4174

VL - 158

JO - Expert Systems with Applications

JF - Expert Systems with Applications

M1 - 113515

ER -