Integrating guidance into relational reinforcement learning

K Driessens; S Dzeroski

doi:10.1023/B:MACH.0000039779.47329.3a


Corresponding author: K Driessens

Research output: Contribution to journal, Article (academic, peer-reviewed)

Abstract

Reinforcement learning, and Q-learning in particular, encounters two major problems when dealing with large state spaces. First, learning the Q-function in tabular form may be infeasible because of the excessive amount of memory needed to store the table, and because the Q-function only converges after each state has been visited multiple times. Second, rewards in the state space may be so sparse that with random exploration they will only be discovered extremely slowly. The first problem is often solved by learning a generalization of the encountered examples (e.g., using a neural net or decision tree). Relational reinforcement learning (RRL) is such an approach; it makes Q-learning feasible in structural domains by incorporating a relational learner into Q-learning. The problem of sparse rewards has not been addressed for RRL. This paper presents a solution based on the use of “reasonable policies” to provide guidance. Different types of policies and different strategies to supply guidance through these policies are discussed and evaluated experimentally in several relational domains to show the merits of the approach.
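The sparse-reward problem and the idea of guidance can be illustrated with a minimal tabular sketch. This is not RRL itself: RRL replaces the table with a relational learner, and the paper's own guidance strategies are not reproduced here. The chain environment, the `reasonable_policy` teacher, the `guided_every` interleaving scheme and all parameter values are hypothetical assumptions chosen for illustration. The sketch relies on one property of Q-learning: it is off-policy, so transitions generated by a teacher policy can be fed to the same update rule as exploratory ones.

```python
import random

# Hypothetical sparse-reward chain: states 0..n-1, start in state 0,
# action 0 moves left, action 1 moves right; the only reward (1.0) is
# given on reaching the goal state n-1, which ends the episode.
def step(state, action, n):
    nxt = min(state + 1, n - 1) if action == 1 else max(state - 1, 0)
    done = nxt == n - 1
    return nxt, (1.0 if done else 0.0), done


def reasonable_policy(state):
    """Hypothetical teacher: a 'reasonable' (here, simply optimal) policy."""
    return 1


def q_learning(n=12, episodes=50, guided_every=0, alpha=0.5, gamma=0.9,
               epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning; every `guided_every`-th episode follows the teacher.

    guided_every=0 means no guidance (epsilon-greedy exploration only).
    Returns the Q-table and the number of episodes that reached the goal.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n)]  # one row per state, one column per action
    successes = 0
    for ep in range(episodes):
        guided = guided_every > 0 and ep % guided_every == 0
        s = 0
        for _ in range(max_steps):
            if guided:
                a = reasonable_policy(s)           # guidance: the teacher picks the action
            elif rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)               # explore (or break ties) at random
            else:
                a = 0 if q[s][0] > q[s][1] else 1  # exploit current Q estimates
            s2, r, done = step(s, a, n)
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])  # standard Q-learning update
            s = s2
            if done:
                successes += 1
                break
    return q, successes


def greedy_reaches_goal(q, n, max_steps=100):
    """Follow the greedy policy (ties go left) and report whether it reaches n-1."""
    s = 0
    for _ in range(max_steps):
        a = 1 if q[s][1] > q[s][0] else 0
        s, _, done = step(s, a, n)
        if done:
            return True
    return False


if __name__ == "__main__":
    for k in (0, 5):
        q, hits = q_learning(guided_every=k)
        print(f"guided_every={k}: {hits}/50 episodes reached the goal, "
              f"greedy policy reaches goal: {greedy_reaches_goal(q, 12)}")
```

Each guided episode is guaranteed to reach the reward, so value information propagates backwards one state per guided episode even when random exploration rarely finds the goal; with an untrained (all-zero) table the greedy policy never leaves the start state.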

Original language	English
Pages (from-to)	271-304
Journal	Machine Learning
Volume	57
Issue number	3
DOIs	https://doi.org/10.1023/B:MACH.0000039779.47329.3a
Publication status	Published - Dec 2004
Externally published	Yes

Keywords

reinforcement learning
relational learning
guided exploration


Cite this

@article{798d33b7a32e45d481de63083713c8be,
  title = "Integrating guidance into relational reinforcement learning",
  abstract = "Reinforcement learning, and Q-learning in particular, encounters two major problems when dealing with large state spaces. First, learning the Q-function in tabular form may be infeasible because of the excessive amount of memory needed to store the table, and because the Q-function only converges after each state has been visited multiple times. Second, rewards in the state space may be so sparse that with random exploration they will only be discovered extremely slowly. The first problem is often solved by learning a generalization of the encountered examples (e.g., using a neural net or decision tree). Relational reinforcement learning (RRL) is such an approach; it makes Q-learning feasible in structural domains by incorporating a relational learner into Q-learning. The problem of sparse rewards has not been addressed for RRL. This paper presents a solution based on the use of “reasonable policies” to provide guidance. Different types of policies and different strategies to supply guidance through these policies are discussed and evaluated experimentally in several relational domains to show the merits of the approach.",
  keywords = "reinforcement learning, relational learning, guided exploration",
  author = "K Driessens and S Dzeroski",
  year = "2004",
  month = dec,
  doi = "10.1023/B:MACH.0000039779.47329.3a",
  language = "English",
  volume = "57",
  pages = "271--304",
  journal = "Machine Learning",
  issn = "0885-6125",
  publisher = "Springer",
  number = "3",
}

TY  - JOUR
T1  - Integrating guidance into relational reinforcement learning
AU  - Driessens, K
AU  - Dzeroski, S
PY  - 2004/12
Y1  - 2004/12
N2  - Reinforcement learning, and Q-learning in particular, encounters two major problems when dealing with large state spaces. First, learning the Q-function in tabular form may be infeasible because of the excessive amount of memory needed to store the table, and because the Q-function only converges after each state has been visited multiple times. Second, rewards in the state space may be so sparse that with random exploration they will only be discovered extremely slowly. The first problem is often solved by learning a generalization of the encountered examples (e.g., using a neural net or decision tree). Relational reinforcement learning (RRL) is such an approach; it makes Q-learning feasible in structural domains by incorporating a relational learner into Q-learning. The problem of sparse rewards has not been addressed for RRL. This paper presents a solution based on the use of “reasonable policies” to provide guidance. Different types of policies and different strategies to supply guidance through these policies are discussed and evaluated experimentally in several relational domains to show the merits of the approach.
AB  - Reinforcement learning, and Q-learning in particular, encounters two major problems when dealing with large state spaces. First, learning the Q-function in tabular form may be infeasible because of the excessive amount of memory needed to store the table, and because the Q-function only converges after each state has been visited multiple times. Second, rewards in the state space may be so sparse that with random exploration they will only be discovered extremely slowly. The first problem is often solved by learning a generalization of the encountered examples (e.g., using a neural net or decision tree). Relational reinforcement learning (RRL) is such an approach; it makes Q-learning feasible in structural domains by incorporating a relational learner into Q-learning. The problem of sparse rewards has not been addressed for RRL. This paper presents a solution based on the use of “reasonable policies” to provide guidance. Different types of policies and different strategies to supply guidance through these policies are discussed and evaluated experimentally in several relational domains to show the merits of the approach.
KW  - reinforcement learning
KW  - relational learning
KW  - guided exploration
U2  - 10.1023/B:MACH.0000039779.47329.3a
DO  - 10.1023/B:MACH.0000039779.47329.3a
M3  - Article
SN  - 0885-6125
VL  - 57
SP  - 271
EP  - 304
JO  - Machine Learning
JF  - Machine Learning
IS  - 3
ER  -