A Direct Policy-Search Algorithm for Relational Reinforcement Learning

S. Sarjant*, B. Pfahringer, K. Driessens, T. Smith

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceeding › Academic › peer-reviewed


Abstract

In the field of relational reinforcement learning, a representational generalisation of reinforcement learning, the first-order representation of environments results in a potentially infinite number of possible states, requiring learning agents to use some form of abstraction to learn effectively. Instead of forming an abstraction over the state-action space, an alternative technique is to create behaviour directly through policy search. The algorithm presented in this paper, named CERRLA, uses the cross-entropy method to learn behaviour directly, in the form of decision lists of relational rules, for solving problems in a range of different environments, without the need for expert guidance in the learning process. The behaviour produced by the algorithm is easy to comprehend and is biased towards compactness. The results obtained show that CERRLA is competitive both in standard testing environments and in Ms. Pac-Man and Carcassonne, two large and complex game environments.
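The cross-entropy method underlying CERRLA can be illustrated in isolation. The sketch below is not the authors' implementation; it is a minimal, generic version of the technique, assuming a toy search space of binary "rule inclusion" vectors and a hypothetical scoring function standing in for the reinforcement signal. The method maintains a sampling distribution over candidates, keeps the top-scoring elite fraction of each sampled population, and shifts the distribution towards those elites.

```python
import random

def cross_entropy_search(score, n_bits=8, pop=50, elite_frac=0.2,
                         iters=30, alpha=0.7, seed=0):
    """Generic cross-entropy method over binary candidate vectors.

    Maintains one independent Bernoulli probability per bit, samples a
    population of candidates, selects the top elite_frac by score, and
    moves each probability towards the elite sample mean (step alpha).
    """
    rng = random.Random(seed)
    probs = [0.5] * n_bits                      # initial sampling distribution
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        samples = [[1 if rng.random() < p else 0 for p in probs]
                   for _ in range(pop)]
        samples.sort(key=score, reverse=True)   # best candidates first
        elite = samples[:n_elite]
        for i in range(n_bits):
            elite_mean = sum(s[i] for s in elite) / n_elite
            probs[i] = (1 - alpha) * probs[i] + alpha * elite_mean
    return probs

# Toy objective (hypothetical): reward agreement with a fixed target pattern.
target = [1, 0, 1, 1, 0, 0, 1, 0]
final = cross_entropy_search(lambda s: sum(a == b for a, b in zip(s, target)))
```

After a few dozen iterations the sampling probabilities concentrate on the high-scoring pattern, which is the same convergence behaviour the paper exploits over distributions of relational rules rather than bits.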
Original language: English
Title of host publication: Inductive Logic Programming
Subtitle of host publication: 23rd International Conference, ILP 2013, Rio de Janeiro, Brazil, August 28-30, 2013, Revised Selected Papers
Pages: 76-92
Number of pages: 17
ISBN (Electronic): 978-3-662-44923-3
DOIs
Publication status: Published - 2014

Publication series

Series: Lecture Notes in Computer Science
Volume: 8812
ISSN: 0302-9743
