A Direct Policy-Search Algorithm for Relational Reinforcement Learning

S. Sarjant*, B. Pfahringer, K. Driessens, T. Smith

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceeding › Academic › peer-reviewed


Abstract

In the field of relational reinforcement learning, a representational generalisation of reinforcement learning, the first-order representation of environments results in a potentially infinite number of possible states, requiring learning agents to use some form of abstraction to learn effectively. Instead of forming an abstraction over the state-action space, an alternative technique is to create behaviour directly through policy search. The algorithm presented in this paper, named CERRLA, uses the cross-entropy method to learn behaviour directly, in the form of decision lists of relational rules, for solving problems in a range of different environments, without the need for expert guidance in the learning process. The behaviour produced by the algorithm is easy to comprehend and is biased towards compactness. The results obtained show that CERRLA is competitive both in standard testing environments and in Ms. Pac-Man and Carcassonne, two large and complex game environments.
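The cross-entropy method underlying CERRLA can be illustrated in isolation. The sketch below is not the authors' implementation; it is a minimal, generic version of the technique, assuming a toy search space of binary "rule inclusion" vectors and a hypothetical scoring function standing in for the reinforcement signal. The method maintains a sampling distribution over candidates, keeps the top-scoring elite fraction of each sampled population, and shifts the distribution towards those elites.

```python
import random

def cross_entropy_search(score, n_bits=8, pop=50, elite_frac=0.2,
                         iters=30, alpha=0.7, seed=0):
    """Generic cross-entropy method over binary candidate vectors.

    Maintains one independent Bernoulli probability per bit, samples a
    population of candidates, selects the top elite_frac by score, and
    moves each probability towards the elite sample mean (step alpha).
    """
    rng = random.Random(seed)
    probs = [0.5] * n_bits                      # initial sampling distribution
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        samples = [[1 if rng.random() < p else 0 for p in probs]
                   for _ in range(pop)]
        samples.sort(key=score, reverse=True)   # best candidates first
        elite = samples[:n_elite]
        for i in range(n_bits):
            elite_mean = sum(s[i] for s in elite) / n_elite
            probs[i] = (1 - alpha) * probs[i] + alpha * elite_mean
    return probs

# Toy objective (hypothetical): reward agreement with a fixed target pattern.
target = [1, 0, 1, 1, 0, 0, 1, 0]
final = cross_entropy_search(lambda s: sum(a == b for a, b in zip(s, target)))
```

After a few dozen iterations the sampling probabilities concentrate on the high-scoring pattern, which is the same convergence behaviour the paper exploits over distributions of relational rules rather than bits.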
Original language: English
Title of host publication: Inductive Logic Programming
Subtitle of host publication: 23rd International Conference, ILP 2013, Rio de Janeiro, Brazil, August 28-30, 2013, Revised Selected Papers
Pages: 76-92
Number of pages: 17
ISBN (Electronic): 978-3-662-44923-3
DOIs
Publication status: Published - 2014

Publication series

Series: Lecture Notes in Computer Science
Volume: 8812
ISSN: 0302-9743
