Simplifying optimal strategies in stochastic games

J Flesch; F Thuijsman; OJ Vrieze

doi:10.1137/S0363012996311940

J Flesch*, F Thuijsman, OJ Vrieze

*Corresponding author for this work

Networks and Strategic Optimization

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

We deal with zero-sum limiting average stochastic games. We show that the existence of arbitrary optimal strategies implies the existence of stationary epsilon-optimal strategies, for all epsilon > 0, and the existence of Markov optimal strategies. We present a construction that does not even require knowing these optimal strategies. Furthermore, an example demonstrates that the existence of stationary optimal strategies is not implied by the existence of optimal strategies, so the result is sharp. More generally, one can evaluate a strategy pi for the maximizing player, player 1, by the reward phi(s)(pi) that pi guarantees him when starting in state s. A strategy pi is called nonimproving if phi(s)(pi) >= phi(s)(pi[h]) for all s and for all finite histories h with final state s, where pi[h] is the strategy pi conditional on the history h. Using the evaluation phi, we may define the relation "epsilon-better" between strategies. A strategy pi(1) is called epsilon-better than pi(2) if phi(s)(pi(1)) >= phi(s)(pi(2)) - epsilon for all s. We show that for any nonimproving strategy pi, for all epsilon > 0, there exists an epsilon-better stationary strategy and a (0-)better Markov strategy as well. Since all optimal strategies are nonimproving, this result can be regarded as a generalization of the above result for optimal strategies. Finally, we briefly discuss some other extensions. Among other things, we indicate how strategies that are optimal only for particular initial states can be simplified to "almost stationary" epsilon-optimal strategies, for all epsilon > 0, and to "almost Markov" optimal strategies. We also discuss the validity of the above results for other reward functions. Several examples clarify these issues.
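
The two definitions in the abstract can be written out in symbols. This is only a typeset restatement, not the paper's own formulation: the index placement ($\varphi_s$, $\pi^1$, $\pi^2$) is an assumption, since the abstract renders these as phi(s)(pi), pi(1), and pi(2).

\[
\pi \text{ is nonimproving} \iff \varphi_s(\pi) \ge \varphi_s(\pi[h]) \quad \text{for all states } s \text{ and all finite histories } h \text{ with final state } s,
\]
\[
\pi^1 \text{ is } \varepsilon\text{-better than } \pi^2 \iff \varphi_s(\pi^1) \ge \varphi_s(\pi^2) - \varepsilon \quad \text{for all states } s.
\]

In this notation the general result reads: for any nonimproving $\pi$ and every $\varepsilon > 0$ there is a stationary strategy that is $\varepsilon$-better than $\pi$, and there is a Markov strategy that is $0$-better than $\pi$.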

Original language	English
Pages (from-to)	1331-1347
Journal	SIAM Journal on Control and Optimization
Volume	36
Issue number	4
DOIs	https://doi.org/10.1137/S0363012996311940
Publication status	Published - Jul 1998

Keywords

stochastic games
limiting average rewards
optimality
Markov strategies
stationary strategies

Full text: final published version, 303 KB. Licence: Taverne

Cite this

@article{83991c218138451b8b2668157f04c3e6,

title = "Simplifying optimal strategies in stochastic games",

abstract = "We deal with zero-sum limiting average stochastic games. We show that the existence of arbitrary optimal strategies implies the existence of stationary epsilon-optimal strategies, for all epsilon > 0, and the existence of Markov optimal strategies. We present such a construction for which we do not even need to know these optimal strategies. Furthermore, an example demonstrates that the existence of stationary optimal strategies is not implied by the existence of optimal strategies, so the result is sharp. More generally, one can evaluate a strategy pi for the maximizing player, player 1, by the reward phi(s)(pi) that pi guarantees to him when starting in state s. A strategy pi is called nonimproving if phi(s)(pi) greater than or equal to phi(s)(pi[h]) for all s and for all finite histories h with final state s, where pi[h] is the strategy pi conditional on the history h. Using the evaluation phi, we may define the relation {"}epsilon-better{"} between strategies. A strategy pi(1) is called epsilon-better than pi(2) if phi(s)(pi(1)) greater than or equal to phi(s)(pi(2)) - epsilon for all s. We show that for any nonimproving strategy pi, for all epsilon > 0, there exists an epsilon-better stationary strategy and a (0-)better Markov strategy as well. Since all optimal strategies are nonimproving, this result can be regarded as a generalization of the above result for optimal strategies. Finally, we briefly discuss some other extensions. Among others, we indicate possible simplifications of strategies that are only optimal for particular initial states by {"}almost stationary{"} epsilon-optimal strategies, for all epsilon > 0, and by {"}almost Markov{"} optimal strategies. We also discuss the validity of the above results for other reward functions. Several examples clarify these issues.",

keywords = "stochastic games, limiting average rewards, optimality, Markov strategies, stationary strategies",

author = "J Flesch and F Thuijsman and OJ Vrieze",

year = "1998",

month = jul,

doi = "10.1137/S0363012996311940",

language = "English",

volume = "36",

pages = "1331--1347",

journal = "SIAM Journal on Control and Optimization",

issn = "0363-0129",

publisher = "SIAM Publications",

number = "4",

}

TY - JOUR

T1 - Simplifying optimal strategies in stochastic games

AU - Flesch, J

AU - Thuijsman, F

AU - Vrieze, OJ

PY - 1998/7

Y1 - 1998/7

AB - We deal with zero-sum limiting average stochastic games. We show that the existence of arbitrary optimal strategies implies the existence of stationary epsilon-optimal strategies, for all epsilon > 0, and the existence of Markov optimal strategies. We present such a construction for which we do not even need to know these optimal strategies. Furthermore, an example demonstrates that the existence of stationary optimal strategies is not implied by the existence of optimal strategies, so the result is sharp. More generally, one can evaluate a strategy pi for the maximizing player, player 1, by the reward phi(s)(pi) that pi guarantees to him when starting in state s. A strategy pi is called nonimproving if phi(s)(pi) greater than or equal to phi(s)(pi[h]) for all s and for all finite histories h with final state s, where pi[h] is the strategy pi conditional on the history h. Using the evaluation phi, we may define the relation "epsilon-better" between strategies. A strategy pi(1) is called epsilon-better than pi(2) if phi(s)(pi(1)) greater than or equal to phi(s)(pi(2)) - epsilon for all s. We show that for any nonimproving strategy pi, for all epsilon > 0, there exists an epsilon-better stationary strategy and a (0-)better Markov strategy as well. Since all optimal strategies are nonimproving, this result can be regarded as a generalization of the above result for optimal strategies. Finally, we briefly discuss some other extensions. Among others, we indicate possible simplifications of strategies that are only optimal for particular initial states by "almost stationary" epsilon-optimal strategies, for all epsilon > 0, and by "almost Markov" optimal strategies. We also discuss the validity of the above results for other reward functions. Several examples clarify these issues.

KW - stochastic games

KW - limiting average rewards

KW - optimality

KW - Markov strategies

KW - stationary strategies

U2 - 10.1137/S0363012996311940

DO - 10.1137/S0363012996311940

M3 - Article

SN - 0363-0129

VL - 36

SP - 1331

EP - 1347

JO - SIAM Journal on Control and Optimization

JF - SIAM Journal on Control and Optimization

IS - 4

ER -