Equilibrium tracking and convergence in dynamic games

Panayotis Mertikopoulos; Mathias Staudigl

doi:10.1109/CDC45484.2021.9683224

Corresponding author: Panayotis Mertikopoulos

Research output: Conference article in proceeding (academic, peer-reviewed)

Abstract

In this paper, we examine the equilibrium tracking and convergence properties of no-regret learning algorithms in continuous games that evolve over time. Specifically, we focus on learning via "mirror descent", a widely used class of no-regret learning schemes where players take small steps along their individual payoff gradients and then "mirror" the output back to their action sets. In this general context, we show that the induced sequence of play stays asymptotically close to the evolving equilibrium of the sequence of stage games (assuming they are strongly monotone), and converges to it if the game stabilizes to a strictly monotone limit. Our results apply to both gradient- and payoff-based feedback, i.e., the "bandit" case where players only observe the payoffs of their chosen actions.
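The tracking behaviour described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a two-player game on the box [-1, 1]^2 with hypothetical quadratic payoffs whose targets drift as 1/(t+1) towards a limit, a Euclidean regularizer (so the "mirror" step reduces to clipping onto the box), exact gradient feedback, and a step size of 0.5/(t+1)^0.6. The payoff gradients have a Jacobian whose symmetric part is minus the identity, so each stage game is strongly monotone.

```python
import math

# Hypothetical time-varying two-player game on the box [-1, 1]^2.
#   u1(x; t) = -(x1 - c1(t))^2 / 2 - B * x1 * x2
#   u2(x; t) = -(x2 - c2(t))^2 / 2 + B * x1 * x2
# All constants below are illustrative assumptions, not values from the paper.
B = 0.5                   # coupling between the two players
C_LIMIT = (0.3, -0.2)     # targets of the limit game
C_DRIFT = (0.5, 0.5)      # perturbation that decays as 1 / (t + 1)


def targets(t):
    """Targets c(t) of the stage game at time t; they converge to C_LIMIT."""
    return tuple(c + d / (t + 1) for c, d in zip(C_LIMIT, C_DRIFT))


def payoff_gradient(x, c):
    """Each player's derivative of their own payoff w.r.t. their own action."""
    x1, x2 = x
    return (-(x1 - c[0]) - B * x2, -(x2 - c[1]) + B * x1)


def equilibrium(c):
    """Equilibrium of the stage game with targets c (interior for these constants)."""
    det = 1 + B * B
    return ((c[0] - B * c[1]) / det, (c[1] + B * c[0]) / det)


def mirror(y):
    """Euclidean mirror map onto the box: clip each coordinate to [-1, 1]."""
    return tuple(max(-1.0, min(1.0, yi)) for yi in y)


def run(steps=2000):
    """Gradient step in the dual variable y, then mirror back to the action set."""
    y = (0.0, 0.0)
    x = mirror(y)
    gaps = []  # distance to the equilibrium of the current stage game
    for t in range(steps):
        c = targets(t)
        gaps.append(math.dist(x, equilibrium(c)))
        step = 0.5 / (t + 1) ** 0.6  # assumed vanishing step-size schedule
        v = payoff_gradient(x, c)
        y = tuple(yi + step * vi for yi, vi in zip(y, v))
        x = mirror(y)
    return x, gaps


final_x, gaps = run()
final_gap = math.dist(final_x, equilibrium(C_LIMIT))
print(f"initial gap to stage equilibrium: {gaps[0]:.4f}")
print(f"final gap to limit equilibrium:   {final_gap:.6f}")
```

In this toy setting the gap to the stage equilibrium shrinks as play proceeds, and the final iterate ends up close to the equilibrium of the limit game. The payoff-based ("bandit") case mentioned in the abstract would replace `payoff_gradient` with an estimate built from observed payoffs only; that variant is not sketched here.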

Original language	English
Title of host publication	2021 60th IEEE Conference on Decision and Control (CDC)
Publisher	IEEE
Pages	930-935
Number of pages	6
DOIs	https://doi.org/10.1109/CDC45484.2021.9683224
Publication status	Published - 2021

Cite this

@inproceedings{f0421d70c5e242f58eed324289eb5581,

title = "Equilibrium tracking and convergence in dynamic games",

abstract = "In this paper, we examine the equilibrium tracking and convergence properties of no-regret learning algorithms in continuous games that evolve over time. Specifically, we focus on learning via {"}mirror descent{"}, a widely used class of no-regret learning schemes where players take small steps along their individual payoff gradients and then {"}mirror{"} the output back to their action sets. In this general context, we show that the induced sequence of play stays asymptotically close to the evolving equilibrium of the sequence of stage games (assuming they are strongly monotone), and converges to it if the game stabilizes to a strictly monotone limit. Our results apply to both gradient- and payoff-based feedback, i.e., the {"}bandit{"} case where players only observe the payoffs of their chosen actions.",

author = "Panayotis Mertikopoulos and Mathias Staudigl",

note = "Publisher Copyright: {\textcopyright} 2021 IEEE.",

year = "2021",

doi = "10.1109/CDC45484.2021.9683224",

language = "English",

pages = "930--935",

booktitle = "2021 60th IEEE Conference on Decision and Control (CDC)",

publisher = "IEEE",

address = "United States",

}

TY - GEN

T1 - Equilibrium tracking and convergence in dynamic games

AU - Mertikopoulos, Panayotis

AU - Staudigl, Mathias

PY - 2021

Y1 - 2021

AB - In this paper, we examine the equilibrium tracking and convergence properties of no-regret learning algorithms in continuous games that evolve over time. Specifically, we focus on learning via "mirror descent", a widely used class of no-regret learning schemes where players take small steps along their individual payoff gradients and then "mirror" the output back to their action sets. In this general context, we show that the induced sequence of play stays asymptotically close to the evolving equilibrium of the sequence of stage games (assuming they are strongly monotone), and converges to it if the game stabilizes to a strictly monotone limit. Our results apply to both gradient- and payoff-based feedback, i.e., the "bandit" case where players only observe the payoffs of their chosen actions.

U2 - 10.1109/CDC45484.2021.9683224

DO - 10.1109/CDC45484.2021.9683224

M3 - Conference article in proceeding

SP - 930

EP - 935

BT - 2021 60th IEEE Conference on Decision and Control (CDC)

PB - IEEE

ER -