Abstract
We examine the long-run behavior of multiagent online learning in games that evolve over time. Specifically, we focus on a wide class of policies based on mirror descent, and we show that the induced sequence of play (a) converges to a Nash equilibrium in time-varying games that stabilize in the long run to a strictly monotone limit, and (b) stays asymptotically close to the evolving equilibrium of the sequence of stage games (assuming they are strongly monotone). Our results apply to both gradient- and payoff-based feedback, that is, when players only get to observe the payoffs of their chosen actions.
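The mirror-descent policies studied here can be illustrated with a minimal sketch. Assuming the entropic (negative-entropy) regularizer on the probability simplex and gradient feedback, one mirror step reduces to a multiplicative-weights update; the fixed linear loss `c` below is a hypothetical toy example, not taken from the paper.

```python
import numpy as np

def mirror_descent_step(x, grad, eta):
    """One entropic mirror descent step on the probability simplex.

    With the negative-entropy regularizer, the prox-mapping reduces to
    a multiplicative-weights update followed by renormalization.
    """
    y = x * np.exp(-eta * grad)
    return y / y.sum()

# Toy illustration (not from the paper): a single player minimizing a
# fixed linear loss <c, x> over the 3-point simplex.
c = np.array([0.3, 0.1, 0.6])   # hypothetical per-action costs
x = np.ones(3) / 3              # uniform initialization
for t in range(1, 201):
    x = mirror_descent_step(x, c, eta=1.0 / np.sqrt(t))  # decreasing step size

# Play concentrates on the lowest-cost action (index 1).
```

In the game-theoretic setting of the paper, each player runs such a step against the gradient of their own payoff at the current joint play; the stage game itself may change from round to round.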
Original language | English |
---|---|
Pages (from-to) | 914-941 |
Number of pages | 29 |
Journal | Mathematics of Operations Research |
Volume | 48 |
Issue number | 2 |
Early online date | 1 Jul 2022 |
DOIs | |
Publication status | Published - May 2023 |
Keywords
- dynamic regret
- Nash equilibrium
- mirror descent
- time-varying games
- stochastic approximation
- optimization
- dynamics
- convergence
- gradient
- descent
- play
- form