Abstract
We examine the long-run behavior of multiagent online learning in games that evolve over time. Specifically, we focus on a wide class of policies based on mirror descent, and we show that the induced sequence of play (a) converges to a Nash equilibrium in time-varying games that stabilize in the long run to a strictly monotone limit, and (b) stays asymptotically close to the evolving equilibrium of the sequence of stage games (assuming they are strongly monotone). Our results apply to both gradient-based and payoff-based feedback, that is, when players only get to observe the payoffs of their chosen actions.
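The mirror descent policies studied in the paper can be illustrated with their entropic (exponential-weights) instance on the simplex. The sketch below is illustrative only: the 2×2 game, the regularization weight `lam`, and the step size `eta` are hypothetical choices (not taken from the paper), with quadratic penalty terms added so that the stage game is strongly monotone and the iterates converge to its unique equilibrium under gradient feedback.

```python
import numpy as np

def entropic_md_step(x, grad, eta):
    """One entropic mirror descent (exponential-weights) step on the simplex:
    multiply by exp(-eta * gradient of cost), then renormalize."""
    y = x * np.exp(-eta * grad)
    return y / y.sum()

# Illustrative 2x2 game (players minimize cost):
#   cost_1(x, y) = x^T A y + (lam/2)||x||^2
#   cost_2(x, y) = -x^T A y + (lam/2)||y||^2
# The lam-terms make the game strongly monotone, so mirror descent
# with a small constant step converges to the unique Nash equilibrium.
A = np.eye(2)      # hypothetical payoff matrix
lam = 0.5          # strong-monotonicity weight (illustrative)
eta = 0.1          # step size (illustrative)

x = np.array([0.7, 0.3])   # player 1's initial mixed strategy
y = np.array([0.2, 0.8])   # player 2's initial mixed strategy
for _ in range(2000):
    gx = A @ y + lam * x       # player 1's cost gradient
    gy = -A.T @ x + lam * y    # player 2's cost gradient
    x = entropic_md_step(x, gx, eta)
    y = entropic_md_step(y, gy, eta)
```

In this symmetric example the unique Nash equilibrium is the uniform strategy (1/2, 1/2) for both players, and the iterates spiral into it; with a bilinear zero-sum game and no regularization, by contrast, the same dynamics would cycle rather than converge, which is why the paper's convergence results require (strict or strong) monotonicity.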
| Original language | English |
|---|---|
| Pages (from-to) | 914-941 |
| Number of pages | 29 |
| Journal | Mathematics of Operations Research |
| Volume | 48 |
| Issue number | 2 |
| Early online date | 1 Jul 2022 |
| DOIs | |
| Publication status | Published - May 2023 |
Keywords
- dynamic regret
- Nash equilibrium
- mirror descent
- time-varying games
- stochastic approximation
- optimization
- dynamics
- convergence
- gradient
- descent
- play
- form