Abstract
We examine the long-run behavior of multiagent online learning in games that evolve over time. Specifically, we focus on a wide class of policies based on mirror descent, and we show that the induced sequence of play (a) converges to a Nash equilibrium in time-varying games that stabilize in the long run to a strictly monotone limit, and (b) stays asymptotically close to the evolving equilibrium of the sequence of stage games (assuming they are strongly monotone). Our results apply to both gradient- and payoff-based feedback, that is, when players only get to observe the payoffs of their chosen actions.
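The mirror-descent policies studied here can be illustrated with a minimal sketch. Assuming the entropic (negative-entropy) regularizer on the probability simplex and gradient feedback, one mirror step reduces to a multiplicative-weights update; the fixed linear loss `c` below is a hypothetical toy example, not taken from the paper.

```python
import numpy as np

def mirror_descent_step(x, grad, eta):
    """One entropic mirror descent step on the probability simplex.

    With the negative-entropy regularizer, the prox-mapping reduces to
    a multiplicative-weights update followed by renormalization.
    """
    y = x * np.exp(-eta * grad)
    return y / y.sum()

# Toy illustration (not from the paper): a single player minimizing a
# fixed linear loss <c, x> over the 3-point simplex.
c = np.array([0.3, 0.1, 0.6])   # hypothetical per-action costs
x = np.ones(3) / 3              # uniform initialization
for t in range(1, 201):
    x = mirror_descent_step(x, c, eta=1.0 / np.sqrt(t))  # decreasing step size

# Play concentrates on the lowest-cost action (index 1).
```

In the game-theoretic setting of the paper, each player runs such a step against the gradient of their own payoff at the current joint play; the stage game itself may change from round to round.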
Original language | English |
---|---|
Pages (from-to) | 914-941 |
Number of pages | 29 |
Journal | Mathematics of Operations Research |
Volume | 48 |
Issue number | 2 |
Early online date | 1 Jul 2022 |
DOIs | |
Publication status | Published - May 2023 |
Keywords
- dynamic regret
- Nash equilibrium
- mirror descent
- time-varying games
- stochastic approximation
- optimization
- dynamics
- convergence
- gradient
- descent
- play
- form