Affiliations:
1. Department of Electrical Engineering and Computer Science and Department of Statistics, University of California, Berkeley, Berkeley, California 94720;
2. Department of Industrial Engineering and Operations Research, Columbia University, New York, New York 10027;
3. Stern School of Business, New York University, New York, New York 10012
Abstract
Feasible Online Learning with Gradient Feedback
Online gradient descent (OGD) is well known to be doubly optimal under strong convexity or monotonicity assumptions: (1) in the single-agent setting, it achieves an optimal regret of $\Theta(\log T)$ for strongly convex cost functions, and (2) in the multiagent setting of strongly monotone games, with each agent employing OGD, we obtain last-iterate convergence of the joint action to the unique Nash equilibrium at an optimal rate of $\Theta(1/T)$. Whereas these finite-time guarantees highlight its merits, OGD has the drawback that it requires knowing the strong convexity/monotonicity parameters. In "Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and Exp-Concave Games with Gradient Feedback," M. Jordan, T. Lin, and Z. Zhou design a fully adaptive OGD algorithm, AdaOGD, that does not require a priori knowledge of these parameters. In the single-agent setting, the algorithm achieves $O(\log^2 T)$ regret under strong convexity, which is optimal up to a log factor. Further, if each agent employs AdaOGD in strongly monotone games, the joint action converges in a last-iterate sense to the unique Nash equilibrium at a rate of $O(\log^3 T / T)$, again optimal up to log factors. The algorithms are illustrated in a learning version of the classic newsvendor problem, in which, because of lost sales, only (noisy) gradient feedback can be observed. The results immediately yield the first feasible and near-optimal algorithm for both the single-retailer and multiretailer settings.
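To make the gradient-feedback setting concrete, the following is a minimal, hypothetical Python sketch of projected online gradient descent on a single-retailer learning newsvendor problem with lost sales: the demand realization itself is censored, but the stockout indicator needed for a noisy gradient of the expected cost is still observable. The exponential demand, the prices, and the log-scaled step-size decay are illustrative assumptions only, not the paper's AdaOGD schedule.

    # Hypothetical sketch: projected OGD on a learning newsvendor problem
    # with censored (lost-sales) demand, where only a noisy gradient of the
    # expected cost is observable. The step-size decay is an illustrative
    # stand-in for an adaptive schedule, not the paper's AdaOGD rule.
    import numpy as np

    rng = np.random.default_rng(0)

    c, p = 2.0, 5.0          # unit cost and selling price (assumed values)
    q_lo, q_hi = 0.0, 20.0   # feasible order-quantity interval
    T = 10_000

    q = 10.0                 # initial order quantity
    for t in range(1, T + 1):
        d = rng.exponential(scale=8.0)   # demand realization (unobserved)
        # Expected cost is c*q - p*E[min(D, q)], with gradient c - p*P(D > q).
        # The indicator {d > q} gives an unbiased noisy gradient and is
        # observable under lost sales (the retailer sees whether it stocked out).
        g = c - p * (d > q)
        eta = np.log(t + 1) / (t + 1)     # illustrative adaptive-style decay
        q = np.clip(q - eta * g, q_lo, q_hi)  # projected gradient step

    # Critical-fractile benchmark for Exp(8) demand: P(D > q*) = c/p.
    print(f"final q: {q:.2f}, optimal q*: {-8.0 * np.log(c / p):.2f}")

With these assumed parameters, the iterate drifts toward the critical-fractile quantity (about 7.33); the point of the sketch is only that a projected gradient step is implementable from censored observations alone.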
Publisher
Institute for Operations Research and the Management Sciences (INFORMS)
Reference89 articles.
1. Abernethy J, Bartlett PL, Rakhlin A, Tewari A (2008) Optimal strategies and minimax lower bounds for online convex games. COLT (Omnipress, Madison, MI), 414–424.
2. Alacaoglu A, Malitsky Y (2022) Stochastic variance reduction for variational inequality methods. COLT (PMLR, New York), 778–816.
3. Ba W, Lin T, Zhang J, Zhou Z (2021) Doubly optimal no-regret online learning in strongly monotone games with bandit feedback. Preprint, submitted December 6, https://arxiv.org/abs/2112.02856.
4. Baby D, Wang YX (2021) Optimal dynamic regret in exp-concave online learning. COLT (PMLR, New York), 359–409.
5. Baby D, Wang YX (2022) Optimal dynamic regret in proper online learning with strongly convex losses and beyond. AISTATS (PMLR, New York), 1805–1845.