Online Learning over a Finite Action Set with Limited Switching

Published:2021-02 Issue:1 Volume:46 Page:179-203
ISSN:0364-765X
Container-title:Mathematics of Operations Research
language:en
Short-container-title:Mathematics of OR

Author:

Altschuler Jason M.¹, Talwar Kunal²

Affiliation:

1. Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139;

2. Google Brain, Mountain View, California 94043

Abstract

This paper studies the value of switching actions in the Prediction From Experts problem (PFE) and Adversarial Multiarmed Bandits problem (MAB). First, we revisit the well-studied and practically motivated setting of PFE with switching costs. Many algorithms achieve the minimax optimal order for both regret and switches in expectation; however, high probability guarantees are an open problem. We present the first algorithms that achieve this optimal order for both quantities with high probability. This also implies the first high probability guarantees for several other problems, and, in particular, is efficiently adaptable to online combinatorial optimization with limited switching. Next, to investigate the value of switching actions more granularly, we introduce the switching budget setting, which limits algorithms to a fixed number of (costless) switches. Using this result and several reductions, we unify previous work and completely characterize the complexity of this switching budget setting up to small polylogarithmic factors: for both PFE and MAB, for all switching budgets, and for both expectation and high probability guarantees. Interestingly, as the switching budget decreases, the minimax regret rate admits a phase transition for PFE but not for MAB. These results recover and generalize the known minimax rates for the (arbitrary) switching cost setting.

Publisher

Institute for Operations Research and the Management Sciences (INFORMS)

Subject

Management Science and Operations Research, Computer Science Applications, General Mathematics

References: 33 articles.

1. Regret in Online Combinatorial Optimization

2. The Nonstochastic Multiarmed Bandit Problem

Cited by 1 article.

1. Phase Transitions in Bandits with Switching Constraints; Management Science; 2023-12