Kernel-based Methods for Bandit Convex Optimization

Published:2021-06-30 Issue:4 Volume:68 Page:1-35
ISSN:0004-5411
Container-title:Journal of the ACM
language:en
Short-container-title:J. ACM

Author:

Bubeck Sébastien¹, Eldan Ronen², Lee Yin Tat³

Affiliation:

1. Microsoft Research, USA

2. Weizmann Institute of Science, Israel

3. University of Washington, USA

Abstract

We consider the adversarial convex bandit problem and we build the first $\mathrm{poly}(T)$-time algorithm with $\mathrm{poly}(n)\sqrt{T}$-regret for this problem. To do so, we introduce three new ideas in the derivative-free optimization literature: (i) kernel methods, (ii) a generalization of Bernoulli convolutions, and (iii) a new annealing schedule for exponential weights (with increasing learning rate). The basic version of our algorithm achieves $\tilde{O}(n^{9.5}\sqrt{T})$-regret, and we show that a simple variant of this algorithm can be run in $\mathrm{poly}(n \log(T))$-time per step (for polytopes with polynomially many constraints) at the cost of an additional $\mathrm{poly}(n)\,T^{o(1)}$ factor in the regret. These results improve upon the $\tilde{O}(n^{11}\sqrt{T})$-regret and $\exp(\mathrm{poly}(T))$-time result of the first two authors and the $\log(T)^{\mathrm{poly}(n)}\sqrt{T}$-regret and $\log(T)^{\mathrm{poly}(n)}$-time result of Hazan and Li. Furthermore, we conjecture that another variant of the algorithm could achieve $\tilde{O}(n^{1.5}\sqrt{T})$-regret, and moreover that this regret is unimprovable (the current best lower bound being $\Omega(n\sqrt{T})$ and it is achieved with linear functions). For the simpler situation of zeroth order stochastic convex optimization this corresponds to the conjecture that the optimal query complexity is of order $n^{3}/\varepsilon^{2}$.
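
For readers unfamiliar with the setting, the block below sketches one common form of the regret that the abstract's bounds refer to, and the heuristic arithmetic linking the conjectured regret to the conjectured query complexity. The notation ($K$, $f_t$, $x_t$, $R_T$) is assumed here for illustration and is not taken from this record; the conversion step ignores logarithmic factors and is a back-of-the-envelope reading, not the authors' argument.

```latex
% Assumed notation: K \subset \mathbb{R}^n is a convex body,
% f_1, \dots, f_T : K \to [0,1] are convex losses chosen by an adversary,
% and x_t \in K is the point played at round t. Only f_t(x_t) is observed.
\[
  R_T \;=\; \mathbb{E}\left[\sum_{t=1}^{T} f_t(x_t)\right]
        \;-\; \min_{x \in K} \sum_{t=1}^{T} f_t(x)
\]

% Heuristic link between the two conjectures in the abstract:
% if R_T is of order n^{1.5}\sqrt{T}, the average regret per round is
\[
  \frac{R_T}{T} \;\approx\; \frac{n^{1.5}}{\sqrt{T}},
\]
% and requiring this average to be at most \varepsilon gives
\[
  \frac{n^{1.5}}{\sqrt{T}} \le \varepsilon
  \quad\Longleftrightarrow\quad
  T \ge \frac{n^{3}}{\varepsilon^{2}} .
\]
```

Under the same reading, the stated $\tilde{O}(n^{9.5}\sqrt{T})$ bound would correspond to roughly $n^{19}/\varepsilon^{2}$ queries, which shows how far the proven guarantee is from the conjectured optimum.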

Funder

European Research Council Starting Grant

Israel Science Foundation

Publisher

Association for Computing Machinery (ACM)

Subject

Artificial Intelligence, Hardware and Architecture, Information Systems, Control and Systems Engineering, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3453721

Cited by 6 articles.

1. Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and Exp-Concave Games with Gradient Feedback;Operations Research;2024-05-23

2. A quantum online portfolio optimization algorithm;Quantum Information Processing;2024-02-16

3. The Online Saddle Point Problem and Online Convex Optimization with Knapsacks;Mathematics of Operations Research;2024-01-12

4. Online $\mathrm{L}^{\natural}$-Convex Minimization;Lecture Notes in Computer Science;2024

5. Technical Note—On Adaptivity in Nonstationary Stochastic Optimization with Bandit Feedback;Operations Research;2023-07-31