Affiliation:
1. Microsoft Research, USA
2. Weizmann Institute of Science, Israel
3. University of Washington, USA
Abstract
We consider the adversarial convex bandit problem and we build the first $\mathrm{poly}(T)$-time algorithm with $\mathrm{poly}(n)\sqrt{T}$-regret for this problem. To do so, we introduce three new ideas in the derivative-free optimization literature: (i) kernel methods, (ii) a generalization of Bernoulli convolutions, and (iii) a new annealing schedule for exponential weights (with increasing learning rate). The basic version of our algorithm achieves $\widetilde{O}(n^{9.5}\sqrt{T})$-regret, and we show that a simple variant of this algorithm can be run in $\mathrm{poly}(n \log(T))$-time per step (for polytopes with polynomially many constraints) at the cost of an additional $\mathrm{poly}(n)\,T^{o(1)}$ factor in the regret. These results improve upon the $\widetilde{O}(n^{11}\sqrt{T})$-regret and $\exp(\mathrm{poly}(T))$-time result of the first two authors and the $\log(T)^{\mathrm{poly}(n)}\sqrt{T}$-regret and $\log(T)^{\mathrm{poly}(n)}$-time result of Hazan and Li. Furthermore, we conjecture that another variant of the algorithm could achieve $\widetilde{O}(n^{1.5}\sqrt{T})$-regret, and moreover that this regret is unimprovable (the current best lower bound being $\Omega(n\sqrt{T})$, and it is achieved with linear functions). For the simpler situation of zeroth order stochastic convex optimization this corresponds to the conjecture that the optimal query complexity is of order $n^{3}/\varepsilon^{2}$.
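The last sentence of the abstract links a regret bound to a query complexity. One way to see the correspondence is the standard online-to-batch argument, sketched here under assumptions the abstract does not state: the losses are i.i.d. noisy evaluations of a fixed convex function $f$ over a convex domain $K$, logarithmic factors are dropped, and $C$ is a hypothetical absolute constant.

```latex
\begin{align*}
R_T &= \mathbb{E}\sum_{t=1}^{T} f(x_t) - T \min_{x \in K} f(x)
      \;\le\; C\, n^{1.5} \sqrt{T}, \\
\mathbb{E}\, f(\bar{x}_T) - \min_{x \in K} f(x)
    &\le \frac{R_T}{T} \;\le\; \frac{C\, n^{1.5}}{\sqrt{T}},
      \qquad \bar{x}_T = \frac{1}{T}\sum_{t=1}^{T} x_t
      \quad \text{(Jensen's inequality)}, \\
\frac{C\, n^{1.5}}{\sqrt{T}} \le \varepsilon
    &\iff T \ge \frac{C^{2} n^{3}}{\varepsilon^{2}}.
\end{align*}
```

Under these assumptions, a regret of order $n^{1.5}\sqrt{T}$ would yield an $\varepsilon$-accurate point after on the order of $n^{3}/\varepsilon^{2}$ queries. That this number of queries is also necessary is part of the conjecture and is not shown by this sketch.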
Funder
European Research Council Starting Grant
Israel Science Foundation
Publisher
Association for Computing Machinery (ACM)
Subject
Artificial Intelligence, Hardware and Architecture, Information Systems, Control and Systems Engineering, Software
Cited by
6 articles.