Affiliation:
1. Naver Corporation, Seongnam-si, Republic of Korea
2. KTH, Stockholm, Sweden
3. KAIST, Daejeon, Republic of Korea
Abstract
We study multi-armed bandit (MAB) problems with additional observations, where in each round the decision maker selects an arm to play and can also observe the rewards of additional arms (within a given budget) by paying certain costs. In the case of stochastic rewards, we develop a new algorithm, KL-UCB-AO, which is asymptotically optimal as the time horizon grows large; it achieves this by identifying the optimal set of arms to explore using the given budget of additional observations. In the case of adversarial rewards, we propose H-INF, an algorithm with order-optimal regret. H-INF exploits a two-layered structure in which each layer runs a known optimal MAB algorithm. This hierarchical structure facilitates the regret analysis of the algorithm and, in turn, yields order-optimal regret. We apply the framework of MAB with additional observations to the design of rate adaptation schemes in 802.11-like wireless systems and to the design of online advertisement systems. In both cases, we demonstrate that our algorithms leverage additional observations to significantly improve system performance. We believe the techniques developed in this paper are of independent interest for other MAB problems, e.g., contextual or graph-structured MAB.
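The setting described in the abstract can be illustrated with a minimal sketch. This is not KL-UCB-AO or H-INF; it uses the standard KL-UCB index for Bernoulli rewards and a simple assumed heuristic for the extra observations: each round it plays the arm with the highest index and also observes the next few arms in index order. The function names, the unit observation cost, and the fixed per-round budget `extra_per_round` are all assumptions made for illustration.

```python
import math
import random


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clamped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb_index(mean, pulls, t, iters=50):
    """Standard KL-UCB index: largest q >= mean with pulls * KL(mean, q) <= log(t)."""
    if pulls == 0:
        return 1.0  # unobserved arms get the maximal index
    bound = math.log(max(t, 1)) / pulls
    lo, hi = mean, 1.0
    for _ in range(iters):  # bisection; KL(mean, q) is increasing in q for q >= mean
        mid = (lo + hi) / 2
        if bernoulli_kl(mean, mid) <= bound:
            lo = mid
        else:
            hi = mid
    return lo


def simulate(means, horizon, extra_per_round, seed=0):
    """Play the top-index arm and observe the next `extra_per_round` arms (assumed heuristic)."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # number of observations per arm (played or merely observed)
    sums = [0.0] * k   # sum of observed rewards per arm
    total_reward = 0.0
    for t in range(1, horizon + 1):
        index = [
            kl_ucb_index(sums[a] / counts[a] if counts[a] else 0.0, counts[a], t)
            for a in range(k)
        ]
        order = sorted(range(k), key=lambda a: index[a], reverse=True)
        played = order[0]
        for a in order[: 1 + extra_per_round]:
            reward = 1.0 if rng.random() < means[a] else 0.0
            counts[a] += 1
            sums[a] += reward
            if a == played:
                total_reward += reward  # only the played arm earns reward
    return total_reward, counts
```

Calling `simulate([0.2, 0.5, 0.8], 200, 1)` returns the cumulative reward of the played arms and the per-arm observation counts. In this sketch the extra observations only speed up estimation; choosing which arms to observe optimally under a cost budget is the problem the paper addresses.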
Funder
ERC consolidator
National Research Foundation of Korea
ICT R&D program of MSIP/IITP
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Networks and Communications; Hardware and Architecture; Safety, Risk, Reliability and Quality; Computer Science (miscellaneous)
Cited by
7 articles.