Multi-armed Bandit with Additional Observations

Published:2018-04-03 Issue:1 Volume:2 Page:1-22
ISSN:2476-1249
Container-title:Proceedings of the ACM on Measurement and Analysis of Computing Systems
language:en
Short-container-title:Proc. ACM Meas. Anal. Comput. Syst.

Author:

Yun Donggyu¹, Proutiere Alexandre², Ahn Sumyeong³, Shin Jinwoo³, Yi Yung³

Affiliation:

1. Naver Corporation, Seongnam-si, Republic of Korea

2. KTH, Stockholm, Sweden

3. KAIST, Daejeon, Republic of Korea

Abstract

We study multi-armed bandit (MAB) problems with additional observations, where in each round, the decision maker selects an arm to play and can also observe rewards of additional arms (within a given budget) by paying certain costs. In the case of stochastic rewards, we develop a new algorithm KL-UCB-AO which is asymptotically optimal when the time horizon grows large, by smartly identifying the optimal set of the arms to be explored using the given budget of additional observations. In the case of adversarial rewards, we propose H-INF, an algorithm with order-optimal regret. H-INF exploits a two-layered structure where in each layer, we run a known optimal MAB algorithm. Such a hierarchical structure facilitates the regret analysis of the algorithm, and in turn, yields order-optimal regret. We apply the framework of MAB with additional observations to the design of rate adaptation schemes in 802.11-like wireless systems, and to that of online advertisement systems. In both cases, we demonstrate that our algorithms leverage additional observations to significantly improve the system performance. We believe the techniques developed in this paper are of independent interest for other MAB problems, e.g., contextual or graph-structured MAB.

Funder

ERC consolidator

National Research Foundation of Korea

ICT R&D program of MSIP/IITP

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications; Hardware and Architecture; Safety, Risk, Reliability and Quality; Computer Science (miscellaneous)

Link

https://dl.acm.org/doi/pdf/10.1145/3179416

Reference: 33 articles.

1. Noga Alon, Nicolo Cesa-Bianchi, Claudio Gentile, and Yishay Mansour. 2013. From bandits to experts: A tale of domination and independence. Proceedings of NIPS.

2. The Nonstochastic Multiarmed Bandit Problem

Cited by 7 articles.

1. Minimizing Edge Caching Service Costs Through Regret-Optimal Online Learning;IEEE/ACM Transactions on Networking;2024

2. Efficient algorithms for multi-armed bandits with additional feedbacks: Modeling and algorithms;Information Sciences;2023-07

3. A Thompson Sampling Approach to Unifying Causal Inference and Bandit Learning;Advances in Knowledge Discovery and Data Mining;2023

4. A Robust Algorithm to Unifying Offline Causal Inference and Online Multi-armed Bandit Learning;2021 IEEE International Conference on Data Mining (ICDM);2021-12

5. LACO: A Latency-Driven Network Slicing Orchestration in Beyond-5G Networks;IEEE Transactions on Wireless Communications;2021-01