Affiliation:
1. Microsoft Research
2. Microsoft Research, Karnataka, India
Abstract
Thompson Sampling (TS) is one of the oldest heuristics for multiarmed bandit problems. It is a randomized algorithm based on Bayesian ideas and has recently generated significant interest after several studies demonstrated that it has favorable empirical performance compared to state-of-the-art methods. In this article, a novel and almost tight martingale-based regret analysis for Thompson Sampling is presented. Our technique simultaneously yields both problem-dependent and problem-independent bounds: (1) the first near-optimal problem-independent bound of O(√(NT ln T)) on the expected regret, and (2) the optimal problem-dependent bound of (1 + ϵ) ∑_i ln T / d(μ_i, μ_1) + O(N/ϵ²) on the expected regret (this bound was first proven by Kaufmann et al. (2012b)).
Our technique is conceptually simple and easily extends to distributions other than the Beta distribution used in the original TS algorithm. For the version of TS that uses Gaussian priors, we prove a problem-independent bound of O(√(NT ln N)) on the expected regret and show the optimality of this bound by providing a matching lower bound. This is the first lower bound on the performance of a natural version of Thompson Sampling that is away from the general lower bound of Ω(√(NT)) for the multiarmed bandit problem.
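A minimal sketch of the Gaussian-prior variant, under the assumption that each arm i is scored by a sample from N(empirical mean of arm i, 1/(pulls of arm i + 1)); the exact posterior used in the article's analysis may differ in details such as the initial mean, and all names below are hypothetical:

```python
import random

def thompson_sampling_gaussian(pull, n_arms, horizon):
    """Thompson Sampling with Gaussian posteriors (sketch).

    Each round, arm i is scored by a draw from
    N(mean_i, 1 / (count_i + 1)), where mean_i is the empirical mean
    reward of arm i (0 before its first pull), and the highest-scoring
    arm is played.
    """
    counts = [0] * n_arms   # number of pulls per arm
    sums = [0.0] * n_arms   # cumulative reward per arm
    total_reward = 0.0
    for _ in range(horizon):
        samples = []
        for i in range(n_arms):
            mean = sums[i] / counts[i] if counts[i] else 0.0
            std = (1.0 / (counts[i] + 1)) ** 0.5  # posterior std shrinks with pulls
            samples.append(random.gauss(mean, std))
        arm = max(range(n_arms), key=lambda i: samples[i])
        r = pull(arm)
        counts[arm] += 1
        sums[arm] += r
        total_reward += r
    return total_reward
```

Note that the Gaussian posterior is used purely as a sampling heuristic here; the rewards themselves need not be Gaussian.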
Publisher
Association for Computing Machinery (ACM)
Subject
Artificial Intelligence, Hardware and Architecture, Information Systems, Control and Systems Engineering, Software
References (30 articles)
1. Milton Abramowitz and Irene A. Stegun. 1964. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Dover, New York.
Cited by
33 articles.