Affiliation:
1. Graduate School of Business, Columbia University, New York, New York 10027
Abstract
This note gives a short, self-contained proof of a sharp connection between Gittins indices and Bayesian upper confidence bound algorithms. I consider a Gaussian multiarmed bandit problem with discount factor [Formula: see text]. The Gittins index of an arm is shown to equal the [Formula: see text]-quantile of the posterior distribution of the arm's mean plus an error term that vanishes as [Formula: see text]. In this sense, for sufficiently patient agents, a Gittins index measures the highest plausible mean reward of an arm in a manner equivalent to an upper confidence bound.
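As an illustration of the quantity the abstract compares the Gittins index against, the following is a minimal sketch of a Bayesian upper confidence bound for one Gaussian arm. It is not the paper's construction: the function names, the conjugate normal prior with known noise variance, and the quantile level `q` are all assumptions made for this example. In particular, the quantile level the paper uses is given by a formula that is not reproduced on this page, so `q` is left as a free parameter here.

```python
from statistics import NormalDist


def gaussian_posterior(prior_mean, prior_var, noise_var, rewards):
    """Conjugate update for an arm's unknown mean.

    Assumes (hypothetically) a N(prior_mean, prior_var) prior on the mean
    and i.i.d. N(mean, noise_var) rewards with known noise variance.
    Returns the posterior mean and posterior variance.
    """
    n = len(rewards)
    precision = 1.0 / prior_var + n / noise_var
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + sum(rewards) / noise_var)
    return post_mean, post_var


def bayes_ucb_index(post_mean, post_var, q):
    """q-quantile of the N(post_mean, post_var) posterior of the arm's mean.

    q is a placeholder quantile level in (0, 1); the level used in the
    paper is not reproduced here.
    """
    return NormalDist(post_mean, post_var ** 0.5).inv_cdf(q)


# Example: standard normal prior, unit noise variance, two observed rewards.
mean, var = gaussian_posterior(0.0, 1.0, 1.0, [1.0, 1.0])
index = bayes_ucb_index(mean, var, q=0.95)
```

An index of this form is the posterior mean plus a multiple of the posterior standard deviation, so it rises with `q` and shrinks toward the posterior mean as more rewards are observed.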
Publisher
Institute for Operations Research and the Management Sciences (INFORMS)
Subject
Management Science and Operations Research, Computer Science Applications
Cited by
4 articles.