Bayesian Incentive-Compatible Bandit Exploration

Published: 2020-07 Volume: 68 Issue: 4 Pages: 1132–1161
ISSN: 0030-364X
Journal: Operations Research
Language: English

Authors:

Yishay Mansour¹, Aleksandrs Slivkins², Vasilis Syrgkanis³

Affiliations:

1. School of Computer Science, Tel Aviv University, 6997801 Tel Aviv, Israel;

2. Microsoft Research, New York, New York 10011;

3. Microsoft Research, Cambridge, Massachusetts 02142

Abstract

As self-interested individuals (“agents”) make decisions over time, they utilize information revealed by other agents in the past and produce information that may help agents in the future. This phenomenon is common in a wide range of scenarios in the Internet economy, as well as in medical decisions. Each agent would like to exploit: select the best action given the current information, but would prefer the previous agents to explore: try out various alternatives to collect information. A social planner, by means of a carefully designed recommendation policy, can incentivize the agents to balance the exploration and exploitation so as to maximize social welfare. We model the planner’s recommendation policy as a multiarm bandit algorithm under incentive-compatibility constraints induced by agents’ Bayesian priors. We design a bandit algorithm which is incentive-compatible and has asymptotically optimal performance, as expressed by regret. Further, we provide a black-box reduction from an arbitrary multiarm bandit algorithm to an incentive-compatible one, with only a constant multiplicative increase in regret. This reduction works for very general bandit settings that incorporate contexts and arbitrary partial feedback.
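The incentive-compatibility constraint described in the abstract can be illustrated in its simplest form. The sketch below is a hypothetical two-arm example, not the authors' algorithm. It assumes deterministic rewards, a discrete prior on arm 1's mean μ1, an arm 2 whose mean μ2 is independent of μ1 and not yet observed, and a planner who recommends arm 2 to the second agent either when arm 2 looks better given the observed μ1 (exploitation) or, with probability `eps`, anyway (exploration hidden among exploitation recommendations). Recommending arm 2 is Bayesian incentive-compatible when E[μ2 − μ1 | rec = arm 2] ≥ 0, and the code computes the largest `eps` that keeps this true. The function names `bic_slack_arm2` and `max_exploration_prob` and the numeric prior are illustrative assumptions.

```python
from fractions import Fraction

# Hypothetical two-arm setting (an illustrative assumption, not the paper's model):
# - mu1 has a discrete prior; agent 1 pulls arm 1 (the prior-best arm) and the
#   planner observes mu1 exactly (deterministic rewards, a simplification).
# - mu2 is independent of mu1 and unobserved; only its prior mean matters here.
# Policy for agent 2: recommend arm 2 if mean_mu2 > mu1 (exploitation);
# otherwise still recommend arm 2 with probability eps (hidden exploration).


def bic_slack_arm2(prior_mu1, mean_mu2, eps):
    """E[(mu2 - mu1) * 1{rec = arm 2}]; recommending arm 2 is BIC iff this is >= 0."""
    slack = Fraction(0)
    for mu1, p in prior_mu1.items():
        # The recommendation depends only on mu1 (plus an independent coin),
        # so mu2 can be replaced by its prior mean inside the expectation.
        prob_rec2 = Fraction(1) if mean_mu2 > mu1 else eps
        slack += p * prob_rec2 * (mean_mu2 - mu1)
    return slack


def max_exploration_prob(prior_mu1, mean_mu2):
    """Largest eps in [0, 1] for which recommending arm 2 stays BIC."""
    # Expected gain from the exploitation branch (arm 2 genuinely looks better).
    gain = sum((p * (mean_mu2 - mu1) for mu1, p in prior_mu1.items() if mean_mu2 > mu1), Fraction(0))
    # Expected loss per unit of eps from the exploration branch.
    loss = sum((p * (mu1 - mean_mu2) for mu1, p in prior_mu1.items() if mean_mu2 <= mu1), Fraction(0))
    if loss == 0:
        return Fraction(1)  # exploring never hurts the agent in expectation
    return min(Fraction(1), gain / loss)


prior_mu1 = {Fraction(0): Fraction(1, 2), Fraction(1): Fraction(1, 2)}
mean_mu2 = Fraction(2, 5)
# Arm 1 must be the prior-best arm so that agent 1 picks it without persuasion.
assert sum(p * v for v, p in prior_mu1.items()) >= mean_mu2

eps_star = max_exploration_prob(prior_mu1, mean_mu2)
print(eps_star)                                             # 2/3
print(bic_slack_arm2(prior_mu1, mean_mu2, eps_star))        # 0
print(bic_slack_arm2(prior_mu1, mean_mu2, Fraction(3, 4)))  # -1/40
```

With this prior, any exploration probability up to 2/3 leaves agent 2 willing to follow a recommendation of arm 2, while at 3/4 the slack turns negative and a rational agent would deviate to arm 1. Under this policy, arm 1 is recommended only when μ1 ≥ E[μ2], so following that recommendation is automatically incentive-compatible.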

Publisher

Institute for Operations Research and the Management Sciences (INFORMS)

Subject

Management Science and Operations Research, Computer Science Applications

References (67 articles)

1. Agarwal A, Hsu D, Kale S, Langford J, Li L, Schapire R (2014) Taming the monster: A fast and simple algorithm for contextual bandits. 31st Internat. Conf. Machine Learn. (ICML), 1638–1646.

2. Alon N, Cesa-Bianchi N, Dekel O, Koren T (2015) Online learning with feedback graphs: Beyond bandits. 28th Conf. Learn. Theory (COLT), 23–35.

3. Alon N, Cesa-Bianchi N, Gentile C, Mansour Y (2013) From bandits to experts: A tale of domination and independence. Advances in Neural Information Processing Systems (NIPS), vol. 27 (Curran Associates, Red Hook, NY), 1610–1618.

4. Audibert J-Y, Bubeck S, Lugosi G (2011) Minimax policies for combinatorial prediction games. 24th Conf. Learn. Theory (COLT), 107–132.

Cited by 13 articles

1. Incentive-Aware Recommender Systems in Two-Sided Markets; ACM Transactions on Recommender Systems; 2024-07-31

2. Incentivized Exploration of Non-Stationary Stochastic Bandits; 2024 American Control Conference (ACC); 2024-07-10

3. Incentive-compatible mechanism for manufacturing carbon emission supervision under carbon control policies in China; PLOS ONE; 2024-05-13

4. On Statistical Discrimination as a Failure of Social Learning: A Multiarmed Bandit Approach; Management Science; 2024-03-29

5. Incentivized Exploration via Filtered Posterior Sampling; SSRN Electronic Journal; 2024