Affiliation:
1. School of Mathematics, Institute for Advanced Study, Princeton, New Jersey 08540;
2. Microsoft Research, New York, New York 10012
Abstract
We consider “incentivized exploration”: a version of multiarmed bandits where the choice of arms is controlled by self-interested agents. The algorithm can only issue recommendations and needs to incentivize the agents to explore, even though they prefer to exploit. The algorithm controls the flow of information and relies on information asymmetry to create incentives. This line of work, motivated by misaligned incentives in recommendation systems, has received considerable attention in the “economics and computation” community. We focus on the “price of incentives”: the loss in performance, broadly construed, incurred for the sake of incentive compatibility. We prove that Thompson sampling, a standard bandit algorithm, is incentive compatible if initialized with sufficiently many data points. The performance loss because of incentives is, therefore, limited to the initial rounds when these data points are collected. The problem largely reduces to one of sample complexity: how many rounds are needed to collect even one sample of each arm? We zoom in on this question, perhaps the most basic one could ask about incentivized exploration. We characterize the dependence on agents' beliefs and the number of arms (which was essentially ignored in prior work), providing matching upper and lower bounds. Typically, the optimal sample complexity is polynomial in the number of arms and exponential in the “strength of beliefs.”
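To make the warm-started Thompson sampling idea concrete, here is a minimal sketch assuming a Beta-Bernoulli bandit model; the names `true_means` and `n_init` are illustrative placeholders, and the value of `n_init` is arbitrary here, whereas the paper's incentive-compatibility guarantee requires a number of initial samples that depends on the agents' prior beliefs.

```python
import numpy as np

def thompson_sampling(true_means, n_init, horizon, rng=None):
    """Beta-Bernoulli Thompson sampling, warm-started with n_init pulls
    of each arm before the recommendation phase begins."""
    rng = np.random.default_rng(rng)
    k = len(true_means)
    successes = np.zeros(k)
    failures = np.zeros(k)

    # Initial data collection: pull every arm n_init times.
    # (In the incentivized-exploration setting, this phase is where the
    # "price of incentives" is paid; the required n_init depends on the
    # strength of the agents' beliefs.)
    for arm in range(k):
        wins = (rng.random(n_init) < true_means[arm]).sum()
        successes[arm] += wins
        failures[arm] += n_init - wins

    rewards = []
    for _ in range(horizon):
        # Sample a mean estimate for each arm from its Beta posterior
        # and recommend the arm with the highest sampled value.
        samples = rng.beta(successes + 1, failures + 1)
        arm = int(np.argmax(samples))
        reward = rng.random() < true_means[arm]
        successes[arm] += reward
        failures[arm] += 1 - reward
        rewards.append(reward)
    return np.mean(rewards)

# Example: 4 arms, 20 warm-start samples per arm (illustrative value only).
print(thompson_sampling([0.3, 0.5, 0.6, 0.7], n_init=20, horizon=10_000, rng=0))
```

After the warm-start phase, the recommendation loop is standard Thompson sampling; the performance loss relative to an unconstrained bandit algorithm is confined to the initial `n_init * k` rounds, matching the abstract's framing of the problem as one of sample complexity.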
Publisher
Institute for Operations Research and the Management Sciences (INFORMS)
Subject
Management Science and Operations Research, Computer Science Applications
Cited by
3 articles.