An asymptotically optimal heuristic for general nonstationary finite-horizon restless multi-armed, multi-action bandits

Published:2019-09 Issue:03 Volume:51 Page:745-772
ISSN:0001-8678
Container-title:Advances in Applied Probability
language:en
Short-container-title:Adv. Appl. Probab.

Author:

Zayas-Cabán Gabriel, Jasin Stefanus, Wang Guihua

Abstract

We propose an asymptotically optimal heuristic, which we term randomized assignment control (RAC), for a restless multi-armed bandit problem with discrete-time and finite states. It is constructed using a linear programming relaxation of the original stochastic control formulation. In contrast to most of the existing literature, we consider a finite-horizon problem with multiple actions and time-dependent (i.e. nonstationary) upper bound on the number of bandits that can be activated at each time period; indeed, our analysis can also be applied in the setting with nonstationary transition matrix and nonstationary cost function. The asymptotic setting is obtained by letting the number of bandits and other related parameters grow to infinity. Our main contribution is that the asymptotic optimality of RAC in this general setting does not require indexability properties or the usual stability conditions of the underlying Markov chain (e.g. unichain) or fluid approximation (e.g. global stable attractor). Moreover, our multi-action setting is not restricted to the usual dominant action concept. Finally, we show that RAC is also asymptotically optimal for a dynamic population, where bandits can randomly arrive and depart the system.

Publisher

Cambridge University Press (CUP)

Subject

Applied Mathematics, Statistics and Probability

References: 18 articles.

1. The Complexity of Optimal Queuing Network Control

2. Some aspects of the sequential design of experiments

3. Optimal priority assignment with hard constraint

4. Multi-Armed Bandit Problems

5. Multi-Armed Bandits with Discount Factor Near One: The Bernoulli Case

Cited by 22 articles.

1. Low-complexity algorithm for restless bandits with imperfect observations;Mathematical Methods of Operations Research;2024-09-05

2. A restless bandit model for dynamic ride matching with reneging travelers;European Journal of Operational Research;2024-07

3. Leveraging Nondegeneracy in Dynamic Resource Allocation;2024

4. Fluid Policies, Reoptimization, and Performance Guarantees in Dynamic Resource Allocation;Operations Research;2023-12-11

5. Linear Program-Based Policies for Restless Bandits: Necessary and Sufficient Conditions for (Exponentially Fast) Asymptotic Optimality;Mathematics of Operations Research;2023-12-01