INDEXABILITY AND OPTIMAL INDEX POLICIES FOR A CLASS OF REINITIALISING RESTLESS BANDITS

Published:2015-10-16 Issue:1 Volume:30 Page:1-23
ISSN:0269-9648
Container-title:Probability in the Engineering and Informational Sciences
language:en
Short-container-title:Prob. Eng. Inf. Sci.

Author:

Villar Sofía S.

Abstract

Motivated by a class of Partially Observable Markov Decision Processes with application in surveillance systems, in which a set of imperfectly observed state processes is to be inferred from a subset of available observations through a Bayesian approach, we formulate and analyze a special family of multi-armed restless bandit problems. We consider the problem of finding an optimal policy for observing the processes that maximizes the total expected net rewards over an infinite time horizon subject to the resource availability. From the Lagrangian relaxation of the original problem, an index policy can be derived, as long as the existence of the Whittle index is ensured. We demonstrate that such a class of reinitializing bandits, in which the projects' state deteriorates while active and resets to its initial state when passive until its completion, possesses the structural property of indexability, and we further show how to compute the index in closed form. In general, the Whittle index rule for restless bandit problems does not achieve optimality. However, we show that the proposed Whittle index rule is optimal for the problem under study in the case of stochastically heterogeneous arms under the expected total criterion, and it is further recovered by a simple tractable rule referred to as the 1-limited Round Robin rule. Moreover, we illustrate the significant suboptimality of another widely used heuristic, the Myopic index rule, by computing its suboptimality gap in closed form. We present numerical studies which illustrate, for the more general instances, the performance advantages of the Whittle index rule over other simple heuristics.

Publisher

Cambridge University Press (CUP)

Subject

Industrial and Manufacturing Engineering; Management Science and Operations Research; Statistics, Probability and Uncertainty; Statistics and Probability

Reference: 23 articles.

1. Dynamic priority allocation via restless bandit marginal productivity indices

2. Jacko P. & Villar S.S. (2012). Opportunistic schedulers for optimal scheduling of flows in wireless systems with ARQ Feedback. 24th International Teletraffic Conference (ITC) IEEE, pp. 1–8.

3. Some indexable families of restless bandit problems

4. Mathpages Algebra, Linear Fractional Transformations. Available from http://www.mathpages.com/home/kmath464/kmath464.htm

5. Mansourifard P., Javidi T., & Krishnamachari B. (2012). Optimality of myopic policy for a class of monotone affine restless multi-armed bandits. In the Proceedings of the 51st IEEE International Conference on Decision and Control (CDC).

Cited by 13 articles.

1. Two families of indexable partially observable restless bandits and Whittle index computation;Performance Evaluation;2024-01

2. An Index Policy for Minimizing the Uncertainty-of-Information of Markov Sources;IEEE Transactions on Information Theory;2024-01

3. Empirical Gittins index strategies with ε-explorations for multi-armed bandit problems;Computational Statistics & Data Analysis;2023-04

4. Partially observable restless bandits with restarts: indexability and computation of Whittle index;2022 IEEE 61st Conference on Decision and Control (CDC);2022-12-06

5. Near optimal scheduling for opportunistic spectrum access over block fading channels in cognitive radio assisted vehicular network;Vehicular Communications;2022-10