Abstract
Gittins's theory of allocation indices, which characterizes the optimal policy in multi-armed bandit problems, is presented in the continuous-time case, where the projects (or ‘arms’) are strong Markov processes. Complications peculiar to the continuous-time case are discussed. This motivates investigating whether approximating the continuous-time problems by discrete-time versions is a valid technique, with convergent allocation indices and optimal expected rewards. Conditions under which this convergence holds are presented.
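In the discrete-time setting that the abstract uses as an approximation device, the allocation (Gittins) index of an arm's state can be computed by calibrating the arm against a constant retirement reward M and finding the M at which the state is indifferent between continuing and retiring. The sketch below illustrates this calibration idea for a finite-state arm; the function name, the binary-search scheme, and the example chain are illustrative assumptions, not constructions from the paper.

```python
import numpy as np

def gittins_index(P, r, state, beta=0.9, tol=1e-6):
    """Gittins index of `state` for a finite-state arm with transition
    matrix P, reward vector r, and discount factor beta.

    Calibration: solve V(s) = max(M, r(s) + beta * P[s] @ V) for a
    retirement reward M; the index is (1 - beta) * M* at the M* where
    V(state) = M (indifference between continuing and retiring)."""
    n = len(r)
    # M* lies between the values of retiring on the worst and best rewards.
    lo, hi = r.min() / (1 - beta), r.max() / (1 - beta)
    while hi - lo > tol:
        M = 0.5 * (lo + hi)
        V = np.full(n, M)
        for _ in range(5000):  # value iteration with the retirement option M
            V_new = np.maximum(M, r + beta * (P @ V))
            if np.max(np.abs(V_new - V)) < 1e-12:
                V = V_new
                break
            V = V_new
        if V[state] > M + 1e-9:  # continuing beats retiring: raise M
            lo = M
        else:                    # retiring is (weakly) preferred: lower M
            hi = M
    return (1 - beta) * 0.5 * (lo + hi)
```

As a sanity check on the calibration, an absorbing state with constant reward r has index exactly r: retiring at M is matched by continuing when M = r / (1 - beta), so the index (1 - beta) * M* equals r.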
Publisher
Cambridge University Press (CUP)
Subject
Applied Mathematics, Statistics and Probability
Cited by
7 articles.