Author:
Cooper, William L.; Henderson, Shane G.; Lewis, Mark E.
Abstract
Simulation-based policy iteration (SBPI) is a modification
of the policy iteration algorithm for computing optimal policies
for Markov decision processes. At each iteration, rather than
solving the average evaluation equations, SBPI employs simulation
to estimate a solution to these equations. For recurrent
average-reward Markov decision processes with finite state and
action spaces, we provide easily verifiable conditions that
ensure that, almost surely, SBPI eventually never leaves the set
of optimal decision rules. We
analyze three simulation estimators for solutions to the average
evaluation equations. Using our general results, we derive simple
conditions on the simulation run lengths that guarantee the
almost-sure convergence of the algorithm.
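To make the procedure concrete, below is a minimal Python sketch of an SBPI loop under stated assumptions. The data layout (P, r), the reference-state regenerative estimator for the bias h, and the helper names (sample_next, estimate_gain, estimate_bias, sbpi) are illustrative choices, not the paper's notation: the paper analyzes three estimators and derives precise run-length conditions, whereas this sketch simply uses a user-supplied schedule of growing run lengths.

```python
import numpy as np


def sample_next(p_row, rng):
    """Draw the next state from one row of transition probabilities."""
    return rng.choice(len(p_row), p=p_row)


def estimate_gain(P, r, d, T, s0, rng):
    """Estimate the gain (long-run average reward) of decision rule d
    by simulating T transitions starting from state s0."""
    s, total = s0, 0.0
    for _ in range(T):
        a = d[s]
        total += r[s, a]
        s = sample_next(P[a][s], rng)
    return total / T


def estimate_bias(P, r, d, g_hat, ref, n_reps, rng):
    """Estimate the bias h(s): expected sum of (reward - gain) until
    the first return to a reference state. This is one plausible
    regenerative estimator, not necessarily one of the paper's three."""
    n_states = r.shape[0]
    h = np.zeros(n_states)
    for s0 in range(n_states):
        for _ in range(n_reps):
            s, acc = s0, 0.0
            while True:
                a = d[s]
                acc += r[s, a] - g_hat
                s = sample_next(P[a][s], rng)
                if s == ref:        # recurrence guarantees this hit
                    break
            h[s0] += acc / n_reps
    return h


def sbpi(P, r, run_lengths, ref=0, seed=0):
    """Simulation-based policy iteration (sketch).

    P : (n_actions, n_states, n_states) array, P[a][s, s'].
    r : (n_states, n_actions) array of one-step rewards.
    run_lengths : simulation run length per iteration; the paper gives
        conditions on such schedules that guarantee convergence.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = r.shape
    d = np.zeros(n_states, dtype=int)          # initial decision rule
    for T in run_lengths:
        # Policy evaluation: replace the exact solution of the average
        # evaluation equations with simulation-based estimates.
        g_hat = estimate_gain(P, r, d, T, ref, rng)
        h_hat = estimate_bias(P, r, d, g_hat, ref, max(1, T // 10), rng)
        # Policy improvement against the estimated relative values.
        q = r + np.einsum("ask,k->sa", P, h_hat)
        d = q.argmax(axis=1)
    return d


# Tiny two-state, two-action example (hypothetical numbers):
P = np.array([[[0.9, 0.1], [0.2, 0.8]],      # transitions, action 0
              [[0.5, 0.5], [0.6, 0.4]]])     # transitions, action 1
r = np.array([[1.0, 0.5],                    # rewards r[s, a]
              [0.0, 2.0]])
d_star = sbpi(P, r, run_lengths=[100, 1_000, 10_000])
```

Because every decision rule in this example induces a recurrent chain, the excursions in estimate_bias terminate almost surely, mirroring the recurrence assumption under which the paper's convergence results are stated.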
Publisher
Cambridge University Press (CUP)
Subject
Industrial and Manufacturing Engineering; Management Science and Operations Research; Statistics, Probability and Uncertainty; Statistics and Probability
Cited by
21 articles.