Abstract
A simulation-based algorithm for learning good policies for a discrete-time stochastic control process with unknown transition law is analyzed when the state and action spaces are compact subsets of Euclidean spaces. This extends the Q-learning scheme of discrete state/action problems along the lines of Baker [4]. Almost sure convergence is proved under suitable conditions.
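The paper's continuous-space algorithm is not reproduced here, but the classical discrete-state/action Q-learning scheme it generalizes can be sketched as follows. Everything in this example is illustrative: the toy MDP (`step`), its sizes, and the reward are invented, and the decreasing stepsize `1/visits` is one standard choice satisfying the usual stochastic-approximation conditions, not the specific conditions of the paper.

```python
import random

# Hypothetical toy MDP: states and actions are finite index sets (as
# would arise from discretizing compact Euclidean sets). The transition
# law is "unknown" to the learner and only accessed by sampling `step`.
N_STATES, N_ACTIONS = 5, 3
GAMMA = 0.9  # discount factor

def step(s, a, rng):
    """Simulate one transition: returns (next_state, reward)."""
    s_next = rng.randrange(N_STATES)               # sampled, not known in closed form
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, reward

def q_learning(steps=2000, seed=0):
    """Tabular Q-learning with decreasing per-pair stepsizes."""
    rng = random.Random(seed)
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    visits = [[0] * N_ACTIONS for _ in range(N_STATES)]
    s = 0
    for _ in range(steps):
        a = rng.randrange(N_ACTIONS)               # purely exploratory policy
        s_next, r = step(s, a, rng)
        visits[s][a] += 1
        alpha = 1.0 / visits[s][a]                 # stepsize alpha_n = 1/n per pair
        target = r + GAMMA * max(Q[s_next])        # one-step bootstrapped target
        Q[s][a] += alpha * (target - Q[s][a])      # Q-learning update
        s = s_next
    return Q

Q = q_learning()
```

Since rewards lie in [0, 1], every iterate stays in [0, 1/(1 - GAMMA)]; the continuous-space extension analyzed in the paper replaces the lookup table with a function approximation over the compact state/action sets.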
Publisher
Cambridge University Press (CUP)
Subject
Industrial and Manufacturing Engineering; Management Science and Operations Research; Statistics, Probability and Uncertainty; Statistics and Probability
Cited by
9 articles.