Abstract
We consider the adaptive control of Markov chains under the weak accessibility condition with a view to minimizing the learning loss. A certainty equivalence control with a forcing scheme is constructed. We use a stationary randomized control scheme for forcing and compute a maximum likelihood estimate of the unknown parameter from the resulting observations. We obtain an exponential upper bound on the rate of decay of the probability of error. This allows us to choose the rate of forcing appropriately, whereby we achieve a o(f(n) log n) learning loss for any function as .
Publisher
Cambridge University Press (CUP)
Subject
Statistics, Probability and Uncertainty,General Mathematics,Statistics and Probability
Cited by
5 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献