Publisher
Springer International Publishing