Author
Zhang Haifei, Hong Ying, Qiu Jianlin
Funder
Universities Natural Science Research Project of Jiangsu Province
Universities Natural Science Research Project of Anhui Province
Publisher
Springer Science and Business Media LLC
Subject
Computer Networks and Communications, Software
References
26 articles.
1. Sutton, R.S., Barto, A.G.: Reinforcement Learning. MIT Press, Cambridge (1998)
2. Koller, D., Parr, R.: Policy iteration for factored MDPs. In: Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence, Stanford, USA (2000)
3. Antos, A., Szepesvári, C., Munos, R.: Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Mach. Learn. 71(1), 89–129 (2008)
4. Tsitsiklis, J.N., Van Roy, B.: An analysis of temporal-difference learning with function approximation. IEEE Trans. Autom. Control 42, 674–690 (1997)
5. Geist, M., Pietquin, O.: Parametric value function approximation: a unified view. In: Proceedings of the IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, Piscataway, USA (2011)
Cited by
1 article.