Policy Gradient Reinforcement Learning with Separated Knowledge: Environmental Dynamics and Action-Values in Policies

Published: 2016 Issue: 3 Volume: 136 Page: 282-289
ISSN: 0385-4221
Container-title: IEEJ Transactions on Electronics, Information and Systems
Language: en
Short-container-title: IEEJ Trans. EIS

Author:

Ishihara Seiji¹, Igarashi Harukazu²

Affiliation:

1. School of Science and Engineering, Tokyo Denki University

2. Faculty of Engineering, Shibaura Institute of Technology

Publisher

Institute of Electrical Engineers of Japan (IEE Japan)

Subject

Electrical and Electronic Engineering

Link

https://www.jstage.jst.go.jp/article/ieejeiss/136/3/136_282/_pdf

References: 16 articles.

1. R. S. Sutton and A. G. Barto: Reinforcement Learning, MIT Press, Cambridge (1998)

2. R. J. Williams: “Simple Statistical Gradient-following Algorithms for Connectionist Reinforcement Learning”, Machine Learning, Vol. 8, pp. 229-256 (1992)

3. H. Kimura, M. Yamamura, and S. Kobayashi: “Reinforcement Learning in Partially Observable Markov Decision Processes: A Stochastic Gradient Method”, Journal of the Japanese Society for Artificial Intelligence, Vol. 11, No. 5, pp. 761-768 (1996) (in Japanese)

4. L. C. Baird and A. W. Moore: “Gradient Descent for General Reinforcement Learning”, Advances in Neural Information Processing Systems 11, MIT Press, pp. 968-974 (1999)

5. R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour: “Policy Gradient Methods for Reinforcement Learning with Function Approximation”, Advances in Neural Information Processing Systems 12, MIT Press, pp. 1057-1063 (2000)

Cited by 1 article.

1. Multi-Agent Reinforcement Learning by a Policy Gradient Method with Energy-Based Policies of a Boltzmann Machine; Journal of Japan Society for Fuzzy Theory and Intelligent Informatics; 2022-08-15