Using Expectation-Maximization for Reinforcement Learning

Published:1997-02-01 Issue:2 Volume:9 Page:271-278
ISSN:0899-7667
Container-title:Neural Computation
language:en

Author:

Dayan Peter¹, Hinton Geoffrey E.²

Affiliation:

1. Department of Brain and Cognitive Sciences, Center for Biological and Computational Learning, Massachusetts Institute of Technology, Cambridge, MA 02139 USA

2. Department of Computer Science, University of Toronto, Toronto M5S 1A4, Canada

Abstract

We discuss Hinton's (1989) relative payoff procedure (RPP), a static reinforcement learning algorithm whose foundation is not stochastic gradient ascent. We show circumstances under which applying the RPP is guaranteed to increase the mean return, even though it can make large changes in the values of the parameters. The proof is based on a mapping between the RPP and a form of the expectation-maximization procedure of Dempster, Laird, and Rubin (1977).
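
The abstract does not state the update rule itself. As a minimal sketch, assuming the commonly stated form of the RPP for independent binary stochastic units with non-negative rewards (each probability p_i is replaced by E[r * a_i] / E[r]), the example below computes the expectations exactly by enumerating every action vector. The function names and the toy reward are hypothetical and are not taken from the paper.

```python
from itertools import product


def action_prob(p, a):
    """Probability of binary action vector a under independent Bernoulli units p."""
    prob = 1.0
    for p_i, a_i in zip(p, a):
        prob *= p_i if a_i == 1 else 1.0 - p_i
    return prob


def expected_reward(p, reward):
    """Exact mean return, computed by enumerating all 2^n action vectors."""
    return sum(action_prob(p, a) * reward(a)
               for a in product((0, 1), repeat=len(p)))


def rpp_step(p, reward):
    """One assumed RPP update: p_i <- E[r * a_i] / E[r] (rewards must be >= 0)."""
    actions = list(product((0, 1), repeat=len(p)))
    weights = [action_prob(p, a) * reward(a) for a in actions]
    total = sum(weights)  # equals E[r]; must be strictly positive
    return [sum(w * a[i] for w, a in zip(weights, actions)) / total
            for i in range(len(p))]


def toy_reward(a):
    """Hypothetical non-negative reward: unit 0 should fire, unit 1 should not."""
    return 1.0 + 2.0 * a[0] + (0.5 if a[1] == 0 else 0.0)


def run_demo(steps=10):
    """Apply the update repeatedly and record the mean return after each step."""
    p = [0.5, 0.5]
    history = [expected_reward(p, toy_reward)]
    for _ in range(steps):
        p = rpp_step(p, toy_reward)
        history.append(expected_reward(p, toy_reward))
    return p, history
```

Calling `run_demo()` returns the final probabilities together with the mean return after each update. Under the assumptions above, the recorded values should never decrease, which is the behavior the abstract says is guaranteed even when a single update changes the parameters by a large amount.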

Publisher

MIT Press - Journals

Subject

Cognitive Neuroscience, Arts and Humanities (miscellaneous)

Link

https://www.mitpressjournals.org/doi/pdf/10.1162/neco.1997.9.2.271

References: 5 articles.

1. Pattern-recognizing stochastic learning automata

2. A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains

3. Connectionist learning procedures

4. Simple statistical gradient-following algorithms for connectionist reinforcement learning

Cited by 77 articles.

1. Proximal evolutionary strategy: improving deep reinforcement learning through evolutionary policy optimization;Memetic Computing;2024-08-17

2. A Probabilistic Treatment of (PO)MDPs with Multiplicative Reward Structure;2024 European Control Conference (ECC);2024-06-25

3. End-to-end Multi-Objective Deep Reinforcement Learning for Autonomous Navigation;2023 IEEE International Conference on Real-time Computing and Robotics (RCAR);2023-07-17

4. A review on reinforcement learning algorithms and applications in supply chain management;International Journal of Production Research;2022-11-03

5. Inferring Neural Activity Before Plasticity: A Foundation for Learning Beyond Backpropagation;2022-05-18