Posterior Weighted Reinforcement Learning with State Uncertainty

Published:2010-05 Issue:5 Volume:22 Page:1149-1179
ISSN:0899-7667
Container-title:Neural Computation
language:en

Author:

Larsen Tobias¹, Leslie David S.², Collins Edmund J.², Bogacz Rafal³

Affiliation:

1. Department of Computer Science, University of Bristol, Bristol, BS8 1UB, U.K.

2. Department of Mathematics, University of Bristol, Bristol, BS8 1TW, U.K.

3. Department of Computer Science, University of Bristol, Bristol, BS8 1UB, U.K.

Abstract

Reinforcement learning models generally assume that a stimulus is presented that allows a learner to unambiguously identify the state of nature, and the reward received is drawn from a distribution that depends on that state. However, in any natural environment, the stimulus is noisy. When there is state uncertainty, it is no longer immediately obvious how to perform reinforcement learning, since the observed reward cannot be unambiguously allocated to a state of the environment. This letter addresses the problem of incorporating state uncertainty in reinforcement learning models. We show that simply ignoring the uncertainty and allocating the reward to the most likely state of the environment results in incorrect value estimates. Furthermore, using only the information that is available before observing the reward also results in incorrect estimates. We therefore introduce a new technique, posterior weighted reinforcement learning, in which the estimates of state probabilities are updated according to the observed rewards (e.g., if a learner observes a reward usually associated with a particular state, this state becomes more likely). We show analytically that this modified algorithm can converge to correct reward estimates and confirm this with numerical experiments. The algorithm is shown to be a variant of the expectation-maximization algorithm, allowing rigorous convergence analyses to be carried out. A possible neural implementation of the algorithm in the cortico-basal-ganglia-thalamic network is presented, and experimental predictions of our model are discussed.
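
The reweighting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm as published: it assumes Gaussian reward noise with a known standard deviation, a fixed learning rate, and a prior over states supplied by the noisy stimulus. The names `posterior_weights`, `pwrl_update`, `sigma`, and `alpha` are hypothetical and chosen only for this illustration.

```python
import math


def posterior_weights(prior, values, reward, sigma):
    """Update state probabilities after observing the reward (Bayes' rule).

    Assumption for this sketch: the reward in state s is Gaussian with
    mean values[s] and known standard deviation sigma. The prior must
    contain at least one nonzero entry.
    """
    # Log of (prior * Gaussian likelihood), dropping the shared constant.
    logs = [
        math.log(p) - (reward - v) ** 2 / (2.0 * sigma ** 2)
        if p > 0 else float("-inf")
        for p, v in zip(prior, values)
    ]
    top = max(logs)  # subtract the maximum for numerical stability
    unnorm = [math.exp(l - top) for l in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def pwrl_update(values, prior, reward, sigma=1.0, alpha=0.1):
    """One posterior-weighted update of the per-state reward estimates.

    Each state's estimate moves toward the reward in proportion to the
    posterior probability that the reward came from that state.
    """
    weights = posterior_weights(prior, values, reward, sigma)
    new_values = [
        v + alpha * w * (reward - v) for v, w in zip(values, weights)
    ]
    return new_values, weights


# Usage: an ambiguous stimulus (prior 50/50) followed by a reward of 3.0.
values, weights = pwrl_update([1.0, 3.0], [0.5, 0.5], reward=3.0)
```

Allocating the whole reward to the most likely state would correspond to replacing the weights with a one-hot vector, and using only pre-reward information would skip the Bayes step. The sketch instead lets the observed reward shift probability toward the state whose current estimate best explains it, which resembles the E-step of expectation-maximization.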

Publisher

MIT Press - Journals

Subject

Cognitive Neuroscience, Arts and Humanities (miscellaneous)

Link

https://www.mitpressjournals.org/doi/pdf/10.1162/neco.2010.01-09-948

References: 41 articles.

1. Learning the value of information in an uncertain world

2. Dynamics of stochastic approximation algorithms

3. Optimal decision-making theories

4. The Basal Ganglia and Cortex Implement Optimal Decision Making Between Alternative Actions

Cited by 15 articles.

1. Adaptive Integration of Perceptual and Reward Information in an Uncertain World;2024-04-28

2. Inferring neural activity before plasticity as a foundation for learning beyond backpropagation;Nature Neuroscience;2024-01-03

3. Reinforcement Learning Under Uncertainty: Expected Versus Unexpected Uncertainty and State Versus Reward Uncertainty;Computational Brain & Behavior;2023-03-20

4. Uncertainty-bounded reinforcement learning for revenue optimization in air cargo: a prescriptive learning approach;Knowledge and Information Systems;2022-08-02

5. Choice is a tricky thing: Integrating sophisticated choice models with learning processes to better account for complex choice behavior;Decision;2022-07