Dopamine transients encode reward prediction errors independent of learning rates

Published: 2024-04-18

Author:

Mah Andrew, Golden Carla E.M., Constantinople Christine M.

Abstract

Summary: Biological accounts of reinforcement learning posit that dopamine encodes reward prediction errors (RPEs), which are multiplied by a learning rate to update state or action values. These values are thought to be represented in synaptic weights in the striatum, and updated by dopamine-dependent plasticity, suggesting that dopamine release might reflect the product of the learning rate and RPE. Here, we leveraged the fact that animals learn faster in volatile environments to characterize dopamine encoding of learning rates in the nucleus accumbens core (NAcc). We trained rats on a task with semi-observable states offering different rewards, and rats adjusted how quickly they initiated trials across states using RPEs. Computational modeling and behavioral analyses showed that learning rates were higher following state transitions, and scaled with trial-by-trial changes in beliefs about hidden states, approximating normative Bayesian strategies. Notably, dopamine release in the NAcc encoded RPEs independent of learning rates, suggesting that dopamine-independent mechanisms instantiate dynamic learning rates.
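
The distinction the abstract draws, between a signal that carries the RPE alone and one that carries the product of learning rate and RPE, can be illustrated with a generic delta-rule update. The sketch below is a minimal textbook-style illustration, not the authors' computational model; the function names, the scalar value estimate, and the example numbers are all assumptions made for illustration.

```python
def delta_rule_update(value, reward, learning_rate):
    """One delta-rule step (hypothetical helper, not from the paper).

    Returns the updated value estimate and the reward prediction error.
    """
    rpe = reward - value                      # reward prediction error
    new_value = value + learning_rate * rpe   # value moves by learning_rate * RPE
    return new_value, rpe


def run_trials(rewards, learning_rates, initial_value=0.0):
    """Apply the delta rule trial by trial with a per-trial learning rate.

    Each history entry is (rpe, scaled_update, value_after_update), so the
    raw RPE and the learning-rate-scaled update can be compared directly.
    """
    value = initial_value
    history = []
    for reward, alpha in zip(rewards, learning_rates):
        value, rpe = delta_rule_update(value, reward, alpha)
        history.append((rpe, alpha * rpe, value))
    return history


if __name__ == "__main__":
    # Illustrative numbers only: the same reward sequence with a higher
    # learning rate on the first trial, as might follow a state transition.
    for rpe, update, value in run_trials([1.0, 1.0, 1.0], [0.5, 0.25, 0.25]):
        print(f"RPE={rpe:.3f}  scaled update={update:.3f}  value={value:.3f}")
```

In this toy setting, a signal encoding only the RPE would track the first entry of each tuple, whereas a signal encoding the product would track the second; the two diverge whenever the learning rate changes across trials.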

Publisher

Cold Spring Harbor Laboratory

References (69 articles)

1. Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction (MIT Press, 2018).

2. An Approximately Bayesian Delta-Rule Model Explains the Dynamics of Belief Updating in a Changing Environment

3. Learning the value of information in an uncertain world

4. Surprise Signals in Anterior Cingulate Cortex: Neuronal Encoding of Unsigned Reward Prediction Errors Driving Adjustment in Behavior

5. Rational regulation of learning dynamics by pupil-linked arousal systems