A vector reward prediction error model explains dopaminergic heterogeneity

Published: 2022-03-02

Author:

Lee Rachel S., Engelhard Ben, Witten Ilana B., Daw Nathaniel D.

Abstract

The hypothesis that midbrain dopamine (DA) neurons broadcast an error signal for the prediction of reward (reward prediction error, RPE) is among the great successes of computational neuroscience [1–3]. However, recent results contradict a core aspect of this theory: that the neurons uniformly convey a scalar, global signal. Instead, when animals are placed in a high-dimensional environment, DA neurons in the ventral tegmental area (VTA) display substantial heterogeneity in the features to which they respond, while also having more consistent RPE-like responses at the time of reward. Here we introduce a new “Vector RPE” model that explains these findings, by positing that DA neurons report individual RPEs for a subset of a population vector code for an animal’s state (moment-to-moment situation). To investigate this claim, we train a deep reinforcement learning model on a navigation and decision-making task, and compare the Vector RPE derived from the network to population recordings from DA neurons during the same task. The Vector RPE model recapitulates the key features of the neural data: specifically, heterogeneous coding of task variables during the navigation and decision-making period, but uniform reward responses. The model also makes new predictions about the nature of the responses, which we validate. Our work provides a path to reconcile new observations of DA neuron heterogeneity with classic ideas about RPE coding, while also providing a new perspective on how the brain performs reinforcement learning in high-dimensional environments.
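The idea of one RPE per state feature can be illustrated with a minimal sketch. It assumes a linear value function over a feature vector and splits the reward equally across channels, so that the per-channel errors sum to the classic scalar temporal-difference error. This decomposition, the function names, and the numbers are illustrative assumptions; the abstract does not specify the model's equations.

```python
def scalar_rpe(reward, gamma, weights, phi_now, phi_next):
    """Classic scalar TD error, assuming V(s) = sum_i w_i * phi_i(s)."""
    v_now = sum(w * f for w, f in zip(weights, phi_now))
    v_next = sum(w * f for w, f in zip(weights, phi_next))
    return reward + gamma * v_next - v_now


def vector_rpe(reward, gamma, weights, phi_now, phi_next):
    """One RPE per feature channel (hypothetical decomposition).

    Each channel receives an equal share of the reward plus the
    discounted change in its own contribution to the value estimate.
    """
    n = len(weights)
    return [
        reward / n + gamma * w * f_next - w * f_now
        for w, f_now, f_next in zip(weights, phi_now, phi_next)
    ]


# Made-up example: three state features and their value weights.
weights = [0.5, -0.2, 1.0]
phi_now = [1.0, 0.0, 0.5]    # features of the current state
phi_next = [0.0, 1.0, 0.5]   # features of the next state
gamma = 0.9                  # discount factor

channels = vector_rpe(1.0, gamma, weights, phi_now, phi_next)
total = scalar_rpe(1.0, gamma, weights, phi_now, phi_next)

# The channel errors differ from one another but add up to the scalar RPE.
print(channels, total)
```

Under these assumptions each channel responds to a different feature, while the reward term is shared by all channels. That is one way to obtain heterogeneous task-variable responses alongside uniform reward responses.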

Publisher

Cold Spring Harbor Laboratory

References (89 articles)

1. Houk, J. C., Adams, J. L. & Barto, A. G. A Model of How the Basal Ganglia Generate and Use Neural Signals that Predict Reinforcement. In Models of Information Processing in the Basal Ganglia (eds. J. C. Houk, J. L. Davis and D. G. Beiser), 249–270 (1995).

2. A framework for mesencephalic dopamine systems based on predictive Hebbian learning

3. A Neural Substrate of Prediction and Reward

4. A cellular mechanism of reward-related learning

5. Reward and choice encoding in terminals of midbrain dopamine neurons depends on striatal target

Cited by 30 articles.

1. Explaining dopamine through prediction errors and beyond;Nature Neuroscience;2024-07-25

2. Humans forage for reward in reinforcement learning tasks;2024-07-08

3. Mesostriatal dopamine is sensitive to changes in specific cue-reward contingencies;Science Advances;2024-05-31

4. Memory-specific encoding activities of the ventral tegmental area dopamine and GABA neurons;eLife;2024-03-21

5. Pre-existing visual responses in a projection-defined dopamine population explain individual learning trajectories;2024-02-28