Long-Term Reward Prediction in TD Models of the Dopamine System

Published: 2002-11-01, Issue: 11, Volume: 14, Pages: 2567-2583
ISSN: 0899-7667
Journal: Neural Computation
Language: en

Author:

Daw, Nathaniel D.¹; Touretzky, David S.¹

Affiliation:

1. Computer Science Department and Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, PA 15213, U.S.A.

Abstract

This article addresses the relationship between long-term reward predictions and slow-timescale neural activity in temporal difference (TD) models of the dopamine system. Such models attempt to explain how the activity of dopamine (DA) neurons relates to errors in the prediction of future rewards. Previous models have been mostly restricted to short-term predictions of rewards expected during a single, somewhat artificially defined trial. Also, the models focused exclusively on the phasic pause-and-burst activity of primate DA neurons; the neurons' slower, tonic background activity was assumed to be constant. This has led to difficulty in explaining the results of neurochemical experiments that measure indications of DA release on a slow timescale, results that seem at first glance inconsistent with a reward prediction model. In this article, we investigate a TD model of DA activity modified so as to enable it to make longer-term predictions about rewards expected far in the future. We show that these predictions manifest themselves as slow changes in the baseline error signal, which we associate with tonic DA activity. Using this model, we make new predictions about the behavior of the DA system in a number of experimental situations. Some of these predictions suggest new computational explanations for previously puzzling data, such as indications from microdialysis studies of elevated DA activity triggered by aversive events.
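The abstract gives no equations, so the following is only a minimal sketch of how a long-term prediction could appear as a shift in the baseline of a TD error signal. It assumes an average-reward form of the error, delta_t = r_t - rho + V(s_{t+1}) - V(s_t), where rho is a slowly varying estimate of reward per time step. That form, and the names `td_errors`, `update_avg_reward`, and `avg_reward`, are assumptions made for illustration and are not taken from the article.

```python
# Illustrative sketch only: an average-reward TD error, assumed here as one
# way to let a TD model carry predictions beyond a single trial.
# All names below are hypothetical and do not come from the article.

def td_errors(rewards, values, avg_reward=0.0):
    """Return delta_t = r_t - avg_reward + V[t+1] - V[t] for each step t.

    `values` must hold one more entry than `rewards` (the value of the
    state reached after the final step).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have len(rewards) + 1 entries")
    return [
        r - avg_reward + values[t + 1] - values[t]
        for t, r in enumerate(rewards)
    ]


def update_avg_reward(avg_reward, reward, rate=0.1):
    """Move the running reward-per-step estimate a fraction `rate` toward `reward`."""
    return avg_reward + rate * (reward - avg_reward)


rewards = [0.0, 0.0, 1.0]        # a single reward at the end of three steps
values = [0.0, 0.0, 0.0, 0.0]    # untrained value estimates

lean = td_errors(rewards, values, avg_reward=0.0)    # low long-run reward rate
rich = td_errors(rewards, values, avg_reward=0.25)   # higher long-run reward rate
print(lean)
print(rich)
```

Under this assumed form, raising `avg_reward` lowers every error by the same amount, so the phasic response to the reward sits on a shifted baseline. That is the kind of slow baseline change the abstract associates with tonic DA activity.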

Publisher

MIT Press - Journals

Subject

Cognitive Neuroscience, Arts and Humanities (miscellaneous)

Link

https://www.mitpressjournals.org/doi/pdf/10.1162/089976602760407973

References: 21 articles.

1. How the Basal Ganglia Use Parallel Excitatory and Inhibitory Learning Pathways to Selectively Respond to Unexpected Rewarding Cues

2. Electrical stimulation of reward sites in the ventral tegmental area increases dopamine transmission in the nucleus accumbens of the rat

3. Time, rate, and conditioning.

4. Blocking and enhancement of fear conditioning by appetitive CSs

5. An electrophysiological characterization of ventral tegmental area dopaminergic neurons during differential pavlovian fear conditioning in the awake rabbit

Cited by 56 articles.

1. Explaining dopamine through prediction errors and beyond; Nature Neuroscience; 2024-07-25

2. Dopamine transients follow a striatal gradient of reward time horizons; Nature Neuroscience; 2024-02-06

3. Striatal dopamine integrates cost, benefit, and motivation; Neuron; 2023-11

4. Local and global reward learning in the lateral frontal cortex show differential development during human adolescence; PLOS Biology; 2023-03-02

5. Performance-gated deliberation: A context-adapted strategy in which urgency is opportunity cost; PLOS Computational Biology; 2022-05-26