Affiliation:
1. Center for Molecular and Behavioral Neuroscience, Rutgers University, Newark, NJ 07102
Abstract
The temporal difference learning (TDL) algorithm has been essential to conceptualizing the role of dopamine in reinforcement learning (RL). Despite its theoretical importance, it remains unknown whether a neuronal implementation of this algorithm exists in the brain. Here, we provide an interpretation of the recently described signaling properties of ventral tegmental area (VTA) GABAergic neurons and show that a circuit of these neurons implements the TDL algorithm. Specifically, we identify the neuronal mechanism of three key components of the TDL model: a sustained state value signal encoded by an afferent input to the VTA; a temporal differentiation circuit formed by two types of VTA GABAergic neurons, whose combined output computes momentary reward prediction (RP) as the derivative of the state value; and the computation of reward prediction errors (RPEs) in dopamine neurons using the output of the differentiation circuit. Using computational methods, we also show that this mechanism is optimally adapted to the biophysics of RPE signaling in dopamine neurons, mechanistically links the emergence of conditioned reinforcement to RP, and can naturally account for the temporal discounting of reinforcement. Elucidating the implementation of the TDL algorithm may further the investigation of RL in biological and artificial systems.
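For reference, the TD error underlying this interpretation can be written in standard textbook notation (state value V, reward r, discount factor \gamma); this is a sketch of the conventional formulation, not a quotation of the paper's own equations, and the continuous-time form assumes no temporal discounting:

    % discrete-time TD error: reward plus discounted next-state value minus current value
    \delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)

    % continuous-time limit with \gamma = 1: the RPE reduces to the reward plus
    % the time derivative of the state value
    \delta(t) = r(t) + \dot{V}(t)

In this form, \dot{V}(t), the derivative of the state value, plays the role of the momentary reward prediction attributed to the GABAergic differentiation circuit, and \delta(t) corresponds to the RPE signaled by dopamine neurons.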
Funder
HHS | NIH | National Institute of Neurological Disorders and Stroke
Publisher
Proceedings of the National Academy of Sciences