Author:
Hoang Huu, Tsutsumi Shinichiro, Matsuzaki Masanori, Kano Masanobu, Toyama Keisuke, Kitamura Kazuo, Kawato Mitsuo
Abstract
Although the cerebellum is typically linked to supervised learning algorithms, it also exhibits extensive connections to reward processing. In this study, we investigated the cerebellum’s role in executing reinforcement learning algorithms, with a particular emphasis on reward-prediction errors, which are essential to these algorithms. We employed a Q-learning model to accurately reproduce the licking responses of mice in a Go/No-go auditory-discrimination task. This approach enabled the calculation of reinforcement learning variables, such as reward, predicted reward, and reward-prediction errors, in each learning trial. By tensor component analysis of two-photon Ca2+ imaging data, we found that climbing fiber inputs of two distinct components, which were specifically activated during the Go and No-go cues over the course of learning, showed an inverse relationship with predictive reward-prediction errors. Given the hypothesis of bidirectional parallel-fiber Purkinje-cell synaptic plasticity, Purkinje cells in these components could develop specific motor commands for their respective auditory cues, guided by the predictive reward-prediction errors conveyed by their climbing fiber inputs. These results indicate a possible role of context-specific actors in modular reinforcement learning, integrating with cerebellar supervised learning capabilities.
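For readers unfamiliar with the modeling step described in the abstract, the following is a minimal, hypothetical sketch of how a Q-learning model can generate trial-by-trial reward-prediction errors in a Go/No-go task. It is not the authors' code; the learning rate, policy, and variable names (alpha, q_lick, rpe_trace) are illustrative assumptions.

```python
# Minimal sketch (assumed parameters, not the authors' implementation):
# a Q-learning agent that licks in response to Go/No-go cues and computes
# a reward-prediction error (RPE) on every rewarded or unrewarded lick.
import numpy as np

rng = np.random.default_rng(0)

alpha = 0.1                           # learning rate (assumed value)
n_trials = 500
cues = rng.integers(0, 2, n_trials)   # 0 = No-go cue, 1 = Go cue

# Q-values: predicted reward for licking in response to each cue
q_lick = np.zeros(2)
rpe_trace = np.zeros(n_trials)

for t, cue in enumerate(cues):
    # Sigmoid policy on the predicted reward, with a floor to keep exploration
    lick_prob = 1.0 / (1.0 + np.exp(-5.0 * q_lick[cue]))
    lick = rng.random() < max(lick_prob, 0.1)

    # Reward is delivered only for licking to the Go cue
    reward = 1.0 if (lick and cue == 1) else 0.0

    if lick:
        rpe = reward - q_lick[cue]    # reward-prediction error
        q_lick[cue] += alpha * rpe    # Q-learning update
        rpe_trace[t] = rpe

print("final Q-values (No-go, Go):", q_lick.round(2))
```

In such a sketch, q_lick plays the role of the predicted reward and rpe_trace the per-trial reward-prediction error; the study fits an analogous model to the observed licking behavior and then relates the resulting trial-by-trial error signals to climbing fiber activity identified by tensor component analysis.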
Publisher
Cold Spring Harbor Laboratory
Cited by
2 articles.