Abstract
Reinforcement learning models have been used in many studies in neuroscience and psychology to model choice behavior and its underlying computational processes. Models based on action values, which represent the expected reward from each action (e.g., the Q-learning model), have commonly been used for this purpose. Meanwhile, the actor-critic model, in which policy updating and the evaluation of the expected reward for a given state are carried out by separate systems (the actor and the critic, respectively), has attracted attention for its ability to explain various characteristics of the behavior of living systems. However, the statistical properties of the model's behavior (i.e., how choices depend on past rewards and choices) remain elusive. In this study, we examine the history dependence of the actor-critic model through theoretical considerations and numerical simulations, while considering its similarities with and differences from Q-learning models. We show that in actor-critic learning, a specific interaction between past rewards and choices, which differs from that in Q-learning, influences the current choice. We also show that actor-critic learning predicts behavior qualitatively different from Q-learning: the higher the expectation, the less likely the action is to be chosen afterwards. By clarifying how actor-critic learning manifests in choice behavior, this study provides useful information for inferring computational and psychological principles from behavior.
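For orientation, a minimal sketch of the two model classes contrasted in the abstract is given below. This is a generic illustration using standard textbook update rules with illustrative parameter names (alpha, beta, etc.); it is not the specific formulation used in the article. Note how the actor's preference update is driven by the critic's prediction error r - V: when the critic's expectation V is high, even a rewarded choice can produce a small or negative update, which relates to the qualitative difference from Q-learning noted in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def q_learning_step(Q, a, r, alpha=0.1):
    """Standard action-value update: Q[a] <- Q[a] + alpha * (r - Q[a])."""
    Q = Q.copy()
    Q[a] += alpha * (r - Q[a])
    return Q

def actor_critic_step(V, H, a, r, alpha_c=0.1, alpha_a=0.1):
    """Standard actor-critic update for a single-state (bandit) task.

    The critic tracks the state value V; the actor holds action
    preferences H. Both are updated from the same prediction error.
    """
    delta = r - V            # reward prediction error computed by the critic
    V = V + alpha_c * delta  # critic update
    H = H.copy()
    H[a] += alpha_a * delta  # actor (policy) update for the chosen action
    return V, H

def softmax(x, beta=3.0):
    """Softmax choice rule with inverse temperature beta."""
    z = beta * (x - x.max())
    p = np.exp(z)
    return p / p.sum()

# Tiny two-armed bandit simulation contrasting the two learners.
p_reward = np.array([0.7, 0.3])
Q = np.zeros(2)
V, H = 0.0, np.zeros(2)
for t in range(200):
    a_q = rng.choice(2, p=softmax(Q))
    r_q = float(rng.random() < p_reward[a_q])
    Q = q_learning_step(Q, a_q, r_q)

    a_ac = rng.choice(2, p=softmax(H))
    r_ac = float(rng.random() < p_reward[a_ac])
    V, H = actor_critic_step(V, H, a_ac, r_ac)

print("Q-values:", Q)
print("Critic V:", V, "Actor preferences:", H)
```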
Funder
Japan Society for the Promotion of Science
Publisher
Springer Science and Business Media LLC
Subject
Developmental and Educational Psychology, Neuropsychology and Physiological Psychology
Cited by
4 articles.