Dynamic reinforcement learning reveals time-dependent shifts in strategy during reward learning

Published: 2024-02-28

Authors:

Venditto Sarah Jo C, Miller Kevin J, Brody Carlos D, Daw Nathaniel D

Abstract

Different brain systems have been hypothesized to subserve multiple "experts" that compete to generate behavior. In reinforcement learning, two general processes, one model-free (MF) and one model-based (MB), are often modeled as a mixture of agents (MoA) and hypothesized to capture the difference between automaticity and deliberation. However, shifts in strategy cannot be captured by a static MoA. To investigate such dynamics, we present the mixture-of-agents hidden Markov model (MoA-HMM), which simultaneously learns inferred action values from a set of agents and the temporal dynamics of underlying "hidden" states that capture shifts in agent contributions over time. Applying this model to a multi-step, reward-guided task in rats reveals a progression of within-session strategies: a shift from initial MB exploration to MB exploitation, and finally to reduced engagement. The inferred states predict changes in both response time and orbitofrontal cortex (OFC) neural encoding during the task, suggesting that these states capture real shifts in dynamics.
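The abstract does not give the model's equations, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' implementation. It assumes that each hidden state carries its own weight vector over agents, that the choice policy in a state is a softmax over the weighted sum of the agents' action values, and that the hidden state evolves as a Markov chain across trials. The agents' per-trial action values (for example, from MF and MB learners) are assumed to be precomputed, and the names `values`, `betas`, `init`, and `trans` are hypothetical. The sketch computes the log-likelihood of a choice sequence with the standard scaled forward algorithm; a fitting procedure (for example, expectation-maximization or gradient ascent) could then maximize this quantity.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of action scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def choice_prob(agent_values: Sequence[Sequence[float]],
                betas: Sequence[float], choice: int) -> float:
    """P(choice | hidden state) for a mixture of agents (hypothetical form).

    agent_values[k][a]: value agent k assigns to action a on this trial.
    betas[k]: weight of agent k in this hidden state.
    """
    n_actions = len(agent_values[0])
    net = [sum(b * q[a] for b, q in zip(betas, agent_values))
           for a in range(n_actions)]
    return softmax(net)[choice]


def moa_hmm_loglik(values, choices, betas, init, trans) -> float:
    """Scaled forward algorithm for a hypothetical MoA-HMM sketch.

    values[t][k][a]: agent k's value for action a on trial t (assumed precomputed).
    choices[t]: index of the action chosen on trial t.
    betas[z][k]: agent weights in hidden state z.
    init[z]: initial state probabilities.
    trans[i][j]: P(state j on trial t | state i on trial t-1).
    Returns log P(choices) under the model.
    """
    n_states = len(init)
    loglik = 0.0
    alpha: List[float] = []  # filtered P(state | choices so far)
    for t, (vals, c) in enumerate(zip(values, choices)):
        # Emission: how well each state's agent mixture predicts this choice.
        emit = [choice_prob(vals, betas[z], c) for z in range(n_states)]
        # Prediction: propagate the previous filtered belief through trans.
        if t == 0:
            prior = list(init)
        else:
            prior = [sum(alpha[i] * trans[i][j] for i in range(n_states))
                     for j in range(n_states)]
        joint = [prior[z] * emit[z] for z in range(n_states)]
        norm = sum(joint)  # P(choice_t | earlier choices)
        loglik += math.log(norm)
        alpha = [p / norm for p in joint]  # rescale to avoid underflow
    return loglik


if __name__ == "__main__":
    # Toy data: 3 trials, 2 agents, 2 actions (all numbers are made up).
    values = [[[1.0, 0.0], [0.0, 1.0]]] * 3
    choices = [0, 0, 1]
    betas = [[2.0, 0.0], [0.0, 0.0]]  # state 0 follows agent 0; state 1 is random
    init = [0.9, 0.1]
    trans = [[0.8, 0.2], [0.1, 0.9]]
    print(moa_hmm_loglik(values, choices, betas, init, trans))
```

With a single hidden state this reduces to a static MoA. Adding states lets the agent weights change within a session; in this toy parameterization, a state with near-zero weights yields near-random choices, which is one hypothetical way a "reduced engagement" state could look.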

Publisher

Cold Spring Harbor Laboratory
