Dopamine encoding of novelty facilitates efficient uncertainty-driven exploration

Authors

Yuhao Wang, Armin Lak, Sanjay G. Manohar, Rafal Bogacz

Abstract

When facing an unfamiliar environment, animals need to explore to gain new knowledge about which actions yield reward, but they must also put the newly acquired knowledge to use as quickly as possible. Optimal reinforcement learning strategies should therefore assess the uncertainties of these action–reward associations and use them to inform decision making. We propose a novel model in which the direct and indirect striatal pathways act together to estimate both the mean and the variance of reward distributions, while mesolimbic dopaminergic neurons provide transient novelty signals that facilitate effective uncertainty-driven exploration. We used electrophysiological recording data to verify our model of the basal ganglia, and we fitted exploration strategies derived from the neural model to data from behavioural experiments. We also compared, in simulation, the performance of directed exploration strategies inspired by our basal ganglia model with other exploration algorithms, including classic variants of the upper confidence bound (UCB) strategy. The exploration strategies inspired by the basal ganglia model achieved superior overall performance in simulation, and fitting the model to behavioural data gave qualitatively similar results to fits of more idealised normative models with less implementation-level detail. Overall, our results suggest that transient dopamine levels in the basal ganglia that encode novelty could contribute to an uncertainty representation which efficiently drives exploration in reinforcement learning.
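The classic UCB strategy mentioned in the abstract selects whichever action maximises its empirical mean reward plus an exploration bonus that shrinks as the action is sampled more often, so rarely tried (more uncertain) actions are preferentially explored. As a point of reference, here is a minimal sketch of the standard UCB1 rule on a Gaussian multi-armed bandit; it is an illustration of the baseline algorithm, not of the authors' basal ganglia model, and the arm means and parameters are arbitrary choices for the example:

```python
import math
import random

def ucb_bandit(true_means, n_steps=1000, c=2.0, seed=0):
    """UCB1 on a Gaussian bandit: pick the arm maximising
    empirical mean + c * sqrt(log(t) / n_pulls)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms      # pulls per arm
    means = [0.0] * n_arms     # running empirical mean per arm
    total_reward = 0.0
    for t in range(1, n_steps + 1):
        if t <= n_arms:
            # Play each arm once before applying the UCB formula.
            arm = t - 1
        else:
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + c * math.sqrt(math.log(t) / counts[a]),
            )
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update
        total_reward += reward
    return counts, total_reward

counts, total = ucb_bandit([0.1, 0.5, 0.9])
print(counts)  # the best arm (index 2) accumulates the most pulls
```

The exploration bonus plays the role that an explicit uncertainty (or novelty) signal plays in directed-exploration models: it inflates the value of undersampled actions until the evidence against them is strong.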

Funder

Biotechnology and Biological Sciences Research Council

Medical Research Council

Wellcome Trust

Royal Society

National Institute for Health and Care Research

James S. McDonnell Foundation

Publisher

Public Library of Science (PLoS)

