Performance-gated deliberation: A context-adapted strategy in which urgency is opportunity cost
Published:2022-05-26
Issue:5
Volume:18
Page:e1010080
ISSN:1553-7358
Container-title:PLOS Computational Biology
language:en
Short-container-title:PLoS Comput Biol
Author:
Puelma Touzel Maximilian,
Cisek Paul,
Lajoie Guillaume
Abstract
Finding the right amount of deliberation, between insufficient and excessive, is a hard decision-making problem that depends on the value we place on our time. Average reward, putatively encoded by tonic dopamine, serves in existing reinforcement learning theory as the opportunity cost of time, including deliberation time. Importantly, this cost can itself vary with the environmental context and is not trivial to estimate. Here, we propose how the opportunity cost of deliberation can be estimated adaptively on multiple timescales to account for non-stationary contextual factors. We use it in a simple decision-making heuristic based on average-reward reinforcement learning (AR-RL) that we call Performance-Gated Deliberation (PGD). We propose PGD as a strategy used by animals wherein deliberation cost is implemented directly as urgency, a previously characterized neural signal that effectively controls the speed of the decision-making process. We show that PGD outperforms AR-RL solutions in explaining the behaviour and urgency of non-human primates in a context-varying random walk prediction task, and that it is consistent with relative performance and urgency in a context-varying random dot motion task. We make readily testable predictions for both neural activity and behaviour.
Funder
IVADO
NSERC
Fonds de Recherche du Québec - Santé
Canada CIFAR AI Chair program
Publisher
Public Library of Science (PLoS)
Subject
Computational Theory and Mathematics; Cellular and Molecular Neuroscience; Genetics; Molecular Biology; Ecology; Modeling and Simulation; Ecology, Evolution, Behavior and Systematics