Author:
Dezfouli Amir,Balleine Bernard W.,Nock Richard
Abstract
AbstractWithin a rational framework a decision-maker selects actions based on the reward-maximisation principle which stipulates they acquire outcomes with the highest values at the lowest cost. Action selection can be divided into two dimensions: selecting an action from several alternatives, and choosing its vigor, i.e., how fast the selected action should be executed. Both of these dimensions depend on the values of the outcomes, and these values are often affected as more outcomes are consumed, and so are the actions. Despite this, previous works have addressed the computational substrates of optimal actions only in the specific condition that the values of outcomes are constant, and it is still unknown what the optimal actions are when the values of outcomes are non-stationary. Here, based on an optimal control framework, we derive a computational model for optimal actions under non-stationary outcome values. The results imply that even when the values of outcomes are changing, the optimal response rate is constant rather than decreasing. This finding shows that, in contrast to previous theories, the commonly observed changes in the actions cannot be purely attributed to the changes in the outcome values. We then prove that this observation can be explained based on the uncertainty about temporal horizons; e.g., in the case of experimental protocols, the session duration. We further show that when multiple outcomes are available, the model explains probability matching as well as maximisation choice strategies. The model provides, therefore, a quantitative analysis of optimal actions and explicit predictions for future testing.
Publisher
Cold Spring Harbor Laboratory