Undermatching is a consequence of policy compression

Published: 2022-05-29

Author:

Bilal A. Bari, Samuel J. Gershman

Abstract

The matching law describes the tendency of agents to match the ratio of choices allocated to the ratio of rewards received when choosing among multiple options (Herrnstein, 1961). Perfect matching, however, is infrequently observed. Instead, agents tend to undermatch, or bias choices towards the poorer option. Overmatching, or the tendency to bias choices towards the richer option, is rarely observed. Despite the ubiquity of undermatching, it has lacked an adequate normative justification. Here, we assume agents not only seek to maximize reward, but also seek to minimize cognitive cost, which we formalize as policy complexity (the mutual information between actions and states of the environment). Policy complexity measures the extent to which an agent's policy is state-dependent. Our theory states that capacity-constrained agents (i.e., agents that must compress their policies to reduce complexity) can only undermatch or perfectly match, but not overmatch, consistent with the empirical evidence. Moreover, we validate a novel prediction about which task conditions exaggerate undermatching. Finally, we argue that a reduction in undermatching with higher dopamine levels in patients with Parkinson's disease is consistent with increased policy complexity.

Significance statement

The matching law describes the tendency of agents to match the ratio of choices allocated to different options to the ratio of reward received. For example, if option A yields twice as much reward as option B, matching states that agents will choose option A twice as often. However, agents typically undermatch: they choose the poorer option more frequently than expected. Here, we assume that agents seek to simultaneously maximize reward and minimize the complexity of their action policies. We show that this theory explains when and why undermatching occurs. Neurally, we show that policy complexity, and by extension undermatching, is controlled by tonic dopamine, consistent with other evidence that dopamine plays an important role in cognitive resource allocation.
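The two quantities the abstract relies on can be made concrete with a small sketch. It assumes the standard generalized matching law, log(B1/B2) = s·log(R1/R2) + log b, where a sensitivity s < 1 indicates undermatching and s > 1 overmatching. It then computes a capacity-limited policy with the Blahut-Arimoto iteration for the trade-off between expected reward and policy complexity I(S;A). The function names `matching_sensitivity` and `compressed_policy`, the parameter `beta` (weight on reward relative to complexity), and the toy inputs are hypothetical illustrations, not the authors' code or task model.

```python
import numpy as np

# Hypothetical helpers for illustration; not the authors' implementation.


def matching_sensitivity(choices, rewards):
    """Fit the generalized matching law log(B1/B2) = s*log(R1/R2) + log(b).

    choices, rewards: arrays of shape (n_conditions, 2) with positive counts
    (or rates) for options 1 and 2. Returns (s, b): s < 1 indicates
    undermatching, s > 1 overmatching, and b is a constant bias.
    """
    choices = np.asarray(choices, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    y = np.log(choices[:, 0] / choices[:, 1])  # log choice ratio
    x = np.log(rewards[:, 0] / rewards[:, 1])  # log reward ratio
    s, log_b = np.polyfit(x, y, 1)  # least-squares slope and intercept
    return float(s), float(np.exp(log_b))


def compressed_policy(p_s, Q, beta, n_iter=500):
    """Blahut-Arimoto iteration for the reward-complexity trade-off.

    Seeks pi(a|s) maximizing beta * E[Q(s, a)] - I(S; A), given a state
    distribution p_s (shape S) and action values Q (shape S x A).
    Returns (pi, p_a, complexity_bits).
    """
    p_s = np.asarray(p_s, dtype=float)
    Q = np.asarray(Q, dtype=float)
    n_actions = Q.shape[1]
    p_a = np.full(n_actions, 1.0 / n_actions)  # start from a uniform marginal
    for _ in range(n_iter):
        # pi(a|s) is proportional to p(a) * exp(beta * Q(s, a))
        logits = np.log(p_a)[None, :] + beta * Q
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        pi = np.exp(logits)
        pi /= pi.sum(axis=1, keepdims=True)
        p_a = p_s @ pi  # marginal action distribution p(a)
    # Policy complexity: mutual information I(S; A) in bits
    ratio = np.where(pi > 0, pi / p_a[None, :], 1.0)
    complexity_bits = float(np.sum(p_s[:, None] * pi * np.log2(ratio)))
    return pi, p_a, complexity_bits
```

With two equiprobable states whose best actions differ, `compressed_policy` returns a policy equal to the marginal p(a) (zero bits of complexity) at `beta = 0`, and the policy becomes more state-dependent as `beta` grows. Whether such compression produces undermatching in a concurrent-schedule task depends on how states and values are defined there, which this sketch does not attempt to model.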

Publisher

Cold Spring Harbor Laboratory

Cited by 2 articles.

1. Bayesian Reinforcement Learning With Limited Cognitive Load;Open Mind;2024

2. Mechanisms of adjustments to different types of uncertainty in the reward environment across mice and monkeys;2022-10-04