Task Learnability Modulates Surprise but Not Valence Processing for Reinforcement Learning in Probabilistic Choice Tasks

Published:2021-12-06 Issue:1 Volume:34 Page:34-53
ISSN:0898-929X
Container-title:Journal of Cognitive Neuroscience
language:en

Author:

Wurm Franz¹²³, Walentowska Wioleta⁴⁵, Ernst Benjamin¹, Severo Mario Carlo⁵, Pourtois Gilles⁵, Steinhauser Marco¹

Affiliation:

1. Catholic University of Eichstätt-Ingolstadt, Germany

2. Leiden University

3. Leiden Institute for Brain and Cognition

4. Jagiellonian University, Krakow, Poland

5. Ghent University

Abstract

The goal of temporal difference (TD) reinforcement learning is to maximize outcomes and improve future decision-making. It does so by utilizing a prediction error (PE), which quantifies the difference between the expected and the obtained outcome. In gambling tasks, however, decision-making cannot be improved because of the lack of learnability. On the basis of the idea that TD utilizes two independent bits of information from the PE (valence and surprise), we asked which of these aspects is affected when a task is not learnable. We contrasted behavioral data and ERPs in a learning variant and a gambling variant of a simple two-armed bandit task, in which outcome sequences were matched across tasks. Participants were explicitly informed that feedback could be used to improve performance in the learning task but not in the gambling task, and we predicted a corresponding modulation of the aspects of the PE. We used a model-based analysis of ERP data to extract the neural footprints of the valence and surprise information in the two tasks. Our results revealed that task learnability modulates reinforcement learning via the suppression of surprise processing but leaves the processing of valence unaffected. On the basis of our model and the data, we propose that task learnability can selectively suppress TD learning as well as alter behavioral adaptation based on a flexible cost–benefit arbitration.
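
As an illustration of the two PE components named in the abstract, the following is a minimal sketch of a generic delta-rule update, assuming that valence is taken as the sign of the PE and surprise as its absolute magnitude. This is not the authors' model: the function names and the learning rate `alpha` are hypothetical and chosen only for this example.

```python
def prediction_error(expected, obtained):
    """Split a prediction error into valence and surprise.

    Assumption for illustration: valence is the sign of the PE and
    surprise is its unsigned magnitude.
    """
    pe = obtained - expected            # signed prediction error
    valence = (pe > 0) - (pe < 0)       # +1 better, -1 worse, 0 as expected
    surprise = abs(pe)                  # how unexpected the outcome was
    return pe, valence, surprise


def update_value(expected, obtained, alpha=0.1):
    """Generic delta-rule update: move the expectation toward the outcome.

    alpha is a hypothetical learning rate between 0 and 1.
    """
    pe, _, _ = prediction_error(expected, obtained)
    return expected + alpha * pe
```

For example, with an expected value of 0.5 and an obtained reward of 1.0, the PE is 0.5, the valence is positive, and the surprise is 0.5. With an obtained reward of 0.0, the surprise is the same but the valence is negative, which is the sense in which the two components can vary independently.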

Funder

Fonds Wetenschappelijk Onderzoek

Narodowa Agencja Wymiany Akademickiej

National Science Centre of Poland

Publisher

MIT Press - Journals

Subject

Cognitive Neuroscience

Link

https://direct.mit.edu/jocn/article-pdf/34/1/34/1976366/jocn_a_01777.pdf
