Risk-Sensitive Reinforcement Learning

Authors:

Yun Shen¹, Michael J. Tobia², Tobias Sommer², Klaus Obermayer³

Affiliations:

1. Technical University of Berlin, 10587 Berlin, Germany

2. University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany

3. Technical University of Berlin, 10587 Berlin, Germany, and Bernstein Center for Computational Neuroscience Berlin, 10115 Berlin, Germany

Abstract

We derive a family of risk-sensitive reinforcement learning methods for agents who face sequential decision-making tasks in uncertain environments. By applying a utility function to the temporal difference (TD) error, nonlinear transformations are effectively applied not only to the received rewards but also to the true transition probabilities of the underlying Markov decision process. When appropriate utility functions are chosen, the agents' behaviors express key features of human behavior as predicted by prospect theory (Kahneman & Tversky, 1979), for example, different risk preferences for gains and losses, as well as the shape of subjective probability curves. We derive a risk-sensitive Q-learning algorithm, which is necessary for modeling human behavior when transition probabilities are unknown, and prove its convergence. As a proof of principle for the applicability of the new framework, we apply it to quantify human behavior in a sequential investment task. We find that the risk-sensitive variant provides a significantly better fit to the behavioral data and that it leads to an interpretation of the subjects' responses that is indeed consistent with prospect theory. The analysis of simultaneously measured fMRI signals shows a significant correlation of the risk-sensitive TD error with BOLD signal change in the ventral striatum. In addition, we find a significant correlation of the risk-sensitive Q-values with neural activity in the striatum, cingulate cortex, and insula that is not present if standard Q-values are used.
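To make the update rule described in the abstract concrete, the sketch below shows a minimal risk-sensitive Q-learning step in which a utility function is applied to the TD error before the value update. The piecewise-power utility and its parameters (alpha_gain, alpha_loss, loss_aversion) are an illustrative, prospect-theory-style assumption, not necessarily the exact functional form or parameter values used in the paper.

```python
import numpy as np

def utility(delta, alpha_gain=0.88, alpha_loss=0.88, loss_aversion=2.25):
    """Prospect-theory-style utility applied to the TD error.
    Parameter values here are hypothetical, for illustration only."""
    if delta >= 0:
        return delta ** alpha_gain                      # concave for gains
    return -loss_aversion * (-delta) ** alpha_loss      # steeper for losses

def risk_sensitive_q_update(Q, s, a, r, s_next, lr=0.1, gamma=0.95):
    """One risk-sensitive Q-learning step: the utility is applied to the
    TD error itself (not to the reward), following the idea in the abstract."""
    delta = r + gamma * np.max(Q[s_next]) - Q[s, a]     # standard TD error
    Q[s, a] += lr * utility(delta)                      # risk-sensitive update
    return Q

# Toy usage: a 3-state, 2-action tabular problem
Q = np.zeros((3, 2))
Q = risk_sensitive_q_update(Q, s=0, a=1, r=1.0, s_next=2)
```

With a linear utility the update reduces to standard Q-learning; asymmetric slopes for positive and negative TD errors yield the different risk attitudes for gains and losses that the abstract attributes to prospect theory.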

Publisher

MIT Press - Journals

Subject

Cognitive Neuroscience, Arts and Humanities (miscellaneous)

Cited by 49 articles.

1. Safety Optimized Reinforcement Learning via Multi-Objective Policy Optimization;2024 IEEE International Conference on Robotics and Automation (ICRA);2024-05-13

2. Robust Quadrupedal Locomotion via Risk-Averse Policy Learning;2024 IEEE International Conference on Robotics and Automation (ICRA);2024-05-13

3. Finding the optimal exploration-exploitation trade-off online through Bayesian risk estimation and minimization;Artificial Intelligence;2024-05

4. Distributional offline continuous-time reinforcement learning with neural physics-informed PDEs (SciPhy RL for DOCTR-L);Neural Computing and Applications;2023-12-15

5. DRL Trading with CPT Actor and Truncated Quantile Critics;4th ACM International Conference on AI in Finance;2023-11-25
