Affiliation:
1. Department of Statistical Sciences, University of Toronto, Toronto, Canada
2. Oxford‐Man Institute, University of Oxford, Oxford, United Kingdom
Abstract
We develop an approach for solving time‐consistent risk‐sensitive stochastic optimization problems using model‐free reinforcement learning (RL). Specifically, we assume agents assess the risk of a sequence of random variables using dynamic convex risk measures. We employ a time‐consistent dynamic programming principle to determine the value of a particular policy, and develop policy gradient update rules that aid in obtaining optimal policies. We further develop an actor–critic style algorithm using neural networks to optimize over policies. Finally, we demonstrate the performance and flexibility of our approach by applying it to three optimization problems: statistical arbitrage trading strategies, financial hedging, and obstacle avoidance robot control.
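The abstract's core idea is to replace the expected cost in a policy-gradient update with a convex risk measure of the cost. The sketch below is not the paper's algorithm (which uses dynamic, time-consistent risk measures and an actor–critic with neural networks); it is a minimal one-step illustration, assuming empirical CVaR as the convex risk measure, a two-action softmax policy, and a score-function gradient estimator. All names (`cvar`, `risk_sensitive_policy_gradient`, `cost_means`) are illustrative.

```python
import numpy as np

def cvar(costs, alpha=0.9):
    """Empirical CVaR_alpha of a cost sample: mean of the worst
    (1 - alpha) fraction (a simple static convex risk measure)."""
    q = np.quantile(costs, alpha)
    return costs[costs >= q].mean()

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def risk_sensitive_policy_gradient(theta, cost_means, n_samples=2000,
                                   alpha=0.9, rng=None):
    """Score-function estimate of the gradient of CVaR_alpha of the
    one-step cost under the softmax policy pi_theta."""
    if rng is None:
        rng = np.random.default_rng(0)
    probs = softmax(theta)
    actions = rng.choice(len(probs), size=n_samples, p=probs)
    costs = cost_means[actions] + rng.normal(0.0, 0.1, size=n_samples)
    q = np.quantile(costs, alpha)
    grad = np.zeros_like(theta)
    for a, c in zip(actions, costs):
        if c >= q:                      # only tail scenarios contribute
            score = -probs.copy()       # d/dtheta log pi_theta(a)
            score[a] += 1.0
            grad += (c - q) * score
    return grad / (n_samples * (1.0 - alpha))

# demo: gradient descent drives the policy toward the cheaper action 0
theta = np.zeros(2)
cost_means = np.array([1.0, 2.0])       # action 1 carries the higher cost
for _ in range(60):
    theta -= 1.0 * risk_sensitive_policy_gradient(theta, cost_means)
```

In the paper's dynamic setting the risk of a whole cost sequence is assessed recursively, so the tail-scenario weighting above would be applied at each time step via a dynamic programming principle rather than once over a static sample.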
Funder
Natural Sciences and Engineering Research Council of Canada
Subject
Applied Mathematics, Economics and Econometrics, Social Sciences (miscellaneous), Finance, Accounting
Cited by: 7 articles.