Abstract
Formulae are presented for the variance and higher moments of the present value of single-stage rewards in a finite Markov decision process. Similar formulae are exhibited for a semi-Markov decision process. There is a short discussion of the obstacles to using the variance formula in algorithms to maximize the mean minus a multiple of the standard deviation.
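The flavor of the single-stage-reward result can be illustrated for a fixed stationary policy: the mean discounted return solves one linear system, and conditioning on the first transition gives a second linear system for the variance. The sketch below is a minimal illustration under the assumptions that rewards depend only on the current state and that the policy is held fixed; it is not the paper's exact formulation, and the function name and example data are hypothetical.

```python
import numpy as np

def discounted_mean_and_variance(P, r, beta):
    """Mean and variance of the discounted return under a fixed policy.

    P    : (n, n) transition matrix of the policy (rows sum to 1)
    r    : (n,) reward received in each state
    beta : discount factor in [0, 1)

    Returns (v, psi): expected discounted return and its variance,
    both indexed by starting state.
    """
    n = len(r)
    I = np.eye(n)

    # Mean: v = r + beta * P v  =>  v = (I - beta * P)^{-1} r
    v = np.linalg.solve(I - beta * P, r)

    # Conditioning on the first transition gives
    #   psi = beta^2 * P psi + theta,
    # where theta_i = beta^2 * ( sum_j p_ij v_j^2 - (sum_j p_ij v_j)^2 )
    # is the variance contributed by the first jump.
    theta = beta**2 * (P @ (v**2) - (P @ v)**2)
    psi = np.linalg.solve(I - beta**2 * P, theta)
    return v, psi

if __name__ == "__main__":
    # Hypothetical two-state example.
    P = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
    r = np.array([1.0, 0.0])
    v, psi = discounted_mean_and_variance(P, r, beta=0.95)
    print("mean:", v)
    print("variance:", psi)
    # A mean-minus-standard-deviation criterion could then be scored
    # per policy, e.g. v - 0.5 * np.sqrt(psi), though, as the paper
    # discusses, optimizing it is not straightforward.
```

The difficulty alluded to in the abstract is that such a risk-adjusted criterion is not amenable to the usual dynamic-programming decomposition, so the variance formula cannot simply be dropped into standard policy- or value-iteration schemes.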
Publisher
Cambridge University Press (CUP)
Subject
Statistics, Probability and Uncertainty; General Mathematics; Statistics and Probability
Cited by
15 articles.
1. The optimal probability of the risk for finite horizon partially observable Markov decision processes;AIMS Mathematics;2023
2. Risk-Aware Reinforcement Learning for Multi-Period Portfolio Selection;Machine Learning and Knowledge Discovery in Databases;2023
3. IGN: Implicit Generative Networks;2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA);2022-12
4. Distributional Actor-Critic Ensemble for Uncertainty-Aware Continuous Control;2022 International Joint Conference on Neural Networks (IJCNN);2022-07-18
5. Risk-sensitive reinforcement learning;Proceedings of the First ACM International Conference on AI in Finance;2020-10-15