Author:
Wen Xian, Huo Haifeng, Cui Jinhua
Abstract
This paper investigates the optimality of the risk probability for finite-horizon partially observable discrete-time Markov decision processes (POMDPs). The criterion is the risk probability, namely the probability that the total rewards do not exceed a preset goal value. This differs from the classical problem of optimizing expected rewards. Using the Bayes operator and the filter equations, the risk probability optimization problem is equivalently reformulated as a filtered Markov decision process. By developing a value iteration technique, the optimality equation satisfied by the value function is established, and the existence of a risk-probability optimal policy is proven. Finally, an example illustrates the effectiveness of the value iteration algorithm for computing the value function and an optimal policy.
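The value iteration idea behind the risk-probability criterion can be sketched on a much simpler case. This is a minimal sketch, not the authors' algorithm. It assumes a fully observed finite-horizon MDP, so the Bayes operator and filtering step that the paper uses to handle partial observability are omitted. It also assumes integer one-step rewards r(x, a) and that the risk probability P(total reward <= goal) is to be minimized. The state is augmented with the remaining goal, and the recursion is V_N(x, g) = 1 if g >= 0 else 0, with V_n(x, g) = min_a sum_y p(y | x, a) V_{n+1}(y, g - r(x, a)). The toy model and the names `P`, `R`, `HORIZON`, `value` and `policy` are hypothetical illustrations.

```python
from functools import lru_cache

# Hypothetical toy model (not from the paper): two states, two actions, horizon 2.
STATES = (0, 1)
ACTIONS = (0, 1)
HORIZON = 2

# P[x][a] maps next state y -> transition probability p(y | x, a).
P = {
    0: {0: {0: 1.0}, 1: {0: 0.5, 1: 0.5}},
    1: {0: {1: 1.0}, 1: {0: 0.5, 1: 0.5}},
}

# R[x][a] is the integer one-step reward r(x, a) (assumed deterministic).
R = {
    0: {0: 1, 1: 0},
    1: {0: 2, 1: 3},
}


def expected_next(n, x, goal, a):
    """Expected next-stage risk after taking action a in state x at stage n."""
    return sum(p * value(n + 1, y, goal - R[x][a]) for y, p in P[x][a].items())


@lru_cache(maxsize=None)
def value(n, x, goal):
    """Minimal P(rewards collected from stage n onward <= goal), starting in x."""
    if n == HORIZON:
        # No rewards remain, so the remaining total is 0 and the event is 0 <= goal.
        return 1.0 if goal >= 0 else 0.0
    # Optimality equation: minimize over actions, shifting the goal by the reward.
    return min(expected_next(n, x, goal, a) for a in ACTIONS)


def policy(n, x, goal):
    """An action attaining the minimum in the optimality equation (first on ties)."""
    return min(ACTIONS, key=lambda a: expected_next(n, x, goal, a))
```

For example, starting in state 0 with goal 2, `value(0, 0, 2)` returns 0.5 and `policy(0, 0, 2)` returns action 1. Action 0 earns reward 1 but stays in state 0 and keeps the total at or below 2 with certainty. Action 1 earns nothing now but reaches the high-reward state 1 with probability 0.5. Augmenting the state with the remaining goal is what turns the probability criterion into a standard backward recursion.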
Publisher
American Institute of Mathematical Sciences (AIMS)