Finite-horizon dynamic optimisation when the terminal reward is a concave functional of the distribution of the final state

Published: 1998-03, Volume: 30, Issue: 1, Pages: 122-136
ISSN: 0001-8678
Container-title: Advances in Applied Probability
Language: en
Short-container-title: Advances in Applied Probability

Author:

Collins E. J., McNamara J. M.

Abstract

We consider a problem similar in many respects to a finite-horizon Markov decision process, except that the reward to the individual is a strictly concave functional of the distribution of the state of the individual at final time T. Reward structures such as these are of interest to biologists studying the fitness of different strategies in a fluctuating environment. The problem fails to satisfy the usual optimality equation and cannot be solved directly by dynamic programming. We establish equations characterising the optimal final distribution and an optimal policy π*. We show that in general π* will be a Markov randomised policy (or equivalently a mixture of Markov deterministic policies) and we develop an iterative, policy-improvement-based algorithm which converges to π*. We also consider an infinite population version of the problem, and show that the population cannot do better using a coordinated policy than by each individual independently following the individual optimal policy π*.
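The abstract does not spell out the algorithm, so the following is only a hedged sketch of the general idea: the optimum is a mixture of Markov deterministic policies, and each improvement step can be found by ordinary backward induction. It uses the Frank-Wolfe (conditional-gradient) method, a swapped-in technique that is not necessarily the authors' policy-improvement scheme. At each step the concave reward R is linearised at the current final distribution p, a standard finite-horizon DP with terminal reward ∇R(p) yields a deterministic Markov policy, and that policy is mixed into the current mixture. Everything here is an assumption for illustration: the time-homogeneous transition array `P[a, x, y]`, the helper names `final_distribution`, `best_response` and `frank_wolfe_policy`, and the toy reward, an expected log fitness over two equally likely environments (`q_env`, fitness matrix `w`) chosen to echo the fluctuating-environment motivation.

```python
import numpy as np


def final_distribution(P, mu0, policy):
    """Propagate mu0 forward under a Markov deterministic policy,
    where policy[t, x] is the action taken at time t in state x."""
    n = len(mu0)
    dist = np.asarray(mu0, dtype=float)
    for decision in policy:
        # Row x of `rows` is the transition row of the action chosen in state x.
        rows = P[decision, np.arange(n)]
        dist = dist @ rows
    return dist


def best_response(P, g, T):
    """Ordinary backward induction with a linear terminal reward g(x):
    returns a Markov deterministic policy maximising E[g(X_T)]."""
    n = P.shape[1]
    V = np.asarray(g, dtype=float)
    policy = np.zeros((T, n), dtype=int)
    for t in reversed(range(T)):
        Q = P @ V                      # Q[a, x] = sum_y P[a, x, y] * V[y]
        policy[t] = Q.argmax(axis=0)
        V = Q.max(axis=0)
    return policy


def frank_wolfe_policy(P, mu0, T, reward, grad, iters=500):
    """Hypothetical sketch: maximise a concave reward of the final
    distribution over mixtures of Markov deterministic policies."""
    n = P.shape[1]
    start = np.zeros((T, n), dtype=int)          # arbitrary initial policy
    mixture = [(1.0, start)]                     # (weight, policy) pairs
    p = final_distribution(P, mu0, start)
    for k in range(iters):
        policy = best_response(P, grad(p), T)    # linearise R at p, solve a DP
        q = final_distribution(P, mu0, policy)
        step = 2.0 / (k + 2.0)
        p = (1 - step) * p + step * q            # final distribution of the new mixture
        mixture = [(wt * (1 - step), pi) for wt, pi in mixture] + [(step, policy)]
    return p, reward(p), mixture


# Hypothetical instance: from state 0, action 0 leads to state 1 and
# action 1 leads to state 2; states 1 and 2 are absorbing.
P = np.zeros((2, 3, 3))
P[0, 0, 1] = P[1, 0, 2] = 1.0
P[:, 1, 1] = P[:, 2, 2] = 1.0
# Two equally likely environments; w[e, x] is the fitness of final state x in e.
q_env = np.array([0.5, 0.5])
w = np.array([[1.0, 2.0, 0.5],
              [1.0, 0.5, 2.0]])
reward = lambda p: float(q_env @ np.log(w @ p))
grad = lambda p: (q_env / (w @ p)) @ w

p, value, mixture = frank_wolfe_policy(P, [1.0, 0.0, 0.0], T=2, reward=reward, grad=grad)
print(np.round(p, 3), round(value, 4))
```

In this toy instance either deterministic policy ends in a single state and earns a reward of 0, while splitting the final distribution evenly between states 1 and 2 earns log(1.25) ≈ 0.223. The iterates approach that even split, which illustrates why the optimal policy can be randomised, or equivalently a mixture of deterministic policies, rather than the deterministic policy that standard dynamic programming would return.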

Publisher

Cambridge University Press (CUP)

Subject

Applied Mathematics,Statistics and Probability

References: 17 articles

1. Optimal Mixed Strategies in Stochastic Environments

2. The variance of discounted Markov decision processes

3. ON POPULATION GROWTH IN A RANDOMLY VARYING ENVIRONMENT

Cited by: 7 articles

1. Markov decision processes with risk-sensitive criteria: an overview. Mathematical Methods of Operations Research, 2024-04.

2. More Risk-Sensitive Markov Decision Processes. Mathematics of Operations Research, 2014-02.

3. Markov Decision Processes with Average-Value-at-Risk criteria. Mathematical Methods of Operations Research, 2011-09-28.

4. Optimal policies for constrained average-cost Markov decision processes. TOP, 2009-07-23.

5. High-order extensions of the Double Chain Markov Model. Stochastic Models, 2002-05-30.