Author
Baykal-Gürsoy M., Gürsoy K.
Abstract
Considered are semi-Markov decision processes (SMDPs) with finite state and action spaces. We study two criteria: the expected average reward per unit time subject to a sample path constraint on the average cost per unit time and the expected time-average variability. Under a certain condition, for communicating SMDPs, we construct (randomized) stationary policies that are ε-optimal for each criterion; the policy is optimal for the first criterion under the unichain assumption and the policy is optimal and pure for a specific variability function in the second criterion. For general multichain SMDPs, by using a state space decomposition approach, similar results are obtained.
Publisher
Cambridge University Press (CUP)
Subject
Industrial and Manufacturing Engineering; Management Science and Operations Research; Statistics, Probability and Uncertainty; Statistics and Probability
Cited by
15 articles.