What is an Optimal Policy in Time-Average MDP?

Published:2023-09-28 Issue:2 Volume:51 Page:30-32
ISSN:0163-5999
Container-title:ACM SIGMETRICS Performance Evaluation Review
language:en
Short-container-title:SIGMETRICS Perform. Eval. Rev.

Author:

Gast Nicolas¹, Gaujal Bruno¹, Khun Kimang¹

Affiliation:

1. Univ. Grenoble Alpes, Inria, Grenoble, France

Abstract

This paper discusses the notion of optimality for time-average MDPs. We argue that while most authors claim to use the "average reward" criterion, the notion that is implicitly used is in fact what we call Bellman optimality. We show that it does not coincide with other existing notions of optimality, such as gain-optimality and bias-optimality, but has a strong connection with canonical policies (policies that are optimal for any finite horizon) as well as with the value iteration and policy iteration algorithms.
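
As a rough illustration of the kind of object the abstract refers to, the sketch below runs textbook relative value iteration on a small average-reward MDP and returns a gain g, a bias-like vector h, and a policy that is greedy with respect to the standard unichain optimality equation g + h(s) = max_a [ r(s,a) + Σ_s' P(s'|s,a) h(s') ]. This is not the authors' construction. The two-state MDP, the function name `relative_value_iteration`, and the array layout are hypothetical, and whether this equation matches the paper's exact definition of Bellman optimality is an assumption.

```python
import numpy as np


def relative_value_iteration(P, r, ref_state=0, tol=1e-10, max_iter=10_000):
    """Textbook relative value iteration for a unichain, aperiodic MDP.

    P: array of shape (A, S, S), P[a, s, t] = probability of s -> t under a.
    r: array of shape (S, A), r[s, a] = one-step reward.
    Returns (gain, h, policy) with h[ref_state] == 0.
    """
    n_states = r.shape[0]
    h = np.zeros(n_states)
    for _ in range(max_iter):
        # q[s, a] = r[s, a] + sum_t P[a, s, t] * h[t]
        q = r + np.einsum("ast,t->sa", P, h)
        v = q.max(axis=1)
        # Subtract the value at the reference state to keep h bounded.
        h_new = v - v[ref_state]
        converged = np.max(np.abs(h_new - h)) < tol
        h = h_new
        if converged:
            break
    q = r + np.einsum("ast,t->sa", P, h)
    policy = q.argmax(axis=1)          # greedy action in each state
    gain = q.max(axis=1)[ref_state]    # since h[ref_state] == 0
    return gain, h, policy


# Hypothetical two-state, two-action MDP (all numbers are made up).
P = np.array([
    [[0.5, 0.5], [0.5, 0.5]],   # action 0
    [[0.9, 0.1], [0.1, 0.9]],   # action 1
])
r = np.array([
    [1.0, 0.0],                 # rewards in state 0 for actions 0, 1
    [0.0, 2.0],                 # rewards in state 1 for actions 0, 1
])

gain, h, policy = relative_value_iteration(P, r)
print("gain:", gain)
print("h:", h)
print("greedy policy:", policy)
```

For this toy instance, solving the two linear equations by hand for the policy "action 0 in state 0, action 1 in state 1" gives g = 11/6 and h = (0, 5/3), which the iteration should approach. Relative value iteration is guaranteed to converge here only because every transition matrix in the example has strictly positive entries. The general multichain case discussed in the MDP literature needs more care.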

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications,Hardware and Architecture,Software

Link

https://dl.acm.org/doi/pdf/10.1145/3626570.3626582

References (6 articles)

1. Handbook of Markov Decision Processes

2. Nicolas Gast, Bruno Gaujal, and Kimang Khun. Computing Whittle (and Gittins) index in subcubic time. arXiv preprint arXiv:2203.05207, 2022.

3. Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., USA, 2nd edition, 2005.

4. The Functional Equations of Undiscounted Markov Renewal Programming

5. Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction. MIT press, 2018.