Optimizing Stochastic Control through State Transition Separability and Resource-Utility Exchange

Published:2024-09-05 Issue:2 Volume:52 Page:30-32
ISSN:0163-5999
Container-title:ACM SIGMETRICS Performance Evaluation Review
language:en
Short-container-title:SIGMETRICS Perform. Eval. Rev.

Author:

Liu Larkin¹, Liu Shiqi², Jusup Matej³

Affiliation:

1. Technische Universität München

2. École Polytechnique

3. ETH Zürich

Abstract

In stochastic control, particularly in economics and engineering, Markov Decision Processes (MDPs) are used to model processes ranging from asset management to transportation logistics. On closer examination, these constrained MDPs often exhibit specific causal structures in the dynamics of their transitions and rewards. Leveraging this structure can simplify the computation of the optimal policy. This study introduces a framework, denoted SD-MDP, that disentangles the causal structure of the state transition dynamics from that of the reward function. Through this method, we establish theoretical guarantees on improvements in computational efficiency compared to standard MDP solvers (such as linear programming). We further derive error bounds on the optimal value approximation via Monte Carlo simulation for this family of stochastic control problems.

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3695411.3695423

References (4 articles)

1. Bandit Processes and Dynamic Allocation Indices

2. G. H. Hardy, J. E. Littlewood, and G. Pólya. 1952. Inequalities. Cambridge Mathematical Library. Cambridge University Press. ISBN: 9780521358804.

3. Yangyi Lu, Amirhossein Meisami, and Ambuj Tewari. 2022. Efficient reinforcement learning with prior causal knowledge. In Conference on Causal Learning and Reasoning. PMLR, 526–541.

4. Progressive hedging innovations for a class of stochastic mixed-integer resource allocation problems