SIFTER

Published:2022-09 Issue:1 Volume:16 Page:90-98
ISSN:2150-8097
Container-title:Proceedings of the VLDB Endowment
language:en
Short-container-title:Proc. VLDB Endow.

Author:

Skitsas Konstantinos¹,Papageorgiou Ioannis G.²,Talebi Mohammad Sadegh³,Kantere Verena²,Katehakis Michael N.⁴,Karras Panagiotis¹

Affiliation:

1. Aarhus University

2. NTU Athens

3. University of Copenhagen

4. Rutgers University

Abstract

Can we solve finite-horizon Markov decision processes (FHMDPs) with low memory requirements? Such models apply wherever a decision-making agent must act in a probabilistic environment, from resource management to medicine to service provisioning. However, computing the optimal policy for such an agent by dynamic-programming value iteration incurs either prohibitive space complexity or, conversely, non-scalable time complexity. This scalability question has been largely neglected. In this paper, we propose SIFTER (Space Efficient Finite Horizon MDPs), a suite of two algorithms that strike a middle ground between space and time requirements. The first algorithm has space complexity that grows with the square root of the horizon length, with no time-complexity overhead; the second has space requirements that depend only logarithmically on the horizon length, with a corresponding logarithmic time-complexity overhead. A thorough experimental study under diverse settings confirms that the SIFTER algorithms achieve the predicted gains, while approximation techniques do not achieve the same combination of time efficiency, space efficiency, and result quality.
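The abstract does not say how the square-root space bound is obtained. As a rough illustration only, the sketch below assumes a standard checkpointing scheme: one backward sweep stores the value vector at roughly every sqrt(T)-th step, and the policy is then emitted forward in time by recomputing one segment at a time. The function names, the terminal values V_T = 0, and the toy MDP are assumptions made for this example, not details taken from the paper, and the paper's actual algorithms may differ.

```python
import math

def bellman_backup(V_next, P, R):
    """One backward-induction step; returns (V_t, greedy policy pi_t)."""
    V, pi = [], []
    for s in range(len(R)):
        # Q-value of each action: immediate reward plus expected value-to-go.
        q = [R[s][a] + sum(p * v for p, v in zip(P[s][a], V_next))
             for a in range(len(R[s]))]
        best = max(range(len(q)), key=q.__getitem__)
        V.append(q[best])
        pi.append(best)
    return V, pi

def checkpointed_policy(P, R, T):
    """Yield (t, pi_t) for t = 0..T-1 while holding O(sqrt(T)) vectors."""
    k = max(1, math.ceil(math.sqrt(T)))  # segment length, about sqrt(T)
    V = [0.0] * len(R)                   # assumed terminal values V_T = 0
    checkpoints = {T: V}
    # Pass 1: full backward sweep, keeping only every k-th value vector.
    for t in range(T - 1, -1, -1):
        V, _ = bellman_backup(V, P, R)
        if t % k == 0:
            checkpoints[t] = V
    # Pass 2: walk forward segment by segment, recomputing each segment
    # backward from the checkpoint stored at its right end.
    for start in range(0, T, k):
        end = min(start + k, T)
        V = checkpoints[end]
        segment = []
        for t in range(end - 1, start - 1, -1):
            V, pi = bellman_backup(V, P, R)
            segment.append((t, pi))
        yield from reversed(segment)

# Hypothetical 2-state, 2-action MDP: P[s][a][s2] is a transition
# probability and R[s][a] an immediate reward.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.1, 0.9]]]
R = [[1.0, 0.0],
     [0.0, 2.0]]
plan = list(checkpointed_policy(P, R, T=10))  # [(0, pi_0), ..., (9, pi_9)]
```

In this sketch every Bellman backup runs at most twice, so the running time stays within a constant factor of plain backward induction, while at most about sqrt(T) checkpoints plus one segment of about sqrt(T) policies are held at any moment.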

Publisher

Association for Computing Machinery (ACM)

Subject

General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development

Link

https://dl.acm.org/doi/pdf/10.14778/3561261.3561269

Reference: 31 articles.

1. Mohammad Gheshlaghi Azar, Ian Osband, and Rémi Munos. 2017. Minimax Regret Bounds for Reinforcement Learning. In ICML (Proc. of Machine Learning Research, Vol. 70). 263-272.

2. A finite-horizon Markov decision process model for cancer chemotherapy treatment planning: an application to sequential treatment decision making in clinical trials

3. A Markovian Decision Process;Bellman Richard E.;Journal of Mathematics and Mechanics,1957

4. Dimitri P. Bertsekas. 2017. Dynamic programming and optimal control (4th ed.). Vol. 1. Athena Scientific.

5. Hippolyte Bourel, Odalric Maillard, and Mohammad Sadegh Talebi. 2020. Tightening Exploration in Upper Confidence Reinforcement Learning. In ICML (Proc. of Machine Learning Research, Vol. 119). 1056-1066.

Cited by 2 articles.

1. Endorsers measurement for decarbonised processed food supply chain through newly befitted interval valued neutrosophic vague sets;Annals of Operations Research;2024-04-03

2. Optimal activation of halting multi‐armed bandit models;Naval Research Logistics (NRL);2023-08-16