A Sojourn-Based Approach to Semi-Markov Reinforcement Learning

Published:2022-06-25 Issue:2 Volume:92 Page:
ISSN:0885-7474
Container-title:Journal of Scientific Computing
language:en
Short-container-title:J Sci Comput

Author:

Ascione Giacomo, Cuomo Salvatore

Abstract

In this paper we introduce a new approach to discrete-time semi-Markov decision processes based on the sojourn time process. Different characterizations of discrete-time semi-Markov processes are exploited, and decision processes are constructed by means of them. With this new approach, the agent can choose different actions depending also on the sojourn time of the process in the current state. A numerical method based on Q-learning algorithms for finite horizon reinforcement learning and stochastic recursive relations is investigated. Finally, we consider two toy examples: one in which the reward depends on the sojourn time, according to the gambler’s fallacy; the other in which the environment is semi-Markov even though the reward function does not depend on the sojourn time. These are used to carry out numerical evaluations of the previously presented Q-learning algorithm and of a different, naive method based on deep reinforcement learning.
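As a rough illustration of the idea summarized in the abstract, the following minimal sketch runs tabular finite-horizon Q-learning on a table indexed by time step, state and sojourn time, so that the learned action can change with how long the process has stayed in its current state. It is not the authors' algorithm or either of their toy examples: the two-state environment, the leave probability `0.1 + 0.4 * sojourn`, the betting reward, the `1/n` step size and the names `step`, `greedy_action` and `q_learning` are all assumptions made for this example.

```python
import random
from collections import defaultdict

ACTIONS = (0, 1)  # hypothetical actions: 0 bets the state stays, 1 bets it changes


def step(state, sojourn, action, rng):
    """One transition of a hypothetical two-state toy environment."""
    # Assumption: the chance of leaving the state grows with the sojourn time,
    # so the chain is semi-Markov in `state` but Markov in (state, sojourn).
    p_leave = min(0.9, 0.1 + 0.4 * sojourn)
    if rng.random() < p_leave:
        next_state, next_sojourn = 1 - state, 0
    else:
        next_state, next_sojourn = state, sojourn + 1
    changed = next_state != state
    reward = 1.0 if (action == 1) == changed else 0.0
    return next_state, next_sojourn, reward


def greedy_action(q, t, state, sojourn):
    """Action with the largest Q-value at time t in (state, sojourn)."""
    return max(ACTIONS, key=lambda a: q[(t, state, sojourn, a)])


def q_learning(horizon=4, episodes=5000, epsilon=0.2, seed=0):
    """Tabular finite-horizon Q-learning on the augmented state (state, sojourn)."""
    rng = random.Random(seed)
    q = defaultdict(float)     # key: (t, state, sojourn, action); unseen entries are 0.0
    visits = defaultdict(int)  # visit counts, used for a 1/n step size
    for _ in range(episodes):
        state, sojourn = 0, 0
        for t in range(horizon):
            # Epsilon-greedy exploration.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy_action(q, t, state, sojourn)
            nxt_state, nxt_sojourn, reward = step(state, sojourn, action, rng)
            # Beyond the horizon the value is zero.
            future = 0.0
            if t + 1 < horizon:
                future = max(q[(t + 1, nxt_state, nxt_sojourn, a)] for a in ACTIONS)
            key = (t, state, sojourn, action)
            visits[key] += 1
            q[key] += (reward + future - q[key]) / visits[key]
            state, sojourn = nxt_state, nxt_sojourn
    return q
```

Calling `q = q_learning()` and then comparing `greedy_action(q, 0, 0, 0)` with `greedy_action(q, 2, 0, 2)` shows how the preferred action in the same state can differ with the sojourn time, which a policy depending on the state alone could not express.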

Funder

Ministero dell’Istruzione, dell’Università e della Ricerca

Publisher

Springer Science and Business Media LLC

Subject

Computational Theory and Mathematics,General Engineering,Theoretical Computer Science,Software,Applied Mathematics,Computational Mathematics,Numerical Analysis

Link

https://link.springer.com/content/pdf/10.1007/s10915-022-01876-x.pdf

References: 39 articles.

1. Abounadi, J., Bertsekas, D., Borkar, V.S.: Learning algorithms for Markov decision processes with average cost. SIAM J. Control. Optim. 40(3), 681–698 (2001)

2. Ascione, G., Leonenko, N., Pirozzi, E.: Non-local solvable birth-death processes. J. Theor. Probab. 35, 1284–1323 (2022)

3. Ascione, G., Leonenko, N., Pirozzi, E.: Time-non-local Pearson diffusions. J. Stat. Phys. 183(3), 1–42 (2021)

4. Asmussen, S.: Applied probability and queues, vol. 51. Springer Science & Business Media, Germany (2008)

5. Barbu, V.S., Limnios, N.: Semi-Markov chains and hidden semi-Markov models toward applications: their use in reliability and DNA analysis, vol. 191. Springer Science & Business Media, Germany (2009)

Cited by 2 articles.

1. Discrete-Time Semi-Markov Chains;Probability and Its Applications;2023

2. Intelligent air defense task assignment based on hierarchical reinforcement learning;Frontiers in Neurorobotics;2022-12-01