Abstract
In this paper we introduce a new approach to discrete-time semi-Markov decision processes based on the sojourn time process. Different characterizations of discrete-time semi-Markov processes are exploited, and decision processes are constructed by means of them. With this new approach, the agent is allowed to choose different actions depending also on the sojourn time of the process in the current state. A numerical method based on Q-learning algorithms for finite-horizon reinforcement learning and stochastic recursive relations is investigated. Finally, we consider two toy examples: one in which the reward depends on the sojourn time, according to the gambler's fallacy; the other in which the environment is semi-Markov even though the reward function does not depend on the sojourn time. These are used to carry out numerical evaluations of the previously presented Q-learning algorithm and of a different, naive method based on deep reinforcement learning.
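To make the idea of sojourn-dependent actions concrete, the following is a minimal sketch of tabular finite-horizon Q-learning on the augmented state (state, sojourn time). It is not the algorithm from the paper: the two-state environment, the jump law `jump_prob`, the reward, the cap `S_MAX` on the sojourn time, and all parameter values are assumptions made only for illustration.

```python
import random

H = 5          # finite horizon: number of decision steps (assumed value)
N_STATES = 2   # toy state space {0, 1} (assumption)
S_MAX = 3      # sojourn times are capped at S_MAX to keep the table finite (assumption)
N_ACTIONS = 2


def jump_prob(sojourn, action):
    """Hypothetical sojourn-dependent jump law: longer stays make a jump likelier."""
    base = min(0.2 + 0.2 * sojourn, 0.9)
    return min(base + 0.1 * action, 1.0)


def step(state, sojourn, action, rng):
    """One transition of the toy environment; returns (state, sojourn, reward)."""
    reward = 1.0 if state == 1 else 0.0  # reward in [0, 1], earned in state 1
    if rng.random() < jump_prob(sojourn, action):
        return 1 - state, 0, reward      # jump: the sojourn time resets to 0
    return state, min(sojourn + 1, S_MAX), reward


def q_learning(episodes=2000, alpha=0.1, eps=0.2, seed=0):
    """Tabular Q-learning with a table indexed as Q[t][state][sojourn][action]."""
    rng = random.Random(seed)
    # Layer Q[H] is never updated and stays zero: no reward after the horizon.
    Q = [[[[0.0] * N_ACTIONS for _ in range(S_MAX + 1)]
          for _ in range(N_STATES)] for _ in range(H + 1)]
    for _ in range(episodes):
        x, s = 0, 0
        for t in range(H):
            if rng.random() < eps:       # epsilon-greedy exploration
                a = rng.randrange(N_ACTIONS)
            else:
                a = max(range(N_ACTIONS), key=lambda b: Q[t][x][s][b])
            x2, s2, r = step(x, s, a, rng)
            target = r + max(Q[t + 1][x2][s2])   # undiscounted finite-horizon target
            Q[t][x][s][a] += alpha * (target - Q[t][x][s][a])
            x, s = x2, s2
    return Q
```

Because the table is indexed by the sojourn time as well as the state, the greedy action at a given step may differ between two visits to the same state that have lasted different lengths of time, which is the feature the abstract highlights.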
Funder
Ministero dell’Istruzione, dell’Università e della Ricerca
Publisher
Springer Science and Business Media LLC
Subject
Computational Theory and Mathematics, General Engineering, Theoretical Computer Science, Software, Applied Mathematics, Computational Mathematics, Numerical Analysis
Cited by
2 articles.