1. Younes, H. L. and Littman, M. L. 2004. PPDDL1.0: An extension to PDDL for expressing planning domains with probabilistic effects. Tech. Rep. CMU-CS-04-162.
2. Answer set programming for non-stationary Markov decision processes
3. Wang, Y. 2020. ywang485/pbcplus2mdp: pbcplus2mdp v0.1.
4. Watkins, C. J. C. H. 1989. Learning from Delayed Rewards. Ph.D. thesis, King’s College, Cambridge, UK.