1. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming, vol. 414. Wiley, Hoboken (2009)
2. Zhang, H.: Lecture Notes in Computer Science (2014)
3. Dietterich, T.G.: The MAXQ method for hierarchical reinforcement learning. In: ICML, pp. 118–126. Citeseer (1998)
4. Bai, A., Wu, F., Chen, X.: Online planning for large MDPs with MAXQ decomposition. In: Proceedings of the Autonomous Robots and Multirobot Systems Workshop, at AAMAS 2012, June 2012
5. Bai, A.: Lecture Notes in Computer Science (2013)