1. Puterman, M.L.: Markov decision processes: discrete stochastic dynamic programming, vol. 414. Wiley (2009)
2. Dietterich, T.G.: The MAXQ method for hierarchical reinforcement learning. In: ICML, pp. 118–126. Citeseer (1998)
3. Bai, A., Wu, F., Chen, X.: Online planning for large MDPs with MAXQ decomposition. In: Proceedings of the Autonomous Robots and Multirobot Systems Workshop, at AAMAS 2012 (June 2012)
4. LNAI;A. Bai,2013
5. LNAI;H. Akiyama,2013