1. Asmuth, J., Li, L., Littman, M. L., Nouri, A., & Wingate, D. (2009). A Bayesian sampling approach to exploration in reinforcement learning. In Proceedings of conference on uncertainty in artificial intelligence (pp. 19–26). AUAI Press.
2. Atkeson, C. G., & Santamaria, J. C. (1997). A comparison of direct and model-based reinforcement learning. In Proceedings of international conference on robotics and automation (Vol. 4, pp. 3557–3564). IEEE.
3. Bertsekas, D. P., & Tsitsiklis, J. N. (1996). Neuro-dynamic programming. Belmont, MA: Athena Scientific.
4. Bishop, C. M. (2006). Pattern recognition and machine learning (information science and statistics). Secaucus, NJ: Springer.
5. Deisenroth, M., & Rasmussen, C. E. (2011). PILCO: A model-based and data-efficient approach to policy search. In Proceedings of international conference on machine learning (pp. 465–472).