1. Bellman, R. (2010). Dynamic Programming, Princeton University Press. Princeton Landmarks in Mathematics and Physics.
2. Pontryagin, L. (1987). Mathematical Theory of Optimal Processes, Taylor & Francis. Classics of Soviet Mathematics.
3. Heess, N., Dhruva, T., Sriram, S., Lemmon, J., Merel, J., Wayne, G., Tassa, Y., Erez, T., Wang, Z., and Eslami, S.M.A. (2017). Emergence of Locomotion Behaviours in Rich Environments. arXiv.
4. Tassa, Y., Erez, T., and Todorov, E. (2012, October 7–12). Synthesis and stabilization of complex behaviors through online trajectory optimization. Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2012, Vilamoura, Algarve, Portugal.
5. Schulman, J., Moritz, P., Levine, S., Jordan, M.I., and Abbeel, P. (2016, May 2–4). High-Dimensional Continuous Control Using Generalized Advantage Estimation. Proceedings of the 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico.