1. Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction (2nd ed.). MIT Press.
2. Fazel, M., Ge, R., Kakade, S., & Mesbahi, M. (2018). Global convergence of policy gradient methods for the linear quadratic regulator. In Proceedings of the 35th International Conference on Machine Learning, vol. 80, pp. 1467–1476.
3. Bu, J., Mesbahi, A., Fazel, M., & Mesbahi, M. (2019). LQR through the lens of first order methods: Discrete-time case. arXiv preprint arXiv:1907.08921.
4. Hu, B., Zhang, K., Li, N., Mesbahi, M., Fazel, M., & Başar, T. (2023). Toward a theoretical foundation of policy optimization for learning control policies. Annual Review of Control, Robotics, and Autonomous Systems, 6(1), 123–158. https://doi.org/10.1146/annurev-control-042920-020021
5. Mohammadi, H., Zare, A., Soltanolkotabi, M., & Jovanovic, M. R. (2022). Convergence and sample complexity of gradient methods for the model-free linear-quadratic regulator problem. IEEE Transactions on Automatic Control, 67(5), 2435–2450.