1. Agarwal, A., Kakade, S.M., Lee, J.D., Mahajan, G.: On the theory of policy gradient methods: optimality, approximation, and distribution shift. arXiv:1908.00261 (2019)
2. Beck, A., Teboulle, M.: Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 31(3), 167–175 (2003)
3. Bellman, R., Dreyfus, S.: Functional approximations and dynamic programming. Math. Tables Other Aids Comput. 13(68), 247–251 (1959)
4. Bhandari, J., Russo, D.: A note on the linear convergence of policy gradient methods. arXiv:2007.11120 (2020)
5. Cen, S., Cheng, C., Chen, Y., Wei, Y., Chi, Y.: Fast global convergence of natural policy gradient methods with entropy regularization. arXiv:2007.06558 (2020)