1. Agarwal, A., Kakade, S.M., Lee, J.D., Mahajan, G.: Optimality and approximation with policy gradient methods in Markov decision processes. In: Conference on Learning Theory, PMLR, pp. 64–66 (2020)
2. Agazzi, A., Lu, J.: Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime. arXiv preprint arXiv:2010.11858 (2020)
3. Baird, L.: Residual algorithms: reinforcement learning with function approximation. In: Machine Learning Proceedings 1995, pp. 30–37 (1995)
4. Bardi, M., Capuzzo-Dolcetta, I.: Optimal Control and Viscosity Solutions of Hamilton-Jacobi-Bellman Equations. Birkhäuser, Boston (2018)
5. Barto, A.G., Sutton, R.S., Anderson, C.W.: Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Trans. Syst. Man Cybern. SMC-13(5), 834–846 (1983)