1. Agarwal A, Kakade SM, Lee JD, Mahajan G (2021) On the theory of policy gradient methods: optimality, approximation, and distribution shift. J Mach Learn Res 22(98):1–76
2. Amari SI (1998) Natural gradient works efficiently in learning. Neural Comput 10(2):251–276
3. Amari SI, Douglas SC (1998) Why natural gradient? In: Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP’98 (Cat. No. 98CH36181), vol 2. IEEE, pp 1213–1216
4. Aoki M (2016) Optimization of stochastic systems: topics in discrete-time systems. Elsevier, Amsterdam
5. Bertsekas DP (1995) Dynamic programming and optimal control, vol 1. Athena Scientific, Belmont