1. Achiam J (2016) Easy monotonic policy iteration. arXiv:1602.09118
2. Duan Y, Chen X, Houthooft R, et al. (2016) Benchmarking deep reinforcement learning for continuous control. In: Proceedings of the 33rd International Conference on Machine Learning, p 1329–1338
3. Haviv M, Van Der Heyden L (1984) Perturbation bounds for the stationary probabilities of a finite Markov chain. Adv Appl Probab 16(4):804–818. ISSN 00018678. URL http://www.jstor.org/stable/142734
4. Kakade S (2001a) A natural policy gradient. In: NIPS, volume 14, p 1531–1538
5. Kakade S, Langford J (2002) Approximately optimal approximate reinforcement learning. In: Nineteenth International Conference on Machine Learning. Morgan Kaufmann Publishers Inc., p 267–274