1. Achiam, J., Held, D., Tamar, A., & Abbeel, P. (2017). Constrained policy optimization. In Proceedings of the 34th international conference on machine learning (pp. 22–31).
2. Agarwal, R., Schwarzer, M., Castro, P.S., Courville, A.C., & Bellemare, M. (2021). Deep reinforcement learning at the edge of the statistical precipice. In Proceedings of the 35th conference on neural information processing systems (pp. 29304–29320).
3. Anschel, O., Baram, N., & Shimkin, N. (2017). Averaged-DQN: variance reduction and stabilization for deep reinforcement learning. In Proceedings of the 34th international conference on machine learning (pp. 176–185).
4. Csiszár, I. (1964). Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten. Magyar Tud. Akadémia Matematikai Kutató Intézetének Közleményei, 8, 85–108.
5. Dasagi, V., Bruce, J., Peynot, T., & Leitner, J. (2019). Ctrl-Z: recovering from instability in reinforcement learning. CoRR arXiv:1910.03732.