1. Achiam, J., Knight, E., Abbeel, P.: Towards characterizing divergence in deep Q-learning. arXiv:1903.08894 (2019)
2. Ahmed, Z., Roux, N.L., Norouzi, M., Schuurmans, D.: Understanding the impact of entropy on policy optimization. arXiv:1811.11214 (2019)
3. Baird, L.C., Klopf, A.H.: Technical Report WL-TR-93-1147. Wright-Patterson Air Force Base, Ohio, Wright Laboratory (1993)
4. Boyan, J.A., Moore, A.W.: Generalization in reinforcement learning: safely approximating the value function. In: Advances in Neural Information Processing Systems, pp. 369–376 (1995)
5. Colas, C., Sigaud, O., Oudeyer, P.Y.: GEP-PG: decoupling exploration and exploitation in deep reinforcement learning algorithms. arXiv:1802.05054 (2018)