1. Achbany, Y., Fouss, F., Yen, L., Pirotte, A., Saerens, M.: Tuning continual exploration in reinforcement learning: an optimality property of the Boltzmann strategy. Neurocomputing 71(13–15), 2507–2520 (2008)
2. Barth-Maron, G., et al.: Distributed distributional deterministic policy gradients. arXiv preprint arXiv:1804.08617 (2018)
3. Brockman, G., et al.: OpenAI gym. arXiv preprint arXiv:1606.01540 (2016)
4. Cesa-Bianchi, N., Gentile, C., Lugosi, G., Neu, G.: Boltzmann exploration done right. In: Advances in Neural Information Processing Systems, pp. 6284–6293 (2017)
5. Fujimoto, S., Hoof, H., Meger, D.: Addressing function approximation error in actor-critic methods. In: International Conference on Machine Learning, pp. 1582–1591 (2018)