1. Abbeel, P., Quigley, M., Ng, A.Y.: Using inaccurate models in reinforcement learning. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 1–8 (2006)
2. Bajaj, C., Nguyen, M.: DPO: differential reinforcement learning with application to optimal configuration search (2024). arXiv:2404.15617
3. Bergstra, J., Bardenet, R., Bengio, Y., Kégl, B.: Algorithms for hyper-parameter optimization. In: Shawe-Taylor, J., Zemel, R., Bartlett, P., Pereira, F., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 24. Curran Associates, Inc. (2011)
4. Böttcher, L., Antulov-Fantulin, N., Asikis, T.: AI Pontryagin or how artificial neural networks learn to control dynamical systems. Nat. Commun. 13(1), 333 (2022)
5. Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., Zaremba, W.: OpenAI Gym (2016). arXiv:1606.01540