1. A. Juliani, V.-P. Berges, E. Vckay, Y. Gao, H. Henry, M. Mattar, D. Lange, Unity: A general platform for intelligent agents. arXiv preprint arXiv:1809.02627v2 (2020).
2. J. Schulman, F. Wolski, P. Dhariwal, A. Radford, O. Klimov, Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017).
3. T. Haarnoja, A. Zhou, P. Abbeel, S. Levine, Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research, 80 (2018) 1861–1870.
4. J. Ho, S. Ermon, Generative adversarial imitation learning. Advances in Neural Information Processing Systems (2016) 4565–4573.
5. A. Hussein, M. M. Gaber, E. Elyan, C. Jayne, Imitation learning: A survey of learning methods. ACM Computing Surveys (CSUR), 50(2) (2017) 1–35. https://doi.org/10.1145/3054912