1. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., Zheng, X.: TensorFlow: Large-scale machine learning on heterogeneous systems. https://www.tensorflow.org/ (2015)
2. Abdolmaleki, A., Simões, D., Lau, N., Reis, L.P., Neumann, G., Sarıel, S., Lee, D.D.: Learning a Humanoid Kick with Controlled Distance. In: Behnke, S., Sheh, R. (eds.) RoboCup 2016: Robot World Cup XX, 45–57. Springer International Publishing, Cham (2017)
3. Abreu, M., Reis, L.P., Lau, N.: Learning to Run Faster in a Humanoid Robot Soccer Environment through Reinforcement Learning. In: Chalup, S., Niemueller, T., Suthakorn, J., Williams, M.A. (eds.) RoboCup 2019: Robot World Cup XXIII, 3–15. Springer International Publishing, Cham (2019)
4. Boedecker, J., Asada, M.: SimSpark – concepts and application in the RoboCup 3D soccer simulation league. In: SIMPAR-2008 Workshop on The Universe of RoboCup Simulators (2008)
5. Brafman, R.I., Tennenholtz, M.: R-MAX – a general polynomial time algorithm for near-optimal reinforcement learning. J. Mach. Learn. Res. 3(Oct), 213–231 (2002)