1. Achiam, J., et al.: GPT-4 technical report. arXiv preprint arXiv:2303.08774 (2023)
2. Feng, Y., Chen, X., Lin, B.Y., Wang, P., Yan, J., Ren, X.: Scalable multi-hop relational reasoning for knowledge-aware question answering. arXiv preprint arXiv:2005.00646 (2020)
3. Juliani, A.: Simple reinforcement learning with TensorFlow part 8: asynchronous actor-critic agents (A3C). Medium, Dated: Dec 16 (2016)
4. Kaelbling, L.P., Littman, M.L., Moore, A.W.: Reinforcement learning: a survey. J. Artif. Intell. Res. 4, 237–285 (1996)
5. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)