1. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, in Advances in Neural Information Processing Systems 30, ed. by I. Guyon, U. von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, R. Garnett (Curran Associates Inc., USA, 2017)
2. A. Sherstinsky, Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Phys. D 404, 132306 (2020)
3. J. Devlin, M.W. Chang, K. Lee, K. Toutanova, BERT: pre-training of deep bidirectional transformers for language understanding, in Proceedings of NAACL-HLT (Association for Computational Linguistics, USA, 2019), pp. 4171–4186
4. T. Brown, B. Mann, N. Ryder, M. Subbiah, J.D. Kaplan, P. Dhariwal et al., Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020)
5. J. Wu, R. Antonova, A. Kan, M. Lepert, A. Zeng, S. Song et al., TidyBot: personalized robot assistance with large language models. Auton. Robot. 47(8), 1087–1102 (2023)