1. Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A.N., Kaiser L., Polosukhin I., Attention Is All You Need (2023), arXiv:1706.03762
2. Brown T.B., Mann B., Ryder N., Subbiah M., Kaplan J., Dhariwal P., Neelakantan A., Shyam P., Sastry G., Askell A. et al., Language Models are Few-Shot Learners (2020), arXiv:2005.14165
3. Touvron H., Lavril T., Izacard G., Martinet X., Lachaux M.A., Lacroix T., Rozière B., Goyal N., Hambro E., Azhar F. et al., LLaMA: Open and Efficient Foundation Language Models (2023), arXiv:2302.13971
4. Devlin J., Chang M.W., Lee K., Toutanova K., BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2019), arXiv:1810.04805
5. Li G., Zhao X., Wang X., Quantum Self-Attention Neural Networks for Text Classification (2023), arXiv:2205.05625