1. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is all you need. arXiv:1706.03762 (2017)
2. Amatriain, X., Sankar, A., Bing, J., Bodigutla, P.K., Hazen, T.J., Kazi, M.: Transformer models: an introduction and catalog. arXiv:2302.07730 (2023)
3. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805 (2018)
4. Chowdhery, A., et al.: PaLM: scaling language modeling with pathways. arXiv:2204.02311 (2022)
5. Zong, M., Krishnamachari, B.: A survey on GPT-3. arXiv:2212.00857 (2022)