1. Lin, T., Wang, Y., Liu, X., Qiu, X.: A survey of transformers. AI Open 3, 111–132 (2022)
2. Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA (2017)
3. Mohammed, A.H., Ali, A.H.: Survey of BERT (Bidirectional Encoder Representation Transformer) types. J. Phys. Conf. Ser. 1963(1), 012173 (2021)
4. Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding (2019). arXiv:1810.04805
5. Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach (2019). arXiv:1907.11692