1. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., and Polosukhin, I. (2017). Attention is all you need. Adv. Neural Inf. Process. Syst., 30.
2. Mars, M. (2022). From Word Embeddings to Pre-Trained Language Models: A State-of-the-Art Walkthrough. Appl. Sci., 12.
3. Garrido-Muñoz, I., Montejo-Ráez, A., Martínez-Santiago, F., and Ureña-López, L.A. (2021). A survey on bias in deep NLP. Appl. Sci., 11.
4. Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2019, June 2–7). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the NAACL-HLT 2019—2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA.
5. Sun, Z., Yu, H., Song, X., Liu, R., Yang, Y., and Zhou, D. (2020). MobileBERT: A Compact Task-Agnostic BERT for Resource-Limited Devices. arXiv.