1. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., and Polosukhin, I. (2017). Attention is all you need. Adv. Neural Inf. Process. Syst., 30.
2. Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019, June 2–7). BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of NAACL-HLT, Minneapolis, MN, USA.
3. Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.A., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., and Azhar, F. (2023). LLaMA: Open and efficient foundation language models. arXiv, arXiv:2302.13971.
4. Liu, Y., Pan, D., Zhang, H., and Zhong, K. (2023). Degradation-Trend-Aware Deep Neural Network with Attention Mechanism for Bearing Remaining Useful Life Prediction. IEEE Trans. Artif. Intell., 1–15.
5. Kitaev, N., Kaiser, Ł., and Levskaya, A. (2020). Reformer: The efficient transformer. arXiv, arXiv:2001.04451.