1. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., and Polosukhin, I. (2017, December 4–9). Attention Is All You Need. Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA.
2. Yenduri, G., Srivastava, G., Maddikunta, P.K.R., Jhaveri, R.H., Wang, W., Vasilakos, A.V., and Gadekallu, T.R. (2023). Generative Pre-Trained Transformer: A Comprehensive Review on Enabling Technologies, Potential Applications, Emerging Challenges, and Future Directions. arXiv.
3. Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. arXiv.
4. Zeng, K.G., Dutt, T., Witowski, J., Kranthi Kiran, G.V., Yeung, F., Kim, M., Kim, J., Pleasure, M., Moczulski, C., and Lopez, L.J.L. (2023). Improving Information Extraction from Pathology Reports Using Named Entity Recognition. Res. Sq., rs.3.rs-3035772.
5. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2021). An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv.