1. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I. Attention is all you need. Adv Neural Inf Process Syst. 2017;30.
2. Vajjala S, Majumder B, Gupta A, Surana H. Practical natural language processing: a comprehensive guide to building real-world NLP systems. O'Reilly Media; 2020.
3. Devlin J, Chang MW, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT); 2019.
4. Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I. Language models are unsupervised multitask learners. OpenAI Blog. 2019;1(8):9.
5. Yang Z, Dai Z, Yang Y, Carbonell J, Salakhutdinov RR, Le QV. XLNet: generalized autoregressive pretraining for language understanding. Adv Neural Inf Process Syst. 2019;32.