1. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L. and Polosukhin, I. (2017) Attention Is All You Need. Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, 4-9 December 2017, 6000-6010.
2. Sherstinsky, A. (2020) Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) Network. Physica D: Nonlinear Phenomena, 404, Article 132306.
3. Devlin, J., Chang, M.-W., Lee, K. and Toutanova, K. (2019) BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. Proceedings of NAACL-HLT 2019, Minneapolis, 2-7 June 2019, 4171-4186.
4. Pappagari, R., Zelasko, P., Villalba, J., Carmiel, Y. and Dehak, N. (2019) Hierarchical Transformers for Long Document Classification. 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Singapore, 14-18 December 2019.
5. Beltagy, I., Peters, M.E. and Cohan, A. (2020) Longformer: The Long-Document Transformer. arXiv preprint arXiv:2004.05150.