1. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
2. Shaw, P., Uszkoreit, J., Vaswani, A. (2018). Self-attention with relative position representations. arXiv preprint arXiv:1803.02155.
3. Ghojogh, B., Ghodsi, A. (2020). Attention mechanism, transformers, BERT, and GPT: Tutorial and survey.
4. Devlin, J., Chang, M.-W., Lee, K., Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
5. Liu, Q., Kusner, M. J., Blunsom, P. (2020). A survey on contextual embeddings. arXiv preprint arXiv:2003.07278.