1. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser Ł, Polosukhin I. Attention is all you need. In: Proceedings of the 31st Conference on Neural Information Processing Systems. 2017, 5998–6008
2. Guo M, Zhang Y, Liu T. Gaussian transformer: a lightweight approach for natural language inference. In: Proceedings of the 33rd AAAI Conference on Artificial Intelligence. 2019, 6489–6496
3. Yu A W, Dohan D, Luong M T, Zhao R, Chen K, Norouzi M, Le Q V. QANet: combining local convolution with global self-attention for reading comprehension. In: Proceedings of the 6th International Conference on Learning Representations. 2018
4. Dai Z, Yang Z, Yang Y, Carbonell J, Le Q V, Salakhutdinov R. Transformer-XL: attentive language models beyond a fixed-length context. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019, 2978–2988
5. Devlin J, Chang M W, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019, 4171–4186