1. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, in: Advances in Neural Information Processing Systems (2017).
2. R. Al-Rfou, D. Choe, N. Constant, M. Guo, L. Jones, Character-level language modeling with deeper self-attention, in: Proceedings of the AAAI Conference on Artificial Intelligence (2019).
3. J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in: Proceedings of NAACL-HLT (2019).
4. Z. Dai, Z. Yang, Y. Yang, J. Carbonell, Q. V. Le, R. Salakhutdinov, Transformer-XL: Attentive language models beyond a fixed-length context, in: Proceedings of ACL (2019).
5. Y. Lu, Z. Li, D. He, Z. Sun, B. Dong, T. Qin, L. Wang, T.-Y. Liu, Understanding and improving transformer from a multi-particle dynamic system point of view, arXiv preprint arXiv:1906.02762 (2019).