1. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. Advances in Neural Information Processing Systems. 2017. DOI: 10.48550/arXiv.1706.03762
2. Devlin J, Chang M-W, Lee K, Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805. 2018. DOI: 10.48550/arXiv.1810.04805
3. Lei S, Yi W, Ying C, Ruibin W. Review of attention mechanism in natural language processing. Data Analysis and Knowledge Discovery. 2020:1-14. DOI: 10.11925/infotech.2096-3467.2019.1317
4. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Communications of the ACM. 2017;60(6):84-90. DOI: 10.1145/3065386
5. Redmon J, Divvala S, Girshick R, Farhadi A. You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016:779-788. DOI: 10.1109/CVPR.2016.91