1. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
2. Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).
3. Bengio, Y., Ducharme, R. & Vincent, P. A neural probabilistic language model. In Advances in Neural Information Processing Systems. 13 (2000).
4. Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems. 30 (2017).
5. Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 1, 4171–4186 (2019).