1. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323(6088), 533–536 (1986)
2. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: International Conference on Learning Representations (2015)
3. Luong, M., Pham, H., Manning, C.D.: Effective approaches to attention-based neural machine translation. In: Conference on Empirical Methods in Natural Language Processing, pp. 1412–1421 (2015). https://doi.org/10.18653/v1/d15-1166
4. Klein, T., Nabi, M.: Attention is (not) all you need for commonsense reasoning. In: Annual Meeting of the Association for Computational Linguistics, pp. 4831–4836 (2019). https://doi.org/10.18653/v1/p19-1477
5. Tan, Z., Wang, M., Xie, J., et al.: Deep semantic role labeling with self-attention. In: AAAI Conference on Artificial Intelligence, pp. 4929–4936 (2018)