1. Gehring, J., Auli, M., Grangier, D., and Dauphin, Y. (2017). “A Convolutional Encoder Model for Neural Machine Translation.” In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 123–135.
2. Liu, L., Utiyama, M., Finch, A., and Sumita, E. (2016). “Neural Machine Translation with Supervised Attention.” In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 3093–3102.
3. Ma, C., Tamura, A., Utiyama, M., Zhao, T., and Sumita, E. (2020). “Encoder-Decoder Attention ≠ Word Alignment: Axiomatic Method of Learning Word Alignments for Neural Machine Translation.” Journal of Natural Language Processing, 27 (3), pp. 531–552.
4. Sutskever, I., Vinyals, O., and Le, Q. V. (2014). “Sequence to Sequence Learning with Neural Networks.” In Advances in Neural Information Processing Systems (27), pp. 3104–3112.
5. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. (2017). “Attention is All You Need.” In Advances in Neural Information Processing Systems (30), pp. 5998–6008.