[1] N. Kalchbrenner and P. Blunsom, “Recurrent continuous translation models,” Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA, pp.1700-1709, Association for Computational Linguistics, Oct. 2013.
[2] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” 3rd International Conference on Learning Representations, San Diego, CA, USA, Conference Track Proceedings, May 2015.
[3] J. Gehring, M. Auli, D. Grangier, D. Yarats, and Y.N. Dauphin, “Convolutional sequence to sequence learning,” Proceedings of the 34th International Conference on Machine Learning, ed. D. Precup and Y.W. Teh, International Convention Centre, Sydney, Australia, pp.1243-1252, Aug. 2017.
[4] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in Neural Information Processing Systems 30, ed. I. Guyon, U.V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, pp.5998-6008, Curran Associates, Inc., Dec. 2017.
[5] T. Luong, H. Pham, and C.D. Manning, “Effective approaches to attention-based neural machine translation,” Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, pp.1412-1421, Association for Computational Linguistics, Sept. 2015. doi:10.18653/v1/d15-1166