1. Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. In: Ghahramani Z, Welling M, Cortes C, Lawrence N, Weinberger KQ (eds) Advances in neural information processing systems, vol 37. Curran Associates, Inc, London
2. Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473
3. Gehring J, Auli M, Grangier D, Dauphin Y (2017) A convolutional encoder model for neural machine translation. In: Proceedings of the 55th annual meeting of the Association for computational linguistics, vol 1, Long Papers, Vancouver, Canada, pp 123–135, https://doi.org/10.18653/v1/P17-1012, https://www.aclweb.org/anthology/P17-1012
4. Sennrich R, Haddow B, Birch A (2016) Neural machine translation of rare words with subword units. In: Proceedings of the 54th annual meeting of the Association for computational linguistics, vol 1, Long Papers, Berlin, Germany, pp 1715–1725, https://doi.org/10.18653/v1/P16-1162, https://www.aclweb.org/anthology/P16-1162
5. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Lu, Polosukhin I (2017) Attention is all you need. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in neural information processing systems, Curran Associates, Inc., vol 30, https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf