1. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: Proceedings of the 3rd International Conference on Learning Representations, San Diego, CA, USA (2015)
2. Bangalore, S., Riccardi, G.: Finite-state models for lexical reordering in spoken language translation. In: Proceedings of the Sixth International Conference on Spoken Language Processing, Beijing, China (2000)
3. Bisazza, A., Federico, M.: A survey of word reordering in statistical machine translation: computational models and language phenomena. Comput. Linguist. 42(2), 163–205 (2016)
4. Brown, P.F., Della Pietra, S.A., Della Pietra, V.J., Mercer, R.L.: The mathematics of statistical machine translation: parameter estimation. Comput. Linguist. 19(2), 263–311 (1993)
5. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota, pp. 4171–4186. ACL (2019)