1. Ba, J. L., Kiros, J. R., and Hinton, G. E. (2016). “Layer Normalization.” arXiv preprint arXiv:1607.06450.
2. Banerjee, S. and Lavie, A. (2005). “METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments.” In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pp. 65–72, Ann Arbor, Michigan. Association for Computational Linguistics.
3. Buckman, J. and Neubig, G. (2018). “Neural Lattice Language Models.” Transactions of the Association for Computational Linguistics, 6, pp. 529–541.
4. Deguchi, H., Utiyama, M., Tamura, A., Ninomiya, T., and Sumita, E. (2020). “Bilingual Subword Segmentation for Neural Machine Translation.” In Proceedings of the 28th International Conference on Computational Linguistics, pp. 4287–4297, Barcelona, Spain (Online). International Committee on Computational Linguistics.
5. Fan, A., Bhosale, S., Schwenk, H., Ma, Z., El-Kishky, A., Goyal, S., Baines, M., Celebi, O., Wenzek, G., Chaudhary, V., Goyal, N., Birch, T., Liptchinsky, V., Edunov, S., Grave, E., Auli, M., and Joulin, A. (2020). “Beyond English-Centric Multilingual Machine Translation.” arXiv preprint arXiv:2010.11125.