Evaluating NMT using the non-inferiority principle

Authors:

do Campo Bayón María, Sánchez-Gijón Pilar

Abstract

The aim of this article is to propose a new neural machine translation (NMT) evaluation method based on the non-inferiority principle. To do so, we evaluate raw machine translation (MT) in terms of naturalness, which for this research is defined not just as the absence of fluency errors but also as meeting the linguistic expectations of Galician end users when reading original texts in Galician. Our main objective is, first, to validate the new methodology presented in our previous study by evaluating a Spanish-to-Galician NMT engine for the social media domain that was retrained with a new Twitter corpus. This new methodology and NMT engine were applied after analyzing the conclusions of a pilot survey conducted among Twitter users to evaluate their perception of tweets translated from Spanish into Galician with our NMT engine, which was created with a corpus of tweets. As in our preliminary study, our aim is to propose a robust quality approximation method based on the reception parameters of end users’ perceptions. This new survey was conducted in December 2022 with the participation of 228 Galician-speaking Twitter users. The main changes proposed are: the inclusion of more information about the participant profile, so that the non-inferiority principle can also be evaluated according to these parameters; the inclusion of a new type of tweet, the thread; the provision of context by presenting the tweets in their original display as shown in the Twitter app; a change in the number of tweets evaluated and the number of different questionnaires; a change in the distribution of the questionnaires; and the inclusion of an error-classification human evaluation conducted by professional linguists to correlate the findings.
We will present the steps carried out following the conclusions of the pilot study, describe the new study’s design, analyze the new findings, and present the final conclusions regarding the engine and the evaluation method based on the non-inferiority principle. Finally, we will also provide some examples of the use of this new methodology in the translation industry.

Publisher

Cambridge University Press (CUP)

References (34 articles)

1. Has Machine Translation Achieved Human Parity? A Case for Document-level Evaluation

2. Castilho, S., Moorkens, J., et al. (2017). A comparative quality evaluation of PBSMT and NMT using professional translators. In Proceedings of Machine Translation Summit XVI: Research Track.

3. do Campo, M. and Sánchez-Gijón, P. (2022). Evaluating NMT: Superior, Inferior, or Equivalent to Texts Originally Written by Humans. In Proceedings of the New Trends in Translation and Technology, Rhodes, July.
