Automatic assessment of spoken-language interpreting based on machine-translation evaluation metrics

Published: 2022-03-04 Issue: 1 Volume: 25 Page: 109-143
ISSN: 1384-6647
Container-title: Interpreting. International Journal of Research and Practice in Interpreting
Language: en
Short-container-title: INTP

Author:

Lu Xiaolei¹, Han Chao¹

Affiliation:

1. Xiamen University

Abstract

Automated metrics for machine translation (MT) such as BLEU are customarily used because they are quick to compute and sufficiently valid to be useful in MT assessment. Whereas the instantaneity and reliability of such metrics are made possible by automatic computation based on predetermined algorithms, their validity is primarily dependent on a strong correlation with human assessments. Despite the popularity of such metrics in MT, little research has been conducted to explore their usefulness in the automatic assessment of human translation or interpreting. In the present study, we therefore seek to provide an initial insight into the way MT metrics would function in assessing spoken-language interpreting by human interpreters. Specifically, we selected five representative metrics – BLEU, NIST, METEOR, TER and BERT – to evaluate 56 bidirectional consecutive English–Chinese interpretations produced by 28 student interpreters of varying abilities. We correlated the automated metric scores with the scores assigned by different types of raters using different scoring methods (i.e., multiple assessment scenarios). The major finding is that BLEU, NIST, and METEOR had moderate-to-strong correlations with the human-assigned scores across the assessment scenarios, especially for the English-to-Chinese direction. Finally, we discuss the possibility and caveats of using MT metrics in assessing human interpreting.
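The workflow summarized in the abstract (score each interpretation against a reference with an MT metric, then correlate the metric scores with human ratings) can be illustrated with a minimal Python sketch. This is not the authors' implementation: `simple_bleu` is a simplified, unsmoothed BLEU-style score limited to unigrams and bigrams, `pearson` is a plain Pearson correlation (the abstract does not say which correlation statistic was used), and the sentences and ratings are invented purely for illustration. Whitespace tokenization is also an assumption; Chinese output would need a proper word or character segmenter.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=2):
    """Simplified BLEU-style score: geometric mean of clipped n-gram
    precisions (n = 1..max_n) times a brevity penalty. No smoothing,
    so any n-gram order with zero matches yields 0.0."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Penalize candidates that are shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical data: one reference rendition, three interpretations,
    # and made-up human ratings for those interpretations.
    reference = "the committee approved the new budget yesterday"
    interpretations = [
        "the committee approved the new budget yesterday",
        "the committee approved a budget",
        "committee said yes to the money plan",
    ]
    human_ratings = [9.0, 6.5, 4.0]

    metric_scores = [simple_bleu(s, reference) for s in interpretations]
    print("metric scores:", metric_scores)
    print("correlation with human ratings:", pearson(metric_scores, human_ratings))
```

In practice, established implementations of BLEU, NIST, METEOR and TER would replace `simple_bleu`, and each assessment scenario (rater type and scoring method) would contribute its own set of human scores to correlate against.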

Publisher

John Benjamins Publishing Company

Subject

Linguistics and Language, Language and Linguistics

Link

http://www.jbe-platform.com/deliver/fulltext/intp.00076.lu.pdf

References: 42 articles.

1. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments;Banerjee,2005

2. Re-evaluating the role of BLEU in machine translation research;Callison-Burch,2006

3. Prosodic correlates of perceived quality and fluency in simultaneous interpreting

4. Automatische Evaluation der Humanübersetzung: BLEU vs. METEOR

Cited by 3 articles.

1. Covert simultaneous post-editing in online assessment of students’ sight translation;Advanced Education;2024-08-14

2. Machine-learning based automatic assessment of communication in interpreting;Frontiers in Communication;2023-01-24

3. Can automated machine translation evaluation metrics be used to assess students’ interpretation in the language learning classroom?;Computer Assisted Language Learning;2021-08-28