Towards Automatic Error Analysis of Machine Translation Output

Published:2011-12 Issue:4 Volume:37 Page:657-688
ISSN:0891-2017
Container-title:Computational Linguistics
language:en
Short-container-title:Computational Linguistics

Author:

Popović Maja¹,Ney Hermann¹

Affiliation:

1. RWTH Aachen University

Abstract

Evaluation and error analysis of machine translation output are important but difficult tasks. In this article, we propose a framework for automatic error analysis and classification based on the identification of actual erroneous words using the algorithms for computation of Word Error Rate (WER) and Position-independent word Error Rate (PER). This is a first step towards the development of automatic evaluation measures that provide more specific information about particular translation problems. The proposed approach enables the use of various types of linguistic knowledge in order to classify translation errors in many different ways. This work focuses on one possible set-up, namely, five error categories: inflectional errors, errors due to wrong word order, missing words, extra words, and incorrect lexical choices. For each of the categories, we analyze the contribution of various POS classes. We compared the results of automatic error analysis with the results of human error analysis in order to investigate two possible applications: (1) estimating the contribution of each error type in a given translation output, in order to identify the main sources of errors for a given translation system; and (2) comparing different translation outputs using the introduced error categories, in order to obtain more information about the advantages and disadvantages of different systems and the possibilities for improvement, as well as about the advantages and disadvantages of the methods applied for improvement. We used Arabic–English Newswire and Broadcast News and Chinese–English Newswire outputs created in the framework of the GALE project, several Spanish and English European Parliament outputs generated during the TC-Star project, and three German–English outputs generated in the framework of the fourth Machine Translation Workshop.
We show that our results correlate very well with the results of a human error analysis, and that all our metrics except extra words reflect well the differences between different versions of the same translation system as well as the differences between different translation systems.
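
For readers unfamiliar with the two measures named in the abstract, the following is a minimal sketch of how WER and PER are commonly computed at the sentence level. It is not the authors' implementation: the function names `wer` and `per`, whitespace tokenization, and the PER formulation used here (length-normalized bag-of-words mismatch) are assumptions made for illustration, and the article's own variants may differ.

```python
from collections import Counter


def wer(hyp: str, ref: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    h, r = hyp.split(), ref.split()
    if not r:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the hypothesis prefix
    # processed so far and the first j reference words.
    prev = list(range(len(r) + 1))
    for i, hw in enumerate(h, start=1):
        curr = [i]
        for j, rw in enumerate(r, start=1):
            cost = 0 if hw == rw else 1
            curr.append(min(prev[j] + 1,          # extra hypothesis word
                            curr[j - 1] + 1,      # missing reference word
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(r)


def per(hyp: str, ref: str) -> float:
    """Position-independent Error Rate (one common formulation, assumed here).

    Word order is ignored: the two sentences are compared as bags of words.
    """
    h, r = hyp.split(), ref.split()
    if not r:
        raise ValueError("reference must contain at least one word")
    # Multiset intersection counts words shared by both sentences.
    matches = sum((Counter(h) & Counter(r)).values())
    return (max(len(h), len(r)) - matches) / len(r)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    print(wer("the cat sat on mat", ref), per("the cat sat on mat", ref))
    print(wer("mat the on sat cat the", ref), per("mat the on sat cat the", ref))
```

Because PER ignores word order, a hypothesis that contains exactly the reference words in a different order gets a PER of zero but a non-zero WER. That contrast is the kind of signal that can help separate word-order errors from missing, extra, or lexically wrong words.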

Publisher

MIT Press - Journals

Subject

Artificial Intelligence,Computer Science Applications,Linguistics and Language,Language and Linguistics

Link

https://www.mitpressjournals.org/doi/pdf/10.1162/COLI_a_00072

Cited by 33 articles.

1. Performance and perception: machine translation post-editing in Chinese-English news translation by novice translators;Humanities and Social Sciences Communications;2023-11-09

2. End-to-End page-Level assessment of handwritten text recognition;Pattern Recognition;2023-10

3. Towards a Digital Assessment: Artificial Intelligence Assisted Error Analysis in ESL;Integrated Journal for Research in Arts and Humanities;2023-07-29

4. Examining Manual and Automatic MT Evaluation: A Grey Relational Analysis for Chinese-Portuguese Translation Quality;2023 IEEE 3rd International Conference on Computer Communication and Artificial Intelligence (CCAI);2023-05-26

5. Discover, Explain, Improve: An Automatic Slice Detection Benchmark for Natural Language Processing;Transactions of the Association for Computational Linguistics;2023